Kaz Kylheku <773-297-7223@kylheku.com> writes:
On 2020-04-08, stbalbach2@gmail.com <stbalbach2@gmail.com> wrote:
Given a unicode string:
/usr/bin/printf "\u041c\u043e\u0442\u0438\u043d"
Result: "Мотин"
Is there a native gawk way other than invoking /usr/bin/printf?
Awk doesn't have \u escapes.
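If the `\uXXXX` text arrives as data, you can still expand it without an external printf. The function below builds the UTF-8 bytes by hand.

It is a minimal sketch with several assumptions:

- `u_unescape` is a hypothetical helper name.
- It handles only exactly four hex digits (code points below U+10000).
- It does not combine surrogate pairs.
- It uses nothing beyond POSIX awk features (`match`, `substr`, `index`, `sprintf("%c", n)`).

```shell
#!/bin/sh
# Sketch: expand \uXXXX escapes (BMP only, no surrogate pairs) into UTF-8
# bytes with plain POSIX awk. u_unescape is a hypothetical helper name.
u_unescape() {
    # LC_ALL=C makes gawk treat sprintf("%c", n) as the single byte n;
    # in a UTF-8 locale gawk would treat n as a code point instead.
    LC_ALL=C awk '
    # Convert a hex string such as "041c" to a number.
    function hexval(s,    i, n) {
        n = 0
        s = tolower(s)
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    # Encode a code point below U+10000 as 1 to 3 UTF-8 bytes.
    function utf8(cp) {
        if (cp < 128)
            return sprintf("%c", cp)
        if (cp < 2048)
            return sprintf("%c%c", 192 + int(cp / 64), 128 + cp % 64)
        return sprintf("%c%c%c", 224 + int(cp / 4096),
                       128 + int(cp / 64) % 64, 128 + cp % 64)
    }
    # Replace every backslash-u-plus-4-hex-digits in s; keep other text.
    function unescape(s,    out) {
        out = ""
        while (match(s, /\\u[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/)) {
            out = out substr(s, 1, RSTART - 1) utf8(hexval(substr(s, RSTART + 2, 4)))
            s = substr(s, RSTART + RLENGTH)
        }
        return out s
    }
    { print unescape($0) }
    '
}

printf '%s\n' '\u041c\u043e\u0442\u0438\u043d' | u_unescape
```

The byte-level encoding is there so the result does not depend on the locale, and so the same program should also run under mawk. With gawk specifically, the manual describes `%c` in a multibyte locale as printing the character for a numeric value. That suggests `printf "%c", 1052` may already emit "М" directly when a UTF-8 locale is active.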
The following works for me with GNU Awk and Mawk on
Ubuntu 18:
$ awk 'BEGIN { print "Мотин" }'
Мотин
That is to say, the implementations appear to be 8-bit clean in their
handling of string literals, so you can write the source code in UTF-8 and
embed the extended characters directly.
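A related way to rely on that 8-bit cleanliness while keeping the awk source pure ASCII is to spell out the UTF-8 bytes with octal escapes. POSIX awk accepts `\ddd` escapes in string literals. The sketch below hard-codes the bytes of "Мотин". The byte values are worked out from the code points U+041C, U+043E, U+0442, U+0438, U+043D and are not taken from the original post.

```shell
#!/bin/sh
# Sketch: octal escapes give the raw UTF-8 bytes, so no UTF-8 is needed
# in the awk program text itself.
# М=\320\234  о=\320\276  т=\321\202  и=\320\270  н=\320\275
s=$(awk 'BEGIN { print "\320\234\320\276\321\202\320\270\320\275" }')
printf '%s\n' "$s"
```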
The gawk I use (4.2.1) appears to be UTF-8 aware, not just 8-bit clean:
$ awk '/caf[ée]/ {print length($1)}'
cafe
4
café
4
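The distinction is visible in `length()`. In UTF-8, "café" is 4 characters but 5 bytes, because é is the two bytes `\303\251`. A UTF-8-aware gawk reports 4, as in the session above. With the locale forced to C, awk counts bytes instead.

The sketch below shows the C-locale case. The locale name `C.UTF-8` in the commented line is an assumption about what the system provides.

```shell
#!/bin/sh
# Sketch: measure the 5-byte UTF-8 string "café" with the locale forced
# to C, where awk treats each byte as one character.
bytes=$(LC_ALL=C awk 'BEGIN { print length("caf\303\251") }')
echo "C locale: $bytes"
# In a UTF-8 locale, gawk decodes the two-byte é and should report 4:
# LC_ALL=C.UTF-8 gawk 'BEGIN { print length("caf\303\251") }'
```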