Forum: The Computer Express

Character codes

From Holger Granholm@2:20/228 to Maurice Kinal on Tue Feb 19 09:53:00 2019

In a message on 02-18-19 Maurice Kinal said to Benny Pedersen:

Hi Maurice,

How does this look to you? I am testing a da_DK.utf8 MSG

As you might remember I have built into my editor conversion from-to
various character codes.

januar februar marts
ma ti on to fr lø sø ma ti on to fr lø sø ma ti on to fr lø

When I saw the above I got a bit confused because the character codes
for the lower case 'ø' didn't agree with the chr.codes I had.

However, after studying character code lists I found that there really
is a lowercase slashed o letter.

I didn't recall that and it is "hidden" among the mathematical symbols at
PC8 code 237 and the slashed capital O is to be found among the graphical
chrs of PC8 as code 216. This is often used as a replacement for zero.

Have a good night,

Holger

.. Virus found, Windows: Remove it (Y/y)
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Tue Feb 19 23:26:32 2019

Hej Holger!

didn't agree with the chr.codes I had

That is because it cannot be mapped out to PC8. latin1 will work and will be mapped out to 0xf8 (dec 248 if you prefer). In both utf8 and latin1, decimal 148 is an extended control code and not whatever PC8 shows it as since PC8 doesn't have any extended control codes and instead has 8 bit characters for that range (128-159). Near as I can figure dec 148 in PC8 would be the "LATIN SMALL LETTER O WITH DIAERESIS" which in latin1 is dec 246 or the ö character in utf8.
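
A small Python sketch of the mappings described above. It assumes Python's built-in "cp437" codec is a fair stand-in for PC8, which this thread has not confirmed; the variable names are only for illustration.

```python
import unicodedata

slashed_o = "ø"

# latin1 stores the slashed o in a single byte.
latin1_code = slashed_o.encode("latin-1")[0]

# Assumption: cp437 stands in for PC8. Look up what it shows at dec 148.
pc8_148 = bytes([148]).decode("cp437")

# The same letter sits at a different position in latin1.
latin1_o_diaeresis = "ö".encode("latin-1")[0]

# In latin1 (and Unicode) dec 148 is a control code, not a letter.
latin1_148_category = unicodedata.category(bytes([148]).decode("latin-1"))

# Check whether cp437 has any slot for the slashed o at all.
try:
    slashed_o.encode("cp437")
    in_cp437 = True
except UnicodeEncodeError:
    in_cp437 = False

print(hex(latin1_code), latin1_code)
print(pc8_148, latin1_o_diaeresis, latin1_148_category, in_cp437)
```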

and the slashed capital O is to be found among the graphical chrs
of PC8 as code 216. This is often used as a replacement for zero.

I'll have to take your word for that since I have never found a map for PC8. I have seen speculation that it is the same as CP437. Is it?

Livet är gott,
Maurice

... En Møøse hade en gång min syster ...
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Thu Feb 21 17:05:00 2019

In a message on 02-19-19 Maurice Kinal said to Holger Granholm:

Hej Maurice.

didn't agree with the chr.codes I had

That is because it cannot be mapped out to PC8. latin1 will work
and will be mapped out to 0xf8 (dec 248 if you prefer).

Near as I can figure dec 148 in PC8 would be the "LATIN SMALL LETTER
O WITH DIAERESIS" which in latin1 is dec 246 or the ö character in
utf8.

The expression 'diaeresis' doesn't exist in my vocabulary or dictionary. However, if diaeresis is the same as the 'divide' sign on the numeric
keyboard I agree. That comes out as the Umlaut 'o' when translated
from Latin 1.

and the slashed capital O is to be found among the graphical chrs
of PC8 as code 216.

In Latin 1 it's represented by chr code D8 or dec.216 which happens to
be the same as in CP 437.

This chr is often used as a replacement for zero.

I'll have to take your word for that since I have never found a map
for PC8. I have seen speculation that it is the same as CP437.
Is it?

In that case I would recommend that you try to acquire the booklet
"IBM OS/2 Warp 4" "Keyboards and Code Pages" published in 1996 by IBM
with the Order No. 29H3183.

It contains a heap of keyboard layouts and Code Pages for Publishing,
EBCDIC, APL2 and of course Text. It's indispensable when figuring out
various keyboards and code pages.

'...' En Møøse hade en gång min syster ...

What is this .................^^ in Latin 1?

Ha en bra dag,

Holger

.. My spelling? Oh, it's just line noise.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Thu Feb 21 22:52:06 2019

Hej Holger!

However, if diaeresis is the same as the 'divide' sign

It is the 'o' character with two dots on top. The 'o' character with the 'divide' sign - I call it the slashed 'o' which hardcore encoding gurus call 'LATIN SMALL LETTER O WITH STROKE' - is decimal 248 in latin1 and doesn't exist in CP437.

In Latin 1 it's represented by chr code D8

That is 'LATIN CAPITAL LETTER O WITH STROKE' and also doesn't exist in CP437.

In Latin 1 it's represented by chr code D8 or dec.216 which
happens to be the same as in CP 437.

No it isn't. According to https://en.wikipedia.org/wiki/Code_page_437 D8 or dec.216 is a line drawing character and in latin1 it is 'LATIN CAPITAL LETTER O WITH STROKE' or character 'Ø' in utf8.
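
The difference at D8 can be checked with a short Python sketch. It relies on Python's built-in "cp437" and "latin-1" codecs; the helper name describe is made up for this example.

```python
import unicodedata

def describe(byte_value, codec):
    """Return (character, Unicode name) for one byte in the given code page."""
    ch = bytes([byte_value]).decode(codec)
    return ch, unicodedata.name(ch)

cp437_d8 = describe(0xD8, "cp437")      # a box drawing character
latin1_d8 = describe(0xD8, "latin-1")   # the slashed capital O

# The same letter as it appears in a utf8 message: two bytes.
utf8_capital_o_stroke = latin1_d8[0].encode("utf-8")

print(cp437_d8)
print(latin1_d8, utf8_capital_o_stroke.hex(" "))
```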

"IBM OS/2 Warp 4" "Keyboards and Code Pages"

I found a pdf online entitled "OS/2 Warp Server for e-business, Keyboards and Codepages" and do not see PC8 listed in there. It does have 'Codepage 437' and 'Codepage 819 - ISO 8859-1' and comparing them shows the same results I have stated above.

'...' En Møøse hade en gång min syster ...

What is this .................^^ in Latin 1?

F8 or dec.248 (not a character in CP437) for the second and third characters in Møøse, and E5 or dec.229 (86 or dec.134 in CP437) for the second character in gång. The encoding gurus call it "LATIN SMALL LETTER A WITH RING ABOVE" which I believe in Swedish is called the small letter angstrom. Please correct me if I am wrong.

Livet är gott,
Maurice

... En Møøse hade en gång min syster ...
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Mark Lewis@1:3634/12.73 to Holger Granholm on Fri Feb 22 11:07:20 2019

On 2019 Feb 21 17:05:00, you wrote to Maurice Kinal:

Near as I can figure dec 148 in PC8 would be the "LATIN SMALL LETTER
O WITH DIAERESIS" which in latin1 is dec 246 or the ö character in
utf8.

The expression 'diaeresis' doesn't exist in my vocabulary or dictionary. However, if diaeresis is the same as the 'divide' sign on the numeric keyboard I agree. That comes out as the Umlaut 'o' when translated
from Latin 1.

https://www.google.com/search?q="O+WITH+DIAERESIS"

looking at the above, one can see that "diaeresis" is "two dots on top"...

the O or o with the forward slash like the divided-by symbol is its own separate vowel character/letter in Scandinavian...

diaeresis and umlaut look the same (two dots on top) but they signify different pronunciations...

"The diaeresis and the umlaut are diacritics marking two distinct
phonological phenomena. The diaeresis represents the phenomenon
also known as diaeresis or hiatus in which a vowel letter is
pronounced separately from an adjacent vowel and not as part of a
digraph or diphthong. The umlaut (/ˈʊmlaʊt/), in contrast, indicates
a sound shift. These two diacritics originated separately; the
diaeresis is considerably older."

in unicode, both are coded the same so something like HTML &#228 is both a-umlaut and a-diaeresis in the same way that the hyphen and minus are represented by the same character glyph...
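
A quick way to see the shared code point, sketched in Python with only the standard html and unicodedata modules (the variable names are illustrative):

```python
import html
import unicodedata

# Numeric character reference 228, written with its closing semicolon.
a_with_dots = html.unescape("&#228;")

# Unicode has one name for the letter, whether it is read as umlaut or diaeresis.
name = unicodedata.name(a_with_dots)

# Decomposing the letter shows the base letter plus a single combining mark.
parts = [unicodedata.name(c) for c in unicodedata.normalize("NFD", a_with_dots)]

print(a_with_dots, name)
print(parts)
```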

the above gleaned in about 10 minutes research on the web ;)

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... Dazed and confused - again!
---
* Origin: (1:3634/12.73)

From Holger Granholm@2:20/228 to Maurice Kinal on Sat Feb 23 12:33:00 2019

In a message on 02-21-19 Maurice Kinal said to Holger Granholm:

Hi Maurice,

However, if diaeresis is the same as the 'divide' sign

OK, the divide sign on the numerical keypad is a dash with dots above
and below the dash.

It is the 'o' character with two dots on top. The 'o' character

OK that's the umlaut 'o' that exists in swedish, finnish and german
languages.

with the 'divide' sign - I call it the slashed 'o' which hardcore
encoding gurus call 'LATIN SMALL LETTER O WITH STROKE' - ....

That's the letter in danish that represents the umlaut 'o' in swedish,
finnish and german.

In Latin 1 it's represented by chr code D8

Yep, that represents the capital umlaut 'O' of swedish, finnish and
german.

That is 'LATIN CAPITAL LETTER O WITH STROKE' and also doesn't exist
in CP437.

In Latin 1 it's represented by chr code D8 or dec.216 which
happens to be the same as in CP 437.

No it isn't. According to
https://en.wikipedia.org/wiki/Code_page_437 D8 or dec.216 is a line
drawing character and in latin1 it is 'LATIN CAPITAL LETTER O WITH
STROKE' or character 'Ø' in utf8.

Right. Thanks for that 'Ø' addition to my UTF conversion table.

"IBM OS/2 Warp 4" "Keyboards and Code Pages"

I found a pdf online entitled "OS/2 Warp Server for e-business,
Keyboards and Codepages" and do not see PC8 listed in there.

In my vocabulary PC8 is what is called ASCII 2 or extended ASCII and in
IBM's code pages 850. This CP 850 is also called 'Multilingual'.

It does have 'Codepage 437' and 'Codepage 819 - ISO 8859-1' and
comparing them shows the same results I have stated above.

'...' En Møøse hade en gång min syster ...

What is this .................^^ in Latin 1?

F8 or dec.248 (not a character in CP437). Yes it is and represents

the degree sign in code pages 437, 850 and in 819 as B0 dec.176.

When I want to insert the degree sign in a Windows DOC I use ALT+0176.
However, I haven't found that sign in Messenger.

..... the second and third characters in Møøse,
and E5 or dec.229 (86 or dec.134 in CP437) for the second character
in gång.
"LATIN SMALL LETTER A WITH RING ABOVE" which I believe in Swedish is
called the small letter angstrom. Please correct me if I am wrong.

Correct, but so far I can't recall having seen that letter in a danish
text, but I may be wrong. Let's hear what Benny says <BG>.

Have a good night,

Holger

.. FIRST listen to the missionary. THEN eat him.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Holger Granholm@2:20/228 to Mark Lewis on Sat Feb 23 12:33:00 2019

In a message on 02-22-19 mark lewis said to Holger Granholm:

Hi Mark,

The expression 'diaeresis' doesn't exist in my vocabulary or dictionary.

However, if diaeresis is the same as the 'divide' sign on the numeric keyboard I agree. That comes out as the Umlaut 'o' when translated
from Latin 1.

https://www.google.com/search?q="O+WITH+DIAERESIS"

looking at the above, one can see that "diaeresis" is "two dots on
top"...

Yes.

the O or o with the forward slash like the divided-by symbol is its
own separate vowel character/letter in Scandinavian...

Only in danish and norwegian. In other european countries the
'O' with two dots on top is used.

diaeresis and umlaut look the same (two dots on top) but they
signify different pronunciations...

"The diaeresis and the umlaut are diacritics marking two distinct
phonological phenomena. The diaeresis represents the phenomenon
also known as diaeresis or hiatus in which a vowel letter is
pronounced separately from an adjacent vowel and not as part of a
digraph or diphthong. The umlaut (/ˈʊmlaʊt/), in contrast,
indicates a sound shift. These two diacritics originated separately;
the diaeresis is considerably older."

in unicode, both are coded the same so something like HTML &#228 is
both a-umlaut and a-diaeresis in the same way that the hyphen and
minus are represented by the same character glyph...

OK and thanks for the explanation.

Have a nice day,

Holger

.. A mainframe: The biggest PC peripheral available.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Sat Feb 23 22:09:48 2019

Hej Holger!

OK, the divide sign on the numerical keypad is a dash with dots
above and below the dash.

On my keyboard it is the '/' character but I've seen some keyboards that use that divide sign. I see it as F6 or dec.246 in CP850. That translates to the '÷' character in utf8 - usually written as U+00F7 or \u00f7 in bash.

-={ '<Esc>:read !echo -e "\u00f7"' starts }=-
÷
-={ '<Esc>:read !echo -e "\u00f7"' ends }=-

Gotta love bash ... and vim in this case but the same will happen on a bash commandline without vim by just typing; echo -e "\u00f7"
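
For comparison, roughly the same check in Python. This is only a sketch, assuming Python's built-in "cp850" codec matches the CP850 table referred to above.

```python
division = "\u00f7"                       # same escape as in the bash example

utf8_bytes = division.encode("utf-8")     # what a hex editor would show
cp850_byte = division.encode("cp850")     # single byte in CP850

print(division, utf8_bytes.hex(" "), cp850_byte.hex(), cp850_byte[0])
```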

Thanks for that 'Ø' addition to my UTF conversion table.

You're welcome.

"LATIN SMALL LETTER A WITH RING ABOVE" which I believe in Swedish is called the small letter angstrom. Please correct me if I am wrong.

Correct, but so far I can't recall having seen that letter in a
danish text, but I may be wrong. Let's hear what Benny says <BG>.

It isn't a character in Dansk. If you look at the kludges, my reply to you is in Swedish (sv_SE.utf8) whereas my replies to Benny are in Dansk (da_DK.utf8). Also the taglines are different other than the 'Møøse' part which is neither Swedish nor Dansk. It is a bogus word for moose which requires the Norwegian slashed small 'o' characters to enhance the taglines. That will always be the same no matter what language. For example in German it would be "Ein Møøse hat meine Schwester einmal gebissen ..." while in Ukrainian it would be "А Møøse колись кусав мою сестру ...". So the small angstrom is in the tagline below simply because I am replying to you and replies to you from the raspi3b+ will contain Swedish characters while to Benny they will be Danish characters. It is the way I configured it ... for now.

Livet är gott,
Maurice

... En Møøse hade en gång min syster ...
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Mark Lewis@1:3634/12.73 to Holger Granholm on Sun Feb 24 16:55:28 2019

On 2019 Feb 23 12:33:00, you wrote to Maurice Kinal:

However, if diaeresis is the same as the 'divide' sign

OK, the divide sign on the numerical keypad is a dash with dots above
and below the dash.

not on my keyboard... it is the "/" character...

)\/(ark

Always Mount a Scratch Monkey
Do you manage your own servers? If you are not running an IDS/IPS yer doin' it wrong...
... Sugar is 10 times more addictive than cocaine.
---
* Origin: (1:3634/12.73)

From Holger Granholm@2:20/228 to Mark Lewis on Mon Feb 25 16:15:00 2019

In a message on 02-24-19 mark lewis said to Holger Granholm:

On 2019 Feb 23 12:33:00, you wrote to Maurice Kinal:

OK, the divide sign on the numerical keypad is a dash with dots above
and below the dash.

not on my keyboard... it is the "/" character...

OK, see my reply to Maurice.

Have a good evening,

Holger

.. OS/2 ... Opens up Windows, shuts up Gates.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Holger Granholm@2:20/228 to Maurice Kinal on Mon Feb 25 16:15:00 2019

In a message on 02-23-19 Maurice Kinal said to Holger Granholm:

Hej Maurice,

OK, the divide sign on the numerical keypad is a dash with dots
above and below the dash.

On my keyboard it is the '/' character but I've seen some keyboards
that use that divide sign. I see it as F6 or dec.246 in CP850.

Correct on all my full size kbrds but on the separate numerical keypad
for my laptop it shows up as '/'.

That translates to the '÷' character in utf8 - usually written as
U+00F7 or \u00f7 in bash.

This is also a new sign for my UTF conversion whatever use I may have.

other than the 'Møøse' part which is neither Swedish or Dansk. It
is a bogus word for moose which requires the Norwegian slashed small
'o' characters to enhance the taglines. That will always be the
same no matter what language. For example in German it would be
"Ein Møøse hat meine Schwester einmal gebissen ..."

If I translate that german line to swedish, norwegian or danish it would
become "En älg har en gång bitit min syster" or "Min syster har en gång
blivit biten av en älg". There goes the 'Moose'!

However, the letter ä would be dec.145 if moose is ælg in danish.

That small angstrom exists in all scandinavian keyboards as noted in the
IBM keyboard manual but not in the german kbrd.

So the small angstrom is in the tagline below simply because I am
replying to you .....

In german a moose becomes 'Elch'. That's the only language where verbs
and substantives are written with a capital first letter.

God natt min vän,

Holger

.. Prayers are always answered. The answer is usually no!
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Mon Feb 25 22:36:56 2019

Hey Holger!

Note that I am replying on a different machine this time since I am in the middle of a major overhaul on the raspi3b+ which will take at least another 32 hours. The Swedish, Danish, and Spanish MøøSGing configurations are on that machine. This machine is the English jobber. I also have a Dutch jobber on a totally different machine - also a x86_64-pc-linux-gnu host - but I decided to reply on this one instead.

That translates to the '÷' character in utf8 - usually written
as U+00F7 or \u00f7 in bash.

This is also a new sign for my UTF conversion whatever use I may
have.

An excellent online source for utf8 characters is http://www.utf8-chartable.de/ as they give the 8 bit codes that show up in hex editors to the corresponding utf8 characters. For example U+00F7 will show up as a hex 'c3 b7' pair whereas the small slashed 'o' characters in Møøse show up as 'c3 b8' hex pairs. So in bash speak this should produce the utf8 division sign;

-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' starts }=-
÷ ÷
-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' ends }=-

Imagine that!!! It works!!!

Offhand I am thinking it might make the basis for converting utf8 to whatever 8 bit codepage you are using by just replacing 'c3 b7' combinations with a single hex character that represents the divide sign in your codepage of choice. So for CP850 a conversion vector would look something like this; 's/\xc3\xb7/\xf6/g' which will replace ALL U+00F7 characters with a single f6 character. For the Møøse that would be 's/\xc3\xb8/\x9b/g' and then everyone will be happy including the Germans. ;-)
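
The same idea can be sketched in Python two ways: a byte-for-byte substitution table like the sed vectors, and a decode/re-encode pass through Python's built-in "cp850" codec. The function names and the sample string are made up for this example, and the codec is assumed to match the CP850 in question.

```python
# Substitution table mirroring the two sed-style vectors above.
SED_STYLE = {
    b"\xc3\xb7": b"\xf6",   # U+00F7 division sign -> CP850 f6
    b"\xc3\xb8": b"\x9b",   # U+00F8 small slashed o -> CP850 9b
}

def substitute(data, table):
    """Replace each UTF-8 byte pair in data with its single-byte equivalent."""
    for src, dst in table.items():
        data = data.replace(src, dst)
    return data

def utf8_to_codepage(data, codepage="cp850"):
    """Decode UTF-8 bytes and re-encode them in an 8-bit code page.

    Characters the code page lacks are replaced with '?'.
    """
    return data.decode("utf-8").encode(codepage, errors="replace")

sample = "A Møøse ÷".encode("utf-8")
by_table = substitute(sample, SED_STYLE)
by_codec = utf8_to_codepage(sample)
print(by_table, by_codec)
```

The table only covers the pairs listed in it, while the codec route handles every character the code page knows about.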

If I translate that german line to swedish, norwegian or danish

Yes that can definitely screw things up. I've done it from english to whatever and then from whatever back to english and gotten strange results. Let's try it here;

-={ '<Esc>:read !trans -b -no-ansi -s english -t swedish "A Møøse once bit my sister ..."' starts }=-
En möse hade en gång min syster ...
-={ '<Esc>:read !trans -b -no-ansi -s english -t swedish "A Møøse once bit my sister ..."' ends }=-

We're already off to a bad start but let's leave the möse alone for now;

-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En möse hade en gång min syster ..."' starts }=-
A muzzle once had my sister ...
-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En möse hade en gång min syster ..."' ends }=-

That didn't work out so well but what about this?;

-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En Møøse hade en gång min syster ..."' starts }=-
A mousse once had my sister ...
-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En Møøse hade en gång min syster ..."' ends }=-

Not much better is it? Oh well ... despite this I am sticking with the Møøse. If you have a better Swedish translation I would like to see it but the Møøse stays, no matter what my sister or anyone else has to say about it.

Life is good,
Maurice

... A Møøse once bit my sister ...
--- GNU bash, version 5.0.2(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Wed Feb 27 09:42:00 2019

In a message on 02-25-19 Maurice Kinal said to Holger Granholm:

Hi Maurice,

Note that I am replying on a different machine this time since I am
in the middle of a major overhaul on the raspi3b+ which will take at
least another 32 hours.

That translates to the '÷' character in utf8 - usually written
as U+00F7

That character pair I see as C3B7.

An excellent online source for utf8 characters is http://www.utf8-chartable.de/

Thanks for that URL.

up in hex editors to the corresponding utf8 characters.
For example U+00F7 will show up as a hex 'c3 b7' pair

OK, Now I agree, however I don't see the use of U+00F7.

whereas the small slashed 'o' characters in Møøse show up as
'c3 b8' hex pairs.

-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' starts }=-

÷ ÷

-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' ends }=-

Imagine that!!! It works!!!

Don't tell more, I only get confused <BG>.

If I translate that german line to swedish, norwegian or danish

-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En Møøse

If you have a better Swedish translation I would like to see it but
the Møøse stays, no matter what my sister or anyone

Well I can't give you a better translation than the ones I gave.
They are solely based on your german tag line.

Also, I think that tagline of yours has become shorter than the
'original'.

Have a good night,

Holger

.. I used to have a life, but I liked mail-reading so much better.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Wed Feb 27 22:14:06 2019

Hallo Holger!

That character pair I see as C3B7.

As it should be in a non-utf8 environment. Also in most European languages, the first byte of the pair will be C3. If you see the first byte being CE then you're likely dealing with Greek. With Russian and other Cyrillic based languages, D0 and/or D1 will be the first byte depending on the character. It is a handy way to narrow down the language you're dealing with.
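
A rough Python sketch of that rule of thumb. The helper name lead_bytes is made up, and the sample words are just short test strings. Note that some Greek letters, for example π, use CF rather than CE as the first byte.

```python
def lead_bytes(text):
    """Return the sorted set of UTF-8 lead bytes (C0 and above) used by text."""
    return sorted({f"{b:02x}" for b in text.encode("utf-8") if b >= 0xC0})

scandinavian = lead_bytes("Møøse gång")   # ø and å
greek = lead_bytes("αβγ")
cyrillic = lead_bytes("кусав")

print(scandinavian, greek, cyrillic)
```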

Thanks for that URL.

It is the best site I have found for all things utf8. If you require graphical output then http://www.unicode.org/charts/ provides pdf's of all fonts that matter and then some.

Het leven is goed,
Maurice

... Een Møøse beet ooit in mijn zus ...
--- GNU bash, version 5.0.2(1)-release (x86_64-pc-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Fri Mar 1 17:23:56 2019

Hej Holger!

Well I can't give you a better translation than the ones I gave.
They are solely based on your german tag line.

Okay we're back to the Swedish raspi3b+ version of my replying thingy and have switched the tagline back to "Don't cry for me I have vi" as it contains both the angstrom (c3 a5 = U+00E5) and the double dotted 'o' character (c3 b6 = U+00F6). No Møøse were harmed in the making of it ... not to mention my sister.

In German I show;

-={ '<Esc>:read !trans -b -no-ansi -s english -t german "Don't cry for me I have vi."' starts }=-
Weine nicht um mich, ich habe vi.
-={ '<Esc>:read !trans -b -no-ansi -s english -t german "Don't cry for me I have vi."' ends }=-

Looks good so far except no multibyte characters. Everything is ascii. Now the reverse;

-={ '<Esc>:read !trans -b -no-ansi -s german -t swedish "Weine nicht um mich, ich habe vi."' starts }=-
Gråt inte för mig, jag har vi.
-={ '<Esc>:read !trans -b -no-ansi -s german -t swedish "Weine nicht um mich, ich habe vi."' ends }=-

Other than the addition of the comma it looks perfect from this angle. Now let's bring it back to English;

-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "Gråt inte för mig, jag har vi."' starts }=-
Don't cry for me, I have we.
-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "Gråt inte för mig, jag har vi."' ends }=-

Not too bad other than the 'we' which should still be 'vi' since it is the name of a program that is the same in all three languages. Going straight from English to Swedish produced the tagline shown below and 'vi' remains intact. Mind you the same happens in English to German but adds a comma. Without knowing for sure the tagline looks to be the most correct translation. What do you think?

Livet är gott,
Maurice

... Gråt inte för mig jag har vi.
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Fri Mar 1 09:30:00 2019

In a message on 02-27-19 Maurice Kinal said to Holger Granholm:

Good evening Maurice,

That character pair I see as C3B7.

As it should be in a non-utf8 environment. Also in most European
languages, the first byte of the pair will be C3.

Yes, that is true for letters but for various other characters the first
byte is DA.

Have a good night,

Holger

.. I quit school because it was interfering with my education.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Fri Mar 1 22:52:42 2019

Hej Holger!

Yes, that is true for letters but for various other characters
the first byte is DA.

The only characters that are prefixed (start) with DA, range from U+0680 (DA80) to U+06BF (DABF) and are all Arabic characters (letters).

-={ '<Esc>:read !echo -e "\u0680 is the same as \xda\x80"' starts }=-
ڀ is the same as ڀ
-={ '<Esc>:read !echo -e "\u0680 is the same as \xda\x80"' ends }=-

The encoding gurus call it 'ARABIC LETTER BEHEH'.
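
The DA range can be confirmed by brute force. This is a Python sketch, with code_points_with_lead as an illustrative helper name; it scans only the code points that take two bytes in utf8.

```python
import unicodedata

def code_points_with_lead(lead):
    """Code points in the two-byte UTF-8 range whose encoding starts with lead."""
    return [cp for cp in range(0x80, 0x800)
            if chr(cp).encode("utf-8")[0] == lead]

da_range = code_points_with_lead(0xDA)
first_name = unicodedata.name(chr(da_range[0]))

print(f"U+{da_range[0]:04X} .. U+{da_range[-1]:04X}", len(da_range), first_name)
```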

Livet är gott,
Maurice

... Gråt inte för mig jag har vi.
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Sun Mar 3 12:03:00 2019

In a message on 03-01-19 Maurice Kinal said to Holger Granholm:

Hi Maurice,

Yes, that is true for letters but for various other characters
the first byte is DA.

The only characters that are prefixed (start) with DA, range from
U+0680 (DA80) to U+06BF (DABF) and are all Arabic characters
(letters).

I was probably confused when interpreting dec/hex conversion.

I was looking for the hyphen and citation characters and the euro sign.

Even though sitting with a hex/dec conversion table in front of me, I
can't get my brain to understand today.

With decimal interpretation I get the citation mark as 218 128 157.

... Gråt inte för mig jag har vi.

Nej, jag gråter inte och jag saknar inte vi <BG>

Have a good night,

Holger

.. My God lets me eat what I want.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Sun Mar 3 23:06:38 2019

Hej Holger!

I was looking for the hyphen and citation characters and the
euro sign.

The unicode for it is "U+20AC" which is a 24 bit character and thus will show up as three hex characters in a hex editor; e2 82 ac

-={ '<Esc>:read !echo -e "\u20ac is the same as \xe2\x82\xac"' starts }=-
€ is the same as €
-={ '<Esc>:read !echo -e "\u20ac is the same as \xe2\x82\xac"' ends }=-

Does that help? I am not sure which 8 bit encoding has the euro sign other than latin9 and there it is a4 which is dec 164. It doesn't exist in either cp437 or cp850. Both the MS encodings cp1250 and cp1252 show it as dec 128.
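
Those euro positions can be checked against Python's built-in codecs. This is a sketch assuming the codecs match the tables meant here: "iso8859-15" is Python's name for latin9, and euro_in is a made-up helper.

```python
def euro_in(codec):
    """Return the euro sign's bytes in the given encoding, or None if it has none."""
    try:
        return "€".encode(codec)
    except UnicodeEncodeError:
        return None

encodings = ("utf-8", "iso8859-15", "cp1250", "cp1252", "cp437", "cp850")
results = {codec: euro_in(codec) for codec in encodings}

for codec, raw in results.items():
    print(codec, None if raw is None else list(raw))
```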

With decimal interpretation I get the citation mark as 218 128
157.

Converting e2 82 ac to decimal gives me 226 130 172.

Livet är gott,
Maurice

... Gråt inte för mig jag har vi.
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Tue Mar 5 09:00:00 2019

In a message on 03-03-19 Maurice Kinal said to Holger Granholm:

God afton Maurice,

I was looking for the hyphen and citation characters and the
euro sign.

The unicode for it is "U+20AC" which is a 24 bit character and thus
will show up as three hex characters in a hex editor; e2 82 ac

OK, the code 218 128 162 that i interpreted as hyphen actually is the
longer 'dash'.

Thanks for the correction.

Does that help? I am not sure which 8 bit encoding has the euro
sign other than latin9 and there it is a4 which is dec 164.

Yep it did, thanks. The IBM kbd and CP book of 1996 doesn't help but in
the OS/2 FP15 the euro symbol is included as kbd 'right Alt and 5'.

It doesn't exist in either cp437 or cp850. Both the MS encodings
cp1250 and cp1252 show it as dec 128.

Simple, the Euro didn't exist then and dec 128 is a close interpretation

With decimal interpretation I get the citation mark as 218 128
157.

Yes, I did have that correct.

Converting e2 82 ac to decimal gives me 226 130 172.

Yep that's the normal hyphen.

God natt min vän,

Holger

.. I know the answer, as long as you ask the right question.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Tue Mar 5 22:25:24 2019

Hola Holger!

OK, the code 218 128 162 that i interpreted as hyphen actually
is the longer 'dash'.

I am not sure what you mean but using 218 (DA) as the leading byte means you are restricted to a 2 byte or 16 bit character and not a 24 bit character that is required for euro sign in utf8. The way the leading byte works is like this;

dec 218 = bin 11011010
                ^
The first zero shows that there are two leading ones which means there is only one trailing byte following. So that means the character is just 218 128, and the 162 is ignored. A 24 bit character *must* be prefixed by at least 11100000 which is dec 224 or E0. For the utf8 euro character the prefix is;

dec 226 = bin 11100010
                 ^
and as you can see the first zero yields three leading ones which is three bytes or 24 bits.

For the record 218 128 is U+0680 which we already know to be a 16 bit Arabic character. Also for the record, all trailing bytes must be in the range of 80 - BF or dec 128 to dec 191, which both of your posted trailing bytes are, even though the leading byte can only use one.
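
The leading-byte rule can be written out as a small Python sketch. The function names sequence_length and is_trailing are illustrative only.

```python
def sequence_length(lead):
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1                      # plain ASCII, no trailing bytes
    ones = 0
    mask = 0x80
    while lead & mask:                # count the leading one bits
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 4:
        raise ValueError("not a valid UTF-8 lead byte")
    return ones

def is_trailing(byte):
    """Trailing bytes always fall in the range 80 - BF (dec 128 to 191)."""
    return 0x80 <= byte <= 0xBF

print(format(218, "08b"), sequence_length(218))
print(format(226, "08b"), sequence_length(226))
print(bytes([218, 128]).decode("utf-8") == "\u0680")
```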

God natt min vän

Thank you. Buenas noches mi amigo. :-)

La vida es buena,
Maurice

... Un Møøse una vez mordió a mi hermana ...
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Sun Mar 10 16:15:00 2019

In a message on 03-05-19 Maurice Kinal said to Holger Granholm:

Hello Maurice,

Excuse the delay. I was in Stockholm, Sweden for the Boat Show.

OK, the code 218 128 162 that i interpreted as hyphen actually
is the longer 'dash'.

I am not sure what you mean but using 218 (DA) as the leading byte
means you are restricted to a 2 byte or 16 bit character and not a
24 bit character that is required for euro sign in utf8. The way
the leading byte works is like this;

I understand, but this is how the UTF codes are represented in PC8,
= 8bit ASCII, and I have come to the conclusion that I will, at least
try, to use only the two following bytes in the translation table.

That may be all that is needed but if not, I can always include the
leading byte. Kind of cut and try <BG>.

The first zero shows that there are two leading ones which means
there is only one trailing byte following.

So that means either 218 128 and 162 is ignored.

For the utf8 euro character the prefix is;

dec 226 = bin 11100010
                 ^

According to my interpretation of how the character is presented in PC8
it's as 218 130 172. All normal umlaut characters are presented with
only two bytes, like 195 165 for the small angstrom character that is
included in your "Moose" tagline.

and as you can see the first zero yields three leading ones which is
three bytes or 24 bits.

I don't need more than 16-bit characters for that editor.
UTF characters ARE presented with two bytes in it.

For the record 218 128 is U+0680 which we already know to be a 16
bit Arabic character.

That third byte (first 218 or 226) comes only as a prefix for other
characters.

Thank you. Buenas noches mi amigo. :-)

Gracias mi amigo.

Have a good night,

Holger

.. Computers always win because they have inside information ;o)
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Holger Granholm@2:20/228 to Maurice Kinal on Sun Mar 10 16:51:00 2019

In a message on 03-05-19 Maurice Kinal said to Holger Granholm:

Hi Maurice,

Right after I had posted the previous reply I edited a bulletin that
included a letter that had to be translated from UTF to CP437 = PC8.

I knew that this letter should be the capital angstrom so I entered that 'unknown' chr as 197 to be translated to Å. Worked perfectly.
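
For comparison, here is a rough Python sketch of a UTF-8 to CP437 translation. It is not how the editor described here works; it leans on Python's own cp437 codec rather than a hand-built table, and the function name utf8_to_cp437 is invented for the example. Note that 197 is the Unicode (and latin1) number for the capital A with ring, while its CP437 code is 143.

```python
def utf8_to_cp437(data: bytes) -> bytes:
    """Translate UTF-8 bytes to CP437 bytes.

    Hypothetical helper: characters that CP437 lacks (the euro sign,
    for instance) come out as '?' because of errors="replace".
    """
    return data.decode("utf-8").encode("cp437", errors="replace")


capital_a_ring = bytes([195, 133])       # UTF-8 for U+00C5
print(ord(capital_a_ring.decode("utf-8")))  # Unicode number of the letter
print(list(utf8_to_cp437(capital_a_ring)))  # its CP437 byte
print(utf8_to_cp437("God natt min vän".encode("utf-8")))
print(utf8_to_cp437(bytes([226, 130, 172])))  # euro sign has no CP437 slot
```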

God natt min vän

On the above line I knew of course that that coded character should be
'a' with two dots on top. To convert it I entered 195 184 to translate it

Guten abend mein freund,

Holger

.. Anything that isn't nailed down is a cat toy.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Mon Mar 11 09:17:36 2019

Hej Holger!

That may be all that is needed but if not, I can always include
the leading byte. Kind of cut and try <BG>.

How about something like this instead;

hex dec UTF8 hex dec
86 | 134 = 00E5 | C3 A5 | 195 165

The above matches the small angstrom. We could drop the UTF8 codes but I for one find them handy. Also I am using IBM437 for PC-8 and near as I can tell they match perfectly but I'll let you be the judge.

As far as 24 bit characters, those are mostly symbols and line drawing characters from what I see, and it looks like all the text characters are 16 bit and the leading byte is C3 (195). For the degree symbol found at the end of temperatures I get a 16 bit character except with a C2 (194) as the leading byte;

F8 | 248 = 00B0 | C2 B0 | 194 176
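
Rows in this layout can be generated rather than typed by hand. The sketch below assumes, as discussed above, that PC-8 is the same as IBM437, and uses Python's cp437 codec for it; the function name table_row is made up for the example.

```python
def table_row(cp437_byte: int) -> str:
    """Build one row: CP437 hex | dec = Unicode | UTF-8 hex | UTF-8 dec.

    Hypothetical helper; assumes PC-8 matches Python's 'cp437' codec.
    """
    ch = bytes([cp437_byte]).decode("cp437")
    utf8 = ch.encode("utf-8")
    hex_part = " ".join(f"{b:02X}" for b in utf8)
    dec_part = " ".join(str(b) for b in utf8)
    return f"{cp437_byte:02X} | {cp437_byte} = {ord(ch):04X} | {hex_part} | {dec_part}"


for byte in (0x86, 0xF8):
    print(table_row(byte))
```

Looping over range(128, 256) instead would list the whole upper half of the code page, including the line drawing characters that need three UTF-8 bytes.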

I'll wait until hearing back from you before taking this any further.

I don't need more than 16-bit characters for that editor.

Other than the occasional Euro sign I suspect so.

Livet är gott,
Maurice

... Gråt inte för mig jag har vi.
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Maurice Kinal@2:280/464.113 to Holger Granholm on Mon Mar 11 10:04:04 2019

Hej Holger!

I knew that this letter should be the capital angstrom

8F | 143 = 00C5 | C3 85 | 195 133

Does the above match?

'a' with two dots on top. To convert it I entered 195 184 to
translate it

Looks good from this angle except that I get 195 164 as shown below.

84 | 132 = 00E4 | C3 A4 | 195 164

Read as hex, C3 84 (decimal 195 132, not 195 184) is the capital A with two dots on top according to the below;

-={ '<Esc>:read !echo -e "\xc3\x84"' starts }=-
Ä
-={ '<Esc>:read !echo -e "\xc3\x84"' ends }=-

8E | 142 = 00C4 | C3 84 | 195 132
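
Decimal and hex readings of the second byte are easy to mix up here: hex 84 is decimal 132, while decimal 184 is hex B8. A short Python check of the three candidate byte pairs, using only the standard library (the helper name describe is invented for the example):

```python
import unicodedata


def describe(pair):
    """Decode a two-byte UTF-8 sequence given in decimal.

    Hypothetical helper; returns (code point as 'U+XXXX', Unicode name).
    """
    ch = bytes(pair).decode("utf-8")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


for pair in ((195, 184), (195, 132), (195, 164)):
    print(pair, *describe(pair))
```

Decimal 195 184 comes out as the small o with stroke, 195 132 as the capital A with diaeresis, and 195 164 as the small a with diaeresis.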

Could you please verify the above information?

Livet är gott,
Maurice

... Gråt inte för mig jag har vi.
--- GNU bash, version 5.0.2(1)-release (aarch64-raspi3b+-linux-gnu)
* Origin: Little Mikey's EuroPoint - Ladysmith BC, Canada (2:280/464.113)

From Holger Granholm@2:20/228 to Maurice Kinal on Tue Mar 12 20:52:00 2019

In a message on 03-11-19 Maurice Kinal said to Holger Granholm:

God morgon Maurice

How about something like this instead;

hex dec UTF8 hex dec

86 | 134 = 00E5 | C3 A5 | 195 165

The above matches the small angstrom.

The small angstrom is presented as dec 195 165 on the screen.
That's why I think those two characters converted to PC8 fill the need.

Also I am using IBM437 for PC-8 and near as I can tell they match
perfectly but I'll let you be the judge.

Yes they do.

As far as 24 bit characters those are mostly symbols and line drawing characters from what I see, and it looks like all the text characters
are 16 bit and the leading byte is C3 (195).

Yes, 195 is, but 165 is the Spanish N with a wave on top, for example.
The same goes for some other characters but as you say, 195 prefixes
most of the normal characters while 218 and 226 prefix other symbols.
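
What a CP437 screen shows for UTF-8 text can be reproduced in Python by encoding to UTF-8 and then reading each byte back as a CP437 character. This is a sketch under the assumption that PC8 equals Python's cp437 codec; the function name as_seen_in_cp437 is made up.

```python
def as_seen_in_cp437(text: str) -> str:
    """Show how UTF-8 encoded text looks when each byte is drawn as CP437.

    Hypothetical helper; assumes PC8 is the same as Python's 'cp437'.
    """
    return text.encode("utf-8").decode("cp437")


for ch in ("å", "€"):
    raw = ch.encode("utf-8")
    print(ch, list(raw), as_seen_in_cp437(ch))
```

The small a with ring turns into two screen characters, a line drawing character for 195 followed by the Spanish N with a wave for 165, and the euro sign turns into three.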

For the degree symbol found at the end of temperatures I get a 16 bit character except with a C2 (194) as the leading byte;

F8 | 248 = 00B0 | C2 B0 | 194 176

I still have to check that, but thanks for the tip. We don't have that
symbol on the keyboards, but in Windows I hold the Alt key while typing
0176 on the numeric pad.
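
The 176 typed with Alt and the 248 in the table are both the degree symbol, just in different code pages. The Python sketch below rests on two assumptions: that the leading zero in Alt+0176 selects the Windows ANSI code page, and that this code page is Windows-1252 on the machine in question.

```python
# Assumption: Alt+0176 is looked up in Windows-1252, not in CP437.
degree_ansi = bytes([176]).decode("cp1252")
same_byte_in_cp437 = bytes([176]).decode("cp437")
degree_cp437_code = "°".encode("cp437")[0]

print(repr(degree_ansi))         # byte 176 read as Windows-1252
print(repr(same_byte_in_cp437))  # byte 176 read as CP437 is a shade block
print(degree_cp437_code)         # CP437 code of the degree symbol
```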

I don't need more than 16-bit characters for that editor.

Other than the occasional Euro sign I suspect so.

On this machine I have an IBM Warp4 that doesn't support the euro sign
but the other machine with OS/2 FP15 does.

Have a nice day,

Holger

.. Smokers are also humans .....though not for as long.
-- MR/2 2.30

--- PCBoard (R) v15.22 (OS/2) 2
* Origin: Coming to you from the Sunny Aland Islands. (2:20/228)
