• CP850 (I think)

    From Ozz Nixon@1:1/123 to All on Tue Apr 23 14:01:34 2019
    The more I tinker with CP437 <-> UTF8, the more I realize that other
    Fidonet users may need or want such translation from the BBS. So I
    started reading about CP850, and the description said that a byte
    (0x85, for example) can be i without the top dot in one CP8?? code
    page and a different character altogether in another. This is how I
    could easily ask the user: do you see "said char"? Y=CP850, N=CP855
    (I think). As my vagueness suggests, I have no f'ing clue haha.

    So Ward, Bjorn, etc., what would be a good way to ask users which
    code page they use, when they may not even know what a code page is?

    And... in CP850 can I still draw a single-line box, or does that code
    page only support double-line boxes? (I would post a visual, but this
    NNTP client does not like those characters.)
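    One way to check the box-drawing question empirically is to ask a codec which glyphs it can encode. This is a sketch that again assumes Python's codecs mirror the real code pages. The three glyph sets are simply the single-line, double-line, and mixed single/double characters found in CP437. `missing_glyphs` is a hypothetical helper.

```python
# Glyph sets taken from CP437: plain single-line, plain double-line,
# and the mixed single/double corners and tees.
SINGLE = "┌┐└┘─│├┤┬┴┼"
DOUBLE = "╔╗╚╝═║╠╣╦╩╬"
MIXED = "╒╓╕╖╘╙╛╜╞╟╡╢╤╥╧╨╪╫"


def missing_glyphs(glyphs: str, codepage: str) -> str:
    """Return the glyphs that cannot be encoded in the given code page."""
    missing = []
    for ch in glyphs:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)


if __name__ == "__main__":
    for cp in ("cp437", "cp850", "cp855"):
        for name, glyphs in (("single", SINGLE), ("double", DOUBLE), ("mixed", MIXED)):
            gone = missing_glyphs(glyphs, cp)
            print(f"{cp} {name:6}: {'complete' if not gone else 'missing ' + gone}")
```

    If the codec tables are right, CP850 keeps complete single-line and double-line sets and drops only the mixed corners and tees (╒, ╓, ╞ and friends). Run it yourself rather than taking that on faith.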

    Thank you (anyone) for insight on CP <-> UTF8 feedback!
    Ozz

    --
    .. Ozz Nixon
    ... Author ExchangeBBS (suite)
    .... Since 1983 BBS Developer

    --- ExchangeBBS NNTP Server v3.1/Linux64
    * Origin: (1:1/123)
  • From Alan Ianson@1:153/757 to Ozz Nixon on Tue Apr 23 13:21:08 2019
    Thank you (anyone) for insight on CP <-> UTF8 feedback!
    Ozz

    Mystic has its own way of dealing with this situation. Everything is stored on the BBS as CP437 (an assumption), but when I log onto my own Mystic BBS from a UTF-8 terminal, Mystic translates the CP437 to UTF-8 so I see the BBS as I would have in days gone by.
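    The translate-on-output idea can be sketched with Python's codecs. This is not Mystic's code, and `cp437_to_utf8` is a hypothetical helper. The stored bytes stay CP437, and only the stream sent to the terminal changes.

```python
def cp437_to_utf8(raw: bytes) -> bytes:
    """Translate bytes stored as CP437 into UTF-8 for a UTF-8 terminal."""
    # Python's cp437 codec defines all 256 byte values, so this decode
    # cannot fail. Note that 0x00-0x1F come out as control characters,
    # not the smiley/arrow glyphs DOS drew for them.
    text = raw.decode("cp437")
    return text.encode("utf-8")


if __name__ == "__main__":
    box = b"\xc9\xcd\xcd\xbb\r\n\xba  \xba\r\n\xc8\xcd\xcd\xbc"
    print(cp437_to_utf8(box).decode("utf-8"))
```

    Each box-drawing glyph grows from 1 byte to 3 in UTF-8. That is one reason byte counts and column counts stop agreeing.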

    It would be best if we moved to UTF-8 natively, but that could get messy. :)

    --- BBBS/Li6 v4.10 Toy-4
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Maurice Kinal@1:153/7001 to Alan Ianson on Tue Apr 23 20:30:08 2019
    Hey Alan!

    It would be best if we moved to UTF-8 natively but that could get
    messy

    There's an understatement if I've ever seen one. So far I have seen only one web-based BBS that gets the word wrapping right, and it is in Russia. Others I've seen display the characters correctly but then count bytes instead of characters, which throws the word wrapping off. Add to that the fact that ANSI BBSes cannot count higher than 80 columns and it gets even messier ... never mind smart devices that cannot even display anywhere near 80 characters.
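    To make the bytes-vs-characters wrapping bug concrete, here is a minimal Python sketch of a greedy word wrapper that takes the length function as a parameter. The function names are hypothetical. Counting UTF-8 bytes breaks Cyrillic lines roughly twice as early as it should. Counting code points is better, but still not exact: wide CJK glyphs and combining accents mean code points and screen columns can differ.

```python
from __future__ import annotations

from typing import Callable


def wrap(text: str, width: int, measure: Callable[[str], int]) -> list[str]:
    """Greedy word wrap; `measure` decides how a line's length is counted."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and measure(candidate) > width:
            lines.append(current)  # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def by_bytes(s: str) -> int:
    return len(s.encode("utf-8"))  # the mistake: counts UTF-8 bytes


def by_code_points(s: str) -> int:
    return len(s)  # counts characters (Unicode code points)


if __name__ == "__main__":
    msg = "Привет мир"  # "Hello world": 10 characters, 19 UTF-8 bytes
    print(wrap(msg, 10, by_bytes))        # ['Привет', 'мир']
    print(wrap(msg, 10, by_code_points))  # ['Привет мир']
```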

    If you were asking me, and I know that you weren't, the two-digit year is a MUCH bigger issue, and given that it has been roughly 17 years since it was declared obsolete, I don't hold out much faith in ANYTHING ever being fixed in Fidonet ... unless of course you're like me and totally take advantage of FTN crippleware and ignore the obvious weaknesses such as datetime stamps, CHRS and UTC offset kludges, etc. Also, while I am ranting: all the crappy binary data in headers, especially the pktHeader, is nothing but wasted bytes.
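    Here is what the two-digit year forces on a reader. As I understand FTS-0001, the message header carries the date as a fixed string such as "01 Jan 86  02:34:56", with no century and no UTC offset. Below is a Python sketch of parsing it with a century pivot. The pivot of 80 and the function name are assumptions, not part of any standard.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_ftn_datetime(field: str, pivot: int = 80) -> datetime:
    """Parse a "DD Mon YY  HH:MM:SS" date field.

    With only two year digits, a pivot is unavoidable: YY >= pivot is read
    as 19YY, anything lower as 20YY (the pivot value is an assumption).
    There is no UTC offset, so the result is a naive datetime.
    """
    day, mon, yy, hms = field.split()
    year = int(yy)
    year += 1900 if year >= pivot else 2000
    hour, minute, second = (int(x) for x in hms.split(":"))
    return datetime(year, MONTHS.index(mon) + 1, int(day), hour, minute, second)


if __name__ == "__main__":
    print(parse_ftn_datetime("23 Apr 19  20:30:08"))  # 2019-04-23 20:30:08
    print(parse_ftn_datetime("01 Jan 86  02:34:56"))  # 1986-01-01 02:34:56
```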

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)
  • From Ozz Nixon@1:1/123 to Maurice Kinal on Wed Apr 24 23:26:48 2019
    On 2019-04-23 20:30:09 +0000, Maurice Kinal -> Alan Ianson said:

    Hey Alan!

    It would be best if we moved to UTF-8 natively but that could get
    messy

    There's an understatement if I've ever seen one. So far I have seen only one web-based BBS that gets the word wrapping right, and it is in Russia. Others I've seen display the characters correctly but then count bytes instead of characters, which throws the word wrapping off. Add to that the fact that ANSI BBSes cannot count higher than 80 columns and it gets even messier ... never mind smart devices that cannot even display anywhere near 80 characters.

    LMAO! Sad but true...

    If you were asking me, and I know that you weren't, the two-digit year is a MUCH bigger issue, and given that it has been roughly 17 years since it was declared obsolete, I don't hold out much faith in ANYTHING ever being fixed in Fidonet ... unless of course you're like me and totally take advantage of FTN crippleware and ignore the obvious weaknesses such as datetime stamps, CHRS and UTC offset kludges, etc. Also, while I am ranting: all the crappy binary data in headers, especially the pktHeader, is nothing but wasted bytes.

    Well, I am attacking it the other way around ... I am using
    TRichEdit.com's Editor component - which is UTF8 only now. I am also
    working on character sets, without fully understanding the differences
    between them. But I am reading, and asking. ;-)

    As for the PKT header(s), I do not mind them. My challenge was that
    some lines have ^A kludge markers, some don't, and then the end has
    ^APATH. To me, it would be best if all the kludge lines, including
    tear, origin, SEEN-BY, and PATH, came before the text... and the text
    simply ended with a null (^@). Parsing would be a snap, and tossers
    would be lightning fast and ridiculously easy to code. Yeah, we could
    do it like email and NNTP headers vs. bodies, but then it wouldn't be
    Fidonet, it would be the Internet. ;-)
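    The parsing annoyance described above can be shown with a small Python sketch. It assumes CR-terminated lines and ^A (0x01) kludge prefixes. It treats lines starting with "---", " * Origin:" or "SEEN-BY:" as control lines; a real tosser has to be stricter about where a tear line may appear. Because ^APATH follows SEEN-BY at the bottom, every line must be inspected, not just a header block. The function name and the sample MSGID are made up.

```python
from __future__ import annotations


def split_ftn_body(body: str) -> dict[str, list[str]]:
    """Sort message-body lines into kludges, control lines and text."""
    parts: dict[str, list[str]] = {"kludges": [], "control": [], "text": []}
    # FTN message text uses CR as the line terminator.
    for line in body.rstrip("\r").split("\r"):
        if line.startswith("\x01"):
            # ^A kludges appear at the top (MSGID, CHRS, ...) and again
            # after SEEN-BY (PATH), so no line can be skipped.
            parts["kludges"].append(line[1:])
        elif line.startswith(("---", " * Origin:", "SEEN-BY:")):
            parts["control"].append(line)
        else:
            parts["text"].append(line)
    return parts


if __name__ == "__main__":
    body = ("\x01MSGID: 1:1/123 5cbf1234\r"
            "Hello, Fidonet!\r"
            "--- ExchangeBBS NNTP Server v3.1/Linux64\r"
            " * Origin: (1:1/123)\r"
            "SEEN-BY: 1/123\r"
            "\x01PATH: 1/123\r")
    for name, lines in split_ftn_body(body).items():
        print(name, lines)
```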

    The challenge is all of the obsolete binaries that are still
    online/involved. I spent the past year building FMTP (Fidonet Mail
    Transport Protocol), a hybrid of SMTP/NNTP as far as
    communication/commands go, even to the point that a BBS would simply
    use the FMTP server as its local message base. I spoke with a few
    sysops, and no one wanted to give up having the messages on their
    local system, even when I pointed out that it would be a 100%
    cloud-based backend on public CDNs, i.e., no one in control: the 6
    FMTP servers would do real-time replication of posts, and the messages
    could be pulled down on demand (so they could be displayed in a
    full-screen viewer, a text-based stream, or a web forum). I have not
    given up, just put it on the back burner for now...

    PS. Do you know, for the high-bit box-drawing characters in
    CP850/CP855, whether it is the double-line or the single-line set that
    is missing elements? Somewhere I read that one of these two does not
    support all of the edges/corners for drawing boxes. However, both
    Wikipedia pages seem to show them all:
    https://en.wikipedia.org/wiki/Code_page_850
    and
    https://en.wikipedia.org/wiki/Code_page_855

    I know that storing everything in CP437 and translating to 850 or 855
    is not one-to-one. And on Linux, I can store everything in UTF-8 and be
    done with it. I have learned that the first 3 bytes help me know what I
    am dealing with, even in source code:

    If Copy(Ws,1,3)=#239#187#191 then ... it's UTF8 encoded (be it message, source, web, text file).

    Not sure if everyone's Linux does that, but on every machine I have,
    those 3 bytes are there when it's a UTF-8 stream.
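    A caution on the 3-byte check: EF BB BF is the UTF-8 byte order mark. Most Linux tools write UTF-8 without it, so a file can be perfectly good UTF-8 and still fail that test. Below is a minimal Python sketch of a more forgiving approach (a heuristic, not a standard; `decode_message` is a hypothetical name). It strips a BOM if there is one, tries a strict UTF-8 decode, and falls back to CP437 if that fails.

```python
import codecs


def decode_message(raw: bytes, fallback: str = "cp437") -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else as `fallback`.

    A leading UTF-8 BOM (EF BB BF) is accepted but not required.
    """
    body = raw[len(codecs.BOM_UTF8):] if raw.startswith(codecs.BOM_UTF8) else raw
    try:
        return body.decode("utf-8")  # strict: invalid sequences raise
    except UnicodeDecodeError:
        return raw.decode(fallback)


if __name__ == "__main__":
    print(decode_message("Grüße".encode("utf-8")))  # valid UTF-8, no BOM
    print(decode_message(b"\xc9\xcd\xbb"))            # not UTF-8, read as CP437
```

    Plain ASCII passes as UTF-8, which is harmless. A short run of CP437 high-bit bytes can occasionally form valid UTF-8 by accident, so treat the result as a good guess rather than proof.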

    (Still learning)
    --
    .. Ozz Nixon
    ... Author ExchangeBBS (suite)
    .... Since 1983 BBS Developer

    --- ExchangeBBS NNTP Server v3.1/Linux64
    * Origin: (1:1/123)
  • From Maurice Kinal@1:153/7001 to Ozz Nixon on Thu Apr 25 12:00:14 2019
    Hey Ozz!

    Well, I am attacking it the other way around ... I am using TRichEdit.com's Editor component - which is UTF8 only now.

    Can it count multibyte characters properly? I've been seeing too many UTF-8 apps that can't, especially on web-based BBSes.

    Yeah, we could do it like email and NNTP headers vs. bodies, but
    then it wouldn't be Fidonet, it would be the Internet. ;-)

    I've never cared much for email. Besides, eliminating binary data in headers has nothing to do with the Internet. It is a matter of file formatting, and it has been going on for longer than the Internet has existed.

    no one wanted to give up having the messages on their local system

    That doesn't surprise me.

    Do you know, for the high-bit box-drawing characters in
    CP850/CP855, whether it is the double-line or the single-line set that
    is missing elements?

    No, I don't, but I was planning to check CP866 and cross-reference it against the CP437 table I have, now that I believe I have that one properly cased. :::knock on wood::: I've asked around in the past, but nobody seems to want to go out on a limb over it, and I assumed that since they are all IBM code pages, the graphical characters would indeed match up.
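    The CP866 cross-check could be automated along these lines. This sketch assumes Python's codec tables are accurate. It compares 0xB0 to 0xDF, the block where CP437 keeps its shading, box-drawing and block glyphs, and lists every byte whose glyph differs. If the tables are right, CP866 should match CP437 there while CP850 should not (0xD5, for instance, is ı in CP850). `glyph_mismatches` is a hypothetical helper.

```python
from __future__ import annotations


def glyph_mismatches(cp_a: str, cp_b: str, start: int = 0xB0,
                     end: int = 0xDF) -> list[tuple[int, str, str]]:
    """List (byte, glyph_a, glyph_b) for bytes in [start, end] that differ."""
    diffs = []
    for b in range(start, end + 1):
        glyph_a = bytes([b]).decode(cp_a)
        glyph_b = bytes([b]).decode(cp_b)
        if glyph_a != glyph_b:
            diffs.append((b, glyph_a, glyph_b))
    return diffs


if __name__ == "__main__":
    for other in ("cp866", "cp850", "cp855"):
        diffs = glyph_mismatches("cp437", other)
        print(f"cp437 vs {other}: {len(diffs)} differing bytes in 0xB0-0xDF")
        for b, a, o in diffs[:5]:  # show a few examples
            print(f"  0x{b:02X}: {a} vs {o}")
```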

    If Copy(Ws,1,3)=#239#187#191 then ... it's UTF8 encoded (be it
    message, source, web, text file).

    That's an old-school MS idea that can safely be ignored as far as UTF-8 is concerned. For UTF-16 it can matter, since the byte order mark is what tells big-endian and little-endian data apart. To be honest, I had completely forgotten about it, but it usually doesn't harm anything if it's there. I have run across it in Fidonet messages in the past.
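    To illustrate the UTF-16 point: "A" is stored as 41 00 in little-endian UTF-16 and 00 41 in big-endian. The BOM (FF FE or FE FF) is what tells a reader which one it is looking at. Below is a Python sketch of BOM sniffing; `sniff_bom` is a hypothetical helper, and UTF-32 is deliberately left out.

```python
from __future__ import annotations

import codecs

# None of these BOMs is a prefix of another, so checking order does not
# matter here (adding UTF-32 would change that: its LE BOM starts FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw: bytes) -> tuple[str | None, bytes]:
    """Return (encoding, payload without the BOM), or (None, raw) if no BOM."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, raw[len(bom):]
    return None, raw


if __name__ == "__main__":
    for sample in (codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le"),
                   codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")):
        encoding, payload = sniff_bom(sample)
        print(encoding, payload, payload.decode(encoding))
```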

    Not sure if everyone's Linux does that, but on every machine I
    have, those 3 bytes are there when it's a UTF-8 stream.

    Let me guess: Windows machines? Anyhow, for UTF-8 it makes no difference, though some MS apps might disagree, especially UTF-16 ones. I still haven't run across a Linux UTF-8 app that uses it, but it probably doesn't hurt anything if it's there.

    Life is good,
    Maurice

    ... Don't cry for me I have vi.
    --- GNU bash, version 5.0.7(1)-release (x86_64-pc-linux-gnu)
    * Origin: Little Mikey's Brain - Ladysmith BC, Canada (1:153/7001)