Hello Everyone!
After much tinkering, I still can't get character translation working 100%. The biggest issue is CP866 -> UTF-8: it seems I can't get GoldEd+ to actually perform the translation.
The relevant bits from my golded.cfg (which I've tried on both Linux and macOS):
-paste-
XLATPATH /fido/etc/golded/
XLATLOCALSET UTF-8
XLATCHARSETALIAS UTF-8 UTF8
XLATCHARSET CP1125 UTF-8 1125_u8.chs
XLATCHARSET CP437 UTF-8 437_u8.chs
XLATCHARSET CP850 UTF-8 850_u8.chs
XLATCHARSET CP865 UTF-8 865_u8.chs
XLATCHARSET CP866 UTF-8 866_u8.chs
XLATCHARSET LATIN-1 UTF-8 iso1_u8.chs
XLATCHARSET KOI8-R UTF-8 koi8_u8.chs
-end-
I thought the problem was in the messages themselves, so I wrote a PHP library that reads JAM files, translates the message body text to UTF-8, and outputs it to the terminal (the same terminal I use for GoldEd+). My terminal (Apple's macOS Terminal.app) does display the characters correctly; I just can't get GoldEd+ to do the same.
PHP code bits for reference:
-paste-
$xlated = mb_convert_encoding($line, "UTF-8", $msg_encoding);
-end-
$xlated is the body line after mb_convert_encoding() converts the raw bytes ($line) from $msg_encoding to UTF-8. $msg_encoding comes from the message's CHRS (or CHARSET) kludge value, which was mapped earlier to the corresponding PHP encoding name. See
https://www.php.net/manual/en/function.mb-convert-encoding.php for more info on the PHP function.
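For anyone who wants to reproduce the same cross-check without PHP, here is a minimal Python sketch of that step. The CHRS_TO_CODEC table, the decode_line() name, and the cp437 fallback are assumptions made up for illustration; they are not taken from GoldEd+ or from my PHP library.

```python
# Hypothetical mapping from FTN CHRS kludge names to Python codec names.
CHRS_TO_CODEC = {
    "CP437": "cp437",
    "CP850": "cp850",
    "CP865": "cp865",
    "CP866": "cp866",
    "CP1125": "cp1125",
    "LATIN-1": "latin-1",
    "KOI8-R": "koi8_r",
    "UTF-8": "utf-8",
}


def decode_line(raw: bytes, chrs_value: str, default: str = "cp437") -> str:
    """Decode one raw body line using the charset named in the CHRS value."""
    # A CHRS value usually looks like "CP866 2": charset name, then a level.
    parts = chrs_value.split()
    name = parts[0].upper() if parts else ""
    # Unknown or missing names fall back to the default (an assumption).
    codec = CHRS_TO_CODEC.get(name, default)
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    # Six CP866 bytes spelling a short Russian word.
    line = bytes([0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2])
    # On a UTF-8 terminal, print() writes the decoded text as UTF-8.
    print(decode_line(line, "CP866 2"))
```

If the output of a script like this looks right in the same terminal where GoldEd+ shows garbage, the terminal and the message bytes are probably not the problem.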
In addition: I'm using the translation files included with GoldEd+. The most troubling display comes from messages written with the CP866 character set.
-file 866_u8.chs-
;
; This file is a charset conversion module in text form.
;
; Source file:
;
; http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT
;
100000 ; ID number (when >65535, all 255 chars will be translated)
0 ; version number
;
2 ; level number
;
CP866
UTF-8
;
\0 \0 ; NULL
\0 \d1 ; START OF HEADING
\0 \d2 ; START OF TEXT
\0 \d3 ; END OF TEXT
\0 \d4 ; END OF TRANSMISSION
\0 \d5 ; ENQUIRY
\0 \d6 ; ACKNOWLEDGE
\0 \d7 ; BELL
\0 \d8 ; BACKSPACE
\0 \d9 ; HORIZONTAL TABULATION
\0 \d10 ; LINE FEED
\0 \d11 ; VERTICAL TABULATION
\0 \d12 ; FORM FEED
\0 \d13 ; CARRIAGE RETURN
\0 \d14 ; SHIFT OUT
\0 \d15 ; SHIFT IN
\0 \d16 ; DATA LINK ESCAPE
\0 \d17 ; DEVICE CONTROL ONE
\0 \d18 ; DEVICE CONTROL TWO
\0 \d19 ; DEVICE CONTROL THREE
\0 \d20 ; DEVICE CONTROL FOUR
\0 \d21 ; NEGATIVE ACKNOWLEDGE
\0 \d22 ; SYNCHRONOUS IDLE
\0 \d23 ; END OF TRANSMISSION BLOCK
\0 \d24 ; CANCEL
\0 \d25 ; END OF MEDIUM
\0 \d26 ; SUBSTITUTE
\0 \d27 ; ESCAPE
\0 \d28 ; FILE SEPARATOR
\0 \d29 ; GROUP SEPARATOR
\0 \d30 ; RECORD SEPARATOR
\0 \d31 ; UNIT SEPARATOR
\0 \d32 ; SPACE
\0 \d33 ; EXCLAMATION MARK
\0 \d34 ; QUOTATION MARK
\0 \d35 ; NUMBER SIGN
\0 \d36 ; DOLLAR SIGN
\0 \d37 ; PERCENT SIGN
\0 \d38 ; AMPERSAND
\0 \d39 ; APOSTROPHE
\0 \d40 ; LEFT PARENTHESIS
\0 \d41 ; RIGHT PARENTHESIS
\0 \d42 ; ASTERISK
\0 \d43 ; PLUS SIGN
\0 \d44 ; COMMA
\0 \d45 ; HYPHEN-MINUS
\0 \d46 ; FULL STOP
\0 \d47 ; SOLIDUS
\0 \d48 ; DIGIT ZERO
\0 \d49 ; DIGIT ONE
\0 \d50 ; DIGIT TWO
\0 \d51 ; DIGIT THREE
\0 \d52 ; DIGIT FOUR
\0 \d53 ; DIGIT FIVE
\0 \d54 ; DIGIT SIX
\0 \d55 ; DIGIT SEVEN
\0 \d56 ; DIGIT EIGHT
\0 \d57 ; DIGIT NINE
\0 \d58 ; COLON
\0 \d59 ; SEMICOLON
\0 \d60 ; LESS-THAN SIGN
\0 \d61 ; EQUALS SIGN
\0 \d62 ; GREATER-THAN SIGN
\0 \d63 ; QUESTION MARK
\0 \d64 ; COMMERCIAL AT
\0 \d65 ; LATIN CAPITAL LETTER A
\0 \d66 ; LATIN CAPITAL LETTER B
\0 \d67 ; LATIN CAPITAL LETTER C
\0 \d68 ; LATIN CAPITAL LETTER D
\0 \d69 ; LATIN CAPITAL LETTER E
\0 \d70 ; LATIN CAPITAL LETTER F
\0 \d71 ; LATIN CAPITAL LETTER G
\0 \d72 ; LATIN CAPITAL LETTER H
\0 \d73 ; LATIN CAPITAL LETTER I
\0 \d74 ; LATIN CAPITAL LETTER J
\0 \d75 ; LATIN CAPITAL LETTER K
\0 \d76 ; LATIN CAPITAL LETTER L
\0 \d77 ; LATIN CAPITAL LETTER M
\0 \d78 ; LATIN CAPITAL LETTER N
\0 \d79 ; LATIN CAPITAL LETTER O
\0 \d80 ; LATIN CAPITAL LETTER P
\0 \d81 ; LATIN CAPITAL LETTER Q
\0 \d82 ; LATIN CAPITAL LETTER R
\0 \d83 ; LATIN CAPITAL LETTER S
\0 \d84 ; LATIN CAPITAL LETTER T
\0 \d85 ; LATIN CAPITAL LETTER U
\0 \d86 ; LATIN CAPITAL LETTER V
\0 \d87 ; LATIN CAPITAL LETTER W
\0 \d88 ; LATIN CAPITAL LETTER X
\0 \d89 ; LATIN CAPITAL LETTER Y
\0 \d90 ; LATIN CAPITAL LETTER Z
\0 \d91 ; LEFT SQUARE BRACKET
\0 \d92 ; REVERSE SOLIDUS
\0 \d93 ; RIGHT SQUARE BRACKET
\0 \d94 ; CIRCUMFLEX ACCENT
\0 \d95 ; LOW LINE
\0 \d96 ; GRAVE ACCENT
\0 \d97 ; LATIN SMALL LETTER A
\0 \d98 ; LATIN SMALL LETTER B
\0 \d99 ; LATIN SMALL LETTER C
\0 \d100 ; LATIN SMALL LETTER D
\0 \d101 ; LATIN SMALL LETTER E
\0 \d102 ; LATIN SMALL LETTER F
\0 \d103 ; LATIN SMALL LETTER G
\0 \d104 ; LATIN SMALL LETTER H
\0 \d105 ; LATIN SMALL LETTER I
\0 \d106 ; LATIN SMALL LETTER J
\0 \d107 ; LATIN SMALL LETTER K
\0 \d108 ; LATIN SMALL LETTER L
\0 \d109 ; LATIN SMALL LETTER M
\0 \d110 ; LATIN SMALL LETTER N
\0 \d111 ; LATIN SMALL LETTER O
\0 \d112 ; LATIN SMALL LETTER P
\0 \d113 ; LATIN SMALL LETTER Q
\0 \d114 ; LATIN SMALL LETTER R
\0 \d115 ; LATIN SMALL LETTER S
\0 \d116 ; LATIN SMALL LETTER T
\0 \d117 ; LATIN SMALL LETTER U
\0 \d118 ; LATIN SMALL LETTER V
\0 \d119 ; LATIN SMALL LETTER W
\0 \d120 ; LATIN SMALL LETTER X
\0 \d121 ; LATIN SMALL LETTER Y
\0 \d122 ; LATIN SMALL LETTER Z
\0 \d123 ; LEFT CURLY BRACKET
\0 \d124 ; VERTICAL LINE
\0 \d125 ; RIGHT CURLY BRACKET
\0 \d126 ; TILDE
\0 \d127 ; DELETE
\d208 \d144 ; CYRILLIC CAPITAL LETTER A
\d208 \d145 ; CYRILLIC CAPITAL LETTER BE
\d208 \d146 ; CYRILLIC CAPITAL LETTER VE
\d208 \d147 ; CYRILLIC CAPITAL LETTER GHE
\d208 \d148 ; CYRILLIC CAPITAL LETTER DE
\d208 \d149 ; CYRILLIC CAPITAL LETTER IE
\d208 \d150 ; CYRILLIC CAPITAL LETTER ZHE
\d208 \d151 ; CYRILLIC CAPITAL LETTER ZE
\d208 \d152 ; CYRILLIC CAPITAL LETTER I
\d208 \d153 ; CYRILLIC CAPITAL LETTER SHORT I
\d208 \d154 ; CYRILLIC CAPITAL LETTER KA
\d208 \d155 ; CYRILLIC CAPITAL LETTER EL
\d208 \d156 ; CYRILLIC CAPITAL LETTER EM
\d208 \d157 ; CYRILLIC CAPITAL LETTER EN
\d208 \d158 ; CYRILLIC CAPITAL LETTER O
\d208 \d159 ; CYRILLIC CAPITAL LETTER PE
\d208 \d160 ; CYRILLIC CAPITAL LETTER ER
\d208 \d161 ; CYRILLIC CAPITAL LETTER ES
\d208 \d162 ; CYRILLIC CAPITAL LETTER TE
\d208 \d163 ; CYRILLIC CAPITAL LETTER U
\d208 \d164 ; CYRILLIC CAPITAL LETTER EF
\d208 \d165 ; CYRILLIC CAPITAL LETTER HA
\d208 \d166 ; CYRILLIC CAPITAL LETTER TSE
\d208 \d167 ; CYRILLIC CAPITAL LETTER CHE
\d208 \d168 ; CYRILLIC CAPITAL LETTER SHA
\d208 \d169 ; CYRILLIC CAPITAL LETTER SHCHA
\d208 \d170 ; CYRILLIC CAPITAL LETTER HARD SIGN
\d208 \d171 ; CYRILLIC CAPITAL LETTER YERU
\d208 \d172 ; CYRILLIC CAPITAL LETTER SOFT SIGN
\d208 \d173 ; CYRILLIC CAPITAL LETTER E
\d208 \d174 ; CYRILLIC CAPITAL LETTER YU
\d208 \d175 ; CYRILLIC CAPITAL LETTER YA
\d208 \d176 ; CYRILLIC SMALL LETTER A
\d208 \d177 ; CYRILLIC SMALL LETTER BE
\d208 \d178 ; CYRILLIC SMALL LETTER VE
\d208 \d179 ; CYRILLIC SMALL LETTER GHE
\d208 \d180 ; CYRILLIC SMALL LETTER DE
\d208 \d181 ; CYRILLIC SMALL LETTER IE
\d208 \d182 ; CYRILLIC SMALL LETTER ZHE
\d208 \d183 ; CYRILLIC SMALL LETTER ZE
\d208 \d184 ; CYRILLIC SMALL LETTER I
\d208 \d185 ; CYRILLIC SMALL LETTER SHORT I
\d208 \d186 ; CYRILLIC SMALL LETTER KA
\d208 \d187 ; CYRILLIC SMALL LETTER EL
\d208 \d188 ; CYRILLIC SMALL LETTER EM
\d208 \d189 ; CYRILLIC SMALL LETTER EN
\d208 \d190 ; CYRILLIC SMALL LETTER O
\d208 \d191 ; CYRILLIC SMALL LETTER PE
\d226 \d150 \d145 ; LIGHT SHADE
\d226 \d150 \d146 ; MEDIUM SHADE
\d226 \d150 \d147 ; DARK SHADE
\d226 \d148 \d130 ; BOX DRAWINGS LIGHT VERTICAL
\d226 \d148 \d164 ; BOX DRAWINGS LIGHT VERTICAL AND LEFT
\d226 \d149 \d161 ; BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE
\d226 \d149 \d162 ; BOX DRAWINGS VERTICAL DOUBLE AND LEFT SINGLE
\d226 \d149 \d150 ; BOX DRAWINGS DOWN DOUBLE AND LEFT SINGLE
\d226 \d149 \d149 ; BOX DRAWINGS DOWN SINGLE AND LEFT DOUBLE
\d226 \d149 \d163 ; BOX DRAWINGS DOUBLE VERTICAL AND LEFT
\d226 \d149 \d145 ; BOX DRAWINGS DOUBLE VERTICAL
\d226 \d149 \d151 ; BOX DRAWINGS DOUBLE DOWN AND LEFT
\d226 \d149 \d157 ; BOX DRAWINGS DOUBLE UP AND LEFT
\d226 \d149 \d144 ; BOX DRAWINGS DOUBLE HORIZONTAL
\d226 \d148 \d148 ; BOX DRAWINGS LIGHT UP AND RIGHT
\d226 \d148 \d180 ; BOX DRAWINGS LIGHT UP AND HORIZONTAL
\d226 \d148 \d172 ; BOX DRAWINGS LIGHT DOWN AND HORIZONTAL
\d226 \d148 \d156 ; BOX DRAWINGS LIGHT VERTICAL AND RIGHT
\d226 \d148 \d128 ; BOX DRAWINGS LIGHT HORIZONTAL
\d226 \d148 \d188 ; BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
\d226 \d149 \d158 ; BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE
\d226 \d149 \d159 ; BOX DRAWINGS VERTICAL DOUBLE AND RIGHT SINGLE
\d226 \d149 \d154 ; BOX DRAWINGS DOUBLE UP AND RIGHT
\d226 \d149 \d148 ; BOX DRAWINGS DOUBLE DOWN AND RIGHT
\d226 \d149 \d169 ; BOX DRAWINGS DOUBLE UP AND HORIZONTAL
\d226 \d149 \d166 ; BOX DRAWINGS DOUBLE DOWN AND HORIZONTAL
\d226 \d149 \d160 ; BOX DRAWINGS DOUBLE VERTICAL AND RIGHT
\d226 \d149 \d144 ; BOX DRAWINGS DOUBLE HORIZONTAL
\d226 \d149 \d172 ; BOX DRAWINGS DOUBLE VERTICAL AND HORIZONTAL
\d226 \d149 \d167 ; BOX DRAWINGS UP SINGLE AND HORIZONTAL DOUBLE
\d226 \d149 \d168 ; BOX DRAWINGS UP DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d164 ; BOX DRAWINGS DOWN SINGLE AND HORIZONTAL DOUBLE
\d226 \d149 \d165 ; BOX DRAWINGS DOWN DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d153 ; BOX DRAWINGS UP DOUBLE AND RIGHT SINGLE
\d226 \d149 \d152 ; BOX DRAWINGS UP SINGLE AND RIGHT DOUBLE
\d226 \d149 \d146 ; BOX DRAWINGS DOWN SINGLE AND RIGHT DOUBLE
\d226 \d149 \d147 ; BOX DRAWINGS DOWN DOUBLE AND RIGHT SINGLE
\d226 \d149 \d171 ; BOX DRAWINGS VERTICAL DOUBLE AND HORIZONTAL SINGLE
\d226 \d149 \d170 ; BOX DRAWINGS VERTICAL SINGLE AND HORIZONTAL DOUBLE
\d226 \d148 \d152 ; BOX DRAWINGS LIGHT UP AND LEFT
\d226 \d148 \d140 ; BOX DRAWINGS LIGHT DOWN AND RIGHT
\d226 \d150 \d136 ; FULL BLOCK
\d226 \d150 \d132 ; LOWER HALF BLOCK
\d226 \d150 \d140 ; LEFT HALF BLOCK
\d226 \d150 \d144 ; RIGHT HALF BLOCK
\d226 \d150 \d128 ; UPPER HALF BLOCK
\d209 \d128 ; CYRILLIC SMALL LETTER ER
\d209 \d129 ; CYRILLIC SMALL LETTER ES
\d209 \d130 ; CYRILLIC SMALL LETTER TE
\d209 \d131 ; CYRILLIC SMALL LETTER U
\d209 \d132 ; CYRILLIC SMALL LETTER EF
\d209 \d133 ; CYRILLIC SMALL LETTER HA
\d209 \d134 ; CYRILLIC SMALL LETTER TSE
\d209 \d135 ; CYRILLIC SMALL LETTER CHE
\d209 \d136 ; CYRILLIC SMALL LETTER SHA
\d209 \d137 ; CYRILLIC SMALL LETTER SHCHA
\d209 \d138 ; CYRILLIC SMALL LETTER HARD SIGN
\d209 \d139 ; CYRILLIC SMALL LETTER YERU
\d209 \d140 ; CYRILLIC SMALL LETTER SOFT SIGN
\d209 \d141 ; CYRILLIC SMALL LETTER E
\d209 \d142 ; CYRILLIC SMALL LETTER YU
\d209 \d143 ; CYRILLIC SMALL LETTER YA
\d208 \d129 ; CYRILLIC CAPITAL LETTER IO
\d209 \d145 ; CYRILLIC SMALL LETTER IO
\d208 \d132 ; CYRILLIC CAPITAL LETTER UKRAINIAN IE
\d209 \d148 ; CYRILLIC SMALL LETTER UKRAINIAN IE
\d208 \d135 ; CYRILLIC CAPITAL LETTER YI
\d209 \d151 ; CYRILLIC SMALL LETTER YI
\d208 \d142 ; CYRILLIC CAPITAL LETTER SHORT U
\d209 \d158 ; CYRILLIC SMALL LETTER SHORT U
\d194 \d176 ; DEGREE SIGN
\d226 \d136 \d153 ; BULLET OPERATOR
\d194 \d183 ; MIDDLE DOT
\d226 \d136 \d154 ; SQUARE ROOT
\d226 \d132 \d150 ; NUMERO SIGN
\d194 \d164 ; CURRENCY SIGN
\d226 \d150 \d160 ; BLACK SQUARE
\d194 \d160 ; NO-BREAK SPACE
END
-file end-
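To rule out the table itself, the entries for bytes 128..255 can be compared against what a known-good CP866 codec produces. The following is only a sketch: the function names are hypothetical, it assumes the \dNNN tokens are decimal byte values and that a bare \0 means byte 0, and it uses Python's built-in cp866 codec as the reference rather than anything from GoldEd+.

```python
from typing import List


def expected_entry(byte_value: int) -> List[int]:
    """UTF-8 bytes that Python's cp866 codec yields for one CP866 byte."""
    return list(bytes([byte_value]).decode("cp866").encode("utf-8"))


def parse_entry(line: str) -> List[int]:
    """Turn a table line such as '\\d208 \\d144 ; comment' into [208, 144]."""
    # Everything after the first ';' is a comment.
    data = line.split(";", 1)[0]
    values = []
    for token in data.split():
        if token.startswith("\\d"):
            values.append(int(token[2:]))  # decimal byte value
        elif token == "\\0":
            values.append(0)  # assumption: bare \0 is byte 0
    return values


def check_high_half(entry_lines: List[str]) -> List[str]:
    """Compare the entries for bytes 128..255 with the cp866 codec."""
    problems = []
    if len(entry_lines) != 128:
        problems.append(f"expected 128 entries, found {len(entry_lines)}")
    for offset, line in enumerate(entry_lines[:128]):
        byte_value = 128 + offset
        got = parse_entry(line)
        want = expected_entry(byte_value)
        if got != want:
            problems.append(
                f"byte {byte_value}: table has {got}, codec gives {want}"
            )
    return problems
```

Feeding it the lines that follow the "\0 \d127 ; DELETE" entry, up to but not including END, returns a list of human-readable problems. An empty list means the high half matches the codec entry for entry; a wrong entry count or a shifted row shows up as one or more messages.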
My primary example message is from FIDONEWS, MsgID "2:5030/1081.117 61f6e5cd"
My PHP script correctly converts the CP866 characters into UTF-8, but GoldEd+ just makes a mess of it.
The tagline of the message translates to "- And you would do art. Poetry, right?"
and the origin (loosely) to "I advise you to rub with ant alcohol".
The message appears to have been posted by a version of GoldEd running on 32-bit Windows, so I have to believe proper character translation can be done!
Sorry for the fairly large post; just tried to give as much information as possible in one shot.
Any help is greatly appreciated!
Scott
---
* Origin: -={ The Digital Post }=- (1:266/420.1)