diff options
Diffstat (limited to 'Documentation/unicode.txt')
-rw-r--r-- | Documentation/unicode.txt | 139 |
1 files changed, 139 insertions, 0 deletions
diff --git a/Documentation/unicode.txt b/Documentation/unicode.txt new file mode 100644 index 000000000..b9f44a6eb --- /dev/null +++ b/Documentation/unicode.txt @@ -0,0 +1,139 @@ +The Linux kernel code has been rewritten to use Unicode to map +characters to fonts. By downloading a single Unicode-to-font table, +both the eight-bit character sets and UTF-8 mode are changed to use +the font as indicated. + +This changes the semantics of the eight-bit character tables subtly. +The four character tables are now: + +Map symbol Map name Escape code (G0) + +LAT1_MAP Latin-1 (ISO 8859-1) ESC ( B +GRAF_MAP DEC VT100 pseudographics ESC ( 0 +IBMPC_MAP IBM code page 437 ESC ( U +USER_MAP User defined ESC ( K + +In particular, ESC ( U is no longer "straight to font", since the font +might be completely different than the IBM character set. This +permits for example the use of block graphics even with a Latin-1 font +loaded. + +In accordance with the Unicode standard/ISO 10646 the range U+F000 to +U+F8FF has been reserved for OS-wide allocation (the Unicode Standard +refers to this as a "Corporate Zone", since this is inaccurate for +Linux we call it the "Linux Zone"). U+F000 was picked as the starting +point since it lets the direct-mapping area start on a large power of +two (in case 1024- or 2048-character fonts ever become necessary). +This leaves U+E000 to U+EFFF as End User Zone. + +The Unicodes in the range U+F000 to U+F1FF have been hard-coded to map +directly to the loaded font, bypassing the translation table. The +user-defined map now defaults to U+F000 to U+F1FF, emulating the +previous behaviour. This range may expand in the future should it be +warranted. + +Actual characters assigned in the Linux Zone +-------------------------------------------- + +In addition, the following characters not present in Unicode 1.1.4 (at +least, I have not found them!) have been defined; these are used by +the DEC VT graphics map: + +U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1 +U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3 +U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7 +U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9 + +The DEC VT220 uses a 6x10 character matrix, and these characters form +a smooth progression in the DEC VT graphics character set. I have +omitted the scan 5 line, since it is also used as a block-graphics +character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL. +However, I left U+F802 blank should the need arise. + +Klingon language support +------------------------ + +Unfortunately, Unicode/ISO 10646 does not allocate code points for the +language Klingon, probably fearing the potential code point explosion +if many fictional languages were submitted for inclusion. There are +also political reasons (the Japanese, for example, are not too happy +about the whole 16-bit concept to begin with.) However, with Linux +being a hacker-driven OS it seems this is a brilliant linguistic hack +worth supporting. Hence I have chosen to add it to the list in the +Linux Zone. + +Several glyph forms for the Klingon alphabet has been proposed. +However, since the set of symbols appear to be consistent throughout, +with only the actual shapes being different, in keeping with standard +Unicode practice these differences are considered font variants. + +Klingon has an alphabet of 26 characters, a positional numeric writing +system with 10 digits, and is written left-to-right, top-to-bottom. +Punctuation appears to be only used in Latin transliteration; it is +appears customary to write each sentence on its own line, and +centered. Space has been reserved for punctuation should it prove +necessary. + +This encoding has been endorsed by the Klingon Language Institute. +For more information, contact them at: + + http://www.kli.org/ + +Since the characters in the beginning of the Linux CZ have been more +of the dingbats/symbols/forms type and this is a language, I have +located it at the end, on a 16-cell boundary in keeping with standard +Unicode practice. + +U+F8D0 KLINGON LETTER A +U+F8D1 KLINGON LETTER B +U+F8D2 KLINGON LETTER CH +U+F8D3 KLINGON LETTER D +U+F8D4 KLINGON LETTER E +U+F8D5 KLINGON LETTER GH +U+F8D6 KLINGON LETTER H +U+F8D7 KLINGON LETTER I +U+F8D8 KLINGON LETTER J +U+F8D9 KLINGON LETTER L +U+F8DA KLINGON LETTER M +U+F8DB KLINGON LETTER N +U+F8DC KLINGON LETTER NG +U+F8DD KLINGON LETTER O +U+F8DE KLINGON LETTER P +U+F8DF KLINGON LETTER Q + - Written <q> in standard Okrand Latin transliteration +U+F8E0 KLINGON LETTER QH + - Written <Q> in standard Okrand Latin transliteration +U+F8E1 KLINGON LETTER R +U+F8E2 KLINGON LETTER S +U+F8E3 KLINGON LETTER T +U+F8E4 KLINGON LETTER TLH +U+F8E5 KLINGON LETTER U +U+F8E6 KLINGON LETTER V +U+F8E7 KLINGON LETTER W +U+F8E8 KLINGON LETTER Y +U+F8E9 KLINGON LETTER GLOTTAL STOP + +U+F8F0 KLINGON DIGIT ZERO +U+F8F1 KLINGON DIGIT ONE +U+F8F2 KLINGON DIGIT TWO +U+F8F3 KLINGON DIGIT THREE +U+F8F4 KLINGON DIGIT FOUR +U+F8F5 KLINGON DIGIT FIVE +U+F8F6 KLINGON DIGIT SIX +U+F8F7 KLINGON DIGIT SEVEN +U+F8F8 KLINGON DIGIT EIGHT +U+F8F9 KLINGON DIGIT NINE + +Other Fictional and Artificial Scripts +-------------------------------------- + +Since the assignment of the Klingon Linux Unicode block, a registry of +fictional and artificial scripts has been established by John Cowan, +<cowan@ccil.org>. The ConScript Unicode Registry is accessible at +http://locke.ccil.org/~cowan/csur/; the ranges used fall at the bottom +of the End User Zone and can hence not be normatively assigned, but it +is recommended that people who wish to encode fictional scripts use +these codes, in the interest of interoperability. For Klingon, CSUR +has adopted the Linux encoding. + + H. Peter Anvin <hpa@zytor.com> |