1 files changed, 139 insertions, 0 deletions
diff --git a/Documentation/unicode.txt b/Documentation/unicode.txt
new file mode 100644
index 000000000..b9f44a6eb
--- /dev/null
+++ b/Documentation/unicode.txt
@@ -0,0 +1,139 @@
+The Linux kernel code has been rewritten to use Unicode to map
+characters to fonts.  By downloading a single Unicode-to-font table,
+both the eight-bit character sets and UTF-8 mode are changed to use
+the font as indicated.
+
+This changes the semantics of the eight-bit character tables subtly.
+The four character tables are now:
+
+Map symbol	Map name			Escape code (G0)
+
+LAT1_MAP	Latin-1 (ISO 8859-1)		ESC ( B
+GRAF_MAP	DEC VT100 pseudographics	ESC ( 0
+IBMPC_MAP	IBM code page 437		ESC ( U
+USER_MAP	User defined			ESC ( K
+
+In particular, ESC ( U is no longer "straight to font", since the font
+might be completely different than the IBM character set.  This
+permits for example the use of block graphics even with a Latin-1 font
+loaded.
+
+In accordance with the Unicode standard/ISO 10646 the range U+F000 to
+U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
+refers to this as a "Corporate Zone", since this is inaccurate for
+Linux we call it the "Linux Zone").  U+F000 was picked as the starting
+point since it lets the direct-mapping area start on a large power of
+two (in case 1024- or 2048-character fonts ever become necessary).
+This leaves U+E000 to U+EFFF as End User Zone.
+
+The Unicodes in the range U+F000 to U+F1FF have been hard-coded to map
+directly to the loaded font, bypassing the translation table.  The
+user-defined map now defaults to U+F000 to U+F1FF, emulating the
+previous behaviour.  This range may expand in the future should it be
+warranted.
+
+Actual characters assigned in the Linux Zone
+--------------------------------------------
+
+In addition, the following characters not present in Unicode 1.1.4 (at
+least, I have not found them!) have been defined; these are used by
+the DEC VT graphics map:
+
+U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
+U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
+U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
+U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
+
+The DEC VT220 uses a 6x10 character matrix, and these characters form
+a smooth progression in the DEC VT graphics character set.  I have
+omitted the scan 5 line, since it is also used as a block-graphics
+character, and hence has been coded as U+2500 FORMS LIGHT HORIZONTAL.
+However, I left U+F802 blank should the need arise.  
+
+Klingon language support
+------------------------
+
+Unfortunately, Unicode/ISO 10646 does not allocate code points for the
+language Klingon, probably fearing the potential code point explosion
+if many fictional languages were submitted for inclusion.  There are
+also political reasons (the Japanese, for example, are not too happy
+about the whole 16-bit concept to begin with.)  However, with Linux
+being a hacker-driven OS it seems this is a brilliant linguistic hack
+worth supporting.  Hence I have chosen to add it to the list in the
+Linux Zone.
+
+Several glyph forms for the Klingon alphabet has been proposed.
+However, since the set of symbols appear to be consistent throughout,
+with only the actual shapes being different, in keeping with standard
+Unicode practice these differences are considered font variants.
+
+Klingon has an alphabet of 26 characters, a positional numeric writing
+system with 10 digits, and is written left-to-right, top-to-bottom.
+Punctuation appears to be only used in Latin transliteration; it is
+appears customary to write each sentence on its own line, and
+centered.  Space has been reserved for punctuation should it prove
+necessary.
+
+This encoding has been endorsed by the Klingon Language Institute.
+For more information, contact them at:
+
+	http://www.kli.org/
+
+Since the characters in the beginning of the Linux CZ have been more
+of the dingbats/symbols/forms type and this is a language, I have
+located it at the end, on a 16-cell boundary in keeping with standard
+Unicode practice.
+
+U+F8D0	KLINGON LETTER A
+U+F8D1	KLINGON LETTER B
+U+F8D2	KLINGON LETTER CH
+U+F8D3	KLINGON LETTER D
+U+F8D4	KLINGON LETTER E
+U+F8D5	KLINGON LETTER GH
+U+F8D6	KLINGON LETTER H
+U+F8D7	KLINGON LETTER I
+U+F8D8	KLINGON LETTER J
+U+F8D9	KLINGON LETTER L
+U+F8DA	KLINGON LETTER M
+U+F8DB	KLINGON LETTER N
+U+F8DC	KLINGON LETTER NG
+U+F8DD	KLINGON LETTER O
+U+F8DE	KLINGON LETTER P
+U+F8DF	KLINGON LETTER Q
+	- Written <q> in standard Okrand Latin transliteration
+U+F8E0	KLINGON LETTER QH
+	- Written <Q> in standard Okrand Latin transliteration
+U+F8E1	KLINGON LETTER R
+U+F8E2	KLINGON LETTER S
+U+F8E3	KLINGON LETTER T
+U+F8E4	KLINGON LETTER TLH
+U+F8E5	KLINGON LETTER U
+U+F8E6	KLINGON LETTER V
+U+F8E7	KLINGON LETTER W
+U+F8E8	KLINGON LETTER Y
+U+F8E9	KLINGON LETTER GLOTTAL STOP
+
+U+F8F0	KLINGON DIGIT ZERO
+U+F8F1	KLINGON DIGIT ONE
+U+F8F2	KLINGON DIGIT TWO
+U+F8F3	KLINGON DIGIT THREE
+U+F8F4	KLINGON DIGIT FOUR
+U+F8F5	KLINGON DIGIT FIVE
+U+F8F6	KLINGON DIGIT SIX
+U+F8F7	KLINGON DIGIT SEVEN
+U+F8F8	KLINGON DIGIT EIGHT
+U+F8F9	KLINGON DIGIT NINE
+
+Other Fictional and Artificial Scripts
+--------------------------------------
+
+Since the assignment of the Klingon Linux Unicode block, a registry of
+fictional and artificial scripts has been established by John Cowan,
+<cowan@ccil.org>.  The ConScript Unicode Registry is accessible at
+http://locke.ccil.org/~cowan/csur/; the ranges used fall at the bottom
+of the End User Zone and can hence not be normatively assigned, but it
+is recommended that people who wish to encode fictional scripts use
+these codes, in the interest of interoperability.  For Klingon, CSUR
+has adopted the Linux encoding.
+
+	H. Peter Anvin <hpa@zytor.com>