1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
3 "http://www.w3.org/TR/html4/loose.dtd">
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta http-equiv="Content-Language" content="en-us">
10 <title>UCD: Unicode NamesList File Format</title>
11 <link rel="stylesheet" type="text/css" href="http://www.unicode.org/reports/reports-v2.css">
14 <body bgcolor="#ffffff">
16 <table class="header">
18 <td class="icon"><a href="http://www.unicode.org"><img border="0" src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle" alt="[Unicode]" width="34" height="33"></a> <a class="bar" href="http://www.unicode.org/ucd/">Unicode
19 Character Database</a></td>
22 <td class="gray"> </td>
<h1>Unicode® NamesList File Format</h1>
27 <table class="simple" width="90%">
30 <td valign="top" width="144">Revision</td>
31 <td valign="top">12.1.0</td>
34 <td valign="top" width="144">Authors</td>
35 <td valign="top">Asmus Freytag, Ken Whistler</td>
38 <td valign="top" width="144">Date</td>
39 <td valign="top">2019-03-08</td>
42 <td valign="top" width="144">This Version</td>
44 <a href="http://www.unicode.org/Public/12.1.0/ucd/NamesList.html">
45 http://www.unicode.org/Public/12.1.0/ucd/NamesList.html</a></td>
48 <td valign="top" width="144">Previous Version</td>
50 <a href="http://www.unicode.org/Public/12.0.0/ucd/NamesList.html">
51 http://www.unicode.org/Public/12.0.0/ucd/NamesList.html</a></td>
54 <td valign="top" width="144">Latest Version</td>
55 <td valign="top"><a href="http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
60 <h3><i>Summary</i></h3>
<p>This file describes the format and contents of NamesList.txt.</p>
64 <h3><i>Status</i></h3>
<p><i>This file and the files described herein are part of the <a href="http://www.unicode.org/ucd/">Unicode
67 Character Database</a> (UCD). The Unicode <a href="http://www.unicode.org/terms_of_use.html">
68 Terms of Use</a> apply.</i></p>
72 <h2>1.0 <a name="Introduction" href="#Introduction">Introduction</a></h2>
74 <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
75 text file used to drive the layout of the character code charts in the Unicode
76 Standard. The information in this file is a combination of several fields from
77 the UnicodeData.txt and Blocks.txt files, together with additional annotations
78 for many characters.</p>
<p>This document describes the syntax rules for the file
format and briefly notes how each construct is rendered
when laid out for the code charts. Some of the syntax elements are used only in
82 preparation of the drafts of the code charts and are not present in the final,
83 released form of the NamesList.txt file.</p>
<p>Over time, the syntax has been extended with new features. The syntax for formal aliases and index tabs was introduced with Unicode
5.0. The syntax for marginal sidebar comments is used extensively in
draft versions of the NamesList.txt file. Support for UTF-8 encoded files and the syntax for the UTF-8 charset
declaration in a comment at the head of the file were introduced after Unicode
6.1.0 was published, as was the syntax for specifying variation sequences and alternate glyphs and their respective summaries. As of Unicode 11.0, the repertoire
allowed in comments and aliases in the names list format was widened from the prior
limitation to U+0020..U+00FF to the range U+0020..U+02FF.</p>
93 <p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
94 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
95 information in the name list file that is not needed (and in fact removed
96 during parsing) for the Unicode code charts.</p>
<p>With access to the layout program (<a href="http://www.unicode.org/unibook/">Unibook</a>), it is straightforward to
create name lists for formatting working drafts or other documents containing
proposed characters.</p>
101 <p>The content of the NamesList.txt file is optimized for code chart creation.
102 Some information that can be inferred by the reader from context has been
103 suppressed to make the code charts more readable. See the chapter on Code
104 Charts in the <a href="http://www.unicode.org/versions/latest">Unicode
107 <h3>1.1 <a name="Overview" href="#Overview">NamesList File Overview</a></h3>
<p>The NamesList files are plain text files which in their simplest form look
112 <p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br>
113 ; this is a file comment (ignored)<br>
114 0020<tab>SPACE<br>
115 0021<tab>EXCLAMATION MARK<br>
116 0022<tab>QUOTATION MARK<br>
118 007F<tab>DELETE</p>
<p>The semicolon (as first character), @ and <tab> characters are used
by the file syntax and must be provided as shown. Hexadecimal digits must be
in UPPERCASE. A double @@ introduces a block header, with the start code, title, and
end code of the block provided in that order, as shown.</p>
125 <p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and
126 their constituent syntax elements are needed.</p>
128 <p>The full syntax with all the options is provided in the following sections.</p>
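<p>As an illustration of the minimal form described above, the following Python sketch reads only BLOCKHEADER and NAME_LINE elements and skips file comments and empty lines. It is not part of the format definition; the function name <code>parse_minimal_names_list</code> and the returned data shapes are assumptions made for this example, and all other line types are silently ignored.</p>

```python
import re

# CHAR is four to six uppercase hexadecimal digits.
CHAR = r"[0-9A-F]{4,6}"
# BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
# NAME_LINE: CHAR TAB NAME (comments after the name are not separated here)
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal_names_list(text):
    """Return (blocks, names) for a minimal name list.

    blocks: list of (start, title, end) tuples with integer code points.
    names:  dict mapping an integer code point to its name string.
    """
    blocks = []
    names = {}
    for line in text.splitlines():
        if not line or line.startswith(";"):
            continue  # EMPTY_LINE or FILE_COMMENT: ignored
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append((int(m.group(1), 16), m.group(2), int(m.group(3), 16)))
            continue
        m = NAME_LINE.match(line)
        if m:
            names[int(m.group(1), 16)] = m.group(2)
    return blocks, names


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "007F\tDELETE\n"
)
blocks, names = parse_minimal_names_list(sample)
print(blocks)
print(names[0x21])
```

<p>The sketch accepts one or more tab characters between fields, since the TAB primitive is defined as a sequence of tabs.</p>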
130 <h2>2.0 <a name="FileStructure" href="#FileStructure">NamesList File Structure</a></h2>
<p>This section defines the overall file structure.</p>
134 <pre><strong>NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
136 <strong>TITLE_PAGE: TITLE
137 | TITLE_PAGE SUBTITLE
138 | TITLE_PAGE SUBHEADER
139 | TITLE_PAGE IGNORED_LINE
140 | TITLE_PAGE EMPTY_LINE
141 | TITLE_PAGE NOTICE_LINE
142 | TITLE_PAGE COMMENT_LINE
143 | TITLE_PAGE PAGEBREAK
144 | TITLE_PAGE FILE_COMMENT
148 EXTENDED_BLOCK: BLOCK
153 | BLOCKHEADER INDEX_TAB
165 CHAR_ENTRY: NAME_LINE | RESERVED_LINE
166 | CHAR_ENTRY ALIAS_LINE
167 | CHAR_ENTRY FORMALALIAS_LINE
168 | CHAR_ENTRY COMMENT_LINE
169 | CHAR_ENTRY CROSS_REF
170 | CHAR_ENTRY DECOMPOSITION
171 | CHAR_ENTRY COMPAT_MAPPING
172 | CHAR_ENTRY IGNORED_LINE
173 | CHAR_ENTRY EMPTY_LINE
174 | CHAR_ENTRY NOTICE_LINE
175 | CHAR_ENTRY FILE_COMMENT
176 | CHAR_ENTRY VARIATION_LINE
179 <p>In other words:</p>
181 Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
182 <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
183 EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
<li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS_LINE and FORMALALIAS_LINE lines
186 occurring before the first block header are treated as if they were
<p>Directly following either a NAME_LINE or a RESERVED_LINE, an uninterrupted
sequence of the following lines may occur (in any order and repeated as often
as needed): ALIAS_LINE, COMMENT_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
<li>The conventional order of elements in a char entry (NAME_LINE,
FORMALALIAS_LINE, ALIAS_LINE, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally
ending in either DECOMPOSITION or COMPAT_MAPPING) is not enforced by the layout program
(<a href="http://www.unicode.org/unibook/">Unibook</a>). </li>
199 <p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
200 FILE_COMMENT, none of these lines may
201 occur in any other place.</p>
<li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title
or is part of a CHAR_ENTRY.</li>
<p>A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY.
207 A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
208 appear after any block header.</p>
209 <p>If the first line of a file is a file comment, it may contain a UTF-8
210 charset declaration (see below). Alternatively, or in addition, a BOM may be
211 present at the very beginning of the file, forcing the encoding to be
212 interpreted as UTF-16 (little-endian only) or UTF-8. When
213 declared as UTF-8, the names list format will support use of characters in
214 the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
215 the supported repertoire is limited to Latin-1, and attempted use of characters outside
216 the Latin-1 range will result in data corruption.</p>
217 <p>Several of these elements, while part of the formal definition of the
218 file format, do not occur in final published versions of
219 NamesList.txt in the UCD.</p>
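<p>The encoding rules above (BOM first, then a charset declaration in a leading file comment, otherwise Latin-1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name <code>detect_names_list_encoding</code> is hypothetical, the returned strings are Python codec names rather than anything defined by the format, and the check for the declaration is a plain case-insensitive search for "utf-8" in the first line.</p>

```python
import codecs


def detect_names_list_encoding(raw: bytes) -> str:
    """Guess the Python codec name for the raw bytes of a names list file."""
    # A BOM at the very beginning of the file takes precedence.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # reads the BOM to pick little-endian
    # Otherwise look for a UTF-8 declaration in a FILE_COMMENT on line one.
    first_line = raw.split(b"\n", 1)[0]
    if first_line.startswith(b";") and b"utf-8" in first_line.lower():
        return "utf-8"
    # No BOM and no declaration: treat the file as Latin-1.
    return "latin-1"


data = b"; charset=UTF-8\n0020\tSPACE\n"
encoding = detect_names_list_encoding(data)
print(encoding)
print(data.decode(encoding).splitlines()[1])
```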
221 <h4>Blocks followed by Summaries</h4>
222 <p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
224 SUMMARY: ALTGLYPH_SUMMARY
226 | ALTGLYPH_SUMMARY VARIATION_SUMMARY
229 ALTGLYPH_SUMMARY: ALTGLYPH_SUBHEADER
230 | ALTGLYPH_SUMMARY SUMMARY_LINE
232 VARIATION_SUMMARY: VARIATION_SUBHEADER
233 | VARIATION_SUMMARY SUMMARY_LINE
235 MIXED_SUMMARY: MIXED_SUBHEADER
236 | MIXED_SUMMARY SUMMARY_LINE
238 SUMMARY_LINE: SUBHEADER
241 | EMPTY_LINE</strong>
244 <p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
245 of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
246 preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
247 follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
248 interspersed between items in the summary.</p>
<p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER is
omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>
<p>Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to
supply specific contents for these summary titles and to allow additional
information to be added via SUBHEADER and NOTICE_LINE elements. The final published version of the Unicode names list
is machine-generated and will always explicitly provide any summary subheaders.</p>
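<p>To make the idea of a generated summary concrete, the following Python sketch collects the VARIATION_LINE elements of a block and groups them under the two default headers. It is an assumption-laden illustration, not a description of how Unibook works: the function name <code>summarize_variations</code> is hypothetical, the regular expression tolerates an optional space between CHAR and VARSEL, and the sample lines are illustrative rather than quoted from a published names list.</p>

```python
import re

CHAR = r"[0-9A-F]{4,6}"
# VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL, optionally "(" LCTAG ")"
VARIATION_LINE = re.compile(
    rf"^\t+~ ({CHAR}) ?({CHAR}|ALT[1-9]) ([^()]*?)(?: ?\(([a-z]+)\))?$"
)


def summarize_variations(block_lines):
    """Group the variation lines of one block by kind of selector."""
    summary = {
        "Selected Alternative Glyphs": [],
        "Standardized Variation Sequences": [],
    }
    for line in block_lines:
        m = VARIATION_LINE.match(line)
        if not m:
            continue  # not a VARIATION_LINE
        char, varsel, label, context = m.groups()
        if varsel.startswith("ALT"):
            key = "Selected Alternative Glyphs"
        else:
            key = "Standardized Variation Sequences"
        summary[key].append((char, varsel, label, context))
    return summary


lines = [
    "2205\tEMPTY SET",
    "\t~ 2205 FE00 zero with long diagonal stroke overlay form",
    "\t~ 0030 ALT1 illustrative alternate glyph",
    "\t~ 1820 180B second form (final)",
]
for header, entries in summarize_variations(lines).items():
    print(header, entries)
```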
259 <h3>2.1 <a name="FileElements" href="#FileElements">NamesList File Elements</a></h3>
261 <p>This section provides the details of the syntax for the individual elements.</p>
263 <pre><strong>ELEMENT SYNTAX</strong> // How rendered
265 <strong>NAME_LINE: CHAR TAB NAME LF</strong>
266 // The CHAR and the corresponding image are echoed,
267 // followed by the name as given in NAME
269 <strong> | CHAR TAB "<" LCNAME ">" LF</strong>
270 // Control and noncharacters use this form of
271 // lowercase, bracketed pseudo character name
273 <strong> | CHAR TAB NAME SP COMMENT LF</strong>
274 // Names may have a comment, which is stripped off
275 // unless the file is parsed for an ISO style list
277 <strong> | CHAR TAB "<" LCNAME ">" SP COMMENT LF</strong>
278 // Control and noncharacters may also have comments
280 <strong>RESERVED_LINE: CHAR TAB "<reserved>" LF</strong>
281 // The CHAR is echoed followed by an icon for the
282 // reserved character and a fixed string e.g. "<reserved>"
284 <strong>COMMENT_LINE: TAB "*" SP EXPAND_LINE</strong>
285 // * is replaced by BULLET, output line as comment
287 <strong> | TAB EXPAND_LINE</strong>
288 // Output line as comment
290 <strong>ALIAS_LINE: TAB "=" SP LINE</strong>
291 // Replace = by itself, output line as alias
293 <strong>FORMALALIAS_LINE:
294 TAB "%" SP NAME LF</strong>
295 // Replace % by U+203B, output line as formal alias
297 <strong>CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
298 | TAB "x" SP CHAR SP "<" LCNAME ">" LF</strong>
299 // x is replaced by a right arrow
301 <strong> | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
302 | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF</strong>
303 // x is replaced by a right arrow;
304 // (second type as used for control and noncharacters)
306 // In the forms with parentheses the "(","-" and ")" are removed
307 // and the order of CHAR and LCNAME is reversed;
308 // i.e. all inputs result in the same order of output
310 <strong> | TAB "x" SP CHAR LF</strong>
311 // x is replaced by a right arrow
312 // (this type is the only one without LCNAME
313 // and is used for ideographs)
315 <strong>VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
| TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF</strong>
317 // output standardized variation sequence or simply the char code in case of alternate
318 // glyphs, followed by the alternate glyph or variation glyph and the label and context
320 <strong>FILE_COMMENT: ";" LINE</strong>
322 <strong>EMPTY_LINE: LF</strong>
323 // Empty and ignored lines as well as
324 // file comments are ignored
326 <strong>IGNORED_LINE: TAB ";" LINE</strong>
329 <strong>SIDEBAR_LINE: ";;" LINE</strong>
330 // Output LINE as marginal note
332 <strong>DECOMPOSITION: TAB ":" SP EXPAND_LINE</strong>
333 // Replace ':' by EQUIV, expand line into
336 <strong>COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
337 | TAB "#" SP "<" TAG ">" SP EXPAND_LINE</strong>
338 // Replace '#' by APPROX, output line as mapping;
339 // check for balanced < >
341 <strong>NOTICE_LINE: "@+" TAB LINE</strong>
342 // Output LINE as notice
<strong> | "@+" TAB "*" SP LINE</strong>
345 // Output LINE as notice
346 // "*" expands to a bullet character
347 // Notices following a character code apply to the
348 // character and are indented. Notices not following
349 // a character code apply to the page/block/column
350 // and are italicized, but not indented
352 <strong>TITLE: "@@@" TAB LINE</strong>
353 // Output LINE as text
354 // Title is used in page headers
356 <strong>SUBTITLE: "@@@+" TAB LINE</strong>
357 // Output LINE as subtitle
359 <strong>SUBHEADER: "@" TAB LINE</strong>
360 // Output LINE as column header
362 <strong>VARIATION_SUBHEADER:</strong> <strong>"@~" TAB LINE</strong>
363 // Output LINE as column header (summary subheader)
364 <strong>| "@~"</strong>
365 // Output a default standard variation sequences summary subheader
366 <strong>| "@~" TAB "!"</strong>
// Suppress output of a default standard variation sequences summary subheader
368 // and disable display of summary
369 <strong>| "@~" TAB "!" VARSEL_LIST</strong>
370 <strong>| "@~" TAB "!" VARSEL_LIST LINE</strong>
371 // Output a standard summary subheader, using default or LINE respectively
372 // Suppress any std variation sequences using selectors from the list
374 <strong>ALTGLYPH_SUBHEADER:</strong> <strong>"@@~" TAB LINE</strong>
375 // Output LINE as column header (summary subheader)
376 <strong>| "@@~"</strong>
377 // Output a default alternate glyph summary subheader
378 <strong>| "@@~" TAB "!"</strong>
379 // Suppress output of a default alternate glyph summary subheader
380 // and disable display of summary
382 <strong>MIXED_SUBHEADER: </strong><strong>"@@@~" TAB LINE</strong>
383 // Output LINE as column header (summary subheader)
384 <strong>| "@@@~"</strong>
385 // Output a default combined variation and alternate glyph summary subheader
386 <strong>| "@@@~" TAB "!"</strong>
// Suppress output of a default combined variation and alternate glyph summary subheader
388 // and disable display of summary
389 <strong>| "@@@~" TAB "!" VARSEL_LIST</strong>
390 <strong>| "@@@~" TAB "!" VARSEL_LIST LINE</strong>
391 // Output a combined summary subheader, using default or LINE respectively
392 // Suppress any std variation sequences using selectors from the list
394 <strong>BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
395 // Cause a page break and optional
396 // blank page, then output one or more charts
397 // followed by the list of character names.
398 // Use BLOCKSTART and BLOCKEND to define
399 // what characters belong to a block.
400 // Use BLOCKNAME in page and table headers
402 <strong>BLOCKNAME: LABEL
403 | LABEL SP "(" LABEL ")"</strong>
404 // If an alternate label is present it replaces
405 // the BLOCKNAME when an ISO-style names list is
406 // laid out; it is ignored in the Unicode charts
408 <strong>BLOCKSTART: CHAR</strong> // First character position in block
409 <strong>BLOCKEND: CHAR</strong> // Last character position in block
410 <strong>PAGEBREAK: "@@"</strong> // Insert a (column) break
411 <strong>INDEX_TAB: "@@+"</strong> // Start a new index tab at latest BLOCKSTART
413 <strong>EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
414 // Instances of CHAR (see Notes) are replaced by
415 // CHAR NBSP x NBSP where x is the single Unicode
416 // character corresponding to CHAR.
// If the character is combining, it is replaced with
418 // CHAR NBSP <circ> x NBSP where <circ> is the
<li>Blocks must be aligned on a 16-code point boundary and contain an integer
425 multiple of 16-code point columns. The exception to that rule is for blocks of
426 ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks
427 must correspond to the last assigned character, and not the actual end of the block.</li>
428 <li>Blocks must be non-overlapping and in ascending order. NAME_LINEs
429 must be in ascending order and follow the block header for the block to
430 which they belong. </li>
431 <li>Reserved entries are optional, and will normally be supplied automatically. They are
432 required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
<li>An empty alternate glyph summary subheader expression will result in the default header "Selected Alternative Glyphs".</li>
<li>An empty standard variation subheader expression will result in the default header "Standardized Variation Sequences".</li>
<li>A VARSEL_LIST may only contain code points for standard variation selectors (including script-specific ones).</li>
<li>When displaying a VARIATION_LINE for alternate glyphs, the "ALTn" selector is not displayed.</li>
<li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE, it is replaced by the glyph for LIGHT SCREEN.</li>
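<p>The element syntax above distinguishes most line types by their leading characters. The following Python sketch classifies a single line on that basis. It is a rough aid for readers, not a validating parser: the function name <code>classify_line</code> and the "UNKNOWN" result are assumptions of this example, and a line is labeled from its prefix alone without checking the remainder against the grammar or the surrounding context.</p>

```python
import re

# CHAR TAB ... at the start of a line means NAME_LINE or RESERVED_LINE.
CHAR_LINE = re.compile(r"^[0-9A-F]{4,6}\t+(.*)$")

# Longest prefixes first, so that "@@@+" is not mistaken for "@@@".
AT_PREFIXES = [
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@+", "SUBTITLE"),
    ("@@@", "TITLE"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@@+", "INDEX_TAB"),
    ("@@", None),  # BLOCKHEADER or PAGEBREAK, decided below
    ("@~", "VARIATION_SUBHEADER"),
    ("@+", "NOTICE_LINE"),
    ("@", "SUBHEADER"),
]

# Two-character markers that may follow the leading TAB.
TAB_MARKERS = {
    "= ": "ALIAS_LINE",
    "% ": "FORMALALIAS_LINE",
    "x ": "CROSS_REF",
    "~ ": "VARIATION_LINE",
    ": ": "DECOMPOSITION",
    "# ": "COMPAT_MAPPING",
}


def classify_line(line):
    """Return the element name suggested by the start of the line."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";;"):
        return "SIDEBAR_LINE"
    if line.startswith(";"):
        return "FILE_COMMENT"
    for prefix, kind in AT_PREFIXES:
        if line.startswith(prefix):
            if kind is None:
                return "PAGEBREAK" if line == "@@" else "BLOCKHEADER"
            return kind
    if line.startswith("\t"):
        body = line.lstrip("\t")
        if body.startswith(";"):
            return "IGNORED_LINE"
        # Anything else after a TAB, including "* ...", is a comment line.
        return TAB_MARKERS.get(body[:2], "COMMENT_LINE")
    m = CHAR_LINE.match(line)
    if m:
        return "RESERVED_LINE" if m.group(1) == "<reserved>" else "NAME_LINE"
    return "UNKNOWN"


for text in ["@@\t0020\tBASIC LATIN\t007F", "0020\tSPACE", "\t= blank"]:
    print(classify_line(text))
```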
442 <h3>2.2 <a name="FilePrimitives" href="#FilePrimitives">NamesList File Primitives</a></h3>
444 <p>The following are the primitives and terminals for the NamesList syntax.</p>
446 <pre><strong>LINE</strong>: <strong>STRING LF
447 COMMENT: "(" LABEL ")"
448 | "(" LABEL ")" SP "*"
449 | "*"</strong>
451 <strong>NAME</strong>: <sequence of uppercase ASCII letters, digits, space and hyphen>
452 <strong>LCNAME</strong>: <sequence of lowercase ASCII letters, digits, space and hyphen>
453 <strong>| LCNAME "-" CHAR</strong>
455 <strong>TAG</strong>: <sequence of ASCII letters>
456 <strong>LCTAG</strong>: <sequence of lowercase ASCII letters>
457 <strong>STRING</strong>: <sequence of characters in the range U+0020..U+02FF, except controls>
458 <strong>LABEL</strong>: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
459 <strong>VARSEL</strong>: <strong>CHAR
| "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )</strong>
461 <strong>VARSEL_LIST</strong>: <strong>"{" CHAR_LIST "}"</strong>
462 <strong>CHAR_LIST</strong>: <strong>CHAR
463 | CHAR_LIST SP CHAR</strong>
464 <strong>CHAR</strong>: <strong>X X X X</strong>
465 <strong>| X X X X X </strong>
466 <strong>| X X X X X X </strong>
467 <strong>X</strong>: <strong>"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"</strong>
468 <strong>ESC_CHAR</strong>: <strong>ESC CHAR</strong>
469 <strong>ESC</strong>: <strong>"\"</strong>
470 // Special semantics of backslash (\) are supported
471 // only in EXPAND_LINE.
472 <strong>TAB</strong>: <sequence of one or more ASCII tab characters 0x09>
473 <strong>SP</strong>: <ASCII 20>
474 <strong>LF</strong>: <any sequence of ASCII 0A and 0D>
479 <li>Multiple or leading spaces, multiple or leading hyphens, as well as
480 word-initial digits in NAMEs or LCNAMEs are illegal.</li>
481 <li>The French version of the names list uses French rules, which allow
482 apostrophe and accented letters in character names.</li>
483 <li>When names containing code points are lowercased to make them LCNAMEs,
484 the code point values remain uppercase. Such code points by convention
485 follow a hyphen and are the last element in the name.</li>
<li>Special lookahead logic prevents a four-digit number for a standard, such
as ISO 9999, from being misinterpreted as ISO CHAR. Currently recognized are
"ISO", "DIN", "IEC" and "S X" as well as "S C" for the JIS X and JIS C series of
standards. For other standards, or for four-digit years in a comment, use a
NOTICE_LINE instead, which prevents expansion, or use "\" to escape the digits.</li>
491 <li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
492 Smart apostrophes are supported, but nested quotes are not.
493 Single quotes can only be applied around a single word.</li>
<li>A CHAR inside ' or " is expanded, but only its glyph image is printed;
the code value is not echoed.</li>
496 <li>Inside an EXPAND_LINE, backslash is treated as an escape character that
497 removes the special meaning of any literal character and also prevents
498 the following digit sequence from being expanded. A backslash character in
499 isolation is never displayed. A sequence of two backslash characters results
500 in display of a single backslash, but has no effect on the interpretation
501 of following characters.</li>
502 <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
504 <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
FILE_COMMENT containing the declaration "UTF-8" or any case variation
506 thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
507 detecting the charset declaration (typically: "; charset=utf-8") the
508 remainder of that comment is ignored.
509 If the file is not encoded as
510 UTF-8, the character repertoire for running text (anything
511 other than CHAR) is effectively restricted to the repertoire of Latin-1.
512 Otherwise, characters in the range U+0020..U+02FF
513 are allowed in STRING or LABEL elements, and elements derived from them.</li>
514 <li>The code chart layout program
515 (<a href="http://www.unicode.org/unibook/">Unibook</a>)
516 can accept files in several other formats. These include little-endian UTF-16,
517 prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
518 <li>While the format allows multiple <tab> characters, by convention the
519 actual number of tabs is always one or two, chosen to provide the best
520 layout of the plain text file.</li>
521 <li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous
522 spaces or tab characters; while these are errors in the files, they are not
523 being corrected, to retain stability of the published versions. Anyone
524 writing a parser for older versions of this file may need to be prepared to
525 handle such exceptions.</li>
526 <li>The final LF in the file must be present.</li>
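<p>The notes on CHAR expansion, the lookahead for standards numbers, and backslash escapes in an EXPAND_LINE can be illustrated with the following Python sketch. It is deliberately partial and makes several assumptions: the function name <code>expand_line</code> is hypothetical, the character itself stands in for its glyph image, and the handling of combining characters, quotes, and CHAR-CHAR ranges described in the notes is omitted.</p>

```python
import re

NBSP = "\u00A0"

TOKEN = re.compile(
    r"\\\\"                                   # two backslashes
    r"|\\(?P<escaped>[0-9A-F]{4,6})?"         # backslash, optionally before digits
    r"|(?P<std>(?:ISO|DIN|IEC|S X|S C) [0-9A-F]{4,6}\b)"  # standards numbers
    r"|\b(?P<char>[0-9A-F]{4,6})\b"           # a CHAR to be expanded
)


def expand_line(text):
    """Expand CHAR references in the text of an EXPAND_LINE (partial sketch)."""

    def replace(m):
        token = m.group(0)
        if token == "\\\\":
            return "\\"  # two backslashes display as a single backslash
        if token.startswith("\\"):
            # The backslash itself is never displayed; escaped digits stay literal.
            return m.group("escaped") or ""
        if m.group("std"):
            return token  # lookahead case: e.g. "ISO 8859" is left alone
        code = int(m.group("char"), 16)
        if code > 0x10FFFF:
            return token  # not a valid code point, leave untouched
        # CHAR NBSP x NBSP, where x is the character corresponding to CHAR
        return f"{token}{NBSP}{chr(code)}{NBSP}"

    return TOKEN.sub(replace, text)


print(repr(expand_line("see 0041 or ISO 8859, written in \\2019")))
```

<p>Because any run of four to six uppercase hexadecimal digits is treated as a CHAR, the escape or a NOTICE_LINE remains necessary for years and for standards not covered by the lookahead list.</p>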
528 <h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
530 <p><b>Version 12.1.0</b></p>
532 <li>Reissued for Unicode 12.1.0.</li>
534 <p><b>Version 12.0.0</b></p>
536 <li>Reissued for Unicode 12.0.0.</li>
537 <li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
538 <li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
539 <li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
540 <li>Corrected the documentation regarding which elements allow use of characters
541 in the range U+0020..U+02FF.</li>
543 <p><b>Version 11.0.0</b></p>
545 <li>Reissued for Unicode 11.0.0.</li>
546 <li>Loosened the limitation on repertoire allowed in LINE and LABEL
547 elements to include characters outside Latin-1, in the range
550 <p><b>Version 10.0.0</b></p>
552 <li>Reissued for Unicode 10.0.0.</li>
554 <p><b>Version 9.0.0</b></p>
556 <li>Reissued for Unicode 9.0.0.</li>
558 <p><b>Version 8.0.0</b></p>
560 <li>Reissued for Unicode 8.0.0.</li>
561 <li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
562 <li>Tweaked BNF and notes for variation summaries.</li>
564 <p><b>Version 7.0.0</b></p>
566 <li>Reissued for Unicode 7.0.0.</li>
568 <p><b>Version 6.3.0</b></p>
570 <li>Reissued for Unicode 6.3.0.</li>
572 <p><b>Version 6.2.0</b></p>
574 <li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
575 <li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
576 <li>Fixed some typographical errors and minor inconsistencies.</li>
577 <li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
578 <li>Edited and reformatted some notes for readability.</li>
579 <li>Documented the permitted presence of CROSS_REF outside character entries within blocks.
580 Such CROSS_REFs have been present in published names lists, but that information was missing in
581 the syntax description. For an example see the Currency Symbols block in the code charts.</li>
582 <li>Added description of UTF-8 charset declaration and file encoding.</li>
584 <p><b>Version 6.1.0</b></p>
586 <li>Removed constraint that LCTAG consist only of lowercase letters,
587 because of the existence of the "noBreak" tag.</li>
589 <p><b>Version 6.0.0</b></p>
591 <li>Added definitions for ESC_CHAR and ESC primitives.</li>
592 <li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
594 <p><b>Version 5.2.0</b></p>
596 <li>Better aligned the rules section with the actual published files and
597 behavior of existing parsers. This included fixing some obvious typos
598 and clarifying some notes as well as the following changes, which are
599 listed individually.</li>
600 <li>Replaced instances of <tab> by TAB throughout.</li>
601 <li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs
602 consisting entirely of "*".</li>
603 <li>In CROSS_REF added the form without LCNAME, fixed the literal to the
604 correct lowercase "x" and noted that LCNAME may have "<" and ">" around
605 it in the data. Also added missing LF in the rules.</li>
606 <li>Removed a redundant rule for BLOCKHEADER.</li>
607 <li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction
609 <li>Extended the documentation of lookahead logic for CHAR.</li>
610 <li>Accounted for FILE_COMMENT in overall file structure.</li>
612 <p><b>Version 5.1.0</b></p>
614 <li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
615 <li>Provided additional information on allowable characters in names.</li>
616 <li>Added SIDEBAR_LINE.</li>
617 <li>Noted that CROSS_REF must contain a SP and CHAR, and that
618 COMPAT_MAPPING must contain a SP and may contain a <tag></li>
619 <li>Noted that LCNAME may contain uppercase characters under
620 exceptional circumstances.</li>
621 <li>Relaxed the restriction on lines starting with #, :, %, x and = on
622 the TITLE_PAGE. These are now treated as comments.</li>
624 <p><b>Version 5.0.0</b></p>
626 <li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
627 <li>Fixed the list of lines that may appear before a BLOCKHEADER by
628 adding NOTICE_LINE.</li>
629 <li>Minor fixes to the wording of several syntax definitions.</li>
631 <p><b>Version 4.0.0</b></p>
633 <li>Fixed syntax to better reflect restrictions on characters
634 in character and block names.</li>
<li>Improved documentation of the treatment of comments in block names and of
French name rules.</li>
638 <p><b>Version 3.2.0</b></p>
640 <li>Fixed several broken links, added a left margin,
641 changed version numbering.</li>
643 <p><b>Version 3.1.0 (2)</b></p>
645 <li>Use of 4-6 digit hex notation is now supported.</li>
650 <table cellspacing="0" cellpadding="0" border="0">
652 <td><a href="http://www.unicode.org/copyright.html">
653 <img src="http://www.unicode.org/img/hb_notice.gif" border="0" alt="Access to Copyright and terms of use" width="216" height="50"></a></td>
656 <script language="Javascript" type="text/javascript" src="http://www.unicode.org/webscripts/lastModified.js">