1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
3 "http://www.w3.org/TR/html4/loose.dtd">
8 <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
9 <meta http-equiv="Content-Language" content="en-us">
10 <title>UCD: Unicode NamesList File Format</title>
11 <link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
16 <table class="header">
18 <td class="icon" style="width:38px; height:35px">
19 <a href="https://www.unicode.org/">
20 <img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
21 alt="[Unicode]" width="34" height="33"></a>
24 <td class="icon" style="vertical-align:middle">
26 <a class="bar" href="https://www.unicode.org/ucd/"><font size="3">Unicode Character Database</font></a>
30 <td colspan="2" class="gray"> </td>
<h1>Unicode® NamesList File Format</h1>
35 <table class="simple" width="90%">
38 <td valign="top" width="144">Revision</td>
39 <td valign="top">15.0.0</td>
42 <td valign="top" width="144">Authors</td>
43 <td valign="top">Asmus Freytag, Ken Whistler</td>
46 <td valign="top" width="144">Date</td>
47 <td valign="top">2022-08-08</td>
50 <td valign="top" width="144">This Version</td>
52 <a href="http://www.unicode.org/Public/15.0.0/ucd/NamesList.html">
53 http://www.unicode.org/Public/15.0.0/ucd/NamesList.html</a></td>
56 <td valign="top" width="144">Previous Version</td>
58 <a href="http://www.unicode.org/Public/14.0.0/ucd/NamesList.html">
59 http://www.unicode.org/Public/14.0.0/ucd/NamesList.html</a></td>
62 <td valign="top" width="144">Latest Version</td>
63 <td valign="top"><a href="http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
68 <h3><i>Summary</i></h3>
<p>This file describes the format and contents of NamesList.txt.</p>
72 <h3><i>Status</i></h3>
<p><i>This file and the files described herein are part of the <a href="http://www.unicode.org/ucd/">Unicode
75 Character Database</a> (UCD). The Unicode <a href="http://www.unicode.org/terms_of_use.html">
76 Terms of Use</a> apply.</i></p>
80 <h2>1.0 <a name="Introduction" href="#Introduction">Introduction</a></h2>
82 <p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
83 text file used to drive the layout of the character code charts in the Unicode
84 Standard. The information in this file is a combination of several fields from
85 the UnicodeData.txt and Blocks.txt files, together with additional annotations
86 for many characters.</p>
87 <p>This document describes the syntax rules for the file
88 format, but also gives brief information on how each construct is rendered
89 when laid out for the code charts. Some of the syntax elements are used only in
90 preparation of the drafts of the code charts and are not present in the final,
91 released form of the NamesList.txt file.</p>
93 <p>Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
94 5.0. The syntax for marginal sidebar comments is utilized extensively in
95 draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
96 declaration in a comment at the head of the file were introduced after Unicode
97 6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
98 in comments and aliases in the names list format was loosened from the prior
99 limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</p>
101 <p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
102 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
103 information in the name list file that is not needed (and in fact removed
104 during parsing) for the Unicode code charts.</p>
<p>With access to the layout program (<a href="http://www.unicode.org/unibook/">Unibook</a>), it is a simple matter to
create name lists for formatting working drafts or other documents containing
proposed characters.</p>
109 <p>The content of the NamesList.txt file is optimized for code chart creation.
110 Some information that can be inferred by the reader from context has been
111 suppressed to make the code charts more readable. See the chapter on Code
112 Charts in the <a href="http://www.unicode.org/versions/latest">Unicode
115 <h3>1.1 <a name="Overview" href="#Overview">NamesList File Overview</a></h3>
<p>The NamesList files are plain text files which in their simplest form look like this:</p>
120 <p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br>
121 ; this is a file comment (ignored)<br>
122 0020<tab>SPACE<br>
123 0021<tab>EXCLAMATION MARK<br>
124 0022<tab>QUOTATION MARK<br>
126 007F<tab>DELETE</p>
128 <p>The semicolon (as first character), @ and <tab> characters are used
129 by the file syntax and must be provided as shown. Hexadecimal digits must be
130 in UPPERCASE. A double @@ introduces a block header, with the title, and
131 start and ending code of the block provided as shown.</p>
133 <p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and
134 their constituent syntax elements are needed.</p>
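<p>The minimal form described above can be read with very little code. The following is a rough sketch, not the Unibook parser: the function name <code>parse_minimal</code> and the dictionary layout are assumptions made for illustration, and only BLOCKHEADER, NAME_LINE and file comments are handled. All other line types are silently skipped.</p>

```python
import re

# CHAR: 4 to 6 uppercase hexadecimal digits
CHAR = r"[0-9A-F]{4,6}"
BLOCKHEADER = re.compile(rf"^@@\t+({CHAR})\t+(.+?)\t+({CHAR})$")
NAME_LINE = re.compile(rf"^({CHAR})\t+(.+)$")


def parse_minimal(text):
    """Collect blocks and their named characters from a minimal name list.

    Hypothetical helper: returns a list of dicts with the keys
    "start", "name", "end" and "chars" (code point -> name).
    """
    blocks = []
    for line in text.splitlines():
        # Empty lines and file comments (leading semicolon) are ignored.
        if not line or line.startswith(";"):
            continue
        m = BLOCKHEADER.match(line)
        if m:
            blocks.append({
                "start": int(m[1], 16),
                "name": m[2],
                "end": int(m[3], 16),
                "chars": {},
            })
            continue
        m = NAME_LINE.match(line)
        if m and blocks:
            blocks[-1]["chars"][int(m[1], 16)] = m[2]
    return blocks


sample = (
    "@@\t0020\tBASIC LATIN\t007F\n"
    "; this is a file comment (ignored)\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
    "007F\tDELETE\n"
)
blocks = parse_minimal(sample)
```

<p>Name lines that appear before any block header are dropped in this sketch; a fuller parser would need to decide how to report them.</p>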
136 <p>The full syntax with all the options is provided in the following sections.</p>
138 <h2>2.0 <a name="FileStructure" href="#FileStructure">NamesList File Structure</a></h2>
<p>This section defines the overall file structure.</p>
142 <pre><strong>NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
144 <strong>TITLE_PAGE: TITLE
145 | TITLE_PAGE SUBTITLE
146 | TITLE_PAGE SUBHEADER
147 | TITLE_PAGE IGNORED_LINE
148 | TITLE_PAGE EMPTY_LINE
149 | TITLE_PAGE NOTICE_LINE
150 | TITLE_PAGE COMMENT_LINE
151 | TITLE_PAGE PAGEBREAK
152 | TITLE_PAGE FILE_COMMENT
156 EXTENDED_BLOCK: BLOCK
161 | BLOCKHEADER INDEX_TAB
173 CHAR_ENTRY: NAME_LINE | RESERVED_LINE
174 | CHAR_ENTRY ALIAS_LINE
175 | CHAR_ENTRY FORMALALIAS_LINE
176 | CHAR_ENTRY COMMENT_LINE
177 | CHAR_ENTRY CROSS_REF
178 | CHAR_ENTRY DECOMPOSITION
179 | CHAR_ENTRY COMPAT_MAPPING
180 | CHAR_ENTRY IGNORED_LINE
181 | CHAR_ENTRY EMPTY_LINE
182 | CHAR_ENTRY NOTICE_LINE
183 | CHAR_ENTRY FILE_COMMENT
184 | CHAR_ENTRY VARIATION_LINE
187 <p>In other words:</p>
189 Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
190 <p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
191 EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
<li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS_LINE and FORMALALIAS_LINE lines
194 occurring before the first block header are treated as if they were
197 <p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted
198 sequence of the following lines may occur (in any order and repeated as often
199 as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
200 EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
<li>The conventional order of elements in a char entry (NAME_LINE,
FORMALALIAS_LINE, ALIAS_LINE, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally
ending in either DECOMPOSITION or COMPAT_MAPPING) is not enforced by the layout program
(<a href="http://www.unicode.org/unibook/">Unibook</a>).</li>
207 <p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
208 FILE_COMMENT, none of these lines may
209 occur in any other place.</p>
211 <li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title
212 or is part of a CHAR_ENTRY</li>
<p>A PAGEBREAK may appear anywhere, except in the middle of a CHAR_ENTRY.
215 A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
216 appear after any block header.</p>
217 <p>If the first line of a file is a file comment, it may contain a UTF-8
218 charset declaration (see below). Alternatively, or in addition, a BOM may be
219 present at the very beginning of the file, forcing the encoding to be
220 interpreted as UTF-16 (little-endian only) or UTF-8. When
221 declared as UTF-8, the names list format will support use of characters in
222 the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
223 the supported repertoire is limited to Latin-1, and attempted use of characters outside
224 the Latin-1 range will result in data corruption.</p>
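<p>The encoding rules above can be sketched as follows. This is an illustration only: the function name <code>detect_encoding</code> is hypothetical, the return values are Python codec names, and the check for the charset declaration is a simple case-insensitive substring test on the first line, which is an assumption about how loosely the declaration may be matched.</p>

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess the Python codec name for the raw bytes of a names list file."""
    # A BOM at the very beginning of the file forces the interpretation.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes UTF-8 and strips the BOM
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"      # little-endian only; the codec consumes the BOM
    # Otherwise look for a UTF-8 declaration in a file comment on line one.
    first_line = data.splitlines()[0] if data else b""
    if first_line.startswith(b";") and b"utf-8" in first_line.lower():
        return "utf-8"
    # No BOM and no declaration: fall back to Latin-1.
    return "latin-1"
```

<p>A caller would then decode with <code>data.decode(detect_encoding(data))</code> before splitting the text into lines.</p>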
225 <p>Several of these elements, while part of the formal definition of the
226 file format, do not occur in final published versions of
227 NamesList.txt in the UCD.</p>
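<p>Because every line type is marked by its leading characters, a first pass over the file can label each line before any structural rules are applied. The sketch below uses the prefixes defined in Section 2.1; the function name <code>classify</code> and the returned labels are assumptions for illustration, and the code does not validate the remainder of each line.</p>

```python
import re

_CHAR_LINE = re.compile(r"^[0-9A-F]{4,6}\t")

# Longer prefixes come first so that "@@@+" is not mistaken for "@@@".
_AT_PREFIXES = [
    ("@@@+", "SUBTITLE"),
    ("@@@~", "MIXED_SUBHEADER"),
    ("@@@", "TITLE"),
    ("@@+", "INDEX_TAB"),
    ("@@~", "ALTGLYPH_SUBHEADER"),
    ("@+", "NOTICE_LINE"),
    ("@~", "VARIATION_SUBHEADER"),
]

# Marker characters that follow the leading TAB(s) and are followed by SP.
_TAB_PREFIXES = {
    "=": "ALIAS_LINE",
    "%": "FORMALALIAS_LINE",
    "x": "CROSS_REF",
    "~": "VARIATION_LINE",
    ":": "DECOMPOSITION",
    "#": "COMPAT_MAPPING",
}


def classify(line):
    """Return a label for one line of a names list file."""
    line = line.rstrip("\r\n")
    if line == "":
        return "EMPTY_LINE"
    if line.startswith(";;"):
        return "SIDEBAR_LINE"
    if line.startswith(";"):
        return "FILE_COMMENT"
    for prefix, kind in _AT_PREFIXES:
        if line.startswith(prefix):
            return kind
    if line.startswith("@@"):
        return "BLOCKHEADER" if line.startswith("@@\t") else "PAGEBREAK"
    if line.startswith("@"):
        return "SUBHEADER"
    if line.startswith("\t"):
        body = line.lstrip("\t")
        if body.startswith(";"):
            return "IGNORED_LINE"
        if len(body) >= 2 and body[1] == " " and body[0] in _TAB_PREFIXES:
            return _TAB_PREFIXES[body[0]]
        # Includes the bulleted form that starts with "* ".
        return "COMMENT_LINE"
    if _CHAR_LINE.match(line):
        if line.split("\t")[-1] == "<reserved>":
            return "RESERVED_LINE"
        return "NAME_LINE"
    return "UNKNOWN"
```

<p>Checking which labels may follow one another, as required by the structure rules above, would be a separate step built on top of this labelling.</p>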
229 <h4>Blocks followed by Summaries</h4>
230 <p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
232 SUMMARY: ALTGLYPH_SUMMARY
234 | ALTGLYPH_SUMMARY VARIATION_SUMMARY
237 ALTGLYPH_SUMMARY: ALTGLYPH_SUBHEADER
238 | ALTGLYPH_SUMMARY SUMMARY_LINE
240 VARIATION_SUMMARY: VARIATION_SUBHEADER
241 | VARIATION_SUMMARY SUMMARY_LINE
243 MIXED_SUMMARY: MIXED_SUBHEADER
244 | MIXED_SUMMARY SUMMARY_LINE
246 SUMMARY_LINE: SUBHEADER
249 | EMPTY_LINE</strong>
252 <p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
253 of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
254 preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
255 follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
256 interspersed between items in the summary.</p>
258 <p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER are
259 omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
260 as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>
<p>Thus, the main purpose of providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements is to
supply specific contents for these summary titles and to allow additional
information to be added via SUBHEADER and NOTICE elements. The final published version of the Unicode names list
is machine generated and will always explicitly provide any summary subheaders.</p>
267 <h3>2.1 <a name="FileElements" href="#FileElements">NamesList File Elements</a></h3>
269 <p>This section provides the details of the syntax for the individual elements.</p>
271 <pre><strong>ELEMENT SYNTAX</strong> // How rendered
273 <strong>NAME_LINE: CHAR TAB NAME LF</strong>
274 // The CHAR and the corresponding image are echoed,
275 // followed by the name as given in NAME
277 <strong> | CHAR TAB "<" LCNAME ">" LF</strong>
278 // Control and noncharacters use this form of
279 // lowercase, bracketed pseudo character name
281 <strong> | CHAR TAB NAME SP COMMENT LF</strong>
282 // Names may have a comment, which is stripped off
283 // unless the file is parsed for an ISO style list
285 <strong> | CHAR TAB "<" LCNAME ">" SP COMMENT LF</strong>
286 // Control and noncharacters may also have comments
288 <strong>RESERVED_LINE: CHAR TAB "<reserved>" LF</strong>
289 // The CHAR is echoed followed by an icon for the
290 // reserved character and a fixed string e.g. "<reserved>"
292 <strong>COMMENT_LINE: TAB "*" SP EXPAND_LINE</strong>
293 // * is replaced by BULLET, output line as comment
295 <strong> | TAB EXPAND_LINE</strong>
296 // Output line as comment
298 <strong>ALIAS_LINE: TAB "=" SP LINE</strong>
299 // Replace = by itself, output line as alias
301 <strong>FORMALALIAS_LINE:
302 TAB "%" SP NAME LF</strong>
303 // Replace % by U+203B, output line as formal alias
305 <strong>CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
306 | TAB "x" SP CHAR SP "<" LCNAME ">" LF</strong>
307 // x is replaced by a right arrow
309 <strong> | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
310 | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF</strong>
311 // x is replaced by a right arrow;
312 // (second type as used for control and noncharacters)
314 // In the forms with parentheses the "(","-" and ")" are removed
315 // and the order of CHAR and LCNAME is reversed;
316 // i.e. all inputs result in the same order of output
318 <strong> | TAB "x" SP CHAR LF</strong>
319 // x is replaced by a right arrow
320 // (this type is the only one without LCNAME
321 // and is used for ideographs)
323 <strong>VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
| TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")" LF</strong>
325 // output standardized variation sequence or simply the char code in case of alternate
326 // glyphs, followed by the alternate glyph or variation glyph and the label and context
328 <strong>FILE_COMMENT: ";" LINE</strong>
330 <strong>EMPTY_LINE: LF</strong>
331 // Empty and ignored lines as well as
332 // file comments are ignored
334 <strong>IGNORED_LINE: TAB ";" LINE</strong>
337 <strong>SIDEBAR_LINE: ";;" LINE</strong>
338 // Output LINE as marginal note
340 <strong>DECOMPOSITION: TAB ":" SP EXPAND_LINE
341 | TAB ":" SP "<" TAG ">" SP EXPAND_LINE</strong>
342 // Replace ':' by EQUIV, expand line into decomposition
343 // The <tag> gives optional information,
344 // e.g., about composition exclusion.
345 // by convention the tag has initial lowercase
347 <strong>COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
348 | TAB "#" SP "<" TAG ">" SP EXPAND_LINE</strong>
349 // Replace '#' by APPROX, output line as mapping
350 // The <tag> is the optional compatibility decomposition tag.
351 // by convention the tag has initial lowercase
353 <strong>NOTICE_LINE: "@+" TAB LINE</strong>
354 // Output LINE as notice
<strong> | "@+" TAB "*" SP LINE</strong>
357 // Output LINE as notice
358 // "*" expands to a bullet character
359 // Notices following a character code apply to the
360 // character and are indented. Notices not following
361 // a character code apply to the page/block/column
362 // and are italicized, but not indented
364 <strong>TITLE: "@@@" TAB LINE</strong>
365 // Output LINE as text
366 // Title is used in page headers
368 <strong>SUBTITLE: "@@@+" TAB LINE</strong>
369 // Output LINE as subtitle
371 <strong>SUBHEADER: "@" TAB LINE</strong>
372 // Output LINE as column header
374 <strong>VARIATION_SUBHEADER:</strong> <strong>"@~" TAB LINE</strong>
375 // Output LINE as column header (summary subheader)
376 <strong>| "@~"</strong>
377 // Output a default standard variation sequences summary subheader
378 <strong>| "@~" TAB "!"</strong>
// Suppress output of a default standard variation sequences summary subheader
380 // and disable display of summary
381 <strong>| "@~" TAB "!" VARSEL_LIST</strong>
382 <strong>| "@~" TAB "!" VARSEL_LIST LINE</strong>
383 // Output a standard summary subheader, using default or LINE respectively
384 // Suppress any std variation sequences using selectors from the list
386 <strong>ALTGLYPH_SUBHEADER:</strong> <strong>"@@~" TAB LINE</strong>
387 // Output LINE as column header (summary subheader)
388 <strong>| "@@~"</strong>
389 // Output a default alternate glyph summary subheader
390 <strong>| "@@~" TAB "!"</strong>
391 // Suppress output of a default alternate glyph summary subheader
392 // and disable display of summary
394 <strong>MIXED_SUBHEADER: </strong><strong>"@@@~" TAB LINE</strong>
395 // Output LINE as column header (summary subheader)
396 <strong>| "@@@~"</strong>
397 // Output a default combined variation and alternate glyph summary subheader
398 <strong>| "@@@~" TAB "!"</strong>
// Suppress output of a default combined summary subheader
400 // and disable display of summary
401 <strong>| "@@@~" TAB "!" VARSEL_LIST</strong>
402 <strong>| "@@@~" TAB "!" VARSEL_LIST LINE</strong>
403 // Output a combined summary subheader, using default or LINE respectively
404 // Suppress any std variation sequences using selectors from the list
406 <strong>BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
407 // Cause a page break and optional
408 // blank page, then output one or more charts
409 // followed by the list of character names.
410 // Use BLOCKSTART and BLOCKEND to define
411 // what characters belong to a block.
412 // Use BLOCKNAME in page and table headers
414 <strong>BLOCKNAME: LABEL
415 | LABEL SP "(" LABEL ")"</strong>
416 // If an alternate label is present it replaces
417 // the BLOCKNAME when an ISO-style names list is
418 // laid out; it is ignored in the Unicode charts
420 <strong>BLOCKSTART: CHAR</strong> // First character position in block
421 <strong>BLOCKEND: CHAR</strong> // Last character position in block
422 <strong>PAGEBREAK: "@@"</strong> // Insert a (column) break
423 <strong>INDEX_TAB: "@@+"</strong> // Start a new index tab at latest BLOCKSTART
425 <strong>EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
426 // Instances of CHAR (see Notes) are replaced by
427 // CHAR NBSP x NBSP where x is the single Unicode
428 // character corresponding to CHAR.
429 // If character is combining, it is replaced with
430 // CHAR NBSP <circ> x NBSP where <circ> is the
<li>Blocks must be aligned on a 16-code point boundary and contain an integer
number of 16-code point columns. The exception to that rule is for blocks of
ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks
must correspond to the last assigned character, and not the actual end of the block.</li>
440 <li>Blocks must be non-overlapping and in ascending order. NAME_LINEs
441 must be in ascending order and follow the block header for the block to
442 which they belong. </li>
443 <li>Reserved entries are optional, and will normally be supplied automatically. They are
444 required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
<li>An empty alternative glyph summary subheader expression will result in the default header "Selected Alternative Glyphs".</li>
<li>An empty standard variation subheader expression will result in the default header "Standardized Variation Sequences".</li>
<li>A VARSEL_LIST may only contain code points for standard variation selectors (including script-specific ones).</li>
449 <li>When displaying a VARIATION_LINE for alternate glyphs, the "ALTn" selector is not displayed. </li>
450 <li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE it is replaced by the glyph for U+2591 LIGHT SHADE.</li>
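<p>The alignment and ordering rules for blocks in the notes above lend themselves to a simple check. The following sketch assumes the block headers have already been parsed into <code>(start, end, name)</code> tuples; the function name <code>check_blocks</code> and the <code>unnamed</code> parameter (block names exempt from the end-alignment rule, such as ideograph blocks) are hypothetical.</p>

```python
def check_blocks(blocks, unnamed=()):
    """Report violations of the block alignment and ordering rules.

    blocks:  iterable of (start, end, name) tuples in file order
    unnamed: names of blocks for which no names are listed in the file;
             their BLOCKEND is the last assigned character, so the
             end-of-column check is skipped for them
    """
    problems = []
    prev_end = -1
    for start, end, name in blocks:
        # Blocks must be non-overlapping and in ascending order.
        if start <= prev_end:
            problems.append(f"{name}: overlaps or is out of order")
        # Blocks must start on a 16-code point boundary.
        if start % 16 != 0:
            problems.append(
                f"{name}: start {start:04X} is not on a 16-code point boundary"
            )
        # Blocks must end on the last cell of a 16-code point column.
        if name not in unnamed and (end + 1) % 16 != 0:
            problems.append(
                f"{name}: end {end:04X} does not complete a 16-code point column"
            )
        prev_end = max(prev_end, end)
    return problems
```

<p>An empty result means that no violation of these particular rules was found; the check says nothing about NAME_LINE ordering within a block.</p>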
454 <h3>2.2 <a name="FilePrimitives" href="#FilePrimitives">NamesList File Primitives</a></h3>
456 <p>The following are the primitives and terminals for the NamesList syntax.</p>
458 <pre><strong>LINE</strong>: <strong>STRING LF
459 COMMENT: "(" LABEL ")"
460 | "(" LABEL ")" SP "*"
461 | "*"</strong>
463 <strong>NAME</strong>: <sequence of uppercase ASCII letters, digits, space and hyphen>
464 <strong>LCNAME</strong>: <sequence of lowercase ASCII letters, digits, space and hyphen>
465 <strong>| LCNAME "-" CHAR</strong>
467 <strong>TAG</strong>: <sequence of ASCII letters>
468 <strong>LCTAG</strong>: <sequence of lowercase ASCII letters>
469 <strong>STRING</strong>: <sequence of characters in the range U+0020..U+02FF, except controls>
470 <strong>LABEL</strong>: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
471 <strong>VARSEL</strong>: <strong>CHAR
| "ALT" ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )</strong>
473 <strong>VARSEL_LIST</strong>: <strong>"{" CHAR_LIST "}"</strong>
474 <strong>CHAR_LIST</strong>: <strong>CHAR
475 | CHAR_LIST SP CHAR</strong>
476 <strong>CHAR</strong>: <strong>X X X X</strong>
477 <strong>| X X X X X </strong>
478 <strong>| X X X X X X </strong>
479 <strong>X</strong>: <strong>"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"</strong>
480 <strong>ESC_CHAR</strong>: <strong>ESC CHAR</strong>
481 <strong>ESC</strong>: <strong>"\"</strong>
482 // Special semantics of backslash (\) are supported
483 // only in EXPAND_LINE.
484 <strong>TAB</strong>: <sequence of one or more ASCII tab characters 0x09>
485 <strong>SP</strong>: <ASCII 20>
486 <strong>LF</strong>: <any sequence of ASCII 0A and 0D>
491 <li>Multiple or leading spaces, multiple or leading hyphens, as well as
492 word-initial digits in NAMEs or LCNAMEs are illegal.</li>
493 <li>The French version of the names list uses French rules, which allow
494 apostrophe and accented letters in character names.</li>
495 <li>When names containing code points are lowercased to make them LCNAMEs,
496 the code point values remain uppercase. Such code points by convention
497 follow a hyphen and are the last element in the name.</li>
<li>Special lookahead logic prevents a 4-digit number for a standard, such
as ISO 9999, from being misinterpreted as ISO CHAR. Currently recognized are
"ISO", "DIN", "IEC" and "S X" as well as "S C" for the JIS X and JIS C series of
standards. For other standards, or for four-digit years in a comment, use a
NOTICE_LINE instead, which prevents expansion, or use "\" to escape the digits.</li>
503 <li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
504 Smart apostrophes are supported, but nested quotes are not.
505 Single quotes can only be applied around a single word.</li>
506 <li>A CHAR inside ' or " is expanded, but only its glyph image is printed, the
507 code value is not echoed.</li>
508 <li>Inside an EXPAND_LINE, backslash is treated as an escape character that
509 removes the special meaning of any literal character and also prevents
510 the following digit sequence from being expanded. A backslash character in
511 isolation is never displayed. A sequence of two backslash characters results
512 in display of a single backslash, but has no effect on the interpretation
513 of following characters.</li>
514 <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
516 <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
FILE_COMMENT containing the declaration "UTF-8" or any case variation
518 thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
519 detecting the charset declaration (typically: "; charset=utf-8") the
520 remainder of that comment is ignored.
521 If the file is not encoded as
522 UTF-8, the character repertoire for running text (anything
523 other than CHAR) is effectively restricted to the repertoire of Latin-1.
524 Otherwise, characters in the range U+0020..U+02FF
525 are allowed in STRING or LABEL elements, and elements derived from them.</li>
526 <li>The code chart layout program
527 (<a href="http://www.unicode.org/unibook/">Unibook</a>)
528 can accept files in several other formats. These include little-endian UTF-16,
529 prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
530 <li>While the format allows multiple <tab> characters, by convention the
531 actual number of tabs is always one or two, chosen to provide the best
532 layout of the plain text file.</li>
533 <li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous
534 spaces or tab characters; while these are errors in the files, they are not
535 being corrected, to retain stability of the published versions. Anyone
536 writing a parser for older versions of this file may need to be prepared to
537 handle such exceptions.</li>
538 <li>The final LF in the file must be present.</li>
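<p>The notes on lookahead and backslash escapes in an EXPAND_LINE can be approximated as shown below. This is a simplified sketch and not Unibook's actual logic: the function name <code>expandable_chars</code> is hypothetical, only the prefixes listed in the notes are recognized, and a doubled backslash is not treated specially (the digits after it are still taken as escaped).</p>

```python
import re

# A CHAR token: 4 to 6 uppercase hex digits that are not part of a longer
# word or number and are not directly preceded by a backslash.
_CHAR = re.compile(r"(?<![0-9A-Za-z\\])[0-9A-F]{4,6}(?![0-9A-Za-z])")

# Prefixes after which a digit sequence names a standard, not a character.
_STANDARD_PREFIXES = ("ISO ", "DIN ", "IEC ", "S X ", "S C ")


def expandable_chars(line):
    """Return the code points in line that would be expanded as CHAR."""
    found = []
    for m in _CHAR.finditer(line):
        before = line[:m.start()]
        # Lookahead for standards such as "ISO 8859" or "JIS X 0208".
        if before.endswith(_STANDARD_PREFIXES):
            continue
        found.append(int(m.group(), 16))
    return found
```

<p>For instance, <code>expandable_chars("see 0041 and ISO 8859")</code> reports only U+0041, and a year written as <code>\1999</code> is left alone. Uppercase words made up entirely of the letters A to F would also match in this sketch, so real input needs more context than a single regular expression provides.</p>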
540 <h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
542 <p><b>Version 15.0.0</b></p>
544 <li>Reissued for Unicode 15.0.0.</li>
546 <p><b>Version 14.0.0</b></p>
548 <li>Reissued for Unicode 14.0.0.</li>
549 <li>Corrected character name LIGHT SCREEN to LIGHT SHADE.</li>
551 <p><b>Version 13.0.0</b></p>
553 <li>Reissued for Unicode 13.0.0.</li>
554 <li>Added a second expansion for DECOMPOSITION, for possible future
555 use to designate specific subtypes of canonical decompositions
556 in the names list output.</li>
558 <p><b>Version 12.1.0</b></p>
560 <li>Reissued for Unicode 12.1.0.</li>
562 <p><b>Version 12.0.0</b></p>
564 <li>Reissued for Unicode 12.0.0.</li>
565 <li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
566 <li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
567 <li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
568 <li>Corrected the documentation regarding which elements allow use of characters
569 in the range U+0020..U+02FF.</li>
571 <p><b>Version 11.0.0</b></p>
573 <li>Reissued for Unicode 11.0.0.</li>
574 <li>Loosened the limitation on repertoire allowed in LINE and LABEL
575 elements to include characters outside Latin-1, in the range
578 <p><b>Version 10.0.0</b></p>
580 <li>Reissued for Unicode 10.0.0.</li>
582 <p><b>Version 9.0.0</b></p>
584 <li>Reissued for Unicode 9.0.0.</li>
586 <p><b>Version 8.0.0</b></p>
588 <li>Reissued for Unicode 8.0.0.</li>
589 <li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
590 <li>Tweaked BNF and notes for variation summaries.</li>
592 <p><b>Version 7.0.0</b></p>
594 <li>Reissued for Unicode 7.0.0.</li>
596 <p><b>Version 6.3.0</b></p>
598 <li>Reissued for Unicode 6.3.0.</li>
600 <p><b>Version 6.2.0</b></p>
602 <li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
603 <li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
604 <li>Fixed some typographical errors and minor inconsistencies.</li>
605 <li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
606 <li>Edited and reformatted some notes for readability.</li>
607 <li>Documented the permitted presence of CROSS_REF outside character entries within blocks.
608 Such CROSS_REFs have been present in published names lists, but that information was missing in
609 the syntax description. For an example see the Currency Symbols block in the code charts.</li>
610 <li>Added description of UTF-8 charset declaration and file encoding.</li>
612 <p><b>Version 6.1.0</b></p>
614 <li>Removed constraint that LCTAG consist only of lowercase letters,
615 because of the existence of the "noBreak" tag.</li>
617 <p><b>Version 6.0.0</b></p>
619 <li>Added definitions for ESC_CHAR and ESC primitives.</li>
620 <li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
622 <p><b>Version 5.2.0</b></p>
624 <li>Better aligned the rules section with the actual published files and
625 behavior of existing parsers. This included fixing some obvious typos
626 and clarifying some notes as well as the following changes, which are
627 listed individually.</li>
628 <li>Replaced instances of <tab> by TAB throughout.</li>
629 <li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs
630 consisting entirely of "*".</li>
631 <li>In CROSS_REF added the form without LCNAME, fixed the literal to the
632 correct lowercase "x" and noted that LCNAME may have "<" and ">" around
633 it in the data. Also added missing LF in the rules.</li>
634 <li>Removed a redundant rule for BLOCKHEADER.</li>
635 <li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction
637 <li>Extended the documentation of lookahead logic for CHAR.</li>
638 <li>Accounted for FILE_COMMENT in overall file structure.</li>
640 <p><b>Version 5.1.0</b></p>
642 <li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
643 <li>Provided additional information on allowable characters in names.</li>
644 <li>Added SIDEBAR_LINE.</li>
645 <li>Noted that CROSS_REF must contain a SP and CHAR, and that
COMPAT_MAPPING must contain a SP and may contain a <tag>.</li>
647 <li>Noted that LCNAME may contain uppercase characters under
648 exceptional circumstances.</li>
649 <li>Relaxed the restriction on lines starting with #, :, %, x and = on
650 the TITLE_PAGE. These are now treated as comments.</li>
652 <p><b>Version 5.0.0</b></p>
654 <li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
655 <li>Fixed the list of lines that may appear before a BLOCKHEADER by
656 adding NOTICE_LINE.</li>
657 <li>Minor fixes to the wording of several syntax definitions.</li>
659 <p><b>Version 4.0.0</b></p>
661 <li>Fixed syntax to better reflect restrictions on characters
662 in character and block names.</li>
<li>Better documented the treatment of comments in block names, plus
664 French name rules.</li>
666 <p><b>Version 3.2.0</b></p>
668 <li>Fixed several broken links, added a left margin,
669 changed version numbering.</li>
671 <p><b>Version 3.1.0 (2)</b></p>
673 <li>Use of 4-6 digit hex notation is now supported.</li>
678 <table cellspacing="0" cellpadding="0" border="0">
680 <td><a href="https://www.unicode.org/copyright.html">
681 <img src="https://www.unicode.org/img/hb_notice.gif" border="0" alt="Access to Copyright and terms of use" width="216" height="50"></a></td>