+++ /dev/null
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
-
- "http://www.w3.org/TR/html4/loose.dtd">
-
-<html>
-
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
-<meta http-equiv="Content-Language" content="en-us">
-<title>UCD: Unicode NamesList File Format</title>
-<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
-</head>
-
-<body>
-
- <table class="header">
- <tr>
- <td class="icon" style="width:38px; height:35px">
- <a href="https://www.unicode.org/">
- <img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
- alt="[Unicode]" width="34" height="33"></a>
- </td>
-
- <td class="icon" style="vertical-align:middle">
- <a class="bar"> </a>
- <a class="bar" href="https://www.unicode.org/ucd/"><font size="3">Unicode Character Database</font></a>
- </td>
- </tr>
- <tr>
- <td colspan="2" class="gray"> </td>
- </tr>
- </table>
-<div class="body">
- <h1>UnicodeĀ® NamesList File Format</h1>
- <table class="simple" width="90%">
- <tbody>
- <tr>
- <td valign="top" width="144">Revision</td>
- <td valign="top">15.0.0</td>
- </tr>
- <tr>
- <td valign="top" width="144">Authors</td>
- <td valign="top">Asmus Freytag, Ken Whistler</td>
- </tr>
- <tr>
- <td valign="top" width="144">Date</td>
- <td valign="top">2022-08-08</td>
- </tr>
- <tr>
- <td valign="top" width="144">This Version</td>
- <td valign="top">
- <a href="http://www.unicode.org/Public/15.0.0/ucd/NamesList.html">
- http://www.unicode.org/Public/15.0.0/ucd/NamesList.html</a></td>
- </tr>
- <tr>
- <td valign="top" width="144">Previous Version</td>
- <td valign="top">
- <a href="http://www.unicode.org/Public/14.0.0/ucd/NamesList.html">
- http://www.unicode.org/Public/14.0.0/ucd/NamesList.html</a></td>
- </tr>
- <tr>
- <td valign="top" width="144">Latest Version</td>
- <td valign="top"><a href="http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html">http://www.unicode.org/Public/UCD/latest/ucd/NamesList.html</a></td>
- </tr>
- </tbody>
- </table>
- <p> </p>
- <h3><i>Summary</i></h3>
- <blockquote>
- <p>This file describes the format and contents of NamesList.txt</p>
- </blockquote>
- <h3><i>Status</i></h3>
- <blockquote>
- <p><i>The file and the files described herein are part of the <a href="http://www.unicode.org/ucd/">Unicode
- Character Database</a> (UCD). The Unicode <a href="http://www.unicode.org/terms_of_use.html">
- Terms of Use</a> apply.</i></p>
- </blockquote>
- <hr width="50%">
-
-<h2>1.0 <a name="Introduction" href="#Introduction">Introduction</a></h2>
-
-<p>The Unicode name list file NamesList.txt (also NamesList.lst) is a plain
-text file used to drive the layout of the character code charts in the Unicode
-Standard. The information in this file is a combination of several fields from
-the UnicodeData.txt and Blocks.txt files, together with additional annotations
-for many characters.</p>
-<p>This document describes the syntax rules for the file
-format, but also gives brief information on how each construct is rendered
-when laid out for the code charts. Some of the syntax elements are used only in
-preparation of the drafts of the code charts and are not present in the final,
-released form of the NamesList.txt file.</p>
-
-<p>Over time, the syntax has been extended by adding new features. The syntax for formal aliases and index tabs was introduced with Unicode
-5.0. The syntax for marginal sidebar comments is utilized extensively in
-draft versions of the NamesList.txt file. The support for UTF-8 encoded files and the syntax for the UTF-8 charset
-declaration in a comment at the head of the file were introduced after Unicode
-6.1.0 was published, as was the syntax for the specification of variation sequences and alternate glyphs and their respective summaries. The repertoire restriction
-in comments and aliases in the names list format was loosened from the prior
-limitation to U+0020..U+00FF, to include the wider range U+0020..U+02FF, as of Unicode 11.0.</p>
-
-<p>The same input file can be used for the preparation of drafts and final editions for ISO/IEC
- 10646. Earlier versions of that standard used a different style, referred to below as ISO-style. That style necessitated the presence of some
- information in the name list file that is not needed (and in fact removed
- during parsing) for the Unicode code charts.</p>
-
-<p>With access to the layout program (<a href="http://www.unicode.org/unibook/">Unibook</a>) it is a simple matter of
-creating name lists for the purpose of formatting working drafts or other documents containing
-proposed characters.</p>
- <p>The content of the NamesList.txt file is optimized for code chart creation.
- Some information that can be inferred by the reader from context has been
- suppressed to make the code charts more readable. See the chapter on Code
- Charts in the <a href="http://www.unicode.org/versions/latest">Unicode
- Standard</a>.</p>
-
-<h3>1.1 <a name="Overview" href="#Overview">NamesList File Overview</a></h3>
-
-<p>The NamesList files are plain text files which in their most simple form look
-like this:</p>
-
-<p>@@<tab>0020<tab>BASIC LATIN<tab>007F<br>
-; this is a file comment (ignored)<br>
-0020<tab>SPACE<br>
-0021<tab>EXCLAMATION MARK<br>
-0022<tab>QUOTATION MARK<br>
-. . . <br>
-007F<tab>DELETE</p>
-
-<p>The semicolon (as first character), @ and <tab> characters are used
-by the file syntax and must be provided as shown. Hexadecimal digits must be
-in UPPERCASE. A double @@ introduces a block header, with the title, and
-start and ending code of the block provided as shown.</p>
-
-<p>For a minimal name list, only the NAME_LINE and BLOCKHEADER and
-their constituent syntax elements are needed.</p>
-
-<p>The full syntax with all the options is provided in the following sections.</p>
-
-<h2>2.0 <a name="FileStructure" href="#FileStructure">NamesList File Structure</a></h2>
-
-<p>This section defines the overall file structure</p>
-
-<pre><strong>NAMELIST: TITLE_PAGE* EXTENDED_BLOCK*
-</strong>
-<strong>TITLE_PAGE: TITLE
- | TITLE_PAGE SUBTITLE
- | TITLE_PAGE SUBHEADER
- | TITLE_PAGE IGNORED_LINE
- | TITLE_PAGE EMPTY_LINE
- | TITLE_PAGE NOTICE_LINE
- | TITLE_PAGE COMMENT_LINE
- | TITLE_PAGE PAGEBREAK
- | TITLE_PAGE FILE_COMMENT
- | FILE_COMMENT
-
-
-EXTENDED_BLOCK: BLOCK
- | BLOCK SUMMARY
-
-
-BLOCK: BLOCKHEADER
- | BLOCKHEADER INDEX_TAB
- | BLOCK CHAR_ENTRY
- | BLOCK SUBHEADER
- | BLOCK NOTICE_LINE
- | BLOCK EMPTY_LINE
- | BLOCK IGNORED_LINE
- | BLOCK SIDEBAR_LINE
- | BLOCK PAGEBREAK
- | BLOCK FILE_COMMENT
- | BLOCK CROSS_REF
-
-
-CHAR_ENTRY: NAME_LINE | RESERVED_LINE
- | CHAR_ENTRY ALIAS_LINE
- | CHAR_ENTRY FORMALALIAS_LINE
- | CHAR_ENTRY COMMENT_LINE
- | CHAR_ENTRY CROSS_REF
- | CHAR_ENTRY DECOMPOSITION
- | CHAR_ENTRY COMPAT_MAPPING
- | CHAR_ENTRY IGNORED_LINE
- | CHAR_ENTRY EMPTY_LINE
- | CHAR_ENTRY NOTICE_LINE
- | CHAR_ENTRY FILE_COMMENT
- | CHAR_ENTRY VARIATION_LINE
-</strong></pre>
-
-<p>In other words:</p>
-<p>
- Neither TITLE nor SUBTITLE may occur after the first BLOCKHEADER. </p>
-<p>Only TITLE, SUBTITLE, SUBHEADER, PAGEBREAK, COMMENT_LINE, NOTICE_LINE,
- EMPTY_LINE, IGNORED_LINE and FILE_COMMENT may occur before the first BLOCKHEADER.</p>
-<ul>
- <li>CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, VARIATION_LINE, ALIAS and FORMALALIAS_LINE lines
- occurring before the first block header are treated as if they were
- COMMENT_LINEs.</li>
-</ul>
-<p>Directly following either a NAME_LINE or a RESERVED_LINE an uninterrupted
- sequence of the following lines may occur (in any order and repeated as often
- as needed): ALIAS_LINE, CROSS_REF, DECOMPOSITION, COMPAT_MAPPING, FORMALALIAS_LINE, NOTICE_LINE,
- EMPTY_LINE, IGNORED_LINE, VARIATION_LINE and FILE_COMMENT.</p>
-<ul>
- <li>The conventional order of elements in a char entry: NAME_LINE,
- FORMALALIAS_LINE, ALIAS, COMMENT_LINE or NOTICE_LINE, CROSS_REFs, VARIATION_LINE, and optionally
- ending in either DECOMPOSITION or COMPAT_MAPPING is not enforced by the layout program
- (<a href="http://www.unicode.org/unibook/">Unibook</a>). </li>
-</ul>
-<p>Except for CROSS_REF, NOTICE_LINE, SIDEBAR_LINE, EMPTY_LINE, IGNORED_LINE and
- FILE_COMMENT, none of these lines may
- occur in any other place.</p>
-<ul>
- <li>A NOTICE_LINE or CROSS_REF displays differently depending on whether it follows a header or title
- or is part of a CHAR_ENTRY</li>
- </ul>
-<p>A PAGEBREAK may appear anywhere, except the middle of a CHARACTER_ENTRY.
- A PAGEBREAK before the file title lines may not be supported. INDEX_TABs may
- appear after any block header.</p>
-<p>If the first line of a file is a file comment, it may contain a UTF-8
- charset declaration (see below). Alternatively, or in addition, a BOM may be
- present at the very beginning of the file, forcing the encoding to be
- interpreted as UTF-16 (little-endian only) or UTF-8. When
- declared as UTF-8, the names list format will support use of characters in
- the range U+0020..U+02FF in LINE and LABEL elements. Otherwise,
- the supported repertoire is limited to Latin-1, and attempted use of characters outside
- the Latin-1 range will result in data corruption.</p>
-<p>Several of these elements, while part of the formal definition of the
- file format, do not occur in final published versions of
- NamesList.txt in the UCD.</p>
-
-<h4>Blocks followed by Summaries</h4>
-<p>A block may be extended by a summary of standard variation sequences or selected alternate glyphs (or both) defined for characters in the block:</p>
-<pre><strong>
-SUMMARY: ALTGLYPH_SUMMARY
- | VARIATION SUMMARY
- | ALTGLYPH_SUMMARY VARIATION_SUMMARY
- | MIXED_SUMMARY
-
-ALTGLYPH_SUMMARY: ALTGLYPH_SUBHEADER
- | ALTGLYPH_SUMMARY SUMMARY_LINE
-
-VARIATION_SUMMARY: VARIATION_SUBHEADER
- | VARIATION_SUMMARY SUMMARY_LINE
-
-MIXED_SUMMARY: MIXED_SUBHEADER
- | MIXED_SUMMARY SUMMARY_LINE
-
-SUMMARY_LINE: SUBHEADER
- | NOTICE_LINE
- | FILE_COMMENT
- | EMPTY_LINE</strong>
-</pre>
-
-<p>When formatted for display, each summary will recap the information presented in the VARIATION_LINE elements
-of the preceding block, grouped by alternate glyph variants and standardized variation sequences, and
-preceded by the corresponding subheader. Additional SUBHEADER and NOTICE lines, if provided, immediately
-follow the ALTGLYPH_SUBHEADER, VARIATION_SUBHEADER or MIXED_SUBHEADER. There is no provision to provide subheaders that are
-interspersed between items in the summary.</p>
-
-<p>These syntax constructs are entirely optional. If the ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER are
-omitted from the names list, but the preceding block nevertheless contains VARIATION_LINE elements
-as described below, Unibook will automatically generate any required summaries using a default format for the headers.</p>
-
-<p>Thus, the main purpose for providing ALTGLYPH_SUBHEADER or VARIATION_SUBHEADER elements would be to
-provide specific contents for these summary titles as well as allow the ability to add additional
-information via SUBHEADER and NOTICE elements. The final published version of the Unicode names list
-is machine generated and will always explicitly provide any summary subheaders.</p>
-
-<h3>2.1 <a name="FileElements" href="#FileElements">NamesList File Elements</a></h3>
-
-<p>This section provides the details of the syntax for the individual elements.</p>
-
-<pre><strong>ELEMENT SYNTAX</strong> // How rendered
-
-<strong>NAME_LINE: CHAR TAB NAME LF</strong>
- // The CHAR and the corresponding image are echoed,
- // followed by the name as given in NAME
-
-<strong> | CHAR TAB "<" LCNAME ">" LF</strong>
- // Control and noncharacters use this form of
- // lowercase, bracketed pseudo character name
-
-<strong> | CHAR TAB NAME SP COMMENT LF</strong>
- // Names may have a comment, which is stripped off
- // unless the file is parsed for an ISO style list
-
-<strong> | CHAR TAB "<" LCNAME ">" SP COMMENT LF</strong>
- // Control and noncharacters may also have comments
-
-<strong>RESERVED_LINE: CHAR TAB "<reserved>" LF</strong>
- // The CHAR is echoed followed by an icon for the
- // reserved character and a fixed string e.g. "<reserved>"
-
-<strong>COMMENT_LINE: TAB "*" SP EXPAND_LINE</strong>
- // * is replaced by BULLET, output line as comment
-
-<strong> | TAB EXPAND_LINE</strong>
- // Output line as comment
-
-<strong>ALIAS_LINE: TAB "=" SP LINE</strong>
- // Replace = by itself, output line as alias
-
-<strong>FORMALALIAS_LINE:
- TAB "%" SP NAME LF</strong>
- // Replace % by U+203B, output line as formal alias
-
-<strong>CROSS_REF: TAB "x" SP CHAR SP LCNAME LF
- | TAB "x" SP CHAR SP "<" LCNAME ">" LF</strong>
- // x is replaced by a right arrow
-
-<strong> | TAB "x" SP "(" LCNAME SP "-" SP CHAR ")" LF
- | TAB "x" SP "(" "<" LCNAME ">" SP "-" SP CHAR ")" LF</strong>
- // x is replaced by a right arrow;
- // (second type as used for control and noncharacters)
-
- // In the forms with parentheses the "(","-" and ")" are removed
- // and the order of CHAR and LCNAME is reversed;
- // i.e. all inputs result in the same order of output
-
-<strong> | TAB "x" SP CHAR LF</strong>
- // x is replaced by a right arrow
- // (this type is the only one without LCNAME
- // and is used for ideographs)
-
-<strong>VARIATION_LINE: TAB "~" SP CHAR VARSEL SP LABEL LF
- | TAB "~" SP CHAR VARSEL SP LABEL "(" LCTAG ")"LF</strong>
- // output standardized variation sequence or simply the char code in case of alternate
- // glyphs, followed by the alternate glyph or variation glyph and the label and context
-
-<strong>FILE_COMMENT: ";" LINE</strong>
-
-<strong>EMPTY_LINE: LF</strong>
- // Empty and ignored lines as well as
- // file comments are ignored
-
-<strong>IGNORED_LINE: TAB ";" LINE</strong>
- // Ignore LINE
-
-<strong>SIDEBAR_LINE: ";;" LINE</strong>
- // Output LINE as marginal note
-
-<strong>DECOMPOSITION: TAB ":" SP EXPAND_LINE
- | TAB ":" SP "<" TAG ">" SP EXPAND_LINE</strong>
- // Replace ':' by EQUIV, expand line into decomposition
- // The <tag> gives optional information,
- // e.g., about composition exclusion.
- // by convention the tag has initial lowercase
-
-<strong>COMPAT_MAPPING: TAB "#" SP EXPAND_LINE
- | TAB "#" SP "<" TAG ">" SP EXPAND_LINE</strong>
- // Replace '#' by APPROX, output line as mapping
- // The <tag> is the optional compatibility decomposition tag.
- // by convention the tag has initial lowercase
-
-<strong>NOTICE_LINE: "@+" TAB LINE</strong>
- // Output LINE as notice
-
-<strong> | "@+" TAB * SP LINE</strong>
- // Output LINE as notice
- // "*" expands to a bullet character
- // Notices following a character code apply to the
- // character and are indented. Notices not following
- // a character code apply to the page/block/column
- // and are italicized, but not indented
-
-<strong>TITLE: "@@@" TAB LINE</strong>
- // Output LINE as text
- // Title is used in page headers
-
-<strong>SUBTITLE: "@@@+" TAB LINE</strong>
- // Output LINE as subtitle
-
-<strong>SUBHEADER: "@" TAB LINE</strong>
- // Output LINE as column header
-
-<strong>VARIATION_SUBHEADER:</strong> <strong>"@~" TAB LINE</strong>
- // Output LINE as column header (summary subheader)
- <strong>| "@~"</strong>
- // Output a default standard variation sequences summary subheader
- <strong>| "@~" TAB "!"</strong>
- // Suppress output of a default standard variant sequences summary subheader
- // and disable display of summary
- <strong>| "@~" TAB "!" VARSEL_LIST</strong>
- <strong>| "@~" TAB "!" VARSEL_LIST LINE</strong>
- // Output a standard summary subheader, using default or LINE respectively
- // Suppress any std variation sequences using selectors from the list
-
-<strong>ALTGLYPH_SUBHEADER:</strong> <strong>"@@~" TAB LINE</strong>
- // Output LINE as column header (summary subheader)
- <strong>| "@@~"</strong>
- // Output a default alternate glyph summary subheader
- <strong>| "@@~" TAB "!"</strong>
- // Suppress output of a default alternate glyph summary subheader
- // and disable display of summary
-
-<strong>MIXED_SUBHEADER: </strong><strong>"@@@~" TAB LINE</strong>
- // Output LINE as column header (summary subheader)
- <strong>| "@@@~"</strong>
- // Output a default combined variation and alternate glyph summary subheader
- <strong>| "@@@~" TAB "!"</strong>
- // Suppress output of a default alternate glyph summary subheader
- // and disable display of summary
- <strong>| "@@@~" TAB "!" VARSEL_LIST</strong>
- <strong>| "@@@~" TAB "!" VARSEL_LIST LINE</strong>
- // Output a combined summary subheader, using default or LINE respectively
- // Suppress any std variation sequences using selectors from the list
-
-<strong>BLOCKHEADER: "@@" TAB BLOCKSTART TAB BLOCKNAME TAB BLOCKEND LF</strong>
- // Cause a page break and optional
- // blank page, then output one or more charts
- // followed by the list of character names.
- // Use BLOCKSTART and BLOCKEND to define
- // what characters belong to a block.
- // Use BLOCKNAME in page and table headers
-
-<strong>BLOCKNAME: LABEL
- | LABEL SP "(" LABEL ")"</strong>
- // If an alternate label is present it replaces
- // the BLOCKNAME when an ISO-style names list is
- // laid out; it is ignored in the Unicode charts
-
-<strong>BLOCKSTART: CHAR</strong> // First character position in block
-<strong>BLOCKEND: CHAR</strong> // Last character position in block
-<strong>PAGEBREAK: "@@"</strong> // Insert a (column) break
-<strong>INDEX_TAB: "@@+"</strong> // Start a new index tab at latest BLOCKSTART
-
-<strong>EXPAND_LINE: {ESC_CHAR | CHAR | STRING | ESC +}+ LF</strong>
- // Instances of CHAR (see Notes) are replaced by
- // CHAR NBSP x NBSP where x is the single Unicode
- // character corresponding to CHAR.
- // If character is combining, it is replaced with
- // CHAR NBSP <circ> x NBSP where <circ> is the
- // dotted circle
-</pre>
-
-
- <b>Notes:</b><ul>
- <li>Blocks must be aligned on 16-code point boundary and contain an integer
- multiple of 16-code point columns. The exception to that rule is for blocks of
- ideographs, <i>etc.</i>, for which no names are listed in the file. The BLOCKEND for such blocks
- must correspond to the last assigned character, and not the actual end of the block.</li>
- <li>Blocks must be non-overlapping and in ascending order. NAME_LINEs
- must be in ascending order and follow the block header for the block to
- which they belong. </li>
- <li>Reserved entries are optional, and will normally be supplied automatically. They are
- required whenever followed by ALIAS_LINE, COMMENT_LINE, NOTICE_LINE or CROSS_REF.
- </li>
-<li>An empty alternative glyph summary subheader expression will result in default header "Selected Alternative Glyphs"</li>
-<li>An empty standard variation subheader expression will result in the default header "Standardized Variation Sequences"</li>
-<li> A VARSEL_LIST may only contain code points for standard variation selectors (including script specific ones)</li>
-<li>When displaying a VARIATION_LINE for alternate glyphs, the "ALTn" selector is not displayed. </li>
-<li>If a glyph is unavailable for the variant glyph in a VARIATION_LINE it is replaced by the glyph for U+2591 LIGHT SHADE.</li>
- </ul>
-
-
-<h3>2.2 <a name="FilePrimitives" href="#FilePrimitives">NamesList File Primitives</a></h3>
-
-<p>The following are the primitives and terminals for the NamesList syntax.</p>
-
-<pre><strong>LINE</strong>: <strong>STRING LF
-COMMENT: "(" LABEL ")"
- | "(" LABEL ")" SP "*"
- | "*"</strong>
-
-<strong>NAME</strong>: <sequence of uppercase ASCII letters, digits, space and hyphen>
-<strong>LCNAME</strong>: <sequence of lowercase ASCII letters, digits, space and hyphen>
- <strong>| LCNAME "-" CHAR</strong>
-
-<strong>TAG</strong>: <sequence of ASCII letters>
-<strong>LCTAG</strong>: <sequence of lowercase ASCII letters>
-<strong>STRING</strong>: <sequence of characters in the range U+0020..U+02FF, except controls>
-<strong>LABEL</strong>: <sequence of characters in the range U+0020..U+02FF, except controls, "(" or ")">
-<strong>VARSEL</strong>: <strong>CHAR
- | ALT ( "1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )</strong>
-<strong>VARSEL_LIST</strong>: <strong>"{" CHAR_LIST "}"</strong>
-<strong>CHAR_LIST</strong>: <strong>CHAR
- | CHAR_LIST SP CHAR</strong>
-<strong>CHAR</strong>: <strong>X X X X</strong>
- <strong>| X X X X X </strong>
- <strong>| X X X X X X </strong>
-<strong>X</strong>: <strong>"0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9"|"A"|"B"|"C"|"D"|"E"|"F"</strong>
-<strong>ESC_CHAR</strong>: <strong>ESC CHAR</strong>
-<strong>ESC</strong>: <strong>"\"</strong>
- // Special semantics of backslash (\) are supported
- // only in EXPAND_LINE.
-<strong>TAB</strong>: <sequence of one or more ASCII tab characters 0x09>
-<strong>SP</strong>: <ASCII 20>
-<strong>LF</strong>: <any sequence of ASCII 0A and 0D>
-</pre>
-
-<p><b>Notes:</b></p>
-<ul>
- <li>Multiple or leading spaces, multiple or leading hyphens, as well as
- word-initial digits in NAMEs or LCNAMEs are illegal.</li>
- <li>The French version of the names list uses French rules, which allow
- apostrophe and accented letters in character names.</li>
- <li>When names containing code points are lowercased to make them LCNAMEs,
- the code point values remain uppercase. Such code points by convention
- follow a hyphen and are the last element in the name.</li>
- <li>Special lookahead logic prevents a 4 digit number for a standard, such
- as ISO 9999 from being misinterpreted as ISO CHAR. Currently recognized are
- "ISO", "DIN", "IEC" and "S X" as well as "S C" for the JIS X and JIS C series of
- standards. For other standards, or for four-digit years in a comment, use a
- NOTICE_LINE instead, which prevents expansion, or use '\" to escape the digits.</li>
- <li>Single and double straight quotes in an EXPAND_LINE are replaced by curly quotes using English rules.
- Smart apostrophes are supported, but nested quotes are not.
- Single quotes can only be applied around a single word.</li>
- <li>A CHAR inside ' or " is expanded, but only its glyph image is printed, the
- code value is not echoed.</li>
- <li>Inside an EXPAND_LINE, backslash is treated as an escape character that
- removes the special meaning of any literal character and also prevents
- the following digit sequence from being expanded. A backslash character in
- isolation is never displayed. A sequence of two backslash characters results
- in display of a single backslash, but has no effect on the interpretation
- of following characters.</li>
- <li>The hyphen in a character range CHAR-CHAR is replaced by an EN DASH on
- output.</li>
- <li>The NamesList.txt file is encoded in UTF-8 if the <i>first line</i> is a
- FILE_COMMENT containing the declaration "UTF-8" or any casemap variation
- thereof. Otherwise the file is encoded in Latin-1 (older versions). Beyond
- detecting the charset declaration (typically: "; charset=utf-8") the
- remainder of that comment is ignored.
- If the file is not encoded as
- UTF-8, the character repertoire for running text (anything
- other than CHAR) is effectively restricted to the repertoire of Latin-1.
- Otherwise, characters in the range U+0020..U+02FF
- are allowed in STRING or LABEL elements, and elements derived from them.</li>
- <li>The code chart layout program
- (<a href="http://www.unicode.org/unibook/">Unibook</a>)
- can accept files in several other formats. These include little-endian UTF-16,
- prefixed with a BOM, or UTF-8 prefixed with the UTF-8 BOM.</li>
- <li>While the format allows multiple <tab> characters, by convention the
- actual number of tabs is always one or two, chosen to provide the best
- layout of the plain text file.</li>
- <li>Earlier published versions of the NamesList.txt file may contain trailing or otherwise extraneous
- spaces or tab characters; while these are errors in the files, they are not
- being corrected, to retain stability of the published versions. Anyone
- writing a parser for older versions of this file may need to be prepared to
- handle such exceptions.</li>
- <li>The final LF in the file must be present.</li>
-</ul>
- <h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
-
- <p><b>Version 15.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 15.0.0.</li>
- </ul>
- <p><b>Version 14.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 14.0.0.</li>
- <li>Corrected character name LIGHT SCREEN to LIGHT SHADE.</li>
- </ul>
- <p><b>Version 13.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 13.0.0.</li>
- <li>Added a second expansion for DECOMPOSITION, for possible future
- use to designate specific subtypes of canonical decompositions
- in the names list output.</li>
- </ul>
- <p><b>Version 12.1.0</b></p>
- <ul>
- <li>Reissued for Unicode 12.1.0.</li>
- </ul>
- <p><b>Version 12.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 12.0.0.</li>
- <li>Added definition of TAG (allowing uppercase letters), distinct from LCTAG.</li>
- <li>Corrected definition of VARIATION_LINE to use LCTAG instead of LCNAME.</li>
- <li>Corrected definition of COMPAT_MAPPING to use TAG instead of LCTAG.</li>
- <li>Corrected the documentation regarding which elements allow use of characters
- in the range U+0020..U+02FF.</li>
- </ul>
- <p><b>Version 11.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 11.0.0.</li>
- <li>Loosened the limitation on repertoire allowed in LINE and LABEL
- elements to include characters outside Latin-1, in the range
- U+0100..U+02FF.</li>
- </ul>
- <p><b>Version 10.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 10.0.0.</li>
- </ul>
- <p><b>Version 9.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 9.0.0.</li>
- </ul>
- <p><b>Version 8.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 8.0.0.</li>
- <li>Added MIXED_SUBHEADER, VARSEL_LIST, and CHAR_LIST to the syntax.</li>
- <li>Tweaked BNF and notes for variation summaries.</li>
- </ul>
- <p><b>Version 7.0.0</b></p>
- <ul>
- <li>Reissued for Unicode 7.0.0.</li>
- </ul>
- <p><b>Version 6.3.0</b></p>
- <ul>
- <li>Reissued for Unicode 6.3.0.</li>
- </ul>
- <p><b>Version 6.2.0</b></p>
- <ul>
- <li>Edited the variation syntax definitions, description and corresponding notes for wording.</li>
- <li>Minor tweaks to the layout of BNF syntax, mostly adding tabs and | characters as needed.</li>
- <li>Fixed some typographical errors and minor inconsistencies.</li>
- <li>Added syntax for elements required by variation sequence and alternate glyph summaries.</li>
- <li>Edited and reformatted some notes for readability.</li>
- <li>Documented the permitted presence of CROSS_REF outside character entries within blocks.
- Such CROSS_REFs have been present in published names lists, but that information was missing in
- the syntax description. For an example see the Currency Symbols block in the code charts.</li>
- <li>Added description of UTF-8 charset declaration and file encoding.</li>
- </ul>
- <p><b>Version 6.1.0</b></p>
- <ul>
- <li>Removed constraint that LCTAG consist only of lowercase letters,
- because of the existence of the "noBreak" tag.</li>
- </ul>
- <p><b>Version 6.0.0</b></p>
- <ul>
- <li>Added definitions for ESC_CHAR and ESC primitives.</li>
- <li>Clarified interpretation of backslash escapes in EXPAND_LINE.</li>
- </ul>
- <p><b>Version 5.2.0</b></p>
- <ul>
- <li>Better aligned the rules section with the actual published files and
- behavior of existing parsers. This included fixing some obvious typos
- and clarifying some notes as well as the following changes, which are
- listed individually.</li>
- <li>Replaced instances of <tab> by TAB throughout.</li>
- <li>NAME_LINE for special names may have trailing COMMENTs including COMMENTs
- consisting entirely of "*".</li>
- <li>In CROSS_REF added the form without LCNAME, fixed the literal to the
- correct lowercase "x" and noted that LCNAME may have "<" and ">" around
- it in the data. Also added missing LF in the rules.</li>
- <li>Removed a redundant rule for BLOCKHEADER.</li>
- <li>Changed FORMALALIAS_LINE from LINE to NAME to match actual restriction
- on contents.</li>
- <li>Extended the documentation of lookahead logic for CHAR.</li>
- <li>Accounted for FILE_COMMENT in overall file structure.</li>
- </ul>
- <p><b>Version 5.1.0</b></p>
- <ul>
- <li>Noted that comments in NAME_LINEs must be preceded by SP.</li>
- <li>Provided additional information on allowable characters in names.</li>
- <li>Added SIDEBAR_LINE.</li>
- <li>Noted that CROSS_REF must contain a SP and CHAR, and that
- COMPAT_MAPPING must contain a SP and may contain a <tag></li>
- <li>Noted that LCNAME may contain uppercase characters under
- exceptional circumstances.</li>
- <li>Relaxed the restriction on lines starting with #, :, %, x and = on
- the TITLE_PAGE. These are now treated as comments.</li>
- </ul>
- <p><b>Version 5.0.0</b></p>
- <ul>
- <li>Added FORMALALIAS_LINE and INDEX_TAB to syntax.</li>
- <li>Fixed the list of lines that may appear before a BLOCKHEADER by
- adding NOTICE_LINE.</li>
- <li>Minor fixes to the wording of several syntax definitions.</li>
- </ul>
- <p><b>Version 4.0.0</b></p>
- <ul>
- <li>Fixed syntax to better reflect restrictions on characters
- in character and block names.</li>
- <li>Better document treatment of comments in block names, plus
- French name rules.</li>
- </ul>
- <p><b>Version 3.2.0</b></p>
- <ul>
- <li>Fixed several broken links, added a left margin,
- changed version numbering.</li>
- </ul>
- <p><b>Version 3.1.0 (2)</b></p>
- <ul>
- <li>Use of 4-6 digit hex notation is now supported.</li>
- </ul>
- <hr width="50%">
- <div align="center">
- <center>
- <table cellspacing="0" cellpadding="0" border="0">
- <tr>
- <td><a href="https://www.unicode.org/copyright.html">
- <img src="https://www.unicode.org/img/hb_notice.gif" border="0" alt="Access to Copyright and terms of use" width="216" height="50"></a></td>
- </tr>
- </table>
- </center>
- </div>
-</div>
-
-</body>
-
-</html>
-