1 ! Copyright (C) 2009 Chris Double.
2 ! See https://factorcode.org/license.txt for BSD license.
3 USING: help.syntax help.markup peg peg.search words
8 { $syntax "EBNF[[ ...ebnf... ]]" }
9 { $values { "...ebnf..." "EBNF DSL text" } }
11 "Creates and calls a quotation that parses a string using the syntax "
12 "defined with the EBNF DSL. The quotation has stack effect "
13 { $snippet "( string -- ast )" } " where 'string' is the text to be parsed "
14 "and 'ast' is the resulting abstract syntax tree. If the parsing fails the "
15 "quotation throws an exception."
19 "USING: multiline prettyprint peg.ebnf ;"
20 "\"ab\" EBNF[[ rule=\"a\" \"b\" ]] ."
26 { $syntax "EBNF-PARSER: word \"...ebnf...\"" }
28 "Defines a word that when called will return a parser for the "
29 "syntax defined with the EBNF DSL. The parser can be used with "
30 "the " { $vocab-link "peg.search" } " vocab."
34 { $syntax "EBNF: word [=[ ...ebnf... ]=]" }
35 { $values { "word" word } { "...ebnf..." "EBNF DSL text" } }
37 "Defines a word that when called will parse a string using the syntax "
38 "defined with the EBNF DSL. The word has stack effect "
39 { $snippet "( string -- ast )" } " where 'string' is the text to be parsed "
40 "and 'ast' is the resulting abstract syntax tree. If the parsing fails the "
41 "word throws an exception."
45 "USING: prettyprint multiline peg.ebnf ;"
47 "EBNF: foo [=[ rule=\"a\" \"b\" ]=]"
53 ARTICLE: "peg.ebnf.strings" "EBNF Rule: Strings"
54 "A string in a rule will match that sequence of characters from the input string. "
55 "The string is delimited by matching single or double quotes. "
56 "Factor's escape sequences are interpreted: " { $link "escape" } ". "
57 "For double quotes delimiters, an escaped double quote doesn't terminate the string. "
58 "The AST result from the match is the string itself."
61 "USING: prettyprint peg.ebnf ;"
62 "\"helloworld\" EBNF[[ rule=\"hello\" \"world\" ]] ."
63 "V{ \"hello\" \"world\" }"
66 "USING: prettyprint peg.ebnf ;"
67 "\"AΣ𝄞\" EBNF[[ rule='\\x41' '\\u{greek-capital-letter-sigma}' '\\u01D11E' ]] ."
68 "V{ \"A\" \"Σ\" \"𝄞\" }"
71 "USING: io peg.ebnf ;"
72 "\"A double quote: \\\"\" EBNF[[ rule='A double quote: \"' ]] print"
76 "USING: io peg.ebnf ;"
77 "\"' and \\\"\" EBNF[[ rule=\"' and \\\"\" ]] print"
82 ARTICLE: "peg.ebnf.any" "EBNF Rule: Any"
83 "A full stop character (.) will match any single token in the input string. "
84 "The AST resulting from this is the token itself."
87 "USING: prettyprint peg.ebnf ;"
88 "\"abc\" EBNF[[ rule=\"a\" . \"c\" ]] ."
93 ARTICLE: "peg.ebnf.sequence" "EBNF Rule: Sequence"
94 "Any white space separated rule element is considered a sequence. Each rule "
95 "in the sequence is matched from the input stream, consuming the input as it "
96 "goes. The AST result is a vector containing the results of each rule element in "
100 "USING: prettyprint peg.ebnf ;"
101 "\"abbba\" EBNF[[ rule=\"a\" (\"b\")* \"a\" ]] ."
102 "V{ \"a\" V{ \"b\" \"b\" \"b\" } \"a\" }"
107 ARTICLE: "peg.ebnf.grouping" "EBNF Rule: Group"
108 "Any sequence of rules may be grouped using parentheses (" { $snippet "()" } "). "
109 "The parenthesized sequence can then be modified as a group. Parentheses also "
110 "delimit sets of choices separated by pipe (|) characters."
112 "A group can also be delimited with curly braces (" { $snippet "{}" } "), in "
113 "which case an implicit optional whitespace-matching rule will be inserted between "
114 "rules sequenced within the braces."
117 "USING: prettyprint peg.ebnf ;"
118 "\"abcca\" EBNF[[ rule=\"a\" (\"b\" | \"c\")* \"a\" ]] ."
119 "V{ \"a\" V{ \"b\" \"c\" \"c\" } \"a\" }"
122 "USING: prettyprint peg.ebnf ;"
123 "\"ab c\nd \" EBNF[[ rule={\"a\" \"b\" \"c\" \"d\"} ]] ."
124 "V{ \"a\" \"b\" \"c\" \"d\" }"
129 ARTICLE: "peg.ebnf.choice" "EBNF Rule: Choice"
130 "Any rule element separated by a pipe character (|) is considered a " { $strong "choice" } ". Choices "
131 "are matched against the input stream in order. If a match succeeds then the remaining "
132 "choices are discarded and the result of the match is the AST result of the choice."
135 "USING: prettyprint peg.ebnf ;"
136 "\"a\" EBNF[[ rule=\"a\" | \"b\" | \"c\" ]] ."
140 "USING: prettyprint peg.ebnf ;"
141 "\"b\" EBNF[[ rule=\"a\" | \"b\" | \"c\" ]] ."
145 "USING: prettyprint peg.ebnf ;"
146 "\"d\" EBNF[[ rule=\"a\" | \"b\" | \"c\" ]] ."
147 "Peg parsing error at character position 0.\nExpected 'a' or 'b' or 'c'\nGot 'd'"
150 { $notes "Due to parser caching, rules can't re-use parsers that have already failed earlier in the choice." }
153 ARTICLE: "peg.ebnf.ignore" "EBNF Rule: Ignore"
154 "Any rule element followed by a tilde (~) will be matched, and its results "
155 "discarded from the AST."
158 "USING: prettyprint peg.ebnf ;"
159 "\"abc\" EBNF[[ rule=\"a\" \"b\"~ \"c\" ]] ."
165 ARTICLE: "peg.ebnf.option" "EBNF Rule: Option"
166 "Any rule element followed by a question mark (?) is considered optional. The "
167 "rule is tested against the input. If it succeeds the result is stored in the AST. "
168 "If it fails then the parse still succeeds and false (f) is stored in the AST."
171 "USING: prettyprint peg.ebnf ;"
172 "\"abc\" EBNF[[ rule=\"a\" \"b\"? \"c\" ]] ."
173 "V{ \"a\" \"b\" \"c\" }"
176 "USING: prettyprint peg.ebnf ;"
177 "\"ac\" EBNF[[ rule=\"a\" \"b\"? \"c\" ]] ."
183 ARTICLE: "peg.ebnf.character-class" "EBNF Rule: Character Class"
184 "Character class matching can be done using a range of characters defined in "
185 "square brackets. Multiple ranges can be included in a single character class "
186 "definition. The syntax for the range is a start character, followed by a minus "
187 "(-) followed by an end character. For example " { $snippet "[a-zA-Z]" } ". "
188 "To include the minus (-) character in the class, make it the first or the last one: " { $snippet "[-0-9]" } " or " { $snippet "[a-z-]" } ". "
189 "The AST resulting from the match is an integer of the character code for the "
190 "character that matched."
193 "USING: prettyprint peg.ebnf ;"
194 "\"123\" EBNF[[ rule=[0-9]+ ]] ."
200 ARTICLE: "peg.ebnf.one-or-more" "EBNF Rule: One or more"
201 "Any rule element followed by a plus (+) matches one or more instances of the rule "
202 "from the input string. The AST result is the vector of the AST results from "
206 "USING: prettyprint peg.ebnf ;"
207 "\"aab\" EBNF[[ rule=\"a\"+ \"b\" ]] ."
208 "V{ V{ \"a\" \"a\" } \"b\" }"
213 ARTICLE: "peg.ebnf.zero-or-more" "EBNF Rule: Zero or more"
214 "Any rule element followed by an asterisk (*) matches zero or more instances of the rule "
215 "from the input string. The AST result is the vector of the AST results from "
216 "the matched rule. This will be empty if there are no matches."
219 "USING: prettyprint peg.ebnf ;"
220 "\"aab\" EBNF[[ rule=\"a\"* \"b\" ]] ."
221 "V{ V{ \"a\" \"a\" } \"b\" }"
224 "USING: prettyprint peg.ebnf ;"
225 "\"b\" EBNF[[ rule=\"a\"* \"b\" ]] ."
231 ARTICLE: "peg.ebnf.and" "EBNF Rule: And"
232 "Any rule element prefixed by an ampersand (&) performs the Parsing Expression "
233 "Grammar 'And Predicate' match. It attempts to match the rule against the input "
234 "string. It will cause the parse to succeed or fail depending on if the rule "
235 "succeeds or fails. It will not consume anything from the input string however and "
236 "does not leave any result in the AST. This can be used for lookahead and "
237 "disambiguation in choices."
240 "USING: prettyprint peg.ebnf ;"
241 "\"ab\" EBNF[[ rule=&(\"a\") \"a\" \"b\" ]] ."
247 ARTICLE: "peg.ebnf.not" "EBNF Rule: Not"
248 "Any rule element prefixed by an exclamation mark (!) performs the Parsing Expression "
249 "Grammar 'Not Predicate' match. It attempts to match the rule against the input "
250 "string. It will cause the parse to succeed if the rule match fails, and to fail "
251 "if the rule match succeeds. It will not consume anything from the input string "
252 "however and does not leave any result in the AST. This can be used for lookahead and "
253 "disambiguation in choices."
256 "USING: prettyprint peg.ebnf ;"
257 "\"<abcd>\" EBNF[[ rule=\"<\" (!(\">\") .)* \">\" ]] ."
258 "V{ \"<\" V{ 97 98 99 100 } \">\" }"
263 ARTICLE: "peg.ebnf.action" "EBNF Action"
264 "An action is a quotation that is run after a rule matches. The quotation "
265 "consumes the AST of the rule match and leaves a new AST as the result. "
266 "The stack effect of the action can be " { $snippet "( ast -- ast )" } " or "
267 { $snippet "( -- ast )" } ". "
268 "If it is the latter then the original AST is implicitly dropped and will be "
269 "replaced by the AST left on the stack. This is mostly useful if variables are "
270 "used in the rule since they can be referenced like locals in the action quotation. "
271 "The action is defined by having a ' => ' at the end of a rule and "
272 "using '[[' and ']]' to open and close the quotation. "
273 "If an action leaves the object 'ignore' on the stack then the result of that "
274 "action will not be put in the AST of the result."
277 "USING: prettyprint peg.ebnf strings ;"
278 "\"<abcd>\" EBNF[=[ rule=\"<\" ((!(\">\") .)* => [[ >string ]]) \">\" ]=] ."
279 "V{ \"<\" \"abcd\" \">\" }"
282 "USING: prettyprint peg.ebnf math.parser ;"
283 "\"123\" EBNF[=[ rule=[0-9]+ => [[ string>number ]] ]=] ."
289 ARTICLE: "peg.ebnf.semantic-action" "EBNF Semantic Action"
290 "Semantic actions allow providing a quotation that gets run on the AST of a "
291 "matched rule that returns success or failure. The result of the parse is decided by "
292 "the result of the semantic action. The stack effect for the quotation is "
293 { $snippet "( ast -- ? )" } ". "
294 "A semantic action follows the rule it applies to and is delimited by '?[' and ']?'."
297 "USING: prettyprint peg.ebnf math math.parser ;"
298 "\"1\" EBNF[[ rule=[0-9] ?[ digit> odd? ]? ]] ."
302 "USING: prettyprint peg.ebnf math math.parser ;"
303 "\"2\" EBNF[[ rule=[0-9] ?[ digit> odd? ]? ]] ."
304 "Peg parsing error at character position 0.\nExpected \nGot '2'"
309 ARTICLE: "peg.ebnf.variable" "EBNF Variable"
310 "Variables names can be suffixed to a rule element using the colon character (:) "
311 "followed by the variable name. These can then be used in rule actions to refer to "
312 "the AST result of the rule element with that variable name."
315 "USING: prettyprint peg.ebnf math.parser ;"
316 "\"1+2\" EBNF[=[ rule=[0-9]:a \"+\" [0-9]:b => [[ a digit> b digit> + ]] ]=] ."
322 ARTICLE: "peg.ebnf.foreign-rules" "EBNF Foreign Rules"
323 "Rules can call out to other " { $vocab-link "peg.ebnf" } " defined parsers. The result of "
324 "the foreign call then becomes the AST of the successful parse. Foreign rules "
325 "are invoked using '<foreign word-name>' or '<foreign word-name rule>'. The "
326 "latter allows calling a specific rule in a previously designed peg.ebnf parser. "
327 "If the 'word-name' is not the name of a peg.ebnf defined parser then it must be "
328 "a word with stack effect " { $snippet "( -- parser )" } ". It must return a "
329 { $vocab-link "peg" } " defined parser and it will be called to perform the parse "
333 "USING: prettyprint peg.ebnf ;"
334 "EBNF: parse-string [=["
335 "StringBody = (!('\"') .)*"
336 "String= '\"' StringBody:b '\"' => [[ b >string ]]"
338 "EBNF: parse-two-strings [=["
339 "TwoStrings = <foreign parse-string String> <foreign parse-string String>"
341 "EBNF: parse-two-strings [=["
342 "TwoString = <foreign parse-string> <foreign parse-string>"
346 ": a-token ( -- parser ) \"a\" token ;"
347 "EBNF: parse-abc [=["
348 "abc = <foreign a-token> 'b' 'c'"
354 ARTICLE: "peg.ebnf.tokenizers" "EBNF Tokenizers"
355 "It is possible to override the tokenizer in an EBNF defined parser. "
356 "Usually the input sequence to be parsed is an array of characters or a string. "
357 "Terminals in a rule match successive characters in the array or string."
362 "rule = \"++\" \"--\""
366 "This parser when run with the string \"++--\" or the array "
367 "{ CHAR: + CHAR: + CHAR: - CHAR: - } will succeed with an AST of { \"++\" \"--\" }. "
368 "If you want to add whitespace handling to the grammar you need to put it "
369 "between the terminals:"
374 "space = (\" \" | \"\\r\" | \"\\t\" | \"\\n\")"
375 "spaces = space* => [[ drop ignore ]]"
376 "rule = spaces \"++\" spaces \"--\" spaces"
380 "In a large grammar this gets tedious and makes the grammar hard to read. "
381 "Instead you can write a rule to split the input sequence into tokens, and "
382 "have the grammar operate on these tokens. This is how the previous example "
388 "space = (\" \" | \"\\r\" | \"\\t\" | \"\\n\")"
389 "spaces = space* => [[ drop ignore ]]"
390 "tokenizer = spaces ( \"++\" | \"--\" )"
391 "rule = \"++\" \"--\""
395 "'tokenizer' is the name of a built in rule. Once defined it is called to "
396 "retrieve the next complete token from the input sequence. So the first part "
397 "of 'rule' is to try and match \"++\". It calls the tokenizer to get the next "
398 "complete token. This ignores spaces until it finds a \"++\" or \"--\". "
399 "It is as if the input sequence for the parser was actually { \"++\" \"--\" } "
400 "instead of the string \"++--\". With the new tokenizer \"....\" sequences "
401 "in the grammar are matched for equality against the token, rather than a "
402 "string comparison against successive items in the sequence. This can be used "
403 "to match an AST from a tokenizer."
405 "In this example I split the tokenizer into a separate parser and use "
406 "'foreign' to call it from the main one. This allows testing of the "
407 "tokenizer separately:"
410 "USING: prettyprint peg peg.ebnf kernel math.parser strings"
411 "accessors math arrays multiline ;"
414 "TUPLE: ast-number value ;"
415 "TUPLE: ast-string value ;"
417 "EBNF: foo-tokenizer [=["
418 "space = (\" \" | \"\\r\" | \"\\t\" | \"\\n\")"
419 "spaces = space* => [[ drop ignore ]]"
421 "number = [0-9]+ => [[ >string string>number ast-number boa ]]"
422 "operator = (\"+\" | \"-\")"
424 "token = spaces ( number | operator )"
429 "tokenizer = <foreign foo-tokenizer token>"
431 "number = . ?[ ast-number? ]? => [[ value>> ]]"
432 "string = . ?[ ast-string? ]? => [[ value>> ]]"
434 "rule = string:a number:b \"+\" number:c => [[ a b c + 2array ]]"
437 "\"123 456 +\" foo-tokenizer ."
438 "V{\n T{ ast-number { value 123 } }\n T{ ast-number { value 456 } }\n \"+\"\n}"
441 "The '.' EBNF production means match a single object in the source sequence. "
442 "Usually this is a character. With the replacement tokenizer it is either a "
443 "number object, a string object or a string containing the operator. "
444 "Using a tokenizer in language grammars makes it easier to deal with whitespace. "
445 "Defining tokenizers in this way has the advantage of the tokenizer and parser "
446 "working in one pass. There is no tokenization occurring over the whole string "
447 "followed by the parse of that result. It tokenizes as it needs to. You can even "
448 "switch tokenizers multiple times during a grammar. Rules use the tokenizer that "
449 "was defined lexically before the rule. This is useful in the JavaScript grammar:"
453 "EBNF: javascript [=["
454 "tokenizer = default"
455 "nl = \"\\r\" \"\\n\" | \"\\n\""
456 "tokenizer = <foreign tokenize-javascript Tok>"
459 "Name = . ?[ ast-name? ]? => [[ value>> ]] "
460 "Number = . ?[ ast-number? ]? => [[ value>> ]]"
461 "String = . ?[ ast-string? ]? => [[ value>> ]]"
462 "RegExp = . ?[ ast-regexp? ]? => [[ value>> ]]"
463 "SpacesNoNl = (!(nl) Space)* => [[ ignore ]]"
464 "Sc = SpacesNoNl (nl | &(\"}\") | End)| \";\""
468 "Here the rule 'nl' is defined using the default tokenizer of sequential "
469 "characters ('default' has the special meaning of the built in tokenizer). "
470 "This is followed by using the JavaScript tokenizer for the remaining rules. "
471 "This tokenizer strips out whitespace and newlines. Some rules in the grammar "
472 "require checking for a newline. In particular the automatic semicolon insertion "
473 "rule (managed by the 'Sc' rule here). If there is a newline, the semicolon can "
474 "be optional in places."
477 "\"do\" Stmt:s \"while\" \"(\" Expr:c \")\" Sc => [[ s c ast-do-while boa ]]"
480 "Even though the JavaScript tokenizer has removed the newlines, the 'nl' rule can "
481 "be used to detect them since it is using the default tokenizer. This allows "
482 "grammars to mix and match the tokenizer as required to make them more readable."
485 ARTICLE: "peg.ebnf" "EBNF"
486 "The " { $vocab-link "peg.ebnf" } " vocabulary provides a DSL that allows writing PEG parsers that look like "
487 "EBNF syntax. It provides three parsing words described below. These words all "
488 "accept the same EBNF syntax. The difference is in how they are used."
497 "The EBNF syntax is composed of a series of rules of the form:"
502 "The last defined rule is the main rule for the EBNF. It is the first one run "
503 "and it is expected that the remaining rules are used by that rule. Rules may be "
505 "Each rule can contain the following:"
506 { $subsections "peg.ebnf.strings"
513 "peg.ebnf.one-or-more"
514 "peg.ebnf.zero-or-more"
517 "peg.ebnf.character-class"
518 "peg.ebnf.foreign-rules"
520 "peg.ebnf.semantic-action"
521 "peg.ebnf.variable" }
522 "Grammars defined in EBNF need to handle each character, or sequence of "
523 "characters in the input. This can be tedious for dealing with whitespace in "
524 "grammars that have 'tokens' separated by whitespace. You can define your "
525 "own tokenizer that for an EBNF grammar, and write the grammar in terms of "
526 "those tokens, allowing you to ignore the whitespace issue. The tokenizer "
527 "can be changed at various parts in the grammar as needed. The JavaScript grammar "
528 "does this to define the optional semicolon rule for example."
529 { $subsections "peg.ebnf.tokenizers" }