core/handbook/sequences.facts

   1 IN: sequences
   2 USING: arrays help kernel kernel-internals sequences-internals
   3 strings vectors words ;
   4
   5 ARTICLE: "sequence-implementations" "Sequence implementations"
   6 "There are two basic types of sequences. Instances of the following two types have fixed length:"
   7 { $subsection "arrays" }
   8 { $subsection "strings" }
   9 "Instances of the following are resizable:"
  10 { $subsection "vectors" }
  11 { $subsection "sbufs" }
  12 "Quotations are special sequences which hold code:"
  13 { $subsection "quotations" }
  14 "Integers support the sequence protocol:"
  15 { $subsection "sequences-integers" }
  16 "Virtual sequences wrap an underlying sequence, and changes to the underlying sequence are reflected in the virtual sequence:"
  17 { $subsection <reversed> }
  18 { $subsection <slice> }
  19 { $subsection <column> }
  20 "The " { $link f } " object also supports the sequence protocol. It responds with a length of zero, and instead of throwing an out of bounds error, outputs " { $link f } " when an element is accessed. This can simplify code that would like a dummy sequence behaving as if it has arbitrary length." ;
  21
  22 ARTICLE: "sequences-integers" "Integer sequences and counted loops"
  23 "Integers support the sequence protocol in a trivial fashion; a non-negative integer presents its non-negative predecessors as elements. For example, the integer 3, when viewed as a sequence, contains the elements 0, 1, and 2. This is very useful for performing counted loops."
  24 $terpri
  25 "For example, the " { $link each } " combinator, given an integer, simply calls a quotation that number of times, pushing a counter on each iteration that ranges from 0 up to that integer:"
  26 { $example "3 [ . ] each" "0\n1\n2" }
  27 "A common idiom is to iterate over a sequence, while also maintaining a loop counter. This can be done using " { $link 2each } ":"
  28 { $example "{ \"a\" \"b\" \"c\" } dup length [\n    \"Index: \" write . \"Element: \" write .\n] 2each" "Index: 0\nElement: \"a\"\nIndex: 1\nElement: \"b\"\nIndex: 2\nElement: \"c\"" }
  29 "Combinators that produce new sequences, such as " { $link map } ", will output an array if the input is an integer." ;
  30
  31 ARTICLE: "sequences-access" "Accessing sequence elements"
  32 { $subsection nth }
  33 { $subsection ?nth }
  34 { $subsection first }
  35 { $subsection second }
  36 { $subsection third }
  37 { $subsection fourth }
  38 { $subsection first2 }
  39 { $subsection first3 }
  40 { $subsection first4 }
  41 { $subsection peek }
  42 { $subsection last/first } ;
  43
  44 ARTICLE: "sequences-combinators" "Sequence combinators"
  45 "Iteration:"
  46 { $subsection each }
  47 { $subsection each-with }
  48 { $subsection reduce }
  49 { $subsection interleave }
  50 { $subsection 2each }
  51 { $subsection 2reduce }
  52 "Mapping:"
  53 { $subsection map }
  54 { $subsection map-with }
  55 { $subsection accumulate }
  56 { $subsection 2map }
  57 "Filtering:"
  58 { $subsection subset }
  59 { $subsection subset-with } ;
  60
  61 ARTICLE: "sequences-tests" "Testing sequences"
  62 "Testing for an empty sequence:"
  63 { $subsection empty? }
  64 "Testing indices:"
  65 { $subsection bounds-check? }
  66 "Testing if a sequence contains an object:"
  67 { $subsection member? }
  68 { $subsection memq? }
  69 "Testing if a sequence contains a subsequence:"
  70 { $subsection head? }
  71 { $subsection tail? }
  72 { $subsection subseq? }
  73 "Testing if a sequence contains elements satisfying a predicate:"
  74 { $subsection contains? }
  75 { $subsection contains-with? }
  76 { $subsection all? }
  77 { $subsection all-with? }
  78 "Testing how elements are related:"
  79 { $subsection monotonic? }
  80 { $subsection all-eq? }
  81 { $subsection all-equal? } ;
  82
  83 ARTICLE: "sequences-search" "Searching sequences"
  84 "Finding the index of an element:"
  85 { $subsection index }
  86 { $subsection index* }
  87 { $subsection last-index }
  88 { $subsection last-index* }
  89 "Finding the start of a subsequence:"
  90 { $subsection start }
  91 { $subsection start* }
  92 "Finding the index of an element satisfying a predicate:"
  93 { $subsection find }
  94 { $subsection find* }
  95 { $subsection find-with }
  96 { $subsection find-with* }
  97 { $subsection find-last }
  98 { $subsection find-last-with }
  99 { $subsection find-last* }
 100 { $subsection find-last-with* } ;
 101
 102 ARTICLE: "sequences-join-split" "Joining and splitting sequences"
 103 "Forming new sequences from existing sequences:"
 104 { $subsection append }
 105 { $subsection 3append }
 106 { $subsection concat }
 107 { $subsection join }
 108 { $subsection replace-slice }
 109 "Slicing sequences:"
 110 { $subsection subseq }
 111 { $subsection head }
 112 { $subsection tail }
 113 { $subsection head* }
 114 { $subsection tail* }
 115 "Variants of the above which output a virtual sequence sharing storage with the input sequence:"
 116 { $subsection <slice> }
 117 { $subsection head-slice }
 118 { $subsection tail-slice }
 119 { $subsection head-slice* }
 120 { $subsection tail-slice* }
 121 "Splitting sequences at specific indices:"
 122 { $subsection cut }
 123 { $subsection cut* }
 124 { $subsection unclip }
 125 "Splitting sequences at occurrences of subsequences:"
 126 { $subsection ?head }
 127 { $subsection ?tail }
 128 { $subsection split1 }
 129 { $subsection split }
 130 { $subsection drop-prefix } ;
 131
 132 ARTICLE: "sequences-add-remove" "Adding and removing sequence elements"
 133 "Adding elements:"
 134 { $subsection add }
 135 { $subsection add* }
 136 "Removing elements:"
 137 { $subsection remove }
 138 { $subsection remove-nth }
 139 { $subsection diff }
 140 { $subsection prune } ;
 141
 142 ARTICLE: "sequences-reshape" "Reshaping sequences"
 143 { $subsection reverse }
 144 { $subsection <reversed> }
 145 { $subsection group }
 146 { $subsection flatten }
 147 { $subsection flip }
 148 { $subsection <column> }
 149 { $subsection subst } ;
 150
 151 ARTICLE: "sequences-destructive" "Destructive operations"
 152 "These words modify their input, instead of creating a new sequence."
 153 $terpri
 154 "In-place variant of " { $link add } ":"
 155 { $subsection push }
 156 "In-place variant of " { $link append } ":"
 157 { $subsection nappend }
 158 "In-place variant of " { $link remove } ":"
 159 { $subsection delete }
 160 "In-place variant of " { $link map } ":"
 161 { $subsection inject }
 162 { $subsection inject-with }
 163 "Changing elements:"
 164 { $subsection set-nth }
 165 { $subsection change-nth }
 166 { $subsection cache-nth }
 167 "Other destructive words:"
 168 { $subsection exchange }
 169 { $subsection push-new }
 170 { $subsection pop }
 171 { $subsection pop* }
 172 { $subsection delete-all }
 173 { $subsection copy-into } ;
 174
 175 ARTICLE: "sequences-stacks" "Stack operations"
 176 "The classical stack operations, modifying a sequence in place:"
 177 { $subsection empty? }
 178 { $subsection peek }
 179 { $subsection push }
 180 { $subsection pop }
 181 { $subsection pop* }
 182 "Lazy instantiation of stacks:"
 183 { $subsection ?push } ;
 184
 185 ARTICLE: "sequences-comparing" "Comparing sequences"
 186 "Element equality:"
 187 { $subsection sequence= }
 188 { $subsection mismatch }
 189 "Lexicographic order:"
 190 { $subsection <=> } ;
 191
 192 ARTICLE: "sequences-assoc" "Association lists"
 193 "An " { $emphasis "association list" } " is a sequence of pairs. Association lists come up from time to time; for example, the " { $link cond } " combinator takes an association list of quotations as input. You can perform lookups and take association lists apart:"
 194 { $subsection assoc }
 195 { $subsection rassoc }
 196 { $subsection unpair }
 197 "An association list is slower to search than a hashtable. The main advantage of an association list is that the elements are ordered; also sometimes it is more convenient to construct an association list with sequence words than to construct a hashtable with hastable words. Most of the time, hashtables are more appropriate. See " { $link "hashtables" } "." ;
 198
 199 ARTICLE: "sequence-protocol" "Sequence protocol"
 200 "All sequences must know their length, and provide a way to access elements:"
 201 { $subsection length }
 202 { $subsection nth }
 203 "Mutable sequences:"
 204 { $subsection set-nth }
 205 "Resizable sequences:"
 206 { $subsection set-length }
 207 "An optional generic word for creating sequences of the same class as a given sequence:"
 208 { $subsection like }
 209 "Another optional generic word for optimization purposes:"
 210 { $subsection new } ;
 211
 212 ARTICLE: "arrays" "Arrays"
 213 "An array is a fixed-size mutable sequence whose elements are stored in a contiguous range of memory. The literal syntax is covered in " { $link "syntax-arrays" } ". Sometimes you need a resizable array -- this is called a vector, and vectors are documented in " { $link "vectors" } "."
 214 $terpri
 215 "Array words are in the " { $vocab-link "arrays" } " vocabulary. Unsafe implementation words are in the " { $vocab-link "kernel-internals" } " vocabulary."
 216 $terpri
 217 "Arrays form a class of objects."
 218 { $subsection array }
 219 { $subsection array? }
 220 "There are several ways to construct arrays."
 221 { $subsection >array }
 222 { $subsection <array> }
 223 { $subsection 1array }
 224 { $subsection 2array }
 225 { $subsection 3array }
 226 { $subsection 4array }
 227 "Arrays can be accessed without bounds checks in a pointer unsafe way."
 228 { $subsection array-nth }
 229 { $subsection set-array-nth } ;
 230
 231 ARTICLE: "strings" "Strings"
 232 "A string is a fixed-size mutable sequence of characters."
 233 $terpri
 234 "String words are found in the " { $vocab-link "strings" } " vocabulary."
 235 { $subsection string? }
 236 { $subsection >string }
 237 { $subsection <string> }
 238 "A pair of words are used to nicely format columns of text."
 239 { $subsection pad-left }
 240 { $subsection pad-right }
 241 "Characters are not a first-class type; they are simply represented as integers between 0 and 65535. A few words operate on characters:"
 242 { $subsection blank? }
 243 { $subsection letter? }
 244 { $subsection LETTER? }
 245 { $subsection digit? }
 246 { $subsection printable? }
 247 { $subsection control? }
 248 { $subsection quotable? }
 249 { $subsection ch>lower }
 250 { $subsection ch>upper } ;
 251
 252 ARTICLE: "sbufs" "String buffers"
 253 "A string buffer is a resizable mutable sequence of characters. String buffers can be used to construct new strings by accumilating substrings and characters, however usually they are only used indirectly, since the sequence construction words are more convenient to use in most cases (see " { $link "namespaces-make" } ")."
 254 $terpri
 255 "String buffer words are found in the " { $vocab-link "strings" } " vocabulary."
 256 { $subsection sbuf? }
 257 "Words for creating string buffers:"
 258 { $subsection >sbuf }
 259 { $subsection <sbuf> }
 260 "If you don't care about initial capacity, a more elegant way to create a new string buffer is to write:"
 261 { $code "SBUF\" \" clone" } ;
 262
 263 ARTICLE: "vectors" "Vectors"
 264 "A vector is a resizable mutable sequence of objects. Vector words are found in the " { $vocab-link "vectors" } " vocabulary."
 265 { $subsection vector? }
 266 "Words for creating vectors:"
 267 { $subsection >vector }
 268 { $subsection <vector> }
 269 "If you don't care about initial capacity, a more elegant way to create a new vector is to write:"
 270 { $code "V{ } clone" } ;
 271
 272 ARTICLE: "sequences-sorting" "Sorting and binary search"
 273 "Sorting and binary search combinators all take comparator quotations with stack effect " { $snippet "( elt1 elt2 -- n )" } " that order the two given elements and output a value whose sign denotes the result:"
 274 { $list
 275     { "positive - indicates that " { $snippet "elt1" } " follows " { $snippet "elt2" } }
 276     { "zero - indicates that " { $snippet "elt1" } " is ordered equivalently to " { $snippet "elt2" } }
 277     { "negative - indicates that " { $snippet "elt1" } " precedes " { $snippet "elt2" } }
 278 }
 279 "In-place sorting:"
 280 { $subsection nsort }
 281 "Sorting elements in a new sequence:"
 282 { $subsection sort }
 283 "Common comparators:"
 284 { $subsection natural-sort }
 285 { $subsection sort-keys }
 286 { $subsection sort-values }
 287 "Binary search:"
 288 { $subsection binsearch }
 289 { $subsection binsearch* } ;
 290
 291 ARTICLE: "sequences-unsafe" "Unsafe sequence operations"
 292 "The unsafe sequence protocol bypasses bounds checks for increased performance:"
 293 { $subsection nth-unsafe }
 294 { $subsection set-nth-unsafe }
 295 "These words assume the sequence index given is within bounds; if it is not, memory corruption can occur. Please think twice before using them; first, make sure the code in question is actually a bottleneck; next, try improving the algorithm first. If all else fails, then use these words and test your code very carefully."
 296 $terpri
 297 "There is a very important invariant these word must preserve: if at some point in time, the length of a sequence was " { $snippet "n" } ", then any future lookups of elements with indices below " { $snippet "n" } " must not crash the runtime, even if the sequence length is now less than " { $snippet "n" } ". For example, vectors preserve this invariant by never shrinking the underlying storage, only growing it as necessary."
 298 $terpri
 299 "The justification for this is that the runtime should not crash if a resizable sequence is resized during the execution of an iteration combinator."
 300 $terpri
 301 "Indeed, iteration combinators are the primary use-case for these words; if the iteration index is already guarded by a loop test which ensures it is within bounds, then additional bounds checks are redundant. For example, see the implementation of " { $link each } "." ;
 302
 303 ARTICLE: "sequences-resizable" "Resizable sequence implementation"
 304 "Resizable sequences are implementing by having a wrapper object hold a reference to an underlying sequence, together with a fill pointer indicating how many elements of the underlying sequence are occupied. When the fill pointer exceeds the underlying sequence capacity, the underlying sequence grows."
 305 $terpri
 306 "There is a resizable sequence protocol:"
 307 { $subsection underlying }
 308 { $subsection set-underlying }
 309 { $subsection set-fill }
 310 "Any instance of a class implementing the above generics can make use of several utility words:"
 311 { $subsection capacity }
 312 { $subsection ensure }
 313 { $subsection grow-length }
 314 { $subsection clone-resizable }
 315 "This protocol and the above words are unsafe; they do not perform bounds checks for performance reasons, and thus a mistake can lead to memory corruption due to an underlying sequence being shorter than the fill pointer."
 316 $terpri
 317 "Vectors and string buffers are implemented using the resizable sequence facility (and they perform full bounds-checks and thus are safe)." ;
 318
 319 ARTICLE: "sequences" "Sequences"
 320 "A sequence is a linearly-ordered finite collection of elements."
 321 $terpri
 322 "Sequence utility words can operate on any object whose class implements the sequence protocol."
 323 { $subsection "sequence-protocol" }
 324 "There are a number of implementations of sequences in the core library, and you can write new implementations yourself."
 325 { $subsection "sequence-implementations" }
 326 "Much of the power of sequences lies in the polymorphic utility words that allow computations to be expressed as bulk operations without loops, recursion or micro-management of elements."
 327 { $subsection "sequences-access" }
 328 { $subsection "sequences-combinators" }
 329 { $subsection "sequences-tests" }
 330 { $subsection "sequences-search" }
 331 { $subsection "sequences-add-remove" }
 332 { $subsection "sequences-join-split" }
 333 { $subsection "sequences-reshape" }
 334 { $subsection "sequences-comparing" }
 335 { $subsection "sequences-sorting" }
 336 { $subsection "sequences-destructive" }
 337 "Sequences are also used to implement other data structures:"
 338 { $subsection "sequences-stacks" }
 339 { $subsection "sequences-assoc" }
 340 "Some implementation details which your code should probably not care about:"
 341 { $subsection "sequences-unsafe" }
 342 { $subsection "sequences-resizable" }
 343 { $see-also "namespaces-make" } ;