basis/math/vectors/simd/simd-docs.factor

USING: help.markup help.syntax sequences math math.vectors
multiline kernel.private classes.tuple.private
math.vectors.simd.intrinsics cpu.architecture ;
IN: math.vectors.simd

ARTICLE: "math.vectors.simd.intro" "Introduction to SIMD support"
"Modern CPUs support a form of data-level parallelism, where arithmetic operations on fixed-size short vectors can be done on all components in parallel. This is known as single-instruction-multiple-data (SIMD)."
$nl
"SIMD support in the processor takes the form of instruction sets which operate on vector registers. By operating on multiple scalar values at the same time, code which operates on points, colors, and other vector data can be sped up."
$nl
"In Factor, SIMD support is exposed in the form of special-purpose SIMD " { $link "sequence-protocol" } " implementations. These are fixed-length, homogeneous sequences. They are referred to as vectors, but should not be confused with Factor's " { $link "vectors" } ", which can hold any type of object and can be resized."
$nl
"The words in the " { $vocab-link "math.vectors" } " vocabulary, which can be used with any sequence of numbers, are special-cased by the compiler. If the compiler can prove that only SIMD vectors are used, it expands " { $link "math-vectors" } " into " { $link "math.vectors.simd.intrinsics" } ". In the general case, SIMD intrinsics operate on heap-allocated SIMD vectors, but that too can be optimized, since in many cases the compiler can unbox SIMD vectors, storing them directly in registers."
$nl
"Since the only difference between ordinary code and SIMD-accelerated code is that the latter uses special fixed-length SIMD sequences, the SIMD library is very easy to use. To ensure your code compiles to use vector instructions without boxing and unboxing overhead, follow the guidelines for " { $link "math.vectors.simd.efficiency" } "."
$nl
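"As a minimal sketch of this idea, the same " { $snippet "v+" } " call can be applied to ordinary arrays and to SIMD vectors; only the type of the inputs differs. The " { $snippet "simd-demo" } " vocabulary name is illustrative only:"
{ $code
<" USING: math.vectors math.vectors.simd prettyprint ;
SIMD: float-4
IN: simd-demo

! Ordinary arrays: handled by the generic sequence code
{ 1.0 2.0 3.0 4.0 } { 0.5 0.5 0.5 0.5 } v+ .

! SIMD vectors: same word, eligible for SIMD acceleration
float-4{ 1.0 2.0 3.0 4.0 } float-4{ 0.5 0.5 0.5 0.5 } v+ ."> }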
"There should never be any reason to use " { $link "math.vectors.simd.intrinsics" } " directly, though they too have a straightforward, if lower-level, interface." ;

ARTICLE: "math.vectors.simd.support" "Supported SIMD instruction sets and operations"
"At present, the SIMD support makes use of SSE, SSE2 and a few SSE3 instructions on x86 CPUs."
$nl
"SSE1 only supports single-precision SIMD (" { $snippet "float-4" } " and " { $snippet "float-8" } ")."
$nl
"SSE2 introduces double-precision and integer SIMD."
$nl
"SSE3 introduces horizontal adds (summing all components of a single vector register), which are useful for computing dot products. Where available, SSE3 operations are used to speed up " { $link sum } ", " { $link v. } ", " { $link norm-sq } ", " { $link norm } ", and " { $link distance } ". If SSE3 is not available, software fallbacks are used for " { $link sum } " and related words."
$nl
"On PowerPC, or older x86 chips without SSE, software fallbacks are used for all high-level vector operations. SIMD code can run with no loss in functionality, just decreased performance."
$nl
"The primitives in the " { $vocab-link "math.vectors.simd.intrinsics" } " vocabulary do not have software fallbacks, but they should not be called directly in any case." ;

ARTICLE: "math.vectors.simd.types" "SIMD vector types"
"Each SIMD vector type is named " { $snippet "scalar-count" } ", where " { $snippet "scalar" } " is a scalar C type and " { $snippet "count" } " is a vector dimension."
$nl
"To use a SIMD vector type, a parsing word is used to generate the relevant code and bring it into the vocabulary search path; this is the same idea as with " { $link "specialized-arrays" } ":"
{ $subsection POSTPONE: SIMD: }
"The following vector types are supported:"
{ $code
    "char-16"
    "uchar-16"
    "char-32"
    "uchar-32"
    "short-8"
    "ushort-8"
    "short-16"
    "ushort-16"
    "int-4"
    "uint-4"
    "int-8"
    "uint-8"
    "float-4"
    "float-8"
    "double-2"
    "double-4"
} ;

ARTICLE: "math.vectors.simd.words" "SIMD vector words"
"For each SIMD vector type, several words are defined:"
{ $table
    { "Word" "Stack effect" "Description" }
    { { $snippet "type-with" } { $snippet "( x -- simd-array )" } "creates a new instance where all components are set to a single scalar" }
    { { $snippet "type-boa" } { $snippet "( ... -- simd-array )" } "creates a new instance where components are read from the stack" }
    { { $snippet ">type" } { $snippet "( seq -- simd-array )" } "creates a new instance initialized with the elements of an existing sequence, which must have the correct length" }
    { { $snippet "type{" } { $snippet "type{ elements... }" } "parsing word defining literal syntax for an SIMD vector; the correct number of elements must be given" }
}
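"For instance, with " { $snippet "float-4" } " substituted for " { $snippet "type" } ", the four words might be used as follows. This is a sketch only, and the " { $snippet "simd-demo" } " vocabulary name is illustrative:"
{ $code
<" USING: math.vectors.simd prettyprint ;
SIMD: float-4
IN: simd-demo

! All four components set to the same scalar
1.5 float-4-with .

! Components taken from the stack
1.0 2.0 3.0 4.0 float-4-boa .

! Converted from an existing sequence of length 4
{ 1.0 2.0 3.0 4.0 } >float-4 .

! Literal syntax; exactly four elements must be given
float-4{ 1.0 2.0 3.0 4.0 } ."> }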
"To actually perform vector arithmetic on SIMD vectors, use " { $link "math-vectors" } " words."
{ $see-also "c-types-specs" } ;

ARTICLE: "math.vectors.simd.efficiency" "Writing efficient SIMD code"
"Since SIMD vectors are heap-allocated objects, it is important to write code in a style which is conducive to the compiler being able to inline generic dispatch and eliminate allocation."
$nl
"If the inputs to a " { $vocab-link "math.vectors" } " word are statically known to be SIMD vectors, the call is converted into an SIMD primitive, and the output is then also known to be an SIMD vector (or scalar, depending on the operation); this information propagates forward within a single word (together with any inlined words and macro expansions). Any intermediate values which are not stored into collections, or returned from the word, are furthermore unboxed."
$nl
"To check if optimizations are being performed, pass a quotation to the " { $snippet "optimizer-report." } " and " { $snippet "optimized." } " words in the " { $vocab-link "compiler.tree.debugger" } " vocabulary, and look for calls to " { $link "math.vectors.simd.intrinsics" } " as opposed to high-level " { $link "math-vectors" } "."
$nl
"For example, in the following, no SIMD operations are used at all, because the compiler's propagation pass does not consider dynamic variable usage:"
{ $code
<" USING: compiler.tree.debugger math.vectors
math.vectors.simd namespaces ;
SIMD: double-4
SYMBOLS: x y ;

[
    double-4{ 1.5 2.0 3.7 0.4 } x set
    double-4{ 1.5 2.0 3.7 0.4 } y set
    x get y get v+
] optimizer-report."> }
"The following word benefits from SIMD optimization, because it begins with an unsafe declaration:"
{ $code
<" USING: compiler.tree.debugger kernel kernel.private
math.vectors math.vectors.simd ;
SIMD: float-4
IN: simd-demo

: interpolate ( v a b -- w )
    { float-4 float-4 float-4 } declare
    [ v* ] [ [ 1.0 ] dip n-v v* ] bi-curry* bi v+ ;

\ interpolate optimizer-report."> }
"Note that using " { $link declare } " is not recommended. Safer ways of getting type information for the input parameters to a word include defining methods on a generic word (the value being dispatched upon has a statically known type in the method body), as well as using " { $link "hints" } " and " { $link POSTPONE: inline } " declarations."
$nl
"Here is a better version of the " { $snippet "interpolate" } " word above that uses hints:"
{ $code
<" USING: compiler.tree.debugger hints kernel
math.vectors math.vectors.simd ;
SIMD: float-4
IN: simd-demo

: interpolate ( v a b -- w )
    [ v* ] [ [ 1.0 ] dip n-v v* ] bi-curry* bi v+ ;

HINTS: interpolate float-4 float-4 float-4 ;

\ interpolate optimizer-report."> }
"This time, the optimizer report lists calls to both SIMD primitives and high-level vector words, because hints cause two code paths to be generated. The " { $snippet "optimized." } " word can be used to make sure that the fast code path consists entirely of calls to primitives."
$nl
"If the " { $snippet "interpolate" } " word were to be used in several places with different types of vectors, it would be best to declare it " { $link POSTPONE: inline } "."
$nl
"In the " { $snippet "interpolate" } " word, there is still a call to the " { $link <tuple-boa> } " primitive, because the return value at the end is being boxed on the heap. In the next example, no memory allocation occurs at all because the SIMD vectors are stored inside a struct class (see " { $link "classes.struct" } "); also note the use of inlining:"
{ $code
<" USING: accessors alien.c-types classes.struct
compiler.tree.debugger kernel math math.vectors
math.vectors.simd ;
SIMD: float-4
IN: simd-demo

STRUCT: actor
{ id int }
{ position float-4 }
{ velocity float-4 }
{ acceleration float-4 } ;

GENERIC: advance ( dt object -- )

: update-velocity ( dt actor -- )
    [ acceleration>> n*v ] [ velocity>> v+ ] [ ] tri
    (>>velocity) ; inline

: update-position ( dt actor -- )
    [ velocity>> n*v ] [ position>> v+ ] [ ] tri
    (>>position) ; inline

M: actor advance ( dt actor -- )
    [ >float ] dip
    [ update-velocity ] [ update-position ] 2bi ;

M\ actor advance optimized.">
}
"The " { $vocab-link "compiler.cfg.debugger" } " vocabulary can give a lower-level picture of the generated code, which includes register assignments and other low-level details. To look at low-level optimizer output, call " { $snippet "test-mr mr." } " on a word or quotation:"
{ $code
<" USE: compiler.cfg.debugger

M\ actor advance test-mr mr."> }
"An example of a high-performance algorithm that uses SIMD primitives can be found in the " { $vocab-link "benchmark.nbody-simd" } " vocabulary." ;

ARTICLE: "math.vectors.simd.intrinsics" "Low-level SIMD primitives"
"The words in the " { $vocab-link "math.vectors.simd.intrinsics" } " vocabulary are used to implement SIMD support. These words have three disadvantages compared to the higher-level " { $link "math-vectors" } " words:"
{ $list
    "They operate on raw byte arrays, with a separate “representation” parameter passed in to determine the type of the operands and result."
    "They are unsafe; passing values which are not byte arrays, or byte arrays with the wrong size, will dereference invalid memory and possibly crash Factor."
    { "They do not have software fallbacks; if the current CPU does not have SIMD support, a " { $link bad-simd-call } " error will be thrown." }
}
"The compiler converts " { $link "math-vectors" } " into SIMD primitives automatically in cases where it is safe; this means that the input types are known to be SIMD vectors, and the CPU supports SIMD."
$nl
"It is best to avoid calling these primitives directly. To write efficient high-level code that compiles down to primitives and avoids memory allocation, see " { $link "math.vectors.simd.efficiency" } "."
{ $subsection (simd-v+) }
{ $subsection (simd-v-) }
{ $subsection (simd-v/) }
{ $subsection (simd-vmin) }
{ $subsection (simd-vmax) }
{ $subsection (simd-vsqrt) }
{ $subsection (simd-sum) }
{ $subsection (simd-broadcast) }
{ $subsection (simd-gather-2) }
{ $subsection (simd-gather-4) }
"There are two primitives which are used to implement accessing SIMD vector fields of " { $link "classes.struct" } ":"
{ $subsection alien-vector }
{ $subsection set-alien-vector }
"For the most part, the above primitives correspond directly to vector arithmetic words. They take a representation parameter, which is one of the singleton members of the " { $link vector-rep } " union in the " { $vocab-link "cpu.architecture" } " vocabulary." ;

ARTICLE: "math.vectors.simd.alien" "SIMD data in struct classes"
"Struct classes may contain fields which store SIMD data; for each SIMD vector type listed in " { $link "math.vectors.simd.types" } " there is a C type with the same name."
$nl
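"A minimal sketch of such a struct follows. The " { $snippet "particle" } " class and the " { $snippet "simd-demo" } " vocabulary are hypothetical names used only for illustration:"
{ $code
<" USING: accessors classes.struct math.vectors.simd
prettyprint ;
SIMD: float-4
IN: simd-demo

! Each field uses the C type named after the SIMD vector type
STRUCT: particle
{ position float-4 }
{ velocity float-4 } ;

! Allocate a struct, store a vector in a field, and read it back
particle <struct>
    float-4{ 1.0 2.0 3.0 4.0 } >>position
    position>> ."> }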
"Only SIMD struct fields are allowed at the moment; passing SIMD data as function parameters is not yet supported." ;

ARTICLE: "math.vectors.simd" "Hardware vector arithmetic (SIMD)"
"The " { $vocab-link "math.vectors.simd" } " vocabulary extends the " { $vocab-link "math.vectors" } " vocabulary to support efficient vector arithmetic on small, fixed-size vectors."
{ $subsection "math.vectors.simd.intro" }
{ $subsection "math.vectors.simd.types" }
{ $subsection "math.vectors.simd.words" }
{ $subsection "math.vectors.simd.support" }
{ $subsection "math.vectors.simd.efficiency" }
{ $subsection "math.vectors.simd.alien" }
{ $subsection "math.vectors.simd.intrinsics" } ;

HELP: SIMD:
{ $syntax "SIMD: type-length" }
{ $values { "type" "a scalar C type" } { "length" "a vector dimension" } }
{ $description "Brings a SIMD array for holding " { $snippet "length" } " values of " { $snippet "type" } " into the vocabulary search path. The possible type/length combinations are listed in " { $link "math.vectors.simd.types" } " and the generated words are documented in " { $link "math.vectors.simd.words" } "." } ;

ABOUT: "math.vectors.simd"