This changes avoids looping many times if the denominator is a power of
2. After this change, the implementation matches the linked sbcl
algorithm. This was probably a mistake done when porting the algorithm.
Basically, as an optimization, all trailing zeros are removed from the
base2 representation of the denominator to have smaller bignums to
divide. But the previous factor implementation didn't take this into
account when making the initial guess of the shift of the numerator to
obtain a result in the range [2^54-1,2^53]. The loop would then correct
the initial guess by a factor of 2 at each iteration, so it would run as
many iteration as the denominator base2 power reduction, instead of only
a few times. For pathological cases, the speed up is huge (10^4):
1 1000 2^ bignum/f
: (epsilon?) ( num shift -- ? )
dup neg? [ neg 2^ 1 - bitand zero? not ] [ 2drop f ] if ; inline
+: scale-numerator ( num den -- epsilon? num' scale )
+ over [ log2 ] bi@ - [
+ 54 + [ (epsilon?) ] [ shift ] 2bi
+ ] keep ; inline
+
: pre-scale ( num den -- epsilon? mantissa den' scale )
- 2dup [ log2 ] bi@ -
- [ neg 54 + [ (epsilon?) ] [ shift ] 2bi ]
- [ [ scale-denonimator ] dip + ] bi-curry bi* ; inline
+ scale-denonimator [
+ [ scale-numerator ] keep swap
+ ] dip swap - ; inline
! Second step: loop
: (2/-with-epsilon) ( epsilon? num -- epsilon?' num' )