================
| Introduction |
================

The DOS executable PiFast32.exe runs on Windows 95/98 and NT.

PiFast computes decimal digits of Pi and writes them to the file
pi.txt in standard mode.

===========
| Version |
===========
  PiFast32 is Version 3.2 of PiFast (released on Dec 27 1999). 
  New features of version 3.2 :

  * A larger number of digits can be computed with limited physical memory in the
    advanced disk memory mode.
  * Timings have been decreased by a few percent, especially in the standard mode.
  * Physical memory needed has been decreased by a few percent (~ 4%).
  * A Sqrt(2) computation is available (with verification). This mode mainly
    makes it possible to test PiFast32 with a huge number of digits, in ranges that
    I cannot reach with my computer. The Sqrt(2) computation is at least ten times
    faster than the pi computation.
  * When the program encounters an error (I hope this does not happen, but since
    I could not test PiFast in huge ranges, bugs can appear...), a PiFast.log file is
    generated. This file contains information about the program and the error and should
    be sent to xavier.gourdon@free.fr to help me debug the problem.
  * When reading a compressed format, the program can generate x digits every y digits,
    where x and y are user specified.

Older versions
+++++++++++++

Version : 3.1   (released on Oct 30 1999)
-------------
  This version contains several improvements, mainly aimed at
  huge computations (1 giga digit or more).

  * A new disk swap mode exists, using a Number Theoretic Transform (NTT) with
    a variable number of primes, which gives the program quasi-linear timings
    for really huge computations (to be compared with the asymptotic quadratic
    behavior of PiFast23).
  * The maximum number of digits that can be computed with this version is
    now 24 billion digits (instead of 4.5 billion with version 2.3)
    in the disk swap mode. To this end, development work has been done
    to limit the size of the internal swap files to 2 gigabytes (by generating
    several files if needed).
  * The output can be directly compressed by PiFast32. The resulting file
    can be read (to access digits at some places only, for example) or
    transformed into a standard one with PiFast. Different output formats
    are also available in this compressed output mode.

  Largest computations with version 3.1 :
   > 4,294,960,000 ( ~ 2^32 ) digits on a PIII 350 (1024 MB)
     (Nov 13 1999) in 246 hours by Shigeru Kondo (world record pi
     computation on a PC at this date).
   > 268,435,456 (=256 * 2^20) digits on a PII 300 (128 MB) (Dec 15 1999) in
     25.5 hours by Stuart Lyster.

Version : 2.3   (released on Sep 5 1999)
-------------
  PiFast23 is Version 2.3 of PiFast. This new version contains
  a few improvements :

  * A difficult bug (reported to me by Shigeru Kondo) was found that
    made version 2.2 go wrong when the FFT size was >= 4096k
    (possible on machines with at least 256 MB).
  * The maximum number of digits to be computed with that version is
    now 4.5 billion digits (instead of 1.5 billion with version 2.2).

  Largest computations with version 2.3 :
   > 2,684,354,560 ( = 2.5*2^30 ) digits on a PIII 550 (1024 MB)
     (19 Oct 1999) in 245 hours by Shigeru Kondo (world record pi
     computation on a PC at this date).
   > 2.1 billion digits on a PIII 350 (1024 MB) (13 Sep 1999) in
     157 hours by Shigeru Kondo.


Version : 2.2   (released on 10 Aug 1999)
-------------
  PiFast22 is Version 2.2 of PiFast. Work has focused mainly
  on the disk swap mode. Here are the new features of version 2.2 :

   * In the disk swap mode, disk swapping has been reduced, saving
     quite a large part of the total timing.
   * Disk swap timings are given in the computation information summary
     (in the header of the resulting file).
   * Disk head movements are reduced.
   * A bug has been fixed so that all computed digits are right (in some
     cases with versions 1.1 and 2.1, the last 0.8% of the digits were wrong).
     Another bug in internal computation has also been fixed.
   * Physical memory needed has been decreased by 5% in the disk
     memory mode.
   * Documentation has been expanded (behavior of the program).

  Largest computation with version 2.2 :
   > 268,435,456 (=256 * 2^20) digits on a PII 300 (128 MB) (13 Aug 1999) in
     32.7 hours by Stuart Lyster.

Version : 2.1   (released on 5 Aug 1999)
-------------
  PiFast21 is Version 2.1 of PiFast. This version has new features :
   * It can use disk memory to perform huge computations.
   * A bug has been fixed, which makes it possible to reach a larger
     number of digits.
   * A large computation using disk memory can be done in several 
     runs.
   * A computation information summary is written as the header of the
     resulting file.

  Largest computations with version 2.1 :
   > 1.5 billion digits on a PIII 550 (1024 MB),
     by Shigeru Kondo in 119 hours (29 Aug 1999).
   > 1 billion digits on a PIII 550 (1024 MB),
     by Shigeru Kondo in 63.5 hours (13 Aug 1999).
   > 256 million digits on my PII 350 (128 MB) (6 Aug 1999) in
     41 hours.

Version 1.1 (released on 30 Jul 1999) : 
-----------
  * Faster than the first version of PiFast (15%)
  * Uses less memory

  Largest computations reported to me with version 1.1 :
   > 128 Mega (134217728) decimal digits on a PIII 600 (1024 MB),
     by Shigeru Kondo in 2 hours and 45 minutes (4 Aug 1999).
   > 64 Mega (67108864) decimal digits on a PII 300 (128 MB) in 8 hours
     and 18 minutes, by Stuart Lyster (30 Jul 1999).

Version 1.0 (First Version) was released on 17 Jul 1999 :
-----------
  Largest computation reported to me with version 1.0 :
   > 64 Mega (67108864) decimal digits on a PII 300 (128 MB) in
     19 hours 12 minutes, by Stuart Lyster (21 Jul 1999).

===========
| Timings |
===========

As of Dec 27 1999, PiFast32 is the fastest program available on the net
to compute Pi on Windows. It is twice as fast as the fastest other program
I have found on the net (pi_agm_23 by Carey Bloodworth). See also the page
http://home.istar.ca/~lyster/pi.html
for the current fastest pi programs on PC.

Timings have improved with version 3.2, mainly in
the standard mode (~4% faster on my machine in this mode).

Standard mode (no disk space used)
-------------
This mode is the fastest when there is enough physical memory.
(timings with version 3.2, Pentium 450 with 512k of cache, 256 MB of memory)

1  million decimal digits  :   28.5 seconds
8  million decimal digits  :  350 seconds
16 million decimal digits  :  841 seconds
32 million decimal digits  : 1976 seconds
64 million decimal digits  : 5223 seconds

Standard Disk memory mode
-------------------------

(timings with version 2.3, on my Pentium 350 with 128 MB of physical memory)

 32 million decimal digits :   3662 seconds (~ 1 hour   1 minute)
 64 million decimal digits :   9829 seconds (~ 2 hours 44 minutes)
128 million decimal digits :  30320 seconds (~ 8 hours 25 minutes)
256 million decimal digits : 110000 seconds (~30 hours 30 minutes)

Disk memory mode for huge computations
--------------------------------------

(timings with version 3.1, on my Pentium 350 with 128M of physical memory)

128 million decimal digits :  28400 seconds (~ 7 hours 53 minutes)
256 million decimal digits :  67900 seconds (~18 hours 52 minutes)


Important : the timings in the disk memory use mode are sensitive 
to your disk access speed.

====================
| Number of digits |
====================

Standard mode (no disk space used)
-------------

The maximum number of decimal digits you can compute with PiFast in
standard mode depends mainly on the amount of physical memory you have.
Several low memory modes make it possible to compute a very large number of
digits. But when you want a large number of digits, you should use the
disk memory mode instead of a very low memory mode in standard mode.
(I do not know the exact threshold, but on my 128 MB machine, it is
between 16 and 32 million digits).
With version 1.1, which ran only in standard mode, Stuart Lyster
reported a 64 million digit computation on a 128 MB machine to me.
It is strongly recommended that you work only with physical memory (disk
swapping gives very poor results) in this mode.

Disk memory use mode
--------------------

Two modes of this type exist (a special disk memory mode can be used
for really huge computations).
These modes make it possible to reach a very large number of digits.
The program can potentially compute a bit more than 24 billion digits (which
is more than the 4.5 billion limit of version 2.3) independently of the
physical memory available (although a large amount of physical memory gives
better timings), but no practical tests have been done in this
direction (the largest number of digits tested is 2.5 billion digits with
PiFast23).
The 24 billion limit is due to the maximum value 2^31 of the C long type
(used in various places in the program).

I would appreciate any comments on your experience of computing a very
large number of digits with PiFast (xavier.gourdon@free.fr).

=========
| Usage |
=========

Run PiFast32.exe in a DOS window. See the help.txt file for
usage information about running PiFast32.

===========================
| Behavior of the program |
===========================

Memory :
 In standard mode, the physical memory required by PiFast to compute N
 decimal digits of Pi using an FFT of size NFFT is approximately

   Physical memory (in bytes) = 2*N + 40*NFFT.

 In the disk memory mode, memory required is
   Physical memory (in bytes) = 48*NFFT
   Disk memory (in bytes) = 2*N

 In the disk memory huge mode, memory required is
   Physical memory (in bytes) = 48*NFFT
   Disk memory (in bytes) ~ 4*N  (approximation)

 Note that the Disk memory requirement during the computation is also
 enough to contain the final output files.
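
The formulas above can be turned into a small estimator. This is only a sketch: the function name and the mode labels ("standard", "disk", "huge") are hypothetical and are not options of PiFast32, and the NFFT value in the usage line is an arbitrary example, since the FFT size is chosen by the program.

```python
def estimate_memory(n_digits, nfft, mode="standard"):
    """Return approximate (physical_bytes, disk_bytes) from the README formulas.

    The mode labels are illustrative names, not PiFast32 options.
    """
    if mode == "standard":
        # Physical memory = 2*N + 40*NFFT, no disk space used
        return 2 * n_digits + 40 * nfft, 0
    if mode == "disk":
        # Physical memory = 48*NFFT, disk memory = 2*N
        return 48 * nfft, 2 * n_digits
    if mode == "huge":
        # Physical memory = 48*NFFT, disk memory ~ 4*N (approximation)
        return 48 * nfft, 4 * n_digits
    raise ValueError("mode must be 'standard', 'disk' or 'huge'")


# Example with an arbitrary FFT size (assumption, for illustration only)
physical, disk = estimate_memory(1_000_000, 65536, mode="standard")
print(physical, disk)
```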
 
Timing :
 In the fastest mode (as long as you do not reach your physical memory limit),
 the timing of PiFast is close to linear. That is, when you double the number of
 digits to be computed, your timing almost doubles (in fact, the
 practical factor is often 2.2 or 2.3).
 
 Things get worse once your physical memory is full : the
 program becomes asymptotically quadratic (that is, for a huge number of
 digits, doubling the number of digits multiplies the timing by 4). Timings
 in the standard disk memory mode on my 128 MB machine show a factor of 2.7
 between the 32 and 64 million computations, a factor of 3.1 between 64 and
 128 million, and a factor of 3.6 ~ 3.7 between 128 and 256 million.
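
These factors can be recomputed from the standard disk memory mode timings given in the Timings section. The helper name below is hypothetical; the numbers are the ones from the table (version 2.3, 128M machine).

```python
def doubling_factors(timings):
    """Map (n, 2n) -> timing ratio for each pair of sizes that differ by 2x.

    `timings` maps a number of digits (here in millions) to seconds.
    """
    sizes = sorted(timings)
    return {
        (a, b): timings[b] / timings[a]
        for a, b in zip(sizes, sizes[1:])
        if b == 2 * a
    }


# Standard disk memory mode timings, in seconds, keyed by millions of digits
STANDARD_DISK_MODE = {32: 3662, 64: 9829, 128: 30320, 256: 110000}

for (small, large), factor in doubling_factors(STANDARD_DISK_MODE).items():
    print(f"{small} -> {large} million digits: factor {factor:.2f}")
```

The ratios climb towards 4 as the number of digits doubles, which is the quadratic trend described above.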
 
 The asymptotic behavior is much better in the second disk memory mode,
 dedicated to huge computations. The theoretical behaviour in this
 latter mode is of the form T = n log(n)^3 + alpha * n^2, with alpha
 very small. In practical computations, the alpha*n^2 part is so small that
 it should not be significant (it corresponds to the Chinese
 remainder theorem in the NTT when the number of primes is large).
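
As a rough illustration of this model, the sketch below computes the factor by which T grows when n doubles, assuming the alpha * n^2 term is negligible (alpha = 0). The function name is hypothetical, and the result is a property of the formula only, not a measured timing.

```python
from math import log


def model_doubling_factor(n, alpha=0.0):
    """Ratio T(2n) / T(n) for the model T(n) = n*log(n)^3 + alpha*n^2."""
    def model_time(x):
        return x * log(x) ** 3 + alpha * x * x

    return model_time(2 * n) / model_time(n)


# Going from 128 million to 256 million digits, with alpha assumed to be 0
print(round(model_doubling_factor(128_000_000), 2))
```

With alpha = 0 the factor stays only slightly above 2 for such sizes, in contrast with the factor approaching 4 in the standard disk memory mode.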

Number of digits :
 Due to how I do things, the program has a (small) threshold when the
 number of digits required is close to a power of two (for example,
 on my machine, computing 1048576 digits takes 48 seconds, computing
 1040000 digits takes 44 seconds : nearly 10% time saved with 0.8% fewer
 digits). This remark is important since a power-of-two number of digits
 is often used to compute pi (PiFast does not need a power-of-two number
 of digits). For that reason, I personally compute pi with PiFast with
 1 million digits (instead of 1 Mega = 1048576), 2 million, 4, 8 ...
 

==============================
| Formula and Algorithm used |
==============================

PiFast is based on Ramanujan-like formulas.

The Chudnovsky method is based on the Chudnovsky formula

                       ----
426880 (10005)^(1/2)   \    (6n)! (545140134 n + 13591409)
-------------------- = /    ------------------------------
       Pi              ----   (n!)^3 (3n)! (-640320)^(3n)
                       n>=0

which adds roughly 14 decimal digits per term.


The Ramanujan method is based on the Ramanujan formula

                   ----
  1                \     (4n)! (1103 + 26390 n)
---- = 2 (2)^(1/2) /    -----------------------
 Pi                ----  4^(4n) (n!)^4 99^(4n+2) 
                   n>=0

which adds roughly 8 decimal digits per term.
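
The convergence rate of the Ramanujan formula can be checked by summing a few terms directly. The sketch below uses Python's decimal module and a naive term-by-term sum; it only illustrates the formula and is not how PiFast evaluates the series. The function name and the default precision are assumptions.

```python
from decimal import Decimal, localcontext
from math import factorial


def ramanujan_pi(terms, precision=60):
    """Approximate pi from the first `terms` terms of the Ramanujan series."""
    with localcontext() as ctx:
        ctx.prec = precision
        total = Decimal(0)
        for n in range(terms):
            # Term n: (4n)! (1103 + 26390 n) / (4^(4n) (n!)^4 99^(4n+2))
            numerator = factorial(4 * n) * (1103 + 26390 * n)
            denominator = 4 ** (4 * n) * factorial(n) ** 4 * 99 ** (4 * n + 2)
            total += Decimal(numerator) / Decimal(denominator)
        # The series sums to 1 / (2 * sqrt(2) * pi), so invert it
        return 1 / (2 * Decimal(2).sqrt() * total)


for count in (1, 2, 3):
    print(count, ramanujan_pi(count))
```

Each extra term should extend the number of correct digits by about 8.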

Starting from these formulas, the algorithm uses Brent's binary splitting
acceleration together with a cache-efficient Hermitian FFT
to multiply big integers. Details of the binary splitting method can be
found on my web site : 
http://xavier.gourdon.free.fr/Constants/Algorithms/splitting.html
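
As an illustration of the idea, here is a minimal binary splitting sketch for the Chudnovsky formula. It relies on Python's built-in big integers instead of a custom FFT multiplication, and it is not the PiFast implementation; the function names, the guard digits and the estimate of 14 digits per term used to choose the number of terms are assumptions of this sketch.

```python
from math import isqrt


def chudnovsky_pi(digits, guard=10):
    """Return pi truncated to `digits` decimals, as a string."""
    c3_over_24 = 640320 ** 3 // 24

    def split(a, b):
        # Returns (P, Q, T) for the terms a <= n < b; the partial sum is T / Q
        if b - a == 1:
            if a == 0:
                p = q = 1
            else:
                p = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
                q = a * a * a * c3_over_24
            t = p * (13591409 + 545140134 * a)
            return p, q, (-t if a & 1 else t)  # alternating sign
        m = (a + b) // 2
        p1, q1, t1 = split(a, m)
        p2, q2, t2 = split(m, b)
        # Merge the two halves using big integer products only
        return p1 * p2, q1 * q2, q2 * t1 + p1 * t2

    work = digits + guard              # extra digits absorb truncation errors
    terms = work // 14 + 1             # roughly 14 digits per term
    _, q, t = split(0, terms)
    sqrt_10005 = isqrt(10005 * 10 ** (2 * work))   # sqrt(10005) * 10^work
    scaled_pi = (426880 * sqrt_10005 * q) // t     # pi * 10^work
    text = str(scaled_pi // 10 ** guard)
    return text[0] + "." + text[1:]


print(chudnovsky_pi(50))
```

The recursion only multiplies large integers of similar size, which is what makes a fast multiplication (an FFT in PiFast) pay off for a large number of digits.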

Note : This approach has a theoretical complexity of O(n log(n)^3)
to compute n digits of Pi. The AGM techniques (Gauss-Salamin or
Borwein Quartic algorithm for example) have an asymptotically better
complexity of O(n log(n)^2). Nevertheless, these
theoretical bounds are not practical (non-cached memory access
and data thrashing make the practical complexity higher) and the
constants in front of the big O are important.
My 10 years of experience on the subject have shown me that, for a number
of digits reachable on current machines, a well handled binary splitting
approach based on the Chudnovsky formula is better than any other known method.

For more detailed information about computation of mathematical
constants with a large number of digits, go to
http://xavier.gourdon.free.fr/Constants/constants.html

============
| Problems |
============

The program is not guaranteed to be free of bugs. If some WARNING, ERROR
or strange messages appear, you can report them to me
(xavier.gourdon@free.fr) together with the PiFast.log file which should be
generated (or at least the input values), so I can correct the program. 
Changing your input slightly should make the program run correctly.

=========================================
| What should be done for next versions |
=========================================

* Reduce the disk memory used during the computation
  (for space problems but also for disk access timing) by compressing
  the disk saved data.
* Have a verification based on the Fabrice Bellard algorithm (which
  computes the n-th bit of pi in time O(n log(n)) with memory O(1))
  to avoid the computation of pi by a second method (Ramanujan).
* Make versions of PiFast available for other platforms.
* Compute other constants (e for example).

Email me your requests for next versions !

==============
| Conclusion |
==============

For any remarks on the program, reports of huge computations, or requests
for next versions, send an email to xavier.gourdon@free.fr.
