GNU MP 6.0.0: Subquadratic GCD

15.3.3 Subquadratic GCD

For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function, as a generalization to Lehmer’s algorithm.

Let the inputs a,b be of size N limbs each. Put S = floor(N/2) + 1. Then HGCD(a,b) returns a transformation matrix T with non-negative elements, and reduced numbers (c;d) = T^{-1} (a;b). The reduced numbers c,d must be larger than S limbs, while their difference abs(c-d) must fit in S limbs. The matrix elements will also be of size roughly N/2.

The HGCD base case uses Lehmer’s algorithm, but with the above stop condition that returns reduced numbers and the corresponding transformation matrix half-way through. For inputs larger than HGCD_THRESHOLD, HGCD is computed recursively, using the divide and conquer algorithm in “On Schönhage’s algorithm and subquadratic integer GCD computation” by Möller (see References). The recursive algorithm consists of these main steps.

Call HGCD recursively, on the most significant N/2 limbs. Apply the resulting matrix T_1 to the full numbers, reducing them to a size just above 3N/2.
Perform a small number of division or subtraction steps to reduce the numbers to size below 3N/2. This is essential mainly for the unlikely case of large quotients.
Call HGCD recursively, on the most significant N/2 limbs of the reduced numbers. Apply the resulting matrix T_2 to the full numbers, reducing them to a size just above N/2.
Compute T = T_1 T_2.
Perform a small number of division and subtraction steps to satisfy the requirements, and return.

GCD is then implemented as a loop around HGCD, similarly to Lehmer’s algorithm. Where Lehmer repeatedly chops off the top two limbs, calls mpn_hgcd2, and applies the resulting matrix to the full numbers, the subquadratic GCD chops off the most significant third of the limbs (the proportion is a tuning parameter, and 1/3 seems to be more efficient than, e.g, 1/2), calls mpn_hgcd, and applies the resulting matrix. Once the input numbers are reduced to size below GCD_DC_THRESHOLD, Lehmer’s algorithm is used for the rest of the work.

The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two N-limb numbers.