GCDs and the Euclidean Algorithm
(Lab 1 from Clint By Hugh Montgomery, Edited by Elad Zelingher, all rights reserved to Hugh Montgomery)
Problem 1
The gcd symbol (b, c) is defined for any pair of integers, not both equal to 0. This quantity enjoys four basic identities:
- (b, c)=(-b, c);
- (b, c)=(b+m c, c) for all integers m;
- (b, c)=(c, b);
- (b, 0)=\left|b\right|.
By using these identities systematically (recall pp. 10-12 of NZM), we may reduce the size of the arguments until iv) applies, and the value emerges. For example,
Apply this reasoning to calculate (127,49), and use Simple EuAlgDem to verify your arithmetic.
In the calculation displayed above, we have written down more than we need. Since each new number is the remainder after division, it suffices to write down only these remainders, 31,12,7,5,2,1,0. The gcd of any two consecutive members of this sequence is constant throughout. When we consider the last two numbers, we see that the gcd is the last positive remainder in the sequence. Sequences generated in this way are very rapidly decreasing, and hence are not very long. Thus the gcd is much more quickly determined by this method-known as the Euclidean Algorithm. The program GCD uses this faster method to evaluate gcds.
The Euclidean Algorithm can be modified in various ways to make it still faster. For example, in performing divisions, one may round to the nearest integer instead of rounding down. This gives rise to negative remainders, but the remainders decrease in absolute value a little faster than formerly. For example, to calculate (31,12) in this way, we would generate the sequence of remainders 31,12,-5,2,1,0. This sequence saves one step over the sequence above.
Problem 2
The program LnComTab displays linear combinations x b+y c of two given integers b and c. Start with b=9, c=15. Note that the resulting table is antisymmetric about the origin. Why is this? What is the smallest non-zero number (in absolute value) that you see? How is this related to (9,15)? (Recall Theorem 1.4 of NZM.) Is the table periodic in any way? At what locations do you find a zero? Let \mathcal{C} denote a collection of 5 consecutive columns. Show that every number that occurs somewhere in the table is found exactly once in \mathcal{C}. Do the same for \mathcal{R}, which consists of 3 consecutive rows of the table. Where do the numbers "5" and "3" come from? What would they be replaced by, if the values of b and c were changed? Take now b=3, c=5. How is this new table related to the one you were looking at before? Note that small values on the table follow a line pointing roughly NorthWest-SouthEast. Use the center X and center Y controls to follow these small values. What is the slope of the line along which these small values lie?
Problem 3
Use EuAlgDem with the arguments b=12345, c=54321. The remainders are now presented in a neat table. Use the arguments b = 54321 and c = 12345. The two arguments have been reversed. What effect does this have on the sequence of remainders generated? Does this persist in general? Can you prove it? Substitute some large arguments, say >10^{16}. The table of remainders is now too large to fit on one screen. Scroll down and up through the table.
Problem 4
Let j=j(b, c) be the index of the last positive remainder in the sequence of remainders generated by the Euclidean Algorithm, so that r_j=(b, c) and r_{j+1}=0. Thus j+1 divisions have been performed in the calculation. For given b and c, the value of j(b, c) is easily determined by reading the index of the bottom line in the table provided by EuAlgDem. Using this program, try to answer the following questions. What is the least pair of integers b, c with b \geq c>0 such that j(b, c)=1? =2? =3? =4? Can you spot a pattern? Can you prove that it persists?
Problem 5
Use EuAlgDem to determine the value of j(b, c). If b and c are large, is j(b, c) necessarily large? For 10 different pairs of "randomly chosen" large values of b and c, record the value of j(b, c). How large is j(b, c) on average?
Problem 6
The quotients q_i generated by the Euclidean Algorithm are displayed in the table provided by EuAlgDem. By hand calculation, find a pair b, c of integers, each >10^5, such that q_i=1 for all i. (Hint: Start at the end and work back toward the beginning. If r_j=1 and r_{j-1}=1 then
and so on.) Similarly, find a pair b and c of integers >10^5 for which q_i=10 for all i. In both cases, confirm your results by using EuAlgDem. Quite clearly, the sequence of q_i could be anything. However, for most pairs b, c the q_i follow a definite statistical distribution. About 0.415 of them are =1, about 0.170 of them are =2, about 0.093 of them are =3, and so on. More precisely, we expect that q_i=k for a proportion of approximately $$ (\log (1+1 / k)-\log (1+1 /(k+1))) / \log 2$$ of the i. Gauss claimed to have proved this, but his proof (if he had one) is unknown. The first known proof was given in 1928 by R. O. Kuz'min. Using modern tools, one finds this result as an easy consequence of the ergodic theorem. Choose a pair of large integers b, c at random, and use EuAlgDem to generate the q_i. How close to the expected distribution are the q_i?
Problem 7
As is discussed on pp. 13-15 of NZM, each remainder r_i generated by the Euclidean Algorithm is a linear combination of the b and c that initiated the sequence. That is, r_i=x_i b+y_i c. These x_i and y_i are not uniquely determined. (For example, if we replace x_i by x_i+c and at the same time replace y_i by y_i-b then the value of x_i b+y_i c is unchanged.) However, one set of natural values for the x_i and y_i is given by the recursions
Indeed, it is this same recursion, $$ r_i = r_{i-2} - q_i r_{i-1} $$ that generates the r_i. These x_i and y_i are displayed by EuAlgDem. What do you note about the signs of these numbers? About their absolute values? Can you prove that these patterns hold in general? What values are taken on by x_i y_{i-1}-x_{i-1} y_i?
Problem 8
The program GCDTab displays the greatest common divisors of pairs of integers. After invoking this program, use the center X and center Y controls to move away from the origin. Each gcd displayed is calculated (by the Euclidean Algorithm, of course) and then immediately written to the screen. Admire how quickly this is accomplished. What value occurs most frequently? Enter b=3300 and c=2200 to move to a new location in the screen. Note that there are two columns near the middle of the screen that consist entirely of 1's. Use the center X and center Y controls to examine more entries in these columns. Why do these columns contain so many 1's? Where in these columns will one find larger entries?
Problem 9
In EuAlgDem, the quotients are initially determined by rounding down, $$ q_i = \left[ r_{i-2} / r_{i-1} \right] $$ but one can switch to rounding to the nearest integer by clicking the checkbox. For several pairs b, c compare the two calculations. How many steps are saved when rounding to the nearest integer? Is there much in common among the two sets of r_i? Try the pairs b, c that you found in Problem 6 with all q_i=1 and all q_i=10. What do you find?
Problem 10
For the programmer. Write a program in which b and c run independently from 1 to some number N. For each pair, evaluate (b, c). Count the number K of pairs for which (b, c)=1. What is the proportion K / N^2? How close is it to 6 / \pi^2? How does the running time of this program depend on N? For any fixed g>0, the density of pairs b, c, for which (b, c)=g is asymptotically 6 /(\pi g)^2. You could write your program so as to track the incidence of other small values of the gcd.
Problem 11
For the hopelessly addicted programmer. Construct a routine that evaluates j(b, c). Use this in a program that chooses pairs b, c of large integers \left(\approx 10^{17}\right) at random, and tabulates the value of j(b, c). For 10,000 such pairs, say, how are the values of j distributed? What is their mean? Max? Min? Standard deviation? For the theory behind this, consult J. Dixon, The number of steps in the Euclidean Algorithm, J. Number Theory 2 (1970), 414-422. How close are your numerical values to the theoretical prediction?
Problem 12
For the truly ambitious. In addition to rounding to the nearest integer, the Euclidean Algorithm may be enhanced by removing powers of 2 whenever possible. Your machine knows b as a string of binary digits, so the power of 2 dividing b can be read as a block of trailing 0 's. One may divide by 2 by right-shifting the binary expansion. This is much faster than long division (or at least it should be). Suppose that 2^j \| b and that 2^k \| c. Put b'=b / 2^j, c'=c / 2^k, and set m=\min (j, k). Then (b, c)=2^m\left(b', c'\right). Now use the following identities, as appropriate:
- (b, c)=(c, b) \quad (use this to ensure that b \geq c>0),
- (b, c)=(b-c, c) \quad (use this when b \geq c>0 and b and c are both odd),
- (b, c)=(b / 2, c) (if b is even and c is odd),
- (b, 0)=b \quad( if b>0).
The point here is that if b and c are odd then b-c is even, so that (c) becomes applicable after (b) has been applied. Division by 2 is accomplished by right-shifting the binary expansion. Thus the usual division, which is slow, is avoided.