GCDs and the Euclidean Algorithm
This is Lab 1 from Clint by Hugh Montgomery, edited by Elad Zelingher. All rights reserved to Hugh Montgomery.
Problem 1
The gcd symbol is defined for any pair of integers, not both equal to 0. This quantity enjoys four basic identities:
- ;
- for all integers ;
- ;
- .
By using these identities systematically (recall pp. 10-12 of NZM), we may reduce the size of the arguments until iv) applies, and the value emerges. For example,
Apply this reasoning to calculate , and use Simple EuAlgDem to verify your arithmetic.
In the calculation displayed above, we have written down more than we need. Since each new number is the remainder after division, it suffices to write down only these remainders, . The gcd of any two consecutive members of this sequence is constant throughout. When we consider the last two numbers, we see that the gcd is the last positive remainder in the sequence. Sequences generated in this way are very rapidly decreasing, and hence are not very long. Thus the gcd is much more quickly determined by this method-known as the Euclidean Algorithm. The program GCD uses this faster method to evaluate gcds.
The Euclidean Algorithm can be modified in various ways to make it still faster. For example, in performing divisions, one may round to the nearest integer instead of rounding down. This gives rise to negative remainders, but the remainders decrease in absolute value a little faster than formerly. For example, to calculate in this way, we would generate the sequence of remainders . This sequence saves one step over the sequence above.
Problem 2
The program LnComTab displays linear combinations of two given integers and . Start with . Note that the resulting table is antisymmetric about the origin. Why is this? What is the smallest non-zero number (in absolute value) that you see? How is this related to ? (Recall Theorem 1.4 of NZM.) Is the table periodic in any way? At what locations do you find a zero? Let denote a collection of 5 consecutive columns. Show that every number that occurs somewhere in the table is found exactly once in . Do the same for , which consists of 3 consecutive rows of the table. Where do the numbers "5" and "3" come from? What would they be replaced by, if the values of and were changed? Take now . How is this new table related to the one you were looking at before? Note that small values on the table follow a line pointing roughly NorthWest-SouthEast. Use the center X and center Y controls to follow these small values. What is the slope of the line along which these small values lie?
Problem 3
Use EuAlgDem with the arguments . The remainders are now presented in a neat table. Use the arguments and . The two arguments have been reversed. What effect does this have on the sequence of remainders generated? Does this persist in general? Can you prove it? Substitute some large arguments, say . The table of remainders is now too large to fit on one screen. Scroll down and up through the table.
Problem 4
Let be the index of the last positive remainder in the sequence of remainders generated by the Euclidean Algorithm, so that and . Thus divisions have been performed in the calculation. For given and , the value of is easily determined by reading the index of the bottom line in the table provided by EuAlgDem. Using this program, try to answer the following questions. What is the least pair of integers with such that ? ? ? ? Can you spot a pattern? Can you prove that it persists?
Problem 5
Use EuAlgDem to determine the value of . If and are large, is necessarily large? For 10 different pairs of "randomly chosen" large values of and , record the value of . How large is on average?
Problem 6
The quotients generated by the Euclidean Algorithm are displayed in the table provided by EuAlgDem. By hand calculation, find a pair of integers, each , such that for all . (Hint: Start at the end and work back toward the beginning. If and then
and so on.) Similarly, find a pair and of integers for which for all . In both cases, confirm your results by using EuAlgDem. Quite clearly, the sequence of could be anything. However, for most pairs the follow a definite statistical distribution. About 0.415 of them are , about 0.170 of them are , about 0.093 of them are , and so on. More precisely, we expect that for a proportion of approximately
of the . Gauss claimed to have proved this, but his proof (if he had one) is unknown. The first known proof was given in 1928 by R. O. Kuz'min. Using modern tools, one finds this result as an easy consequence of the ergodic theorem. Choose a pair of large integers at random, and use EuAlgDem to generate the . How close to the expected distribution are the ?
Problem 7
As is discussed on pp. 13-15 of NZM, each remainder generated by the Euclidean Algorithm is a linear combination of the and that initiated the sequence. That is, . These and are not uniquely determined. (For example, if we replace by and at the same time replace by then the value of is unchanged.) However, one set of natural values for the and is given by the recursions
Indeed, it is this same recursion,
that generates the . These and are displayed by EuAlgDem. What do you note about the signs of these numbers? About their absolute values? Can you prove that these patterns hold in general? What values are taken on by ?
Problem 8
The program GCDTab displays the greatest common divisors of pairs of integers. After invoking this program, use the center X and center Y controls to move away from the origin. Each gcd displayed is calculated (by the Euclidean Algorithm, of course) and then immediately written to the screen. Admire how quickly this is accomplished. What value occurs most frequently? Enter and to move to a new location in the screen. Note that there are two columns near the middle of the screen that consist entirely of 1's. Use the center X and center Y controls to examine more entries in these columns. Why do these columns contain so many 1's? Where in these columns will one find larger entries?
Problem 9
In EuAlgDem, the quotients are initially determined by rounding down,
but one can switch to rounding to the nearest integer by clicking the checkbox. For several pairs , compare the two calculations. How many steps are saved when rounding to the nearest integer? Is there much in common among the two sets of ? Try the pairs that you found in Problem 6 with all and all . What do you find?
Problem 10
For the programmer. Write a program in which and run independently from 1 to some number . For each pair, evaluate . Count the number of pairs for which . What is the proportion ? How close is it to ? How does the running time of this program depend on ? For any fixed , the density of pairs , for which is asymptotically . You could write your program so as to track the incidence of other small values of the gcd.
Problem 11
For the hopelessly addicted programmer. Construct a routine that evaluates . Use this in a program that chooses pairs of large integers at random, and tabulates the value of . For 10,000 such pairs, say, how are the values of distributed? What is their mean? Max? Min? Standard deviation? For the theory behind this, consult J. Dixon, The number of steps in the Euclidean Algorithm, J. Number Theory 2 (1970), 414-422. How close are your numerical values to the theoretical prediction?
Problem 12
For the truly ambitious. In addition to rounding to the nearest integer, the Euclidean Algorithm may be enhanced by removing powers of 2 whenever possible. Your machine knows as a string of binary digits, so the power of 2 dividing can be read as a block of trailing 0 's. One may divide by 2 by right-shifting the binary expansion. This is much faster than long division (or at least it should be). Suppose that and that . Put , and set . Then . Now use the following identities, as appropriate:
- (use this to ensure that ,
- (use this when and and are both odd),
- (if is even and is odd),
- if .
The point here is that if and are odd then is even, so that (c) becomes applicable after (b) has been applied. Division by 2 is accomplished by right-shifting the binary expansion. Thus the usual division, which is slow, is avoided.