Number Theory for Programmers, Part 2
What is Number Theory?
Number theory is the study of numbers, their properties, and what can be inferred from their properties. For programmers, it is most practical to focus on the theory of positive integers.
Who should use this guide?
- Those who did not know the answer to the above question
- Those who are interested in the math behind hash functions
- Those who found my last article interesting
What will this article focus on?
This article will focus on using the integers (mod n) as indices of a hash table, and the math behind different choices of hash functions. Our goal is to find a “good” hash function (see below). The mathematical explanation will be done irrespective of Group Theory, and I may write another article to look at a hash table as a group over addition or multiplication of the integers (mod n). For a quick refresher of the (mod n) concept, go here, or for another explanation, please look here.
Useful Tools
Greatest Common Divisor (GCD) of positive integers
Explanation:
Mathematically, the greatest common divisor of two numbers a and b is the product of all common divisors of a and b. For a simple explanation as to why, look here.
The Algorithm:
Naive:
unsigned int gcd(unsigned int a, unsigned int b) { int remaind; if(!a) { return b; } // gcd(a, 0) = a if(!b) {return a; } if(a < b) { a ^= b; // Swap a and b in place b ^= a; a ^= b; } while((remaind = a % b) > 0) { a = b; b = remaind; } return b; }
Binary: (It’s always worth it to try to find the algorithms that take advantage of working with bits. If life gives you an integer as the sum of powers of two, make lemonade
)
Wikipedia has a page that explains a binary algorithm that takes advantage of the binary format of the data. It reduces the problem by stripping out common multiples of two, and then applying the binary analogy of the GCD algorithm. For more details, follow the above link. I haven’t benchmarked it, but it relies heavily on bit operations, so it should run a little faster on modern popular architectures.
Least Common Multiple (LCM) of positive integers
Explanation:
The least common multiple is as it sounds: the smallest multiple that both a and b share. For example:
LCM(15, 20) = 60.



It appears that for each prime, the LCM of a and b includes the largest power from either a or b. In fact, this is true.
Relation between GCD and LCM
For integers a and b:

This is very powerful, and lets us efficiently calculate the LCM of a and b by dividing out the GCD of a * b. Why does this work? If a and b don’t have any prime factors in common, clearly the only way that we can have a multiple of a equal some multiple of b is by multiplying b by a. If a and b only have one prime factor in common (let’s call it d), if you multiply a by b, we get a*b as an answer. However, (a*b)/d is clearly a multiple of both a and b. We don’t need to multiply a by d, since a already HAS d as a factor. d is uncoincidentally the GCD of a and b, and clearly, GCD(a, b) * LCM(a, b) = a * b. An actual proof is left as an exercise to the reader.
What makes a good hash // hash table?
The short answer is that nobody knows. Hashes that work well for some kinds of inputs can produce intractable results for other kinds of input. For our purposes, we will say that a good hash function minimizes the odds of two different inputs ending up in the same congruence class (mod n). When two different inputs DO end up in the same index, this is called a collision, and is as undesirable in hash tables as it is while driving. Also bad is clustering, which is when collisions are much more likely to happen in certain indices than in other indices.
Ideally, we would like the hash function to be able to place elements at any index in the table. This makes it a generator, namely, it can generate any value in the table.
We will try to find a happy medium of all concerns through experimentation. I will define a few different hash functions in the upcoming articles, and then will show how to compare them. That will be where the “Science” part of Computer Science enters the picture
Linear Hashes
Linear hashes take in some number x, and place the object in the index ax + b (mod n). To make the mathematics easier, we will just use ax(mod n), as it should be obvious that adding b produces the set in the same order, but with a different starting point. In order for us to consider a as a hash function, a must be a generator (mod n). How do we know that it does that? Let’s look at a few different values of a (mod 16).
: {2, 4, 6, 8, 10, 12, 14, 0, 2} (mod 16) (doesn’t generate the integers (mod 16))
= 3: {3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 9, 12, 15, 3} (mod 16) (generates the integers (mod 16))
2*3 = 6: {6, 12, 2, 8, 14, 4, 10, 0, 6} (mod 16) (doesn’t generate the integers (mod 16)).
So what works? It works when gcd(a, n) = 1. This is known as being relatively prime or coprime, meaning they don’t share any common prime factors.
and
obviously don’t share any prime factors, so 3 is a generator using addition (mod 16).
Why does that work? The largest possible multiple of a that will give us 0 (mod n) is n, because a*n == a * 0 == 0 (mod n). We need to make the LCM of a and n equal to a * n, and since we know that LCM(a, n) = a * n / GCD(a, n), it follows that GCD(a, n) = 1.
Since most hash tables you make will have 2^n elements (this seems to be the standard, for addressing reasons), any odd number a will suffice to be a generator for linear hashes.
Theoretically, which hash value should I use?
Linear hashing is obviously a very simple hash function (the simplest one there is, I believe), and therefore, there is not a single hash fucntion that will work for every input set. In fact, this type of hash will have many input sets that will make it have very poor performance. However, if we have advanced knowledge of the kind of data that will be the input, we can stack the deck in our favor.
If your data is guaranteed to have no collisions (mapping unique integers less than the size of the hash table to some value), you can use any positive integer you want as your hash. I recommend 1 for ease of calculation
If your data is sorted ascending, use hash values close to 1. If you can find the mode of the data in advance, you can yourself by setting the hash value larger than the mode. If the mode is large with respect to the size of the hash table or with respect to the size of the data set, you can make the hash value larger than the average number of repetitions for each input.
If your data is sorted descending, you want to do as above, except make your hash value close to n-1. The reasoning can be derived from the above paragraph.
If your data is either purely random, or of several different varieties, your hash function is not always going to work no matter how hard you try. We should avoid hashes close to 1 and n-1, but other than that, we will need to benchmark to see if there is a better value.
Popularity: 34% [?]

Reader Comments
Good post. I note that in your statement “clearly the only way that we can have a multiple of a equal a multiple of b is by multiplying it by a” the last “it” can be misinterpreted as “a”, which I don’t believe is what you intended to state. Maybe replacing this last “it” by “the latter” would be more explicit, and I’m sure you can think of other ways as well. A bit picky, I realize, but precision of thought and expression are so crucial in mathematics.
Since you used this pedantic form, I’ll indulge myself as well in pointing out that on modern CPUs with out-of-order pipelines, this is actually slower than using a temp variable, because the instructions aren’t parallelizable.
Aside from my irrelevant nit, thanks for the interesting article. Number theorists amaze me with their knack for discovering seemingly arbitrary, and in the latest several decades so useful, properties of numbers.
“Linear hashing is obviously a very simple hash … the simplest … I believe”
Theres always the Identity hash.
Aside from that somewhat facetious note- good article, I don’t know what your math background is- but if you haven’t looked at Group Theory, That has lots of implications on Hashing, and cryptography in general (notably, RSA is a fairly simple bit of group theory, involving cyclic groups and some theorems about how the order of an element changes).
Keep on with the math!
@Joe:
I’m in college, with a CS major // Math minor. I’m taking an Abstract Algebra course at the moment, and I may write another set of articles from the perspective of Group Theory. Thanks for the feedback!