<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>So Jake Says &#187; Hash table</title>
	<atom:link href="http://www.jakevoytko.com/blog/tag/hash-table/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jakevoytko.com/blog</link>
	<description>Ye Olde Computer Science Blogge</description>
	<lastBuildDate>Sun, 17 Jan 2010 15:16:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Number Theory, Hash Tables, and Geometric Progressions</title>
		<link>http://www.jakevoytko.com/blog/2007/09/30/number-theory-hash-tables-and-geometric-progressions/</link>
		<comments>http://www.jakevoytko.com/blog/2007/09/30/number-theory-hash-tables-and-geometric-progressions/#comments</comments>
		<pubDate>Sun, 30 Sep 2007 17:48:13 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[Geometric Sequence]]></category>
		<category><![CDATA[Hash table]]></category>
		<category><![CDATA[Number Theory]]></category>
		<category><![CDATA[Phi]]></category>
		<category><![CDATA[Primitive Root]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/2007/09/30/number-theory-hash-tables-and-geometric-progressions/</guid>
		<description><![CDATA[Or, phi and Loathing in Las Vegas What will this article focus on? This particular article looks at geometric sequences (mod n), and how we can use them instead of linear hashes. A geometric sequence is simply a sequence of powers of some number: 1, a, a^2, a^3, &#8230; So instead of adding the same number [...]]]></description>
			<content:encoded><![CDATA[<p>Or, <strong><img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" > and Loathing in Las Vegas</strong></p>
<h3>What will this article focus on?</h3>
<p>This particular article looks at geometric sequences (mod <em>n</em>), and how we can use them instead of linear hashes. A <strong>geometric sequence</strong> is simply a sequence of powers of some number: 1, <img src='/blog/wp-content/plugins/latexrender/pictures/0cc175b9c0f1b6a831c399e269772661_1.0pt.gif' title='a' alt='a'  style="vertical-align:-1.0pt;" >, <img src='/blog/wp-content/plugins/latexrender/pictures/ebc3d7bedc1f11e08895c3124001cbb5_1.0pt.gif' title='a^2' alt='a^2'  style="vertical-align:-1.0pt;" >, <img src='/blog/wp-content/plugins/latexrender/pictures/0e12d972c205ea4de06749a887ff1ffe_1.0pt.gif' title='a^3' alt='a^3'  style="vertical-align:-1.0pt;" >, &#8230; So instead of adding the same number together a bunch of times, we&#8217;re multiplying it together a bunch of times. And then you subtract one. More on that below!</p>
<h3>First, the math</h3>
<p><strong>Euler&#8217;s Phi Function</strong></p>
<p>When Euler was attempting to generalize <a href="http://www.jakevoytko.com/blog/2007/09/16/number-theory-for-programmers-part-1/">Fermat&#8217;s Little Theorem</a>, he defined a function using the Greek symbol <img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" > (pronounced fee by most people I&#8217;ve encountered). It has a simple job: it takes in a natural number, <em>n</em>, and returns the number of positive integers less than <em>n</em> that are <a href="http://www.jakevoytko.com/blog/2007/09/23/number-theory-for-programmers-part-2/">relatively prime </a>to <em>n</em>. In this article, we are not concerned with <img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >&#8216;s calculation for anything but prime numbers.</p>
<p>It is easy to show that <img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >(p) = p-1 when p is prime: all numbers less than a prime are relatively prime to the prime in question, otherwise it wouldn&#8217;t be prime! Easy proof.</p>
<p>Euler&#8217;s phi function is vital to the RSA encryption algorithm, and is the cornerstone of the generalization of Fermat&#8217;s Little Theorem, but it also makes cameo appearances in many other areas of mathematics.</p>
<p>Examples:</p>
<p><img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >(5) = 4, because gcd(5, 1) = gcd(5, 2) = gcd(5, 3) = gcd(5, 4) = 1.</p>
<p><img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >(6) = 2, because gcd(6, 1) = gcd(6, 5) = 1, but gcd(6, 2) = 2, gcd(6, 3) = 3, and gcd(6, 4) = 2.</p>
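<p>The two examples above can be checked by brute force. The following is only a minimal sketch that counts coprime values directly from the definition; the names <em>gcd_euclid</em> and <em>phi</em> are hypothetical helpers chosen for illustration, and the loop is only sensible for small <em>n</em>.</p>

```cpp
// Euclid's algorithm, repeated here so the sketch stands alone.
unsigned int gcd_euclid(unsigned int a, unsigned int b)
{
    while (b != 0)
    {
        unsigned int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Brute-force phi, straight from the definition in this article:
// count the positive integers k less than n with gcd(n, k) == 1.
unsigned int phi(unsigned int n)
{
    unsigned int count = 0;
    for (unsigned int k = 1; k < n; ++k)
    {
        if (gcd_euclid(n, k) == 1) { ++count; }
    }
    return count;
}
```

<p>For a prime <em>p</em> every <em>k</em> from 1 to <em>p</em>-1 passes the gcd check, which is why the count comes out to <em>p</em>-1.</p>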
<p><strong>Order of a number (mod n)</strong></p>
<p>The <strong>order</strong> of a number <em>s</em> (mod <em>p</em>), where <em>p</em> is an integer, is the smallest positive value of <em>x</em> such that <img src='/blog/wp-content/plugins/latexrender/pictures/6043e99443887a278ea012378e2faf9a_3.5pt.gif' title='s^x \equiv 1(mod\ p)' alt='s^x \equiv 1(mod\ p)'  style="vertical-align:-3.5pt;" >. If no power of <em>s</em> is ever congruent to 1, the order is considered infinite. 6 (mod 10) is an example that never has an answer, because every power of 6 is congruent to 6 (mod 10). Note that any number relatively prime to the modulus does have a finite order, by Euler&#8217;s generalization of Fermat&#8217;s Little Theorem. The laws of the universe won&#8217;t let you off that easy.</p>
<p><strong>Example</strong>:</p>
<p>The order of 2 (mod 7) is 3, because 2^1 == 2, 2^2 == 4, and 2^3 == 8 == 1 (mod 7).</p>
<p>Recall that <img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >(prime) = prime-1, so <img src='/blog/wp-content/plugins/latexrender/pictures/1ed346930917426bc46d41e22cc525ec_2.94444pt.gif' title='\phi' alt='\phi'  style="vertical-align:-2.94444pt;" >(p) = p-1. If order(m) (mod p) is p-1, that means that m is a generator for all numbers (mod p) except p itself! Since this will not generate p, and 0 by extension (since they are in the same congruence class), the powers only take the values 1 through p-1, so we must subtract 1 from our result to get indices that start at 0. So our generator is <em>m</em>, and, writing <em>a</em> for the generator, our hash function is <img src='/blog/wp-content/plugins/latexrender/pictures/a37ed226c9e6b3696ff43f2451cd1c40.gif' title='a^x - 1(mod\ prime)' alt='a^x - 1(mod\ prime)'  align=absmiddle></p>
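<p>As a rough illustration of the two ideas in this section, here is a minimal sketch of an order calculation and of the geometric hash index. The function names <em>order_mod</em> and <em>geometric_index</em> are made up for this example, both use plain repeated multiplication, and both assume a modulus below 2^32 so the products fit in 64 bits.</p>

```cpp
// Smallest x > 0 with s^x == 1 (mod n), or 0 if no power of s is ever 1.
// A finite order is at most n-1, so n-1 steps are enough to decide.
unsigned long long order_mod(unsigned long long s, unsigned long long n)
{
    if (n < 2) { return 0; }
    unsigned long long value = s % n;            // value holds s^x (mod n)
    for (unsigned long long x = 1; x < n; ++x)
    {
        if (value == 1) { return x; }
        value = (value * (s % n)) % n;
    }
    return 0;
}

// Geometric hash index: (g^x mod p) - 1.
// Assumes p is prime and g is a generator (mod p), so g^x mod p is never 0
// and the subtraction cannot underflow.
unsigned long long geometric_index(unsigned long long g,
                                   unsigned long long x,
                                   unsigned long long p)
{
    unsigned long long power = 1;
    for (unsigned long long i = 0; i < x; ++i)
    {
        power = (power * (g % p)) % p;
    }
    return power - 1;
}
```

<p>With the generator 3 (mod 7), the exponents 1 through 6 map to the indices 2, 1, 5, 3, 4, 0, so every slot from 0 to 5 is hit exactly once.</p>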
<p>It is not true that all numbers have a primitive root, but it WAS proved by Legendre that every prime has at least one generator (mod p). Interestingly, according to my college Number Theory textbook, Euler tried his hand at the proof, but was incorrect. To the uninitiated into the <em>Cult of Euler,</em> this would be akin to a team of Michael Jordan clones failing to score a single point in a basketball game against a team of middle school students.</p>
<p>We need to find a number whose first power congruent to 1 is the power p-1. Instead of testing every power, we can (because the order of a number always divides p-1) just test powers where the power divides p-1. If we were looking mod 17, and we knew 3^16 == 1 (mod 17) (which it has to be because of Fermat&#8217;s Little Theorem), then <img src='/blog/wp-content/plugins/latexrender/pictures/d7b213cee95b4b6b3ab6b90cadfed175_1.0pt.gif' title='3^1' alt='3^1'  style="vertical-align:-1.0pt;" >, <img src='/blog/wp-content/plugins/latexrender/pictures/15a774bb3441106ae6145acd8b634821_1.0pt.gif' title='3^2' alt='3^2'  style="vertical-align:-1.0pt;" >, <img src='/blog/wp-content/plugins/latexrender/pictures/a0faf7b4c911b1fd4448c87db5067057_1.0pt.gif' title='3^4' alt='3^4'  style="vertical-align:-1.0pt;" >, and <img src='/blog/wp-content/plugins/latexrender/pictures/130693682fe4d9d5612c6bc6f7df878f_1.0pt.gif' title='3^8' alt='3^8'  style="vertical-align:-1.0pt;" > are the only smaller powers that can possibly be equal to one. We will call this the <strong>generator test</strong>. We can check these particular values quickly through successive squaring. If any of the powers of 3 less than 16 are congruent to 1, then we have a failure, and it is not a generator.</p>
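<p>Successive squaring can be sketched as follows. The article never shows its <em>powmod</em> helper, so the signature below is an assumption, and the code assumes a modulus of at least 1 and below 2^32 so that every product fits in 64 bits.</p>

```cpp
// Computes base^exp (mod m) by successive squaring.
// Assumption: 1 <= m < 2^32, so the intermediate products cannot overflow.
unsigned long long powmod(unsigned long long base,
                          unsigned long long exp,
                          unsigned long long m)
{
    unsigned long long result = 1 % m;   // handles m == 1
    base %= m;
    while (exp > 0)
    {
        if (exp & 1)                     // this bit of the exponent is set
        {
            result = (result * base) % m;
        }
        base = (base * base) % m;        // square for the next bit
        exp >>= 1;
    }
    return result;
}
```

<p>Each pass through the loop handles one bit of the exponent, so a power such as 3^8 costs a handful of multiplications instead of eight.</p>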
<p>If you do not have access to a good way to factor p-1, the following naive method will work well for small numbers. Please note that the preferable way is to factor p-1 and to find all of the divisors of p-1 that way.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;">// ***********************************************************************
// Precondition: p is an odd prime. Returns 0 if no generator is found.
//
// This assumes that you are trying to do this for a small p, without being
// able to factorize p-1 quickly.
//
// powmod(base, exponent, modulus) is assumed to compute base^exponent
// (mod modulus) by successive squaring; it is not defined in this article.
// ************************************************************************
unsigned int find_generator(int p)
{
  int phi_p(p-1);
  std::vector&lt;int&gt; test_powers;  // requires #include &lt;vector&gt;
&nbsp;
  int i;
&nbsp;
  // Naively collect every proper divisor of p-1
  for(i=1; i&lt;phi_p; ++i)
  {
    if(phi_p % i == 0)
    {
      test_powers.push_back(i);
    }
  }
&nbsp;
  for(int test=2; test&lt;p; ++test)
  {
      bool found = true;
&nbsp;
      // Generator test: no proper divisor of p-1 may give a power of 1
      for(i=(int)test_powers.size()-1; i&gt;=0; --i)
      {
        if(powmod(test, test_powers[i], p) == 1)
        {
          found = false;
          break;
        }
      }
&nbsp;
      if(found)
      {
        return test;
      }
   }
&nbsp;
  return 0;
}</pre></div></div>

<h3>So what?</h3>
<p>If we have an element a (mod n), where n is prime, with <img src='/blog/wp-content/plugins/latexrender/pictures/d16434743153552f195e740c1f93bd26_1.0pt.gif' title='a^{n-1} = 1' alt='a^{n-1} = 1'  style="vertical-align:-1.0pt;" >, and <img src='/blog/wp-content/plugins/latexrender/pictures/433d6e4ef1439c920d75200519547732_1.0pt.gif' title='a^{positive\ integer\ less\ than\ n}' alt='a^{positive\ integer\ less\ than\ n}'  style="vertical-align:-1.0pt;" > is not equal to 1 for any exponent smaller than n-1, we have a<strong> generator</strong>. The generator is for a set of integers of size (n-1), which is even for every odd prime n.</p>
<h3>Finding generators is nontrivial</h3>
<p>A downside to this method is that there is no free lunch when it comes to finding generators. You have to find one, although fortunately for us, most primes have a generator that is less than 10, so you can find one by linearly searching. There are a few strategies for picking primes that allow us to (relatively) quickly find a generator (mod p). The one I use is:</p>
<p>Find a prime, <em>p</em>, such that <em>q = </em>2*<em>p</em> + 1 is also prime, and use <em>q</em> as the modulus. Since <em>q</em> - 1 = 2*<em>p</em>, the only two exponents that you have to check against our generator condition are 2 and <em>p</em>; if neither power is congruent to 1, the number being tested is a generator (mod <em>q</em>). This helps reduce the complexity of the test. How do we know if our numbers are prime? Probabilistic primality testing, of course <img src='http://www.jakevoytko.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . It&#8217;s amazing how all of this stuff ties together.</p>
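<p>The shortcut for primes of the form 2*p + 1 can be sketched like this. It is only an illustration: the name <em>is_generator_mod_safe_prime</em> is invented here, the caller is assumed to have already verified that both <em>p</em> and 2*<em>p</em> + 1 are prime, and <em>powmod</em> is repeated so the sketch stands alone.</p>

```cpp
// base^exp (mod m) by successive squaring; assumes 1 <= m < 2^32.
unsigned long long powmod(unsigned long long base,
                          unsigned long long exp,
                          unsigned long long m)
{
    unsigned long long result = 1 % m;
    base %= m;
    while (exp > 0)
    {
        if (exp & 1) { result = (result * base) % m; }
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

// Assumes p and q = 2*p + 1 are both prime (not checked here).
// Because q - 1 = 2*p, the only proper divisors of q - 1 other than 1
// are 2 and p, so those are the only exponents the generator test needs.
bool is_generator_mod_safe_prime(unsigned long long g, unsigned long long p)
{
    const unsigned long long q = 2 * p + 1;
    if (g % q == 0) { return false; }        // 0 never generates anything
    return powmod(g, 2, q) != 1 && powmod(g, p, q) != 1;
}
```

<p>For p = 5 and q = 11, the candidate 2 passes both checks, while 3 fails because 3^5 == 1 (mod 11) and 10 fails because 10^2 == 1 (mod 11).</p>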
<p>A professor I had for a cryptology course said that the odds of the first generator NOT being less than 10 have been shown to be inordinately small, but I can&#8217;t for the life of me find any sort of reference to a figure that states that. As there is no trivial way to find a generator, it is acceptable to search for the first generator (mod p) linearly, using our generator test, if you are looking for just any generator of p. Likewise, you can find the largest generator (mod p) by searching in reverse.</p>
<p><strong>This is so complicated. Why would I use this over a linear hash?</strong></p>
<ul>
<li>The elements selected are not at a fixed interval, so data is usually less likely to cluster, which results in fewer collisions</li>
<li>It does better at the <strong>avalanche test</strong>, which says that when a single bit of the input changes, about half of the bits of the output should change on average. The linear hash fails miserably at this, and geometric hashes (depending on your generator, of course) perform better than their linear counterparts.</li>
</ul>
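<p>One rough way to look at the avalanche behaviour mentioned above is to flip one input bit and count how many output bits differ. This is only a sketch of the measurement, not a benchmark; <em>bits_changed</em> and <em>linear_hash</em> are hypothetical names used for illustration.</p>

```cpp
// Counts the bit positions in which two hash outputs differ.
unsigned int bits_changed(unsigned int a, unsigned int b)
{
    unsigned int diff = a ^ b;
    unsigned int count = 0;
    while (diff != 0)
    {
        diff &= diff - 1;    // clears the lowest set bit
        ++count;
    }
    return count;
}

// The linear hash a*x (mod n), widened to 64 bits to avoid overflow.
unsigned int linear_hash(unsigned int a, unsigned int x, unsigned int n)
{
    return (unsigned int)(((unsigned long long)a * x) % n);
}
```

<p>For example, with a = 3 and n = 16, the inputs 4 and 5 differ in one bit and hash to 12 and 15, which differ in two of the four output bits. Averaging this count over many inputs and bit positions gives a crude avalanche score for any hash you want to compare.</p>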
<p>Sometime in the future (not in the next post, though), I will develop benchmarks to see which method deals better with various input scenarios. There&#8217;s no sense in developing the mathematics if we don&#8217;t actually put it all on the line and see if the &#8220;better&#8221; method works better in the real world. The real world has an amazing way of yelling &#8220;surprise!&#8221;, but we can limit that surprise through testing, testing, testing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2007/09/30/number-theory-hash-tables-and-geometric-progressions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Number Theory for Programmers, Part 2</title>
		<link>http://www.jakevoytko.com/blog/2007/09/23/number-theory-for-programmers-part-2/</link>
		<comments>http://www.jakevoytko.com/blog/2007/09/23/number-theory-for-programmers-part-2/#comments</comments>
		<pubDate>Sun, 23 Sep 2007 21:40:46 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Math]]></category>
		<category><![CDATA[GCD]]></category>
		<category><![CDATA[Greatest Common Divisor]]></category>
		<category><![CDATA[Hash]]></category>
		<category><![CDATA[Hash table]]></category>
		<category><![CDATA[Number Theory]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/2007/09/23/number-theory-for-programmers-part-2/</guid>
		<description><![CDATA[What is Number Theory? Number theory is the study of numbers, their properties, and what can be inferred from their properties. For programmers, it is most practical to focus on the theory of positive integers. Who should use this guide? Those who did not know the answer to the above question Those who are interested [...]]]></description>
			<content:encoded><![CDATA[<h3>What is Number Theory?</h3>
<p>Number theory is the study of numbers, their properties, and what can be inferred from their properties. For programmers, it is most practical to focus on the theory of positive integers.</p>
<h3>Who should use this guide?</h3>
<ul>
<li>Those who did not know the answer to the above question</li>
<li>Those who are interested in the math behind hash functions</li>
<li>Those who found my last article interesting</li>
</ul>
<h3>What will this article focus on?</h3>
<p>This article will focus on using the integers (mod <em>n</em>) as indices of a <a href="http://en.wikipedia.org/wiki/Hash_table">hash table</a>, and the math behind different choices of hash functions. Our goal is to find a &#8220;good&#8221; hash function (see below). The mathematical explanation will be done irrespective of Group Theory, and I may write another article to look at a hash table as a group over addition or multiplication of the integers (mod <em>n</em>). For a quick refresher of the (mod <em>n</em>) concept, go <a href="http://www.jakevoytko.com/blog/2007/09/16/number-theory-for-programmers-part-1/">here</a>, or for another explanation, please look <a href="http://www.math.csusb.edu/notes/rel/node4.html">here</a>.</p>
<h3>Useful Tools</h3>
<h3>Greatest Common Divisor (GCD) of positive integers</h3>
<p><strong>Explanation:</strong></p>
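<p>Mathematically, the greatest common divisor of two numbers a and b is the largest integer that divides both a and b. Equivalently, it is the product of all the prime factors that a and b have in common, each raised to the smaller of its two powers. For a simple explanation as to why, look <a href="http://en.wikipedia.org/wiki/Euclidean_algorithm#Proof">here</a>.</p>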
<p><strong>The Algorithm:</strong></p>
<p><strong>Naive</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">int</span> gcd<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">int</span> a, <span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">int</span> b<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">unsigned</span> <span style="color: #0000ff;">int</span> remaind<span style="color: #008080;">;</span>
&nbsp;
    <span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span><span style="color: #000040;">!</span>a<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span> <span style="color: #0000ff;">return</span> b<span style="color: #008080;">;</span> <span style="color: #008000;">&#125;</span> <span style="color: #666666;">// gcd(0, b) = b</span>
    <span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span><span style="color: #000040;">!</span>b<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span> <span style="color: #0000ff;">return</span> a<span style="color: #008080;">;</span> <span style="color: #008000;">&#125;</span> <span style="color: #666666;">// gcd(a, 0) = a</span>
&nbsp;
    <span style="color: #0000ff;">if</span><span style="color: #008000;">&#40;</span>a <span style="color: #000040;">&amp;</span>lt<span style="color: #008080;">;</span> b<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        a <span style="color: #000040;">^</span><span style="color: #000080;">=</span> b<span style="color: #008080;">;</span>  <span style="color: #666666;">// Swap a and b in place</span>
        b <span style="color: #000040;">^</span><span style="color: #000080;">=</span> a<span style="color: #008080;">;</span>
        a <span style="color: #000040;">^</span><span style="color: #000080;">=</span> b<span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">while</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#40;</span>remaind <span style="color: #000080;">=</span> a <span style="color: #000040;">%</span> b<span style="color: #008000;">&#41;</span> <span style="color: #000040;">&amp;</span>gt<span style="color: #008080;">;</span> <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        a <span style="color: #000080;">=</span> b<span style="color: #008080;">;</span>
        b <span style="color: #000080;">=</span> remaind<span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0000ff;">return</span> b<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p><strong>Binary: </strong>(It&#8217;s <strong>always</strong> worth it to try to find the algorithms that take advantage of working with bits. If life gives you an integer as the sum of powers of two, make lemonade <img src='http://www.jakevoytko.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> )</p>
<p>Wikipedia has a <a href="http://en.wikipedia.org/wiki/Binary_GCD_algorithm">page</a> that explains a GCD algorithm that takes advantage of the binary format of the data. It reduces the problem by stripping out common factors of two, and then applying the binary analogue of the GCD algorithm. For more details, follow the above link. I haven&#8217;t benchmarked it, but it relies heavily on bit operations, so it may run a little faster on modern popular architectures.</p>
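<p>For reference, one common way to write the binary algorithm is sketched below. This is not taken from the linked page verbatim; it is a minimal version using only shifts, subtraction, and comparisons, and the name <em>binary_gcd</em> is chosen for illustration.</p>

```cpp
unsigned int binary_gcd(unsigned int a, unsigned int b)
{
    if (a == 0) { return b; }
    if (b == 0) { return a; }

    // Strip out the factors of two that a and b have in common.
    unsigned int shift = 0;
    while (((a | b) & 1u) == 0)
    {
        a >>= 1;
        b >>= 1;
        ++shift;
    }

    // Any remaining factors of two in a are not shared, so drop them.
    while ((a & 1u) == 0) { a >>= 1; }

    while (b != 0)
    {
        // Likewise, factors of two left in b cannot be part of the gcd.
        while ((b & 1u) == 0) { b >>= 1; }

        // Keep a <= b, then subtract: both are odd, so b - a is even.
        if (a > b)
        {
            unsigned int t = a;
            a = b;
            b = t;
        }
        b -= a;
    }

    // Put the common factors of two back.
    return a << shift;
}
```

<p>Whether this actually beats the remainder-based version depends on the compiler and the hardware, so it is worth measuring before committing to it.</p>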
<h3>Least Common Multiple (LCM) of positive integers</h3>
<p><strong>Explanation:</strong></p>
<p>The least common multiple is as it sounds: the smallest multiple that both <em>a</em> and <em>b</em> share. For example:<br />
LCM(15, 20) = 60.</p>
<p><img src='/blog/wp-content/plugins/latexrender/pictures/cb3e98c4c0a1ad7600b28db8a0587ce6_1.0pt.gif' title=' 15 = 3^{1} * 5^{1} ' alt=' 15 = 3^{1} * 5^{1} '  style="vertical-align:-1.0pt;" ><br />
<img src='/blog/wp-content/plugins/latexrender/pictures/25f505db899a6b15e31310cfe2837b22_1.0pt.gif' title='20 = 2^{2} * 5^{1}' alt='20 = 2^{2} * 5^{1}'  style="vertical-align:-1.0pt;" ><br />
<img src='/blog/wp-content/plugins/latexrender/pictures/8fd773a9cfb91b509f5943cfeed1ae0d_1.0pt.gif' title='60 = 3^{1} * 2 ^{2} * 5^{1}' alt='60 = 3^{1} * 2 ^{2} * 5^{1}'  style="vertical-align:-1.0pt;" ></p>
<p>It appears that for each prime, the LCM of <em>a</em> and <em>b</em> includes the largest power from either <em>a</em> or <em>b</em>. In fact, this is true.</p>
<h3>Relation between GCD and LCM</h3>
<p>For integers a and b:</p>
<p><img src='/blog/wp-content/plugins/latexrender/pictures/ed271db0080f343fce6f6125b77c3872_3.5pt.gif' title='LCM(a, b)\ *\ GCD(a, b)\ =\ a\ *\ b' alt='LCM(a, b)\ *\ GCD(a, b)\ =\ a\ *\ b'  style="vertical-align:-3.5pt;" ></p>
<p>This is very powerful, and lets us efficiently calculate the LCM of a and b by dividing a * b by GCD(a, b). Why does this work? If <em>a</em> and <em>b</em> don&#8217;t have any prime factors in common, clearly the only way that we can have a multiple of <em>a</em> equal some multiple of <em>b</em> is by multiplying <em>b</em> by <em>a</em>. If <em>a</em> and <em>b</em> only have one prime factor in common (let&#8217;s call it <em>d</em>), and you multiply <em>a</em> by <em>b</em>, we get a*b as an answer. However, (a*b)/d is clearly a multiple of both <em>a</em> and <em>b</em>. We don&#8217;t need to multiply <em>a</em> by <em>d</em>, since <em>a</em> already HAS <em>d</em> as a factor. <em>d</em> is, not coincidentally, the GCD of <em>a</em> and <em>b</em>, and clearly, GCD(a, b) * LCM(a, b) = a * b. An actual proof is left as an exercise to the reader.</p>
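<p>In code, the relation above turns into a very short LCM routine. The sketch below uses hypothetical names (<em>gcd_u64</em>, <em>lcm_u64</em>) and divides before multiplying, which is a small design choice to keep the intermediate value from overflowing sooner than it has to.</p>

```cpp
// Euclid's algorithm on 64-bit values.
unsigned long long gcd_u64(unsigned long long a, unsigned long long b)
{
    while (b != 0)
    {
        unsigned long long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// LCM(a, b) = a * b / GCD(a, b), computed as (a / GCD) * b.
// Returns 0 if either input is 0, since no positive common multiple exists.
unsigned long long lcm_u64(unsigned long long a, unsigned long long b)
{
    if (a == 0 || b == 0) { return 0; }
    return (a / gcd_u64(a, b)) * b;
}
```

<p>With the example from the previous section, GCD(15, 20) = 5, so the LCM is (15 / 5) * 20 = 60.</p>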
<h3>What makes a good hash // hash table?</h3>
<p>The short answer is that nobody knows. Hashes that work well for some kinds of inputs can produce intractable results for other kinds of input. For our purposes, we will say that a good hash function minimizes the odds of two different inputs ending up in the same congruence class (mod <em>n</em>). When two different inputs DO end up in the same index, this is called a <strong>collision</strong>, and is as undesirable in hash tables as it is while driving. Also bad is <strong>clustering</strong>, which is when collisions are much more likely to happen in certain indices than in other indices.</p>
<p>Ideally, we would like the hash function to be able to place elements at any index in the table. This makes it a <strong>generator</strong>, namely, it can generate any value in the table.</p>
<p>We will try to find a happy medium of all concerns through experimentation. I will define a few different hash functions in the upcoming articles, and then will show how to compare them. That will be where the &#8220;<a href="http://www.xkcd.com/store/try_science_shirt_300.png">Science</a>&#8221; part of Computer Science enters the picture <img src='http://www.jakevoytko.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>Linear Hashes<br />
</strong></p>
<p>Linear hashes take in some number <em>x</em>, and place the object in the index <em>ax </em>+ <em>b </em>(mod <em>n</em>). To make the mathematics easier, we will just use <em>ax</em>(mod <em>n</em>), as it should be obvious that adding <em>b</em> produces the set in the same order, but with a different starting point. In order for us to consider <em>a</em> as a hash function, <em>a</em> must be a generator (mod <em>n</em>). How do we know that it does that? Let&#8217;s look at a few different values of <em>a</em> (mod 16).</p>
<p><em><img src='/blog/wp-content/plugins/latexrender/pictures/701067ed5d646af1c269d1bb85bd3e69_1.0pt.gif' title='2^1 = 2' alt='2^1 = 2'  style="vertical-align:-1.0pt;" > </em>: {2, 4, 6, 8, 10, 12, 14, 0, 2} (mod 16) (doesn&#8217;t generate the integers (mod 16))</p>
<p><em><img src='/blog/wp-content/plugins/latexrender/pictures/d7b213cee95b4b6b3ab6b90cadfed175_1.0pt.gif' title='3^1' alt='3^1'  style="vertical-align:-1.0pt;" > = 3</em>: {3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3} (mod 16) (generates the integers (mod 16))</p>
<p><em>2*3 = 6</em>: {6, 12, 2, 8, 14, 4, 10, 0, 6} (mod 16) (doesn&#8217;t generate the integers (mod 16)).</p>
<p>So what works? It works when gcd(<em>a</em>, <em>n</em>) = 1. This is known as being <strong>relatively prime</strong> or <strong>coprime</strong>, meaning they don&#8217;t share any common prime factors. <img src='/blog/wp-content/plugins/latexrender/pictures/d7b213cee95b4b6b3ab6b90cadfed175_1.0pt.gif' title='3^1' alt='3^1'  style="vertical-align:-1.0pt;" > and <img src='/blog/wp-content/plugins/latexrender/pictures/27eac782422adb62c41a6f3c2c99a5d1_1.0pt.gif' title='2^4' alt='2^4'  style="vertical-align:-1.0pt;" > obviously don&#8217;t share any prime factors, so 3 is a generator using addition (mod 16).</p>
<p>Why does that work? We want the sequence a, 2a, 3a, &#8230; to visit every index before it returns to 0 (mod n), so the first multiple of a that is congruent to 0 (mod n) must be a * n. The first multiple of a that is also a multiple of n is LCM(a, n), so we need the LCM of a and n to equal a * n, and since we know that LCM(a, n) = a * n / GCD(a, n), it follows that GCD(a, n) = 1.</p>
<p>Since most hash tables you make will have 2^k elements (this seems to be the standard, for addressing reasons), any odd number <em>a</em> will suffice to be a generator for linear hashes.</p>
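<p>The three (mod 16) examples above can be reproduced by stepping through the multiples of <em>a</em> until the sequence returns to 0. The following is a minimal sketch; <em>cycle_length</em> is a hypothetical name, and the code assumes n is at least 1 and small enough that 2n fits in an unsigned int.</p>

```cpp
// Number of steps before a, 2a, 3a, ... (mod n) first returns to 0.
// a is a generator for the linear hash exactly when this equals n.
unsigned int cycle_length(unsigned int a, unsigned int n)
{
    unsigned int index = a % n;
    unsigned int steps = 1;
    while (index != 0)
    {
        index = (index + a % n) % n;
        ++steps;
    }
    return steps;
}
```

<p>The result is always n / GCD(a, n): 2 and 6 both come back to 0 after 8 steps (mod 16), while 3 takes the full 16 steps and so reaches every index.</p>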
<p><strong>Theoretically, which hash value should I use?</strong></p>
<p>Linear hashing is obviously a very simple hash function (the simplest one there is, I believe), and therefore, there is not a single hash function that will work for every input set. In fact, this type of hash will have many input sets that will make it have very poor performance. However, if we have advance knowledge of the kind of data that will be the input, we can stack the deck in our favor.</p>
<p>If your data is guaranteed to have no collisions (mapping unique integers less than the size of the hash table to some value), you can use any positive integer you want as your hash. I recommend 1 for ease of calculation <img src='http://www.jakevoytko.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>If your data is sorted ascending, use hash values close to 1. If you can find the mode of the data in advance, you can help yourself by setting the hash value larger than the mode. If the mode is large with respect to the size of the hash table or with respect to the size of the data set, you can make the hash value larger than the average number of repetitions for each input.</p>
<p>If your data is sorted descending, you want to do as above, except make your hash value close to n-1. The reasoning can be derived from the above paragraph.</p>
<p>If your data is either purely random, or of several different varieties, your hash function is not always going to work no matter how hard you try. We should avoid hashes close to <em>1</em> and <em>n-1, </em>but other than that, we will need to benchmark to see if there is a better value.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2007/09/23/number-theory-for-programmers-part-2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
