<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>So Jake Says &#187; Python</title>
	<atom:link href="http://www.jakevoytko.com/blog/tag/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jakevoytko.com/blog</link>
	<description>Ye Olde Computer Science Blogge</description>
	<lastBuildDate>Sun, 17 Jan 2010 15:16:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Buddy, Can You Spare A Namespace?</title>
		<link>http://www.jakevoytko.com/blog/2008/09/15/buddy-can-you-spare-a-namespace/</link>
		<comments>http://www.jakevoytko.com/blog/2008/09/15/buddy-can-you-spare-a-namespace/#comments</comments>
		<pubDate>Mon, 15 Sep 2008 04:00:13 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Politics]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages]]></category>
		<category><![CDATA[Quote]]></category>
		<category><![CDATA[C++ compiler]]></category>
		<category><![CDATA[CLAPACK]]></category>
		<category><![CDATA[max() problem]]></category>
		<category><![CDATA[name collisions]]></category>
		<category><![CDATA[Namespace]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/?p=146</guid>
		<description><![CDATA[C++, Namespaces, and Macros You&#8217;re a careful C++ programmer. You use namespaces obsessively, your classes are well-organized, and your files are carefully placed in a nice hierarchy. Name collisions aren&#8217;t going to strike your code! Well, you actually can&#8217;t guarantee that: Namespaces are completely ignored by the macro preprocessor. Short of drastic measures (such as [...]]]></description>
			<content:encoded><![CDATA[<h2>C++, Namespaces, and Macros</h2>
<p>You&#8217;re a careful C++ programmer. You use namespaces obsessively, your classes are well-organized, and your files are carefully placed in a nice hierarchy. Name collisions aren&#8217;t going to strike your code!</p>
<p>Well, you actually can&#8217;t guarantee that: namespaces are completely ignored by the macro preprocessor. Short of drastic measures (such as using #undef on every symbol you use), it is <em>impossible</em> to tell the C++ macro preprocessor, &#8220;You cannot pass!&#8221;</p>
<div id="attachment_158" class="wp-caption aligncenter" style="width: 460px"><a href="http://www.jakevoytko.com/blog/wp-content/uploads/2008/09/cpp-compilation.jpg"><img class="size-full wp-image-158" title="cpp-compilation" src="http://www.jakevoytko.com/blog/wp-content/uploads/2008/09/cpp-compilation.jpg" alt="C++ Compiler at Work" width="450" height="187" /></a><p class="wp-caption-text">C++ Compiler at Work</p></div>
<p>If a macro is defined before the preprocessor reaches your class, it will expand every matching name inside your class, namespace or not.</p>
<p>Does this <span style="text-decoration: underline;">ever</span> happen? Sure! For example, <a href="http://en.wikipedia.org/wiki/LAPACK">CLAPACK</a> defines a macro named &#8220;max,&#8221; which finds the maximum value of two elements. The C++ standard library offers std::numeric_limits&lt;T&gt;::max(), which gives you the maximum value of the T datatype. In fact, CLAPACK and similar libraries can reach through time and hamper std::numeric_limits&#8217; compilation in any project. Surprise!</p>
<p>One name. Two valid uses. One compiler error.</p>
<p>Sure, CLAPACK is a C library, and one has to make allowances for this. In lieu of function overloading, macros are the only tool for approximating generic functions. On the other hand, <a href="http://www.youtube.com/watch?v=IvwyOrTB-rQ">COME ON</a>! Someone else is going to use the name &#8220;max.&#8221;</p>
<p>I don&#8217;t mean to pick on CLAPACK! It&#8217;s a nice library. I like it. In fact, there are <a href="http://math.nist.gov/lapack++/">several C++ versions</a> of LAPACK that alleviate this specific issue. The C version just exemplifies the problem perfectly.</p>
<h2>What&#8217;s My So-Called Point?</h2>
<p>My point is twofold. As a minor aside, I&#8217;m disappointed that this problem isn&#8217;t being addressed in the updated C++ standard. Macro namespaces would have been a nice touch (and another 10 pages of the already-huge standard). It&#8217;s not, so I won&#8217;t dwell on it.</p>
<p>If that were my overall point, you would be justified in leaving me an angry blog comment telling me that this post was a waste of breath. However, this isn&#8217;t my overall point! My <em>real</em> overall point is that <strong>many programmers don&#8217;t take enough advantage of features offered by modern programming languages.</strong></p>
<p>I admit freely that I don&#8217;t, especially for one-shot projects. Today I looked at a well-documented (but poorly designed) project I wrote in early 2007. The comments and function names were good enough that I immediately knew what was going on. I was able to use the header files within minutes.</p>
<p>The downside is that I could immediately see that I designed myself into several corners, the worst being a macro named &#8220;NAME&#8221;.</p>
<p>Why did I do this? Because none of my teachers/professors/programming books ever condemned this kind of usage. I had to figure this out gradually over time.</p>
<h2>The Imperfect Macro Workaround</h2>
<p>Since macro name collision is a real problem, unofficial standards have arisen for dealing with macros. They can usually be summarized by a few rules:</p>
<ul>
<li>Use all-caps.</li>
<li>Use underscores for human-readable spacing.</li>
<li>Add a likely-unique prefix to your macro names.</li>
</ul>
<p>There are a few reasons that the conventions aren&#8217;t good enough, but most of them are edge cases: You can argue that you might find two libraries that use the same prefix and define the same macros, but this is probably about as common as namespace conflicts.</p>
<p>The <span style="text-decoration: underline;">real</span> reason that this isn&#8217;t good enough is that it&#8217;s unenforceable. Any new programmer who just learned how to use macros won&#8217;t be using the convention, and they&#8217;ll be mucking up the preprocessor with macro definitions, which will cause protons to smash, create several black holes, and destroy the universe. I should know: as I mentioned above, I&#8217;ve done it.</p>
<p>Old libraries will also violate the guidelines. So will people with strong opinions who think that the conventions are OBVIOUSLY trash. So will people who code at 3AM on a deadline.</p>
<h2>&#8220;One Honking Great Idea.&#8221;</h2>
<p>The Zen of Python is one of the great programming guides available to everyone. How do you get it? Simply type &#8220;import this&#8221; into the interpreter!</p>
<blockquote>
<pre>&gt;&gt;&gt; import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!</pre>
</blockquote>
<p>The last line is reflected in Python&#8217;s design: loading code from a different file automatically provides you with namespaces. This is awesome: newbies and professionals alike must use namespaces, and best of all, it&#8217;s all done transparently with respect to the programming experience.</p>
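<p>Here is a rough sketch of what that buys you. The module name <code>clapack_like</code> and its <code>max</code> function are made up for illustration, and the module is built in memory with <code>types.ModuleType</code> instead of being loaded from a separate file, purely so the example fits in one listing:</p>

```python
import types

# Hypothetical stand-in for a library that defines its own "max".
# A real module would live in its own file; it is built in memory here
# only to keep the sketch self-contained.
clapack_like = types.ModuleType("clapack_like")


def _pairwise_max(a, b):
    """Return the larger of two values, like a two-argument max macro."""
    return a if a > b else b


clapack_like.max = _pairwise_max

# The library's max and the built-in max live in separate namespaces,
# so neither one clobbers the other.
library_result = clapack_like.max(3, 7)
builtin_result = max([3, 7, 5])

print(library_result, builtin_result)
```

<p>Both names stay usable because the library&#8217;s version is only reachable through its module, which is exactly the guarantee the C preprocessor can&#8217;t give you.</p>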
<p>This is not unique to Python. Modern popular programming languages offer almost-universal support for <a href="http://en.wikipedia.org/wiki/Namespace_(computer_science)">namespaces</a>.</p>
<ul>
<li>Common Lisp uses packages in order to support namespaces.</li>
<li>Ruby and Python use modules.</li>
<li>C++ has explicit namespaces.</li>
<li>Bash scripts use the uniqueness of the file system (a path points to one file).</li>
<li>C uses&#8230;. name prefixes. When the programmer feels like it.</li>
</ul>
<p>Why do all of these languages support some form of namespaces? <strong>Because name collisions really are that bad.</strong> Libraries with major name collisions simply can&#8217;t be used together, and that is NOT the aim of programming.</p>
<h2>What Can We Do?</h2>
<p>I find three things that ultimately help me.</p>
<ol>
<li>Work with people who know a lot more than you do.</li>
<li>Learn new languages.</li>
<li>Read books written by the greats.</li>
</ol>
<p>Right now, I have the good fortune of doing all three.</p>
<p><strong>Images used in this post</strong></p>
<p>&#8220;<a href="http://www.flickr.com/photos/cjc/228736612/">The Demon Comes</a>&#8221; by Flickr user Islãndßoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2008/09/15/buddy-can-you-spare-a-namespace/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Code Golf: My Thought Process on &#8220;Saving Time&#8221;</title>
		<link>http://www.jakevoytko.com/blog/2008/08/25/code-golf-my-thought-process-on-saving-time/</link>
		<comments>http://www.jakevoytko.com/blog/2008/08/25/code-golf-my-thought-process-on-saving-time/#comments</comments>
		<pubDate>Mon, 25 Aug 2008 04:00:15 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Programming Languages]]></category>
		<category><![CDATA[Code Golf]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Python tricks]]></category>
		<category><![CDATA[Saving time]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/?p=133</guid>
		<description><![CDATA[My week was going so well. I moved into a new apartment that cut my commute time by 2/3, and I found an unsecured network accessible from within the confines of my apartment. Nothing else to do, I was expecting to get some good work done on one of my Lisp projects. My friend Steve [...]]]></description>
			<content:encoded><![CDATA[<p>My week was going so well. I moved into a new apartment that cut my commute time by 2/3, and I found an unsecured network accessible from within the confines of my apartment. Nothing else to do, I was expecting to get some good work done on one of my Lisp projects. My friend <a href="http://www.stephenlombardi.com/blog/">Steve</a> ruined it by sending out the following mass-email:</p>
<blockquote><p><a href="http://codegolf.com/saving-time" target="_blank">http://codegolf.com/saving-time</a> I dare you to beat my score!</p></blockquote>
<p>I go to the site and see that it gives you a problem definition, some examples, and ranks you on Perl, Ruby, Python, and PHP.</p>
<h2><strong>Saving Time</strong></h2>
<p><strong>Input</strong></p>
<p>A single 24-hour time into STDIN, such as 21:37 or 09:03.</p>
<p><strong>Output</strong></p>
<p>A clock laid out like the following:</p>
<blockquote>
<pre>        o
    o       o

 o             o

h               o

 o             o

    m       o
        o</pre>
</blockquote>
<p>This would be the associated output for 21:37. Minutes get rounded down to the nearest 5, and if the &#8216;h&#8217; and &#8216;m&#8217; fall on the same element of the clock, an &#8216;x&#8217; appears instead.</p>
<h2>First approach: Calculate each clock position</h2>
<p>Here is the end result of my first attempt. This may have been my second or third submission: (<strong>Note:</strong> I now know that there are optimizations I could have used)</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">I=<span style="color: #008000;">raw_input</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
T=<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">12</span>
T<span style="color: black;">&#91;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>I<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span>:<span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>12<span style="color: black;">&#93;</span>+=<span style="color: #ff4500;">1</span>
T<span style="color: black;">&#91;</span><span style="color: #008000;">int</span><span style="color: black;">&#40;</span>I<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span>:<span style="color: #ff4500;">5</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>/<span style="color: #ff4500;">5</span><span style="color: black;">&#93;</span>+=<span style="color: #ff4500;">2</span>
L=<span style="color: black;">&#91;</span><span style="color: #483d8b;">'o'</span>,<span style="color: #483d8b;">'h'</span>,<span style="color: #483d8b;">'m'</span>,<span style="color: #483d8b;">'x'</span><span style="color: black;">&#93;</span>
F=<span style="color: #ff7700;font-weight:bold;">lambda</span> i:L<span style="color: black;">&#91;</span>T<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;%9s<span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%s%16s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span>%9s&quot;</span><span style="color: #66cc66;">%</span><span style="color: black;">&#40;</span>F<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">11</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">9</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">8</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">7</span><span style="color: 
black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>,F<span style="color: black;">&#40;</span><span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This baby clocked in at <strong>234 bytes</strong>.</p>
<p>I coded using the algorithm that I&#8217;d use if I had to write the program &#8220;for real&#8221; (for some value of real.)</p>
<p>We have a list that represents the following clock positions:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">T=<span style="color: black;">&#91;</span><span style="color: #ff4500;">12</span>:00, <span style="color: #ff4500;">1</span>:00, <span style="color: #ff4500;">2</span>:00, ... , <span style="color: #ff4500;">11</span>:00<span style="color: black;">&#93;</span></pre></div></div>

<p>We then add 1 in the proper position for the hours, and 2 for the minutes. When we&#8217;re printing, we print the proper character from L=['o','h','m','x']. Easy, huh?</p>
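<p>For anyone who doesn&#8217;t read golfed code for fun, the same algorithm can be sketched in a longer, readable form. This is a Python 3 rewrite and not the submitted program; the function name <code>clock_face</code> and the variable names are my own, and it takes the time as an argument instead of reading STDIN:</p>

```python
def clock_face(time_text):
    """Render the clock for a 24-hour time such as '21:37'."""
    hour_text, minute_text = time_text.split(":")

    # One counter per clock position: index 0 is 12:00, index 1 is 1:00, ...
    slots = [0] * 12
    slots[int(hour_text) % 12] += 1     # mark the hour hand
    slots[int(minute_text) // 5] += 2   # mark the minute hand, rounded down

    # 0 -> empty, 1 -> hour, 2 -> minute, 3 -> both hands on the same slot
    marks = "ohmx"
    layout = "%9s\n%5s%8s\n\n%2s%14s\n\n%s%16s\n\n%2s%14s\n\n%5s%8s\n%9s"
    # Clock positions in the order they are printed: top to bottom,
    # left to right within each row.
    order = (0, 11, 1, 10, 2, 9, 3, 8, 4, 7, 5, 6)
    return layout % tuple(marks[slots[i]] for i in order)


print(clock_face("21:37"))
```

<p>Adding 1 and 2 to the same slot gives 3, which is why &#8216;x&#8217; sits at index 3 of the lookup and no special case is needed for overlapping hands.</p>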
<p>When I came home from work the next day, I saw that my friends&#8217; scores were significantly lower than mine.</p>
<h2>Second Approach: Optimizations and Tricks</h2>
<p>Every byte counts. I talked to two of my friends, Rob (who now holds the lead in Python with 143), and <a href="http://www.stephenlombardi.com/blog/">Steve</a> (who has 148 in Perl, tied with my current score), and started comparing tricks for small parts of the program. I got hints for newlines and list slices, and found an <a href="http://marnanel.livejournal.com/183071.html">awesome ternary hack</a> for Python.</p>
<p>I also did some work to combine steps. Python&#8217;s good support of functional programming building blocks (lambda, apply, map, etc.) played a big part in this. The algorithm is still basically the same, but more happens &#8220;on-the-fly.&#8221;</p>
<p>My code is now as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">I=<span style="color: #008000;">raw_input</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
h=<span style="color: #008000;">int</span><span style="color: black;">&#40;</span>I<span style="color: black;">&#91;</span>:<span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: #66cc66;">%</span>12
m=<span style="color: #008000;">int</span><span style="color: black;">&#40;</span>I<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span>:<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>/<span style="color: #ff4500;">5</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: #483d8b;">''</span><span style="color: #483d8b;">'%9s
%5s%8s
&nbsp;
%2s%14s
&nbsp;
%s%16s
&nbsp;
%2s%14s
&nbsp;
%5s%8s
&nbsp;
%9s'</span><span style="color: #483d8b;">''</span><span style="color: #66cc66;">%</span><span style="color: #008000;">tuple</span><span style="color: black;">&#40;</span><span style="color: #008000;">map</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> c:<span style="color: black;">&#91;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'o'</span>,<span style="color: #483d8b;">'m'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>c==m<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span><span style="color: #483d8b;">'h'</span>,<span style="color: #483d8b;">'x'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>h==m<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>c==h<span style="color: black;">&#93;</span>,<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">11</span>,<span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">10</span>,<span style="color: #ff4500;">2</span>,<span style="color: #ff4500;">9</span>,<span style="color: #ff4500;">3</span>,<span style="color: #ff4500;">8</span>,<span style="color: #ff4500;">4</span>,<span style="color: #ff4500;">7</span>,<span style="color: #ff4500;">5</span>,<span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This puts me at <strong>194 bytes</strong>. I have broken the 200 mark! Note that this program could still be a lot shorter as-is! I just didn&#8217;t know this when I submitted it. I was almost certain I was bottoming out for this method.</p>
<p>A conversation I had with Rob gave him the insight to cut his Python program down from a bytecount in the 160s to a bytecount in the 140s. My program is still a bloated cow in comparison.</p>
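<p>The nested list indexing in the listing above works because Python booleans are integers: <code>False</code> is 0 and <code>True</code> is 1, so <code>[a, b][flag]</code> picks <code>b</code> when the flag is true. A minimal sketch of just that piece follows; the function names <code>mark_for</code> and <code>mark_for_readable</code> are hypothetical, and note that unlike a real conditional, both list elements are always evaluated:</p>

```python
def mark_for(c, h, m):
    """Choose the character for clock position c using list indexing only."""
    # [x, y][flag] evaluates to y when flag is True, because True == 1.
    not_hour = ['o', 'm'][c == m]   # used when position c is not the hour
    is_hour = ['h', 'x'][h == m]    # used when position c is the hour
    return [not_hour, is_hour][c == h]


def mark_for_readable(c, h, m):
    """The same choice spelled out with ordinary conditionals."""
    if c == h:
        return 'x' if h == m else 'h'
    return 'm' if c == m else 'o'


print([mark_for(c, 9, 7) for c in range(12)])
```

<p>The indexing version costs a few bytes less than the spelled-out conditionals, which is the whole point in a byte-counting contest and not a recommendation for everyday code.</p>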
<h2>Third Approach: One-liner</h2>
<p>I decided to write a one-liner to see if I could gain any insight into the problem. It didn&#8217;t end up helping, but it is at least an example of functional programming in Python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">print</span><span style="color: #483d8b;">&quot;%9s<span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%s%16s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span>%9s&quot;</span><span style="color: #66cc66;">%</span>apply<span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> h,m:<span style="color: #008000;">tuple</span><span style="color: black;">&#40;</span><span style="color: #008000;">map</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> c:<span style="color: black;">&#91;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'o'</span>,<span style="color: #483d8b;">'m'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>c==m<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span><span style="color: #483d8b;">'h'</span>,<span style="color: #483d8b;">'x'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>h==m<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>c==h<span style="color: black;">&#93;</span>,<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">11</span>,<span style="color: 
#ff4500;">1</span>,<span style="color: #ff4500;">10</span>,<span style="color: #ff4500;">2</span>,<span style="color: #ff4500;">9</span>,<span style="color: #ff4500;">3</span>,<span style="color: #ff4500;">8</span>,<span style="color: #ff4500;">4</span>,<span style="color: #ff4500;">7</span>,<span style="color: #ff4500;">5</span>,<span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>,apply<span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> a,b:<span style="color: black;">&#40;</span>a<span style="color: #66cc66;">%</span>12,b/<span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>,<span style="color: #008000;">map</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span>,<span style="color: #008000;">raw_input</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">':'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>It is equivalent to this (commented and slightly better spaced) program:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #808080; font-style: italic;"># Reads input, splits on ':', casts to int, and</span>
<span style="color: #808080; font-style: italic;"># processes hour and minute</span>
hour,minute=apply<span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> a,b:<span style="color: black;">&#40;</span>a<span style="color: #66cc66;">%</span>12,b/<span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">map</span><span style="color: black;">&#40;</span><span style="color: #008000;">int</span>, <span style="color: #008000;">raw_input</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">':'</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Based on hour/minute, returns a list of the proper</span>
<span style="color: #808080; font-style: italic;"># character for each location on the clock face</span>
clockChars=<span style="color: #008000;">map</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> currentNum:<span style="color: black;">&#91;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'o'</span>,<span style="color: #483d8b;">'m'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>currentNum==minute<span style="color: black;">&#93;</span>,<span style="color: black;">&#91;</span><span style="color: #483d8b;">'h'</span>,<span style="color: #483d8b;">'x'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>currentNum==minute<span style="color: black;">&#93;</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>currentNum==hour<span style="color: black;">&#93;</span>,<span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">11</span>,<span style="color: #ff4500;">1</span>,<span style="color: #ff4500;">10</span>,<span style="color: #ff4500;">2</span>,<span style="color: #ff4500;">9</span>,<span style="color: #ff4500;">3</span>,<span style="color: #ff4500;">8</span>,<span style="color: #ff4500;">4</span>,<span style="color: #ff4500;">7</span>,<span style="color: #ff4500;">5</span>,<span style="color: #ff4500;">6</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># Prints out the clock face</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: #483d8b;">&quot;%9s<span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%s%16s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%2s%14s<span style="color: #000099; font-weight: bold;">\n</span><span style="color: #000099; font-weight: bold;">\n</span>%5s%8s<span style="color: #000099; font-weight: bold;">\n</span>%9s&quot;</span><span style="color: #66cc66;">%</span><span style="color: #008000;">tuple</span><span style="color: black;">&#40;</span>clockChars<span style="color: black;">&#41;</span></pre></div></div>
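<p>The listings above are Python 2: they rely on <code>raw_input</code>, <code>apply</code>, the <code>print</code> statement, and <code>/</code> doing integer division. As a hedged sketch, the same pipeline might look like this in Python 3. The function name <code>render</code> and the constants are my own, and the time is passed as an argument so the example runs without STDIN:</p>

```python
LAYOUT = "%9s\n%5s%8s\n\n%2s%14s\n\n%s%16s\n\n%2s%14s\n\n%5s%8s\n%9s"
ORDER = (0, 11, 1, 10, 2, 9, 3, 8, 4, 7, 5, 6)


def render(time_text):
    """Return the clock face for a 24-hour time such as '09:03'."""
    # apply(f, args) became f(*args); "/" on ints became "//".
    hour, minute = (lambda a, b: (a % 12, b // 5))(*map(int, time_text.split(":")))

    # map() returns an iterator in Python 3, so tuple() is still needed below.
    clock_chars = map(
        lambda n: [["o", "m"][n == minute], ["h", "x"][hour == minute]][n == hour],
        ORDER,
    )
    return LAYOUT % tuple(clock_chars)


# print is a function in Python 3; input() would replace raw_input().
print(render("09:03"))
```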

<h2>Fourth Approach: I&#8217;m Not Telling!</h2>
<p>In the end, I changed my line of attack completely and developed a version of my program that got me <a href="http://codegolf.com/leaderboard/competition/saving-time/python">2nd place for Python</a> (20th overall as of 10:00 Sunday, August 24th, tied with Steve&#8217;s Perl program).</p>
<p>I will have to bow down and pledge my allegiance to Rob, as I just can&#8217;t get the last 5 bytes.</p>
<p><strong>General Approach</strong></p>
<p>I had a few guidelines that got me down to this program length:</p>
<ul>
<li><strong>Lazy coding</strong>: Don&#8217;t do now what you can save for later.</li>
<li><strong>Count characters for sections</strong>: Count how many characters you use for each particular task. If it seems high, open a new file and try to reduce.</li>
<li><strong>Try everything</strong>: Open up your language reference and look for similar functions that take fewer arguments, or different paradigms.</li>
<li><strong>Every byte matters</strong>: Are you using Unix or Windows newlines? Do you have unnecessary spaces? Can you abuse the language syntax and remove a few bytes?</li>
<li><strong>Count-as-you-go</strong>: If you&#8217;re aiming to hit a certain character limit, count the total characters you&#8217;ve used as you go along. When you get close to that limit, it&#8217;s time to edit down. Is your method going to give you the character count needed? Are you repeating yourself too much?</li>
</ul>
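<p>Counting is easy to script. The sketch below measures the same source text with Unix and Windows newlines; the helper name <code>byte_count</code> and the sample snippet are made up for illustration, and it assumes the judge counts raw UTF-8 bytes, which I haven&#8217;t verified for the contest site:</p>

```python
def byte_count(source, newline="\n"):
    """Count the bytes a submission would occupy with the given line ending."""
    # Split on whatever line endings the text has, then re-join with `newline`.
    lines = source.splitlines()
    return len(newline.join(lines).encode("utf-8"))


# Hypothetical three-line submission, used only to have something to measure.
snippet = "I=raw_input()\nT=[0]*12\nprint T"

unix_size = byte_count(snippet, "\n")
windows_size = byte_count(snippet, "\r\n")
print(unix_size, windows_size)
```

<p>Each Windows line ending costs one extra byte per line break, so a multi-line format string pays for it several times over.</p>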
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2008/08/25/code-golf-my-thought-process-on-saving-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How Do You Write Programs So Non-Programmers Can Read Them?</title>
		<link>http://www.jakevoytko.com/blog/2008/06/23/how-do-you-write-programs-so-non-programmers-can-read-them/</link>
		<comments>http://www.jakevoytko.com/blog/2008/06/23/how-do-you-write-programs-so-non-programmers-can-read-them/#comments</comments>
		<pubDate>Mon, 23 Jun 2008 04:00:01 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[comments]]></category>
		<category><![CDATA[good variable names]]></category>
		<category><![CDATA[hash functions]]></category>
		<category><![CDATA[language]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/2008/06/23/how-do-you-write-programs-so-non-programmers-can-read-them/</guid>
		<description><![CDATA[The Setup In my final semester at TCNJ, I researched cryptographic hash algorithms with a math major. His programming experience was extremely limited: he had only programmed using Matlab, so he knew what functions and variables were, but that was the extent of his knowledge. In school I was a math minor, so we had [...]]]></description>
			<content:encoded><![CDATA[<h2>The Setup</h2>
<p>In my final semester at TCNJ, I researched cryptographic hash algorithms with a math major. His programming experience was extremely limited: he had only programmed using Matlab, so he knew what functions and variables were, but that was the extent of his knowledge. In school I was a math minor, so we had a common ground for communication. However, part of my job was to actually write the programs that our research needed. Not only that, but the math major had to be able to modify what I had written in meaningful ways. I needed to write the clearest code that I had ever written.</p>
<p>So how did I do it? Language, Good Names/Comments, Structure, and Examples. Basically, all of the ordinary tools for making code more readable to programmers.</p>
<h2>Language</h2>
<p>I had a few criteria for selection:</p>
<ul>
<li>To the casual Matlab user, the code written in the language should be readable.</li>
<li>I needed to quickly reach competency.</li>
<li>It should run on any OS platform without hassle.</li>
</ul>
<p>I ended up choosing Python. First of all, downloading it and installing it on any OS is a piece of cake, and running a Python file takes exactly 1 step: fire up the interpreter with the file as an argument. Second of all, it&#8217;s essentially executable pseudo-code. The syntax necessary to write a Python program looks very clean, and the indentation of Python forces a certain level of readability.</p>
<p>Third, I&#8217;ve never explicitly &#8220;known&#8221; Python, but I&#8217;ve always reached the &#8220;get things done&#8221; level of competency within a day of picking it up for a project.</p>
<p>I ignored Java and C++ because getting the compiler tools installed on new Windows machines is a hassle compared to Python: the installation takes 2 steps, and there is just more syntax for the newbie to read. YMMV, but I&#8217;ve always picked Python back up more quickly than I&#8217;ve picked up Java or C++ after a long hiatus, and I &#8220;know&#8221; Java and C++.</p>
<h2>Good Names/Comments</h2>
<p>Extensive comments obviously helped the project. Normally, I describe what a function does at a high level, and then comment any tricky parts. This is enough for programmers, but it turned out to be insufficient for non-programmers. I showed my partner some C++ code in the beginning of the project using this technique and he didn&#8217;t understand how it worked.</p>
<p>At the beginning of each file, I added a description of the structure, and pointers to where he could find key parts of the code. At the beginning of each function, I commented the hell out of it, describing what it was supposed to do. At every logical sub-section of the code, I left a comment describing what it accomplished.</p>
<p>The best addition to commenting was using good names: consistent case, minimize abbreviation for the most useful names, and consistent naming across functions.</p>
<h2>Structure</h2>
<p>Every file that I wrote had to have the same general structure:</p>
<ul>
<li>A big honkin&#8217; description of the contents and purpose of the file.</li>
<li>Functions in increasing order of importance:
<ul>
<li>Small utility functions at top.</li>
<li>Important control functions at the bottom.</li>
</ul>
</li>
<li>Main function with a few examples.</li>
</ul>
<p>Since everything was always in the same place, my partner never had to look hard to find what he needed.</p>
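<p>As a rough illustration of that layout, here is a minimal sketch of what such a file could look like. It is not one of the original research files: the file name, the function names, and the use of Python 3 with the standard <em>hashlib</em> module are all assumptions made for the example.</p>

```python
"""toy_digest.py: hypothetical example of the file layout described above.

Contents: helpers for hashing short text messages with SHA-1.
Where to look: utilities come first, hash_message() is the function you
will usually call, and main() at the bottom shows how to run it.
"""
import hashlib


# --- Small utility functions -------------------------------------------

def text_to_bytes(message):
    """Turn a text message into the raw bytes that the hash consumes."""
    return message.encode("utf-8")


# --- Important control functions ---------------------------------------

def hash_message(message):
    """Return the SHA-1 digest of message as 40 hexadecimal characters."""
    return hashlib.sha1(text_to_bytes(message)).hexdigest()


# --- Main function with a few examples ---------------------------------

def main():
    for example in ["", "abc", "hello world"]:
        print(repr(example), "hashes to", hash_message(example))


if __name__ == "__main__":
    main()
```

<p>Running the file with the interpreter prints one line per example, which gives a non-programmer something concrete to change and re-run.</p>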
<h2>Examples</h2>
<p>The best thing I did to talk to my partner in programming languages was to implement hash algorithms that he already understood. For example, <a href="http://www.jakevoytko.com/src/sha1.txt">here is my implementation of SHA-1</a> [.txt file due to hosting limitations]. I sat down with him and made sure that he knew how to run it, and from there, sent him on his merry way.</p>
<h2>The Result?</h2>
<blockquote><p>&#8220;Actually, it was a lot clearer than I thought it was going to be&#8221;</p></blockquote>
<p>The code was a success. He was able to modify and read it successfully, and do everything that needed to be done.</p>
<h2>Is This Relevant to Ordinary Programming?</h2>
<p>No. I haven&#8217;t added anything new to the discussion, only approached an unusual situation with the usual solutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2008/06/23/how-do-you-write-programs-so-non-programmers-can-read-them/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacGyver&#8217;s MapReduce in Python! Part 2: Local Security</title>
		<link>http://www.jakevoytko.com/blog/2008/02/24/macgyvers-mapreduce-in-python-part-2-local-security/</link>
		<comments>http://www.jakevoytko.com/blog/2008/02/24/macgyvers-mapreduce-in-python-part-2-local-security/#comments</comments>
		<pubDate>Sun, 24 Feb 2008 22:00:42 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[cryptographic hash functions]]></category>
		<category><![CDATA[hash functions]]></category>
		<category><![CDATA[Macgyver's MapReduce]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/2008/02/24/macgyvers-mapreduce-in-python-part-2-local-security/</guid>
		<description><![CDATA[Scope of the Article Intermediate. This is a security discussion, but Python will be used. A basic overview of cryptographic hash functions will also be given. Note that the lowest version of Python is 2.3 (and I have no power to change this), so I may be restricted from performing currently optimal solutions. Feel free [...]]]></description>
			<content:encoded><![CDATA[<h2>Scope of the Article</h2>
<p>Intermediate. This is a security discussion, but Python will be used. A basic overview of cryptographic hash functions will also be given.</p>
<p>Note that the lowest version of Python installed on the machines is 2.3 (and I have no power to change this), so I may be restricted from using currently optimal solutions. Feel free to make suggestions anyway.</p>
<p>For a refresher on the <em>MapReduce</em> process, check out <a href="http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/">Part 1</a>.</p>
<h2>What I&#8217;m Up Against Storing Data Locally</h2>
<p><strong>Worker computers will be attacked.</strong> The people who I share the lab with are my friends, and they will therefore try to do everything they can to mess my day up. Forkbombs and unplugging are probably not out of the question. My friends&#8217; skills run the gamut from *nix familiarity to mastery of college-level undergrad mathematics and cryptography courses.</p>
<p><strong>Attackers have physical access to the machines.</strong> All of the machines are in computer labs. Enough said.</p>
<p><strong>Very small shared drive between computers.</strong> The shared drive is large enough to store the final results, but not intermediate calculations.</p>
<p><strong>All computers have different environments</strong>, almost none of which I have control over. My personal laptop is a 1.6GHz Intel Core 2 Duo, and I do have full control of what software it runs. The head machine in the network is a Blade server running Solaris, and it has Python 2.4. The worker machines are Linux machines running Python 2.3.</p>
<p>For the purposes of this exercise, I am considering my own area of the shared network drive secure. Anything that takes place within the computer&#8217;s memory or within my own area of the shared network drive will be considered &#8220;secure&#8221;, the truth be damned.</p>
<h2>Storing Data on Worker Machines</h2>
<p>As I mentioned above, there is not enough room for me to store intermediate data in the shared network drive, so I need to take advantage of the storage capacity of the worker machines. All of the worker machines are sitting right in a computer lab. These labs see mild-to-moderate use every day, so I can&#8217;t just use a live CD to start a distributed computing program.</p>
<p>Since the local drives of the machines are open for general use, I need to make a strategy to protect my data. First, let&#8217;s look at the naive way to store <em>MapReduce</em> data:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> do_work<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># The only commas are guaranteed to be separators.</span>
    <span style="color: #808080; font-style: italic;"># The work splitting is simplified for this example.</span>
    todo = <span style="color: #008000;">self</span>.<span style="color: black;">work</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span><span style="color: black;">&#41;</span>
    temp_file = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">tmpfile</span>, <span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> todo:
         result = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>mapreduce.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
         temp_file.<span style="color: black;">write</span><span style="color: black;">&#40;</span>result + <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
    temp_file.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>This gets the job done, but it could give me some serious headaches. What&#8217;s to stop someone from altering, inserting, or erasing messages (permissions aside)? We need something called <em>Cryptographic Hash Functions</em> for added security.</p>
<h2>Cryptographic Hash Functions</h2>
<p>A hash function, in general, takes an arbitrary length message and converts it to a <em>digest</em> of a fixed size. There is no guarantee as to whether or not an attacker can determine the original message, or create a second message with the same digest.</p>
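<p>To make the &#8220;fixed size&#8221; part concrete, here is a minimal sketch. It assumes Python 3 and the standard <em>hashlib</em> module (not the Python 2.3 setup on the lab machines), and it uses SHA-1 only as a convenient stand-in for &#8220;a hash function&#8221;.</p>

```python
import hashlib

# Three messages of very different lengths.
messages = [b"", b"a", b"a" * 1000000]

# Whatever the input length, a SHA-1 digest is always 20 bytes (160 bits).
digest_sizes = [len(hashlib.sha1(m).digest()) for m in messages]
print(digest_sizes)  # [20, 20, 20]
```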
<p>A <em>Cryptographic Hash Function</em> is a hash function with the following properties:</p>
<p><strong>Preimage Resistance</strong>: Let <em>hash(message_a) = digest_a</em>. If the attacker is given <em>digest_a</em>, it should be hard to determine <em>message_a</em>. If a hash digest is <em>n</em> bits, the preimage resistance is expected to be <img src='/blog/wp-content/plugins/latexrender/pictures/9aa0ec0374c89d2f7f3d9cd2e05a4bc5_1.0pt.gif' title='2^{n}' alt='2^{n}'  style="vertical-align:-1.0pt;" >.</p>
<p><strong>Second Preimage Resistance:</strong> If the attacker has a message, <em>message_a</em>, it should be hard to find a different <em>message_b</em> so that <em>hash(message_a)=hash(message_b)</em>. If a hash digest is <em>n</em> bits, the second preimage resistance is expected to be <img src='/blog/wp-content/plugins/latexrender/pictures/9aa0ec0374c89d2f7f3d9cd2e05a4bc5_1.0pt.gif' title='2^{n}' alt='2^{n}'  style="vertical-align:-1.0pt;" >.</p>
<p><strong>Collision Resistance:</strong> It should be hard for the attacker to make <em>message_a</em> and <em>message_b</em> so that <em>hash(message_a) = hash(message_b).</em> This is expected to have security of <img src='/blog/wp-content/plugins/latexrender/pictures/66512b246461090416f308a84b08c768_1.0pt.gif' title='2^{n/2}' alt='2^{n/2}'  style="vertical-align:-1.0pt;" >.</p>
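<p>The halved exponent for collisions comes from the birthday bound: the attacker gets to pick both messages, so a match between <em>any</em> two of them counts. A minimal sketch of the effect, assuming Python 3 and a deliberately tiny toy hash (the first 16 bits of SHA-1, a hypothetical construction used only for this demonstration): a collision usually turns up after a few hundred hashes, on the order of 2^8, rather than 2^16.</p>

```python
import hashlib


def tiny_hash(message, n_bits=16):
    """Toy n-bit hash: the first n_bits of SHA-1 (n_bits a multiple of 8)."""
    return hashlib.sha1(message).digest()[: n_bits // 8]


def find_collision(n_bits=16):
    """Hash distinct messages until two of them share a digest."""
    seen = {}  # digest mapped to the first message that produced it
    counter = 0
    while True:
        message = str(counter).encode("ascii")
        digest = tiny_hash(message, n_bits)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1


message_a, message_b, tries = find_collision()
print(message_a, message_b, "collide after", tries, "hashes")
```

<p>The loop is guaranteed to stop, because a 16-bit hash has only 65536 possible digests.</p>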
<p><strong>Message extension resistance:</strong> It should be hard for the attacker, when given only <em>hash(message_a)</em>, to calculate any possible extension <em>hash(message_a || message_b)</em>, where &#8220;||&#8221; is concatenation.<br />
A few examples of cryptographic hash functions:</p>
<ul>
<li>MD2, MD4, MD5 (Designed by Ron Rivest)</li>
<li>Whirlpool (Co-designed by Vincent Rijmen, one of the designers of AES)</li>
<li>SHA-0, SHA-1, SHA-224/256, SHA-384/512 (Published by NIST)</li>
<li>Elliptic Curve Cryptography (Designed by Lenstra, et al.)</li>
</ul>
<p>It should be noted that the MD series is currently considered broken, and it is now considered &#8220;practical&#8221; to find collisions of SHA-0 with beefy resources. SHA-1 has a theoretical collision attack by a group of Chinese researchers (Wang, Yin, Yu).</p>
<h2>Better Data Storage on Worker Machines</h2>
<p>I will use Python 2.3&#8217;s built-in <em>sha</em> module. Strictly speaking, that module computes SHA-1 rather than SHA-0, so the collision estimate below, which is for the weaker SHA-0, is if anything pessimistic. &#8220;But Jake,&#8221; I hear you say, &#8220;you said that it was insecure! Someone might alter one of your messages!&#8221; Let&#8217;s look at the tradeoff here: The Wang, Yin, Yu attack takes theoretically <img src='/blog/wp-content/plugins/latexrender/pictures/26f5c7401cdc59acc94842661057823d_1.0pt.gif' title='2^{33}' alt='2^{33}'  style="vertical-align:-1.0pt;" > operations to complete on average. It&#8217;s not out of the realm of possibility to find a collision on an overpowered home computer in a few weeks, or on a cluster within a few days.</p>
<p>However, all of my data results are going to exist on local machines for a few hours, max, before the computation completes and work is sent home. Even if someone wanted to devote the time and energy to find a collision, they would be altering a single data point, when there could potentially be millions. The attackers would either be creating a completely bad data point that can be sanity checked, or they will be spending all of the computation power to throw off my result by a fraction. Needless to say, I&#8217;m more worried about losing large sets of data than an attacker adding random messages.</p>
<p>I will be using a method of hashing called &#8220;salting&#8221;, where an extra value is added to the hash input to help further disguise the message. This is a common method for storing passwords. As long as the salt isn&#8217;t short, the attacker should be unable to create a collision in reasonable time. I will be using a large &#8220;nonce&#8221; (or <strong>n</strong>umber used <strong>once</strong>).</p>
<p><strong>Note:</strong> I could use encryption for this task, especially with the freely available <em>crypto</em> libraries for Python, but the headache of getting them installed on all of the computers is not what my research is all about. Since it is irrelevant if attackers read the information, I think that it can be shown that encryption and my scheme are both equivalent to finding the encryption key / hash salt of these one-way functions. If you don&#8217;t agree, I&#8217;d really like to know why.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> do_work<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    <span style="color: #808080; font-style: italic;"># The only commas are guaranteed to be separators.</span>
    <span style="color: #808080; font-style: italic;"># The work splitting is simplified for this example.</span>
    todo = <span style="color: #008000;">self</span>.<span style="color: black;">work</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span><span style="color: black;">&#41;</span>
    temp_file = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">tmpfile</span>, <span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
    i = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> todo:
         result = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>mapreduce.<span style="color: #008000;">map</span><span style="color: black;">&#40;</span>item<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
         sha_obj = <span style="color: #dc143c;">sha</span>.<span style="color: #dc143c;">new</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>.<span style="color: black;">salt</span>+result+<span style="color: #008000;">str</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
         t = <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>i<span style="color: black;">&#41;</span>+<span style="color: #483d8b;">','</span>+result+<span style="color: #483d8b;">','</span>+sha_obj.<span style="color: black;">hexdigest</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
         temp_file.<span style="color: black;">write</span><span style="color: black;">&#40;</span>t + <span style="color: #483d8b;">'<span style="color: #000099; font-weight: bold;">\n</span>'</span><span style="color: black;">&#41;</span>
         i+=<span style="color: #ff4500;">1</span>
    temp_file.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>The validation code is also simple:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">def</span> extract_value<span style="color: black;">&#40;</span>line, salt<span style="color: black;">&#41;</span>:
    values = line.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">','</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>values<span style="color: black;">&#41;</span> <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">3</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
    t = salt+values<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>+values<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    sha_obj = <span style="color: #dc143c;">sha</span>.<span style="color: #dc143c;">new</span><span style="color: black;">&#40;</span>t<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> sha_obj.<span style="color: black;">hexdigest</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">!</span>= values<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Hax&quot;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> values<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span></pre></div></div>
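<p>For comparison, here is a hedged sketch of the same record format in Python 3, with one deliberate swap: instead of hashing <em>salt + result + index</em> directly, it uses the standard library&#8217;s <em>hmac</em> module with SHA-1. HMAC is a different construction from the one above; it is designed for exactly this &#8220;secret value plus message&#8221; use and is not open to the message extension problem described earlier. The function name <em>tag_line</em> and the key are hypothetical, and failures return <em>None</em> rather than 0.</p>

```python
import hashlib
import hmac


def tag_line(key, index, result):
    """Build one 'index,result,tag' record protected by HMAC-SHA1."""
    message = (str(index) + "," + result).encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha1).hexdigest()
    return str(index) + "," + result + "," + tag


def extract_value(line, key):
    """Return the stored result, or None if the record fails validation."""
    values = line.strip().split(",")
    if len(values) != 3:
        return None
    index, result, tag = values
    message = (index + "," + result).encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode("ascii"), tag.encode("utf-8")):
        return None
    return result


key = b"hypothetical secret key"
line = tag_line(key, 0, "42")
tampered = "0,43," + line.split(",")[2]  # altered result, original tag

print(extract_value(line, key))      # 42
print(extract_value(tampered, key))  # None
```

<p>As in the original code, this assumes that results never contain commas. Whether <em>hmac</em> is usable on interpreters as old as the lab machines&#8217; is something I would check before relying on it.</p>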

<h2>What’s next?</h2>
<p>Next week we will look into network transmissions of the data. Yes, that post is likely to be a lot like this one. Despite my wishes to the contrary, I need to spend more time researching than blogging.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2008/02/24/macgyvers-mapreduce-in-python-part-2-local-security/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MacGyver&#8217;s MapReduce in Python! Part 1: Theory</title>
		<link>http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/</link>
		<comments>http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/#comments</comments>
		<pubDate>Sun, 17 Feb 2008 19:21:22 +0000</pubDate>
		<dc:creator>Jake</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[distributed computing]]></category>
		<category><![CDATA[Macgyver's MapReduce]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/</guid>
		<description><![CDATA[Scope of This Article Beginner / Intermediate. Assumes some knowledge of basic computer science. If you&#8217;re looking for some serious meat and potatoes, come back next week. If you&#8217;re looking for an article that isn&#8217;t about MapReduce because you&#8217;re sick of hearing about it, come back next month. Introductory B.S. TCNJ has about 20 Linux [...]]]></description>
			<content:encoded><![CDATA[<h2>Scope of This Article</h2>
<p>Beginner / Intermediate. Assumes some knowledge of basic computer science. If you&#8217;re looking for some serious meat and potatoes, come back next week. If you&#8217;re looking for an article that isn&#8217;t about <em>MapReduce </em>because you&#8217;re sick of hearing about it, come back next month.</p>
<h2>Introductory B.S.</h2>
<p>TCNJ has about 20 Linux and 30 Unix machines in computer labs, but a HUGE chunk of their time is spent idling. The labs, for the most part, are pretty sparsely used.</p>
<p>I&#8217;ve been looking for a good professor-excused reason to do some major cluster computing, and finally my time has come! Another student and I are doing cryptography research to design a <a href="http://csrc.nist.gov/groups/ST/hash/sha-3/index.html">new hash algorithm</a> (blog post to follow after Third Quarter 2008!).</p>
<p>Guess what cryptographic hash algorithms need? Lots and lots and lots of statistical testing of millions of separate hashes. It would take my laptop an unfortunate amount of time to process dozens of tests with millions of records even if we <em>could</em> parse 100MBps of data with our design. God help us when we want to change something: all of the testing needs to begin all over again. Yuck!</p>
<p>This situation sounds ideal for <em><a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a></em>. Why should I use my own machine when I can make other machines do it? I am a geek, purveyor of the computer lab, master and commander of all that lay before me. Let&#8217;s put those things to work!</p>
<p>While I&#8217;m on the &#8220;Looking For An Excuse&#8221; train, I&#8217;ve played around with Python extensively in the past and never produced anything substantial. I&#8217;ve always been impressed with the speed with which I&#8217;ve been able to make things in Python, and in this situation, &#8220;Batteries Included&#8221; will help me immensely.</p>
<h2>So what&#8217;s the catch?</h2>
<p>I have absolutely no control over what software is on any of the machines. I get to decide between Java 1.5, Python 2.4, and gcc/g++ 3.4.1. I&#8217;m not allowed to run as root on any of the machines. Any task needs to be run as the user, so I need to be logged in to the machines when the commands are run (if I&#8217;m not right about that, I&#8217;d love to hear it, but every way I&#8217;ve found so far to run a job in the future requires the user to be logged in).</p>
<p>The consolation prize is that they are all pretty beefy, with decent hard drive space (most empty!), 2GB of memory, and dual-core AMD processors.</p>
<h2>System Architecture And Design</h2>
<p>The machines are behind a central server (<em>named &#8220;TomHanks&#8221; for the sake of discussion&#8230; this is not the actual name of the computer</em>) that also doesn&#8217;t do much. The machines don&#8217;t have any contact with the outside world except through <em>TomHanks</em>. They are not all necessarily in the same room, and they are not necessarily running the same operating system.</p>
<p><a title="networkarch1.png" rel="attachment wp-att-53" href="http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/attachment/53/"><img src="http://www.jakevoytko.com/blog/wp-content/uploads/2008/02/networkarch1.png" alt="networkarch1.png" /></a></p>
<p>All of the computers are on the same architecture, and do have access to the same shared drive. However, my space restrictions on the shared drive are tiny, making any kind of shared storage impractical in the general case. The shared drive does provide interesting opportunities, though: my school-given website is accessible through it, making report generation a possibility. On the other hand, the important part of my semester is the research I&#8217;m doing, not the actual MapReduce client, so I&#8217;m going to try to keep it as simple as possible.</p>
<h2>Abstract Design of <em>MapReduce</em></h2>
<p>The idea of <em>MapReduce</em> is beautifully simple, and takes simple ideas from functional paradigms and applies them to the world of distributed computing. Let&#8217;s say that we have a list of data that we want to hash. Picking an arbitrary list, we&#8217;ll use<em> [1, 2, 3, 4, ... n].</em></p>
<p>First, we apply the<em> map()</em> function to every single value of the array. Assuming our function is named <em>f</em>, our result is <em>[f(1), f(2), f(3), ..., f(n)]</em></p>
<blockquote><p><strong>Aside: What is &#8220;map&#8221;?<br />
</strong></p>
<p>When we &#8220;map&#8221; something, all we are doing is applying the same function to every element of a list, and storing the results in another list. For example, if we have the list <em>[1,2,3,4]</em>, and we apply the function<em> f(x)=2x </em>to this list, we get the list<em> [2,4,6,8]</em>.</p>
<p>In this example, it&#8217;s obvious how individual elements were transformed, but that is not always the case.</p>
<p>What are the constraints on the mappings? For my purpose, it works out best if the map is not always one-to-one, meaning that each time we evaluate the function, we can choose to return nothing, one value, or more values. It *can* be one-to-one, and for some tasks, it is necessary.</p></blockquote>
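<p>In Python the aside above looks roughly like this. It is a minimal sketch assuming Python 3; the mapper <em>even_twice</em> is a hypothetical example of a map that returns nothing for some inputs and several values for others.</p>

```python
def double(x):
    """The f(x) = 2x from the aside."""
    return 2 * x


doubled = list(map(double, [1, 2, 3, 4]))
print(doubled)  # [2, 4, 6, 8]


def even_twice(x):
    """Hypothetical mapper: drops odd values and emits even ones twice."""
    if x % 2 == 0:
        return [x, x]
    return []


# Flatten the per-item lists into one result list.
flattened = [value for x in [1, 2, 3, 4] for value in even_twice(x)]
print(flattened)  # [2, 2, 4, 4]
```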
<p>Then, we take the resulting array and evaluate a &#8220;reduce&#8221; function.</p>
<blockquote><p><strong>Aside: What is &#8220;reduce&#8221;?<br />
</strong><br />
&#8220;Reduce&#8221; is perhaps the least obvious name to pick. Other names for this idea are &#8220;fold&#8221; and &#8220;accumulate&#8221;.</p>
<p>When we &#8220;reduce&#8221;, we are just combining all of the values in a list to a single value by applying some function.</p>
<p>How do we reduce the list <em>[1,2,3,4]</em> using addition? <em>1+2+3+4 = 10</em>. We could also have converted it into a CSV string: <em>&#8220;1,2,3,4&#8221;</em> or <em>&#8220;4,3,2,1&#8221;</em>. There are, of course, subtleties that I am skipping. If you would like more of them, please go <a href="http://en.wikipedia.org/wiki/Reduce_%28higher-order_function%29">here</a>.</p></blockquote>
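<p>The same reductions can be sketched with <em>functools.reduce</em>, assuming Python 3 (in Python 2, <em>reduce</em> is a built-in). The lambdas are just one way to write the combining functions.</p>

```python
from functools import reduce

values = [1, 2, 3, 4]

# Addition: ((1 + 2) + 3) + 4
total = reduce(lambda acc, x: acc + x, values)
print(total)  # 10

# CSV string: start from "1" and append each later value.
csv_forward = reduce(lambda acc, x: acc + "," + str(x), values[1:], str(values[0]))
print(csv_forward)  # 1,2,3,4

# Prepending instead of appending yields the reversed string.
csv_backward = reduce(lambda acc, x: str(x) + "," + acc, values[1:], str(values[0]))
print(csv_backward)  # 4,3,2,1
```

<p>The two CSV results are one of the skipped subtleties: when the combining function is not commutative, the order in which values are folded changes the answer.</p>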
<p>And there you have it, <em>MapReduce</em>!</p>
<h2>That&#8217;s It?</h2>
<p>As you would expect, we need to make alterations (either above the hood or under the hood) in order to better apply this to the real world.</p>
<p>For our real model, we will borrow from Google&#8217;s design. Before I read <a href="http://labs.google.com/papers/mapreduce.html">Google&#8217;s paper on MapReduce</a>, I drew out how I thought that the model should work, and what I called <em>PreReduce</em> they called a <em>Combiner</em>, which I like better. It allows us, in certain cases, to do some of the heavy lifting of the <em>Reduce</em> on the satellite machines instead of the central machine.</p>
<p>For example, if we were trying to add the numbers from 1 to 100 trillion, why would we bother passing back the numbers <em>[1, 2, 3, 4, ... n]</em> for the host machine to add together? Addition of integers is commutative and associative, so we can add them on the satellite machine and then send back one large integer instead of an arbitrary number of small integers.</p>
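<p>Here is a minimal sketch of that idea, assuming Python 3. The function names, the round-robin split, and the small <em>n</em> are all hypothetical choices for the example; nothing here is distributed, it only shows how much data would travel back to the host.</p>

```python
def work_units(n, satellites):
    """Deal the integers 1..n out to the satellites, round-robin."""
    return [range(1 + i, n + 1, satellites) for i in range(satellites)]


def combine(unit):
    """Runs on a satellite: collapse a whole work unit into one partial sum."""
    return sum(unit)


def reduce_partials(partials):
    """Runs on the central machine: add up one integer per satellite."""
    return sum(partials)


n = 1000  # stand-in for the 100 trillion in the text
partials = [combine(unit) for unit in work_units(n, 50)]
print(len(partials), reduce_partials(partials))  # 50 500500
```

<p>The host receives 50 integers instead of 1000, and the answer is unchanged because integer addition is commutative and associative.</p>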
<h2>What is our high-level algorithm?</h2>
<p>Ideally, what do we want a client to do?</p>
<ul>
<li>Get Work</li>
<li>Map()</li>
<li>Combine()</li>
<li>Return Work.</li>
<li>Reply to &#8220;Are you alive?&#8221; pings.</li>
<li>Securely send results (as I&#8217;m sure my friends will try to send LoLcat messages to home base).</li>
</ul>
<p>What do we want a server to do?</p>
<ul>
<li>Start all satellite machines.</li>
<li>Respond to requests for data.</li>
<li>Perform <em>Reduce</em></li>
<li>Occasionally check to see which machines are alive.</li>
<li>Generate progress reports (optional).</li>
<li>Track responses and keep a &#8220;Shit List&#8221; of computers that give a few bad responses (throwing the computer in jail for a few minutes and then trusting it again is probably the most practical).</li>
</ul>
<p>So what can we do at a high level to get all of this rolling?</p>
<p><strong>The <em>MapReduce</em> Process:</strong></p>
<ol>
<li><strong>Start a central server that will dole out work tasks</strong>.</li>
<li><strong>The central server starts the satellite machines</strong>. Fortunately, all of the machines share a network drive, so this is a relatively trivial task involving a little SSH action.</li>
<li><strong>The satellite machines phone home for work</strong>. Why do we do it this way instead of letting the central server give work with the start commands? In short, a friend of mine thinks it&#8217;s hysterical to fork-bomb a machine every now and then (and sometimes war erupts during class), so I REALLY can&#8217;t guarantee that ANY of the machines will be working at any given point. If the machines phone home for work, we are more likely to have work units parceled out sooner.</li>
<li><strong>The satellite machines perform map() on every input value. </strong>It is obvious that the satellite machines should be doing the heavy lifting of the map() function. There is no law that says that the resulting map has to be stored in one place, and in this case, it will be stored in 50 places for the time being.</li>
<li><strong>The satellite machines perform combine() on the mapped values.</strong> As mentioned above, this lets us partition off some of the work of the <em>Reduce</em> function to the satellite machines.</li>
<li><strong>The satellite machines return the results to the central machine.</strong></li>
<li><strong>The central machine pings satellite machines that are long in responding.</strong></li>
<li><strong>Repeat Steps 3-6 until all work units are done.</strong></li>
<li><strong>The central server performs <em>Reduce.</em></strong></li>
</ol>
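<p>The numbered process above can be sketched in a single Python 3 process. This is only a toy model under stated assumptions: the class and function names are hypothetical, there is no SSH, networking, pinging, or security, and the &#8220;satellites&#8221; simply run one after another.</p>

```python
from collections import deque
from functools import reduce


class CentralServer:
    """Doles out work units and collects combined results."""

    def __init__(self, work_units):
        self.pending = deque(work_units)
        self.results = []

    def get_work(self):
        """Answer a satellite phoning home; None means no work is left."""
        return self.pending.popleft() if self.pending else None

    def return_work(self, combined):
        """Accept one combined result from a satellite."""
        self.results.append(combined)

    def reduce(self):
        """Final Reduce over everything the satellites sent back."""
        return reduce(lambda acc, x: acc + x, self.results, 0)


def satellite(server, map_fn, combine_fn):
    """Phone home, map, combine, return the result, and repeat."""
    while True:
        unit = server.get_work()
        if unit is None:
            return
        mapped = [map_fn(item) for item in unit]
        server.return_work(combine_fn(mapped))


server = CentralServer([[1, 2, 3], [4, 5], [6]])
# Two hypothetical satellites run one after the other in this process.
for _ in range(2):
    satellite(server, map_fn=lambda x: x * x, combine_fn=sum)
print(server.reduce())  # 91
```

<p>Because the satellites pull work instead of having it pushed to them, the second satellite here finds the queue already empty and exits. That is the same property that lets a fork-bombed machine drop out without stalling the others.</p>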
<h2>What&#8217;s next?</h2>
<p>Next week we will look into how we will organize the SSH connections, how we will log in to the machines, and how we will trust &#8220;secure&#8221; answers when the satellites don&#8217;t have any kind of built-in cryptography.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jakevoytko.com/blog/2008/02/17/macgyvers-mapreduce-in-python-part-1-theory/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
