Sunday, January 29, 2006

How much speed are you willing to give up for pretty code?

After experimenting a bit with Haskell and using it for some toy projects, I decided to see if there was any chance it could be used for a non-toy task; i.e., one where it might one day be necessary to actually have fast code. As a proxy for this, I decided to try a very simple task: counting the number of lines in a file. As input, I generated a file with a million lines of random data and an average of 40 bytes per line; all the Haskell programs were compiled with the current GHC in Debian unstable and with optimizations enabled.

The baseline is, of course, our friend "wc":

real    0m0.084s
user 0m0.043s
sys 0m0.040s


(this and all other timing data was taken after "cat /tmp/testwc.txt > /dev/null" to ensure that the file was cached) Now, it's probably too much to hope that a Haskell program will run as fast as a C program does, since the C program can run in a tight loop. But it seems reasonable that it should be possible for Haskell to be competitive with the trivial Python implementation of this program -- after all, Python is interpreted and Haskell is compiled.

import sys
n = 0
for line in sys.stdin:
n += 1
print n


Running this simple Python program under "time" produces the following statistics, a bit less than 10 times slower than the C version:

real    0m0.698s
user 0m0.643s
sys 0m0.050s


The trivial Haskell program to count the lines of its input is:

main = do c <- getContents ; print $ length $ lines c


Note that this is basically identical to the Python program, but expressed in a functional style. Pretty, but how does it stack up?

real    0m6.744s
user 0m6.426s
sys 0m0.213s


OW! Ten times slower than the Python code, and a good 80 times faster than "wc"! The gap can be lowered to "only" 50 times with the following program --

do c <- getContents; print $ length $ elemIndices '\n' c


-- but surely it's possible to do better. I'd expect that the expense here is in the use of lazy streams for IO; eliminating them gives us the following code:

wc :: Handle -> IO Int
wc h = do a <- mallocArray bufSize :: IO (Ptr Word8)
loop a 0
where loop a n = do amt <- hGetBuf h a bufSize
elts <- peekArray amt a
(if amt == 0
then return n
else loop a $! n + nlCount elts)


At this point, we're basically using Haskell as a (bad) imperative language. This function allocates a buffer, repeatedly reads into it, and counts the number of newlines in the buffer (I dropped some auxillary code to keep the post short). The $! is important; without it, even though this routine is tail-recursive, Haskell will still allocate an unbounded amount of memory to store the return value. (because, of course, it's terribly important to remember that you're actually returning "1+2+3", not "6")

So, how does this unreadable, hand-optimized version of the code perform?

real    0m1.980s
user 0m1.757s
sys 0m0.094s


That's right -- I've eliminated most uses of higher-level functional techniques, and the Haskell version is still two times slower than an ultra-high-level interpreted language, while also being uglier and harder to read! Ouch. It appears the old performance albatross of functional languages is not quite dead yet.

Thursday, January 05, 2006

Gainful employment rules

In December or so, I finally got offered a job with a small software company based in Seattle, starting on January 3rd (Tuesday). This means that I have income again (yay) and I have health insurance for the first time since September (more yay), but it does also mean that my time to work on free software, and Debian, and my blog, will be pretty limited, especially for the next few months.

Speaking of which, I am still looking for someone else to work with me on aptitude, particularly as I'd like to spend my no-longer-copious free time working on other stuff in a language that doesn't suck (probably Haskell). Any vic^H^H^Hvolunteers?