How much speed are you willing to give up for pretty code?
After experimenting a bit with Haskell and using it for some toy projects, I decided to see whether there was any chance it could be used for a non-toy task; i.e., one where it might one day be necessary to actually have fast code. As a proxy for this, I tried a very simple task: counting the number of lines in a file. As input, I generated a file with a million lines of random data, averaging 40 bytes per line; all the Haskell programs were compiled with the current GHC in Debian unstable, with optimizations enabled.
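One way to generate such a file (a sketch, not necessarily the exact script used; the 20-to-60-character range and the lowercase alphabet are arbitrary choices that average out to about 40 bytes per line):

import Control.Monad (replicateM, replicateM_)
import System.Random (randomRIO)

-- Hypothetical generator for the test input: a million lines of random
-- lowercase letters, 20 to 60 characters each, averaging about 40 bytes.
main :: IO ()
main = replicateM_ 1000000 $ do
  len  <- randomRIO (20, 60 :: Int)
  line <- replicateM len (randomRIO ('a', 'z'))
  putStrLn line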
The baseline is, of course, our friend "wc":
real 0m0.084s
user 0m0.043s
sys 0m0.040s
(This and all other timing data was taken after running "cat /tmp/testwc.txt > /dev/null" to ensure that the file was cached.) Now, it's probably too much to hope that a Haskell program will run as fast as a C program does, since the C program can run in a tight loop. But it seems reasonable that Haskell should be able to compete with the trivial Python implementation of this program -- after all, Python is interpreted and Haskell is compiled. Here's the Python version:
import sys
n = 0
for line in sys.stdin:
    n += 1
print n
Running this simple Python program under "time" produces the following statistics; it comes in a bit less than 10 times slower than the C version:
real 0m0.698s
user 0m0.643s
sys 0m0.050s
The trivial Haskell program to count the lines of its input is:
main = do c <- getContents ; print $ length $ lines c
Note that this is basically identical to the Python program, but expressed in a functional style. Pretty, but how does it stack up?
real 0m6.744s
user 0m6.426s
sys 0m0.213s
OW! Ten times slower than the Python code, and a good 80 times slower than "wc"! The gap can be lowered to "only" 50 times with the following program --
import Data.List (elemIndices)
main = do c <- getContents; print $ length $ elemIndices '\n' c
-- but surely it's possible to do better. I'd expect that the expense here is in the use of lazy streams for IO; eliminating them gives us the following code:
import Foreign (Ptr, mallocArray, peekArray)
import Data.Word (Word8)
import System.IO (Handle, hGetBuf, stdin)

-- bufSize and nlCount come from auxiliary code elided here (sketched below)
wc :: Handle -> IO Int
wc h = do a <- mallocArray bufSize :: IO (Ptr Word8)
          loop a 0
  where loop a n = do amt <- hGetBuf h a bufSize
                      elts <- peekArray amt a
                      (if amt == 0
                          then return n
                          else loop a $! n + nlCount elts)
At this point, we're basically using Haskell as a (bad) imperative language. This function allocates a buffer, repeatedly reads into it, and counts the number of newlines in the buffer (I dropped some auxiliary code to keep the post short). The $! is important; without it, even though this routine is tail-recursive, Haskell will still allocate an unbounded amount of memory to store the return value (because, of course, it's terribly important to remember that you're actually returning "1+2+3", not "6").
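For reference, here's a sketch of what the elided auxiliary code might look like. The buffer size and the newline counter below are my guesses rather than the original definitions, and the main wrapper is just one plausible way to drive wc:

-- Hypothetical stand-ins for the elided helpers; the original
-- program's definitions may well differ.
bufSize :: Int
bufSize = 4096                      -- assumed buffer size; the post doesn't say

nlCount :: [Word8] -> Int
nlCount = length . filter (== 10)   -- 10 is the byte value of '\n'

-- One plausible driver: count the newlines on standard input.
main :: IO ()
main = wc stdin >>= print

Note that nlCount consumes the list that peekArray marshalled out of the buffer, so every bufferful is copied into a Haskell list before it's counted; that's likely part of why even this version isn't fast.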
So, how does this unreadable, hand-optimized version of the code perform?
real 0m1.980s
user 0m1.757s
sys 0m0.094s
That's right -- I've eliminated most uses of higher-level functional techniques, and the Haskell version is still nearly three times slower than an ultra-high-level interpreted language, while also being uglier and harder to read! Ouch. It appears the old performance albatross of functional languages is not quite dead yet.