One for the Morning Glory: November 2005

Music for everyone

Several years ago (it seems like more than three) I was a student at Brown University. In addition to studying computer science and mathematics, I also took a bunch of courses in the music program, including three years of piano lessons. When I graduated, I gave a recital for friends and family, plus any members of the public who happened to wander in (not many did). And, this being the 21st century, we recorded it and I got a CD of the recording. I edited it, split it into tracks, and handed some copies out to my friends and professors as a going-away present.

Now I'm no longer a student of any sort, and Brown seems a lifetime away, but the music is still sitting on my hard drive. I've always meant to make it available for download, but I couldn't figure out how to get past technical issues of space or bandwidth. A few days ago, though, I had a rather belated bright idea: maybe the Creative Commons project would have links to places where you can host your music for free. Lo and behold, it turns out that the Internet Archive lets anyone upload audio to their server! Better yet, they're even willing to store lossless encodings -- more than willing, actually; FLAC is the preferred encoding of uploads.

So, to make a long story short, click here to hear my bad piano playing. :-)

Lentil-based protesters

NPR just mentioned that some protesters camped outside Bush's ranch are having lentil soup and rice for Thanksgiving to show their solidarity with the Iraqi people, which I found amusing to hear while eating my own evening ration of lentil soup. Here I thought I was just a cheapskate, when all along I've been participating in superficial solidarity with the bombed-out people of Iraq! :-)

Thread pattern recognition

After reading the dpkg-sig thread, I finally realized that I could have saved a lot of time by taking Joey Hess's advice on thread patterns and deleting, or at least ignoring, most of its subthreads.

A type-system hack

WARNING: nothing in this post has been implemented or tested, except maybe at a minimal proof-of-concept level ("it compiles!"). It's just some musings on how you can make the computer do your work for you.

I was reading a defense of Hungarian notation when a familiar notion started tickling the back of my brain. The author explained that the original idea behind Hungarian was not to replicate type information in variable names, but to include some sort of usage information: for example, a string holding an untrusted value would get a name like usName. While that's more reasonable than lpschwzMyVariable, the suggestion made me suspect that the author was really hacking around suckiness in the C language.

Tagging values with usWhatever and then verifying that you never mix usBlah with Blah isn't just tedious and error-prone, it smells a lot like doing work by hand that the computer ought to do for you. It seemed to me that it ought to be possible to "tag" unsafe values at the type level and let the language's type-checker sort things out. The simple approach is of course to define something like this:

```
newtype Unsafe t = Unsafe t
```

NB: I'm using Haskell because it was the first language that sprang to mind when I thought "type system hack".

This newtype declaration lets you attach an Unsafe tag to any type. Presumably the type signatures of routines that return untrusted data (such as a hypothetical getHTMLFormValue or a routine to retrieve previously entered form values from a database, if they aren't sanitized first) should be modified; for instance, if the signature of getHTMLFormValue was

```
getHTMLFormValue :: HTMLForm -> String -> String
```

it should instead read

```
getHTMLFormValue :: HTMLForm -> String -> Unsafe String
```

To create a new Unsafe value you simply apply the Unsafe constructor to the value:

```
getHTMLFormValue form name = Unsafe (oldGetHTMLFormValue form name)
```

To unwrap Unsafes you use pattern-matching:

```
let (Unsafe unwrapped) = wrapped in ...
```
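
Putting those pieces together, here's a minimal sketch that loads in GHCi. It assumes a hypothetical HTMLForm represented as an association list from field names to raw values, and a hypothetical oldGetHTMLFormValue standing in for the untagged lookup; neither is a real library API.

```haskell
-- Tag type: an Unsafe t is a t that came from an untrusted source.
newtype Unsafe t = Unsafe t

-- Hypothetical form representation: (field name, raw value) pairs.
type HTMLForm = [(String, String)]

-- Hypothetical untagged lookup; a missing field yields the empty string.
oldGetHTMLFormValue :: HTMLForm -> String -> String
oldGetHTMLFormValue form name = maybe "" id (lookup name form)

-- The tagged version: same lookup, but the result is marked Unsafe.
getHTMLFormValue :: HTMLForm -> String -> Unsafe String
getHTMLFormValue form name = Unsafe (oldGetHTMLFormValue form name)

-- Unwrapping by pattern matching, as in the let-binding above.
rawLength :: Unsafe String -> Int
rawLength wrapped = let (Unsafe unwrapped) = wrapped in length unwrapped

-- Handing the tagged value to a plain String function is rejected:
--   putStrLn (getHTMLFormValue form "name")  -- Unsafe String is not [Char]
```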

So, this sort of works, but it's unsatisfying on several levels:

It allows the creation of types such as Unsafe (Unsafe Integer) and even Unsafe [Unsafe Integer]. If a value is already unsafe, how can it be "more unsafe"? And what in the world is an unsafe list, or an unsafe list of unsafe values?

Unfortunately, this is probably a limitation of any scheme that relies on type-tagging. One way around it is to define each unsafe type explicitly:
```
newtype UnsafeString = Unsafe String
```
This is arguably a good idea anyway, since virtually all unsafe values (in the sense the previously linked article means) are likely to be strings, and, as I note later, it's not clear how to "sanitize" a non-string. OTOH, imposing unnecessary type restrictions feels un-Haskellish to me, and it seems likely that you can trust the programmer not to pile up Unsafes.
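
To make the nesting concern concrete: with the polymorphic newtype the type checker happily accepts doubly wrapped values, while a per-type wrapper rules them out. The UnsafeStr constructor name is hypothetical (chosen so both declarations fit in one file), and collapse is a hypothetical helper for anyone who does end up with nested tags.

```haskell
-- Polymorphic tag: any type can be wrapped, including an already-wrapped one.
newtype Unsafe t = Unsafe t

-- Monomorphic tag: only Strings can be wrapped, so nesting is impossible.
newtype UnsafeString = UnsafeStr String

-- Both of these type-check, even though "doubly unsafe" means nothing extra.
nested :: Unsafe (Unsafe Integer)
nested = Unsafe (Unsafe 42)

listOfUnsafe :: Unsafe [Unsafe Integer]
listOfUnsafe = Unsafe [Unsafe 1, Unsafe 2]

-- Hypothetical helper: squash two layers of tagging into one.
collapse :: Unsafe (Unsafe t) -> Unsafe t
collapse (Unsafe inner) = inner

-- With the monomorphic wrapper the nested form cannot even be written:
--   UnsafeStr (UnsafeStr "x")  -- type error: UnsafeString is not String
```
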
This scheme makes it very tempting to strip off the unsafeness of a value and lose the fact that you did so.

Since the only way to strip off unsafeness is via pattern-matching, it seems to me that it would be tempting to pattern-match away the unsafeness in function declarations:
```
foo (Unsafe x) = ...
```
This may be just fine, but my preference would be to forbid pattern-matching altogether (by not exporting the constructor to other modules, and exporting an ordinary function that constructs an Unsafe value instead) and to provide two routines for removing the unsafety:
```
escapeData :: Unsafe t -> t
unsafeExtractRawData :: Unsafe t -> t
```
The first routine encodes the data into a "safe" form, while the second simply fetches the wrapped value:
```
unsafeExtractRawData (Unsafe x) = x

foo x = doSomeStuffWith (unsafeExtractRawData x)
```
The rationale here is twofold: first, by giving the dangerous extraction routine a scary and hard-to-type name, you remind yourself to think twice before invoking it (cf unsafePerformIO); second, you can easily audit all the locations where the unsafety is "stripped away" with a textual search.
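
Here's a rough sketch of the hidden-constructor arrangement. In a real project this would live in its own module whose export list leaves out the Unsafe data constructor; everything is in one file here so it can be loaded directly, which is why the module header only appears as a comment. The names mkUnsafe and shout (a stand-in for doSomeStuffWith) are hypothetical.

```haskell
-- In its own module, the export list would hide the data constructor:
--   module Unsafe (Unsafe, mkUnsafe, unsafeExtractRawData) where
-- Exporting "Unsafe" without "(..)" exports only the type, so client
-- modules can neither build nor pattern-match on Unsafe values directly.

newtype Unsafe t = Unsafe t

-- Exported constructor function: the only way for clients to tag a value.
mkUnsafe :: t -> Unsafe t
mkUnsafe = Unsafe

-- Exported, deliberately scary escape hatch. Every call site can be
-- found with a plain text search, e.g. grep -n unsafeExtractRawData
unsafeExtractRawData :: Unsafe t -> t
unsafeExtractRawData (Unsafe x) = x

-- Hypothetical client code: it never mentions the constructor.
shout :: Unsafe String -> String
shout x = unsafeExtractRawData x ++ "!"
```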

There are some issues involving the safe encoding: you probably ought to make a typeclass of things that can be safely encoded from an Unsafe value. That's not too hard, and this entry is already too long.
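
That typeclass might look something like the following sketch. The class name Escapable, its method escape, and the choice of HTML entity escaping as the "safe form" of a string are all assumptions for illustration; a real application would use whatever encoding its output context requires. Note that escapeData picks up a constraint here, becoming Escapable t => Unsafe t -> t. The FlexibleInstances pragma is needed because String is a synonym for [Char].

```haskell
{-# LANGUAGE FlexibleInstances #-}

newtype Unsafe t = Unsafe t

-- Hypothetical class of types that have a "safe" encoded form.
class Escapable t where
  escape :: t -> t

-- Assumed encoding for strings: replace HTML-significant characters.
instance Escapable String where
  escape = concatMap escapeChar
    where
      escapeChar '<'  = "&lt;"
      escapeChar '>'  = "&gt;"
      escapeChar '&'  = "&amp;"
      escapeChar '"'  = "&quot;"
      escapeChar '\'' = "&#39;"
      escapeChar c    = [c]

-- The safe way out of Unsafe: only available for Escapable types.
escapeData :: Escapable t => Unsafe t -> t
escapeData (Unsafe x) = escape x
```
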
The final issue involves manipulation of Unsafe values. So far you can pack and unpack Unsafes; however, this seems like it would get a bit annoying if you had to do any significant operations on them. For instance, searching in an unsafe string for a substring seems like it would be a pain and a bit error-prone (since you'd have to remember to wrap the string back up in an Unsafe). You might think that ideally, all normal String operations should work directly on an Unsafe String or pairs of Unsafe Strings -- you just shouldn't be able to convert between the two.

Unfortunately, I can't see any way of doing everything that you'd want to here. For operations that belong to a typeclass you can; for instance, the following definitions let you compare values wrapped in Unsafe in the usual way:
```
instance (Eq t) => Eq (Unsafe t) where
    (==) (Unsafe v1) (Unsafe v2)   = v1 == v2
    (/=) (Unsafe v1) (Unsafe v2)   = v1 /= v2
```
This admits (Unsafe t) to the Eq class, and looks reasonable enough (assuming that it's safe to compare unsafe values, of course). However, for probably the most important Unsafe values -- Unsafe Strings -- I can't see how to do this. The problem is that String is not a typeclass, it's a synonym for [Char], a list of characters. The important string operations are either defined directly on the String type, or generically on [t]; i.e., "a list of anything". Because there is no typeclass, you simply can't overload these operations to apply to unsafe strings as well as regular strings.

One way around this is to provide an "unwrapping combinator":
```
liftUnsafe :: (t -> t) -> Unsafe t -> Unsafe t
liftUnsafe f (Unsafe v) = Unsafe (f v)
```
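This would automatically convert one-argument string operations (reverse, say) into unsafe-string operations; not ideal, but reasonable enough. Operators are still a little annoying: they take two arguments, so they need a two-argument version of the combinator, and since Haskell operator names can only be built from symbol characters, you have to mangle their names with a symbol rather than a letter:
```
liftUnsafe2 :: (a -> b -> c) -> Unsafe a -> Unsafe b -> Unsafe c
liftUnsafe2 f (Unsafe x) (Unsafe y) = Unsafe (f x y)
(++!) :: Unsafe [t] -> Unsafe [t] -> Unsafe [t]
(++!) = liftUnsafe2 (++)
```
So you have to use ++! instead of ++ to concatenate unsafe lists; not too bad.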
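
Here are the lifting combinators in one place, as a self-contained sketch. The names liftUnsafe2, ++! and shoutFirst are hypothetical, and the infixr 5 fixity is an assumption chosen to match ++. The Functor instance at the end is an aside: liftUnsafe is essentially fmap pinned to functions of type t -> t, and fmap also lets the wrapped type change.

```haskell
import Data.Char (toUpper)

newtype Unsafe t = Unsafe t

-- One-argument lifting: unwrap, apply, rewrap.
liftUnsafe :: (t -> t) -> Unsafe t -> Unsafe t
liftUnsafe f (Unsafe v) = Unsafe (f v)

-- Two-argument lifting, needed for binary operations such as (++).
liftUnsafe2 :: (a -> b -> c) -> Unsafe a -> Unsafe b -> Unsafe c
liftUnsafe2 f (Unsafe x) (Unsafe y) = Unsafe (f x y)

-- Concatenation of unsafe lists; the result stays tagged.
infixr 5 ++!
(++!) :: Unsafe [t] -> Unsafe [t] -> Unsafe [t]
(++!) = liftUnsafe2 (++)

-- fmap generalizes liftUnsafe: the wrapped type may change (String to [String]).
instance Functor Unsafe where
  fmap f (Unsafe v) = Unsafe (f v)

-- Hypothetical example: upper-case the first untrusted input and join it
-- to the second, without ever leaving the Unsafe wrapper.
shoutFirst :: Unsafe String -> Unsafe String -> Unsafe String
shoutFirst a b = liftUnsafe (map toUpper) a ++! Unsafe " " ++! b
```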

While it may not be perfect -- it only makes sure that you don't accidentally violate safety, and doesn't stop you from deliberately doing something dumb -- this approach feels a lot safer to me than ad-hoc mangling of variable names. I used Haskell because you can do type hacks fairly easily in it, but I don't see why you couldn't play a similar game (albeit much less conveniently) in C++, or even in a dynamic language like Python. Of course, Python would give you runtime errors about safety violations instead of static ones, but that still beats getting runtime security holes.

I would add as a hypothesis that it should be possible to express just about any variable-name-mangling technique whose goal is to "make wrong code look wrong" in the type system, provided your type system doesn't suck; and that doing so is preferable to doing it manually, since it means that a lot of repetitive error-checking can be handled by the computer. And since the computer is better than you are at verifying formal invariants, forcing it to handle them whenever possible is always a good idea.
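
As one small illustration of that hypothesis (a sketch with made-up names, not a worked-out design), here's a different naming convention moved into types: instead of prefixing coordinate variables with a tag saying whether they are window-relative or layout-relative, each kind gets its own newtype, and the only way to mix them is an explicit conversion. WindowX, LayoutX, toLayout and shiftLayout are hypothetical.

```haskell
-- Two kinds of horizontal coordinate that a naming convention might
-- otherwise distinguish by prefix. Both are plain Ints underneath.
newtype WindowX = WindowX Int deriving (Eq, Show)
newtype LayoutX = LayoutX Int deriving (Eq, Show)

-- The only sanctioned conversion: needs the window's offset in the layout.
toLayout :: Int -> WindowX -> LayoutX
toLayout offset (WindowX x) = LayoutX (x + offset)

-- Shifting is only defined within one coordinate system.
shiftLayout :: Int -> LayoutX -> LayoutX
shiftLayout d (LayoutX x) = LayoutX (x + d)

-- Mixing them up is a compile-time error rather than a naming-review issue:
--   shiftLayout 5 (WindowX 10)  -- rejected: WindowX is not LayoutX
```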

Thursday, November 24, 2005

Music for everyone

Lentil-based protesters

Wednesday, November 23, 2005

Thread pattern recognition

Monday, November 21, 2005

A type-system hack
