Tuesday, May 31, 2005

Vacations and Camera Frustration

Last evening, I got back from a three-day trip to San Juan with Kate. It was a great vacation -- the islands are beautiful, we had some really nice (albeit expensive!) meals, and there's a lot of nice hiking and pretty countryside. We stayed at a B&B called "Olympic Lights", which is basically an old farmhouse that has been converted to serve as an inn by a nice older couple; I would recommend it to anyone else spending a night out there (but they only have 4 rooms, so you'll need to book in advance if it's a busy weekend).

Anyway, in the course of this vacation, I took a pile of photographs with my camera. This was sort of the last straw for me: I've been slowly accumulating tons of digital images and just dumping them in a nearly-unstructured pile, and the thought of adding these new images onto the heap was too much. So, I decided to bite the bullet and try out some of the image-management programs that have cropped up over the last few years. Sadly, none of them could really do what I want. It seems pretty simple to me; I really just need a program that can:

  • Import my existing images into its database, preserving structure if appropriate.
  • Store my images in some sort of database structure which allows me to quickly find pictures corresponding to a particular date (or a particular series of images copied off the camera at once) or relating to a particular subject. For instance, sorting into albums/subalbums (i.e., directories) or attaching tags to the images would fulfill this requirement.
  • View subsets of the database (e.g., individual albums, or all the images more recent than 5/2/2005) as thumbnails.
  • Easily and efficiently perform lossless rotation of images by selecting one with the arrow keys and hitting a hotkey -- clicking down to a sub-sub-menu option does not qualify. Bonus points for batch operations (or the ability to safely perform rotations in the background).

Unfortunately, my brief survey came up empty. Only a few programs really seem to provide an abstract "view" of my collection of images, and those often make it far too annoying to perform the one image manipulation operation that I really want, lossless rotation (and that's despite providing a huge number of redundant editing functions that would be better performed in the GIMP). For instance, to rotate images in digikam, you have to pop up a new window (obnoxious) and perform a lossy rotation. (Note: after installing the kipi-plugins package, I now have a 'rotate' option from a context sub-menu, but I have no idea if it's lossless; in any event, it's far too inconvenient to be seriously useful in fixing up a batch of images that I'm copying off my camera). album-shaper struck out immediately for not having a recursive import function -- yeah, I'm going to individually click down to every single one of the dozens of directories where I already have pictures. Right.

Perhaps the closest program to what I want was f-spot, which basically puts your whole image collection in date order and lets you place images in categories using "tags". Unfortunately, it seems to be very immature -- double-clicking on an image reliably crashes it, some perfectly fine images show up with blank thumbnails, and while rotation buttons/menu items are present, they don't actually do anything (it's not documented whether they are supposed to do anything, and if so, whether they'd be lossless). And I have to say that the tagging interface is annoying enough that I wonder if I'd actually want to use this over the long-term: you have to navigate a bunch of submenus in order to pick a tag, rather than (say) choosing from a permanently available list of checkboxes. This is made especially annoying by the fact that there is a permanently available list of checkboxes, but it's used to limit the display rather than modifying the selected image.

I guess the old adage is right: if you want something done properly, you need to do it yourself. So many things to do properly, so little time...

Friday, May 27, 2005

Employment Catch-22

In order to get a job doing {X}, you must first have 5 years of professional experience doing {X}. Fun, no?

Thursday, May 19, 2005

I did it!

I am now officially the holder of an MS degree in Computer Science [and Engineering], and will be joining the ranks of the .. er .. unemployed. Yay me. (I have graduation photos, but I'm too lame to get real web hosting, so they aren't available for now)

Life Lesson of the Week: do not under any circumstance place all your information about job applications on a laptop, then fly across the country with said laptop -- at least, not if you're going to leave its charger behind.

Thursday, May 12, 2005

License Insanity

I've generally managed to avoid non-free software lately, but the other day I was asked to install the new Adobe reader at work, and (having an attack of morbid curiosity) I decided to read its license. Some of the provisions in there are truly jaw-dropping, at least if you've been away from the Gollum software culture for long enough to forget what it's like on the "other side". Bear in mind that you can download this software for free off Adobe's Web site; nonetheless, you are forbidden to:

  • Use anything other than NFS on UNIX or Windows Terminal Services to remotely access the software. This clause excludes, as far as I can tell, NFS on Linux (Linux is not UN*X), SMB on any operating system, more obscure filesystems (Coda, AFS, Intermezzo, etc); it also forbids you from running acroread over an SSH tunnel or via VNC.
  • Back the software up more than once. (<BSA>rotating back-up tapes, are you? Take that, lawbreaker! *smack*</BSA>)
  • Make it more convenient in any way to install the software.
  • Make it more convenient to use the software, by integrating it with other graphics software.
  • Save data in PDF forms on your local filesystem, and don't you dare even THINK about trying to find a workaround! (I think this means that it's illegal for you to use cups-pdf with acroread. You hear that, you cups-pdf-using pirates, you??)
  • Install the software on two computers that might be simultaneously used, unless you download a separate copy for the other computer. Actually, I'm not clear on whether you're allowed to ever use this on more than two computers (they only seem to give permission to install it once, with a special exception for non-simultaneous use on two).
  • Use certain features of the software (not clear what) with documents that haven't obtained permission from Adobe to use those features.
  • Be a citizen of Serbia, Sudan, or Iran (among others).
  • Refuse to perform an audit of your licenses for this software, should Adobe demand that you do so.
  • Install an upgrade of acroread without purging all older versions. (Death to those vile IP-destroying staged rollouts!!)
  • There's also the usual "thou shalt not reverse-engineer" clause. (but if you are permitted by law, then you may reverse-engineer it -- PROVIDED that you first ask Adobe to give you the information and they refuse. That's nice of them, don't you think? Oh, did I mention that requests for information go through Customer Service? And that you are forbidden to tell anyone else what you learned?)

...however, if you refrain from doing all that, Adobe will graciously deign to permit you to run its software. Did I mention that all this is for software you can download for free off the Internet?

With every passing day, I become more convinced that Wonko the Sane had the right idea. And just to reinforce that thought, I am not a lawyer and this really, really, really is not legal advice.

Monday, May 09, 2005

Eat Cold Logic, Feeble Dependency Problems!

As I mentioned in a previous post, I'm exploring new ways of handling dependency problems in the experimental aptitude branch. This (very long) post explains the basic idea and some of its ramifications. Hopefully, I've also told the blog to just put this first paragraph into the RSS feed, so I can ramble at length without making Planet readers want to burn me in effigy.

I started seriously thinking about writing a new resolver back around New Year's, when I had a minor insight. I realized that a core behavior of aptitude, apt-get, and even dselect, a behavior that I've just accepted for years without much thought, is actually a UI deficiency. You see, all of these programs -- indeed, all dependency resolvers that I've used (although I'd love to learn of software that doesn't behave this way!) -- work in a simple way: they find a solution to the dependency problem that you hand them, using some simple heuristics to generate a "reasonable" one, and ask you if it's OK. If not, well, you get to solve the whole thing yourself.

The problem here is that there's no middle ground between fully automatic operation and fully manual operation. But a fully automatic solver will always produce some results you don't want -- this is demonstrated by the fact that two different people, confronted with the same dependency problem, might prefer different resolutions to it. My idea was that the process of resolving dependencies should really be much more interactive, with the user able to instruct the program to go look for a "better" solution. In this model, it might even be acceptable for the initial solution to be "poorer" than apt's solution, as long as it was easy to get to the user's favorite way of handling the situation. If you think of the space of all possible solutions, the program becomes a tool for interactively examining this space (ignoring non-solutions).

It happens that there are a lot of algorithms designed for this sort of situation. My first thought was that best-first search might be a good starting point, as it's simple and it makes it easy to get the "next best" solution. Now, one danger with this algorithm is that it can take exponential time to find a solution -- in fact, any algorithm that is guaranteed to find a solution if one exists will have this problem (assuming P!=NP). However, I was relying (and my experience so far seems to prove me right) on the fact that the Debian archive only produces "easy" instances of this problem. To put it another way, we don't have an archive full of crazy dependency conflicts, and so almost anything halfway sensible that you do will solve a dependency problem -- the difficulty is not in finding solutions, but in generating solutions that are "reasonable".

However, at the time I had to finish a master's thesis. As I was wrapping up work on that, I started in on the dependency resolver again. Now, best-first search works by keeping a priority queue of search nodes (these are 'partial solutions'; in my case, collections of actions that might resolve the dependencies). At each step, the algorithm pulls the "best" node off the queue and, if it's not a solution (doesn't resolve the dependencies), enqueues its successors -- these are the search nodes that can be reached in "one step" from the current node. Thus, there are two main things that have to be defined before the algorithm can be used: the ordering of search nodes (which ones are "better"?) and the generation of successors.
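
That loop can be sketched in a few lines of Python. This is illustrative only -- aptitude's resolver is C++, and its details differ -- but it shows the structure: a priority queue of partial solutions, a node cap as a hedge against exponential blowup, and a generator so the caller can keep asking for the "next best" solution.

```python
import heapq

def best_first_search(initial, score, is_solution, successors, max_nodes=5000):
    """Generic best-first search over partial solutions.  Pops the
    highest-scoring node, yields it if it is a full solution, and
    otherwise enqueues its successors."""
    counter = 0  # tie-breaker so heapq never has to compare nodes directly
    queue = [(-score(initial), counter, initial)]  # min-heap, so negate scores
    examined = 0
    while queue and examined < max_nodes:  # give up on pathological inputs
        _, _, node = heapq.heappop(queue)
        examined += 1
        if is_solution(node):
            yield node  # caller can resume the generator for the next one
            continue
        for succ in successors(node):
            counter += 1
            heapq.heappush(queue, (-score(succ), counter, succ))
```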

Ordering search nodes has two main goals: you want to find a solution fast and you want to produce a good solution (note that these goals may be opposed to each other). Figuring out the general shape of the ordering was easy -- it would penalize broken dependencies and lots of changes (to guide the search towards solutions), and it would weigh the "desirability" of various solutions (for instance, the removal of an installed package should be penalized relative to the installation of a non-installed package). It takes effort to find the right values for the various factors that are involved, but I deferred this problem for later.

The successor generation was not so obvious, although it seems obvious in hindsight, and this was where I stumbled on the second simple-yet-cool idea in the new resolver. Now, it's obvious that you could generate successors by just blindly performing every possible installation, removal, and upgrade. However, this would be completely unmanageable: the number of successor nodes at each step would be so large that the algorithm would get bogged down almost immediately. A better approach is to do a dependency-directed search. Find an unsatisfied dependency and enqueue all the different ways of satisfying it; these are the successors of your search node.

But what are the ways of satisfying a dependency? For a straight dependency it's fairly simple: install any package version that fulfills the dependency, or remove the source package, or install a different version of the source package. But for a Conflict, you have to remove every version of any package that is conflicted against. Plus, you have to deal with provided packages and other "interesting" special cases.
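
For the straight-dependency case, successor generation can be sketched concretely. The data model here -- a frozen `Version` dataclass and `("install", ...)`/`("remove", ...)` action tuples -- is invented for illustration; a Conflicts would instead require removing or changing the version of every conflicting package.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    package: str
    number: str

def satisfy_dependency(actions, source_ver, target_vers, versions_of):
    """Enumerate one-step extensions of a partial solution (a frozenset
    of actions) that satisfy one ordinary dependency: install any
    version that fulfils it, remove the depending package, or switch
    the depending package to a different version."""
    for tgt in target_vers:
        yield actions | {("install", tgt)}       # satisfy the target side
    yield actions | {("remove", source_ver.package)}  # or drop the source
    for other in versions_of[source_ver.package]:
        if other != source_ver:
            yield actions | {("install", other)}  # or change its version
```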

At this point I did what anyone with too much academic training would do: I started looking for a simple model of the problem that I could solve without much trouble. Of course, the satisfaction of Boolean expressions is one option, but this is kind of unsatisfying, since you end up with a totally different problem than you started with. It might be possible to solve the problem this way, but as a tool for reasoning about it, SAT is unilluminating. I wanted a model that was closer to the world of package dependencies.

What I discovered is that all you need to represent dependencies is, well, dependencies. You do not need Conflicts; each dependency targets an ORed set of individual package versions; and (perhaps even more useful for eliminating annoying special cases) you do not need to distinguish between "installed" and "not installed" packages. The elimination of virtual and not-fully-versioned dependencies (leaving you with ORed dependencies and conflicts that target individual package versions) is pretty obvious, but the elimination of Conflicts and removals might be less so. As a CS weenie, I think this is a rather cute little trick; skip the next paragraph if you don't.

Ok, now that the non-CS-weenies are all gone, I'll continue. Look at it this way: the core property of package versions is that they are mutually exclusive; you can't have versions 1 and 2 of "foo" installed at the same time. So to eliminate removals, I just introduced a new version of every package (call it, say, UNINST) that represents the removal of that package. This models removals perfectly, since a package can't be removed at the same time that it's installed. Now it should be obvious how Conflicts are modeled: if "foo" has versions 1, 2, 3, and 4, then a conflict on versions 2 and 3 is equivalent to a dependency on version 1 or 4; this follows because with the elimination of the ability to remove packages, we know that exactly one version of each package is installed.
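
The trick can be written out in a few lines; this is a toy sketch in which versions are plain strings, with the removal pseudo-version thrown in so that a conflict on foo's versions 2 and 3 turns into a dependency on foo 1, foo 4, or foo's removal.

```python
def with_uninst(version_numbers):
    """Add a pseudo-version representing 'this package is removed'."""
    return version_numbers + ["UNINST"]

def conflict_as_dependency(all_versions, conflicting):
    """With exactly one version of each package 'installed' (possibly
    UNINST), a Conflicts against some versions is equivalent to a
    dependency on all the remaining ones."""
    return [v for v in all_versions if v not in conflicting]

foo = with_uninst(["1", "2", "3", "4"])
# A conflict on foo 2 and foo 3 becomes a dependency on
# foo 1 | foo 4 | foo UNINST.
as_dep = conflict_as_dependency(foo, {"2", "3"})
```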

Now, the really interesting thing here, aside from the fact that I was able to prove cool theorems about my algorithm using this model, is that the model is so close to the problem at hand that it is possible to perform transformations between the two on the fly using thin wrapper classes. If you've used libapt, this is your cue to start drooling :-). If you haven't, I should explain that because libapt gives you a very low-level view of the package cache, finding answers to apparently simple questions -- such as "what actions will satisfy this dependency?" or "what packages depend on this one?" -- requires complex and error-prone iteration constructs.

Explicitly instantiating the abstract model for APT, and writing the algorithm against this model, effectively factors the code of the resolver into two parts: a module of wrapper objects that define the "physics" of APT's world, and the core resolver algorithm. The nasty iteration constructs are isolated in one place and can be verified without more than a small amount of pain, while the algorithm itself is free of them. The magic of C++ templates is used to bind the two pieces together at compile-time (although I suspect that the iterators aren't, or shouldn't be, fully inlined, since some of them are actually quite complicated; this is an optimization to look at later).
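
The factoring looks roughly like this. In aptitude the binding is done at compile time with C++ templates; this Python interface, and its method names, are invented here purely to show the shape of the two-part split.

```python
from abc import ABC, abstractmethod

class PackageUniverse(ABC):
    """The 'physics' of the package world, as the resolver sees it.
    A concrete implementation wraps libapt's low-level iterators; a
    test implementation can be a handful of dictionaries."""

    @abstractmethod
    def versions_of(self, package):
        """All versions of a package, including its removal."""

    @abstractmethod
    def broken_dependencies(self, partial_solution):
        """The dependencies a partial solution leaves unsatisfied."""

def first_broken(universe, partial_solution):
    """Core-resolver code is written only against the abstract
    interface; the error-prone iteration stays behind it."""
    return next(iter(universe.broken_dependencies(partial_solution)), None)
```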

With this model in place, generating successors becomes almost trivial. The technique that's actually used is a bit more sophisticated than what I described above; you can find the details in my more formal LaTeX writeup. The most interesting thing (IMO) is the various tricks that are used to restrict successor generation; they have the double effect of avoiding "silly" solutions (like "remove A, install B, install A, install C, remove A, install D, ..."), while also decreasing the number of successors generated at each step (hence making the algorithm quite a bit faster).

In case it's not obvious from the preceding comments, I should point out that one major benefit of this approach is that all solutions can be generated (technically this is not true, but the caveat is not important for our purposes -- see the paper). In particular, this kills the problem of "apt doesn't know how to install from experimental" once and for all. It also should kill off the problem of ambiguous virtual dependencies: aptitude will score solutions that install the package with the highest Priority above all the alternatives.

Some work still needs to be done. Recommends are currently ignored by the resolver; I have a plan for handling them (described in the writeup). With complicated problems, it's sometimes hard to find the solution that you want because there are just so many to wade through; I have a plan for this too (it involves more back-and-forth, so you can guide the resolver more explicitly towards your solution). Oh, and with really complicated problems, aptitude might just give up -- as a hedge against blowup, it only tests 5000 nodes before aborting. However, if you have such a complicated situation, it is likely that apt will also fail to find a solution, and you always have the option of asking aptitude to "try harder".

Hopefully that explains a bit more what I've been muttering and talking people's ears off about lately. :-)

Saturday, May 07, 2005

All Hail Lentils, That Shall Be Soup Here, After

When I reappeared on #debian-devel a few weeks ago, one or two old-timers commented that it had been a while since they'd seen me. The opening of my blog seems like as good a time as any to explain, or at least begin to explain, why I dropped off the face of the Net for a couple years.

The short answer is: grad school. I enrolled in a master's program in Computer Science at Penn State University, and I picked an extremely theoretical thesis topic that turned out to also be very complicated. Maybe even too complicated for a master's degree. The resulting thesis is available on my department Web site for those who want the gory details (all 200 pages of them), along with some slides I used to present it (made with the excellent latex-beamer package).

If anyone reading this is considering becoming a grad student, I have one important piece of advice on how to survive that you probably won't see anywhere else. It's nothing to do with academics or advisors; it's just this simple admonition: Learn To Cook -- at least, either that or get a job that pays well enough to eat at restaurants on a regular basis. Malnourishment from attempting to survive on pasta alone can really ruin your semester, and that's just one fun way for a bad diet to screw you over. Not only that, but you can save a heck of a lot of money if you buy food in its raw form, and to top things off, it tastes better!

At the moment, I'm working my way through a large batch of lentil soup. By "large", I mean "I can probably eat this as either a main course or a side dish with every meal this week". The total cost for this is, well, I haven't worked it out exactly, but I'd guess all the ingredients come to about $6. It took me maybe a half hour to get all the ingredients in the pot (then you just let everything boil for a while). Not a bad deal, I say. Other useful recipes to know include potato-cheddar soup, minestrone, baked potato, and various ham-based things.

As for Debian, the first thing that I started working on when I got back was a brand shiny new problem resolver for aptitude. More on this in another post; I may be a bit clueless about blogs, but I think they're supposed to be shorter than this ;-). For now, you can just read my ramblings on the subject, or try aptitude from experimental and see what brea^H^H^H^Hhappens.

This Dang Blog Thingamajig

Well, I finally broke down and set up a blog. I'm still not sure what they're good for, or even if I'll use this more than occasionally, but time will tell how this works out.