Saturday, April 3, 2010

Transparency in tools

I've been dealing with a bunch of new tools and services lately, and running into the inevitable little problems along the way.  This got me thinking about the subject of transparency.  I like to say that any product must meet at least one of the following two goals:

1. Work more-or-less perfectly, virtually all of the time.

2. Be sufficiently transparent that, when the product fails, the user has some idea what to do about it.

Developers usually focus on #1 and ignore #2.  However, perfection is rarely achieved, so users are often left in a bind -- the thing they are trying to use isn't working, but they don't know how to fix it.  If any sufficiently advanced technology is indistinguishable from magic, then insufficiently advanced technologies should not pretend to be magic.

Examples abound.  My (anti-)favorite is a cell phone that can't connect.  My old Blackberry had a habit of dropping calls, with the message "call failed".  Why did the call fail, you might ask?  Was the signal too weak?  (Perhaps I should move to a less occluded spot.)  Was the tower handling too many calls?  (Perhaps I should move far enough to get in range of another tower.)  Is the phone's software glitching?  (Perhaps I should reboot.)  Was it a problem at the remote end?  (Perhaps I should wait for them to call back, or try another number.)  Of course, the phone gives me no information about any of this.  Just "call failed".

(Often, retrying the call succeeds.  It didn't occur to the phone designers to do this automatically, of course, or even prompt me with the option to redial.  The phone just sits there, staring blankly at me, as if nothing had happened.)

My Nexus One, which I otherwise love, has a similar habit with data connections.  It's pretty reliable when I'm on wifi, but otherwise attempts to access the net tend to fail.  Sometimes the phone provides no diagnostic information, just "connection failed" or some such.  Those are the good cases.  In other situations, it doesn't even tell me the connection failed; it just never gets around to doing the thing I'd asked it to do.  (One test I've found is invoking the Refresh command in Gmail.  If my connection is working, it spins for a few seconds.  If my connection is down, it spins for a fraction of a second.  No message is displayed in either case.)

The cases that have been troubling me lately all involve software systems.  I've been using Eclipse to develop a simple servlet-based web site on Tomcat and running it on my Mac and Amazon's EC2.  Sometimes it works, but often it can't find my servlet, or doesn't get the latest version.  When it doesn't work, I have a hard time figuring out why, because none of these systems are transparent:
  • Often, Tomcat can't find the compiled version of my servlet -- ClassNotFoundException.  Where is this supposed to be?  (Somewhere under WEB-INF/lib, I think, but I'm not sure and neither the error message nor the documentation of any relevant tool talks about it.)  Which Eclipse command is supposed to create it?  (Build?  Publish?  Something I haven't found yet?)  Did it even get built?  Where would Eclipse be putting it prior to copying it to the Tomcat server?  (No idea.)  When Eclipse publishes the site to Tomcat, where the heck in my filesystem does it put the site?  (Complete mystery.  It's not under Tomcat's webapps folder; at least, not when I run locally.  Tomcat's admin tool displays a list of configured applications, but doesn't show their filesystem path.)
  • When I'm running under EC2, things get even more complicated.  Amazon's Eclipse plugin can launch an EC2 instance and copy my site to it.  This often takes several minutes.  What is going on during that time?  There are some status messages, but I don't know what they mean.  I don't know what actual commands or operations are being performed.  When it fails -- and it often does -- I don't know what operation failed, what context it was invoked in, or how it failed.  I had to dig around for 20 minutes to even find where the heck, in the filesystem of the EC2 image the plugin uses by default, Amazon placed Tomcat.  (/env/tomcat.)

Imagine how much easier all this would have been if the tools involved were more transparent about their own operation.  When Tomcat can't find a class file, the error message could tell me where it looked.  When Eclipse publishes a site to a server, it could tell me what files it's copying, where they came from, and where they went to.  Commands like "Build", "Publish", and "Clean" could tell me what files they affected.  Amazon's EC2 plugin documentation could include a page on "what the plugin actually does when you ask it to launch a server".  None of this would be terribly hard, and it would make the tools much more transparent and debuggable.  This wouldn't just make life better for users; it would also make the developers' lives easier, by reducing support requests, enabling better bug reports, and (for open-source projects) giving outsiders a better shot at fixing bugs directly.
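
To make the idea concrete, here is a rough sketch of a lookup that reports where it looked and a publish step that reports what it copied.  It's in Python rather than Java purely for brevity, and the names (find_resource, publish, ResourceNotFound) are invented for illustration; none of this is how Tomcat or Eclipse actually work internally.

```python
import shutil
from pathlib import Path


class ResourceNotFound(Exception):
    """Hypothetical error type: its message lists every path that was tried."""


def find_resource(name, search_dirs):
    """Return the first existing file called `name`, or fail saying where we looked."""
    tried = []
    for directory in search_dirs:
        candidate = Path(directory) / name
        tried.append(candidate)
        if candidate.is_file():
            return candidate
    # The transparent part: tell the user exactly which locations were checked.
    searched = "\n".join(f"  - {path}" for path in tried)
    raise ResourceNotFound(f"could not find {name!r}; looked in:\n{searched}")


def publish(source_dir, target_dir):
    """Copy every file under source_dir into target_dir and return a log of each copy."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    log = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = target_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        # Record where each file came from and where it went.
        log.append(f"copied {src} -> {dst}")
    return log
```

With something like this, a failed lookup names every directory it searched, and a publish leaves behind a list of source and destination paths, which is most of what I was missing above.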
