Confessions of a Social Tools Architect
27 Jan
The world we know has quickly changed from one of 10-digit phone numbers to significantly longer URIs, or URLs for us common folk. For some members of the web community, especially the blogging and web standards aficionados, this outgrowth of the URL as a universal identifier has led to a new feeling of urgency to protect the sanctity of the URI.
Generally speaking, cruft refers to “Excess; superfluous junk” (see Dictionary.com). In the web purist’s world, however, a slightly more specific definition has been applied: “all that junk the average web user doesn’t care about that makes URL’s long and annoying” (as defined by Mark Pilgrim, a respected gladiator for the cause).
To make this somewhat more relevant, let’s take a quick look at an example of a crufty URL and an un-crufted one:
Crufty
http://cnn.com/news/2004/0004.html
Cruft-Free
http://cnn.com/news/2004/4/
I tried to formulate my own respectable list of reasons why Cruft was a bad thing, but I found a valuable, clear definition over at Oli Studholme’s blog, Boblet.
An important but overlooked aspect of websites is the URL, or Uniform Resource Locator. This is the ‘web address’, usually of a page on the internet. While it seems minor, this is part of the page’s interface, and some simple rules can make a big difference in ease of use. Important ease-of-use elements for people are the URL should be:
- easy to type
- easy to remember
- short if possible
- ‘hackable’ ie predictable enough to guess
- and permanent
All this being said, I understand the merits of the process, but I certainly don’t understand the need for it from a strictly “consumer” point of view. Let’s look at the different aspects of a truly cruft-less URL.
In general, the less someone has to type, the better the situation. This is true for many very good reasons, not excluding our generally atrocious spelling habits and tendency for other forms of error.
Truly easy to type URLs only matter as far as the domain name itself is concerned. Why? Simple — most people don’t type URLs, they click on them. Don’t believe me? Consider all the following tools that allow us to NOT type URLs:
Anti-Typing Techniques
Anti-Typing Tools and Technologies
If you can honestly tell me that the average person doesn’t spend the majority of their computer- and Internet-related time utilizing or operating one of these techniques or applications then I will concede that an easy-to-type URL is important.
You should note that above, I made a specific distinction between an easy to remember URL and an easy to remember domain. Unlike a fully-qualified URL, a domain name is very important — it’s the entrance sign to the highway that is your site.
Again, if you doubt that the domain is all that matters, consider this list of terms:
All of the above terms are the terms I have heard people, both technical and non-technical, use to refer to a web site, not any specific internal page, just the plain home page that greets them. You’ll notice that URL is listed as one of those terms.
The honest truth is that the most important thing for a person to remember is the way to the front of your site. A well thought out Information Architecture, Site Search, and Site Map should be able to lead people to essentially everything within the site itself.
The length of the URL is certainly a matter to be concerned about. Why? Browsers cannot handle an infinitely long URL. In fact, just based on Internet Explorer (a safe assumption purely on browser penetration), the maximum length is 2,083 characters.
Before we go further, let’s put that in perspective. A well-written English sentence should be approximately 20 words in length. Assuming 5 characters per word, that’s only 100 characters, so 2,083 characters leaves room for roughly 20 such sentences. In short, this is a huge length when you really think about it.
Some might argue that URLs need to be short to make them easier to type or remember, but we’ve already seen why that’s really not relevant. Others might argue that it is more efficient to store a shorter URL than a longer one. And they are absolutely right. BUT, those people tend to forget two incredibly unstoppable trends in the computer industry: BROADBAND and MASS STORAGE. The technology industry is delivering Terabyte drives to consumers’ desktops and living rooms and ultra-high speed connections through their satellites and walls. Exactly what are we saving up for by collapsing our URLs?
Search engines, and search in general, have changed the landscape. It is no wonder that one of the primary interfaces to information (going back to the initial library systems through to Google and beyond) is search. We write things down, store them away, and index them because we openly admit that we can’t remember things and acknowledge that memory is collective far more than individual. History books are full of lessons and compromises of consensus. And search is only getting more and more detailed by the second.
Often, advocates of the Cruft-Free URL will make use of time and date stamps as an integral part of the URL schema. Unfortunately, people are very imprecise with time. Consider the profound degree of tardiness in your average business day (an environment ruled by time) and your average person’s inability to remember what they did last week. Is building this information into one’s URL really making it any better? Doesn’t the other meta-data associated with time-related content suffice (Date Posted, Date Last Modified, File Creation and Modification times, Server Logs, etc.)?
Some claim that URLs need to be short so they print well. Who prints anymore anyway? See the points relating to “Easy To Type”. When it comes to URL length, size truly does not matter.
Of all the reasons provided for Cruft-Free URLs, I think this is the only one that I agree with whole-heartedly. Of course, I would contest that the removal of cruft is not required to accomplish this. Imagine this scenario:
Requested URL
http://socialtwister.com/news/whathappened.html
The problem with the above URL is that it may or may not exist. Note the .html is crufty, by definition. How can this be resolved and remain hackable? Simple: use a search-sensitive error handler. What does that mean?
Most web servers can be configured either to answer 404 errors with a standard page or to process those errors in a more useful manner. In the example above, two scenarios are possible:
/news exists:
In this scenario the user could be directed to the default page or template for the news page and the user is automatically given immediate alternatives.
/news does not exist:
In the event that a file was requested and the directory does not exist, the remaining parts of the missing URL, from the DOMAIN name onwards, can quickly and easily be converted into a search string such as “news + whathappened”. This is not only painless for the user, but leaves them with more information without requiring detailed preparation and management of hackable pages down the URL string.
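The two scenarios above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a description of any particular server's 404 mechanism: the function names `search_terms_from_url` and `handle_missing`, and the `existing_dirs` set standing in for a real check of the site's directories, are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def search_terms_from_url(url):
    """Turn the path of a missing URL into a list of search terms."""
    path = unquote(urlsplit(url).path)
    terms = []
    for segment in path.split("/"):
        if not segment:
            continue  # skip the empty pieces around leading/trailing slashes
        # Drop a trailing file extension such as ".html"
        stem, _ext = posixpath.splitext(segment)
        if stem:
            terms.append(stem)
    return terms


def handle_missing(url, existing_dirs):
    """Decide what a 404 handler could do with a missing URL.

    existing_dirs is a hypothetical stand-in for a filesystem check:
    a set of directory paths known to exist, e.g. {"/news"}.
    """
    parent = posixpath.dirname(urlsplit(url).path)
    if parent in existing_dirs:
        # Scenario 1: the directory exists, so send the user to its default page.
        return ("redirect", parent + "/")
    # Scenario 2: nothing matches, so build a search string from the path.
    return ("search", " + ".join(search_terms_from_url(url)))


missing = "http://socialtwister.com/news/whathappened.html"
print(handle_missing(missing, {"/news"}))  # directory exists
print(handle_missing(missing, set()))      # directory does not exist
```

In a real deployment the returned tuple would be translated into an HTTP redirect or a call to the site's search page; how that is wired up depends on the server and is left out here.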
Nothing in this world is permanent by any means. The assumption of most content publishers is that their content and archival system will survive all changes for the near future. It is noble to attempt to create truly universal, permanent resources on the Internet. However, it requires a truly deep financial and technical commitment to make this a reality, something most people do not possess.
The additional argument for permanence is what is called “Future-Proofing” URLs. The notion here is that since web technologies and developers’ obsessions change constantly over time, any information stored in a URL that identifies either the application server or server state is both meaningless and potentially harmful.
In the example provided just before, I noted that a .html extension was seen as cruft. This is the reason. If in 3 months the site changed and attempted to utilize ColdFusion (.cfm), PHP (.php), JavaServer Pages (.jsp), or some other technology, all references to the .html file would be broken. In theory, this is sound reasoning. Unfortunately for that argument, there are plenty of ways to keep this from becoming a larger problem, many of them, ironically enough, broached by the same people moving away from crufty URLs.
Three immediate solutions are available for the above scenario:
Granted, these techniques are not going to resolve the issues relating to passing application state and configuration data around in the URL, but generally that’s frowned upon as a development practice and quite possibly ignorable from an application development point of view.
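The general idea of keeping old .html links alive after a technology switch can be illustrated with a short Python sketch. It is an assumption made purely for illustration and not necessarily one of the three solutions referred to above; the function name `redirect_target` and the list of current paths are hypothetical.

```python
import posixpath


def redirect_target(requested_path, current_paths):
    """Return a current path that differs from the request only by extension.

    current_paths is a hypothetical list of paths the site serves today.
    Returns None when no such path exists.
    """
    stem, _ext = posixpath.splitext(requested_path)
    for candidate in current_paths:
        same_stem = posixpath.splitext(candidate)[0] == stem
        if same_stem and candidate != requested_path:
            return candidate
    return None


# The site moved from static HTML to PHP; the old link still finds a home.
print(redirect_target("/news/whathappened.html", ["/news/whathappened.php"]))
```

A handler built this way would typically answer with a permanent redirect to the returned path, so existing links and bookmarks keep working without the extension ever being removed from the URL.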
In general, the transition of a system from chaos to order is both desirable and fruitful. The efforts opposing Cruft are worthy and worth considering as part of the design and development of a URL-rich application. However, no system of organization should compromise or otherwise constrain innovation when other, more forgiving and flexible alternatives exist.
4 Responses for "The Case For Cruft"
Hi Greg,
I was interested to read your article. I agree with some of the things you’ve discussed, like your thoughts on “easy to type” and “easy to remember”, but I still come to a different conclusion. I’ve written up my comments in a reply on my site because they got a little long for a comment:
http://oli.boblet.net/2004/01/29/cruft
Feel free to shoot me down here or there!
peace - oli
Greg I completely agree with your point of view on “cruft” in URLs. I might take this a step further and say that a URL as an entity is cruft no matter how short and uncluttered. Perhaps a URL should never be considered an appropriate user friendly way for any human to find web content. This was not always the case and I know the origin of the URL was to get away from less desirable IP addressing. As the web has developed from the quaint small town of hand hewn personal sites to the machine readable database backed monster it is today the structure and, at times, the existence of a readable URL should be dictated by the efficiency of another machine’s ability to read and parse it and NOT the deficiencies of humans. This leads me to point out one of the errors in Oli’s assumptions. The good search engines DO read the URL query string. They HAVE to. The idea of “cruft free” URLs is appealing from the human standpoint but a machine really could care less. We are developing a web that uses machines to handle our data and assist us sifting through mounds of it to get the thing we’re after. This is, perhaps, the best case for paying close attention to the storage format and information architecture for the data, and less attention to a URL. The URL, by its very nature, is ephemeral and undesirable (read: cruft). We hope that the data is not. ;-)
http://www.canadiantire.ca/assortments/product_detail.jsp?FOLDER%3C%3Efolder_id=2534374303514237&PRODUCT%3C%3Eprd_id=845524442914373&ASSORTMENT%3C%3East_id=1408474395348027&bmUID=1075397093603&assortment=primary
all i wanted was a desklamp.
also:
http://www.urbandictionary.com/define.php?term=cruft