On Simplicity

Reflections on the advantages of simplicity.

This blog post is a reflection on what is probably the most important lesson I've learned since I started programming: always prefer simplicity over complexity. First, a word of warning: what I am saying here is anything but new. And yet many people don't seem to get it. I didn't either until fairly recently. And in a way, I still don't heed it properly.

Since I mostly do web development these days (at least for larger projects), I will outline my most important experience with simplicity in that field. I use PHP, so many would probably argue that bloat is built into everything I do, but oh well.

String Operations vs SimpleXML

My largest project in scope is Aklaman, a tool that extracts bibliographic data exported from bibliography managers like JabRef or Zotero, merges multiple databases so that groups can work collaboratively in one environment and see what the others are doing, and then lets users add non-bibliographic data to the entries. For example, I can note my reading progress or important quotes from a book (in fact, it started as the idea of a low-level, non-corporate version of Goodreads that supports all kinds of texts, not only books) and create projects in which I can link multiple books or articles.

The data I add on top of the bibliographic entries is a relatively minor problem. The bibliographic databases, however, can get quite large. I originally started with XML files exported from JabRef (using MODS), which I read using PHP's SimpleXML. I should add that my main bibliography database currently holds about 4000 entries, which amounts to roughly seven megabytes of exported XML. SimpleXML also comes with many features, OOP support and so on; in other words, it is not the fastest solution either. In the end, Aklaman took about 0.7 seconds to load on the server, not counting any additional delay from a weak connection.

I decided I had to put a stop to this. But how? Well, the first and most obvious step was to get rid of XML. While MODS is a nice and rather readable standard for bibliographic data, a much slimmer format exists that works just as well: BibTeX. The BibTeX equivalent of my seven-megabyte XML file was a four-megabyte file. There are also BibTeX libraries for PHP (e.g. this one), but I decided not to use one. By then I had learned that I didn't need all the functionality they provide. And if I don't need something, and speed is crucial, I shouldn't have it. So I wrote a function that reads BibTeX data into a PHP array: 33 lines of code, including blank lines added for readability. This alone cut loading times to about a quarter.
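For illustration, here is a minimal sketch of such a reader. It is written in Python rather than PHP and is not the function Aklaman uses. It assumes well-formed input with balanced braces, skips `@comment`, `@string` and `@preamble` blocks, and ignores `@string` macros, `#` concatenation and escaped quotes. The `ENTRYTYPE` and `ID` keys are names chosen for this sketch.

```python
# Minimal BibTeX reader sketch: assumes balanced braces and no @string macros.
SKIPPED_TYPES = {"comment", "string", "preamble"}


def matching_brace(text: str, start: int) -> int:
    """Return the index of the '}' that closes the '{' at text[start]."""
    depth = 0
    for j in range(start, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    raise ValueError("unbalanced braces")


def parse_fields(rest: str, entry: dict[str, str]) -> None:
    """Parse comma-separated 'name = value' pairs from rest into entry."""
    j = 0
    while j < len(rest):
        # Skip whitespace and the commas between fields.
        while j < len(rest) and (rest[j].isspace() or rest[j] == ","):
            j += 1
        eq = rest.find("=", j)
        if eq == -1:
            return
        name = rest[j:eq].strip().lower()
        j = eq + 1
        while j < len(rest) and rest[j].isspace():
            j += 1
        if j < len(rest) and rest[j] == "{":  # {braced value}, may contain {nested} braces
            close = matching_brace(rest, j)
            entry[name] = rest[j + 1:close]
        elif j < len(rest) and rest[j] == '"':  # "quoted value"
            close = rest.find('"', j + 1)
            if close == -1:
                raise ValueError("unterminated quoted value")
            entry[name] = rest[j + 1:close]
        else:  # bare value such as a year
            close = j
            while close < len(rest) and rest[close] != ",":
                close += 1
            entry[name] = rest[j:close].strip()
        j = close + 1


def parse_bibtex(text: str) -> list[dict[str, str]]:
    """Read every entry of a BibTeX string into a list of plain dicts."""
    entries: list[dict[str, str]] = []
    i = 0
    while (at := text.find("@", i)) != -1:
        brace = text.find("{", at)
        if brace == -1:
            break
        end = matching_brace(text, brace)
        i = end + 1  # the next search starts after this entry
        entry_type = text[at + 1:brace].strip().lower()
        if entry_type in SKIPPED_TYPES:
            continue
        key, _, rest = text[brace + 1:end].partition(",")
        entry = {"ENTRYTYPE": entry_type, "ID": key.strip()}
        parse_fields(rest, entry)
        entries.append(entry)
    return entries


sample = """
@Comment{jabref-meta: databaseType:bibtex;}
@Book{marx1867,
  author    = {Marx, Karl},
  title     = {Das {Kapital}},
  year      = 1867,
  publisher = "Meissner"
}
"""
entries = parse_bibtex(sample)
print(entries[0]["title"])  # Das {Kapital}
```

Everything comes back as raw strings; splitting author names or decoding LaTeX accents is left out until a page actually needs it, which is the whole point of writing one's own small reader.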

Adding some complexity to the implementation but reducing the number of calculations, I then introduced checks on the data. For example, Aklaman now reads which database an entry comes from directly from a GET variable, so I can determine whether I need to read only one database or all of them. Also, if I know how many entries I want to load, it only makes sense to break out of the reading loop as soon as I have loaded all I need. With restrictions like these, I reduced loading times to about a tenth (and on some pages, much less).
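The two restrictions can be sketched like this, again in Python rather than PHP and under assumptions of its own: each database is a hypothetical zero-argument function that yields entries lazily, `wanted` stands in for the database name the PHP code would take from the GET variable (e.g. a hypothetical `$_GET['db']`), and `limit` is how many entries the page needs.

```python
from typing import Callable, Iterable, Optional

Entry = dict[str, str]
Source = Callable[[], Iterable[Entry]]


def load_entries(sources: dict[str, Source],
                 wanted: Optional[str] = None,
                 limit: Optional[int] = None) -> list[Entry]:
    """Read only the requested database(s) and stop once `limit` entries are loaded."""
    if limit is not None and limit <= 0:
        return []
    # If the request names a database, read only that one; otherwise read all.
    names = [wanted] if wanted is not None else list(sources)
    loaded: list[Entry] = []
    for name in names:
        for entry in sources[name]():
            loaded.append(entry)
            if limit is not None and len(loaded) >= limit:
                return loaded  # break out early: nothing after this is read
    return loaded


# Usage: two fake databases that count how many entries they actually produce.
produced = {"mine": 0, "group": 0}


def fake_db(name: str, size: int) -> Source:
    def read() -> Iterable[Entry]:
        for k in range(size):
            produced[name] += 1
            yield {"ID": f"{name}-{k}", "db": name}
    return read


sources = {"mine": fake_db("mine", 4000), "group": fake_db("group", 500)}
page = load_entries(sources, wanted="group", limit=10)
print(len(page), produced)  # 10 {'mine': 0, 'group': 10}
```

Because the entries are produced lazily, returning early stops the reading itself, not just the output: in the usage example, the larger fake database is never touched.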

Rehashing Old News...

...is essentially what I am doing here. Sometime between 1989 and 1991, Richard Gabriel named a somewhat exaggerated version of this approach the philosophy of "worse is better", even though he later wrote an article criticizing it (see also: WP). So let's settle for the even older "concrete analysis of concrete conditions" and appropriate action, without adding anything unnecessary.

Looking at much of what I see around me, though (and I'm repeating myself here), this approach seems to have gone out of fashion. Hardware has gotten better over the years, and resource scarcity no longer seems to be an issue most of the time. If I weren't so impatient with my PC (and if I didn't care about scalability), I probably wouldn't have cared much either when Aklaman started to slow down.

But how long does a website take to load in a Third World country if it needs to load five JavaScript libraries before getting anywhere? Certainly too long. Will a modern version of Windows, or even a Linux distribution running GNOME, run on my old laptop, which has "only" one GB of RAM? It won't.

Conversely, will a website that needs to load five JavaScript libraries before getting anywhere start really quickly on my comparatively good connection at home? Well, not as quickly as an HTML + CSS solution that does only half as much but loads much, much faster. Will my new laptop with four GB of RAM run Windows satisfyingly fast? No.

Simplicity plays a crucial role when resources are scarce. The benefits of simplicity, however, remain even when the scarcity is gone. Programs run quickly. And eventually, one gets used to programs running quickly and gets annoyed at unnecessary delays caused by bloated designs. Fortunately, simplicity doesn't have to mean a worse product, e.g. fewer features or a worse UI. The strengths of HTML5 + CSS3 are a case in point. What more could one need in web development (on the client side)? Sometimes more interactivity is indeed necessary, but good old vanilla JavaScript suffices for that.

What simplicity does mean, however, is a more thorough understanding of how things work (or at least of how to get them to work somehow) and slightly more work, because one needs to reimplement many things that already exist, or, in the less rosy case, copy and paste things from Stack Overflow rather than downloading a library containing the wanted feature plus five others one doesn't need.

I think the message of concrete analysis of concrete conditions and appropriate action remains important in many areas, not just programming. In programming, it has the longest and most successful history, which is why it makes a nice example. And one gets to feel its benefits directly. In other areas, feedback loops are slower, but they are comparable.

I still have a lot to learn about making my programs simpler. And I have a lot to reimplement, too. The blog I publish this on uses XML for storage. JSON would be nicer; CSV, or Markdown with a different folder structure, would probably be best. But I've learned my lesson for future projects, and I hope to adhere to it as well as I can.
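A small Python sketch of that comparison, using hypothetical post records with invented `title` and `body` fields (not this blog's actual storage layout): it writes the same records as XML, JSON and CSV with the standard library and reads the CSV back. XML repeats each field name twice per record, JSON once per record, and CSV once per file in its header row; the exact sizes depend on the content.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical schema and records, invented for this comparison.
FIELDS = ["title", "body"]
posts = [
    {"title": "On Simplicity", "body": "Prefer simplicity over complexity."},
    {"title": "Hello", "body": "First post, with a comma."},
]


def to_xml(records: list[dict[str, str]]) -> str:
    root = ET.Element("posts")
    for record in records:
        post = ET.SubElement(root, "post")
        for field in FIELDS:
            # Each field name appears twice per record: opening and closing tag.
            ET.SubElement(post, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def to_json(records: list[dict[str, str]]) -> str:
    return json.dumps(records)  # each field name appears once per record


def to_csv(records: list[dict[str, str]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()  # field names are written once per file
    writer.writerows(records)  # values containing commas get quoted
    return buf.getvalue()


def from_csv(text: str) -> list[dict[str, str]]:
    return list(csv.DictReader(io.StringIO(text)))


xml_text, json_text, csv_text = to_xml(posts), to_json(posts), to_csv(posts)
print(len(xml_text), len(json_text), len(csv_text))
restored = from_csv(csv_text)
```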

P.S.: There are many benefits I have not written about here. Resilience, for example. Well, just remember this.