Wednesday, September 12

statsb.in

It happens occasionally that I need to validate an optimization by benchmarking, and the benchmark results are a bit noisy. Usually the situation does not ask for a heavy-duty statistical analysis, and I am too lazy for that anyway. However, sometimes having at least the fundamental statistics of a sample would be handy.

Obviously, calculating the standard deviation of a sample in the 21st century by itself is not a problem. There is LibreOffice, there is R, there's a bazillion statistical tools out there. They do feel a bit heavyweight for the task though, and the friction involved in starting them up and entering the data in the right format means that often I just do not bother. Google brings up some dynamic webpages for the task, a much more lightweight solution, but most of these are ugly and ad-riddled.

What's a programmer to do? Code up another weekend app, statsb.in, of course. And it is the way I like it: there is a textarea where it is easy to paste and edit inline snippets of text, say, shell script output in full; non-numeric data are ignored; statistics are computed on the fly (no stupid "Submit" buttons); and you can save the snippet on the server and get a short link (useful for footnotes pointing to data sources).
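As a rough illustration of the "paste anything, ignore the non-numeric bits" idea, here is a minimal sketch in Python. It is not the code behind statsb.in. The names `extract_numbers` and `summarize` are made up for this example, and the rule that a number must be a whole whitespace-separated token is an assumption.

```python
import math
import statistics

def extract_numbers(text):
    """Return the whitespace-separated tokens of text that parse as finite numbers."""
    values = []
    for token in text.split():
        try:
            value = float(token)
        except ValueError:
            continue  # non-numeric token: ignore it
        if math.isfinite(value):  # skip "nan" and "inf", which float() accepts
            values.append(value)
    return values

def summarize(text):
    """Compute basic sample statistics, or None if text contains no numbers."""
    values = extract_numbers(text)
    if not values:
        return None
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```

Calling `summarize` on a pasted block of benchmark output returns a small dictionary of statistics. Labels, units and other stray words are skipped rather than reported as errors.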

The service probably will not be expanded much further. After all, if those basic statistics are not enough, you are probably better off firing up a real statistics tool. However, I do plan to add another page for the most common task that I run into, namely, comparing the means of two samples (did my optimization have an effect?). In other words, calculating the p-value under the null hypothesis that the two underlying populations have the same mean. Stay tuned.
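For the curious, one way to get such a p-value without any statistics library is a permutation test. This is only a sketch of that one technique, not necessarily what the planned page will do (a t-test would be the more traditional choice). The function name `permutation_p_value` and its parameters are invented for the example.

```python
import random
import statistics

def permutation_p_value(a, b, rounds=10000, seed=0):
    """Estimate a two-sided p-value for the null hypothesis of equal means."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and see how large the difference gets by chance.
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(left) - statistics.fmean(right)) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)
```

Feed it the timings before and after the optimization. A small result means that a difference this large rarely shows up when the labels are shuffled at random.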

Wednesday, August 15

Why Software Engineering Is Not Politics

A recent post by Steve Yegge has made quite a splash. The main thesis of his post is:

Software engineering has its own political axis, ranging from conservative to liberal.

Software engineering, like any other social enterprise, does have a political aspect, but I do not think this thesis is apt. The essential property of politics is that you cannot have it both ways. A political problem is a coordination problem, and political decisions by their nature apply to an entire political body. Software engineering as a whole does not have these properties. There is absolutely no reason why any two engineering projects have to follow the same ideals and guidelines. Politics in engineering only appears within the boundaries of a project (or, at a slightly higher level, a company), at which point it does not really warrant the name of politics, but rather character traits or personal preferences. If Johnny goes for the puck more often than Louie, that does not really warrant the label of "liberal".

The fundamental quality that Steve is looking at is actually risk aversion, as he noted himself. He should have stopped there. Taking the next step and recasting it in political language does not bring any new insights and just muddles the matter.

It seems that the political labels of "liberal" and "conservative" were picked not for their meaning, but rather for their connotations, as political views are perceived to be 1) very stable, part of a person's self-identity and (therefore) hard to change, 2) inherently polarizing, and 3) difficult or impossible to evaluate objectively. Again, these do not apply to software engineering. For example, probably like many engineers, I am clearly a liberal on personal and small projects, and a conservative on larger and more fundamental ones; so much for the idea of a core identity. As for polarization, it may be prevalent in internet echo chambers, but I have not seen much of that in my professional life, especially when working with experienced engineers. When they do have objections, those objections are grounded in engineering. This leads to the third point, the claim that the approaches cannot be compared objectively. This one is the most pernicious of all, because it provides an escape hatch from substantive arguments about what is, after all, an engineering decision.

Here is my, admittedly less catchy and impressive, counter-thesis: how conservative a software engineering project should be is an objective decision, to be based on circumstances. TL;DR: It depends. (Very original, isn't it?) It is a risk management decision, and those are very tricky, but not subjective.

One thing to note is that the architecture of the system should correspond to the implementation style. Schemaless, heavy on magic, untyped approaches are suited for modular systems where each individual module is of limited complexity, and where the modules are isolated from each other and can recover from failures. The more complex the individual modules (and sometimes the complexity is inherent to the problem), the more they benefit from "conservative" techniques. (By the way, this lines up nicely with Steve being liberal and a proponent of service-oriented architectures.) If you have the freedom to pick between these two approaches, the decision making does get a bit more ideological. However, neither the "liberal" and "conservative" labels nor risk aversion by itself is enough for a decision here. Say, if you are developing a service, what is more risky: having a "traditional", monolithic service that is inevitably sensitive to failures and is likely to scale poorly, or a cluster of components that are generally more resistant, but have more complicated aggregate behavior?

Despite the criticisms, I have to give fat credit to Steve. His post was thought-provoking, and it brought attention to the subjective side of engineering, one which is often overlooked. That side should not be left to grow - on the contrary - but the only way to reduce its domain is to be aware of it. I know I will think twice before making another gut decision where risk aversion is an important factor.

Thursday, May 10

Haskell online typechecker (2)

haskellonline.org, my recent experiment to make learning Haskell easier, has been doing quite well. More than a thousand people have checked the site out since my last blog post. Yay!

I have since prettied up the interface a little bit. The editor now highlights not just the line, but also the error token itself. Simple code folding is now available too: try clicking in the gutter (near the line numbers) on the first line of a multiline definition.

Kudos to Marijn Haverbeke, author of CodeMirror, the JavaScript code editing component, which made it possible to write and deploy haskellonline.org in a weekend rather than a month.

Thursday, May 3

Online Haskell typechecker

Last weekend I put up haskellonline.org, which is basically a thin web frontend over ghc, the Haskell compiler. It is limited to one Haskell module, but it makes the experience of studying Haskell much more pleasurable by running the typechecker in the background as you type. Plus, it is web-based - no local installation necessary!
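To give an idea of what such a "thin frontend" involves, here is a small sketch in Python. It is not the actual haskellonline.org code, and the helper names `parse_diagnostics` and `typecheck` are invented for illustration. It assumes ghc is on the PATH and that diagnostics start with a `file:line:col:` or `file:line:col-col:` prefix. Errors that span several lines use a different prefix and are not handled here.

```python
import re
import subprocess

# Location prefix of a GHC diagnostic, e.g. "Main.hs:3:9:" or "Main.hs:3:9-12:".
LOCATION = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+)(?:-(?P<end>\d+))?:",
    re.MULTILINE,
)

def parse_diagnostics(output):
    """Return a (line, start column, end column) tuple for each located diagnostic."""
    spans = []
    for match in LOCATION.finditer(output):
        col = int(match.group("col"))
        # A diagnostic without an explicit range covers a single column.
        end = int(match.group("end")) if match.group("end") else col
        spans.append((int(match.group("line")), col, end))
    return spans

def typecheck(path):
    """Run ghc on one module without generating code and return the error spans."""
    # -fno-code tells ghc to skip code generation, so only the checks run.
    result = subprocess.run(
        ["ghc", "-fno-code", path], capture_output=True, text=True
    )
    return parse_diagnostics(result.stderr)
```

The spans returned by `parse_diagnostics` are what an editor component would need in order to mark the offending line in the text area.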

The thing about Haskell is that it has a fabulous type system which can weed out a broad class of bugs at compile time. However, the usual interfaces with the compiler leave a lot to be desired. Vim + ghci in a terminal of course works, but I have already been spoilt by IDEs, and the "Alt-Tab; Up; Enter; wait; Alt-Tab + jump to line manually" dance does not appeal to me at all. Emacs integration might be better, but I do not know emacs well, and it is never a good idea to start learning two things at the same time.

Note: I suspect that many of the problems I encountered were due to the fact that I use MacOSX.

I have looked at a few of the existing IDEs, but that did not go well. EclipseFP did not offer any benefits over a text editor at all. I found a blog post that implied that it would work better with a newer version of GHC. Cue a few hours of reinstalling GHC and compiling Haskell packages, and I got the system set up, but that made the situation even worse. Eclipse would simply crash on startup every time until I wiped the configuration directory clean. So much for EclipseFP.

I also tried Leksah. At least it did not crash. Nevertheless, I found it quite unwieldy, especially for learning purposes. The autocompletion is broken in annoying ways. For example, if you type "f a = a", Leksah will autocomplete the second "a" into "as". It took me a while to discover an error in my program that was due to this autocompletion mechanism.

Leksah does have a "compile in background" mode, which was admittedly a source of inspiration for haskellonline.org, but the implementation is subpar. It simply invokes the compiler and redirects the output to a display pane in the window. As you write code, the text in the pane constantly scrolls, which is very distracting. The implementation is also buggy. Occasionally errors are interleaved with status output.


There is also tryhaskell.org, but it does not work nearly as well for Haskell as its cousins trypython.org and tryruby.org do for Python and Ruby, respectively. All are limited to evaluating expressions, but in Python/Ruby that actually gets you a long way, while nontrivial Haskell programs are complex static structures that are unwieldy without a persistent text editor.

Despite all the trouble, which, again, probably has to do with my choice of MacOSX, these issues do not detract from the beauty of Haskell as a language. If you are interested, I can heartily recommend learnyouahaskell.com as very good reading material, and Tony Morris's course exercises (in combination with haskellonline.org) for practice.