Monday, March 21

Poetry in Translation

I have recently discovered a very funny page which abuses Google's translation service to produce absurd results. Ever found oddly simple insights in automatic translation? Well, there's a fair dose of them in Poetry in Translation, which translates English to German to French to German and back to English. The comments section is a bit indecent, but there are some outright hilarious results:

  • "I have a broken heart" -> "I have a defective heart"
  • "Get busy living, get busy dying" -> "If you receive a life employed, you receive death employed"
  • "The quick brown fox jumped over the lazy dog." -> "the fast brown fox jumped on the putrefied dog."
  • "Just die, why don't you?" -> "Cubic straight lines, why not him?"
  • "you won't fool the children of the revolution" -> "children of rotation tromp you"
  • "the pen is mightier" -> "the feather is more powerful"
  • "Oh how i love my girl" -> "How how my girl is expensive!"
  • "One for all, all for one" -> "For all, all"
  • "Be Afraid, Very Afraid" -> "Have fear very timidly,"

Hard drive benchmarking

Marius Gedminas experimented a bit with hdparm, and his results showed no difference in linear read rate between the beginning and the end of the disk. That made me curious. I whipped up a very simple Python script to time some plain reads from the disk, and its results are consistent with the ones from dd and with another benchmark of my new drive. Yes, using Python for benchmarking is a stupid idea, and the results are not stable, but I do consistently get almost 40MB/s at the start and no more than 28MB/s at the end of the disk. You can try the script for yourself:

#!/usr/bin/env python3

import errno
import sys
import time

MiB = 2**20

BLOCKS = 128 # number of megabytes to read at each offset
SPACING = 4 * 1024 # number of megabytes to seek forward


if len(sys.argv) < 2:
    print("You must supply a device (e.g., '/dev/hda') as an argument")
    sys.exit(1)
try:
    # Open in binary mode: we are reading raw disk contents, not text
    f = open(sys.argv[1], 'rb')
except IOError as e:
    if e.errno == errno.EACCES:
        print("Permission denied to read the device, you"
              " may need root privileges")
    else:
        print("Could not open %s: %s" % (sys.argv[1], e))
    sys.exit(1)

offset = 0
at_end = False
while not at_end:
    start = time.time()
    try:
        f.seek(offset * MiB)
    except IOError:
        break # We probably hit the end of the disk
    for i in range(BLOCKS):
        if len(f.read(MiB)) < MiB:
            at_end = True # A short read means we ran off the end of the disk
            break
    else:
        # Only report offsets where all BLOCKS megabytes were read
        delta = time.time() - start
        rate = BLOCKS / delta
        print('Offset: %d GB, read rate: %3.2f MB/s' % (offset // 1024, rate))
        offset += SPACING

print('Finished.')

Monday, March 14

The Commonly Confused Words Test

English Genius
You scored 100% Beginner, 93% Intermediate, 93% Advanced, and 88% Expert!
You did so extremely well, even I can't find a word to describe your excellence! You have the uncommon intelligence necessary to understand things that most people don't. You have an extensive vocabulary, and you're not afraid to use it properly! Way to go!

Thank you so much for taking my test. I hope you enjoyed it!

For the complete Answer Key, visit my blog: http://shortredhead78.blogspot.com/.

My test tracked 4 variables. How you compared to other people your age and gender:
You scored higher than 77% on Beginner
You scored higher than 39% on Intermediate
You scored higher than 61% on Advanced
You scored higher than 97% on Expert
Link: The Commonly Confused Words Test written by shortredhead78 on Ok Cupid

New hard disk for my laptop

Just a few days ago I bought a new 7200rpm Hitachi hard disk (Travelstar E7K60) to replace my old 4200rpm one by Toshiba that came together with the laptop. Now I really regret that I did not do this earlier. This is easily the best investment in a laptop's performance, unless you have a really old CPU or less than 256MB RAM.

The speed increase is very noticeable. Bootup now takes only half the time it used to, and applications start significantly faster. Seeks are more silent in the new drive, and I have not noticed any background noise because of the increased rotational speed. Battery usage has not changed at all. In general, I noticed only improvements and no regressions after upgrading.

While partitioning the new disk, I noticed that in my old Toshiba drive the root (/) partition was located at the very end of the disk, because of hysterical raisins. I did not have a separate partition for /usr, so its contents were there too. Make sure not to make my mistake of putting frequently accessed data at the end: hard disks are usually faster at the start. The rotational speed of the disk is constant, but the outer tracks have a larger circumference. So, if the data density is uniform over the disk, more data passes under the head per revolution on the outer tracks, and the transfer rate there is greater.
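To make the argument concrete, here is a minimal sketch of that reasoning. It assumes the same linear density on every track, which is only an approximation of how real drives lay out data, and the function name and example radii are made up for illustration.

```python
import math


def linear_read_rate(radius_mm, bytes_per_mm, rpm):
    """Bytes per second passing under the head on a track of this radius.

    Assumes a constant linear density (bytes_per_mm) on every track.
    """
    track_length_mm = 2 * math.pi * radius_mm
    revolutions_per_second = rpm / 60.0
    return track_length_mm * bytes_per_mm * revolutions_per_second


# Hypothetical radii: the density and the rpm cancel out of the ratio,
# so only the ratio of the two radii matters.
outer = linear_read_rate(radius_mm=30, bytes_per_mm=1000, rpm=7200)
inner = linear_read_rate(radius_mm=15, bytes_per_mm=1000, rpm=7200)
ratio = outer / inner
```

Under this simplified model the ratio of read rates equals the ratio of track radii, whatever the density and rotational speed are.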

Some drives may be more sensitive to track diameter than others. If you are curious, you can do a quick benchmark. My new one shows about 38MB/s linear read speed on the outer tracks (start of disk) and about 27MB/s on the inner ones (end of disk), quite a significant difference. To get these numbers, I used this command on Linux: sudo dd if=/dev/hda of=/dev/null bs=1M count=1024 skip=0. It reads a gigabyte of data from a given offset in the disk; the operation should take about half a minute. dd even calculates the transfer rate for you. To measure performance on the inner tracks, adjust the skip parameter, which is counted in blocks of bs (megabytes here), e.g., to about 37000 for a 40GB drive. You might want to repeat the command a few times and average the results. Do not pick an amount of data (the count parameter) smaller than twice your RAM, or caching may skew the results.
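If you do not want to work out the skip value by hand, a small helper can do it. This is only a sketch: the function names are mine, and it assumes you already know the drive's size in bytes (on Linux, blockdev --getsize64 prints it).

```python
MiB = 2**20


def inner_skip(disk_bytes, count=1024):
    """skip value, in 1 MiB blocks, that makes dd read the last `count` MiB."""
    return disk_bytes // MiB - count


def dd_command(device, skip, count=1024):
    """Build the dd invocation from the text as an argument list."""
    return ["sudo", "dd", "if=%s" % device, "of=/dev/null",
            "bs=1M", "count=%d" % count, "skip=%d" % skip]


# A "40GB" drive, taken here as 40 * 10**9 bytes (an assumption)
skip = inner_skip(40 * 10**9)
command = dd_command("/dev/hda", skip)
```

For a drive of exactly 40 * 10**9 bytes this gives a skip of 37122, in line with the "about 37000" above. The list can be passed to subprocess.call() or joined with spaces and pasted into a shell.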

Sunday, March 6

One-button testing

I wrote about some inefficiencies in my procedure of running tests a while ago. I really dislike the repetitiveness of commands to switch context, as I use the vim editor for editing code and a terminal for running the tests. Had it been code, I would have refactored it long ago. Now I decided that it's time to optimize this part a little bit.

My very first idea, which I had come up with long ago, was to write a small script called loop, which would run a command given as an argument in an infinite loop, waiting for a keypress between runs. The script was an extremely simple one-liner:

while true; do "$@" ; read; done

However, it helped only a little: it replaced the Up, Enter habit with a single Enter, but I still had to switch to the terminal.

The second go at the problem was on the right track. I decided to write a small Python script to behave much like the loop script, but it would register a global shortcut handler so that I could do an iteration without having to switch to the terminal.

The idea of handling global shortcuts was OK, but the implementation gave me some pains. I tried to look around for some examples of registering global shortcuts with GNOME, but found nothing really useful. I then remembered that a multitude of KDE apps register global shortcuts, and decided to try the KDE Python bindings. I wasted several hours scouring the web for information and watching my app segfault for odd reasons. It took me a long while to get the details mostly right, and because of reentrancy problems I managed to wedge my keyboard completely, so that I had to log in remotely and kill the Python process to get control back. I did not quite like having to load KDE libraries either, which took a whole second to import on script startup. In the end I dumped this solution for a simpler approach.

After playing with the KDE shortcuts for a while, I finally understood that I don't really need a global shortcut, as 99% of the time I need to run the tests while I am working in Vim. This allowed for a much simpler system. The loop script has remained, but has evolved significantly from the one-liner. Notably, it now checks the return status of the executed command and prints a red or a green horizontal bar with a timestamp. It is very nice to have coloured feedback on whether the tests have passed. There is also an option to run an alternate hardcoded command.
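My loop script is a shell script, but the idea is easy to show in Python. The following is a rough sketch rather than the real thing: the function names, the bar width and the timestamp format are all assumptions, and it waits for Enter instead of a signal.

```python
import subprocess
import time

GREEN = "\033[42m"  # ANSI escape: green background
RED = "\033[41m"    # ANSI escape: red background
RESET = "\033[0m"   # ANSI escape: back to normal colours


def status_bar(returncode, width=60, now=None):
    """Return a coloured bar: green for exit status 0, red otherwise."""
    colour = GREEN if returncode == 0 else RED
    stamp = time.strftime("%H:%M:%S", now or time.localtime())
    return colour + stamp.center(width) + RESET


def run_once(command):
    """Run the command (a list of arguments) and print the status bar."""
    returncode = subprocess.call(command)
    print(status_bar(returncode))
    return returncode


def loop(command):
    """Rerun the command each time Enter is pressed; Ctrl+C or EOF exits."""
    try:
        while True:
            run_once(command)
            input()
    except (EOFError, KeyboardInterrupt):
        pass
```

Calling loop(["make", "test"]) would run make test, print the bar, and wait for Enter before the next run.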

I implemented interprocess communication in a slightly hacky way, by having the loop script invoke another shell script for user input (this way I sidestepped smart signal handling in a shell script). The client, in our case vim, can then send a signal to the primitive sub-script with a simple killall command. Using the process name as a unique identifier is not very clean, but good enough for me.
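For comparison, here is how the same signalling trick might look if both sides were written in Python. This is a sketch under assumptions: the choice of SIGUSR1 and SIGUSR2 and all function names are mine, and the client addresses the waiting process by pid instead of by name as killall does.

```python
import os
import signal

# Assumption: SIGUSR1 means "run the given command", SIGUSR2 the alternate one
KICKS = {signal.SIGUSR1: 1, signal.SIGUSR2: 2}


def block_kicks():
    """Block the kick signals so they queue up instead of killing the process."""
    signal.pthread_sigmask(signal.SIG_BLOCK, list(KICKS))


def wait_for_kick():
    """Sleep until one of the (blocked) kick signals arrives; return 1 or 2."""
    return KICKS[signal.sigwait(list(KICKS))]


def kick(pid, which=1):
    """Client side: send the signal matching command 1 or 2 to the waiter."""
    signum = signal.SIGUSR1 if which == 1 else signal.SIGUSR2
    os.kill(pid, signum)
```

The waiting side calls block_kicks() once at startup and then wait_for_kick() before each run; the editor side only needs kick(pid, 1) or kick(pid, 2).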

So, there are three scripts in total:

  • loop (to be used from the command line)
  • dumbass.sh (the stupid sub-process, only used internally)
  • dumbass-kick.sh, which abstracts the killall command in case I want to implement it in a cleaner way. It accepts '1' or '2' as an argument. If you pass '2', loop runs the alternate command.

To be able to run the tests from vim by a single keypress, I added these lines to my .vimrc to bind the given command to F12, and the hardcoded alternate command to Shift+F12:

nmap <silent> <F12> :wall<CR>:silent !dumbass-kick.sh 1<CR>
imap <silent> <F12> <C-o>:wall<CR><C-o>:silent !dumbass-kick.sh 1<CR>
nmap <silent> <S-F12> :wall<CR>:silent !dumbass-kick.sh 2<CR>
imap <silent> <S-F12> <C-o>:wall<CR><C-o>:silent !

These commands also save all open files before running the tests. I added that because occasionally I forget or mistype the write command in vim and then waste some time trying to understand why the tests are misbehaving. All in all, this provides true one-button testing: you don't even have to exit from insert mode.

If you really want a global mapping, you can map a key to invoke dumbass-kick.sh in your window manager. That's a more lightweight solution than importing KDE libraries just to use their shortcut mechanism.

Even though I mostly use these scripts for running unit tests, they could be useful whenever you need to repeat the same command lots of times. For example, I have been toying around with LilyPond (music typesetting software) a bit, and I used loop on lilypond to generate DVI output on a keypress, so that I could see my changes immediately without having to stop typing and switch windows. The script could also be useful with make when working with compiled languages.

Tuesday, March 1

Dictionaries

Having a computer look up words for you in a dictionary can be a great timesaver, especially if you need to check many words. There are several kinds of computerised dictionary to choose from.

A simple and straightforward choice is to use a web-based dictionary. You can usually find some quite complete and verbose ones with many examples. Some sites even provide other linguistic data (this database really impressed me). Besides, Google is always handy to search for extra information.

If you have a text that has many unknown words, it might be faster to run it through a general-purpose translator, such as translate.google.com. You will lose precision, but at least you might get a good laugh from the results.

Having the dictionary installed locally is more convenient (you do not need an internet connection) and faster (almost zero latency). For some languages specialised dictionary software is available, but there is also the dict network protocol which defines a standard way for a generic dictionary client and a dictionary server to communicate. This model is quite powerful.
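To give a feel for how simple the protocol is, here is a minimal client sketch. It assumes a dict server listening on localhost at the standard port 2628; the function names are mine, error handling is omitted, and words containing double quotes are not escaped.

```python
import socket


def define_command(word, database="*"):
    """Build a DEFINE request; "*" asks the server to search all databases."""
    return 'DEFINE %s "%s"\r\n' % (database, word)


def read_definitions(lines):
    """Collect definition texts from the lines of one DEFINE response."""
    definitions, current = [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is not None:              # inside a definition's text
            if line == ".":                  # a lone dot ends the text
                definitions.append("\n".join(current))
                current = None
            else:
                # a text line that starts with a dot is sent with it doubled
                current.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151"):         # 151: a definition's text follows
            current = []
        elif line[:1] in ("2", "4", "5"):    # 250 ok, or an error like 552 (no match)
            break
    return definitions


def define(word, database="*", host="localhost", port=2628):
    """Query a dict server and return a list of definition texts."""
    with socket.create_connection((host, port)) as conn:
        stream = conn.makefile("rw", encoding="utf-8", newline="")
        stream.readline()                    # skip the 220 greeting banner
        stream.write(define_command(word, database))
        stream.flush()
        definitions = read_definitions(stream)
        stream.write("QUIT\r\n")
        stream.flush()
    return definitions
```

With a server running, define("foo") should return one string per matching definition, or an empty list when nothing matches.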

Setting up a dict server on Linux is not hard; in Debian it's just an apt-get install dictd away. Note that you will need to install the dictionaries yourself. Debian provides some dictionaries, e.g., dict-de-en. There are also a number of dict-freedict-* packages, but I have the impression that they are not very complete.

Now that you have a server running, you need a client to use it. There are quite a few clients available; I will mention several:

  • dict - the command-line client. Just type dict foo in a terminal and you'll get the query results immediately. Very handy, but not convenient for looking up many words.
  • gnome-dictionary - the GNOME dictionary client (screenshot, look on the right - not to the point, but will do). Looks nice at first, but in my opinion it is not very usable: I hate the popup window that appears when no matches are found for a query. It also pushes GNOME's "live preferences" to an uncomfortable limit - when you enter a new server, the same preferences dialog box is immediately reconfigured to that server, which looks very awkward.
  • kdict - the KDE dictionary client (screenshot). I slightly disliked it because the input box would lose focus after executing a query, so I would have to use the mouse to enter another word. Jeez, even the web-based dictionaries get this right with some JavaScript. The problem can be worked around by mapping Ctrl+L to the "Clear Input Field" action. I would rather it selected the word instead of deleting it, but this solution is satisfactory. In addition, kdict offers database sets, which turn out to be very helpful. In most dict clients you can only query either a single dictionary or all dictionaries available. Database sets are like virtual dictionaries. An example use case is when you want to translate from English to another language and you have several dictionaries for this purpose, but you don't want the general-purpose ones to get in the way.

After finding out about the Ctrl+L tweak I liked KDict best. I would prefer it to be a GTK+ application rather than Qt, but it is practical, which matters most to me.