Monday, August 14

Localization coordination for Debian #6

With the deadline approaching, it appears that my productivity is rising. Seems like the typical efficiency=1/days_left (and at T=0 my brain explodes).

Basically I'm happy with what I have for DDTP. It will need some testing to replace the current system. One thing that I would like to have is package priorities: this has some support in the old system. I am not sure how to approach this yet, but I'm sure I'll think of something. Anyway, this is not a priority and will probably be settled after the deadline.

Priorities for next (and last) week are to get the relational backend running and to write a guide to the API. I have already started work on the RDB backend. I am using SQLAlchemy as the wrapper. A nice side-effect is that SQLAlchemy supports SQLite, which means that getting the thing to run should be a snap, without requiring fancy configuration of MySQL or PostgreSQL. On the other hand, having these options allows better scalability as the database and the web server(s) can be separated. As for the guide, I am aiming for a style similar to library documentation on python.org. I expect no trouble here because I already have some experience with LaTeX.
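
To show why SQLite makes the setup so painless, here is a minimal sketch. It uses Python's standard sqlite3 module directly instead of SQLAlchemy, and it is not the actual backend code: the table layout and the function names (open_store, set_translation, get_translation) are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database; no server setup is needed."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per translated string.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS units (
               module TEXT NOT NULL,
               lang   TEXT NOT NULL,
               source TEXT NOT NULL,
               target TEXT,
               PRIMARY KEY (module, lang, source)
           )"""
    )
    return conn


def set_translation(conn, module, lang, source, target):
    """Insert a translation, replacing any previous one for the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO units VALUES (?, ?, ?, ?)",
        (module, lang, source, target),
    )
    conn.commit()


def get_translation(conn, module, lang, source):
    """Return the stored translation, or None if there is none."""
    row = conn.execute(
        "SELECT target FROM units WHERE module = ? AND lang = ? AND source = ?",
        (module, lang, source),
    ).fetchone()
    return row[0] if row else None
```

Passing a file name instead of ":memory:" keeps the data between runs. Moving to MySQL or PostgreSQL later is where a wrapper such as SQLAlchemy earns its keep.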

Last but not least, I have to remember to clean up quite a few leftover TODO and XXX markers in the things that I have. Most of them are benign, however.

Thursday, August 10

Localization coordination for Debian #5.5

I have been working on getting the DDTP to work with Pootle. Import from DDTP translation files works fine now (although it takes a while), and so does export to the Translation-?? file format. I have also created a layer that splits the translations into groups by package name, to avoid humongous multi-megabyte .po files.
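
The grouping idea can be sketched roughly as follows. The grouping rule here (first letter, with lib* packages grouped by "lib" plus one more letter, as in the Debian archive pool) is an assumption for illustration, and group_key and split_into_groups are hypothetical names, not the real layer.

```python
from collections import defaultdict


def group_key(package_name):
    """Pick a group for a package (assumed rule, similar to Debian's pool)."""
    name = package_name.lower()
    if name.startswith("lib") and len(name) > 3:
        return name[:4]  # e.g. 'libc6' -> 'libc'
    return name[:1]      # e.g. 'apt' -> 'a'


def split_into_groups(descriptions):
    """Split {package: description} into {group: {package: description}}."""
    groups = defaultdict(dict)
    for name, text in descriptions.items():
        groups[group_key(name)][name] = text
    return dict(groups)
```

Each group would then be written out as its own, much smaller, .po file.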

More information and plans for next week coming up tomorrow.

Tuesday, August 1

Firefox tip-of-the-day

A small tip for Firefox: use mouse gestures for basic operations. They are provided by the mouse gestures extension. I am no big expert here, as I only use three gestures: go back (left), go forward (right) and close tab (down, right). Nevertheless, having these around is already very convenient, because they are so simple and so frequently needed. I usually do browsing with (surprise) my mouse and these gestures streamline the process because now I don't have to care about the toolbar (and the close tab button in particular) any more.

If you are only interested in the simple gestures, I found that it helps to disable diagonal gestures: in the extension's properties, Additional Settings tab, set diagonal tolerance to 0 percent. This will disable some advanced gestures but I never wanted those anyway, and it reduces the chance of my gesture being incorrectly recognized. I also disable the New window (down) gesture ("General" tab, "Edit gestures" button) because it would sometimes be activated instead of close tab when the "right" in my "down, right" motion was too small.

Mouse gestures, like anything, need some time to get used to, but after a while they require much less effort than seeking out a button. In a few days you will learn that very little motion is required for the gesture to be recognized. For other applications usually the keyboard is actively used, so keystroke shortcuts would be faster, but for web browsing in particular, mouse gestures fit the bill very well in my opinion.

By the way, Opera supports mouse gestures natively.

In other news, my work on the Debian l10n project was mentioned in Debian Weekly News. I found it very funny that they linked to WorldForge, which is an open source MMORPG, instead of WordForge, the localisation project. Admittedly I had made this mistake myself previously, and unfortunately WordForge has a much lower profile than WorldForge. Oh well, real news publishers do this kind of stuff all the time :)

Localization coordination for Debian #5

There have not been updates for an extended period of time as I have taken a completely unplanned and unexpected holiday. Now I'm back on track.

Work I have accomplished since the last update:

  • updated TranslationUnit and the .po parser. In the end I decided that it's not worth making it too different from Pootle's pofile. Comments are now dealt with and the header is treated specially.
  • merging templates. I defined an interface for objects able to merge translation stores and provided a simple implementation. Pootle already has an implementation of this too; I'll try to encapsulate that as well.
  • help with move to Subversion for the Wordforge project. The Subversion repository is already operational at https://svn.sourceforge.net/svnroot/translate/trunk, although it's read-only at this moment.
  • some small shufflings in the API. Generally the defined interfaces are holding up fairly well against new code, so only small modifications are needed.
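
To give an idea of the merging item above, here is a minimal sketch of such an interface with a naive implementation. The class names (Merger, SimpleMerger) and the use of plain dictionaries as stores are assumptions for illustration; the real interfaces operate on richer objects.

```python
from abc import ABC, abstractmethod


class Merger(ABC):
    """Hypothetical interface for objects that merge translation stores."""

    @abstractmethod
    def merge(self, template_sources, translations):
        """Return (merged, obsolete) for an updated template."""


class SimpleMerger(Merger):
    """Naive merge: exact source match only, no fuzzy matching."""

    def merge(self, template_sources, translations):
        merged = {}
        for source in template_sources:
            # Reuse an existing translation, otherwise leave it empty.
            merged[source] = translations.get(source, "")
        # Translations whose source string vanished from the template.
        obsolete = {s: t for s, t in translations.items() if s not in merged}
        return merged, obsolete
```

Keeping the obsolete units around, rather than dropping them, leaves room for fuzzy matching to be added later.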

Here's what I want to do next week:

  • Wire this thing up to DDTP. Should be easy, I just need to ask about a good way to fetch DDTP translations. Probably this will be a download-only demo in the beginning.
  • Same for debconf l10n templates.
  • Come up with a translation review and approval model. This is one of the cornerstones of the API and may be tricky to get right. I think that it is essential to have a sort of a 'diff' tool working at translation unit level (much like ordinary 'diff' works on text). To get usable diffs several copies of the template (upstream & local) should be stored. With this in place it shouldn't be hard to implement pushing of new DDTP / debconf translations upstream.
  • further updates to the API as flaws and missing features pop up.
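
What I mean by a 'diff' at translation unit level could look roughly like this. It is only a sketch: stores are plain {source: target} dictionaries here, and diff_units is a hypothetical name.

```python
def diff_units(old, new):
    """Compare two {source: target} mappings unit by unit.

    Returns a list of (kind, source, old_target, new_target) tuples,
    where kind is 'added', 'removed' or 'changed'.
    """
    changes = []
    for source in sorted(set(old) | set(new)):
        if source not in old:
            changes.append(("added", source, None, new[source]))
        elif source not in new:
            changes.append(("removed", source, old[source], None))
        elif old[source] != new[source]:
            changes.append(("changed", source, old[source], new[source]))
    return changes
```

Running this between the stored upstream copy and the local copy would give the list of units that a reviewer has to look at before they are pushed upstream.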

Saturday, July 8

Localization coordination for Debian #4

This was a week of prototyping.

I performed some more basic changes to the API. I introduced folders so that arbitrary tree-like grouping structures would be possible. I also moved the language layer below the module layer (previously it was above, as in Pootle), because it eliminates parallel hierarchies in the whole structure. It also just appears more logical to me. While the bottom-most TranslationUnit class needs more work until it's complete, I hope that the container interfaces will not need to be changed much more, as there would be more and more work to update the corresponding implementations.
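
A rough sketch of the resulting shape, with folders nesting arbitrarily and languages hanging below each module. The class and function names (Folder, Module, find) are simplified stand-ins, not the actual API.

```python
class Folder:
    """A node that can contain other folders or modules."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child


class Module:
    """A translatable module; languages live below it, not above."""

    def __init__(self, name):
        self.name = name
        self.languages = {}  # language code -> {source: target}

    def store(self, lang):
        """Return the store for a language, creating it on first use."""
        return self.languages.setdefault(lang, {})


def find(root, path):
    """Walk a slash-separated path such as 'admin/apt' down from root."""
    node = root
    for part in path.split("/"):
        node = node.children[part]
    return node
```

With languages above modules, the whole folder tree would have to be repeated once per language; this way there is a single tree.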

I have written a prototype .po parser that uses Wordforge's tools to parse a .po file and converts the result into structures defined by the new API. Admittedly it's at a very early stage. There is also a preliminary implementation that can read Pootle's backend storage directory structure. It is read-only at the moment, and does not read Pootle's metadata files yet, but it is usable. There's also an XML-RPC server that demonstrates this backend: you can send XML-RPC requests to translate words in specified Pootle projects.
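
For the curious, the XML-RPC part boils down to something like the sketch below. It uses the standard library of current Python (xmlrpc.server) and is not the demo code itself; the PROJECTS data, the translate signature and make_server are placeholders for illustration.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Placeholder data standing in for the Pootle backend storage.
PROJECTS = {"demo": {"lt": {"File": "Failas"}}}


def translate(project, language, word):
    """Look up a word; return an empty string if nothing is found."""
    return PROJECTS.get(project, {}).get(language, {}).get(word, "")


def make_server(host="127.0.0.1", port=0):
    """Create a server exposing translate(); port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(translate, "translate")
    return server
```

Calling make_server(port=8000).serve_forever() starts the server, and a client can then use xmlrpc.client.ServerProxy("http://127.0.0.1:8000").translate("demo", "lt", "File").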

After having dealt with containers this week, next week I will probably concentrate on the actual translation objects and look deeper into structure of XLIFF to ensure decent compatibility. I would like to accommodate most of what Wordforge currently supports in this regard. I will also probably start thinking about the security subsystem (I have a feeling that this will spur a few discussions on the mailing list).

I have also tried to gently push Wordforge to migrate to Subversion. Sourceforge has its own migration facilities, so hopefully the transition will be easy and painless. CVS has been a bit of a pain, and now I noticed that it's actually threatening me!

...
revision 1.1
date: 2006/07/08 12:06:43;  author: gintautasm;  state: dead;
branches:  1.1.2;
...

There was some interest on the Wordforge mailing list about using the XML-RPC interface to co-operate with thick clients (KBabel, etc.). For the moment I have enough work on Pootle itself and XML-RPC is not a priority, but if anyone would be interested in implementing the client side for some translation application, I would be very happy to cooperate and coordinate the server's XML-RPC interface.

Hopefully next week the weather will be a bit cooler too: it's so hot that it is difficult to fall asleep because of the heat. The heat definitely does not help to get actual work done.

Friday, June 30

Localization coordination for Debian #3

This time the weekly report is late, sorry about that. I spent the better part of the week without access to the internet.

I did have my computer with me, and I have worked a bit on the Pootle backend API, which was received without major complaints on the Pootle mailing list. There were some very slight changes to the interfaces. Most importantly I whipped up a simple proof-of-concept implementation of the interfaces. That revealed a few more small problems, but now I am confident that the interfaces are consistent and can actually be implemented. This demo implementation is not persistent, but it might be useful as a base for other implementations and in tests, or it could be trivially made persistent with use of pickle.
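
To illustrate the pickle remark, here is a stripped-down sketch of an in-memory store that gains persistence with a few lines. MemoryStore and its methods are hypothetical names; the real demo implementation covers the full set of interfaces.

```python
import pickle


class MemoryStore:
    """Keeps translations in a dictionary; nothing survives a restart."""

    def __init__(self):
        self.units = {}

    def set(self, source, target):
        self.units[source] = target

    def get(self, source):
        return self.units.get(source)

    def save(self, path):
        """Persist the whole store by pickling its dictionary."""
        with open(path, "wb") as f:
            pickle.dump(self.units, f)

    @classmethod
    def load(cls, path):
        """Recreate a store from a file written by save()."""
        store = cls()
        with open(path, "rb") as f:
            store.units = pickle.load(f)
        return store
```

This is fine for tests and demos, but pickle files should only ever be loaded from trusted sources.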

The current API and the implementation are in the Pootle CVS repository (see here).

Next, I will attempt to wrap .po file classes that Pootle currently has. Implementing basic functionality should not be very difficult there, especially with recent changes by David Fraser.

Tuesday, June 20

Localization coordination for Debian #2

I sent a new revision of the suggested translation storage API to the mailing list, addressing all the shortcomings that have been pointed out. Hopefully it will not change much any more and I can start working on an implementation this week.

My plan is to first produce a very simple pickle-based implementation of the API. This should be relatively quick to build and would highlight the weak points of the API. I think that this can be finished this week. After that (and possibly revisions to the API) comes the important part: writing a nice wrapper for the existing pofile class. This should not take too long either, another week at most (especially since my exams would already be over).

When a .po-based implementation is in place, the fun starts. I'd then like to try to have another application that concurrently uses Pootle's backend storage (currently po-based). Another week of work.

Tuesday, June 13

Localization coordination for Debian #1

I have been working on an integrated l10n infrastructure for Debian. Instead of starting from scratch, the Wordforge project was chosen as the base. Currently Wordforge has a toolkit for various l10n operations as well as a web frontend, Pootle, that uses this toolkit. My job is to adapt these tools to Debian's needs.

I see three major areas of my future work in Wordforge:

  • scalability: importing all translations of all packages in Debian into Wordforge as it is would require a very powerful computer. I believe that a planned transition to a relational database as the backend of Wordforge (instead of plain old files) will help immensely here.
  • frontends: there is only one frontend to the system, Pootle. Debian will need an e-mail interface, an XML-RPC interface, a Subversion interface and possibly others in addition to the web interface. This is more difficult than it sounds because Pootle is intertwined with the backend. Work is underway to define a clear API for the backend that could be used by various clients.
  • process: Debian translators have strict translation review and ownership processes. These would have to be incorporated into Pootle.

A (to me almost distressingly) large amount of discussion took place on the debian-i18n and Wordforge mailing lists on these topics and more as I dug into Wordforge to better understand the current situation and plans of other developers. After the initial communication peak, now things are really starting to roll. Here's what we have now:

  • A general consensus for a move in Wordforge towards a relational database. This took quite a bit of convincing ;)
  • Agreement on the ideas of a new backend API
  • A sketch (in code) of the new API

I posted the sketch of the API to the Pootle mailing list. There was little opposition and mostly constructive remarks. Unless a disagreement pops up, I would expect the API design to be finished by next week.

My longer term plans:

  • write an implementation for the new API (a week)
  • cover the current file-based backend under the API (difficult to estimate)
  • migrate Pootle to use the new backend (difficult to estimate)
  • write a new frontend or two: e-mail, XML-RPC (a week)
  • write a backend based on a relational database (a week)

One thing that is still keeping me down is a couple of exams due in a week. After that I should have much more time to work on this project.

Tuesday, May 16

z3reload moved to the Zope 3 Base

A small project of mine, z3reload, which implements very basic view code reloading, has moved to the Zope 3 Base. Hopefully that will make it a bit more visible. Zope 3 is already fragmented as it is; some consolidation can't hurt.

Monday, May 1

Russian to English dictionaries on Linux

Update (2008-09-13): the dictionaries in http://dictd.xdsl.by/dicts are probably more complete than the ones provided here, see this post for more information.

It is very convenient for me to have a local dictd server set up, not only to look up obscure English words and jargon, but also for translation from foreign languages. Dictd is quite fast and powerful, with a multitude of dict clients to choose from. There are some pretty good free dictionaries too. For German, the dictionary in the Debian (Ubuntu) package dict-de-en (German-English + English-German) is fairly good. For Russian, the Mueller English-Russian dictionary (mueller7accent-dict) is great.

Unfortunately, the Mueller dictionary does not contain translations from Russian to English. I have been using KSocrat for that purpose, but it is a little buggy and its usability does not shine. I looked around on the web for a downloadable Russian-English dictionary in dict format, but surprisingly could not find anything useful. There are quite a few online translation sites, heaps of small DOS-era utilities and some home-made Windows stuff, but nothing that would be easy to feed to dictd. What I did find was Sdictionary.

Sdictionary is a dictionary application that has a fairly large community. Its Russian-English dictionary with 300 000 words was more than adequate for my purposes, but the database format was incompatible with dictd, and I couldn't find a converter on the web, so I wrote one.

The converter from Sdictionary to Dictd formats is a bash script that dumps an Sdictionary file to a plain text file and massages it a bit so that it can be fed to dictfmt. To use the script you will need to install PTkSdict from swaj.net (Debian package) and the Ubuntu packages dictfmt and dictzip. Then just drop the script in the same directory as some dictionary foo.dct and run ./sdict2dictd.sh foo (where foo is the name of the dct file). The script should produce foo.dict.gz and foo.index which you can install into /usr/share/dictd. The script is far from perfect and discards some formatting but it's better than nothing. The best solution would probably be to extend dictfmt so that it could read the Sdict dump files directly.
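
The "massaging" step is the interesting part, so here is a rough sketch of the idea. It is written in Python rather than bash and is not the script itself. It assumes that the dump has one entry per line with the headword and the definition separated by "___", and that the output is meant for dictfmt's jargon input format (:headword:definition); both are assumptions you should check against your own dump and the dictfmt manual. The name to_jargon is made up.

```python
import re


def to_jargon(lines, separator="___"):
    """Convert 'headword___definition' lines to ':headword:definition'.

    The separator and the one-entry-per-line layout are assumptions
    about the dump format, not something guaranteed by Sdictionary.
    """
    for line in lines:
        line = line.rstrip("\n")
        if separator not in line:
            continue  # skip headers and anything that is not an entry
        headword, definition = line.split(separator, 1)
        # Colons delimit the headword in the output, so drop them here.
        headword = headword.replace(":", " ").strip()
        # Discard markup tags; this is where formatting gets lost.
        definition = re.sub(r"<[^>]+>", " ", definition)
        definition = " ".join(definition.split())
        if headword:
            yield ":%s:%s" % (headword, definition)
```

The resulting lines would then be piped to something like dictfmt -j --utf8 foo, followed by dictzip on the generated .dict file.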

If you just want a dictd Russian-English dictionary, I have uploaded the files converted from Sdict here: rus_eng_full.dict.dz (8MB), rus_eng_full.index (15MB).

By the way, when running in daemon mode, dictd eats an unreasonable amount of RAM (it's taking 40MB resident memory on my machine). If the dict server is only for personal use, it makes sense to run it through inetd as described in /usr/share/doc/dictd/README.inetd.gz. It appears that the speed difference is not noticeable on modern computers.