Friday, September 12, 2003

I just came back from this great mountain-bike trail in Hareskoven north of Copenhagen. I was tagging along with Jarl on his new Rockhopper. It is a great hardtail with disc brakes that really proved their worth on the muddy forest trails.

I returned home covered in mud with a big, wide grin on my face.

The feeling of being immersed in the scent of the trees, ferns, shrubberies and berries, breathing the clean air and hearing that special quiet of the forest - free of all the noise from cars and people that surrounds us every day in the city - is a great blessing. The weather was just perfect, too - a nice, sunny 19 °C day. Being able to take the afternoon off on a day like this is a very convincing argument for working with people who are nine time zones away.

Thursday, September 11, 2003

Borland has shipped a C#/.NET version of their excellent Together tool. A free promo version is available for download. Yum yum!
David Mertz and Michele Simionato have written some interesting articles on metaclass programming with Python. They teach a good deal of magic and show just how powerful high-level language constructs can be in dynamic languages like Python.

Metaclass programming in Python

Metaclass programming in Python, Part 2
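To give a taste of the kind of magic the articles cover, here is a minimal sketch (in current Python 3 syntax, rather than the Python 2 `__metaclass__` idiom of the era): a metaclass that automatically registers every class defined with it - a trick often used for plugin systems. The names `Plugin` and `HtmlExporter` are invented for illustration.

```python
# A registry filled in automatically as classes are defined - no explicit
# registration call needed anywhere.
registry = {}

class RegisteringMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        registry[name] = cls  # every class built by this metaclass is recorded
        return cls

class Plugin(metaclass=RegisteringMeta):
    pass

class HtmlExporter(Plugin):  # subclasses inherit the metaclass, so they register too
    pass

print(sorted(registry))  # ['HtmlExporter', 'Plugin']
```

The point is that the class *statement* itself becomes hookable - the articles take this much further.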
Real Programmers and Efficiency

Until recently, I thought it all went wrong in 1993 when I switched to the PC and installed Linux. The instruction set on the 80486 seemed messy compared to the 68k series, and the hardware was even worse, so, reluctantly, I let down my Real Programmer defenses, took up C programming and joined the ranks of lazy, cycle-wasting high-level-language bums.

I just discovered that it went wrong long before that.

I learned programming in the eighties on the Spectrum - in BASIC. However, I soon discovered that it was not efficient enough for writing my own games, so I switched to machine code on the C64. Direct hardware access and no waiting for anything to compile - being a Real Programmer, I coded directly into a machine code monitor.

Soon I started using an assembler. This introduced a delay of a few seconds, but it made writing larger programs, like game and music engines, much simpler. I was still a Real Programmer, though, letting no cycles go to waste during program execution. I knew the cycle count of every instruction by heart.

Some time later I switched to the Commodore Amiga and started building an operating system to provide a fast and efficient foundation for my game development efforts. After all, the existing Amiga OS was written in wimpy, low-performance languages like BCPL and C, so I set out to write a fast OS myself, with absolute control and top-notch performance. A lean Real Program in machine code, wasting no cycles along the way.

I should have seen it coming, though. Goodbye custom code, hello code reuse.

I needed a GUI, so I wrote my own version of the Amiga's "Intuition" GUI to power my animation tool (Deluxe Paint was written in C, too... what a cycle-burner). My own code was so much more efficient - the way I tuned it, it probably had the fastest code ever written for waiting for the user to do something - and that was definitely a Good Thing.

When it came to using it, of course, it soon turned out that it took several lines of code just to print a message to the screen.

Being lazy in the good Perlish sense, I soon constructed a package of assembler macros that effectively provided a high-level language to control the GUI and reduce the lines-of-code count.

Since then it has all been going wrong.

I went from assembly macros to C, from C to C++, from Sed and Awk to Perl, from C++ to Java, and on and on - always going for higher abstractions, further and further from the hardware, letting more and more CPU cycles go to waste. It was the reprogramming of the Real Programmer.

In the end, the experience is almost Orwellian. Just as Winston changed at the end of 1984, so did I. Today, when cycles are abundant and life is short, programmer efficiency is worth far more than CPU cycles.

So, save for a few inner loops, I choose to live by Donald Knuth's (or was it Tony Hoare's?) rule of thumb: premature optimization is the root of all evil.

Tuesday, September 09, 2003

Akiyoshi Kitaoka has published some great optical illusions. There is an English version, too.

Monday, September 08, 2003

I think most developers have experienced something like this:

"I work for a small startup that has [...] build up about 300K lines of functionality. Up to now we've made it by being smart and conscientious hackers, but I'm increasingly embarrassed by our shortcomings in testing."

The quote is taken from a question to the Slashdot community, one that has spawned some very good answers which I recommend to anyone who is new to test-driven development. Read it here: Retrofitting XP-style Testing onto a Large Project?.

Going off on a wild tangent, one of the replies mentions a semantic analysis tool for Java called PMD. It sports a feature for detecting duplicate code. If you have ever been charged with maintaining a program based on copy-paste programming rather than reuse through method calls, you will know the pains associated with this. But we will not be talking about that now.

The really interesting part is - how do you locate duplicate code?

Well, this is where my old favourite, information theory, comes in handy. Since data compression eliminates redundancy in its input by representing it with an optimal code, we can use it to gauge the redundancy of one piece of code relative to another. If the two are very similar, the redundancy will be high, and we have found something that could probably be rewritten.

There are many ways to do this, but a simple approach will work. Let's call the code snippets A and B. First, we compress A, which gives us an optimal code, CA, for representing A. Then we compress B to obtain its own code, CB, and compare the size of B encoded with CB to the size of B encoded with CA - if the two sizes are close, we have a likely match.
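As a rough sketch of the idea - using plain zlib and comparing compressed sizes of concatenations rather than building explicit codes, which approximates the same measure - one might write something like this (the snippets are invented for illustration):

```python
import zlib

def redundancy_score(a, b):
    """How much of b is already 'explained' by a?

    Close to 1.0 means b is highly redundant given a (a likely duplicate);
    close to 0.0 means the two snippets share little.
    """
    a_alone = len(zlib.compress(a))
    b_alone = len(zlib.compress(b))
    combined = len(zlib.compress(a + b))
    b_given_a = combined - a_alone  # extra bytes needed for b once a has been seen
    return 1.0 - b_given_a / b_alone

snippet = b"for (int i = 0; i < n; i++) { total += values[i]; }\n" * 4
copy = snippet
unrelated = b"SELECT name, price FROM products WHERE price > 100;\n" * 4

print(redundancy_score(snippet, copy) > redundancy_score(snippet, unrelated))  # True
```

The closer the score gets to 1.0, the stronger the hint that the second snippet is a reworded copy of the first.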

For all you shell hackers, this leads to a very simple solution for classifying documents: just concatenate the unknown document with the documents of a candidate category, compress, and note the file size. Subtract the compressed size of the candidate category's documents alone. Call this the "delta" for the category. Compute the delta for each category; the best matching category is the one with the lowest delta.

Voila - a complete document classifier in less than ten lines of shell script!
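The gzip-and-wc shell version is an easy exercise; here is the same delta computation sketched in Python, to keep one language throughout (the category names and sample texts are invented for illustration):

```python
import zlib

def classify(unknown, categories):
    """Pick the category whose sample documents best 'explain' the unknown one."""
    deltas = {}
    for name, docs in categories.items():
        corpus = "".join(docs).encode()
        base = len(zlib.compress(corpus))
        combined = len(zlib.compress(corpus + unknown.encode()))
        deltas[name] = combined - base  # growth caused by the unknown document
    return min(deltas, key=deltas.get)

# Toy categories - real use would read whole sample files from disk.
categories = {
    "python": ["def add(a, b):\n    return a + b\n",
               "def mul(a, b):\n    return a * b\n"],
    "sql": ["SELECT name FROM users WHERE id = 1;\n",
            "UPDATE users SET name = 'x' WHERE id = 2;\n"],
}
print(classify("def sub(a, b):\n    return a - b\n", categories))
```

With such tiny samples the deltas are noisy, but with a decent corpus per category this crude trick works surprisingly well.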

The PMD tool uses the fascinating Burrows-Wheeler Transform to find duplicates. Dr. Dobb's Journal featured an article on it in 1994. If you haven't seen this transformation before, go read about it. It is fascinating and ... strange. I still wonder how they discovered it. To me it is far from intuitive to look at shifted, sorted versions of the data in the first place, but once you get it, it is a very nice algorithm.
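For the curious, the transform itself fits in a few lines - sort all rotations of the input and keep the last column. This is a naive sketch, fine for playing with small strings but far too slow for a real compressor:

```python
def bwt(s):
    """Burrows-Wheeler Transform: sort all rotations of s, keep the last column.

    Also returns the row index of the original string, needed for inversion.
    """
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations), rotations.index(s)

def inverse_bwt(last, idx):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    return table[idx]

print(bwt("banana"))  # ('nnbaaa', 3) - identical characters cluster together
```

The clustering of identical characters in the output is exactly what makes the transformed data so easy to compress afterwards.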

You can find a lot of articles about the BWT here [citeseer.nj.nec.com].

This page is powered by Blogger. Isn't yours?