2006-04-05

Ruby and Reality

I was lucky enough to be able to choose Ruby for implementing a small server for 'geo data'. Basically it is a service not unlike Google Maps, but much simpler and with vector output. The whole thing provides data that can be used on a J2ME client.

Anyway, because of some weird problems inside the company, I had to read a 2 GB file containing the geo data in a stupid textual format.

I started with a trivial 'each_line' approach. A 'data entry' in the file consisted of about 10 lines. The first 9 lines contained some attributes. The last line contained the coordinates of the represented geographic feature. So within the each_line block, I collected the data and appended the resulting 'GeoObject' instance to an output array whenever a data entry had been read completely.
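For illustration, here is a minimal sketch of that kind of each_line loop. It is not the original code: the 'GeoObject' struct, the fixed count of 10 lines per entry, and the whitespace-separated integer coordinates are all assumptions made up for the example.

```ruby
require 'stringio'

# Hypothetical record type; the real GeoObject is not shown in this post.
GeoObject = Struct.new(:attributes, :coordinates)

# Assumption: every entry is exactly 9 attribute lines + 1 coordinate line.
LINES_PER_ENTRY = 10

# Reads entries line by line from any IO-like object (File, StringIO, ...).
def read_geo_objects(io)
  objects = []
  buffer = []
  io.each_line do |line|
    buffer << line.chomp
    next if buffer.size < LINES_PER_ENTRY

    # Assumed layout: coordinates are whitespace-separated integers.
    # This split/to_i pair allocates one substring per value.
    coords = buffer.last.split.map { |s| s.to_i }
    objects << GeoObject.new(buffer[0...-1], coords)
    buffer = []
  end
  objects
end
```

Called with an open File, this walks the whole file one line at a time. Every line and every extracted value becomes a fresh String, which is probably a good part of why this style gets slow on 2 GB of input.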

This thing took about 4 hours to process the 2 GB.. wtf!? Alright, I didn't think much when writing the code. And I had only worked with Ruby for a few days. But some things were obvious to me: for example, getting each 'value' out of a 'line' by extracting a substring and calling 'to_i' on it is pretty inefficient.

So I started implementing a few straightforward optimizations: instead of extracting substrings from the line, I accumulated the digit 'bytes' directly to determine the integer value. And instead of 'each_line' I read 16 MB chunks of data and worked with offset/index pairs on these chunks.
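A rough sketch of both optimizations, again not the original code. The helper names 'parse_int' and 'each_line_span' are made up for this example, the input is assumed to be plain ASCII with non-negative integers, and 'getbyte', 'byteslice' and 'b' need a newer Ruby than the one I was using at the time (in Ruby 1.8, 'str[i]' returned the byte directly).

```ruby
require 'stringio'

CHUNK_SIZE = 16 * 1024 * 1024 # 16 MB, as described above

# Parses a non-negative decimal integer from str[from...to] without
# allocating a substring: each ASCII digit byte contributes (byte - 48).
def parse_int(str, from, to)
  value = 0
  i = from
  while i < to
    value = value * 10 + (str.getbyte(i) - 48)
    i += 1
  end
  value
end

# Reads io in fixed-size chunks and yields (buffer, from, to) for every
# line, where buffer[from...to] is the line without its trailing "\n".
# A line cut off at the end of a chunk is carried over into the next one.
def each_line_span(io, chunk_size = CHUNK_SIZE)
  carry = ''.b
  while (chunk = io.read(chunk_size))
    buffer = carry + chunk
    from = 0
    while (nl = buffer.index("\n", from))
      yield buffer, from, nl
      from = nl + 1
    end
    carry = buffer.byteslice(from, buffer.bytesize - from)
  end
  # Last line of the input if it did not end with a newline.
  yield carry, 0, carry.bytesize unless carry.empty?
end
```

The block passed to 'each_line_span' only gets offsets into the current buffer, so it can call 'parse_int(buffer, from, to)' on a field without creating a String per value. The offsets are only valid inside the block, because the buffer is replaced with every chunk.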

This improved performance by more than 50%. But it still took close to 2 hours.

Funny me, I fired up IntelliJ IDEA. Just starting it took about 2 minutes.. (I hate how bloated IDEA is by now.. I'd love to see an IDEA Light!) And I started hacking away at a Java solution. Using IDEA, writing it took me less than 30 minutes for this problem. I used the NIO features. With 'getChannel' and a 'map' call to do memory-mapped IO, my first version of this app took about 5 minutes to process the 2 GB.

How's that?

I can't explain all of this huge performance difference.. Part of it might be the memory-mapped IO. But look at this Java API: I'm scanning the ByteBuffer using these 'get' calls.. So I assume the Java VM is a lot more powerful than what Ruby's 'foundation' offers.

Anyway.. just a quick post on this topic. If I find the time I'll post the code. Unfortunately I have to change some things to protect the innocent..

I'll post another note on extending Ruby soon. I needed access to a polygon clipping library.. More on that soon..

tfdj
