onsdag 6. august 2008

Choosing the right database

Databases come in all choices, on the high end, there are Oracle, IBM and others I forgot. Then you have a set of free alternatives to those, think MySQL and Postgres, just to name some. For smaller applications, which don't need concurrent access, there are several embedded once, SQLite, Derby, HSQL. And if you don't have many tables and you don't like SQL, you can choose file-based databases, i.e. BerkleyDB, ndb, gdb and all of those who have been so popular in the 90th before every programmer needed to know SQL. And all non-programmers believe, that the file-system or even the tape library are databases, and honestly, aren't they right.

I might have had a quite common learning curve with databases: I started using CSV files, then I was very pleased with BerkleyDB creating an graphical overview of all runners of the Köln-Marathon, and since I had to learn SQL at the end of the 90th, I installed postgreSQL and played with it, building an online-calendar for my band.

I started more serious work, that is projects which still exist, in 2002.

The online dictionary Heinzelnisse was first build using PostgreSQL. It was an obvious choice: It was installed on the machine I intended to use, I knew SQL by then good enough, I had need for several tables and concurrent access. When Heinzelnisse started to have some users a year later, I recognized how bad the choice was: PostgreSQL6 was extremely slow on Solaris Sparc machines with 32bit installed. And the machine was already 5 years old, running at 166MHz (and 4GB) at that time. The installation was so old, that the administrator never managed to get mod-perl running there, at the end it took more than 10s to make a query - that was much to slow. Then I went back using BerkleyDB again. It was very well-integrated into Perl, and fast enough on the computer, queries were running at under 2s. Again a year or two later, I switched to the extremely cheap professional hoster, which cost about 1$/month. They had only MySQL installed and I had to switch again. I was never really happy with MySQL, since I had very often problems with the character-columns which behave quite strange in some cases, i.e. Tür = tür = tyr, but it was only some learning and using binary data instead of characters. The dictionary was running nicely, but I was thrown out from the cheap hosters, since my application was too CPU demanding - they were only limited by bandwidth and disk-space. Currently Heinzelnisse is hosted at a virtual machine hoster Tektonic. Even though I can now install whichever database I like, I still stick to MySQL-MyISAM since it is working with very few memory, but I don't think I would have MySQL if I haven't been limited by the cheap webhosters.

Another application I was heavily involved in is Webdab, the European air-emission database. This database started with Sybase, and switched later to Oracle7. When we had to make it available on the Internet in 2002, we knew that the license-costs would be to much for our budget and we had to find an alternative. We found one in PostgreSQL and never regretted that decision, though we had to figure out some performance problems in the beginning.

So, choosing the right database is not simple. There are in my opinion often more non-technical reasons to select the right database than the technical ones. Maybe the most important thing to remember is to embed the database into the application in such a way, that it is possible to switch to another database at a later stage.

Ingen kommentarer: