Sunday, October 16, 2011

Testing hsqldb database for the cloud

Cloud services have been around for some time now. While my currently preferred deployment architecture is virtual servers (VPS), those VPS offerings have not seen much attention lately. My current VPS still costs $15/month, the same as in 2005, though it now has 512MB of RAM rather than 64MB and 3x the disk space. A comparable server in the cloud should be much cheaper (approx. $0/month). But before switching to a PaaS like Google App Engine or an IaaS like Amazon EC2, some technical issues need to be overcome; in particular, none of these offer free access to the filesystem or to SQL databases. Google has just started to address the reprogramming issue with Google Cloud SQL, but this is not available for free.

My application Heinzelnisse comes with 5 different information databases. 3 of them are static, or at least upgraded less frequently than the application itself, while the remaining 2 are a forum and a wiki. The idea is now to move the static dbs out of the MySQL database and into the application war/jar file. The 3 static dbs originate from spreadsheet tables and are, in spreadsheet format, less than 5MB in total. Therefore, I'm trying to redeploy them to an embedded database which can be run from a jar.

I used hsqldb in the current version, 2.2.5. Minor changes to the SQL schema were needed to create the database. I also had to make sure to set removeAbandoned=true in the DBCP connection pool settings.
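A minimal sketch of such a pool setup, assuming commons-dbcp 1.x and hsqldb 2.2.x; the database path and credentials are placeholders:

```java
import org.apache.commons.dbcp.BasicDataSource;

public class PoolSetup {
    // Sketch only: path and credentials are placeholders, not the real deployment values.
    public static BasicDataSource createPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.hsqldb.jdbc.JDBCDriver");
        ds.setUrl("jdbc:hsqldb:file:/path/to/heinzelnisse");
        ds.setUsername("SA");
        ds.setPassword("");
        // reclaim connections the application forgot to close,
        // instead of letting the pool run dry
        ds.setRemoveAbandoned(true);
        ds.setRemoveAbandonedTimeout(60); // seconds before a borrowed connection counts as abandoned
        return ds;
    }
}
```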

When using the default in-memory tables, the table data grew from 1MB in the spreadsheet tables to 30MB in the data.script startup file. Performance was comparable to MySQL; I could still serve about 30 requests/s as long as I used the server JVM. The problem with in-memory tables is the long startup time and the huge memory requirements. It took 184s to read the .script file on startup. And while the application had been running stably with -Xmx76m before, I now had to increase this to more than 160MB. Using that amount of memory for a 1MB file was not acceptable.
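For context, MEMORY is the default table type in an hsqldb file database, and all rows of such tables are written as plain INSERT statements into the .script file that is replayed on startup, which explains the growth. A minimal sketch with a made-up table, not the actual Heinzelnisse schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MemoryTableDemo {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbc.JDBCDriver");
        // hypothetical file database; MEMORY is the default table type anyway
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:file:./db/dictdb", "SA", "");
             Statement st = con.createStatement()) {
            // rows of MEMORY tables end up as INSERT statements in dictdb.script
            st.execute("CREATE MEMORY TABLE dict_entry ("
                     + "id INTEGER PRIMARY KEY, word VARCHAR(100), translation VARCHAR(200))");
            st.execute("INSERT INTO dict_entry VALUES (1, 'hus', 'house')");
            // SHUTDOWN writes the .script file and closes the database cleanly
            st.execute("SHUTDOWN");
        }
    }
}
```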

Then I changed the in-memory tables to disk-cached tables. The startup time went down to 12s, and the application was stable again with 76MB of JVM memory. But the data is now stored on disk in a .data file, which is 48MB in my case. Performance didn't change, at least not in my test, which repeats the same queries and therefore caches perfectly. So far, so nice, but file access is not allowed in the cloud. hsqldb allows resource tables in a jar, but it wasn't clear whether they work with disk-cached tables.
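The switch itself is a one-line statement per table (or the table can be declared as CREATE CACHED TABLE from the start). A minimal sketch, reusing the hypothetical table from above:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ConvertToCached {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:file:./db/dictdb", "SA", "");
             Statement st = con.createStatement()) {
            // move the table's rows out of the .script file and into dictdb.data,
            // so pages are loaded into memory only on demand
            st.execute("SET TABLE dict_entry TYPE CACHED");
            st.execute("SHUTDOWN COMPACT"); // rewrites the files and frees unused space
        }
    }
}
```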

Testing disk-cached tables from a jar didn't work, so I had to ask on the mailing list whether it is supported at all. The answer came fast and was positive, even with some tips on how to do it, but I still didn't get it working. After scanning through the code I found an obvious bug, and my patch has been accepted for the coming 2.2.6, but it is clear that this type of deployment has not been used since the start of the 1.9 release some years ago.
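For reference, a read-only database inside a jar is addressed with hsqldb's res: protocol, which loads the database files from the classpath. A minimal sketch; the classpath location and table are placeholders from the earlier examples:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JarDatabaseDemo {
    public static void main(String[] args) throws Exception {
        // res: opens a read-only database whose files (dictdb.script, dictdb.properties,
        // dictdb.data, ...) are packaged on the classpath, e.g. inside the application jar
        try (Connection con = DriverManager.getConnection(
                "jdbc:hsqldb:res:/dbfiles/dictdb", "SA", "");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT word, translation FROM dict_entry")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}
```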

I then also tried the 1.8 release, which is still the most stable version of hsqldb. This again required minor changes to the schema, but I got it installed quickly. Unfortunately, the performance of 1.8 seems to be much worse than that of 2.2: I didn't manage more than 4 requests/s, versus 30 with MySQL and hsqldb 2.2. I didn't investigate where the bottleneck was, though.

hsqldb seems to be a nice, feature-rich Java database. But there still seems to be a gap between in-memory databases and full file-based databases: I didn't manage to run about 100,000 rows in memory with a fast startup time and low memory consumption. My next try will be to hand-tune these tables.