Cloud services have been around now for some time. While my preferred deployment architecture is still virtual servers (VPS), those VPS have not seen much attention lately. My current VPS still costs $15/month, the same as in 2005, though it now has 512MB of RAM rather than 64MB and 3x the disk space. A comparable server in the cloud should be much cheaper (approx. $0/month). But before switching to a PaaS like Google AppEngine or an IaaS like Amazon EC2, some technical issues need to be overcome; in particular, none of these offer access to the filesystem or to SQL databases for free. Google has just started to address this reprogramming issue with Google Cloud SQL, but it is not available for free.
My application Heinzelnisse comes with 5 different information databases, 3 of which are static, or at least upgraded less frequently than the application itself, while the remaining 2 are a forum and a wiki. The idea is now to move the static DBs from the MySQL database into the application war/jar file. The 3 static DBs come from spreadsheet tables and are, in spreadsheet format, less than 5MB in total. Therefore, I'm trying to redeploy them to an embedded database which can run from a jar.
I used hsqldb in the current version, 2.2.5. Small changes to the SQL schema were needed to create the database. I also had to make sure to set removeAbandoned=true in the DBCP connection pool settings.
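For reference, this is roughly how such a pool can be wired up against an embedded hsqldb database. This is a minimal sketch assuming the commons-dbcp 1.x API; the database path, user and timeout are made-up examples, and the driver class is org.hsqldb.jdbcDriver for 1.8 and org.hsqldb.jdbc.JDBCDriver for 2.x:

import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;

public class HsqldbPool {
    // Minimal DBCP setup for an embedded hsqldb file database (illustrative values only).
    public static DataSource create() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("org.hsqldb.jdbc.JDBCDriver"); // hsqldb 2.x
        ds.setUrl("jdbc:hsqldb:file:data/dictdb");
        ds.setUsername("SA");
        ds.setPassword("");
        ds.setRemoveAbandoned(true);        // reclaim connections the web app forgot to close
        ds.setRemoveAbandonedTimeout(60);   // seconds
        return ds;
    }
}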
When using the default in-memory tables, the table data grew from 1MB in the spreadsheet tables to 30MB in the data.script startup file. The performance was comparable to MySQL; I could still serve about 30 requests/s as long as I used the server JVM. The problem with in-memory tables is the long startup time and the huge memory requirements. It took 184s to read the .script file on startup. While the application had been running stable with -Xmx76MB before, I now had to increase this to more than 160MB. Using that amount of memory for a 1MB file was not acceptable.
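The blow-up and the slow startup are easier to understand when looking at what the .script file actually contains: hsqldb stores the contents of MEMORY tables as plain SQL text and replays it statement by statement at startup. Roughly like this, with table and column names invented for illustration:

CREATE MEMORY TABLE DICTIONARY(ID INTEGER NOT NULL PRIMARY KEY, DE_WORD VARCHAR(255), NO_WORD VARCHAR(255))
INSERT INTO DICTIONARY VALUES(1,'Haus','hus')
INSERT INTO DICTIONARY VALUES(2,'Katze','katt')
-- ... one INSERT per row, i.e. hundreds of thousands of statements to parse at startup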
Then I changed the in-memory tables to disk-cached tables. The startup time decreased to 12s again, and the application was stable again with 76MB of JVM memory. But the data is now stored on disk in a .data file, which is 48MB in my case. Performance didn't change, at least not in my test, which had no variation in the queries and therefore perfect caching possibilities. So far, so nice, but file access is not allowed in the cloud. hsqldb allows resource tables in a jar, but it wasn't clear whether they work with disk-cached tables.
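Converting an existing table from MEMORY to CACHED is a one-line DDL change (new tables can also be created as CACHED directly with CREATE CACHED TABLE). A minimal sketch, with an invented table name and database path:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ConvertToCached {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:file:data/dictdb", "SA", "");
             Statement st = con.createStatement()) {
            // Move the table rows out of the .script file into the .data cache file.
            st.execute("SET TABLE DICTIONARY TYPE CACHED");
            st.execute("SHUTDOWN"); // flush and close the database files cleanly
        }
    }
}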
Testing disk-cached tables from a jar didn't work, so I had to ask on the mailing list whether it is supported at all. The answer was fast and positive, even with some tips on how to do it, but I still didn't get it working. After scanning through the code I found an obvious bug, and my patch has been accepted for the coming 2.2.6, but it is obvious that this type of deployment has not been used since the start of the 1.9 release some years ago.
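For anyone who wants to try the same: a read-only database packed into the war/jar is opened with the res: connection protocol, which loads the database files from the classpath. A minimal sketch, assuming the files were packed under /db/dictdb inside the jar (path and table name are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ResourceDbCheck {
    public static void main(String[] args) throws SQLException {
        // res: databases are read-only; the files (.script, .properties, .data) come from the jar.
        try (Connection con = DriverManager.getConnection("jdbc:hsqldb:res:/db/dictdb", "SA", "");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM DICTIONARY")) {
            rs.next();
            System.out.println("rows: " + rs.getLong(1));
        }
    }
}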
I then also tried the 1.8 release, which is still the most stable version of hsqldb. This again required minor changes in the schema, but I got it installed quickly. Unfortunately, the performance of 1.8 seems to be much worse than that of 2.2: I didn't manage more than 4 requests/s versus 30 with MySQL and hsqldb 2.2. I didn't investigate where the bottleneck was, though.
hsqldb seems to be a nice, feature-rich Java database. But there still seems to be a gap between in-memory databases and full file-based databases. I didn't manage to keep about 100000 rows in memory with fast startup time and low memory consumption. My next try will be to hand-tune these tables.
3 comments:
You have posted a nice comparison in your blog. I am working on Eucalyptus 1.6 and HSQL 1.8. I want to redesign the schema of the Eucalyptus script files which store user information, to ensure the privacy of my cloud users. To do so I want to split a few of the existing tables into smaller tables. I wanted to calculate the amount of extra storage space that'll be needed. Since these are memory-type tables, I am interested in the amount of memory used up by adding these tables, and a few other things... can you please indicate how you have done these tests and how I can proceed with my work? Thanks
Hi waqdeel,
estimating the size of data in memory was unfortunately not simple. I tried the memory debugger from NetBeans, which gives good hints about where memory is used at different stages in the program flow, but it wasn't good enough at estimating the total amount of memory needed. What I finally did is kind of embarrassing: I switched on verbose GC, put some load on my servlet, and tested with different -Xmx settings until I got an OutOfMemoryError or a permanently running full GC. Since I had only been changing the database, I could thus estimate the perceived memory consumption (±5MB).
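In practice that just means starting the servlet container with verbose GC enabled and a fixed heap, then repeating the load test while lowering or raising -Xmx; something along these lines, where start.jar is only a placeholder for whatever launches your container:

java -server -verbose:gc -Xmx76m -jar start.jar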
A small follow-up: I have now rewritten the database layer to run completely from memory, using standard Java data structures rather than SQL. I had an access layer, so it was quite easy to change the code.
The results were about a 10% performance increase for the whole application, i.e. in the load of the machine; about 30MB more memory consumption in Java compared to the pure MySQL implementation; and about 30s slower startup time (for loading the tables).
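To give an idea of what that looks like: each static table essentially becomes a map keyed by the lookup column, filled once at startup from a file packed into the jar. A rough sketch with invented names and file format; the real access layer of course has more columns and indexes:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryDictionary {
    // word -> translations; replaces a SELECT ... WHERE word = ? against the database
    private final Map<String, List<String>> entries = new HashMap<>();

    // Load a tab-separated table packed into the jar, e.g. /db/dictionary.tsv (invented name).
    public InMemoryDictionary(String resource) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                InMemoryDictionary.class.getResourceAsStream(resource), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split("\t", 2);
                if (cols.length < 2) continue;
                entries.computeIfAbsent(cols[0], k -> new ArrayList<>()).add(cols[1]);
            }
        }
    }

    public List<String> lookup(String word) {
        List<String> translations = entries.get(word);
        return translations == null ? Collections.<String>emptyList() : translations;
    }
}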