Friday, 29 August 2008

Microbenchmark of SSE in C++ revisited

In my last blog post, I described a microbenchmark meant to explain the performance differences I had been seeing in Fimex with different hardware sqrt implementations. It found only a 20% speedup with SSE, which didn't explain the doubling of speed I got when enabling SSE in Fimex.

I have now managed to create a working microbenchmark of my vector-normalization code, which shows the difference between the SSE and non-SSE hardware sqrt implementations. The x87 FP-unit of x86 systems handles IEEE754 special values such as INFINITY through very slow paths, while the SSE unit processes them at full speed. I have been using INFINITY to represent undefined values in my code.

Results of the benchmark (in MFLOPS, more is better):
  • normal values, no SSE: 237
  • normal values, SSE: 280
  • INFINITY, no SSE: 12
  • INFINITY, SSE: 1380
With a factor of more than 100 between non-SSE and SSE in the INFINITY case, I finally understand my performance improvements.

Wednesday, 27 August 2008

Microbenchmark of SSE in C++ and Java

Currently, I am developing a file converter for meteorological data, called Fimex. Those files are usually in the NetCDF or GRIB data format and contain several GB of regularly gridded multi-dimensional data. The data is thus similar to film, except that there may be more dimensions, usually x, y, and time plus height, and that I don't want to apply any lossy compression algorithm.

When I wanted to put the program into production, it was much too slow. I had tested it on my laptop, where it ran fine, but on the server I had expected it to run even faster: on the one hand the server had faster disks, and on the other hand, though a bit older, its CPU ran at approximately the same frequency. The application is mainly doing IO, so what was going wrong?

The main difference between the server and the laptop is the cache size: the server has 0.5MB, while the laptop has 4MB. Analysing the program showed that I was jumping around in the data. A 200x200 grid of double data fits nicely into 0.5MB, but since the data comes in 60 different height levels, I had to make sure not to apply the operation across the height dimension before having finished all x,y data. Reordering the loops for cache locality doubled the performance.
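The loop reordering can be sketched like this (the sizes come from the text above; the memory layout is my assumption):

```cpp
#include <cstddef>
#include <vector>

const std::size_t NX = 200, NY = 200, NZ = 60;

// Field laid out as data[z][y][x], flattened: index = (z*NY + y)*NX + x.
double process(const std::vector<double>& data) {
    double sum = 0.0;
    // Cache-unfriendly: iterating z innermost touches a different
    // 200*200*8-byte slab on every access and keeps evicting the cache.
    //
    // Cache-friendly: finish one x,y slab (about 0.3MB, which fits in
    // the server's 0.5MB cache) before moving to the next height level.
    for (std::size_t z = 0; z < NZ; ++z)
        for (std::size_t y = 0; y < NY; ++y)
            for (std::size_t x = 0; x < NX; ++x)
                sum += data[(z * NY + y) * NX + x];
    return sum;
}
```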

According to gprof, the remaining and most important bottleneck was a simple sqrt, which I needed to normalize my vectors. I tested different software implementations, but performance only got worse. After a while, I realized that the SSE2 and AltiVec SIMD extensions of modern chips have a hardware implementation of sqrt, but gcc does not use it by default on an x86 system; it is only the default on x86-64. Enabling SSE2 with -mfpmath=sse -msse2 increased performance considerably again, and I was finally happy with the results, even on the server.
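As a minimal illustration, the same source compiles both ways; only the flags decide which sqrt instruction gcc emits on 32-bit x86 (the file name is hypothetical):

```cpp
// Without SSE math (gcc's default on 32-bit x86, uses the x87 unit):
//   g++ -O2 norm.cpp -o norm_x87
// With the SSE2 hardware sqrt:
//   g++ -O2 -mfpmath=sse -msse2 norm.cpp -o norm_sse
#include <cmath>

// std::sqrt here compiles to either the x87 fsqrt or the SSE2 sqrtsd
// instruction, depending on the flags above.
double length(double u, double v) {
    return std::sqrt(u * u + v * v);
}
```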

I tried to reproduce the performance gain with a microbenchmark doing exactly the same operation on dummy data. That C++ and Java code can be downloaded here. The results were a bit disappointing: I got less than a 20% performance gain with SSE enabled. In addition, the code ran faster on the server than on the laptop, in contrast to Fimex. I also translated the same code to Java. The Java code is a bit simpler, but runs at approximately the same speed as the SSE code (within 1%), so SSE appears to be on by default in Java HotSpot. I'm not 100% happy with the microbenchmark results, but at least I'm happy with my code, which is now in production and running fast enough.

Wednesday, 6 August 2008

Choosing the right database

Databases come in all varieties. At the high end, there are Oracle, IBM, and others I forget. Then you have a set of free alternatives to those, think MySQL and Postgres, just to name some. For smaller applications which don't need concurrent access, there are several embedded ones: SQLite, Derby, HSQL. And if you don't have many tables and you don't like SQL, you can choose file-based databases, e.g. BerkeleyDB, ndb, gdb, and all of those that were so popular in the '90s, before every programmer needed to know SQL. And all non-programmers believe that the file system or even the tape library is a database, and honestly, aren't they right?

I might have had a quite common learning curve with databases: I started using CSV files, then I was very pleased with BerkeleyDB when creating a graphical overview of all runners of the Köln-Marathon, and since I had to learn SQL at the end of the '90s, I installed PostgreSQL and played with it, building an online calendar for my band.

I started more serious work, that is, projects which still exist, in 2002.

The online dictionary Heinzelnisse was first built using PostgreSQL. It was an obvious choice: it was installed on the machine I intended to use, I knew SQL well enough by then, and I needed several tables and concurrent access. When Heinzelnisse started to attract some users a year later, I recognized how bad the choice was: PostgreSQL 6 was extremely slow on the 32-bit Solaris SPARC machine. The machine was already 5 years old, running at 166MHz (and 4GB) at that time. The installation was so old that the administrator never managed to get mod_perl running there; in the end a query took more than 10s, which was much too slow.

So I went back to using BerkeleyDB. It was very well integrated into Perl and fast enough on that computer: queries ran in under 2s. Again a year or two later, I switched to an extremely cheap professional hoster, which cost about $1/month. They only had MySQL installed, so I had to switch again. I was never really happy with MySQL, since I very often had problems with the character columns, which behave quite strangely in some cases, e.g. Tür = tür = tyr, but it only took some learning and using binary data instead of characters. The dictionary ran nicely, but I was thrown out by the cheap hoster because my application was too CPU-demanding; their limits covered only bandwidth and disk space.

Currently, Heinzelnisse is hosted at the virtual-machine hoster Tektonic. Even though I can now install whichever database I like, I still stick with MySQL-MyISAM since it works with very little memory, but I don't think I would have chosen MySQL if I hadn't been limited by the cheap webhosters.

Another application I have been heavily involved in is Webdab, the European air-emission database. This database started with Sybase and later switched to Oracle 7. When we had to make it available on the Internet in 2002, we knew that the license costs would be too much for our budget, and we had to find an alternative. We found one in PostgreSQL and never regretted that decision, though we had to work out some performance problems in the beginning.

So, choosing the right database is not simple. In my opinion, there are often more non-technical than technical reasons for selecting a database. Maybe the most important thing to remember is to embed the database into the application in such a way that it is possible to switch to another database at a later stage.
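One way to follow that advice is to hide the database behind a narrow interface, so that only one class has to change when the backend does. A sketch in C++ (all names here are made up for illustration; none of this comes from my projects):

```cpp
#include <map>
#include <optional>
#include <string>

// The application only ever talks to this interface, so a backend
// (PostgreSQL, MySQL, BerkeleyDB, ...) can be swapped by providing
// another implementation.
class DictionaryStore {
public:
    virtual ~DictionaryStore() = default;
    virtual std::optional<std::string> lookup(const std::string& word) = 0;
    virtual void insert(const std::string& word,
                        const std::string& translation) = 0;
};

// In-memory stand-in, e.g. for tests; a real backend would wrap a
// database client library behind the same interface.
class InMemoryStore : public DictionaryStore {
    std::map<std::string, std::string> entries_;
public:
    std::optional<std::string> lookup(const std::string& word) override {
        auto it = entries_.find(word);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }
    void insert(const std::string& word,
                const std::string& translation) override {
        entries_[word] = translation;
    }
};
```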