Tuesday, December 16, 2008

Comparing 64-bit and 32-bit Linux and Java

I'm running a Tomcat server which needs to allocate a large amount (>1GB) of memory. For that it would make sense to run Linux and Java in 64-bit mode, and thus be able to allocate much more than the 2GB (Java) or 3.5GB (Linux) possible in 32-bit mode. On the other hand, there are a lot of rumours on the net that 64-bit Linux uses twice as much memory or more. There is a good article, including measurements, explaining that 64-bit Java needs approx. 50% more memory for strings and integers.

Here are my results, measured on the same machine as virtual Xen hosts running Debian Etch in i386 and amd64 mode. Both machines had 256MB of memory and no swap:

Action                       32-bit                  64-bit
free after boot              219M                    217M
file size of C program       10M                     12M
data allocation              max 217M                217M
free after data allocation   224M                    224M
java -server                 max 208M @ Xmx214M      max 203M @ Xmx234M
java -client                 max 209M @ Xmx214M      n.a.


Both the C and the Java program simply allocate some memory; the Java program can be seen at the end of this post. It tried to allocate 256 buffers of 1MB each until it threw an OutOfMemoryError. The Xmx setting was raised just to the point where Linux did not kill the JVM with 'Out of Memory'. It was astonishing to see that Xmx could be set higher on the 64-bit JVM, while the maximum memory actually available within Java was slightly lower.

The results reassure me that 64-bit Linux does not require much more memory than 32-bit Linux, at least not for a server machine with only 64-bit libraries installed and applications which don't use many pointers. Desktop machines may require considerably more if the 32-bit libraries also have to be installed for some 32-bit-only programs. The biggest difference is code size (20%), but gcc is known to have different default options in 32-bit and in 64-bit mode.

Example of code run to test data allocation:

public class Main {
    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: Main <buffers> <size (in MB)>");
            System.exit(1);
        }
        int buffers = Integer.parseInt(args[0]);
        // buffer size in ints (4 bytes per int)
        int size = Integer.parseInt(args[1]) * 1024 * 1024 / 4;
        java.util.Vector<int[]> store = new java.util.Vector<int[]>(buffers);
        for (int i = 0; i < buffers; i++) {
            store.add(i, new int[size]);
            System.err.println("allocated buffer " + i);
        }
    }
}
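
For reference, the numbers in the table above were produced by passing the buffer count and the buffer size as arguments, presumably along the lines of java -server -Xmx214M Main 256 1 for the 32-bit case, with Xmx adjusted as described.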

Wednesday, November 19, 2008

Don't be simple - XML

Recently I have been working with Metamod, a metadata database application, which can be seen, among other places, on the Damocles website. The metadata of scientific data is nowadays usually expressed in XML, e.g. DIF or ISO19115. There are several ways to work with XML files; the best known are probably DOM, SAX and StAX, often combined with XPath. These come with a lot of commands, implemented in different languages. There is quite a learning curve to processing XML data, and this is maybe one reason why a lot of people still believe that "just use ASCII" is much better.

Help comes in the form of modules like XML::Simple (Perl) or SimpleXML (PHP). These integrate XML nicely into their respective language, converting it to a Perl structure or a PHP class. These modules have their place when somebody needs to parse an XML document written by somebody else and doesn't want to learn about XML.

Whenever I had to work with XML, it started with something very simple, a good case for the simple modules. But shortly afterwards I had to extend the XML, add namespaces, modify the original file or something else, and that's where the simple modules fail. The author of XML::Simple has recognized the same problem and written an article on how to step up from XML::Simple to DOM/XPath.
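
As an illustration of what that step up looks like, here is a minimal Java sketch of the namespace-aware DOM/XPath approach (the file name, prefix, namespace URI and element names are invented placeholders, not taken from Metamod or from the article):

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import java.util.Collections;
import java.util.Iterator;

public class XPathLookup {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // essential once namespaces enter the picture
        Document doc = dbf.newDocumentBuilder().parse(args[0]);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // bind the prefix "dif" to a namespace URI (placeholder value here;
        // use the URI actually declared in your document)
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "dif".equals(prefix) ? "http://example.org/dif" : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator getPrefixes(String uri) {
                return Collections.singletonList("dif").iterator();
            }
        });

        // a namespace-qualified query, the kind of thing the 'simple' modules struggle with
        String title = xpath.evaluate("/dif:DIF/dif:Entry_Title", doc);
        System.out.println(title);
    }
}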

The problem with the "simple" modules seems to be an old one; quoting Einstein: make everything as simple as possible, but not simpler.


Friday, August 29, 2008

Microbenchmark of SSE in C++ revisited

In my last post, I described a microbenchmark that was supposed to explain the performance differences I had been seeing in Fimex when using different sqrt hardware implementations. It found a 20% speedup with SSE, which didn't explain the doubling of speed I got when using SSE in Fimex.

I have now managed to create a working microbenchmark of my vector-normalization code, which shows the differences between the SSE and non-SSE hardware sqrt implementations. While the x87 FP unit of x86 systems handles IEEE 754 special values such as INFINITY very slowly, the SSE unit does not have that problem. I have been using INFINITY to represent undefined values in my code.

Results of the benchmark (in MFLOPS, more is better):
  • normal values, no SSE: 237
  • normal values, SSE: 280
  • INFINITY, no SSE: 12
  • INFINITY, SSE: 1380
With a factor of 100 between non-SSE and SSE, I finally understand my performance improvements.
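
For illustration, here is a minimal Java sketch of the kind of kernel being measured (my own reconstruction with made-up array sizes, not the downloadable benchmark code): it normalizes 2D vectors, optionally filled with INFINITY, and reports MFLOPS.

public class NormalizeBenchmark {
    public static void main(String[] args) {
        final int n = 1000000;
        boolean undef = args.length > 0 && args[0].equals("inf");
        double[] u = new double[n];
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
            // INFINITY marks undefined values, as in the Fimex code
            u[i] = undef ? Double.POSITIVE_INFINITY : 3.0;
            v[i] = undef ? Double.POSITIVE_INFINITY : 4.0;
        }
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            double norm = Math.sqrt(u[i] * u[i] + v[i] * v[i]);
            u[i] /= norm;
            v[i] /= norm;
        }
        long elapsedNs = System.nanoTime() - start;
        // roughly 6 floating-point operations per vector (2 mul, 1 add, 1 sqrt, 2 div)
        double mflops = 6.0 * n / (elapsedNs / 1000.0);
        System.out.println("MFLOPS: " + mflops);
        // print a value so the JIT cannot remove the loop entirely
        System.out.println("check: " + u[0] + ", " + v[n - 1]);
    }
}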

Wednesday, August 27, 2008

Microbenchmark of SSE in C++ and Java

Currently I am developing a file converter for meteorological data, called Fimex. Those files are usually in the NetCDF or GRIB data format and contain several GB of regularly gridded multi-dimensional data. The data is thus similar to film, except that there may be more dimensions, usually x, y and time plus height, and I don't want to apply any lossy compression algorithm.

When I wanted to put the program into production, it was much too slow. I had tested it on my laptop, where it ran fine, and I expected it to run even faster on the server: the server had faster disks, and although it was a bit older, its CPU ran at approximately the same frequency. The application is mainly doing IO, so what was going wrong?

The main difference between the server and the laptop is the cache: the server has 0.5MB, while the laptop has 4MB. Analysing the program showed that I was jumping around in the data. A 200x200 grid of double data fits nicely into 0.5MB, but since the data comes in 60 different heights, I had to make sure not to apply the operation across all height levels before having finished all x,y data of one level. This cache-friendly reordering doubled the performance.
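
To make the reordering concrete, here is a hedged Java sketch (the array layout and the operate() placeholder are my assumptions, not the actual Fimex code); the point is simply to finish one 200x200 plane, which fits into the 0.5MB cache, before moving to the next height level.

public class CacheOrder {
    // placeholder for the real per-point operation
    static double operate(double value) {
        return value * 0.5;
    }

    // cache-friendly: the two inner loops stay within one ~0.32MB plane
    // (200 * 200 doubles), so the working set fits into a 0.5MB cache
    static void processByLevel(double[][][] data) {
        for (int z = 0; z < data.length; z++) {               // 60 height levels
            for (int y = 0; y < data[z].length; y++) {        // 200
                for (int x = 0; x < data[z][y].length; x++) { // 200
                    data[z][y][x] = operate(data[z][y][x]);
                }
            }
        }
    }

    // cache-hostile: for every (x, y) point all 60 planes are touched,
    // so cache lines are constantly evicted between accesses
    static void processByColumn(double[][][] data) {
        for (int y = 0; y < data[0].length; y++) {
            for (int x = 0; x < data[0][y].length; x++) {
                for (int z = 0; z < data.length; z++) {
                    data[z][y][x] = operate(data[z][y][x]);
                }
            }
        }
    }

    public static void main(String[] args) {
        double[][][] data = new double[60][200][200]; // the sizes from the text
        long t1 = System.nanoTime();
        processByLevel(data);
        long t2 = System.nanoTime();
        processByColumn(data);
        long t3 = System.nanoTime();
        System.out.println("by level:  " + (t2 - t1) / 1e6 + " ms");
        System.out.println("by column: " + (t3 - t2) / 1e6 + " ms");
    }
}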

According to gprof, the remaining and most important bottleneck was a simple sqrt which I needed to normalize my vectors. I tested different software implementations, but the performance only got worse. After a while, I recognized that the SSE2 and AltiVec SIMD extensions of modern chips have a hardware implementation of sqrt, but gcc does not use it by default on 32-bit x86 systems; it is only the default on x86-64. Enabling SSE2 with -mfpmath=sse -msse2 increased the performance considerably, and I was finally happy with the results, even on the server.

I tried to reproduce the performance gain with a micro-benchmark, doing exactly the same operation on dummy data. The C++ and Java code can be downloaded here. The results were a bit disappointing: I got less than 20% performance gain with SSE enabled. In addition, the code ran faster on the server than on the laptop, which is the opposite of what I see in Fimex. I also translated the code to Java. The Java code is a bit simpler, but runs at approximately the same speed as the SSE code (-1%), so SSE is apparently on by default in Java HotSpot. I'm not 100% happy with the micro-benchmark results, but at least I'm happy with my code, which is now in production and running fast enough.

Wednesday, August 6, 2008

Choosing the right database

Databases come in all flavours. On the high end there are Oracle, IBM and others I forget. Then you have a set of free alternatives to those, think MySQL and Postgres, just to name a few. For smaller applications which don't need concurrent access, there are several embedded ones: SQLite, Derby, HSQL. And if you don't have many tables and you don't like SQL, you can choose file-based databases, e.g. BerkeleyDB, ndbm, gdbm and all the others which were so popular in the 90s, before every programmer needed to know SQL. And all non-programmers believe that the file system or even the tape library is a database, and honestly, aren't they right?

I have probably had a quite common learning curve with databases: I started with CSV files, then I was very pleased with BerkeleyDB when creating a graphical overview of all runners of the Köln-Marathon, and since I had to learn SQL at the end of the 90s anyway, I installed PostgreSQL and played with it, building an online calendar for my band.

I started more serious work, that is, projects which still exist, in 2002.

The online dictionary Heinzelnisse was first built using PostgreSQL. It was an obvious choice: it was installed on the machine I intended to use, by then I knew SQL well enough, and I needed several tables and concurrent access. When Heinzelnisse started to have some users a year later, I recognized how bad the choice was: PostgreSQL 6 was extremely slow on 32-bit Solaris Sparc machines. And the machine was already 5 years old, running at 166MHz (and 4GB) at that time. The installation was so old that the administrator never managed to get mod-perl running there; in the end it took more than 10s to make a query, which was much too slow. So I went back to using BerkeleyDB again. It was very well integrated into Perl and fast enough on that computer, with queries running in under 2s.

A year or two later, I switched to an extremely cheap professional hoster, which cost about $1/month. They only had MySQL installed, so I had to switch again. I was never really happy with MySQL, since I very often had problems with the character columns, which behave quite strangely in some cases, e.g. Tür = tür = tyr, but it was only a matter of learning and of using binary data instead of characters. The dictionary was running nicely, but I was thrown out by the cheap hoster, since my application was too CPU-demanding - their limits only covered bandwidth and disk space. Currently Heinzelnisse is hosted at the virtual machine hoster Tektonic. Even though I can now install whichever database I like, I still stick to MySQL-MyISAM since it works with very little memory, but I don't think I would be using MySQL if I hadn't been limited by the cheap web hosters.

Another application I was heavily involved in is Webdab, the European air-emission database. This database started on Sybase and later switched to Oracle7. When we had to make it available on the Internet in 2002, we knew that the licence costs would be too much for our budget, so we had to find an alternative. We found one in PostgreSQL and never regretted that decision, though we had to sort out some performance problems in the beginning.

So, choosing the right database is not simple. In my opinion, non-technical reasons often weigh more heavily in the selection than technical ones. Maybe the most important thing to remember is to embed the database into the application in such a way that it is possible to switch to another database at a later stage.
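
As a hedged sketch of what such an embedding might look like in Java (the names and the schema are invented for illustration; this is not how Heinzelnisse or Webdab are actually structured): the application only talks to a small interface, and a single implementation knows which database sits behind it.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// The application only ever sees this interface.
interface DictionaryStore {
    List<String> lookup(String word) throws Exception;
}

// One implementation per backend; swapping MySQL for PostgreSQL (or BerkeleyDB)
// means writing a new implementation, not touching the rest of the application.
class JdbcDictionaryStore implements DictionaryStore {
    private final Connection conn;

    JdbcDictionaryStore(Connection conn) {
        this.conn = conn;
    }

    public List<String> lookup(String word) throws SQLException {
        List<String> translations = new ArrayList<String>();
        PreparedStatement stmt =
            conn.prepareStatement("SELECT translation FROM dictionary WHERE word = ?");
        try {
            stmt.setString(1, word);
            ResultSet rs = stmt.executeQuery();
            while (rs.next()) {
                translations.add(rs.getString(1));
            }
        } finally {
            stmt.close();
        }
        return translations;
    }
}

Switching databases then means providing a different Connection, or at worst a different DictionaryStore implementation, while the rest of the application stays untouched.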