Friday, 29 August 2008

Microbenchmark of SSE in C++ revisited

In my last blog post, I described a microbenchmark that was meant to explain the performance differences I've been seeing in Fimex when using different hardware sqrt implementations. It showed a 20% speedup with SSE, which didn't explain the doubling of speed I got when using SSE in Fimex.

I have now managed to create a working microbenchmark of my vector-normalization program that shows the difference between the SSE and non-SSE hardware sqrt implementations. The x87 FP unit of x86 systems is IEEE 754 compliant, but it appears to handle special values such as infinity far more slowly than the SSE unit does, and I have been using INFINITY to represent undefined values in my code.
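A minimal sketch of what such a microbenchmark might look like is given here. It is not the original benchmark code: the function names `normalize` and `time_normalize`, the two-component layout, and the timing approach are all assumptions made for illustration. The point is only that INFINITY inputs run through the same sqrt and division as ordinary values, so the hardware has to deal with them.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical kernel: normalize each (u, v) pair in place to unit length.
// INFINITY is not special-cased, so it reaches sqrt and the divisions;
// an infinite component ends up as NaN and a finite partner as 0.
void normalize(std::vector<float>& u, std::vector<float>& v) {
    for (std::size_t i = 0; i < u.size(); ++i) {
        const float norm = std::sqrt(u[i] * u[i] + v[i] * v[i]);
        u[i] /= norm;
        v[i] /= norm;
    }
}

// Hypothetical timing helper: runs `repeats` passes over n elements that are
// all set to `fill` and returns the elapsed wall-clock time in seconds.
// The refill is inside the timed region, so it adds a small constant cost.
double time_normalize(float fill, std::size_t n, int repeats) {
    std::vector<float> u(n), v(n);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        u.assign(n, fill);
        v.assign(n, fill);
        normalize(u, v);
    }
    const auto stop = std::chrono::steady_clock::now();
    // Read one result through a volatile so the work is not optimized away.
    volatile float sink = u.empty() ? 0.0f : u[0];
    (void)sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

To compare the two code paths, one would call `time_normalize` once with an ordinary value such as `1.0f` and once with `INFINITY`, and build the program twice. Assuming GCC on 32-bit x86, `-mfpmath=387` and `-msse2 -mfpmath=sse` select the x87 and SSE units respectively; other compilers use different switches.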

Results of the benchmark (in MFLOPS, more is better):
  • normal values, no SSE: 237
  • normal values, SSE: 280
  • INFINITY, no SSE: 12
  • INFINITY, SSE: 1380
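With a factor of more than 100 between non-SSE and SSE when the data contains INFINITY (1380 vs. 12 MFLOPS), compared with roughly 18% for normal values, I finally understand my performance improvements.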
