
Tuesday, 7 August 2012

Performance of memory-mapped files for streams

Memory-mapped files, or mmap files, map a file into virtual memory, so that the file's data can be addressed as if it were in memory, without the overhead of actually copying everything into memory. Mmap files are part of POSIX and therefore available on practically all platforms. I investigated the performance of mmap files for numerical weather prediction files, i.e. files larger than 1 GB, where large chunks of data must be read as a stream.

The library that reads the data makes many seekg and read calls on C++ iostreams. By replacing the C++ iostream with a boost::iostreams::stream over a boost::iostreams::mapped_file_source, the program could be used without major changes:

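        #include <boost/iostreams/device/mapped_file.hpp>
        #include <boost/iostreams/stream.hpp>

        using namespace boost::iostreams;
        // stream over a read-only memory-mapped file
        typedef stream<mapped_file_source> mmStream;
        mmStream* mmistream = new mmStream();
        mmistream->open(mapped_file_source(fileName));
        feltFile_ = mmistream;  // used through the normal istream interface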

This worked as expected, and feltFile_ kept working with the existing data-reading calls:

    feltFile_->seekg(pos, ios_base::beg);
    feltFile_->read((char*) ret.get(), blockWords * sizeof(word));

On my 32-bit platform, I ran into the first problem with files larger than 2 GB. A memory-mapped file needs an address space that is larger than the file size, which on 32-bit Linux means a limit of about 3.5 GB. For files between 2 GB and 3.5 GB, there is a problem with the file-position type of iostream, which seems to use a signed int.

64-bit platforms are around the corner, so I continued my test with 1.5 GB files, ignoring the 32-bit issues. For compatibility, my code simply catches the mmap exception and reopens the large files as standard streams.
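The fallback can be sketched roughly as follows. To keep the example free of dependencies, it uses plain POSIX mmap and a small hand-written streambuf instead of Boost, so it is not the code I actually ran. The names MappedBuf, MappedIStream and open_input are hypothetical, and a failed mapping is reported here as std::runtime_error.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <memory>
#include <stdexcept>
#include <streambuf>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only streambuf whose get area is the whole mapped file, so that
// seekg() and read() turn into pointer arithmetic and memory copies.
class MappedBuf : public std::streambuf {
public:
    explicit MappedBuf(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size == 0) {
            ::close(fd);
            throw std::runtime_error("cannot map empty or unreadable file");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        base_ = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);
        if (base_ == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
        char* p = static_cast<char*>(base_);
        setg(p, p, p + size_);  // the mapping is never written through p
    }
    ~MappedBuf() override { ::munmap(base_, size_); }
    MappedBuf(const MappedBuf&) = delete;
    MappedBuf& operator=(const MappedBuf&) = delete;

protected:
    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        if ((which & std::ios_base::in) == 0) return pos_type(off_type(-1));
        off_type target = off;
        if (dir == std::ios_base::cur) target += gptr() - eback();
        else if (dir == std::ios_base::end) target += egptr() - eback();
        if (target < 0 || target > egptr() - eback())
            return pos_type(off_type(-1));
        setg(eback(), eback() + target, egptr());
        return pos_type(target);
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        return seekoff(off_type(pos), std::ios_base::beg, which);
    }

private:
    void* base_;
    std::size_t size_;
};

// istream that owns its MappedBuf.
class MappedIStream : public std::istream {
public:
    explicit MappedIStream(const std::string& path)
        : std::istream(nullptr), buf_(path) { rdbuf(&buf_); }
private:
    MappedBuf buf_;
};

// Try the mapped stream first; if mapping fails (for example because the
// file does not fit into the address space), reopen as a standard stream.
std::unique_ptr<std::istream> open_input(const std::string& path) {
    try {
        return std::unique_ptr<std::istream>(new MappedIStream(path));
    } catch (const std::runtime_error&) {
        return std::unique_ptr<std::istream>(
            new std::ifstream(path, std::ios::binary));
    }
}
```

The caller only ever sees a std::istream, so the existing seekg and read calls do not need to know which kind of stream they got.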

Performance measurements showed no benefit of mmap files over standard streams, even when reading the same file in parallel. The std::streams were even slightly faster, though not significantly so. Mmap files don't seem to make sense for stream-like readers, even if it is very simple to switch from one to the other. My guess is that mmap makes more sense where the file would otherwise be slurped into memory. I removed the use of mapped files after the test, but only to simplify the code.