The library that reads the data makes many seekg and read calls on C++ iostreams. By replacing the C++ iostream with a stream built on boost::iostreams::mapped_file_source, the program could be used without larger changes:
    using namespace boost::iostreams;
    typedef stream<mapped_file_source> mmStream;
    mmStream* mmistream = new mmStream();
    mmistream->open(mapped_file_source(fileName));
    feltFile_ = mmistream;
This worked as expected, and feltFile_ kept working with the old data-reading commands:
    feltFile_->seekg(pos, ios_base::beg);
    feltFile_->read((char*) ret.get(), blockWords * sizeof(word));

On my 32-bit platform, I ran into the first problem with files larger than 2GB. A memory-mapped file needs an address space larger than the file size, and on 32-bit Linux that address space is about 3.5GB. For files between 2GB and 3.5GB, there is a problem with the file position of the iostream, which seems to use a signed int. 64-bit platforms are around the corner, so I continued my test with 1.5GB files, ignoring the 32-bit issues. For compatibility, my code just catches the mmap exception and reopens the large files as standard streams.

Performance measurements showed no benefit from using mmap files instead of standard streams, even when reading the same file in parallel. If anything, std::streams were slightly faster, though not significantly so. Mmap files don't seem to make sense for stream-like readers, even if it is very simple to switch from one to the other. I guess mmap makes more sense where the file would otherwise be slurped into memory.

I removed the use of mapped files after the test, but only to simplify the code.
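For reference, the seekg/read pattern that stayed in place after dropping the mapped files can be sketched with the standard library alone. This is only a minimal sketch, not the library's actual code: the helper name readBlock, the 16-bit word type, and the short-read check are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <stdexcept>
#include <vector>

// Assumption: a "word" is 16 bits wide; the post does not define the type.
typedef std::int16_t word;

// Hypothetical helper: read blockWords words starting at byte offset pos.
// It works on any std::istream, so the same call serves a plain
// std::ifstream and a stream backed by a memory-mapped file.
std::vector<word> readBlock(std::istream& in, std::streamoff pos,
                            std::size_t blockWords)
{
    assert(pos >= 0);
    std::vector<word> block(blockWords);
    const std::streamsize bytes =
        static_cast<std::streamsize>(blockWords * sizeof(word));

    in.clear();                         // drop eof/fail state from earlier reads
    in.seekg(pos, std::ios_base::beg);  // absolute seek, as in the post
    in.read(reinterpret_cast<char*>(block.data()), bytes);

    // Report a short read instead of returning partly filled data.
    if (!in || in.gcount() != bytes) {
        throw std::runtime_error("short read");
    }
    return block;
}
```

Because the helper only depends on std::istream, switching between the two kinds of stream does not touch the reading code, which is why the experiment was so easy to set up and to undo.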
1 comment:
The problems with the 2.5GB file are due to missing LARGE_FILE_SUPPORT in boost::iostream 1.40. This has been fixed in 1.41 in late 2009.