Aaron

The best way to make sure the index is cached by the OS is to just cat it on 
startup:

        cat `find /path/to/solr/index -type f` > /dev/null

Just make sure your index is smaller than RAM, otherwise some of the data will 
be evicted from the page cache again.
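
If you want to verify that the warming actually worked, something like vmtouch 
(a separate tool, not installed by default) can both touch the pages and report 
how much of the index is resident; a rough sketch using the same placeholder 
path as above:

        # touch every page of the index into the OS page cache
        vmtouch -t /path/to/solr/index
        # report how much of the index is currently resident in memory
        vmtouch -v /path/to/solr/index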

Memory mapping is built on the virtual memory system, and I suspect that ramfs 
is too, so I doubt very much that copying your index to ramfs will help at all. 
Sidebar: a while ago I did a bunch of testing where I copied indices to shared 
memory (/dev/shm in this case), and once the system reached a steady state there 
was no advantage over just accessing the indices on disc with memory mapping.
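
If you want to run that kind of comparison yourself, the shape of the test is 
roughly the following (paths are placeholders, and /dev/shm is a tmpfs mount on 
most Linux distributions):

        # copy the on-disc data directory into tmpfs-backed shared memory
        cp -r /path/to/solr/data /dev/shm/solr-data
        # point <dataDir> in solrconfig.xml at /dev/shm/solr-data, restart Solr,
        # and compare steady-state QTimes against the on-disc copy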

There has been a lot written about this topic on the list. Basically it comes 
down to using MMapDirectory (which is the default), making sure your index is 
smaller than your RAM, and allocating just enough memory to the Java VM. That 
last part requires some benchmarking because it is so workload dependent.
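
As a concrete (if made-up) example, the heap sizes below are placeholders to be 
replaced by whatever your benchmarking shows; the point is to leave the bulk of 
the RAM to the OS page cache, which is what actually holds the mmapped index, 
rather than to the JVM:

        # example only - heap size depends entirely on your workload;
        # RAM not given to the JVM stays available for the OS page cache
        java -Xms4g -Xmx4g -jar start.jar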

Best regards

François

On Oct 24, 2012, at 8:29 PM, Aaron Daubman <daub...@gmail.com> wrote:

> Greetings,
> 
> Most times I've seen the topic of storing one's index in memory come up,
> it seems the asker was referring (or understood to be referring) to the
> (in)famous "not intended to work with huge indexes" Solr RAMDirectory.
> 
> Let me be clear that I am not interested in RAMDirectory.
> However, I would like to better understand the oft-recommended and
> currently-default MMapDirectory, and what the tradeoffs would be, when
> using a 64-bit linux server dedicated to this single solr instance,
> with plenty (more than 2x index size) of RAM, of storing the index
> files on SSDs versus on a ramfs mount.
> 
> I understand that using the default MMapDirectory will allow caching
> of the index in-memory, however, my understanding is that mmaped files
> are demand-paged (lazy evaluated), meaning that only after a block is
> read from disk will it be paged into memory - is this correct? is it
> actually block-by-block (page size by page size?) - any pointers to
> decent documentation on this regardless of the effectiveness of the
> approach would be appreciated...
> 
> My concern with using MMapDirectory for an index stored on disk (even
> SSDs), if my understanding is correct, is that there is still a large
> startup cost to MMapDirectory, as it may take many queries before even
> most of a 20G index has been loaded into memory, and there may yet
> still be "dark corners" that only come up in edge-case queries that
> cause QTime spikes should these queries ever occur.
> 
> I would like to ensure that, at startup, no query will incur
> disk-seek/read penalties.
> 
> Is the "right" way to achieve this to copy the index to a ramfs (NOT
> ramdisk) mount and then continue to use MMapDirectory in Solr to read
> the index? I am under the impression that when using ramfs (rather
> than ramdisk, for which this would not work) a file mmaped on a ramfs
> mount will actually share the same address space, and so would not
> incur the typical double-ram overhead of mmaping a file in memory just
> to have yet another copy of the file created in a second memory
> location. Is this correct? If not, would you please point me to
> documentation stating otherwise (I haven't found much documentation
> either way).
> 
> Finally, given the desire to be quick at startup with a large index
> that will still easily fit within a system's memory, am I thinking
> about this wrong or are there other better approaches?
> 
> Thanks, as always,
>     Aaron
