Was going to say the same thing. It's also usually a good idea to reduce paging (e.g., setting swappiness to 0 on Linux).
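A minimal sketch of how that's usually done on Linux (the sysctl key is standard, but treat the exact value and the persistence file as assumptions for your particular distro):

    # take effect now, until the next reboot
    sysctl -w vm.swappiness=0

    # persist the setting across reboots
    echo 'vm.swappiness = 0' >> /etc/sysctl.conf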
- Mark

On Oct 24, 2012, at 9:36 PM, François Schiettecatte <fschietteca...@gmail.com> wrote:

> Aaron
>
> The best way to make sure the index is cached by the OS is to just cat it on startup:
>
> cat `find /path/to/solr/index` > /dev/null
>
> Just make sure your index is smaller than RAM, otherwise data will be rotated out.
>
> Memory mapping is built on the virtual memory system, and I suspect that ramfs is too, so I doubt very much that copying your index to ramfs will help at all. Sidebar - a while ago I did a bunch of testing copying indices to shared memory (/dev/shm in this case) and there was no advantage compared to just accessing indices on disc when using memory mapping once the system got to a steady state.
>
> There has been a lot written about this topic on the list. Basically it comes down to using MMapDirectory (which is the default), making sure your index is smaller than your RAM, and allocating just enough memory to the Java VM. That last part requires some benchmarking because it is so workload dependent.
>
> Best regards
>
> François
>
> On Oct 24, 2012, at 8:29 PM, Aaron Daubman <daub...@gmail.com> wrote:
>
>> Greetings,
>>
>> Most times I've seen the topic of storing one's index in memory, it seems the asker was referring (or understood to be referring) to the (in)famous "not intended to work with huge indexes" Solr RAMDirectory.
>>
>> Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount.
>>
>> I understand that using the default MMapDirectory will allow caching of the index in memory; however, my understanding is that mmapped files are demand-paged (lazily loaded), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size)? Any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...
>>
>> My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be "dark corners" that only come up in edge-case queries that cause QTime spikes should these queries ever occur.
>>
>> I would like to ensure that, at startup, no query will incur disk-seek/read penalties.
>>
>> Is the "right" way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index? I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).
>>
>> Finally, given the desire to be quick at startup with a large index that will still easily fit within a system's memory, am I thinking about this wrong or are there other better approaches?
>>
>> Thanks, as always,
>> Aaron
>
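For what it's worth, here is a sketch of the warm-up approach François describes, plus a quick check that the index really is smaller than RAM (the path is a placeholder, and du/free options can vary slightly between distros):

    # compare index size against total RAM, both in MB
    du -sm /path/to/solr/index
    free -m

    # read every index file once so the OS page cache is populated
    cat `find /path/to/solr/index -type f` > /dev/null

The -type f just keeps cat from complaining about directories; otherwise it's the same command quoted above.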