You may well have already seen this, but in case not:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

FWIW,
Erick

On Wed, Oct 24, 2012 at 9:51 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 10/24/2012 6:29 PM, Aaron Daubman wrote:
>>
>> Let me be clear that I am not interested in RAMDirectory.
>> However, I would like to better understand the oft-recommended and
>> currently-default MMapDirectory, and what the tradeoffs would be between
>> storing the index files on SSDs and storing them on a ramfs mount, on a
>> 64-bit Linux server dedicated to this single Solr instance with plenty of
>> RAM (more than 2x the index size).
>>
>> I understand that using the default MMapDirectory allows the index to be
>> cached in memory. However, my understanding is that mmapped files are
>> demand-paged (lazily loaded), meaning that a block is paged into memory
>> only when it is first read from disk - is this correct? Is it actually
>> block-by-block (page by page)? Any pointers to decent documentation on
>> this, regardless of the effectiveness of the approach, would be
>> appreciated...
>
>
> You are correct that data must have been accessed recently to be in the disk
> cache. This does, however, include writes -- so any data that gets indexed
> will be in the cache because it has just been written.  I do believe it is
> read in one page at a time, and that the pages are 4k in size.
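>
> If you want to see the page size on your particular system, a quick check on
> Linux (a generic command, nothing Solr-specific) is:
>
> getconf PAGESIZE
>
> On typical x86_64 hardware that prints 4096.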
>
>
>> My concern with using MMapDirectory for an index stored on disk (even
>> SSDs), if my understanding is correct, is that there is still a large
>> startup cost: it may take many queries before even most of a 20G index
>> has been loaded into memory, and there may still be "dark corners" that
>> are only touched by edge-case queries, causing QTime spikes if those
>> queries ever occur.
>>
>> I would like to ensure that, at startup, no query will incur
>> disk-seek/read penalties.
>>
>> Is the "right" way to achieve this to copy the index to a ramfs (NOT
>> ramdisk) mount and then continue to use MMapDirectory in Solr to read
>> the index? I am under the impression that with ramfs (unlike ramdisk,
>> for which this would not work), a file mmapped on the mount will actually
>> be backed by the same memory as the ramfs copy, and so would not incur
>> the typical double-RAM overhead of mmapping a file in memory just to have
>> yet another copy of the file created in a second memory location. Is this
>> correct? If not, would you please point me to documentation stating
>> otherwise? (I haven't found much documentation either way.)
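>>
>> For concreteness, the kind of setup I have in mind would be roughly the
>> following (mount point and paths here are just placeholders):
>>
>> mount -t ramfs ramfs /mnt/solr-index
>> cp -rp /path/to/index/* /mnt/solr-index/
>>
>> and then pointing Solr's data directory (still using the default
>> MMapDirectory) at the ramfs copy.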
>
>
> I am not familiar with any "double-RAM overhead" from using mmap.  It should
> be extraordinarily efficient, so much so that even when your index won't fit
> in RAM, performance is typically still excellent.  Using an SSD instead of a
> spinning disk will increase performance across the board, until enough of
> the index is cached in RAM, after which it won't make a lot of difference.
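>
> One rough way to confirm that mmap is not keeping a second copy of the index
> inside the Java heap is to look at the mappings of the running Solr JVM on
> Linux (the pid and path below are placeholders):
>
> grep /path/to/index /proc/<solr-pid>/maps
>
> The index segment files show up there as read-only, file-backed mappings;
> the pages behind them live in the kernel's page cache rather than in a
> second private copy in the heap.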
>
> My parting thoughts, with a general note to the masses: Do not try this if
> you are not absolutely sure your index will fit in memory!  It will tend to
> cause WAY more problems than it will solve for most people with large
> indexes.
>
> If you actually do have considerably more RAM than your index size, and you
> know that the index will never grow to where it might not fit, you can use a
> simple trick to get it all cached, even before running queries.  Just read
> the entire contents of the index, discarding everything you read.  There are
> two main OS variants to consider here, and both can be scripted, as noted
> below.  Run the command twice to see the difference that caching makes for
> the second run.  Note that an SSD would speed the first run of these
> commands up considerably:
>
> *NIX (may work on a Mac too):
> cat /path/to/index/files/* > /dev/null
>
> Windows:
> type C:\Path\To\Index\Files\* > NUL
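>
> On Linux you can also watch the page cache fill up while the command runs,
> for example with:
>
> free -m
>
> The "cached" figure should grow by roughly the total size of the index
> files being read.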
>
> Thanks,
> Shawn
>
