Was going to say the same thing. It's also usually a good idea to reduce paging (e.g., setting swappiness to 0 on Linux).
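A minimal sketch of how that's usually done on Linux (the sysctl key is standard, but treat the exact value and the persistence file as assumptions for your particular distro):

    # take effect now, until the next reboot
    sysctl -w vm.swappiness=0

    # persist the setting across reboots
    echo 'vm.swappiness = 0' >> /etc/sysctl.conf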
- Mark

On Oct 24, 2012, at 9:36 PM, François Schiettecatte <fschietteca...@gmail.com> wrote:

> Aaron
>
> The best way to make sure the index is cached by the OS is to just cat it on startup:
>
> cat `find /path/to/solr/index` > /dev/null
>
> Just make sure your index is smaller than RAM, otherwise data will be rotated out.
>
> Memory mapping is built on the virtual memory system, and I suspect that ramfs is too, so I doubt very much that copying your index to ramfs will help at all. Sidebar - a while ago I did a bunch of testing copying indices to shared memory (/dev/shm in this case) and there was no advantage compared to just accessing indices on disc when using memory mapping once the system got to a steady state.
>
> There has been a lot written about this topic on the list. Basically it comes down to using MMapDirectory (which is the default), making sure your index is smaller than your RAM, and allocating just enough memory to the Java VM. That last part requires some benchmarking because it is so workload dependent.
>
> Best regards
>
> François
>
> On Oct 24, 2012, at 8:29 PM, Aaron Daubman <daub...@gmail.com> wrote:
>
>> Greetings,
>>
>> Most times I've seen the topic of storing one's index in memory, it seems the asker was referring (or understood to be referring) to the (in)famous "not intended to work with huge indexes" Solr RAMDirectory.
>>
>> Let me be clear that I am not interested in RAMDirectory. However, I would like to better understand the oft-recommended and currently-default MMapDirectory, and what the tradeoffs would be, when using a 64-bit Linux server dedicated to this single Solr instance, with plenty (more than 2x index size) of RAM, of storing the index files on SSDs versus on a ramfs mount.
>>
>> I understand that using the default MMapDirectory will allow caching of the index in memory; however, my understanding is that mmapped files are demand-paged (lazily loaded), meaning that only after a block is read from disk will it be paged into memory - is this correct? Is it actually block-by-block (page size by page size)? Any pointers to decent documentation on this, regardless of the effectiveness of the approach, would be appreciated...
>>
>> My concern with using MMapDirectory for an index stored on disk (even SSDs), if my understanding is correct, is that there is still a large startup cost to MMapDirectory, as it may take many queries before even most of a 20G index has been loaded into memory, and there may yet still be "dark corners" that only come up in edge-case queries that cause QTime spikes should these queries ever occur.
>>
>> I would like to ensure that, at startup, no query will incur disk-seek/read penalties.
>>
>> Is the "right" way to achieve this to copy the index to a ramfs (NOT ramdisk) mount and then continue to use MMapDirectory in Solr to read the index? I am under the impression that when using ramfs (rather than ramdisk, for which this would not work) a file mmapped on a ramfs mount will actually share the same address space, and so would not incur the typical double-RAM overhead of mmapping a file in memory just to have yet another copy of the file created in a second memory location. Is this correct? If not, would you please point me to documentation stating otherwise (I haven't found much documentation either way).
>>
>> Finally, given the desire to be quick at startup with a large index that will still easily fit within a system's memory, am I thinking about this wrong or are there other better approaches?
>>
>> Thanks, as always,
>> Aaron
>
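For what it's worth, here is a sketch of the warm-up approach François describes, plus a quick check that the index really is smaller than RAM (the path is a placeholder, and du/free options can vary slightly between distros):

    # compare index size against total RAM, both in MB
    du -sm /path/to/solr/index
    free -m

    # read every index file once so the OS page cache is populated
    cat `find /path/to/solr/index -type f` > /dev/null

The -type f just keeps cat from complaining about directories; otherwise it's the same command quoted above.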