On 8/7/13 9:04 AM, Shawn Heisey wrote:
On 8/7/2013 12:13 AM, Per Steffensen wrote:
Is there a way I can configure Solr so that it handles its shards
completely in memory? If yes, how? No writing to disk - neither
transaction log nor Lucene indices. Of course I accept that data is
lost if Solr crashes or is shut down.
The Lucene index part can be done using RAMDirectoryFactory. It's
generally not a good idea, though: if you have enough RAM for that,
then you have enough RAM to fit your entire index into the OS disk
cache. I don't think you can really do anything about the transaction
log being on disk, but I could be incorrect about that.
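
For reference, the directory implementation is a one-line change in
solrconfig.xml. A minimal sketch using the stock class that ships with
Solr 4.x:

    <!-- Holds the entire Lucene index in JVM heap; nothing survives
         a restart. -->
    <directoryFactory name="DirectoryFactory"
                      class="solr.RAMDirectoryFactory"/>
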
Relying on the OS disk cache and the default directory implementation
will usually give you equivalent or better query performance compared to
putting your index into JVM memory. You won't need a massive Java heap
and the garbage collection problems that it creates. A side bonus: you
don't lose your index when Solr shuts down.
If you have extremely heavy indexing, then RAMDirectoryFactory might
work better -- assuming you've got your GC heavily tuned. A potentially
critical problem with RAMDirectoryFactory is that merging/optimizing
will require at least twice as much RAM as your total index size.
Here's a complete discussion about this:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
NB: That article was written for 3.x, when NRTCachingDirectoryFactory
(the default in 4.x) wasn't available. The NRT factory *uses*
MMapDirectory.
Thanks,
Shawn
Thanks, Shawn
The thing is that this will be used for a small, ever-changing
collection. In our system we load a lot of documents into a SolrCloud
cluster. A lot of processes across numerous machines work in parallel on
loading those documents. Those processes need to coordinate (hold each
other back) from time to time, and they do so by taking distributed
locks. Until now we have used the ZooKeeper cluster at hand for taking
those distributed locks, but the need for locks is so heavy that it
causes congestion in ZooKeeper, and ZooKeeper really cannot scale in
that area. We could use several ZooKeeper clusters, but we have decided
to use a "locking" collection in Solr instead - that will scale. You can
implement locking in Solr using versioning and optimistic locking (see
the sketch below). So at any time this collection will contain only the
few locks (a few hundred at most) that are active "right now". Lots of
locks will be taken, but each of them will only exist for a few ms
before being deleted again. Therefore it should not take up a lot of
memory, I guess?
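
For what it's worth, here is a minimal SolrJ sketch of that
acquire/release pattern. The collection name, URL, and class name are
illustrative; the actual mechanism is Solr's optimistic concurrency:
adding a document with _version_ = -1 asks Solr to fail with HTTP 409
if the document already exists.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrLock {
        private final SolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/locks");

        /** Try to take the named lock; true on success. */
        public boolean tryLock(String lockName) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", lockName);
            // _version_ = -1 means "the document must not already
            // exist"; if it does, Solr rejects the add with a 409.
            doc.addField("_version_", -1L);
            try {
                solr.add(doc);
                return true;              // lock acquired
            } catch (SolrException e) {
                if (e.code() == 409) {
                    return false;         // lock held by someone else
                }
                throw e;
            }
        }

        /** Release the lock by deleting its document. */
        public void unlock(String lockName) throws Exception {
            solr.deleteById(lockName);
        }
    }

One caveat: the _version_ check is resolved through real-time get,
which reads from the update log, so it is worth verifying that this
pattern still works if the transaction log is disabled.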
I guess we will try RAMDirectoryFactory, and I will look into how we
can avoid the Solr transaction log being written (to disk, at least).
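
In case it helps others: the transaction log is the <updateLog> element
inside <updateHandler> in solrconfig.xml, and omitting (or commenting
out) that element should disable it. A sketch, with the usual 4.x
default shown commented out:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- No <updateLog/> element: no transaction log is written.
           Note that real-time get and SolrCloud recovery/peer sync
           depend on the update log, so this needs careful testing. -->
      <!--
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      -->
    </updateHandler>
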
Regards, Per Steffensen