On 8/7/13 9:04 AM, Shawn Heisey wrote:
On 8/7/2013 12:13 AM, Per Steffensen wrote:
Is there a way I can configure Solr so that it handles its shards
completely in memory? If yes, how? No writing to disk - neither
transaction log nor Lucene indices. Of course I accept that data is lost
if Solr crashes or is shut down.
The Lucene index part can be done using RAMDirectoryFactory.  It's
generally not a good idea, though.  If you have enough RAM for that,
then you have enough RAM to fit your entire index into the OS disk
cache.  I don't think you can really do anything about the transaction
log being on disk, but I could be incorrect about that.
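For reference, the directory factory is chosen in solrconfig.xml; a minimal sketch (the element already exists in the stock config, only the class needs to change):

```xml
<!-- solrconfig.xml: hold the Lucene index entirely in JVM heap.
     The index is lost whenever the JVM exits. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.RAMDirectoryFactory"/>
```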

Relying on the OS disk cache and the default directory implementation
will usually give you equivalent or better query performance compared to
putting your index into JVM memory.  You won't need a massive Java heap
and the garbage collection problems that it creates.  A side bonus: you
don't lose your index when Solr shuts down.

If you have extremely heavy indexing, then RAMDirectoryFactory might
work better -- assuming you've got your GC heavily tuned.  A potentially
critical problem with RAMDirectoryFactory is that merging/optimizing
will require at least twice as much RAM as your total index size.

Here's a complete discussion about this:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

NB: That article was written for 3.x, when NRTCachingDirectoryFactory
(the default in 4.x) wasn't available.  The NRT factory *uses*
MMapDirectory.

Thanks,
Shawn


Thanks, Shawn

The thing is that this will be used for a small, ever-changing collection. In our system we load a lot of documents into a SolrCloud cluster. Many processes across numerous machines work in parallel on loading those documents. Those processes need to coordinate (hold each other back) from time to time, and they do so by taking distributed locks.

Until now we have used the ZooKeeper cluster at hand for taking those distributed locks, but the need for locks is so heavy that it causes congestion in ZooKeeper, and ZooKeeper really cannot scale in that area. We could use several ZooKeeper clusters, but we have decided to use a "locking" collection in Solr instead - that will scale. You can implement locking in Solr using versioning and optimistic locking.

So at any time this collection will contain just the few locks (a few hundred at most) that are current "right now". Lots of locks will be taken, but each will exist for only a few ms before being deleted again. Therefore it should not take up a lot of memory, I guess?
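For the curious: the acquire/release protocol can be modelled roughly like this. This is a minimal in-memory sketch, not our actual code - against real Solr, "create if absent" is an add with _version_ = -1 (which fails with a version conflict if the lock document already exists), and release is a delete-by-id on the locks collection. The class and method names here are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of a lock collection. Acquiring a lock is a
// create-if-absent of a lock document (Solr: add with _version_ = -1, which
// raises a version conflict if the id already exists); releasing it is a
// delete by id.
class SolrLockModel {
    private final ConcurrentHashMap<String, Long> locks = new ConcurrentHashMap<>();

    /** Try to take the lock; returns true on success, false on conflict. */
    boolean acquire(String lockId) {
        // putIfAbsent mirrors add-with-_version_=-1: exactly one caller wins.
        return locks.putIfAbsent(lockId, System.nanoTime()) == null;
    }

    /** Release the lock (Solr: delete-by-id on the locks collection). */
    void release(String lockId) {
        locks.remove(lockId);
    }
}
```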

Guess we will try RAMDirectoryFactory, and I will look into how we can avoid Solr's transaction log being written (to disk, at least).
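If it helps, the transaction log is controlled by the <updateLog> element in solrconfig.xml, so commenting it out should stop the tlog writes - but I believe SolrCloud's recovery and real-time get depend on the update log, so this may not be a supported configuration in cloud mode:

```xml
<!-- solrconfig.xml: the transaction log is enabled by the <updateLog>
     element inside <updateHandler>; commenting it out disables tlog writes.
     Caveat: SolrCloud recovery and real-time get rely on the update log,
     so running cloud mode without it may not be supported. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!--
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  -->
</updateHandler>
```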

Regards, Per Steffensen
