On 8/7/13 9:04 AM, Shawn Heisey wrote:
On 8/7/2013 12:13 AM, Per Steffensen wrote:
Is there a way I can configure Solr so that it handles its shards
completely in memory? If yes, how? No writing to disk - neither
transaction log nor Lucene indices. Of course I accept that data is
lost if Solr crashes or is shut down.
The Lucene index part can be done using RAMDirectoryFactory. It's
generally not a good idea, though: if you have enough RAM for that,
then you have enough RAM to fit your entire index into the OS disk
cache. I don't think you can really do anything about the transaction
log being on disk, but I could be incorrect about that.
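
For reference, the directory implementation is a one-line change in
solrconfig.xml. A minimal sketch using the stock class that ships with
Solr 4.x:

    <!-- Holds the entire Lucene index in JVM heap; nothing survives
         a restart. -->
    <directoryFactory name="DirectoryFactory"
                      class="solr.RAMDirectoryFactory"/>
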
Relying on the OS disk cache and the default directory implementation
will usually give you equivalent or better query performance compared to
putting your index into JVM memory. You won't need a massive Java heap
and the garbage collection problems that it creates. A side bonus: you
don't lose your index when Solr shuts down.
If you have extremely heavy indexing, then RAMDirectoryFactory might
work better -- assuming you've got your GC heavily tuned. A potentially
critical problem with RAMDirectoryFactory is that merging/optimizing
will require at least twice as much RAM as your total index size.
Here's a complete discussion about this:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
NB: That article was written for 3.x, when NRTCachingDirectoryFactory
(the default in 4.x) wasn't available. The NRT factory *uses*
MMapDirectory.
Thanks,
Shawn
Thanks, Shawn
The thing is that this will be used for a small, ever-changing
collection. In our system we load a lot of documents into a SolrCloud
cluster. A lot of processes across numerous machines work in parallel on
loading those documents. Those processes need to coordinate (hold each
other back) from time to time, and they do so by taking distributed
locks. Until now we have used the ZooKeeper cluster at hand for taking
those distributed locks, but the need for locks is so heavy that it
causes congestion in ZooKeeper, and ZooKeeper really cannot scale in
that area. We could use several ZooKeeper clusters, but we have decided
to use a "locking" collection in Solr instead - that will scale. You can
implement locking in Solr using versioning and optimistic locking (see
the sketch below). So at any time this collection will contain only the
few locks (a few hundred at most) that are active "right now". Lots of
locks will be taken, but each of them will only exist for a few ms
before being deleted again. Therefore it should not take up a lot of
memory, I guess?
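
For what it's worth, here is a minimal SolrJ sketch of that
acquire/release pattern. The collection name, URL, and class name are
illustrative; the actual mechanism is Solr's optimistic concurrency:
adding a document with _version_ = -1 asks Solr to fail with HTTP 409
if the document already exists.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrLock {
        private final SolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/locks");

        /** Try to take the named lock; true on success. */
        public boolean tryLock(String lockName) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", lockName);
            // _version_ = -1 means "the document must not already
            // exist"; if it does, Solr rejects the add with a 409.
            doc.addField("_version_", -1L);
            try {
                solr.add(doc);
                return true;              // lock acquired
            } catch (SolrException e) {
                if (e.code() == 409) {
                    return false;         // lock held by someone else
                }
                throw e;
            }
        }

        /** Release the lock by deleting its document. */
        public void unlock(String lockName) throws Exception {
            solr.deleteById(lockName);
        }
    }

One caveat: the _version_ check is resolved through real-time get,
which reads from the update log, so it is worth verifying that this
pattern still works if the transaction log is disabled.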
I guess we will try RAMDirectoryFactory, and I will look into how we
can avoid the Solr transaction log being written (to disk, at least).
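
In case it helps others: the transaction log is the <updateLog> element
inside <updateHandler> in solrconfig.xml, and omitting (or commenting
out) that element should disable it. A sketch, with the usual 4.x
default shown commented out:

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- No <updateLog/> element: no transaction log is written.
           Note that real-time get and SolrCloud recovery/peer sync
           depend on the update log, so this needs careful testing. -->
      <!--
      <updateLog>
        <str name="dir">${solr.ulog.dir:}</str>
      </updateLog>
      -->
    </updateHandler>
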
Regards, Per Steffensen