As I discovered, it is not a good idea to use the 'native' lockType in this scenario; in fact, there is a note in solrconfig.xml that says the same.
When a core is reloaded and Solr tries to grab the lock, it fails, even if the instance is configured to be read-only. So I am using the 'single' lock for the readers and 'native' for the writer, which seems to work OK.

roman

On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

I have autoCommit set to 40k records / 1800 secs. I have only tested it with a manual commit, but I don't see why it should work differently.

Roman

On 7 Jun 2013 20:52, "Tim Vaillancourt" <t...@elementspace.com> wrote:

If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one physical Linux machine.

My main concern was "re-using" the FS cache between both instances: if I replicated to myself, there would be two independent copies of the index, FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work?

Best of luck!

Tim

On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:

So here it is, for the record, how I am solving it right now:

Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false

solrconfig.xml changes:

1. All index-changing components have this bit, enable="${montysolr.master:true}", i.e.:

    <updateHandler class="solr.DirectUpdateHandler2"
                   enable="${montysolr.master:true}">

2. For cache warming de/activation:

    <listener event="newSearcher"
              class="solr.QuerySenderListener"
              enable="${montysolr.enable.warming:true}">...

3. To trigger a refresh of the read-only master (from the write master):

    <listener event="postCommit"
              class="solr.RunExecutableListener"
              enable="${montysolr.master:true}">
      <str name="exe">curl</str>
      <str name="dir">.</str>
      <bool name="wait">false</bool>
      <arr name="args">
        <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str>
      </arr>
    </listener>

This works. I still don't like reloading the whole core, but it seems like the easiest thing to do for now.

-- roman

On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

Hi Peter,

Thank you, I am glad to read that this use case is not alien.

I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write.

(Being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call curl after a commit, or write an IndexReaderFactory that checks for changes.

The problem with calling 'core reload' is that it seems like a lot of work just to open a new searcher, eeekkk... Somewhere I read that reloading a core is cheap, but re-opening the index searcher must definitely be cheaper...

roman

On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com> wrote:

Hi,
We use this very same scenario to great effect: 2 instances using the same dataDir with many cores; 1 is a writer (no caching), the other is a searcher (lots of caching).
To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit, i.e. you invoke a commit with 0 documents.
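An empty commit of this kind can be sent over HTTP. The Java sketch below builds and sends one with the JDK's HttpClient. It assumes Solr's update handler accepts a commit=true parameter with no documents (as in the classic curl ".../update?commit=true" examples) and reuses the base URL http://localhost:5005/solr and core collection1 from this thread; EmptyCommit and its methods are hypothetical names. It also assumes the searcher still exposes an update handler, which is not the case in the read-only setup described above, where those components are disabled.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: asks a Solr core to commit without sending any documents,
// so it opens a new searcher over whatever is currently in the shared index.
class EmptyCommit {

    // Builds <base>/<core>/update?commit=true, tolerating a trailing slash on the base URL.
    static String commitUrl(String solrBase, String core) {
        String base = solrBase.endsWith("/")
                ? solrBase.substring(0, solrBase.length() - 1)
                : solrBase;
        return base + "/" + URLEncoder.encode(core, StandardCharsets.UTF_8)
                + "/update?commit=true";
    }

    // Sends the commit request (assumed: a plain GET is accepted) and returns the HTTP status.
    static int sendEmptyCommit(HttpClient client, String solrBase, String core)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(commitUrl(solrBase, core)))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only prints the URL; call sendEmptyCommit(...) against a running Solr to commit.
        System.out.println(commitUrl("http://localhost:5005/solr", "collection1"));
    }
}
```

Any of the triggers Peter lists (a timer in your own code, autoCommit, or an IPC notification) would end up issuing a request like this one.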
The empty commit will refresh the caches (including autowarming), [re]build the relevant searchers, etc., and make any index changes visible to the read-only (RO) instance.
Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time.
There are several ways to trigger a commit:
Call commit() periodically within your own code.
Use autoCommit in solrconfig.xml.
Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when notified (more complex coding, but good if the index changes on an ad hoc basis).
Note: doing things this way isn't really suitable for an NRT environment.

HTH,
Peter

On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

Replication is fine, and I am going to use it, but I wanted it for instances *distributed* across several (physical) machines. Here I have one physical machine, and it has many cores.
I want to run 2 instances of Solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB) and more RAM to the searcher (28GB).
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing: each time we commit, the server rebuilds a citation network of 80M edges).
3) It saves disk space and gives better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations, since the two processes access the same index).

Maybe I should just forget it and go with replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged).

Am I not seeing something?

roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <jhell...@innoventsolutions.com> wrote:

Roman,

Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master.
Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

OK, so I have verified that the two instances can run alongside each other, sharing the same dataDir.

All update handlers are inaccessible in the read-only master:

    <updateHandler class="solr.DirectUpdateHandler2"
                   enable="${solr.can.write:true}">

    java -Dsolr.can.write=false .....

And I can reload the index manually:

    curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like the read-only server to discover index changes on its own. Any pointers?

Thanks,

roman

On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

Hello,

I need your expert advice. I am thinking about running two instances of Solr that share the same data directory. The *reason*: the indexing instance is constantly building caches after every commit (we have a big cache), and this slows it down. But indexing doesn't need much RAM, only search does (and the server has lots of CPUs).

So, it is like having two Solr instances:

1. solr-indexing-master
2. solr-read-only-master

In solrconfig.xml I can disable the update components; that should be fine.
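The enable switches in this thread rely on the ${name:default} placeholders that Solr substitutes in solrconfig.xml from system properties (the -D flags). The Java sketch below is a simplified model of that substitution, not Solr's own code: it assumes the property name runs up to the first colon and everything after it is the default, so enable="${solr.can.write:true}" becomes false under -Dsolr.can.write=false and stays true when the flag is absent. PropertyPlaceholders is a hypothetical class.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of ${name:default} substitution in solrconfig.xml.
class PropertyPlaceholders {
    // Group 1: property name (up to the first ':'); group 2: optional default (may contain ':').
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Replaces every placeholder with the property value, or with its default if unset.
    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2); // null when no default was given
            String value = props.getOrDefault(name, fallback);
            if (value == null) {
                throw new IllegalArgumentException("No value or default for property " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stands in for starting the JVM with -Dsolr.can.write=false
        Map<String, String> readOnly = Map.of("solr.can.write", "false");
        System.out.println(resolve("${solr.can.write:true}", readOnly)); // prints false
        System.out.println(resolve("${solr.can.write:true}", Map.of())); // prints true
    }
}
```

One consequence of this scheme: a placeholder only picks up a -D flag whose name matches it exactly; a flag with any other name leaves the default in effect.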
However, I don't know how to 'trigger' index re-opening on (2) after a commit happens on (1).

Ideally, the second instance could monitor the disk and re-open the index after new files appear there. Do I have to implement a custom IndexReaderFactory? Or something else?

Please note: I know about replication; this use case is IMHO slightly different. In fact, the write-only master (1) is also a replication master.

Googling turned up only this: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 (no pointers there).

But if I am approaching the problem wrongly, please don't hesitate to 're-educate' me :)

Thanks!

roman
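One way to approach the "monitor the disk" idea without touching Solr internals is to poll the shared index directory for Lucene's commit files. The Java sketch below assumes Lucene's usual naming, where each commit writes a segments_N file and N is the commit generation written in base 36; a generation higher than the last one seen means the writer has committed since the previous poll. CommitWatcher is a hypothetical class, and it only detects the change: the re-open itself would still go through Solr (e.g. the core RELOAD request used in this thread) or, inside a custom component, through Lucene's DirectoryReader.openIfChanged.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical poller that reports when a new Lucene commit (segments_N) appears.
class CommitWatcher {
    private static final String PREFIX = "segments_";

    private final Path indexDir;
    private long lastSeen = -1;
    private boolean initialized = false;

    CommitWatcher(Path indexDir) {
        this.indexDir = indexDir;
    }

    // Highest commit generation among the file names, or -1 if there is none.
    static long latestGeneration(Iterable<String> fileNames) {
        long max = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // segment data files, segments.gen, etc.
            }
            try {
                // Assumed naming: generation encoded in base 36 (Character.MAX_RADIX).
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                max = Math.max(max, gen);
            } catch (NumberFormatException e) {
                // unexpected suffix: not a commit file, ignore it
            }
        }
        return max;
    }

    // Lists segments_* files in the index directory and returns the newest generation.
    static long currentGeneration(Path indexDir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path file : files) {
                names.add(file.getFileName().toString());
            }
        }
        return latestGeneration(names);
    }

    // The first call records a baseline and returns false; later calls return true
    // exactly once for each poll that sees a newer generation than before.
    boolean poll() throws IOException {
        long current = currentGeneration(indexDir);
        if (!initialized) {
            lastSeen = current;
            initialized = true;
            return false;
        }
        if (current > lastSeen) {
            lastSeen = current;
            return true;
        }
        return false;
    }
}
```

Polling keeps the sketch portable; poll() could be scheduled every few seconds with a ScheduledExecutorService, sending the RELOAD (or opening a new searcher) whenever it returns true. A new segments_N file is treated only as a hint here: deciding whether a commit is complete and readable is left to Lucene when the index is actually re-opened.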