Re: Two instances of solr - the same datadir?

Roman Chyla Tue, 02 Jul 2013 11:32:06 -0700

As I discovered, it is not good to use the 'native' lockType in this scenario;
there is actually a note in solrconfig.xml that says the same.

When a core is reloaded and Solr tries to grab the lock, it fails, even if
the instance is configured to be read-only. So I am using the 'single' lock
for the readers and 'native' for the writer, which seems to work OK.
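
A minimal sketch of how that writer/reader split could be expressed in one
shared solrconfig.xml, assuming a Solr 4.x-style indexConfig section. The
property name montysolr.locktype is hypothetical and only used for
illustration; lockType and its 'native' and 'single' values are the standard
settings:

```xml
<!-- Shared solrconfig.xml: the lock type is picked per instance at startup.
     "montysolr.locktype" is a hypothetical property name, not an existing one. -->
<indexConfig>
  <!-- Falls back to 'native' (the writer) when the property is not set;
       a reader overrides it with 'single'. -->
  <lockType>${montysolr.locktype:native}</lockType>
</indexConfig>
```

Under that assumption the writer starts with the default, and each reader is
started with -Dmontysolr.locktype=single.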

roman


On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> I have auto commit after 40k records or 1800 seconds. I only tested with
> manual commit, but I don't see why it should work differently.
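>
> As a sketch only, those thresholds would correspond to an autoCommit block
> roughly like this one (maxTime is given in milliseconds; the enclosing
> updateHandler element is the standard solr.DirectUpdateHandler2):
>
> ```xml
> <!-- Sketch: commit after 40,000 documents or 1,800 seconds,
>      whichever threshold is reached first. -->
> <updateHandler class="solr.DirectUpdateHandler2">
>   <autoCommit>
>     <maxDocs>40000</maxDocs>
>     <!-- maxTime is in milliseconds: 1800 s = 1,800,000 ms -->
>     <maxTime>1800000</maxTime>
>   </autoCommit>
> </updateHandler>
> ```
>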
> Roman
> On 7 Jun 2013 20:52, "Tim Vaillancourt" <t...@elementspace.com> wrote:
>
>> If it makes you feel better, I also considered this approach when I was in
>> the same situation with a separate indexer and searcher on one physical
>> Linux machine.
>>
>> My main concern was "re-using" the FS cache between both instances: if I
>> replicated to myself there would be two independent copies of the index,
>> FS-cached separately.
>>
>> I like the suggestion of using autoCommit to reload the index. If I'm
>> reading that right, you'd set an autoCommit on 'zero docs changing', or
>> just 'every N seconds'? Did that work?
>>
>> Best of luck!
>>
>> Tim
>>
>>
>> On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:
>>
>> > So here it is, for the record, how I am solving it right now:
>> >
>> > Write-master is started with: -Dmontysolr.warming.enabled=false
>> > -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
>> > Read-master is started with: -Dmontysolr.warming.enabled=true
>> > -Dmontysolr.write.master=false
>> >
>> >
>> > solrconfig.xml changes:
>> >
>> > 1. All index-changing components have this bit,
>> > enable="${montysolr.master:true}", i.e.
>> >
>> > <updateHandler class="solr.DirectUpdateHandler2"
>> >                  enable="${montysolr.master:true}">
>> >
>> > 2. for cache warming de/activation
>> >
>> > <listener event="newSearcher"
>> >       class="solr.QuerySenderListener"
>> >       enable="${montysolr.enable.warming:true}">...
>> >
>> > 3. to trigger refresh of the read-only-master (from write-master):
>> >
>> >     <listener event="postCommit"
>> >       class="solr.RunExecutableListener"
>> >       enable="${montysolr.master:true}">
>> >       <str name="exe">curl</str>
>> >       <str name="dir">.</str>
>> >       <bool name="wait">false</bool>
>> >       <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
>> >     </listener>
>> >
>> > This works. I still don't like reloading the whole core, but it seems
>> > like the easiest thing to do now.
>> >
>> > -- roman
>> >
>> >
>> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com>
>> > wrote:
>> >
>> > > Hi Peter,
>> > >
>> > > Thank you, I am glad to read that this use case is not alien.
>> > >
>> > > I'd like to make the second instance (searcher) completely read-only, so I
>> > > have disabled all the components that can write.
>> > >
>> > > (being lazy ;)) I'll probably use
>> > > http://wiki.apache.org/solr/CollectionDistribution to call the curl after
>> > > commit, or write some IndexReaderFactory that checks for changes.
>> > >
>> > > The problem with calling the 'core reload' is that it seems like a lot of
>> > > work for just opening a new searcher, eeekkk... Somewhere I read that it is
>> > > cheap to reload a core, but re-opening the index searchers must definitely
>> > > be cheaper...
>> > >
>> > > roman
>> > >
>> > >
>> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi,
>> > >> We use this very same scenario to great effect - 2 instances using the
>> > >> same dataDir with many cores - 1 is a writer (no caching), the other is a
>> > >> searcher (lots of caching).
>> > >> To get the searcher to see the index changes from the writer, you need the
>> > >> searcher to do an empty commit - i.e. you invoke a commit with 0
>> > >> documents.
>> > >> This will refresh the caches (including autowarming), [re]build the
>> > >> relevant searchers etc. and make any index changes visible to the RO
>> > >> instance.
>> > >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
>> > >> ensure the two instances don't try to commit at the same time.
>> > >> There are several ways to trigger a commit:
>> > >> Call commit() periodically within your own code.
>> > >> Use autoCommit in solrconfig.xml.
>> > >> Use an RPC/IPC mechanism between the 2 instance processes to tell the
>> > >> searcher the index has changed, then call commit when called (more complex
>> > >> coding, but good if the index changes on an ad-hoc basis).
>> > >> Note, doing things this way isn't really suitable for an NRT environment.
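>> > >>
>> > >> As a sketch of the empty commit described above, assuming the searcher
>> > >> exposes the standard /update handler and accepts Solr's XML update
>> > >> format: the request body carries a commit and no documents.
>> > >>
>> > >> ```xml
>> > >> <!-- Body of an HTTP POST to the searcher's /update handler,
>> > >>      sent with Content-Type: text/xml.
>> > >>      No documents are added; only the commit is executed. -->
>> > >> <commit/>
>> > >> ```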
>> > >>
>> > >> HTH,
>> > >> Peter
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Replication is fine, I am going to use it, but I wanted it for instances
>> > >> > *distributed* across several (physical) machines - but here I have one
>> > >> > physical machine, it has many cores. I want to run 2 instances of solr
>> > >> > because I think it has these benefits:
>> > >> >
>> > >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
>> > >> > searcher (28GB)
>> > >> > 2) I can deactivate warming for the writer and keep it for the searcher
>> > >> > (this considerably speeds up indexing - each time we commit, the server is
>> > >> > rebuilding a citation network of 80M edges)
>> > >> > 3) saving disk space and better OS caching (OS should be able to use more
>> > >> > RAM for the caching, which should result in faster operations - the two
>> > >> > processes are accessing the same index)
>> > >> >
>> > >> > Maybe I should just forget it and go with the replication, but it doesn't
>> > >> > 'feel right' IFF it is on the same physical machine. And Lucene
>> > >> > specifically has a method for discovering changes and re-opening the index
>> > >> > (DirectoryReader.openIfChanged)
>> > >> >
>> > >> > Am I not seeing something?
>> > >> >
>> > >> > roman
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
>> > >> > jhell...@innoventsolutions.com> wrote:
>> > >> >
>> > >> > > Roman,
>> > >> > >
>> > >> > > Could you be more specific as to why replication doesn't meet your
>> > >> > > requirements?  It was geared explicitly for this purpose, including the
>> > >> > > automatic discovery of changes to the data on the index master.
>> > >> > >
>> > >> > > Jason
>> > >> > >
>> > >> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>> > >> > >
>> > >> > > > OK, so I have verified the two instances can run alongside each other,
>> > >> > > > sharing the same datadir.
>> > >> > > >
>> > >> > > > All update handlers are inaccessible in the read-only master:
>> > >> > > >
>> > >> > > > <updateHandler class="solr.DirectUpdateHandler2"
>> > >> > > >                 enable="${solr.can.write:true}">
>> > >> > > >
>> > >> > > > java -Dsolr.can.write=false .....
>> > >> > > >
>> > >> > > > And I can reload the index manually:
>> > >> > > >
>> > >> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
>> > >> > > >
>> > >> > > > But this is not an ideal solution; I'd like for the read-only server to
>> > >> > > > discover index changes on its own. Any pointers?
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > >
>> > >> > > >  roman
>> > >> > > >
>> > >> > > >
>> > >> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com>
>> > >> > > > wrote:
>> > >> > > >
>> > >> > > >> Hello,
>> > >> > > >>
>> > >> > > >> I need your expert advice. I am thinking about running two instances of
>> > >> > > >> solr that share the same data directory. The *reason* being: the indexing
>> > >> > > >> instance is constantly building cache after every commit (we have a big
>> > >> > > >> cache) and this slows it down. But indexing doesn't need much RAM, only
>> > >> > > >> the search does (and the server has lots of CPUs).
>> > >> > > >>
>> > >> > > >> So, it is like having two solr instances:
>> > >> > > >>
>> > >> > > >> 1. solr-indexing-master
>> > >> > > >> 2. solr-read-only-master
>> > >> > > >>
>> > >> > > >> In the solrconfig.xml I can disable update components; it should be
>> > >> > > >> fine. However, I don't know how to 'trigger' index re-opening on (2)
>> > >> > > >> after the commit happens on (1).
>> > >> > > >>
>> > >> > > >> Ideally, the second instance could monitor the disk and re-open the index
>> > >> > > >> after new files appear there. Do I have to implement a custom
>> > >> > > >> IndexReaderFactory? Or something else?
>> > >> > > >>
>> > >> > > >> Please note: I know about the replication, this use case is IMHO slightly
>> > >> > > >> different - in fact, write-only-master (1) is also a replication master.
>> > >> > > >>
>> > >> > > >> Googling turned up only this:
>> > >> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912
>> > >> > > >> - no pointers there.
>> > >> > > >>
>> > >> > > >> But if I am approaching the problem wrongly, please don't hesitate to
>> > >> > > >> 're-educate' me :)
>> > >> > > >>
>> > >> > > >> Thanks!
>> > >> > > >>
>> > >> > > >>  roman
>> > >> > > >>
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
