Re: Two instances of solr - the same datadir?

Peter Sturge Tue, 02 Jul 2013 11:55:12 -0700

Hmmm, single lock sounds dangerous. It probably works ok because you've
been [un]lucky.
For example, even with a RO instance, you still need to do a commit in
order to reload caches/changes from the other instance.
What happens if this commit gets called in the middle of the other
instance's commit? I've not tested this scenario, but with a 'single' lock
it's quite possible that the results are indeterminate.
If the 'single' lock mechanism makes assumptions (e.g. that no other process
will interfere) and one then does, the Lucene index could very well get
corrupted.


For the error you're seeing using 'native', we use native lockType for both
write and RO instances, and it works fine - no contention.
Which version of Solr are you using? Perhaps there's been a change in
behaviour?
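
For what it's worth, here is a minimal sketch of the empty commit on the RO
instance, in Python. The host, port and core name (localhost:5005,
collection1) are borrowed from Roman's earlier message, the function names are
made up for illustration, and it assumes the RO instance still exposes its
/update handler.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(base_url: str, core: str) -> str:
    """Build the URL for an empty commit (no documents) on one core."""
    query = urlencode({"commit": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/update?{query}"


def empty_commit(base_url: str, core: str, timeout: float = 30.0) -> int:
    """Send the empty commit and return the HTTP status code."""
    with urlopen(commit_url(base_url, core), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumed values taken from earlier in this thread, not a recommendation.
    print(commit_url("http://localhost:5005", "collection1"))
```

Calling empty_commit() from a cron job or from the writer's postCommit hook
would be one way to schedule it; whether that is safe alongside the writer's
own commits is exactly the locking question above.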

Peter


On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> as i discovered, it is not good to use 'native' locktype in this scenario,
> actually there is a note in the solrconfig.xml which says the same
>
> when a core is reloaded and solr tries to grab lock, it will fail - even if
> the instance is configured to be read-only, so i am using 'single' lock for
> the readers and 'native' for the writer, which seems to work OK
>
> roman
>
>
> On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>
> > I have auto commit after 40k RECs/1800secs. I have only tested with manual
> > commit, but I don't see why auto commit should work differently.
> > Roman
> > On 7 Jun 2013 20:52, "Tim Vaillancourt" <t...@elementspace.com> wrote:
> >
> >> If it makes you feel better, I also considered this approach when I was
> >> in the same situation with a separate indexer and searcher on one
> >> physical Linux machine.
> >>
> >> My main concern was "re-using" the FS cache between both instances - if I
> >> replicated to myself there would be two independent copies of the index,
> >> FS-cached separately.
> >>
> >> I like the suggestion of using autoCommit to reload the index. If I'm
> >> reading that right, you'd set an autoCommit on 'zero docs changing', or
> >> just 'every N seconds'? Did that work?
> >>
> >> Best of luck!
> >>
> >> Tim
> >>
> >>
> >> On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:
> >>
> >> > So here it is for the record, how I am solving it right now:
> >> >
> >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> >> > http://localhost:5005
> >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> >> > -Dmontysolr.write.master=false
> >> >
> >> >
> >> > solrconfig.xml changes:
> >> >
> >> > 1. all index changing components have this bit,
> >> > enable="${montysolr.master:true}" - ie.
> >> >
> >> > <updateHandler class="solr.DirectUpdateHandler2"
> >> >                  enable="${montysolr.master:true}">
> >> >
> >> > 2. for cache warming de/activation
> >> >
> >> > <listener event="newSearcher"
> >> >       class="solr.QuerySenderListener"
> >> >       enable="${montysolr.enable.warming:true}">...
> >> >
> >> > 3. to trigger refresh of the read-only-master (from write-master):
> >> >
> >> >     <listener event="postCommit"
> >> >       class="solr.RunExecutableListener"
> >> >       enable="${montysolr.master:true}">
> >> >       <str name="exe">curl</str>
> >> >       <str name="dir">.</str>
> >> >       <bool name="wait">false</bool>
> >> >       <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
> >> >     </listener>
> >> >
> >> > This works. I still don't like the reload of the whole core, but it
> >> > seems like the easiest thing to do now.
> >> >
> >> > -- roman
> >> >
> >> >
> >> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi Peter,
> >> > >
> >> > > Thank you, I am glad to read that this usecase is not alien.
> >> > >
> >> > > I'd like to make the second instance (searcher) completely read-only,
> >> > > so I have disabled all the components that can write.
> >> > >
> >> > > (being lazy ;)) I'll probably use
> >> > > http://wiki.apache.org/solr/CollectionDistribution to call the curl
> >> > > after commit, or write some IndexReaderFactory that checks for changes
> >> > >
> >> > > The problem with calling the 'core reload' is that it seems like a lot
> >> > > of work for just opening a new searcher, eeekkk... Somewhere I read
> >> > > that it is cheap to reload a core, but re-opening the index searcher
> >> > > must definitely be cheaper...
> >> > >
> >> > > roman
> >> > >
> >> > >
> >> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com>
> >> > > wrote:
> >> > >
> >> > >> Hi,
> >> > >> We use this very same scenario to great effect - 2 instances using
> >> > >> the same dataDir with many cores - 1 is a writer (no caching), the
> >> > >> other is a searcher (lots of caching).
> >> > >> To get the searcher to see the index changes from the writer, you
> >> > >> need the searcher to do an empty commit - i.e. you invoke a commit
> >> > >> with 0 documents.
> >> > >> This will refresh the caches (including autowarming), [re]build the
> >> > >> relevant searchers etc. and make any index changes visible to the RO
> >> > >> instance.
> >> > >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml
> >> > >> to ensure the two instances don't try to commit at the same time.
> >> > >> There are several ways to trigger a commit:
> >> > >> Call commit() periodically within your own code.
> >> > >> Use autoCommit in solrconfig.xml.
> >> > >> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> >> > >> searcher the index has changed, then call commit when called (more
> >> > >> complex coding, but good if the index changes on an ad-hoc basis).
> >> > >> Note, doing things this way isn't really suitable for an NRT
> >> > >> environment.
> >> > >>
> >> > >> HTH,
> >> > >> Peter
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > Replication is fine, I am going to use it, but I wanted it for
> >> > >> > instances *distributed* across several (physical) machines - but
> >> > >> > here I have one physical machine, it has many cores. I want to run
> >> > >> > 2 instances of solr because I think it has these benefits:
> >> > >> >
> >> > >> > 1) I can give less RAM to the writer (4GB), and use more RAM for
> >> > >> > the searcher (28GB)
> >> > >> > 2) I can deactivate warming for the writer and keep it for the
> >> > >> > searcher (this considerably speeds up indexing - each time we
> >> > >> > commit, the server is rebuilding a citation network of 80M edges)
> >> > >> > 3) saving disk space and better OS caching (OS should be able to
> >> > >> > use more RAM for the caching, which should result in faster
> >> > >> > operations - the two processes are accessing the same index)
> >> > >> >
> >> > >> > Maybe I should just forget it and go with the replication, but it
> >> > >> > doesn't 'feel right' IFF it is on the same physical machine. And
> >> > >> > Lucene specifically has a method for discovering changes and
> >> > >> > re-opening the index (DirectoryReader.openIfChanged)
> >> > >> >
> >> > >> > Am I not seeing something?
> >> > >> >
> >> > >> > roman
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> >> > >> > jhell...@innoventsolutions.com> wrote:
> >> > >> >
> >> > >> > > Roman,
> >> > >> > >
> >> > >> > > Could you be more specific as to why replication doesn't meet
> >> > >> > > your requirements?  It was geared explicitly for this purpose,
> >> > >> > > including the automatic discovery of changes to the data on the
> >> > >> > > index master.
> >> > >> > >
> >> > >> > > Jason
> >> > >> > >
> >> > >> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com>
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > OK, so I have verified the two instances can run alongside each
> >> > >> > > > other, sharing the same datadir
> >> > >> > > >
> >> > >> > > > All update handlers are inaccessible in the read-only master
> >> > >> > > >
> >> > >> > > > <updateHandler class="solr.DirectUpdateHandler2"
> >> > >> > > >                 enable="${solr.can.write:true}">
> >> > >> > > >
> >> > >> > > > java -Dsolr.can.write=false .....
> >> > >> > > >
> >> > >> > > > And I can reload the index manually:
> >> > >> > > >
> >> > >> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> >> > >> > > >
> >> > >> > > > But this is not an ideal solution; I'd like for the read-only
> >> > >> > > > server to discover index changes on its own. Any pointers?
> >> > >> > > >
> >> > >> > > > Thanks,
> >> > >> > > >
> >> > >> > > >  roman
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla
> >> > >> > > > <roman.ch...@gmail.com> wrote:
> >> > >> > > >
> >> > >> > > >> Hello,
> >> > >> > > >>
> >> > >> > > >> I need your expert advice. I am thinking about running two
> >> > >> > > >> instances of solr that share the same data directory. The
> >> > >> > > >> *reason* being: the indexing instance is constantly rebuilding
> >> > >> > > >> its cache after every commit (we have a big cache) and this
> >> > >> > > >> slows it down. But indexing doesn't need much RAM, only the
> >> > >> > > >> search does (and the server has lots of CPUs)
> >> > >> > > >>
> >> > >> > > >> So, it is like having two solr instances
> >> > >> > > >>
> >> > >> > > >> 1. solr-indexing-master
> >> > >> > > >> 2. solr-read-only-master
> >> > >> > > >>
> >> > >> > > >> In the solrconfig.xml I can disable update components; that
> >> > >> > > >> should be fine. However, I don't know how to 'trigger' index
> >> > >> > > >> re-opening on (2) after the commit happens on (1).
> >> > >> > > >>
> >> > >> > > >> Ideally, the second instance could monitor the disk and re-open
> >> > >> > > >> the index after new files appear there. Do I have to implement
> >> > >> > > >> a custom IndexReaderFactory? Or something else?
> >> > >> > > >>
> >> > >> > > >> Please note: I know about replication; this use case is IMHO
> >> > >> > > >> slightly different - in fact, the write-only-master (1) is also
> >> > >> > > >> a replication master
> >> > >> > > >>
> >> > >> > > >> Googling turned up only this
> >> > >> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912-
> >> > >> > > >> no pointers there.
> >> > >> > > >>
> >> > >> > > >> But if I am approaching the problem wrongly, please don't
> >> > >> > > >> hesitate to 're-educate' me :)
> >> > >> > > >>
> >> > >> > > >> Thanks!
> >> > >> > > >>
> >> > >> > > >>  roman
> >> > >> > > >>
> >> > >> > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
>
