Re: Two instances of solr - the same datadir?

Peter Sturge Tue, 02 Jul 2013 14:07:18 -0700

The RO instance commit isn't (or shouldn't be) doing any real writing, just
an empty commit to force new searchers, autowarm/refresh caches etc.
Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in
this area.
As long as you don't have autocommit in solrconfig.xml, there wouldn't be
any commits 'behind the scenes' (we do all our commits via a local solrj
client so it can be fully managed).
The only caveat might be NRT/soft commits, but I'm not too familiar with
this in 4.0.
In any case, your RO instance must be getting updated somehow, otherwise
how would it know your write instance made any changes?
Perhaps your write instance notifies the RO instance externally from Solr?
(a perfectly valid approach, and one that would allow a 'single' lock to
work without contention)
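
For anyone who wants to try the 'empty commit' described above, here is a minimal sketch in Python. It assumes the RO instance still has its update handler enabled (as in our setup; it will not work if the handler is disabled) and that the core answers on Solr's standard /update handler. The host, port and core name are taken from the examples quoted below. The helper names empty_commit_url and trigger_empty_commit are made up for illustration and are not part of any Solr client.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def empty_commit_url(base_url, core):
    """Build the update URL that asks Solr to commit without sending any documents."""
    query = urlencode({"commit": "true", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/update?{query}"


def trigger_empty_commit(base_url, core, timeout=30):
    """Send the empty commit to the RO instance and return the HTTP status code."""
    with urlopen(empty_commit_url(base_url, core), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Assumed values: the read-only instance on localhost:5005, core 'collection1'.
    print(empty_commit_url("http://localhost:5005/solr", "collection1"))
```

The write instance (or an external scheduler) would call trigger_empty_commit after its own commit has finished, so the two commits never overlap.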




On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla <roman.ch...@gmail.com> wrote:

> Interesting, we are running 4.0 - and solr will refuse to start (or
> reload) the core. But from looking at the code I don't see it doing
> any writing - but I should dig more...
>
> Are you sure it needs to do writing? Because I am not calling commits, in
> fact I have deactivated *all* components that write into index, so unless
> there is something deep inside, which automatically calls the commit, it
> should never happen.
>
> roman
>
>
> On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge <peter.stu...@gmail.com>
> wrote:
>
> > Hmmm, single lock sounds dangerous. It probably works ok because you've
> > been [un]lucky.
> > For example, even with a RO instance, you still need to do a commit in
> > order to reload caches/changes from the other instance.
> > What happens if this commit gets called in the middle of the other
> > instance's commit? I've not tested this scenario, but it's very possible
> > with a 'single' lock the results are indeterminate.
> > If the 'single' lock mechanism is making assumptions, e.g. that no other
> > process will interfere, and then one does, the Lucene index could very
> > well get corrupted.
> >
> > For the error you're seeing using 'native', we use native lockType for
> > both write and RO instances, and it works fine - no contention.
> > Which version of Solr are you using? Perhaps there's been a change in
> > behaviour?
> >
> > Peter
> >
> >
> > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
> >
> > > As I discovered, it is not good to use 'native' locktype in this
> > > scenario; actually there is a note in the solrconfig.xml which says the
> > > same.
> > >
> > > When a core is reloaded and solr tries to grab the lock, it will fail -
> > > even if the instance is configured to be read-only, so I am using
> > > 'single' lock for the readers and 'native' for the writer, which seems
> > > to work OK.
> > >
> > > roman
> > >
> > >
> > > On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla <roman.ch...@gmail.com>
> > wrote:
> > >
> > > > I have auto commit after 40k RECs/1800secs. I only tested with
> > > > manual commit, but I don't see why it should work differently.
> > > > Roman
> > > > On 7 Jun 2013 20:52, "Tim Vaillancourt" <t...@elementspace.com>
> wrote:
> > > >
> > > >> If it makes you feel better, I also considered this approach when I
> > was
> > > in
> > > >> the same situation with a separate indexer and searcher on one
> > Physical
> > > >> linux machine.
> > > >>
> > > >> My main concern was "re-using" the FS cache between both instances -
> > If
> > > I
> > > >> replicated to myself there would be two independent copies of the
> > index,
> > > >> FS-cached separately.
> > > >>
> > > >> I like the suggestion of using autoCommit to reload the index. If
> I'm
> > > >> reading that right, you'd set an autoCommit on 'zero docs changing',
> > or
> > > >> just 'every N seconds'? Did that work?
> > > >>
> > > >> Best of luck!
> > > >>
> > > >> Tim
> > > >>
> > > >>
> > > >> On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:
> > > >>
> > > >> > So here it is for a record how I am solving it right now:
> > > >> >
> > > > >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> > > > >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
> > > > >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> > > > >> > -Dmontysolr.write.master=false
> > > >> >
> > > >> >
> > > >> > solrconfig.xml changes:
> > > >> >
> > > >> > 1. all index changing components have this bit,
> > > >> > enable="${montysolr.master:true}" - ie.
> > > >> >
> > > >> > <updateHandler class="solr.DirectUpdateHandler2"
> > > >> >                  enable="${montysolr.master:true}">
> > > >> >
> > > >> > 2. for cache warming de/activation
> > > >> >
> > > >> > <listener event="newSearcher"
> > > >> >       class="solr.QuerySenderListener"
> > > >> >       enable="${montysolr.enable.warming:true}">...
> > > >> >
> > > >> > 3. to trigger refresh of the read-only-master (from write-master):
> > > >> >
> > > >> >     <listener event="postCommit"
> > > >> >       class="solr.RunExecutableListener"
> > > >> >       enable="${montysolr.master:true}">
> > > >> >       <str name="exe">curl</str>
> > > >> >       <str name="dir">.</str>
> > > >> >       <bool name="wait">false</bool>
> > > > >> >       <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
> > > >> >     </listener>
> > > >> >
> > > >> > This works, I still don't like the reload of the whole core, but
> it
> > > >> seems
> > > >> > like the easiest thing to do now.
> > > >> >
> > > >> > -- roman
> > > >> >
> > > >> >
> > > >> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <
> roman.ch...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Peter,
> > > >> > >
> > > >> > > Thank you, I am glad to read that this usecase is not alien.
> > > >> > >
> > > >> > > I'd like to make the second instance (searcher) completely
> > > read-only,
> > > >> so
> > > >> > I
> > > >> > > have disabled all the components that can write.
> > > >> > >
> > > >> > > (being lazy ;)) I'll probably use
> > > >> > > http://wiki.apache.org/solr/CollectionDistribution to call the
> > curl
> > > >> > after
> > > >> > > commit, or write some IndexReaderFactory that checks for changes
> > > >> > >
> > > > >> > > The problem with calling the 'core reload' is that it seems like a
> > > > >> > > lot of work for just opening a new searcher, eeekkk... somewhere I
> > > > >> > > read that it is cheap to reload a core, but re-opening the index
> > > > >> > > searchers must definitely be cheaper...
> > > >> > >
> > > >> > > roman
> > > >> > >
> > > >> > >
> > > >> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <
> > > peter.stu...@gmail.com
> > > >> > >wrote:
> > > >> > >
> > > >> > >> Hi,
> > > >> > >> We use this very same scenario to great effect - 2 instances
> > using
> > > >> the
> > > >> > >> same
> > > >> > >> dataDir with many cores - 1 is a writer (no caching), the other
> > is
> > > a
> > > >> > >> searcher (lots of caching).
> > > >> > >> To get the searcher to see the index changes from the writer,
> you
> > > >> need
> > > >> > the
> > > >> > >> searcher to do an empty commit - i.e. you invoke a commit with
> 0
> > > >> > >> documents.
> > > >> > >> This will refresh the caches (including autowarming), [re]build
> > the
> > > >> > >> relevant searchers etc. and make any index changes visible to
> the
> > > RO
> > > >> > >> instance.
> > > >> > >> Also, make sure to use <lockType>native</lockType> in
> > > solrconfig.xml
> > > >> to
> > > >> > >> ensure the two instances don't try to commit at the same time.
> > > >> > >> There are several ways to trigger a commit:
> > > >> > >> Call commit() periodically within your own code.
> > > >> > >> Use autoCommit in solrconfig.xml.
> > > >> > >> Use an RPC/IPC mechanism between the 2 instance processes to
> tell
> > > the
> > > >> > >> searcher the index has changed, then call commit when called
> > (more
> > > >> > complex
> > > >> > >> coding, but good if the index changes on an ad-hoc basis).
> > > >> > >> Note, doing things this way isn't really suitable for an NRT
> > > >> > environment.
> > > >> > >>
> > > >> > >> HTH,
> > > >> > >> Peter
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <
> > > roman.ch...@gmail.com>
> > > >> > >> wrote:
> > > >> > >>
> > > >> > >> > Replication is fine, I am going to use it, but I wanted it
> for
> > > >> > instances
> > > >> > >> > *distributed* across several (physical) machines - but here I
> > > have
> > > >> one
> > > >> > >> > physical machine, it has many cores. I want to run 2
> instances
> > of
> > > >> solr
> > > >> > >> > because I think it has these benefits:
> > > >> > >> >
> > > >> > >> > 1) I can give less RAM to the writer (4GB), and use more RAM
> > for
> > > >> the
> > > >> > >> > searcher (28GB)
> > > >> > >> > 2) I can deactivate warming for the writer and keep it for
> the
> > > >> > searcher
> > > >> > >> > (this considerably speeds up indexing - each time we commit,
> > the
> > > >> > server
> > > >> > >> is
> > > >> > >> > rebuilding a citation network of 80M edges)
> > > >> > >> > 3) saving disk space and better OS caching (OS should be able
> > to
> > > >> use
> > > >> > >> more
> > > >> > >> > RAM for the caching, which should result in faster
> operations -
> > > the
> > > >> > two
> > > >> > >> > processes are accessing the same index)
> > > >> > >> >
> > > >> > >> > Maybe I should just forget it and go with the replication,
> but
> > it
> > > >> > >> doesn't
> > > >> > >> > 'feel right' IFF it is on the same physical machine. And
> Lucene
> > > >> > >> > specifically has a method for discovering changes and
> > re-opening
> > > >> the
> > > >> > >> index
> > > >> > >> > (DirectoryReader.openIfChanged)
> > > >> > >> >
> > > >> > >> > Am I not seeing something?
> > > >> > >> >
> > > >> > >> > roman
> > > >> > >> >
> > > >> > >> >
> > > >> > >> >
> > > >> > >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> > > >> > >> > jhell...@innoventsolutions.com> wrote:
> > > >> > >> >
> > > >> > >> > > Roman,
> > > >> > >> > >
> > > >> > >> > > Could you be more specific as to why replication doesn't
> meet
> > > >> your
> > > >> > >> > > requirements?  It was geared explicitly for this purpose,
> > > >> including
> > > >> > >> the
> > > >> > >> > > automatic discovery of changes to the data on the index
> > master.
> > > >> > >> > >
> > > >> > >> > > Jason
> > > >> > >> > >
> > > >> > >> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <
> > roman.ch...@gmail.com
> > > >
> > > >> > >> wrote:
> > > >> > >> > >
> > > >> > >> > > > OK, so I have verified the two instances can run
> alongside,
> > > >> > sharing
> > > >> > >> the
> > > >> > >> > > > same datadir
> > > >> > >> > > >
> > > > >> > >> > > > All update handlers are inaccessible in the read-only master
> > > >> > >> > > >
> > > >> > >> > > > <updateHandler class="solr.DirectUpdateHandler2"
> > > >> > >> > > >                 enable="${solr.can.write:true}">
> > > >> > >> > > >
> > > >> > >> > > > java -Dsolr.can.write=false .....
> > > >> > >> > > >
> > > >> > >> > > > And I can reload the index manually:
> > > >> > >> > > >
> > > > >> > >> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> > > >> > >> > > >
> > > >> > >> > > > But this is not an ideal solution; I'd like for the
> > read-only
> > > >> > >> server to
> > > >> > >> > > > discover index changes on its own. Any pointers?
> > > >> > >> > > >
> > > >> > >> > > > Thanks,
> > > >> > >> > > >
> > > >> > >> > > >  roman
> > > >> > >> > > >
> > > >> > >> > > >
> > > >> > >> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <
> > > >> > roman.ch...@gmail.com>
> > > >> > >> > > wrote:
> > > >> > >> > > >
> > > >> > >> > > >> Hello,
> > > >> > >> > > >>
> > > >> > >> > > >> I need your expert advice. I am thinking about running
> two
> > > >> > >> instances
> > > >> > >> > of
> > > >> > >> > > >> solr that share the same datadirectory. The *reason*
> > being:
> > > >> > >> indexing
> > > >> > >> > > >> instance is constantly building cache after every commit
> > (we
> > > >> > have a
> > > >> > >> > big
> > > >> > >> > > >> cache) and this slows it down. But indexing doesn't need
> > > much
> > > >> > RAM,
> > > >> > >> > only
> > > >> > >> > > the
> > > >> > >> > > >> search does (and server has lots of CPUs)
> > > >> > >> > > >>
> > > >> > >> > > >> So, it is like having two solr instances
> > > >> > >> > > >>
> > > >> > >> > > >> 1. solr-indexing-master
> > > >> > >> > > >> 2. solr-read-only-master
> > > >> > >> > > >>
> > > >> > >> > > >> In the solrconfig.xml I can disable update components,
> It
> > > >> should
> > > >> > be
> > > >> > >> > > fine.
> > > >> > >> > > >> However, I don't know how to 'trigger' index re-opening
> on
> > > (2)
> > > >> > >> after
> > > >> > >> > the
> > > >> > >> > > >> commit happens on (1).
> > > >> > >> > > >>
> > > >> > >> > > >> Ideally, the second instance could monitor the disk and
> > > >> re-open
> > > >> > >> disk
> > > >> > >> > > after
> > > >> > >> > > >> new files appear there. Do I have to implement custom
> > > >> > >> > > IndexReaderFactory?
> > > >> > >> > > >> Or something else?
> > > >> > >> > > >>
> > > >> > >> > > >> Please note: I know about the replication, this usecase
> is
> > > >> IMHO
> > > >> > >> > slightly
> > > >> > >> > > >> different - in fact, write-only-master (1) is also a
> > > >> replication
> > > >> > >> > master
> > > >> > >> > > >>
> > > > >> > >> > > >> Googling turned up only this:
> > > > >> > >> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912
> > > > >> > >> > > >> - no pointers there.
> > > >> > >> > > >>
> > > >> > >> > > >> But If I am approaching the problem wrongly, please
> don't
> > > >> > hesitate
> > > >> > >> to
> > > >> > >> > > >> 're-educate' me :)
> > > >> > >> > > >>
> > > >> > >> > > >> Thanks!
> > > >> > >> > > >>
> > > >> > >> > > >>  roman
> > > >> > >> > > >>
> > > >> > >> > >
> > > >> > >> > >
> > > >> > >> >
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
