Re: Two instances of solr - the same datadir?

Tim Vaillancourt Fri, 07 Jun 2013 17:52:42 -0700

If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one physical
Linux machine.


My main concern was "re-using" the FS cache between both instances: if I
replicated to myself, there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?
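
A minimal sketch of the 'every N seconds' variant, in Python and not tested
against a live Solr. It assumes Lucene's usual convention that each commit
writes a new segments_N file (N in base 36) into the index directory, and it
reuses the CoreAdmin RELOAD URL quoted further down in this thread. The
function names, the polling interval and the index path are placeholders of
mine, not anything from Roman's setup.

```python
import re
import time
import urllib.parse
import urllib.request
from pathlib import Path

# Lucene commit points are files named segments_<generation in base 36>.
SEGMENTS_RE = re.compile(r"^segments_([0-9a-z]+)$")


def current_generation(index_dir):
    """Return the highest segments_N generation in index_dir, or -1 if none."""
    best = -1
    for entry in Path(index_dir).iterdir():
        match = SEGMENTS_RE.match(entry.name)
        if match:
            best = max(best, int(match.group(1), 36))
    return best


def reload_url(base_url, core):
    """Build the CoreAdmin RELOAD URL used elsewhere in this thread."""
    query = urllib.parse.urlencode(
        {"wt": "json", "action": "RELOAD", "core": core}
    )
    return f"{base_url}/solr/admin/cores?{query}"


def watch(index_dir, base_url, core, interval_seconds=10):
    """Poll the index directory and reload the core when a new commit appears."""
    seen = current_generation(index_dir)
    while True:
        time.sleep(interval_seconds)
        latest = current_generation(index_dir)
        if latest != seen:
            # Only hit the searcher when the writer has actually committed.
            urllib.request.urlopen(reload_url(base_url, core)).close()
            seen = latest
```

Something like watch("/path/to/data/index", "http://localhost:5005",
"collection1") would run on the searcher side. It avoids reloading when
nothing changed, but it is still a full core reload rather than a cheap
reader re-open.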

Best of luck!

Tim


On 5 June 2013 10:19, Roman Chyla <roman.ch...@gmail.com> wrote:

> So, for the record, here is how I am solving it right now:
>
> Write-master is started with: -Dmontysolr.warming.enabled=false
> -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
> Read-master is started with: -Dmontysolr.warming.enabled=true
> -Dmontysolr.write.master=false
>
>
> solrconfig.xml changes:
>
> 1. all index-changing components have this bit,
> enable="${montysolr.master:true}", i.e.
>
> <updateHandler class="solr.DirectUpdateHandler2"
>                  enable="${montysolr.master:true}">
>
> 2. for cache warming de/activation
>
> <listener event="newSearcher"
>       class="solr.QuerySenderListener"
>       enable="${montysolr.enable.warming:true}">...
>
> 3. to trigger refresh of the read-only-master (from write-master):
>
>     <listener event="postCommit"
>       class="solr.RunExecutableListener"
>       enable="${montysolr.master:true}">
>       <str name="exe">curl</str>
>       <str name="dir">.</str>
>       <bool name="wait">false</bool>
>       <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
>     </listener>
>
> This works. I still don't like reloading the whole core, but it seems
> like the easiest thing to do for now.
>
> -- roman
>
>
> On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla <roman.ch...@gmail.com>
> wrote:
>
> > Hi Peter,
> >
> > Thank you, I am glad to read that this use case is not alien.
> >
> > I'd like to make the second instance (searcher) completely read-only, so I
> > have disabled all the components that can write.
> >
> > (being lazy ;)) I'll probably use
> > http://wiki.apache.org/solr/CollectionDistribution to call the curl after
> > commit, or write some IndexReaderFactory that checks for changes
> >
> > The problem with calling the 'core reload' is that it seems like a lot of
> > work for just opening a new searcher, eeekkk...somewhere I read that it is
> > cheap to reload a core, but re-opening the index searchers must definitely
> > be cheaper...
> >
> > roman
> >
> >
> > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <peter.stu...@gmail.com>
> > wrote:
> >
> >> Hi,
> >> We use this very same scenario to great effect - 2 instances using the
> >> same dataDir with many cores - 1 is a writer (no caching), the other is a
> >> searcher (lots of caching).
> >> To get the searcher to see the index changes from the writer, you need the
> >> searcher to do an empty commit - i.e. you invoke a commit with 0
> >> documents.
> >> This will refresh the caches (including autowarming), [re]build the
> >> relevant searchers etc. and make any index changes visible to the RO
> >> instance.
> >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> >> ensure the two instances don't try to commit at the same time.
> >> There are several ways to trigger a commit:
> >> 1. Call commit() periodically within your own code.
> >> 2. Use autoCommit in solrconfig.xml.
> >> 3. Use an RPC/IPC mechanism between the 2 instance processes to tell the
> >>    searcher the index has changed, then call commit when called (more
> >>    complex coding, but good if the index changes on an ad-hoc basis).
> >> Note, doing things this way isn't really suitable for an NRT environment.
> >>
> >> HTH,
> >> Peter
> >>
> >>
> >>
> >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla <roman.ch...@gmail.com>
> >> wrote:
> >>
> >> > Replication is fine, I am going to use it, but I wanted it for instances
> >> > *distributed* across several (physical) machines - but here I have one
> >> > physical machine, it has many cores. I want to run 2 instances of solr
> >> > because I think it has these benefits:
> >> >
> >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> >> > searcher (28GB)
> >> > 2) I can deactivate warming for the writer and keep it for the searcher
> >> > (this considerably speeds up indexing - each time we commit, the server is
> >> > rebuilding a citation network of 80M edges)
> >> > 3) saving disk space and better OS caching (OS should be able to use more
> >> > RAM for the caching, which should result in faster operations - the two
> >> > processes are accessing the same index)
> >> >
> >> > Maybe I should just forget it and go with the replication, but it doesn't
> >> > 'feel right' IFF it is on the same physical machine. And Lucene
> >> > specifically has a method for discovering changes and re-opening the index
> >> > (DirectoryReader.openIfChanged)
> >> >
> >> > Am I not seeing something?
> >> >
> >> > roman
> >> >
> >> >
> >> >
> >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> >> > jhell...@innoventsolutions.com> wrote:
> >> >
> >> > > Roman,
> >> > >
> >> > > Could you be more specific as to why replication doesn't meet your
> >> > > requirements?  It was geared explicitly for this purpose, including the
> >> > > automatic discovery of changes to the data on the index master.
> >> > >
> >> > > Jason
> >> > >
> >> > > On Jun 4, 2013, at 1:50 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> >> > >
> >> > > > OK, so I have verified the two instances can run alongside each other,
> >> > > > sharing the same datadir
> >> > > >
> >> > > > All update handlers are inaccessible in the read-only master
> >> > > >
> >> > > > <updateHandler class="solr.DirectUpdateHandler2"
> >> > > >                 enable="${solr.can.write:true}">
> >> > > >
> >> > > > java -Dsolr.can.write=false .....
> >> > > >
> >> > > > And I can reload the index manually:
> >> > > >
> >> > > > curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> >> > > >
> >> > > > But this is not an ideal solution; I'd like for the read-only server to
> >> > > > discover index changes on its own. Any pointers?
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > >  roman
> >> > > >
> >> > > >
> >> > > > On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla <roman.ch...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > >> Hello,
> >> > > >>
> >> > > >> I need your expert advice. I am thinking about running two instances of
> >> > > >> solr that share the same data directory. The *reason* being: the indexing
> >> > > >> instance is constantly building cache after every commit (we have a big
> >> > > >> cache) and this slows it down. But indexing doesn't need much RAM, only
> >> > > >> the search does (and the server has lots of CPUs)
> >> > > >>
> >> > > >> So, it is like having two solr instances
> >> > > >>
> >> > > >> 1. solr-indexing-master
> >> > > >> 2. solr-read-only-master
> >> > > >>
> >> > > >> In the solrconfig.xml I can disable update components. It should be fine.
> >> > > >> However, I don't know how to 'trigger' index re-opening on (2) after the
> >> > > >> commit happens on (1).
> >> > > >>
> >> > > >> Ideally, the second instance could monitor the disk and re-open the index
> >> > > >> after new files appear there. Do I have to implement a custom
> >> > > >> IndexReaderFactory? Or something else?
> >> > > >>
> >> > > >> Please note: I know about the replication, this use case is IMHO slightly
> >> > > >> different - in fact, write-only-master (1) is also a replication master
> >> > > >>
> >> > > >> Googling turned up only this
> >> > > >> http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no
> >> > > >> pointers there.
> >> > > >>
> >> > > >> But if I am approaching the problem wrongly, please don't hesitate to
> >> > > >> 're-educate' me :)
> >> > > >>
> >> > > >> Thanks!
> >> > > >>
> >> > > >>  roman
> >> > > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
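
One thing worth double-checking in the quoted config: the startup flags set
montysolr.warming.enabled and montysolr.write.master, while the XML reads
montysolr.enable.warming and montysolr.master. Solr uses the default after
the colon when a property is not set, so if those names are literal (they may
simply be typos in the mail), both instances would end up with the defaults.
The Python below is only a toy model of that substitution rule, not Solr's
implementation; substitute() is a made-up helper for illustration.

```python
import re

# Matches ${name} and ${name:default}; the default may itself contain colons.
PLACEHOLDER_RE = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text, props):
    """Replace ${name:default} placeholders using props, else the default."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(name)
    return PLACEHOLDER_RE.sub(repl, text)


# Flags as quoted for the write-master.
props = {
    "montysolr.warming.enabled": "false",
    "montysolr.write.master": "true",
}

# The name read by the XML is not among the flags, so the default wins.
print(substitute("${montysolr.enable.warming:true}", props))
# The name actually passed on the command line resolves as intended.
print(substitute("${montysolr.warming.enabled:true}", props))
```

If the names do differ in the real files, aligning them on one spelling on
both sides would be the simplest fix.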
