RE: Combination of EmbeddedSolrServer and CommonHttpSolrServer

Kulkarni, Ajit Kamalakar Thu, 12 Mar 2009 07:00:29 -0700

Hi Shalin Shekhar Mangar,

Thanks for your inputs.

Please see my comments below.

I would like to know whether anyone has used EmbeddedSolrServer for
indexing and CommonsHttpSolrServer for search.

I have found that this combination offers better indexing performance.
Searching also becomes more flexible, since many HTTP clients can search
simultaneously.

Does anyone have any related performance data?

Thanks,

Ajit

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Wednesday, March 11, 2009 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar <ajkulka...@ptc.com> wrote:

> If we index the documents using CommonsHttpSolrServer and search using
> the same, we get the updated results
>
> That means we can search the latest added document as well even if it is
> not committed to the file system

That is not possible. Without calling commit, new documents will not be
visible to a searcher.

Ajit: When I tested using CommonsHttpSolrServer for indexing as well as
searching, I could find the most recently added document through the
Solr admin page.

I could also find the document through CommonsHttpSolrServer without
explicitly calling commit.

I was even more surprised to see the same result when using
EmbeddedSolrServer for indexing and CommonsHttpSolrServer for searching.

I used embeddedSolrServer = new
EmbeddedSolrServer(SolrCore.getSolrCore()); which is a deprecated API.

With this, I did not need to call commit on CommonsHttpSolrServer for
the latest document to be found, either on the Solr admin page or
programmatically through CommonsHttpSolrServer.

However, if I use

      CoreContainer multicore = new CoreContainer();
      File home = new File( getSolrHome() );
      File f = new File( home, "solr.xml" );
      multicore.load( getSolrHome(), f );
      embeddedSolrServer = new EmbeddedSolrServer( multicore,
          SolrIndexConstants.DEFAULT_CORE );

I had to call commit on CommonsHttpSolrServer to search the most
recently added documents, and a document was available through the Solr
admin page only after I had called commit on CommonsHttpSolrServer and
searched programmatically.

This is consistent with what you mentioned above.

> So it looks like there is some kind of cache that is used by both index
> and search logic inside solr for a given SolrServer components (e. g.
> CommonsHttpSolrServer, EmbeddedSolrServer)

Indexing does not create any cache. The caching is done only by the
searcher. The old searcher/cache is discarded and a new searcher/cache is
created when you call commit. Setting autowarmCount on the caches in
solrconfig.xml makes the new searcher run some of the most recently used
queries on the old searcher to warm up the new cache.
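
The commit behaviour described above can be sketched with a small toy
model. This is not Solr code: ToyCore and its methods are hypothetical
names invented for the illustration, and it assumes only that a searcher
is a frozen snapshot of the index that gets replaced on commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical, not Solr code): a searcher is a frozen snapshot of the index. */
final class ToyCore {
    private final List<String> pending = new ArrayList<>();     // added, not yet committed
    private final List<String> committed = new ArrayList<>();   // committed index contents
    private List<String> searcherView = Collections.emptyList(); // what the current searcher sees

    /** Adding a document does not touch the current searcher. */
    void add(String doc) {
        pending.add(doc);
    }

    /** Flush pending documents and open a new searcher over the result. */
    void commit() {
        committed.addAll(pending);
        pending.clear();
        searcherView = new ArrayList<>(committed); // the old searcher is discarded
    }

    /** Searches only ever consult the current searcher's snapshot. */
    boolean search(String doc) {
        return searcherView.contains(doc);
    }

    public static void main(String[] args) {
        ToyCore core = new ToyCore();
        core.add("doc1");
        System.out.println(core.search("doc1")); // false: no commit yet
        core.commit();
        System.out.println(core.search("doc1")); // true: the new searcher sees it
    }
}
```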

> Calling commit on the SolrServer to synch with the index data may not be
> good option as I suppose it to be expensive operation.

It is the only option. But you may be able to make the operation cheaper by
tweaking the autowarmCount on the caches (this is specified in
solrconfig.xml). However, caches are important for good search performance.
Depending on your search traffic, you'll need to find a sweet spot.
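
The trade-off can be illustrated with a toy cache. This is a sketch
under stated assumptions, not Solr's implementation: WarmingCache, its
fields and the "@gen" result format are hypothetical, and the warming
step simply re-runs the most recently used queries when a commit opens a
new cache. A larger autowarmCount makes commit do more work but leaves
fewer cache misses afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Solr code) of a query cache that is rebuilt on commit. */
final class WarmingCache {
    private final int autowarmCount;
    // Access-ordered map: iteration runs from least to most recently used.
    private Map<String, String> cache = new LinkedHashMap<>(16, 0.75f, true);
    private int generation = 0; // bumped on every commit
    int computations = 0;       // number of "expensive" query executions so far

    WarmingCache(int autowarmCount) {
        this.autowarmCount = autowarmCount;
    }

    private String execute(String query) {
        computations++;
        return query + "@gen" + generation;
    }

    /** Serve from the cache when possible, otherwise execute and remember the result. */
    String search(String query) {
        String hit = cache.get(query);
        if (hit == null) {
            hit = execute(query);
            cache.put(query, hit);
        }
        return hit;
    }

    /** New searcher: new cache, pre-filled by re-running the most recently used queries. */
    void commit() {
        List<String> keys = new ArrayList<>(cache.keySet());
        int from = Math.max(0, keys.size() - autowarmCount);
        generation++;
        Map<String, String> fresh = new LinkedHashMap<>(16, 0.75f, true);
        for (String q : keys.subList(from, keys.size())) {
            fresh.put(q, execute(q)); // the warming cost is paid here, at commit time
        }
        cache = fresh;
    }

    public static void main(String[] args) {
        WarmingCache c = new WarmingCache(1);
        c.search("a");
        c.search("b");
        c.commit();                         // warms only "b"
        System.out.println(c.search("b"));  // b@gen1, served from the warmed cache
        System.out.println(c.computations); // 3
    }
}
```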

> The cache and hard disk data synchronization should be independent of
> the SolrServer instances managed by Solr Web Application inside tomcat.

SolrServer is not really a server in itself. It is (a pointer to?) a server
being used by a solrj client. The CommonsHttpSolrServer refers to a remote
server url and makes calls through HTTP. SolrCore is the internal class
which manages the state of the server.

A SolrCore is created by the solr webapp. When you create another SolrCore
for use by EmbeddedSolrServer, they do not know about each other. Therefore
you need to notify it if you change the index through another core.
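
A toy model of two cores over one index may make this concrete. It is
not Solr code: SharedIndexDemo, Index and Core are hypothetical names,
and the sketch assumes only that each core holds its own searcher
snapshot, which it refreshes when a commit is called on that core.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Solr code): two cores with independent searchers over one index. */
final class SharedIndexDemo {
    /** Stands in for the index directory on disk. */
    static final class Index {
        final List<String> docs = new ArrayList<>();
    }

    static final class Core {
        private final Index index;
        private List<String> searcherView;

        Core(Index index) {
            this.index = index;
            reopen();
        }

        /** Write to the shared index and refresh this core's own searcher only. */
        void addAndCommit(String doc) {
            index.docs.add(doc);
            reopen();
        }

        /** For searching purposes, a commit on this core opens a new searcher. */
        void commit() {
            reopen();
        }

        boolean search(String doc) {
            return searcherView.contains(doc);
        }

        private void reopen() {
            searcherView = new ArrayList<>(index.docs);
        }
    }

    public static void main(String[] args) {
        Index index = new Index();
        Core embedded = new Core(index); // used for indexing
        Core webapp = new Core(index);   // used for HTTP searches
        embedded.addAndCommit("doc1");
        System.out.println(webapp.search("doc1")); // false: webapp still has its old searcher
        webapp.commit();                           // the "notification"
        System.out.println(webapp.search("doc1")); // true
    }
}
```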

Ajit: If the same JVM manages the responding searchers for both
EmbeddedSolrServer and CommonsHttpSolrServer, then why can't the
responding searcher be the same? I understand that the EmbeddedSolrServer
and CommonsHttpSolrServer clients are separate, but if the searchers are
managed in the same JVM, we should theoretically be able to attach a
singleton searcher to every kind of SolrServer. This searcher should be a
listener for the indexer.

Since searching is a read operation, there won't be any threading or
scalability issue, but there should be only one indexer.

Since I don't have enough knowledge of Solr and Lucene, I may be
totally wrong!

> The issue still will be that EmbeddedSolrServer may directly access hard
> index data as it may bypass the Solr web app totally
>
> I am embedding tomcat in my RMI server.
>
> The RMI Server is going to use EmbeddedSolrServer and it also hosts the
> Solr WebApp inside its tomcat instance
>
> So I guess I should be able to manage a singleton cache that is given
> to both, CommonsHttpSolrServer related components managed inside Solr
> WebApp and EmbeddedSolrServer components

Why have two of them at all? Does the Solr deployed inside tomcat serve
HTTP requests from external clients without going through your RMI server?
You can simplify things by keeping it either in tomcat or in embedded mode.

Ajit: The outside HTTP search requests are served by the Solr web app
running under tomcat embedded in the RMI server.

The RMI server is just a host.

I have multiple remote Java clients that can search simultaneously. HTTP
seems a better approach for searching.

Can you support this kind of searching through embedded mode? I guess
embedded mode is for a local client.

Hope that helps.

-- 

Regards,

Shalin Shekhar Mangar.
