Hi Shalin Shekhar Mangar,

Thanks for your input.
Please see my comments below.

I would like to know whether anyone has used EmbeddedSolrServer for indexing and CommonsHttpSolrServer for searching. I have found that this combination gives better indexing performance. Searching also becomes more flexible, since many HTTP clients can search simultaneously. Does anyone have related performance data?

Thanks,
Ajit

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, March 11, 2009 7:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Combination of EmbeddedSolrServer and CommonHttpSolrServer

On Wed, Mar 11, 2009 at 6:37 PM, Kulkarni, Ajit Kamalakar <ajkulka...@ptc.com> wrote:

> If we index the documents using CommonsHttpSolrServer and search using
> the same, we get the updated results
>
> That means we can search the latest added document as well even if it is
> not committed to the file system

That is not possible. Without calling commit, new documents will not be visible to a searcher.

Ajit: When I tested using CommonsHttpSolrServer for both indexing and searching, I could find the most recently added document through the Solr admin page. I could also find it through CommonsHttpSolrServer without explicitly calling commit.

I am even more surprised to see the same result when using EmbeddedSolrServer for indexing and CommonsHttpSolrServer for searching. I used

    embeddedSolrServer = new EmbeddedSolrServer(SolrCore.getSolrCore());

which is a deprecated API.
In this case I did not need to call commit on CommonsHttpSolrServer for the latest document to be found, either on the Solr admin page or programmatically through CommonsHttpSolrServer.

However, if I use

    CoreContainer multicore = new CoreContainer();
    File home = new File( getSolrHome() );
    File f = new File( home, "solr.xml" );
    multicore.load( getSolrHome(), f );
    embeddedSolrServer = new EmbeddedSolrServer( multicore, SolrIndexConstants.DEFAULT_CORE );

I had to call commit on CommonsHttpSolrServer to find the latest added documents, and a document became visible on the Solr admin page only after I had searched programmatically following a commit on CommonsHttpSolrServer.

This is consistent with what you mentioned above.

> So it looks like there is some kind of cache that is used by both index
> and search logic inside solr for a given SolrServer components (e. g.
> CommonsHttpSolrServer, EmbeddedSolrServer)

Indexing does not create any cache. The caching is done only by the searcher. The old searcher/cache is discarded and a new searcher/cache is created when you call commit. Setting autowarmCount on the caches in solrconfig.xml makes the new searcher re-run some of the queries most recently used on the old searcher, to warm up the new cache.

> Calling commit on the SolrServer to synch with the index data may not be
> good option as I suppose it to be expensive operation.

It is the only option. But you may be able to make the operation cheaper by tweaking the autowarmCount on the caches (this is specified in solrconfig.xml). However, caches are important for good search performance. Depending on your search traffic, you'll need to find a sweet spot.

> The cache and hard disk data synchronization should be independent of
> the SolrServer instances managed by Solr Web Application inside tomcat.

SolrServer is not really a server in itself. It is (a pointer to?) a server being used by a solrj client.
The CommonsHttpSolrServer refers to a remote server URL and makes calls through HTTP. SolrCore is the internal class which manages the state of the server. A SolrCore is created by the Solr webapp. When you create another SolrCore for use by EmbeddedSolrServer, the two do not know about each other. Therefore you need to notify one core if you change the index through the other.

Ajit: If the same JVM manages the searchers that respond for both EmbeddedSolrServer and CommonsHttpSolrServer, why can't they share the same searcher? I understand that the EmbeddedSolrServer and CommonsHttpSolrServer clients are separate, but if the searchers are managed in the same JVM, then in theory we should be able to attach a singleton searcher to every kind of SolrServer. This searcher should be a listener for the indexer. Since searching is a read operation, there won't be any threading or scalability issue, but there should be only one indexer. I don't have enough knowledge of Solr and Lucene, so I may be totally wrong!

> The issue still will be that EmbeddedSolrServer may directly access hard
> index data as it may bypass the Solr web app totally
>
> I am embedding tomcat in my RMI server.
>
> The RMI Server is going to use EmbeddedSolrServer and it also hosts the
> Solr WebApp inside its tomcat instance
>
> So I guess I should be able to manage a singleton cache that is given
> to both, CommonsHttpSolrServer related components managed inside Solr
> WebApp and EmbeddedSolrServer components

Why have two of them at all? Does the Solr deployed inside Tomcat serve HTTP requests from external clients without going through your RMI server? You can simplify things by keeping it either in Tomcat or in embedded mode.

Ajit: The outside HTTP search requests are served by the Solr web app running under the Tomcat embedded in the RMI server. The RMI server is just a host. I have multiple remote Java clients that can search simultaneously. HTTP seems the better approach for searching.
Can you support this kind of searching through embedded mode? I guess embedded mode is for a local client.

Hope that helps.

--
Regards,
Shalin Shekhar Mangar.
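
For reference, the autowarmCount setting discussed above is an attribute on the cache declarations inside the <query> section of solrconfig.xml. The fragment below is only a minimal sketch, assuming the stock solr.LRUCache implementation; the sizes and counts are illustrative values, not settings taken from this thread.

```xml
<query>
  <!-- Cached filter (fq) results. On commit, up to 128 of the most
       recently used entries from the old searcher are regenerated
       for the new one. -->
  <filterCache class="solr.LRUCache"
               size="512" initialSize="512" autowarmCount="128"/>

  <!-- Cached ordered result lists for queries. A smaller
       autowarmCount means less work per commit. -->
  <queryResultCache class="solr.LRUCache"
                    size="512" initialSize="512" autowarmCount="32"/>

  <!-- Cached stored fields. It is keyed by internal Lucene document
       ids, which change between searchers, so it is not autowarmed. -->
  <documentCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
</query>
```

Lowering autowarmCount makes each commit cheaper because the new searcher replays fewer cached entries, at the cost of colder caches right after the commit. That is the trade-off to tune against your search traffic.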