RE: Bulk Indexing

Zhang, Lisheng Fri, 27 Jul 2012 11:57:13 -0700

Hi,

Previously I asked a similar question, and I have not fully implemented my plan yet.


My plan is:
1) use Solr only for search, not for indexing
2) have a separate Java process do the indexing (calling the Lucene API
   directly; maybe it can call the Solr API, I need to check the details).

As other people pointed out earlier, the problem with the above plan is that
Solr does not know when to reload its IndexSearcher (namely the underlying
IndexReader) after indexing is done, since the indexer and Solr are two
separate processes.

My plan is to have Solr not cache any IndexReader (each time a search is
performed, just create a new IndexSearcher), because:

1) our app is made up of many Lucene index folders (in Solr terms, many
   cores), so caching an IndexSearcher for each would be too expensive.
2) in my experience, search is still quite fast without caching (this may
   be partly because our indexed data is not large per folder).

This is just my plan (not fully implemented yet).
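
The open-per-search idea above can be sketched roughly as follows. This is
only an illustration under stated assumptions, not Solr's or Lucene's own
mechanism: PerSearchOpener is a hypothetical helper, and the opener stands in
for whatever opens a reader over one index folder. It shows the shape of the
plan: open a fresh reader for each search, run the query, close the reader,
and keep nothing cached between calls.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper (not part of Solr or Lucene): opens a fresh resource
// for every search and closes it afterwards, so nothing is cached.
final class PerSearchOpener<R extends AutoCloseable> {
    private final Supplier<R> opener;

    PerSearchOpener(Supplier<R> opener) {
        this.opener = opener;
    }

    // Open a new reader, run the query against it, then close the reader.
    <T> T search(Function<R, T> query) {
        try (R reader = opener.get()) {
            return query.apply(reader);
        } catch (RuntimeException e) {
            throw e; // pass query failures through unchanged
        } catch (Exception e) {
            // close() on an AutoCloseable may throw a checked exception
            throw new IllegalStateException("search failed", e);
        }
    }
}
```

Because every call opens its own reader, a search always sees whatever the
separate indexing process has committed so far, at the price of paying the
open cost on each search.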

Best regards, Lisheng

-----Original Message-----
From: Sohail Aboobaker [mailto:sabooba...@gmail.com]
Sent: Friday, July 27, 2012 6:56 AM
To: solr-user@lucene.apache.org
Subject: Bulk Indexing


Hi,

We have created a search service which is responsible for providing an
interface between Solr and the rest of our application. It basically takes one
document at a time and updates or adds it to the appropriate index.

Now, in the application, we have processes that add products (our documents
are based on products) in bulk using a bulk data load process. At this point,
we use the same search service to add the documents in a loop. There can be
up to 20,000 documents in one load.
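
For contrast with the per-document loop described above, here is a minimal
sketch of grouping a load into batches so that each call carries many
documents. It assumes nothing about the actual search service: BatchSink and
BulkBatcher are hypothetical names, and the sink stands in for whatever sends
one batch to the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink: stands in for whatever sends one batch to the index.
interface BatchSink<T> {
    void send(List<T> batch);
}

final class BulkBatcher {
    private BulkBatcher() {}

    // Split items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Send all items batch by batch; returns the number of send calls made.
    static <T> int sendInBatches(List<T> items, int batchSize, BatchSink<T> sink) {
        int calls = 0;
        for (List<T> batch : partition(items, batchSize)) {
            sink.send(batch);
            calls++;
        }
        return calls;
    }
}
```

With a batch size of 500, a load of 20,000 documents results in 40 calls to
the sink instead of 20,000 single-document calls.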

In a recent solr-user discussion, it seemed that this is a no-no strategy
with red flags all around it.

What are the alternatives?

Thanks,

Regards,
Sohail Aboobaker.
