I am using Solr 4.5.1. I have two collections:
    Collection 1 - 2 shards, 3 replicas (Size of Shard 1 - 115 MB, Size of Shard 2 - 55 MB)
    Collection 2 - 2 shards, 3 replicas (Size of Shard 1 - 3.5 GB, Size of Shard 2 - 1 GB)

I have a batch process that performs a full re-index (full refresh) once a
week, into the same index.

Here is some information on how I index:
a) I use SolrJ's bulk add API for indexing:
CloudSolrServer.add(Collection<SolrInputDocument> docs).
b) I have the following autoCommit (hard commit) setting for both my
collections (solrconfig.xml):
    <autoCommit>
        <maxDocs>100000</maxDocs>
        <openSearcher>false</openSearcher>
    </autoCommit>
c) I do a programmatic hard commit at the end of the indexing cycle, with
openSearcher set to "true", so that the documents show up in the search
results.
d) I neither soft commit programmatically nor have any autoSoftCommit
settings during the batch indexing process.
e) When I re-index all my data again (the following week) into the same
index - I don't delete existing docs. Rather, I just re-index into the same
Collection.
f) I am using the default mergeFactor of 10 in my solrconfig.xml:
                <mergeFactor>10</mergeFactor>
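
The flow described in a) through d) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual batch code: IndexClient, indexAll and batchSize are hypothetical names, and the interface stands in for the SolrJ client so the snippet runs without a cluster. In the real process, add would delegate to CloudSolrServer.add(Collection<SolrInputDocument>) and commit to CloudSolrServer.commit(), both of which throw checked exceptions that a wrapper would have to handle.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Sketch of the weekly batch flow: bulk adds in chunks, one hard commit at the end. */
class BatchIndexSketch {

    /**
     * Hypothetical stand-in for the SolrJ client (assumption, not a SolrJ type).
     * A real implementation would wrap CloudSolrServer.
     */
    interface IndexClient<D> {
        void add(Collection<D> docs); // bulk add, no explicit commit
        void commit();                // hard commit that opens a new searcher
    }

    /**
     * Sends docs in batches of batchSize and issues a single commit at the end.
     * No soft commits are issued; intermediate hard commits are left to autoCommit.
     * Returns the number of bulk add calls made.
     */
    static <D> int indexAll(IndexClient<D> client, List<D> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int addCalls = 0;
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the sublist so each batch is an independent collection.
            client.add(new ArrayList<D>(docs.subList(start, end)));
            addCalls++;
        }
        client.commit(); // makes the newly indexed documents searchable
        return addCalls;
    }
}
```

Existing documents are not deleted first in this flow, which matches point e): each run simply adds the full data set again to the same collection.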
                
Here is what I am observing:
1) After a batch indexing cycle, the segment count for each shard / core
is pretty high. The Solr Dashboard reports between 8 and 30 segments on the
various cores.
2) Sometimes the Solr Dashboard shows the status of my core as NOT
OPTIMIZED. I find this unusual: since I have just finished a batch indexing
cycle, I would assume that the index should already be optimized. Is this
happening because I don't delete my docs before re-indexing all my data?
3) After I run an optimize on my collections, the segment count does drop
significantly, to 1 segment.
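
One way to reason about observations 1) to 3) is in terms of per-core index statistics. The sketch below is only an illustration under assumptions: it takes numDocs, maxDoc and segmentCount as plain inputs (hypothetical parameter names for per-core figures), and it assumes that "optimized" means a single segment with no deleted documents, which is my reading rather than a confirmed definition. The relation deleted = maxDoc - numDocs is the usual Lucene one; re-adding a document with an existing uniqueKey overwrites it, leaving the old copy as a deleted document until segments are merged.

```java
/** Sketch: derive deleted-document figures from per-core index statistics. */
final class IndexStatsSketch {

    /** Deleted documents still held in segments: maxDoc minus live documents. */
    static int deletedDocs(int numDocs, int maxDoc) {
        if (numDocs < 0 || maxDoc < numDocs) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc");
        }
        return maxDoc - numDocs;
    }

    /**
     * Assumed definition (not confirmed by the original post): a core counts as
     * optimized when it has at most one segment and holds no deleted documents.
     */
    static boolean looksOptimized(int segmentCount, int numDocs, int maxDoc) {
        return segmentCount <= 1 && deletedDocs(numDocs, maxDoc) == 0;
    }
}
```

Under that assumption, a core that was just re-indexed over existing documents would not look optimized until a merge or an optimize removes the overwritten copies.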

Am I indexing the right way? Is there a better strategy?

Is it necessary to perform an optimize after every batch indexing cycle?

The outcome I am looking for is an optimized index after every major batch
indexing cycle.

Thanks!!