Hi there,
I'm building an index of a few hundred thousand entries. I pull them
from the database in batches of 25k and send them to Solr 100 documents
at a time. I was committing after each of those 100-document requests,
but following Yonik's advice I will remove that and commit only once
after each batch of 25k.
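As a rough sketch of the flow described above (not the actual client
code): each DB batch is split into small add requests and followed by a
single commit. The <add>/<doc>/<field> and <commit/> messages are Solr's
XML update format; the `post` callable, the field names and the helper
names are hypothetical and stand in for whatever HTTP client and schema
are really used.

```python
import xml.etree.ElementTree as ET


def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_add_xml(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def index_batch(rows, post, docs_per_request=100):
    """Send one DB batch in small add requests, then commit once.

    `post` is a hypothetical callable that delivers an XML string to the
    Solr update handler (for example via HTTP POST).
    Returns the number of add requests sent.
    """
    requests_sent = 0
    for group in chunks(rows, docs_per_request):
        post(build_add_xml(group))
        requests_sent += 1
    post("<commit/>")  # single commit for the whole batch
    return requests_sent
```

Passing `sent.append` (with `sent = []`) as `post` is an easy way to
inspect the generated messages without a running server.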
Q1: I currently have autocommit set to 1000 in solrconfig.xml. Should I
disable it in this scenario?
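To put numbers on the options, here is a small back-of-the-envelope
calculation of how many commits one 25k batch would trigger. It assumes
the autocommit value of 1000 is a document count (maxDocs) rather than a
time limit, which the message does not actually say; the function name
is made up for the example.

```python
import math


def commits_per_batch(batch_size, docs_per_commit):
    """Number of commits triggered while adding `batch_size` documents
    when a commit happens every `docs_per_commit` documents."""
    return math.ceil(batch_size / docs_per_commit)


batch = 25_000
print(commits_per_batch(batch, 100))    # commit after every 100-doc request: 250
print(commits_per_batch(batch, 1000))   # autocommit, assuming 1000 means maxDocs: 25
print(commits_per_batch(batch, batch))  # one explicit commit per DB batch: 1
```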
Q2: To decide which of those 25k entries are going to be indexed, we
need to run a query for each one (this is the main reason to optimize
before a new DB batch is indexed). Each of these 25k queries takes
around 30ms, which is good enough for us, but I've observed that roughly
every 30 queries a single search takes 150ms or even 1200ms. Then
another ~30 run at normal speed, and so on. I guess something happens
inside the server at regular intervals that causes it. Any clues as to
what it might be, and how I can minimize that time?
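One way to pin the pattern down is to record the latency of every query
and look at how far apart the slow ones are, so the positions can be
lined up against server-side logs. This is only a measurement sketch:
`run_query` is a hypothetical callable that performs one search, and the
100ms threshold is an arbitrary choice between the ~30ms norm and the
150ms spikes mentioned above.

```python
import time


def time_queries(run_query, queries):
    """Run each query and return its wall-clock latency in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def find_spikes(latencies_ms, threshold_ms=100.0):
    """Return the positions of queries slower than `threshold_ms`."""
    return [i for i, ms in enumerate(latencies_ms) if ms > threshold_ms]


def spike_gaps(spike_positions):
    """Return the number of queries between consecutive spikes."""
    return [b - a for a, b in zip(spike_positions, spike_positions[1:])]
```

If the gaps come out roughly constant, the cause is more likely tied to
the number of requests than to wall-clock time; logging timestamps
alongside the positions would help tell the two apart.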
Q3: The 25k searches run without any cumulative effect on performance
(the average per search is ~30ms from start to end). But if I start
posting documents to the index immediately afterwards, Tomcat's CPU
usage peaks. If instead I stop Tomcat and then post the 25k documents
without doing those searches first, the posts are very quick. Is there
any reason why the searches would affect Tomcat in a way that explains
this? Just to clarify, searches are NOT done at the same time as
indexing.
My Tomcat is running with -server -Xmx512m -Xms512m
Cheers,
galo
Yonik Seeley wrote:
On 4/13/07, James liu <[EMAIL PROTECTED]> wrote:
I find it will be OutOfMemory when I get more than 10k records.
So now I index 10k records (5k / record).
In one request? There's really no reason to put more than hundreds of
documents in a single add request.
If you are indexing using multiple requests, and always run into
problems at 10k records, you are probably hitting memory issues with
Lucene merging. If that's the case, try lowering the mergeFactor so
fewer segments will be merged at the same time.
Some other things to be careful of:
- don't call commit after you add every batch of documents
- don't set maxBufferedDocs too high if you don't have the memory
-Yonik
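
On the mergeFactor and maxBufferedDocs suggestions above, the toy model
below may help build intuition. It is a deliberately simplified picture
of level-based segment merging, not Lucene's actual implementation:
every maxBufferedDocs documents become one small segment, and whenever
mergeFactor segments sit at the same level they are merged into one
segment at the next level. The function and parameter names are
invented for the example.

```python
def simulate_flushes(num_docs, max_buffered_docs, merge_factor):
    """Simplified model of level-based segment merging.

    Every `max_buffered_docs` documents are flushed as a level-0 segment.
    When `merge_factor` segments exist at one level, they are merged into
    a single segment at the next level.
    Returns (segments per level, number of merges performed).
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = []   # levels[i] = number of segments currently at level i
    merges = 0
    for _ in range(num_docs // max_buffered_docs):
        if not levels:
            levels.append(0)
        levels[0] += 1
        level = 0
        # Cascade: a full level collapses into one segment one level up.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            merges += 1
            level += 1
    return levels, merges
```

In this model each merge touches exactly merge_factor segments, so a
lower value means more frequent merges that each combine fewer segments.
Comparing `simulate_flushes(10000, 100, 10)` with
`simulate_flushes(10000, 100, 2)` shows the trade-off.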