OK, then let’s see the indexing code. Make sure you don’t:
1> commit after every batch
2> optimize. Never, never, never optimize.
BTW, you do not want to turn off commits entirely; there are some internal data
structures that grow between commits. So I might do something like specify
commitWithin on my add.
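For example, roughly like this in SolrJ (the collection URL, batch size, and the 30-second window are just placeholders I made up):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            batch.add(doc);
        }
        // No explicit commit per batch; ask Solr to commit within 30 seconds instead.
        client.add(batch, 30_000);
        client.close();
    }
}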
I have tested the query separately; actually executing the query is pretty fast,
it only took a few minutes to go through all results, including converting the Solr
documents to Java objects. So I believe the slowness is on the persistence end. BTW,
I am using a Linux system.
On 6/30/2019 2:08 PM, derrick cui wrote:
Good point Erick, I will try it today, but I already use cursorMark in my
query for deep pagination.
Also, I noticed that my CPU usage is pretty high: 8 cores, usage is over 700%. I
am not sure whether it will help if I use an SSD disk.
That depends on whether the bottleneck is I/O or CPU.
On Sunday, June 30, 2019, 2:57, Erick wrote:
Well, the first thing I’d do is see what’s taking the time, querying or
updating? Should be easy enough to comment out whatever it is that sends docs
to Solr.
If it’s querying, it sounds like you’re paging through your entire data set and
may be hitting the “deep paging” problem. Use cursorMark.
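If you aren’t already, the SolrJ loop looks roughly like this (the collection URL and sorting on the uniqueKey "id" are assumptions on my part):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorMarkSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        SolrQuery q = new SolrQuery("*:*");
        q.setRows(1000);
        q.setFields("id");                      // return ids only
        q.addSort("id", SolrQuery.ORDER.asc);   // cursorMark requires a sort on the uniqueKey
        String cursorMark = CursorMarkParams.CURSOR_MARK_START;
        boolean done = false;
        while (!done) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
            QueryResponse rsp = client.query(q);
            // process rsp.getResults() here
            String next = rsp.getNextCursorMark();
            done = cursorMark.equals(next);
            cursorMark = next;
        }
        client.close();
    }
}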
Only thing I can think of is to check whether you can do in-place
rather than atomic updates:
https://lucene.apache.org/solr/guide/8_1/updating-parts-of-documents.html#in-place-updates
But the conditions are quite restrictive: a non-indexed (indexed="false"),
non-stored (stored="false"), single-valued, numeric docValues field.
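When a field does meet those conditions, the request itself is the usual atomic-style "set" and Solr applies it in-place. A minimal SolrJ sketch, assuming a hypothetical popularity_dv field defined with indexed="false", stored="false", docValues="true" (the field name and URL are made up):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class InPlaceUpdateSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client =
            new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-42");
        // "set" on a single-valued, non-indexed, non-stored numeric docValues field
        // can be applied in-place, without rewriting the whole document.
        doc.addField("popularity_dv", Collections.singletonMap("set", 10));
        client.add(doc, 30_000);
        client.close();
    }
}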
Thanks Alex,
My usage is that:
1. I execute a query and get the result, returning ids only
2. Add a value to a dynamic field
3. Save to Solr with batch size 1000
I have defined 50 queries and run them in parallel. Also, I disabled hard commits and
soft-commit every 1000 docs.
I am wondering whether any configuration change would help.
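Roughly, the update step (2 and 3 above) looks like this; I changed the field and collection names, so treat it as a sketch rather than the real code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TagByQuerySketch {
    // Send atomic "set" updates for a dynamic field, 1000 docs per batch.
    // Commit handling is left out here; it is configured separately.
    static void tag(HttpSolrClient client, List<String> ids, String value) throws Exception {
        List<SolrInputDocument> batch = new ArrayList<>();
        for (String id : ids) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("tag_s", Collections.singletonMap("set", value));
            batch.add(doc);
            if (batch.size() == 1000) {
                client.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);
        }
    }
}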
Indexing new documents is just adding additional segments.
Adding a new field to a document means:
1) Reading existing document (may not always be possible, depending on
field configuration)
2) Marking existing document as deleted
3) Creating a new document with the reconstructed plus the new fields
4) Possibly triggering segment merges
I have 400k documents. Indexing is pretty fast, it only takes 10 minutes, but adding a
dynamic field to all documents according to query results is very slow, taking
about 1.5 hours.
Does anyone know what the reason could be?
Thanks