> Maybe you need to give more details. I recommend always to try and
> test yourself as you know your own solution best. What performance
> does your use case need and what is your current performance?

I have 10 collections on 4 shards (no replication). The collections are
quite large, ranging from 2 GB to 60 GB per shard. In every case, the
update process only adds several values to an indexed array field on a
subset of the documents of each collection. The proportion of the
subset ranges from 0 to 100%, and is below 20% 95% of the time. The
array field is 1 of the 20 fields, which are mainly unstored fields
with some large textual fields.
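
For concreteness, the updates I send look roughly like the following
atomic-update payload POSTed to the collection's /update handler (a
minimal sketch; "my_array_field" and the values are made-up names):

  [{"id": "doc-1",
    "my_array_field": {"add": ["new-value-1", "new-value-2"]}}]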

The 4 Solr instances are collocated with the Spark workers. Right now I
am testing with 40 Spark executors. The commit time interval and the
commit max-document count are both set to 20000. Each shard has 20 GB
of memory.
Loading/replacing the largest collection takes about 2 hours, which is
quite fast I guess. Updating 5% of the documents of each collection
takes about half an hour.
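
In case it matters, I understand those commit settings to correspond to
something like this in solrconfig.xml (a sketch; maxTime is in
milliseconds, and openSearcher=false is just the common bulk-indexing
setting, not necessarily what I have):

  <autoCommit>
    <maxDocs>20000</maxDocs>
    <maxTime>20000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>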

Because my need is "only" to append several values to an array, I
suspect there is some trick to make things faster.
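
To make the idea concrete, here is a rough sketch of how I imagine
sending such atomic updates from Spark with plain SolrJ (the zkhost,
collection and field names are placeholders, and I have not benchmarked
this against the full-document rewrite):

  import java.util.Optional
  import org.apache.solr.client.solrj.impl.CloudSolrClient
  import org.apache.solr.common.SolrInputDocument
  import org.apache.spark.sql.Row
  import scala.collection.JavaConverters._

  df.select("id", "new_values").foreachPartition { rows: Iterator[Row] =>
    // one client per partition, sending atomic "add" updates in a batch
    val client = new CloudSolrClient.Builder(
      List("zkhost:2181").asJava, Optional.empty[String]()).build()
    val batch = rows.map { row =>
      val doc = new SolrInputDocument()
      doc.addField("id", row.getString(0))
      // {"add": [...]} appends the values to the multi-valued field
      doc.addField("my_array_field",
        Map("add" -> row.getList[String](1)).asJava)
      doc
    }.toSeq
    if (batch.nonEmpty)
      client.add("my_collection", batch.asJava, /* commitWithinMs */ 20000)
    client.close()
  }

(If spark-solr already exposes atomic updates directly, that would of
course be preferable to hand-rolling SolrJ like this.)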



On Sat, Oct 19, 2019 at 10:10:36PM +0200, Jörn Franke wrote:
> Maybe you need to give more details. I recommend always to try and test
> yourself as you know your own solution best. Depending on your Spark
> process, atomic updates could be faster.
> 
> With Spark-Solr comes additional complexity. You could have too many
> executors for your Solr instance(s), i.e. too high a parallelism.
> 
> Probably the most important question is:
> What performance does your use case need and what is your current performance?
> 
> Once this is clear, further architecture aspects can be derived, such as the 
> number of Spark executors, number of Solr instances, sharding, replication, 
> commit timing, etc.
> 
> > > On 19.10.2019 at 21:52, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > 
> > Hi community,
> > 
> > Any advice to speed up updates?
> > Is there any advice on commits, memory, docValues, stored fields, or any
> > tips to make things faster?
> > 
> > Thanks
> > 
> > 
> >> On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> >> Hi
> >> 
> >> I am looking for a way to speed up the update of documents.
> >> 
> >> In my context, the update replaces one of the many existing indexed
> >> fields, and keeps the others as-is.
> >> 
> >> Right now, I am building the whole document, and replacing the existing
> >> one by id.
> >> 
> >> I am wondering if the **atomic update feature** would speed up the process.
> >> 
> >> On the one hand, using this feature would save network bandwidth because
> >> only a small subset of the document would be sent from the client to the
> >> server.
> >> On the other hand, the server would have to collect the values from the
> >> disk and reindex them. In addition, this implies storing the values for
> >> every field (I am not storing every field) and using more space.
> >> 
> >> Also, I have read that the ConcurrentUpdateSolrServer class might be an
> >> optimized way of updating documents.
> >> 
> >> I am using the spark-solr library to deal with SolrCloud. If something
> >> exists to speed up the process, I would be glad to implement it in that
> >> library.
> >> Also, I have split the collection over multiple shards, and I believe
> >> this speeds up the update process, but who knows?
> >> 
> >> Thoughts ?
> >> 
> >> -- 
> >> nicolas
> >> 
> > 
> > -- 
> > nicolas
> 

-- 
nicolas
