> We, at Auto-Suggest, also do atomic updates daily and specifically
> changing merge factor gave us a boost of ~4x

Interesting. What kind of change exactly did you make on the merge
factor side?
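To make sure I follow, I assume you mean the merge-related settings
under <indexConfig> in solrconfig.xml. A sketch of the kind of values I
would experiment with (the numbers are made up for illustration, not
recommendations):

    <indexConfig>
      <!-- a bigger RAM buffer means fewer flushes, hence fewer small
           segments to merge while bulk indexing -->
      <ramBufferSizeMB>512</ramBufferSizeMB>
      <maxBufferedDocs>100000</maxBufferedDocs>
      <!-- since Solr 6.5 the merge factor is expressed through the merge
           policy factory; higher values tolerate more segments, trading
           merge I/O during indexing against search speed -->
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">30</int>
        <int name="segmentsPerTier">30</int>
      </mergePolicyFactory>
    </indexConfig>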
> At current configuration, our core atomically updates ~423 documents
> per second.

Would you say atomic updates are faster than a regular replacement of
the whole document? (Considering my first thought on this, quoted
below.)

> > I am wondering if the **atomic update feature** would speed up the
> > process. On one hand, using this feature would save network
> > bandwidth, because only a small subset of the document would be sent
> > from the client to the server. On the other hand, the server will
> > have to collect the values from the disk and reindex them. In
> > addition, this implies storing the values of every field (I am not
> > storing every field) and using more space.
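For clarity, this is the kind of atomic request I have in mind: a
minimal SolrJ sketch, where the ZooKeeper host, collection and field
names are made up:

    import java.util.{Collections, Optional}

    import org.apache.solr.client.solrj.impl.CloudSolrClient
    import org.apache.solr.common.SolrInputDocument

    val client = new CloudSolrClient.Builder(
      Collections.singletonList("zk1:2181"), Optional.empty[String]()
    ).build()

    val doc = new SolrInputDocument()
    doc.addField("id", "doc-42")
    // the "add" modifier appends a value to a multi-valued field;
    // only the id and this one field travel over the wire
    doc.addField("my_array_field", Collections.singletonMap("add", "new-value"))

    client.add("my_collection", doc)
    client.commit("my_collection")
    client.close()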
Thanks, Paras.

On Tue, Oct 22, 2019 at 01:00:10PM +0530, Paras Lehana wrote:
> Hi Nicolas,
>
> Have you tried playing with the values of *IndexConfig*
> <https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html>
> (merge factor, segment size, maxBufferedDocs, merge policies)? We, at
> Auto-Suggest, also do atomic updates daily, and specifically changing
> the merge factor gave us a boost of ~4x during indexing. At the current
> configuration, our core atomically updates ~423 documents per second.
>
> On Sun, 20 Oct 2019 at 02:07, Nicolas Paris <nicolas.pa...@riseup.net>
> wrote:
>
> > > Maybe you need to give more details. I recommend always to try and
> > > test yourself as you know your own solution best. What performance
> > > does your use case need and what is your current performance?
> >
> > I have 10 collections on 4 shards (no replication). The collections
> > are quite large, ranging from 2 GB to 60 GB per shard. In every
> > case, the update process only adds several values to an indexed
> > array field on a subset of the documents of each collection. The
> > proportion of the subset ranges from 0 to 100%, and 95% of the time
> > it is below 20%. The array field is 1 of 20 fields, which are mainly
> > unstored fields with some large textual fields.
> >
> > The 4 Solr instances are collocated with Spark. Right now I tested
> > with 40 Spark executors. The commit time and the commit document
> > count are both set to 20000. Each shard has 20 GB of memory.
> > Loading/replacing the largest collection takes about 2 hours, which
> > is quite fast I guess. Updating 5% of the documents of each
> > collection takes about half an hour.
> >
> > Because my need is "only" to append several values to an array, I
> > suspect there is some trick to make things faster.
> >
> > On Sat, Oct 19, 2019 at 10:10:36PM +0200, Jörn Franke wrote:
> > > Maybe you need to give more details. I recommend always to try and
> > > test yourself as you know your own solution best. Depending on
> > > your Spark process, atomic updates could be faster.
> > >
> > > With Spark-Solr comes additional complexity. You could have too
> > > many executors for your Solr instance(s), i.e. too high a
> > > parallelism.
> > >
> > > Probably the most important question is:
> > > What performance does your use case need and what is your current
> > > performance?
> > >
> > > Once this is clear, further architecture aspects can be derived,
> > > such as the number of Spark executors, number of Solr instances,
> > > sharding, replication, commit timing, etc.
> > >
> > > > On 19.10.2019 at 21:52, Nicolas Paris <nicolas.pa...@riseup.net>
> > > > wrote:
> > > >
> > > > Hi community,
> > > >
> > > > Any advice to speed up updates?
> > > > Is there any advice on commits, memory, docValues, stored fields,
> > > > or any tips to make things faster?
> > > >
> > > > Thanks
> > > >
> > > >> On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> > > >> Hi,
> > > >>
> > > >> I am looking for a way to speed up the update of documents.
> > > >>
> > > >> In my context, the update replaces one of the many existing
> > > >> indexed fields, and keeps the others as is.
> > > >>
> > > >> Right now, I am building the whole document and replacing the
> > > >> existing one by id.
> > > >>
> > > >> I am wondering if the **atomic update feature** would speed up
> > > >> the process.
> > > >> On one hand, using this feature would save network bandwidth,
> > > >> because only a small subset of the document would be sent from
> > > >> the client to the server.
> > > >> On the other hand, the server will have to collect the values
> > > >> from the disk and reindex them. In addition, this implies
> > > >> storing the values of every field (I am not storing every
> > > >> field) and using more space.
> > > >>
> > > >> Also, I have read that the ConcurrentUpdateSolrServer class
> > > >> might be an optimized way of updating documents.
> > > >>
> > > >> I am using the spark-solr library to deal with SolrCloud. If
> > > >> something exists to speed up the process, I would be glad to
> > > >> implement it in that library.
> > > >> Also, I have split the collection over multiple shards, and I
> > > >> admit this sped up the update process, but who knows?
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> --
> > > >> nicolas
> > > >
> > > > --
> > > > nicolas
> >
> > --
> > nicolas
>
> --
> Regards,
>
> *Paras Lehana* [65871]
> Software Programmer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn: *8173*

--
nicolas
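PS: Regarding the ConcurrentUpdateSolrServer class from my original
mail above (renamed ConcurrentUpdateSolrClient in recent SolrJ), here
is how I understand it would be used. The URL, queue size and thread
count are made up and would need tuning:

    import java.util.Collections

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient
    import org.apache.solr.common.SolrInputDocument

    // buffers documents client-side and streams them to Solr over
    // several connections, which should suit high-throughput updates
    val client = new ConcurrentUpdateSolrClient.Builder(
        "http://localhost:8983/solr/my_collection")
      .withQueueSize(10000)
      .withThreadCount(4)
      .build()

    val doc = new SolrInputDocument()
    doc.addField("id", "doc-42")
    doc.addField("my_array_field", Collections.singletonMap("add", "new-value"))
    client.add(doc)

    client.blockUntilFinished() // wait for the internal queue to drain
    client.commit()
    client.close()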