Hi Nicolas,

Have you tried playing with the values in *indexConfig* <https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html> (merge factor, segment size, maxBufferedDocs, merge policies)? We at Auto-Suggest also do atomic updates daily, and changing the merge factor in particular gave us a ~4x boost during indexing. With our current configuration, our core atomically updates ~423 documents per second.
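For reference, the kind of knobs I mean live in the <indexConfig> section of solrconfig.xml. The snippet below is only an illustrative sketch (the values are placeholders, not our production settings, and it assumes Solr 6.x/7.x with TieredMergePolicy); you would want to benchmark each change against your own indexing load:

  <indexConfig>
    <!-- Buffer more documents in RAM before flushing a new segment. -->
    <ramBufferSizeMB>512</ramBufferSizeMB>
    <maxBufferedDocs>100000</maxBufferedDocs>

    <!-- TieredMergePolicy knobs: maxMergeAtOnce / segmentsPerTier play the
         role of the old mergeFactor; higher values mean fewer merges while
         indexing, at the cost of more segments to search. -->
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">30</int>
      <int name="segmentsPerTier">30</int>
    </mergePolicyFactory>

    <!-- Let merges run on more threads if the index sits on SSDs. -->
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">9</int>
      <int name="maxThreadCount">4</int>
    </mergeScheduler>
  </indexConfig>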
On Sun, 20 Oct 2019 at 02:07, Nicolas Paris <nicolas.pa...@riseup.net> wrote:

> > Maybe you need to give more details. I recommend always to try and
> > test yourself as you know your own solution best. What performance
> > does your use case need and what is your current performance?
>
> I have 10 collections on 4 shards (no replication). The collections are
> quite large, ranging from 2 GB to 60 GB per shard. In every case, the
> update process only adds several values to an indexed array field on a
> subset of the documents of each collection. The proportion of the
> subset ranges from 0 to 100%, and is below 20% 95% of the time. The
> array field is 1 of 20 fields, which are mainly unstored fields with
> some large textual fields.
>
> The 4 Solr instances are collocated with Spark. Right now I have tested
> with 40 Spark executors. The commit time and the commit document count
> are both set to 20000. Each shard has 20 GB of memory.
> Loading/replacing the largest collection takes about 2 hours - which is
> quite fast I guess. Updating 5% of the documents of each collection
> takes about half an hour.
>
> Because my need is "only" to append several values to an array, I
> suspect there is some trick to make things faster.
>
>
> On Sat, Oct 19, 2019 at 10:10:36PM +0200, Jörn Franke wrote:
> > Maybe you need to give more details. I recommend always to try and
> > test yourself as you know your own solution best. Depending on your
> > Spark process, atomic updates could be faster.
> >
> > Spark-Solr brings additional complexity. You could have too many
> > executors for your Solr instance(s), i.e. too high a parallelism.
> >
> > Probably the most important question is:
> > What performance does your use case need and what is your current
> > performance?
> >
> > Once this is clear, further architecture aspects can be derived, such
> > as the number of Spark executors, number of Solr instances, sharding,
> > replication, commit timing, etc.
> >
> > > On 19.10.2019 at 21:52, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > >
> > > Hi community,
> > >
> > > Any advice to speed up updates?
> > > Is there any advice on commits, memory, docValues, stored fields or
> > > any tips to make things faster?
> > >
> > > Thanks
> > >
> > >
> > >> On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> > >> Hi
> > >>
> > >> I am looking for a way to speed up the update of documents.
> > >>
> > >> In my context, the update replaces one of the many existing indexed
> > >> fields and keeps the others as is.
> > >>
> > >> Right now, I am building the whole document and replacing the
> > >> existing one by id.
> > >>
> > >> I am wondering if the **atomic update feature** would speed up the
> > >> process.
> > >>
> > >> On one hand, using this feature would save network because only a
> > >> small subset of the document would be sent from the client to the
> > >> server.
> > >> On the other hand, the server will have to collect the values from
> > >> the disk and reindex them. In addition, this implies storing the
> > >> values for every field (I am not storing every field) and using
> > >> more space.
> > >>
> > >> Also, I have read that the ConcurrentUpdateSolrServer class might
> > >> be an optimized way of updating documents.
> > >>
> > >> I am using the spark-solr library to deal with SolrCloud. If
> > >> something exists to speed up the process, I would be glad to
> > >> implement it in that library.
> > >> Also, I have split the collection over multiple shards, and I admit
> > >> this speeds up the update process, but who knows?
> > >>
> > >> Thoughts?
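For what it's worth, here is a rough SolrJ sketch of what an atomic "add" to a multivalued field looks like, assuming a SolrJ 7/8-style builder; the ZooKeeper host, collection and field names are made up, and in a Spark job you would build these documents inside foreachPartition rather than a plain loop:

  import java.util.Collections;
  import java.util.Optional;

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.common.SolrInputDocument;

  public class AppendToArrayField {
      public static void main(String[] args) throws Exception {
          // Hypothetical ZooKeeper ensemble and collection name.
          try (CloudSolrClient client = new CloudSolrClient.Builder(
                  Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
              client.setDefaultCollection("my_collection");

              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc-42");
              // Atomic update: "add" appends a value to the existing multivalued
              // field instead of resending the whole document from the client.
              doc.addField("my_array_field",
                      Collections.singletonMap("add", "new-value"));

              // commitWithin (ms) lets Solr batch commits instead of
              // committing on every request.
              client.add(doc, 60_000);
          }
      }
  }

Note that ConcurrentUpdateSolrClient (the successor of ConcurrentUpdateSolrServer) buffers and streams updates to a single Solr base URL, which helps for bulk loads against standalone Solr; with SolrCloud, a CloudSolrClient as above is usually preferable because it routes each document directly to the right shard leader.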
> > >>
> > >> --
> > >> nicolas
> > >>
> > >
> > > --
> > > nicolas
> >
> --
> nicolas
>

--
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.
8th Floor, Tower A, Advant-Navis Business Park,
Sector 142, Noida, UP, IN - 201303

Mob.: +91-9560911996 | Work: 01203916600 | Extn: *8173*