> We, at Auto-Suggest, also do atomic updates daily and specifically
> changing merge factor gave us a boost of ~4x

Interesting. What kind of change exactly did you make on the merge
factor side?
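To make sure I follow, I assume you mean the merge-related settings
under <indexConfig> in solrconfig.xml. A sketch of the kind of values I
would experiment with (the numbers are made up for illustration, not
recommendations):

    <indexConfig>
      <!-- a bigger RAM buffer means fewer flushes, hence fewer small
           segments to merge while bulk indexing -->
      <ramBufferSizeMB>512</ramBufferSizeMB>
      <maxBufferedDocs>100000</maxBufferedDocs>
      <!-- since Solr 6.5 the merge factor is expressed through the merge
           policy factory; higher values tolerate more segments, trading
           merge I/O during indexing against search speed -->
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">30</int>
        <int name="segmentsPerTier">30</int>
      </mergePolicyFactory>
    </indexConfig>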
> At current configuration, our core atomically updates ~423 documents
> per second.

Would you say atomic updates are faster than a regular replacement of
the whole document? (Considering my first thought on this, quoted
below.)

> > I am wondering if the **atomic update feature** would speed up the
> > process. On one hand, using this feature would save network
> > bandwidth, because only a small subset of the document would be sent
> > from the client to the server. On the other hand, the server will
> > have to collect the values from the disk and reindex them. In
> > addition, this implies storing the values of every field (I am not
> > storing every field) and using more space.
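For clarity, this is the kind of atomic request I have in mind: a
minimal SolrJ sketch, where the ZooKeeper host, collection and field
names are made up:

    import java.util.{Collections, Optional}

    import org.apache.solr.client.solrj.impl.CloudSolrClient
    import org.apache.solr.common.SolrInputDocument

    val client = new CloudSolrClient.Builder(
      Collections.singletonList("zk1:2181"), Optional.empty[String]()
    ).build()

    val doc = new SolrInputDocument()
    doc.addField("id", "doc-42")
    // the "add" modifier appends a value to a multi-valued field;
    // only the id and this one field travel over the wire
    doc.addField("my_array_field", Collections.singletonMap("add", "new-value"))

    client.add("my_collection", doc)
    client.commit("my_collection")
    client.close()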
Thanks, Paras.

On Tue, Oct 22, 2019 at 01:00:10PM +0530, Paras Lehana wrote:
> Hi Nicolas,
>
> Have you tried playing with the values of *IndexConfig*
> <https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html>
> (merge factor, segment size, maxBufferedDocs, merge policies)? We, at
> Auto-Suggest, also do atomic updates daily, and specifically changing
> the merge factor gave us a boost of ~4x during indexing. At the current
> configuration, our core atomically updates ~423 documents per second.
>
> On Sun, 20 Oct 2019 at 02:07, Nicolas Paris <nicolas.pa...@riseup.net>
> wrote:
>
> > > Maybe you need to give more details. I recommend always to try and
> > > test yourself as you know your own solution best. What performance
> > > does your use case need and what is your current performance?
> >
> > I have 10 collections on 4 shards (no replication). The collections
> > are quite large, ranging from 2 GB to 60 GB per shard. In every
> > case, the update process only adds several values to an indexed
> > array field on a subset of the documents of each collection. The
> > proportion of the subset ranges from 0 to 100%, and 95% of the time
> > it is below 20%. The array field is 1 of 20 fields, which are mainly
> > unstored fields with some large textual fields.
> >
> > The 4 Solr instances are collocated with Spark. Right now I tested
> > with 40 Spark executors. The commit time and the commit document
> > count are both set to 20000. Each shard has 20 GB of memory.
> > Loading/replacing the largest collection takes about 2 hours, which
> > is quite fast I guess. Updating 5% of the documents of each
> > collection takes about half an hour.
> >
> > Because my need is "only" to append several values to an array, I
> > suspect there is some trick to make things faster.
> >
> > On Sat, Oct 19, 2019 at 10:10:36PM +0200, Jörn Franke wrote:
> > > Maybe you need to give more details. I recommend always to try and
> > > test yourself as you know your own solution best. Depending on
> > > your Spark process, atomic updates could be faster.
> > >
> > > With Spark-Solr comes additional complexity. You could have too
> > > many executors for your Solr instance(s), i.e. too high a
> > > parallelism.
> > >
> > > Probably the most important question is:
> > > What performance does your use case need and what is your current
> > > performance?
> > >
> > > Once this is clear, further architecture aspects can be derived,
> > > such as the number of Spark executors, number of Solr instances,
> > > sharding, replication, commit timing, etc.
> > >
> > > > On 19.10.2019 at 21:52, Nicolas Paris <nicolas.pa...@riseup.net>
> > > > wrote:
> > > >
> > > > Hi community,
> > > >
> > > > Any advice to speed up updates?
> > > > Is there any advice on commits, memory, docValues, stored fields,
> > > > or any tips to make things faster?
> > > >
> > > > Thanks
> > > >
> > > >> On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> > > >> Hi,
> > > >>
> > > >> I am looking for a way to speed up the update of documents.
> > > >>
> > > >> In my context, the update replaces one of the many existing
> > > >> indexed fields, and keeps the others as is.
> > > >>
> > > >> Right now, I am building the whole document and replacing the
> > > >> existing one by id.
> > > >>
> > > >> I am wondering if the **atomic update feature** would speed up
> > > >> the process.
> > > >> On one hand, using this feature would save network bandwidth,
> > > >> because only a small subset of the document would be sent from
> > > >> the client to the server.
> > > >> On the other hand, the server will have to collect the values
> > > >> from the disk and reindex them. In addition, this implies
> > > >> storing the values of every field (I am not storing every
> > > >> field) and using more space.
> > > >>
> > > >> Also, I have read that the ConcurrentUpdateSolrServer class
> > > >> might be an optimized way of updating documents.
> > > >>
> > > >> I am using the spark-solr library to deal with SolrCloud. If
> > > >> something exists to speed up the process, I would be glad to
> > > >> implement it in that library.
> > > >> Also, I have split the collection over multiple shards, and I
> > > >> admit this sped up the update process, but who knows?
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> --
> > > >> nicolas
> > > >
> > > > --
> > > > nicolas
> >
> > --
> > nicolas
>
> --
> Regards,
>
> *Paras Lehana* [65871]
> Software Programmer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
>
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
>
> Mob.: +91-9560911996
> Work: 01203916600 | Extn: *8173*

--
nicolas
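PS: Regarding the ConcurrentUpdateSolrServer class from my original
mail above (renamed ConcurrentUpdateSolrClient in recent SolrJ), here
is how I understand it would be used. The URL, queue size and thread
count are made up and would need tuning:

    import java.util.Collections

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient
    import org.apache.solr.common.SolrInputDocument

    // buffers documents client-side and streams them to Solr over
    // several connections, which should suit high-throughput updates
    val client = new ConcurrentUpdateSolrClient.Builder(
        "http://localhost:8983/solr/my_collection")
      .withQueueSize(10000)
      .withThreadCount(4)
      .build()

    val doc = new SolrInputDocument()
    doc.addField("id", "doc-42")
    doc.addField("my_array_field", Collections.singletonMap("add", "new-value"))
    client.add(doc)

    client.blockUntilFinished() // wait for the internal queue to drain
    client.commit()
    client.close()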