Hi Nicolas,

Have you tried playing with the values in *IndexConfig*
<https://lucene.apache.org/solr/guide/6_6/indexconfig-in-solrconfig.html>
(merge factor, segment size, maxBufferedDocs, merge policies)? We, at
Auto-Suggest, also do atomic updates daily, and changing the merge factor
specifically gave us a boost of ~4x during indexing. With the current
configuration, our core atomically updates ~423 documents per second.
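For reference, these knobs live in the `<indexConfig>` section of
solrconfig.xml. A sketch of the kind of settings to experiment with (the
values below are illustrative, not our production configuration):

```xml
<indexConfig>
  <!-- larger RAM buffer => fewer, larger flushes while indexing -->
  <ramBufferSizeMB>512</ramBufferSizeMB>
  <maxBufferedDocs>100000</maxBufferedDocs>
  <!-- tiered merge policy: higher values => fewer merges during bulk updates -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">30</int>
    <int name="segmentsPerTier">30</int>
  </mergePolicyFactory>
</indexConfig>
```

You will want to benchmark each change in isolation, since the best values
depend heavily on your document sizes and commit settings.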

On Sun, 20 Oct 2019 at 02:07, Nicolas Paris <nicolas.pa...@riseup.net>
wrote:

> > Maybe you need to give more details. I always recommend trying and
> > testing yourself, as you know your own solution best. What performance
> > does your use case need and what is your current performance?
>
> I have 10 collections on 4 shards (no replication). The collections are
> quite large, ranging from 2 GB to 60 GB per shard. In every case, the
> update process only adds several values to an indexed array field on a
> subset of documents in each collection. The proportion of the subset
> ranges from 0 to 100%, and is below 20% 95% of the time. The array field
> is 1 of 20 fields, which are mainly unstored fields with some large
> textual fields.
>
> The 4 Solr instances are collocated with Spark. Right now I tested with
> 40 Spark executors. The commit max time and max document count are both
> set to 20000. Each shard has 20 GB of memory.
> Loading/replacing the largest collection takes about 2 hours - which is
> quite fast I guess. Updating 5% of the documents of each collection
> takes about half an hour.
>
> Because my need is "only" to append several values to an array, I
> suspect there is some trick to make things faster.
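Appending several values to a multiValued field maps directly onto the
atomic-update `add` modifier. A minimal sketch of how such a request body
could be built (the `tags` field name and document id are made up for
illustration):

```python
import json

def append_update(doc_id, field, values):
    """Build a Solr atomic-update document that appends `values`
    to the multiValued field `field` of document `doc_id`."""
    return {"id": doc_id, field: {"add": values}}

# POSTing a list of these to /update ships only the id and the new
# values over the wire, not the whole document.
body = json.dumps([append_update("doc-1", "tags", ["new-tag-1", "new-tag-2"])])
print(body)
```

Note that atomic updates still require the other fields to be stored (or
docValues) so Solr can rebuild the full document server-side.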
>
>
>
> On Sat, Oct 19, 2019 at 10:10:36PM +0200, Jörn Franke wrote:
> > Maybe you need to give more details. I always recommend trying and
> > testing yourself, as you know your own solution best. Depending on your
> > Spark process, atomic updates could be faster.
> >
> > Spark-Solr adds additional complexity. You could have too many
> > executors for your Solr instance(s), i.e. too high a parallelism.
> >
> > Probably the most important question is:
> > What performance does your use case need and what is your current
> > performance?
> >
> > Once this is clear, further architecture aspects can be derived, such
> > as the number of Spark executors, number of Solr instances, sharding,
> > replication, commit timing, etc.
> >
> > > On 19.10.2019 at 21:52, Nicolas Paris <nicolas.pa...@riseup.net> wrote:
> > >
> > > Hi community,
> > >
> > > Any advice to speed up updates?
> > > Any advice on commits, memory, docValues, stored fields, or any other
> > > tips to speed things up?
> > >
> > > Thanks
> > >
> > >
> > >> On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> > >> Hi
> > >>
> > >> I am looking for a way to speed up the update of documents.
> > >>
> > >> In my context, the update replaces one of the many existing indexed
> > >> fields, and keeps the others as is.
> > >>
> > >> Right now, I am building the whole document and replacing the
> > >> existing one by id.
> > >>
> > >> I am wondering if the **atomic update feature** would speed up the process.
> > >>
> > >> On the one hand, using this feature would save network bandwidth,
> > >> because only a small subset of the document would be sent from the
> > >> client to the server.
> > >> On the other hand, the server would have to collect the values from
> > >> the disk and reindex them. In addition, this implies storing the
> > >> values for every field (I am not storing every field) and using more
> > >> space.
> > >>
> > >> Also, I have read that the ConcurrentUpdateSolrServer class might be
> > >> an optimized way of updating documents.
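ConcurrentUpdateSolrServer (ConcurrentUpdateSolrClient in recent SolrJ)
buffers documents client-side and streams them to /update from background
threads, so many small updates become a few large requests. The buffering
idea itself can be sketched in Python, with a hypothetical `post` step:

```python
from itertools import islice

def batches(docs, size=1000):
    """Group an iterable of update documents into lists of up to `size`,
    so each batch becomes one HTTP request instead of one per document."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical usage: one POST to /update per batch.
# for batch in batches(updates, size=1000):
#     post("/update", batch)
```

The same effect is often had simply by raising the batch size of whatever
client you already use, spark-solr included.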
> > >>
> > >> I am using the spark-solr library to deal with SolrCloud. If
> > >> something exists to speed up the process, I would be glad to
> > >> implement it in that library.
> > >> Also, I have split the collection over multiple shards, and I admit
> > >> this speeds up the update process, but who knows?
> > >>
> > >> Thoughts ?
> > >>
> > >> --
> > >> nicolas
> > >>
> > >
> > > --
> > > nicolas
> >
>
> --
> nicolas
>


-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*
