Re: Solr maximum Optimal Index Size per Shard

Vineet Mishra Fri, 06 Jun 2014 05:58:25 -0700

Earlier I indexed only through the HTTP POST mechanism, keeping each post
between 2 MB and 20 MB. That was going fine, but we suspected that instead
of indexing through network calls (which of course add latency from network
delays and the HTTP protocol), it would be much better to index offline by
just writing the index and dumping it onto the shards.


I am currently committing after each batch of 25K docs. I will try
replacing that with commitWithin (it seems to work faster), or have a
look at this binary protocol.
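
For reference, a minimal sketch of what batched posts with commitWithin
could look like, using only the Python standard library. The base URL
(http://localhost:8983/solr/collection1), the 10-second commitWithin value
and the helper names batches and build_update_request are assumptions for
illustration, not our actual setup. It relies on the update handler
accepting a JSON array of documents and a commitWithin request parameter
in milliseconds.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def build_update_request(base_url, batch, commit_within_ms):
    """Build (but do not send) one HTTP POST carrying a whole batch.

    No explicit commit is issued; commitWithin asks Solr to commit
    within the given number of milliseconds.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(batch).encode("utf-8")
    return Request(
        f"{base_url}/update?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request would then be sent with urllib.request.urlopen(req), one per
batch of 25K docs, so there is no per-batch explicit commit.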

Thanks!




On Fri, Jun 6, 2014 at 5:55 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> On Fri, 2014-06-06 at 14:05 +0200, Vineet Mishra wrote:
>
> > Could you state what indexing mechanism are you using, as I started
> > with EmbeddedSolrServer but it was pretty slow after a few GB(~30+) of
> > indexing.
>
> I suspect that is due to too-frequent commits, too small a heap, or
> something else unrelated to EmbeddedSolrServer itself. Underneath the
> surface it is just the same as a standalone Solr.
>
> We're building our ~1TB indexes individually, using standalone workers
> for the heavy part of the analysis (Tika). The delivery from the workers
> to the Solr server is over the network, using the Solr binary protocol.
> My colleague Thomas Egense just created a small write-up at
> https://github.com/netarchivesuite/netsearch
>
> > I started indexing 1 week back and it is still only 37GB, although I
> > assume the HttpPost mechanism will be sluggish due to network latency
> > and waiting for responses.
>
> Maybe if you send the documents one at a time, but if you bundle them in
> larger updates, the post-method should be fine.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>
