Hi,

I don't know. But unless something outside Solr is the bottleneck, it may
be wise to see whether you can speed up indexing. Maybe we can help here...

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Oct 1, 2013 9:29 AM, "Thomas Egense" <thomas.ege...@gmail.com> wrote:

> Hello everyone,
> I have a small challenge performance-testing a SolrCloud setup. I have 10
> shards, and each shard is supposed to have an index size of ~200GB. However,
> I only have a single 200GB index, because it would take too long to build
> another index from different data, and I hope to somehow use this one index
> on all 10 shards and make it behave as if the documents on each shard were
> different. So building more indexes from new data is not an option.
>
> Making a query to a SolrCloud is a two-phase operation. First, all shards
> receive the query and return IDs and rankings. The merger then removes
> duplicate IDs, and finally the full documents are retrieved.
>
> When I copy this index to all shards and make a request, the following
> happens. Phase one: all shards receive the query and return IDs plus
> rankings (actually the same set from every shard). This part is realistic
> enough. Phase two: the IDs are merged, but retrieving the documents is not
> realistic (IO-wise) compared to documents genuinely spread across shards.
>
> Is there any way I can 'fake' this somehow and have the shards return a
> prefixed ID in phase one, which would then have to be stripped again when
> retrieving the documents in phase two? I have tried making the hack in
> org.apache.solr.handler.component.QueryComponent and a few other classes,
> but without success (the result set is always empty). I do not need to
> index any new documents, which would also be a challenge with this hack
> because of the ID hash ranges assigned to the shards.
>
> Does anyone have a good idea how to make this hack work?
>
> From,
> Thomas Egense
>
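The prefixed-ID idea above can be sketched outside Solr, just to make the two phases concrete. This is a minimal standalone simulation, not Solr's actual QueryComponent API; all shard names, document IDs, and helper functions here are invented for illustration:

```python
# Minimal simulation of the prefixed-ID hack (not Solr code; names invented).
# Every shard holds the SAME index, so raw IDs would collide at merge time.
# Prefixing each ID with the shard name in phase one keeps the duplicates
# apart; the prefix is stripped again in phase two for document retrieval.

def phase1(shard_name, docs, query):
    """Phase one: a shard returns (prefixed_id, score) pairs so that
    identical IDs from different shards survive de-duplication."""
    return [(f"{shard_name}_{doc_id}", score) for doc_id, score in docs.items()]

def merge(results, rows):
    """The merger: de-duplicate by (prefixed) ID, keep the top-scoring rows."""
    best = {}
    for doc_id, score in results:
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:rows]

def phase2(merged, shards):
    """Phase two: strip the prefix so each shard can look up its local doc."""
    fetched = []
    for prefixed_id, _score in merged:
        shard_name, raw_id = prefixed_id.split("_", 1)
        fetched.append((prefixed_id, shards[shard_name][raw_id]))
    return fetched
```

With two shards serving the same index, the merged result keeps one entry per shard instead of collapsing duplicates down to a single document, which is what makes phase two exercise every shard's IO. In real Solr the prefixing and stripping would have to happen inside the distributed-search components, which is where the quoted attempt ran into empty result sets.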