Hi, I don't know. But unless something outside Solr is the bottleneck, it may be wiser to see if you can speed up indexing instead. Maybe we can help here...
Otis
Solr & ElasticSearch Support
http://sematext.com/

On Oct 1, 2013 9:29 AM, "Thomas Egense" <thomas.ege...@gmail.com> wrote:

> Hello everyone,
>
> I have a small challenge performance testing a SolrCloud setup. I have 10
> shards, and each shard is supposed to have an index size of ~200GB. However,
> I only have a single 200GB index, because it would take too long to build
> another index with different data. I hope to somehow use this one index on
> all 10 shards and make it behave as if the documents were different on each
> shard. So building more indexes from new data is not an option.
>
> Making a query to SolrCloud is a two-phase operation. First, all shards
> receive the query and return IDs and rankings. The merger then removes
> duplicate IDs, and finally the full documents are retrieved.
>
> When I copy this index to all shards and make a request, the following
> happens. Phase one: all shards receive the query and return IDs+rankings
> (actually the same set from every shard). This part is realistic enough.
> Phase two: the IDs are merged, and retrieving the documents is not
> realistic (IO-wise) compared to documents actually spread across shards.
>
> Is there any way I can 'fake' this somehow and have shards return a
> prefixed ID in phase one, which would then have to be undone when
> retrieving the documents in phase two? I have tried making the hack in
> org.apache.solr.handler.component.QueryComponent and a few other classes,
> but with no success (the result set is always empty). I do not need to
> index any new documents, which would also be a challenge with this hack,
> due to the ID hash intervals of the shards.
>
> Does anyone have a good idea how to make this hack work?
>
> From,
> Thomas Egense
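For what it's worth, the core of the prefixing idea can be sketched in isolation. The helpers below are hypothetical (they are not Solr APIs, and the "::" separator is an arbitrary choice made here to avoid clashing with Solr's "!" composite-ID routing syntax): phase one would tag each returned ID with the shard's name so the merger sees distinct IDs, and phase two would strip the tag again so the real ID is used to fetch the document.

```java
// Hypothetical sketch of the per-shard ID-prefixing hack discussed above.
// These helpers are NOT part of Solr; they only illustrate where a prefix
// would be added (phase one) and removed (phase two).
public class ShardIdPrefix {

    // Separator chosen to avoid Solr's "!" composite-ID routing character.
    private static final String SEP = "::";

    // Phase one: a shard tags each returned ID with its own shard name,
    // so the merger sees N distinct IDs instead of N duplicates.
    static String prefixId(String shardName, String docId) {
        return shardName + SEP + docId;
    }

    // Phase two: strip the tag before fetching the full document from the
    // (shared) index, so the lookup uses the real document ID.
    static String stripPrefix(String prefixedId) {
        int i = prefixedId.indexOf(SEP);
        return i >= 0 ? prefixedId.substring(i + SEP.length()) : prefixedId;
    }

    public static void main(String[] args) {
        String tagged = ShardIdPrefix.prefixId("shard3", "doc42");
        System.out.println(tagged);                            // shard3::doc42
        System.out.println(ShardIdPrefix.stripPrefix(tagged)); // doc42
    }
}
```

The tricky part, as the empty result sets suggest, is that both sides must agree: if the coordinator requests documents by the prefixed ID while the shard's index only knows the real ID, the fetch returns nothing. The strip step would have to happen in every place the merged IDs are used for retrieval.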