Re: Poor performance on distributed search

Erick Erickson Mon, 19 Dec 2011 08:20:41 -0800

Uhm, either I misunderstand your question or you're doing
a lot of extra work for nothing....


The whole point of sharding is exactly to collect the top N docs
from each shard and merge them into a single result. So if
you want 10 docs, just specify rows=10. Solr will query all
the shards, get the top 10 docs from each, and then
merge them into a final list 10 items long. Both the initial
fetch and the final merge respect the sort criteria.
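
For illustration, a request along these lines could be built as
follows. This is only a minimal sketch: the shard list is taken from
your debug output, but the base URL (sending the request to shard1's
/select handler) and the helper name build_query_url are my
assumptions, so adjust them to your setup.

```python
from urllib.parse import urlencode

# Shard list copied from the "shards" parameter in the quoted debug output.
SHARDS = [
    "127.0.0.1:8080/solr/shard1",
    "127.0.0.1:8080/solr/shard2",
    "127.0.0.1:8080/solr/shard3",
    "127.0.0.1:8080/solr/shard4",
]


def build_query_url(base_url, query, rows=10):
    """Build a distributed-search URL asking for only `rows` merged docs.

    `base_url` is an assumption: any core that can act as the
    coordinator for the shards listed above.
    """
    params = {
        "q": query,
        "start": 0,
        "rows": rows,  # ask for what you actually need, not 2000
        "fl": "*,score",
        "shards": ",".join(SHARDS),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical coordinator URL; replace with your own.
url = build_query_url("http://127.0.0.1:8080/solr/shard1/select", "(mainstreaming)")
print(url)
```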

Score is the default "sort". If you specify other sort criteria,
e.g. a field, then that sort is respected by the merge process.
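
To make the merge step concrete, here is a rough sketch of the idea
in Python. It is not Solr's actual implementation, only an
illustration: each shard hands back its own top N hits already sorted
by score, and the coordinator merges those sorted lists and keeps the
first N. The scores, doc ids and the name merge_top_n are made up.

```python
import heapq
from itertools import islice


def merge_top_n(shard_results, n):
    """Merge per-shard hit lists into one global top-n list.

    Each element of `shard_results` is a list of (score, doc_id)
    tuples already sorted by descending score, as a shard would
    return them when sorting by score.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, n))


# Made-up hits for three shards, each sorted by descending score.
shard1 = [(4.6, "a"), (2.1, "b"), (0.5, "c")]
shard2 = [(3.9, "d"), (3.0, "e")]
shard3 = [(4.1, "f")]

n = 3
# Every shard contributes at most its own top n before the merge.
top = merge_top_n([hits[:n] for hits in (shard1, shard2, shard3)], n)
print(top)  # -> [(4.6, 'a'), (4.1, 'f'), (3.9, 'd')]
```

Note that the per-shard lists grow with rows, so with rows=2000 and
four shards the coordinator has up to 8,000 entries to consider
instead of 40.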

So why do you have this 2,000 requirement in the first
place? This really sounds like an XY problem.


Best
Erick

On Mon, Dec 19, 2011 at 4:35 AM, ku3ia <[email protected]> wrote:
> Hi, Erick. Thanks for your advice.
>>>Here's another test. Add &debugQuery=on to your query and post the
> results.
> Here is for 2K rows:
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">53153</int>
> <lst name="params">
> <str name="debugQuery">on</str>
> <str name="fl">*,score</str>
> <str name="shards">
> 127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
> </str>
> <str name="ident">true</str>
> <str name="start">0</str>
> <str name="q">(mainstreaming)</str>
> <str name="rows">2000</str>
> </lst>
> </lst>
> <result name="response" numFound="2305" start="0" maxScore="4.657284">
>>>>Here 2K docs<<<
> </result>
> <lst name="debug">
> <str name="rawquerystring">(mainstreaming)</str>
> <str name="querystring">(mainstreaming)</str>
> <str name="parsedquery">ArticleText:mainstream</str>
> <str name="parsedquery_toString">ArticleText:mainstream</str>
> <str name="QParser">LuceneQParser</str>
> <lst name="timing">
> <double name="time">67797.0</double>
> <lst name="prepare">
> <double name="time">73.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">72.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">67724.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">66607.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">1115.0</double>
> </lst>
> </lst>
> </lst>
> <lst name="explain">
> ...
> </lst>
> </lst>
> </response>
>
> And this is for 10:
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">3626</int>
> <lst name="params">
> <str name="debugQuery">on</str>
> <str name="fl">*,score</str>
> <str name="shards">
> 127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
> </str>
> <str name="ident">true</str>
> <str name="start">0</str>
> <str name="q">(mainstreaming)</str>
> <str name="rows">10</str>
> </lst>
> </lst>
> <result name="response" numFound="2305" start="0" maxScore="4.657284">
>>>>Here 10 docs<<<
> </result>
> <lst name="debug">
> <str name="rawquerystring">(mainstreaming)</str>
> <str name="querystring">(mainstreaming)</str>
> <str name="parsedquery">ArticleText:mainstream</str>
> <str name="parsedquery_toString">ArticleText:mainstream</str>
> <str name="QParser">LuceneQParser</str>
> <lst name="timing">
> <double name="time">566.0</double>
> <lst name="prepare">
> <double name="time">17.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">17.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">0.0</double>
> </lst>
> </lst>
> <lst name="process">
> <double name="time">549.0</double>
> <lst name="org.apache.solr.handler.component.QueryComponent">
> <double name="time">353.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.FacetComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.HighlightComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.StatsComponent">
> <double name="time">0.0</double>
> </lst>
> <lst name="org.apache.solr.handler.component.DebugComponent">
> <double name="time">196.0</double>
> </lst>
> </lst>
> </lst>
> <lst name="explain">
> ...
> </lst>
> </lst>
> </response>
>
>>>Also, I really have a hard time seeing what advantage you get from
>>>putting all those shards on the same machine, you're just creating
>>>extra work.
> Yeah, in production I have 5 servers with 6 (big) shards on each.
> I also tried using only one shard per server (five shards in total), but
> the results weren't good.
>
>>>Although there's one other possibility: By returning 2,000 rows, you
>>>require that each shard assemble a list of the top 2,000 documents
>>>and then they are collated into a single packet, so you're asking
>>>the system to do a lot of list processing.
> So, as I understand it, my main problem is fetching 2000 rows from each shard?
>
> P.S. Is there any mechanism to, for example, get the top 100 rows from each
> shard, merge only those, sort them by the field defined in the query or by
> score, and return the result to the user?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3597893.html
> Sent from the Solr - User mailing list archive at Nabble.com.
