> So I have to ask what the end goal is here.

In our case, the purpose of sharding was/is to speed up the process. We noticed that as the index grew, response speed kept going down, so we decided to split the index across 5 machines.
> Are your response times really in need of improvement or is this more
> trying to understand the process?

Our response time went from 1 second to 5+ seconds, so we thought we could definitely do better with Solr(Cloud). 'start' and 'rows' are generally left at their default values (0 and 10, respectively). Any clues on how to investigate further?

On Sun, Mar 16, 2014 at 7:29 AM, Erick Erickson <[email protected]> wrote:

> I wouldn't expect the merge times to be significant at all, _assuming_
> you're not doing something like setting a very high &start= parameter
> or returning a whole lot of rows.
>
> Now, it may be that you're sharding with too small a document set to
> really notice a difference. Sharding isn't really about speeding up
> responses so much as it is about being able to handle very large
> indexes.
>
> So I have to ask what the end goal is here. Are your response times
> really in need of improvement or is this more trying to understand
> the process?
>
> Best,
> Erick
>
> On Thu, Mar 13, 2014 at 1:19 AM, remi tassing <[email protected]> wrote:
> > Hi Erick,
> >
> > I've used the fl=id parameter to avoid retrieving the actual documents
> > (step <4> in your mail), but the problem still exists.
> > Any ideas on how to find the merging time (step <3>)?
> >
> > Remi
> >
> > On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson <[email protected]> wrote:
> >
> >> In SolrCloud there are a couple of round trips that _may_ be what
> >> you're seeing.
> >>
> >> First, though, QTime is the time spent querying; it does NOT include
> >> assembling the documents from disk for the return, etc., so bear
> >> that in mind...
> >>
> >> But here's the sequence as I understand it from the receiving node's
> >> viewpoint:
> >> 1> send the query out to one replica of each shard
> >> 2> get the top N doc IDs and scores (or whatever the sorting
> >> criteria are) from each shard
> >> 3> merge the lists and select the top N to return
> >> 4> request the actual documents for the top N list from each of
> >> the shards
> >> 5> return the list.
> >>
> >> So as you can see, there's an extra round trip to each shard to get
> >> the full documents. Perhaps this is what you're seeing? <4> seems
> >> like it might be what you're seeing; I don't think it's counted in
> >> QTime.
> >>
> >> HTH
> >> Erick
> >>
> >> On Tue, Mar 11, 2014 at 3:17 AM, remi tassing <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I've just set up a SolrCloud with Tomcat: 5 shards with one
> >> > replica each and 10 million docs in total (evenly distributed).
> >> >
> >> > I've noticed the query response time is faster than with a single
> >> > node, but still not as fast as I expected.
> >> >
> >> > After turning debugQuery on, I noticed the query time is different
> >> > from the value returned in the debug explanation (see the excerpt
> >> > below). More importantly, when making a query to one, and only
> >> > one, shard, the result is consistent. It appears the server spends
> >> > most of its time doing result aggregation (merging).
> >> >
> >> > After searching on Google in vain, I didn't find anything concrete
> >> > except that the problem could be in 'SearchComponent'.
> >> >
> >> > Could you point me in the right direction (e.g. configuration...)?
> >> >
> >> > Thanks!
> >> >
> >> > Remi
> >> >
> >> > SolrCloud result:
> >> >
> >> > <lst name="responseHeader">
> >> >   <int name="status">0</int>
> >> >   <int name="QTime">3471</int>
> >> >   <lst name="params">
> >> >     <str name="debugQuery">on</str>
> >> >     <str name="q">project development agile</str>
> >> >   </lst>
> >> > </lst>
> >> >
> >> > <result name="response" numFound="2762803" start="0"
> >> > maxScore="0.17022902">...</result>
> >> >
> >> > ...
> >> > <lst name="timing">
> >> >   <double name="time">508.0</double>
> >> >   <lst name="prepare">
> >> >     <double name="time">8.0</double>
> >> >     <lst name="query"><double name="time">8.0</double></lst>
> >> >     <lst name="facet"><double name="time">0.0</double></lst>
> >> >     <lst name="mlt"><double name="time">0.0</double></lst>
> >> >     <lst name="highlight"><double name="time">0.0</double></lst>
> >> >     <lst name="stats"><double name="time">0.0</double></lst>
> >> >     <lst name="debug"><double name="time">0.0</double></lst>
> >> >   </lst>
> >> >   <lst name="process">
> >> >     <double name="time">499.0</double>
> >> >     <lst name="query"><double name="time">195.0</double></lst>
> >> >     <lst name="facet"><double name="time">0.0</double></lst>
> >> >     <lst name="mlt"><double name="time">0.0</double></lst>
> >> >     <lst name="highlight"><double name="time">228.0</double></lst>
> >> >     <lst name="stats"><double name="time">0.0</double></lst>
> >> >     <lst name="debug"><double name="time">76.0</double></lst>
> >> >   </lst>
> >> > </lst>
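As an aside, the per-component times in the debug section above add up to only 508 ms, nowhere near the 3471 ms QTime; the gap is presumably the distributed round trips discussed in the thread. A small parser over a trimmed copy of that section, sketched below, makes it easy to tabulate which component dominates the "process" phase on the node that produced the output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <lst name="timing"> section from the debug output above
# (zero-time components omitted for brevity).
TIMING_XML = """
<lst name="timing">
  <double name="time">508.0</double>
  <lst name="process">
    <double name="time">499.0</double>
    <lst name="query"><double name="time">195.0</double></lst>
    <lst name="highlight"><double name="time">228.0</double></lst>
    <lst name="debug"><double name="time">76.0</double></lst>
  </lst>
</lst>
"""

def phase_times(xml_text):
    """Return {component: milliseconds} for the 'process' phase of a Solr
    debug timing section."""
    root = ET.fromstring(xml_text)
    process = next(l for l in root.findall("lst") if l.get("name") == "process")
    return {l.get("name"): float(l.find("double").text)
            for l in process.findall("lst")}

times = phase_times(TIMING_XML)
print(times)  # {'query': 195.0, 'highlight': 228.0, 'debug': 76.0}
```

Here highlighting (228 ms) actually costs more than the query itself (195 ms) on this node, which is worth knowing before blaming the merge.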
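For reference, the merge in step <3> of Erick's sequence can be sketched with a toy example (the shard contents and scores below are made up, not from this thread): each shard returns its top-N (score, id) pairs already sorted by score, and the coordinating node merges the sorted runs and keeps the global top N before fetching the full documents in step <4>.

```python
import heapq

# Each shard's top-N results, sorted by descending score (steps <1>-<2>).
shard_results = [
    [(0.93, "a12"), (0.71, "a40"), (0.55, "a07")],  # shard 1
    [(0.88, "b03"), (0.62, "b19"), (0.31, "b25")],  # shard 2
]
N = 3

# Step <3>: heapq.merge consumes already-sorted runs; negate the score so
# the merge yields documents in descending-score order, then keep N.
merged = heapq.merge(*shard_results, key=lambda pair: -pair[0])
top_n = [doc_id for _score, doc_id in list(merged)[:N]]
print(top_n)  # ['a12', 'b03', 'a40']
# Step <4> would now request these IDs' stored fields from their shards.
```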
