Re: distributed search is significantly slower than direct search

Manuel Le Normand Wed, 13 Nov 2013 15:31:44 -0800

It's surprising that such a query takes so long. I would assume that after
running q=*:* repeatedly you should be getting cache hits and the times
should drop. Check in the admin UI how your query and document caches perform.
Moreover, the query itself just asks for the first 5000 docs that were
indexed (returning the first [docid]), so it seems all this time is wasted on
transfer. Out of these 7 secs, how much is spent in the above method? What
fields do you return by default? How big is each doc you display in your
results? It might be that both collections are competing for the same
resources. Try elaborating on your use case.
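
As a rough way to run that comparison, here is a minimal Python sketch. The
host, port and core names are the ones from the original post; the helper
names (select_url, timed_query), the wt=json parameter and the assumption
that the unique key field is called "id" are mine, so adjust them to your
setup.

```python
import json
import time
import urllib.parse
import urllib.request


def select_url(base, core, rows=5000, fl=None, shards=None):
    """Build a /select URL for q=*:*; fl and shards are optional."""
    params = {"q": "*:*", "rows": rows, "wt": "json"}
    if fl:
        params["fl"] = fl  # e.g. "id" to return only the (assumed) key field
    if shards:
        params["shards"] = ",".join(shards)
    return f"{base}/{core}/select?{urllib.parse.urlencode(params)}"


def timed_query(url):
    """Return (wall-clock ms, QTime ms reported by Solr) for one request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    wall_ms = (time.perf_counter() - start) * 1000
    return wall_ms, body["responseHeader"]["QTime"]


# Example calls (need a running Solr, so they are left commented out):
# base = "http://127.0.0.1:8983/solr"
# direct = select_url(base, "core1", fl="id")
# routed = select_url(base, "template", fl="id",
#                     shards=["127.0.0.1:8983/solr/core1"])
# print(timed_query(direct), timed_query(routed))
```

If the gap between wall-clock time and QTime is large, the time is going
into transfer and response writing rather than into the search itself.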

Anyway, it seems like you just ran a test to see what the performance hit
would be in a distributed environment, so I'll try to explain some things we
encountered in our benchmarks, with a case that is at least similar in the
number of docs fetched.

We retrieve 2000 docs per query, running over 40 shards. This means every
shard actually transfers 2000 docs to our frontend on every
document-match request (the first request you were referring to). Even if
lazily loaded, reading 2000 ids (on 40 servers) and lazy loading the fields
is a tough job. Waiting for the slowest shard to respond, then sorting the
docs and reloading (lazy or not) the top 2000 docs can take a long time.
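
To make the cost concrete, here is a small Python sketch of the merge step
as I understand it. It is not Solr's actual code: the function names and the
(score, id) tuples are made up for illustration. With 40 shards and
rows=2000, the frontend can look at up to 40 * 2000 = 80,000 candidates just
to keep 2000 of them, and only then asks the owning shards for the documents.

```python
import heapq
import itertools
from collections import defaultdict


def merge_top(shard_hits, rows):
    """Merge per-shard hit lists into the global top `rows`.

    shard_hits maps shard name -> [(score, doc_id), ...], each list already
    sorted by descending score. Returns (score, doc_id, shard) tuples.
    """
    # Negate scores so that each stream is ascending, as heapq.merge expects.
    streams = [
        [(-score, doc_id, shard) for score, doc_id in hits]
        for shard, hits in shard_hits.items()
    ]
    merged = heapq.merge(*streams)
    return [(-neg, doc_id, shard)
            for neg, doc_id, shard in itertools.islice(merged, rows)]


def ids_per_shard(top):
    """Group the winning ids by shard for the second (fetch) request."""
    wanted = defaultdict(list)
    for _score, doc_id, shard in top:
        wanted[shard].append(doc_id)
    return dict(wanted)


hits = {
    "shard1": [(0.9, "a"), (0.5, "b"), (0.1, "c")],
    "shard2": [(0.8, "d"), (0.7, "e"), (0.2, "f")],
}
top = merge_top(hits, rows=3)
print(ids_per_shard(top))
```

The second request then has to look up each winning id on its shard, which
is the loop over idArr quoted from QueryComponent further down.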

Our times are 4-8 secs, though the cases are not directly comparable. We've
taken a few steps that improved things along the way, steps that led to
others. These were our starting points:

   1. Profile these queries from different servers and Solr instances, and
   try to pin down which collection is working hard and why. Check whether
   you're stuck on components that add no value for you but are enabled by
   default.
   2. Consider eliminating the doc cache. It loads lots of (partly) lazy
   documents whose probability of being reused is low. There's no such
   thing as "popular docs" when requesting so many docs. You may be able to
   put that memory to better use.
   3. Bottleneck check: server-level metrics such as CPU user / iowait,
   packets transferred over the network, page faults etc. are excellent for
   understanding whether the disk, network or CPU is slowing you down. Then
   upgrade the hardware on one of the shards and check whether it helps by
   comparing the upgraded shard's QTime to the others.
   4. Warm up the index after committing: benchmark how queries perform
   before and after some warm-up, say a few hundred queries (from your
   previous system), in order to warm up the OS cache (assuming you're
   using NRTCachingDirectoryFactory).
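
For the warm-up comparison in step 4, something along these lines could
work. This is only a sketch: run_query is a placeholder for whatever
function sends one query to Solr in your setup, and the query lists are
whatever you can replay from your previous system.

```python
import statistics
import time


def measure(run_query, queries):
    """Return the median latency in ms of run_query over the given queries."""
    samples = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def warmup_benchmark(run_query, probe_queries, warmup_queries):
    """Measure probe latency right after a commit, replay warm-up queries,
    then measure the same probes again. Returns (cold_ms, warm_ms).

    Note that the cold measurement itself warms the probe queries a little,
    so keep the probe set small compared to the warm-up set.
    """
    cold = measure(run_query, probe_queries)
    for q in warmup_queries:
        run_query(q)
    warm = measure(run_query, probe_queries)
    return cold, warm
```

Run it right after a commit; if warm_ms is much lower than cold_ms, the OS
cache (or Solr's own caches) was the main factor.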

Good luck,
Manu

On Wed, Nov 13, 2013 at 2:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> One thing you can try, and this is more diagnostic than a cure, is to return
> just the id field (and ensure that lazy field loading is true). That'll tell
> you whether the issue is actually fetching the document off disk and
> decompressing, although frankly that's unlikely since you can get your 5,000
> rows from a single machine quickly.
>
> The code you found where Solr is spending its time, is that on the "routing"
> core or on the shards? I actually have a hard time understanding how that
> code could take a long time, doesn't seem right.
>
> You are transferring 5,000 docs across the network, so it's possible that
> your network is just slow, that's certainly a difference between the local
> and remote case, but that's a stab in the dark.
>
> Not much help I know,
> Erick
>
>
>
> On Wed, Nov 13, 2013 at 2:52 AM, Elran Dvir <elr...@checkpoint.com> wrote:
>
> > Erick, Thanks for your response.
> >
> > We are upgrading our system using Solr.
> > We need to preserve old functionality.  Our client displays 5K documents
> > and groups them.
> >
> > Is there a way to refactor the code in order to improve distributed document
> > fetching?
> >
> > Thanks.
> >
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Wednesday, October 30, 2013 3:17 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: distributed search is significantly slower than direct search
> >
> > You can't. There will inevitably be some overhead in the distributed case.
> > That said, 7 seconds is quite long.
> >
> > 5,000 rows is excessive, and probably where your issue is. You're having
> > to go out and fetch the docs across the wire. Perhaps there is some
> > batching that could be done there, I don't know whether this is one
> > document per request or not.
> >
> > Why 5K docs?
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Oct 29, 2013 at 2:54 AM, Elran Dvir <elr...@checkpoint.com> wrote:
> >
> > > Hi all,
> > >
> > > I am using Solr 4.4 with multi cores. One core (called template) is my
> > > "routing" core.
> > >
> > > When I run
> > > http://127.0.0.1:8983/solr/template/select?rows=5000&q=*:*&shards=127.0.0.1:8983/solr/core1,
> > > it consistently takes about 7s.
> > > When I run http://127.0.0.1:8983/solr/core1/select?rows=5000&q=*:*, it
> > > consistently takes about 40ms.
> > >
> > > I profiled the distributed query.
> > > This is the distributed query process (I hope the terms are accurate):
> > > When Solr identifies a distributed query, it sends the query to the
> > > shard and gets the matching shard docs.
> > > Then it sends another query to the shard to get the Solr documents.
> > > Most of the time is spent in the last stage, in the function "process" of
> > > "QueryComponent", in:
> > >
> > > for (int i=0; i<idArr.size(); i++) {
> > >         int id = req.getSearcher().getFirstMatch(
> > >                 new Term(idField.getName(),
> > > idField.getType().toInternal(idArr.get(i))));
> > >
> > > How can I make my distributed query as fast as the direct one?
> > >
> > > Thanks.
> > >
