Re: Poor performance on distributed search

2013-12-16 Thread ku3ia
Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p4106968.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Poor performance on distributed search

2013-12-16 Thread ku3ia
Yonik Seeley-2-2 wrote: > On Wed, Dec 28, 2011 at 5:47 AM, ku3ia wrote: >> So, based on p.2) and on my previous researches, I conclude, that the more documents I want to retrieve, the slower is search and main problem is the cycle in writeDocs method. Am I right? Can yo

Re: Poor performance on distributed search

2011-12-28 Thread Yonik Seeley
On Wed, Dec 28, 2011 at 5:47 AM, ku3ia wrote: > So, based on p.2) and on my previous researches, I conclude, that the more documents I want to retrieve, the slower is search and main problem is the cycle in writeDocs method. Am I right? Can you advise something in this situation? For the fi

RE: Poor performance on distributed search

2011-12-28 Thread ku3ia
Hi all. From my code review, I discovered the following: 1) as I wrote before, the disk read speed seems to be low; 2) in ~/solr-3.5/solr/core/src/java/org/apache/solr/response/XMLWriter.java and in the same classes there is a writeDocList => writeDocs method, which contains a for loop over all doc

RE: Poor performance on distributed search

2011-12-21 Thread ku3ia
Hi! Today I added logging to Solr here: ~/solr-3.5/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java, in the method private void writeResponse(SolrQueryResponse solrRsp, ServletResponse response, QueryResponseWriter responseWriter, SolrQueryRequest solr

RE: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: I had a similar requirement in my project, where a user might ask for up : to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set) : to retrieve the unique key from the field cache instead of retrieving it : as a stored field from disk. This resulted in a massive speed : impro

Re: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: So why do you have this 2,000 requirement in the first : place? This really sounds like an XY problem. I would really suggest re-visiting this question. No single user is going to look at 2000 docs on a single page, and in your previous email you said there was a requirement to ask solr for 2

Re: Poor performance on distributed search

2011-12-20 Thread Chris Hostetter
: For example I have 4 shards. Finally, I need 2000 docs. Now, when I'm using : &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4 : Solr gets 2000 docs from each shard (shard1,2,3,4, in total we have 8000 : docs), merges and sorts them,
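The request described in this message can be sketched as a plain query string. This is a minimal sketch under stated assumptions: `build_query` and `candidates_collected` are hypothetical helpers, `q=*:*` is a placeholder query not taken from the thread, the `/select` endpoint is left out, and the count only restates the message's own arithmetic (4 shards × 2000 rows = 8000 candidate docs reaching the merge step).

```python
from urllib.parse import urlencode

# Shard addresses as listed in the message.
SHARDS = [
    "127.0.0.1:8080/solr/shard1",
    "127.0.0.1:8080/solr/shard2",
    "127.0.0.1:8080/solr/shard3",
    "127.0.0.1:8080/solr/shard4",
]


def build_query(q, rows, shards):
    """Hypothetical helper: query string for a distributed request over `shards`."""
    params = {
        "q": q,  # placeholder query (assumption)
        "rows": rows,
        "fl": "*,score",
        "shards": ",".join(shards),
    }
    # Keep ':', '/', ',' and '*' readable instead of percent-encoding them.
    return urlencode(params, safe=":/,*")


def candidates_collected(rows, num_shards):
    """Each shard returns its own top `rows`, so the merge sees rows * num_shards docs."""
    return rows * num_shards


query = build_query("*:*", 2000, SHARDS)
total = candidates_collected(2000, len(SHARDS))
print(query)
print(total)  # 8000
```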

Re: Poor performance on distributed search

2011-12-20 Thread ku3ia
tomas.zerolo wrote > > But then the results would be wrong? Suppose the documents are not evenly > distributed (wrt the sort criterion) across all the shards. In an extreme > case, just imagine all 2000 top-most documents are on shard 3. You would > get > the 500 top-most (from shard 3) and some

Re: Poor performance on distributed search

2011-12-20 Thread Tomas Zerolo
On Mon, Dec 19, 2011 at 01:32:22PM -0800, ku3ia wrote: > >>Uhm, either I misunderstand your question or you're doing > >>a lot of extra work for nothing > > >>The whole point of sharding is exactly to collect the top N docs > >>from each shard and merge them into a single result [...] > >>
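Tomas's objection can be checked with a toy example. A minimal sketch, assuming each shard is just a list of sort values (higher is better) and that `exact_top` and `split_top` are hypothetical helpers: asking every shard for the full N gives the exact answer, while asking each of S shards for only N/S (the 2000/4 = 500 idea) drops documents as soon as the best ones cluster on one shard.

```python
import heapq


def exact_top(shards, n):
    """Ask every shard for its top n, then keep the global top n."""
    candidates = []
    for values in shards:
        candidates.extend(heapq.nlargest(n, values))
    return heapq.nlargest(n, candidates)


def split_top(shards, n):
    """Shortcut: ask each shard for only n // len(shards) docs."""
    per_shard = n // len(shards)
    candidates = []
    for values in shards:
        candidates.extend(heapq.nlargest(per_shard, values))
    return heapq.nlargest(n, candidates)


# Skewed layout: the four best values all live on the third shard.
shards = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [90, 91, 92, 93, 94, 95, 96, 97],
    [9, 10, 11, 12],
]

print(exact_top(shards, 4))  # [97, 96, 95, 94]
print(split_top(shards, 4))  # [97, 12, 8, 4]
```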

RE: Poor performance on distributed search

2011-12-19 Thread ku3ia
project2501 wrote > > I see what you are asking. This is an interesting question. It seems > inefficient for Solr to apply the > requested rows to all shards only to discard most of the results on merge. > That would consume lots of resources not used in the final result set. > Yeah, like Erick

RE: Poor performance on distributed search

2011-12-19 Thread Michael Ryan
I had a similar requirement in my project, where a user might ask for up to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set) to retrieve the unique key from the field cache instead of retrieving it as a stored field from disk. This resulted in a massive speed improvement for t
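The idea of reading the unique key from an in-memory, per-document array instead of loading stored fields for every hit can be sketched with a read counter. This is not Solr's FieldCache or SolrIndexSearcher code; `StoredFieldsReader`, `build_key_array` and the `id` field name are hypothetical stand-ins. It only shows that the array is filled once and then reused, so repeated requests stop paying one stored-field read per returned document.

```python
class StoredFieldsReader:
    """Hypothetical stand-in for stored-field access; each call counts as one disk read."""

    def __init__(self, docs):
        self._docs = docs
        self.reads = 0

    def max_doc(self):
        return len(self._docs)

    def document(self, doc_id):
        self.reads += 1
        return self._docs[doc_id]


def keys_from_stored_fields(reader, doc_ids, key="id"):
    """One stored-field read per requested document, on every request."""
    return [reader.document(d)[key] for d in doc_ids]


def build_key_array(reader, key="id"):
    """Field-cache style: load every document's key once, indexed by doc id."""
    return [reader.document(d)[key] for d in range(reader.max_doc())]


def keys_from_array(key_array, doc_ids):
    """Pure in-memory lookups; no further reads."""
    return [key_array[d] for d in doc_ids]


docs = [{"id": f"doc{i}"} for i in range(10)]
wanted = [1, 3, 5, 7, 9]

stored = StoredFieldsReader(docs)
for _ in range(3):  # three identical requests
    from_stored = keys_from_stored_fields(stored, wanted)

cached = StoredFieldsReader(docs)
key_array = build_key_array(cached)  # one-time load
for _ in range(3):
    from_array = keys_from_array(key_array, wanted)

print(stored.reads, cached.reads)  # 15 10
```

In a real index the one-time load touches every document, so the saving only appears once the array is reused across many requests.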

Re: Poor performance on distributed search

2011-12-19 Thread Darren Govoni
I see what you are asking. This is an interesting question. It seems inefficient for Solr to apply the requested rows to all shards only to discard most of the results on merge. That would consume lots of resources not used in the final result set. On 12/19/2011 04:32 PM, ku3ia wrote: Uhm, eith

Re: Poor performance on distributed search

2011-12-19 Thread ku3ia
>>Uhm, either I misunderstand your question or you're doing >>a lot of extra work for nothing >>The whole point of sharding is exactly to collect the top N docs >>from each shard and merge them into a single result. So if >>you want 10 docs, just specify rows=10. Solr will query all >>the

Re: Poor performance on distributed search

2011-12-19 Thread Erick Erickson
Uhm, either I misunderstand your question or you're doing a lot of extra work for nothing. The whole point of sharding is exactly to collect the top N docs from each shard and merge them into a single result. So if you want 10 docs, just specify rows=10. Solr will query all the shards, get the
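The merge step described here can be sketched with Python's `heapq.merge`. A minimal sketch, assuming each shard already returns its own top `rows` as `(doc_id, score)` pairs sorted best-first; `shard_top` and `merge_shards` are hypothetical names, and the sketch covers only the merge, not how Solr fetches stored fields for the final page.

```python
import heapq
from itertools import islice


def shard_top(docs, rows):
    """What one shard sends back: its own best `rows` (doc_id, score) pairs."""
    return sorted(docs, key=lambda d: d[1], reverse=True)[:rows]


def merge_shards(shard_results, rows):
    """Coordinator: merge the sorted per-shard lists and keep the global top `rows`."""
    merged = heapq.merge(*shard_results, key=lambda d: d[1], reverse=True)
    return list(islice(merged, rows))


index = {
    "shard1": [("a", 0.90), ("b", 0.20), ("g", 0.05)],
    "shard2": [("c", 0.80), ("d", 0.70)],
    "shard3": [("e", 0.95), ("f", 0.10)],
}

rows = 2
per_shard = [shard_top(docs, rows) for docs in index.values()]
result = merge_shards(per_shard, rows)
print(result)  # [('e', 0.95), ('a', 0.9)]
```

With `rows=2` and three shards, the coordinator sees 6 candidates and keeps 2, which is the same rows × shards growth discussed for `rows=2000`.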

Re: Poor performance on distributed search

2011-12-19 Thread ku3ia
Hi, Erick. Thanks for your advice. >>Here's another test. Add &debugQuery=on to your query and post the results. Here is for 2K rows: 0 53153 on *,score 127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4 true 0 (mainstreaming) 2000 >>

Re: Poor performance on distributed search

2011-12-18 Thread Erick Erickson
Here's another test. Add &debugQuery=on to your query and post the results. I believe you'll find that the QTime parameter returned in the packet is quite small. That is the amount of time spent in the query, NOT the time spent reading the docs from disk to return. And if that number remains small
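The comparison suggested here (reported QTime versus the time the client actually waits) can be sketched as follows. A minimal sketch, assuming a JSON response (`wt=json`) whose `responseHeader` carries `QTime` in milliseconds; `timed` and `time_outside_query` are hypothetical helpers, and `SAMPLE` is a made-up response body, not a measurement from this thread.

```python
import json
import time


def timed(fetch):
    """Call `fetch()` and return (body, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    body = fetch()
    return body, (time.perf_counter() - start) * 1000.0


def time_outside_query(body, elapsed_ms):
    """Wall-clock time not covered by QTime (per Erick, this includes reading docs from disk)."""
    qtime = json.loads(body)["responseHeader"]["QTime"]
    return elapsed_ms - qtime


# Made-up sample body; a real client would fetch it over HTTP, e.g. (not run here):
#   body, elapsed = timed(lambda: urllib.request.urlopen(url).read().decode("utf-8"))
SAMPLE = '{"responseHeader": {"status": 0, "QTime": 120}, "response": {"numFound": 0, "docs": []}}'

print(time_outside_query(SAMPLE, 5120.0))  # 5000.0
```

A large gap between elapsed time and QTime points at document loading and response writing rather than the search itself.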

Re: Poor performance on distributed search

2011-12-17 Thread ku3ia
Hi, Erick! >>Right, are you falling afoul of the recursive shard thing? That is, >>if your shards point back to itself. As far as I understand, your >>shards parameter in your request handler shouldn't point back >>to itself. No, my request handler doesn't point to itself, because the default is false.

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
Right, are you falling afoul of the recursive shard thing? That is, if your shards point back to itself. As far as I understand, your shards parameter in your request handler shouldn't point back to itself. But I'm guessing here. Best, Erick On Fri, Dec 16, 2011 at 4:27 PM, ku3ia wrote: >>> OK
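The simplest form of the "shards point back to itself" problem can be checked by comparing the handler's own address with the entries in its `shards` value. A rough sketch with hypothetical helper names (`normalize`, `self_references`) and a hypothetical `broker` core; it catches only a literal self-reference, not indirect loops between handlers.

```python
from urllib.parse import urlsplit


def normalize(addr):
    """Reduce 'http://Host:8080/solr/x/' and 'host:8080/solr/x' to the same form."""
    if "://" not in addr:
        addr = "http://" + addr
    parts = urlsplit(addr)
    return (parts.netloc.lower() + parts.path).rstrip("/")


def self_references(self_addr, shards_param):
    """Return the entries of a comma-separated shards value that point at self_addr."""
    me = normalize(self_addr)
    entries = [s.strip() for s in shards_param.split(",") if s.strip()]
    return [s for s in entries if normalize(s) == me]


shards = "127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2"
print(self_references("http://127.0.0.1:8080/solr/shard1/", shards))  # ['127.0.0.1:8080/solr/shard1']
print(self_references("http://127.0.0.1:8080/solr/broker", shards))  # []
```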

Re: Poor performance on distributed search

2011-12-16 Thread ku3ia
>> OK, so your speed differences are pretty much dependent upon whether you specify >> rows=2000 or rows=10, right? Why do you need 2,000 rows? Yes, the big difference is 10 vs. 2K records. The limit of 2K rows is set by the manager and I can't decrease it. It is the minimum row count needed to process the data.

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
OK, so your speed differences are pretty much dependent upon whether you specify rows=2000 or rows=10, right? Why do you need 2,000 rows? Or is the root question why there's such a difference when you specify qt=requestShards? In which case I'm curious to see that request handler definition... Be

Re: Poor performance on distributed search

2011-12-16 Thread ku3ia
Hi, Erick, thanks for your reply. Yeah, you are right: the document cache is at its default, but I tried decreasing and increasing the values and didn't get the desired result. I ran the tests. Here are the results: >>1> try with "&rows=10" successfully started at 19:48:34 Queries interval is: 10 queries per mi

Re: Poor performance on distributed search

2011-12-16 Thread Erick Erickson
The thing that jumps out at me is "&rows=2000". If your documentCache in solrconfig.xml is still at the defaults, it only holds 512 entries. So you're running all over your disk gathering up the fields to return, especially since you also specified "fl=*,score". And if you have large fields stored, you're doin
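Why 2000 rows overwhelm a 512-entry documentCache can be seen with a plain LRU simulation. A minimal sketch: `LRUCache` and `run_pass` are hypothetical stand-ins, not Solr's cache classes, and the loader just builds a dict in place of a disk read. When the same 2000 documents are requested twice in order, the 512-entry cache gets no hits at all, because every entry is evicted before it is needed again; a cache large enough for all 2000 serves the whole second pass from memory.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache with hit/miss counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = load(key)  # stands in for reading stored fields from disk
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value


def run_pass(cache, doc_ids):
    for doc_id in doc_ids:
        cache.get(doc_id, lambda k: {"id": k})


doc_ids = range(2000)  # one page of rows=2000

small = LRUCache(512)  # the default size mentioned above
big = LRUCache(2000)
for cache in (small, big):
    run_pass(cache, doc_ids)
    run_pass(cache, doc_ids)

print(small.hits, small.misses)  # 0 4000
print(big.hits, big.misses)  # 2000 2000
```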