Any ideas?
Yonik Seeley-2 wrote
> On Wed, Dec 28, 2011 at 5:47 AM, ku3ia <demesg@> wrote:
>> So, based on point 2) and on my previous research, I conclude that the
>> more documents I want to retrieve, the slower the search is, and the main
>> problem is the loop in the writeDocs method. Am I right? Can you advise
>> something in this situation?
On Wed, Dec 28, 2011 at 5:47 AM, ku3ia wrote:
> So, based on point 2) and on my previous research, I conclude that the more
> documents I want to retrieve, the slower the search is, and the main problem
> is the loop in the writeDocs method. Am I right? Can you advise something in
> this situation?
For the fi
Hi all.
During my code review, I discovered the following:
1) as I wrote before, the disk read speed seems low;
2) at ~/solr-3.5/solr/core/src/java/org/apache/solr/response/XMLWriter.java
and in the similar response writer classes there is a writeDocList => writeDocs
method, which contains a for loop over all docs
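For context, the pattern being described looks roughly like the following (a
simplified sketch, not the exact Solr 3.5 source; writeDoc stands in for the
XML serialization):

import java.io.IOException;
import java.util.Set;
import org.apache.lucene.document.Document;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;
import org.apache.solr.search.SolrIndexSearcher;

// Simplified sketch of the writeDocList => writeDocs pattern: every
// returned document triggers a stored-field read from disk, so the cost
// of serializing the response grows linearly with rows.
void writeDocs(SolrIndexSearcher searcher, DocList ids, Set<String> fields)
    throws IOException {
  DocIterator iter = ids.iterator();
  for (int i = 0; i < ids.size(); i++) {
    int docid = iter.nextDoc();
    // Stored-field lookup on disk; with rows=2000 this runs 2000 times.
    Document doc = searcher.doc(docid, fields);
    writeDoc(doc); // serialize one document to XML (details elided)
  }
}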
Hi!
Today I added logging to Solr here:
~/solr-3.5/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java
in the method
private void writeResponse(SolrQueryResponse solrRsp, ServletResponse response,
                           QueryResponseWriter responseWriter,
                           SolrQueryRequest solr
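The timing log being added there might look like this (a hypothetical sketch
around the real QueryResponseWriter.write() call, not the poster's actual
change; `writer` and `log` are assumed to be in scope in that method):

// Hypothetical instrumentation inside writeResponse: time how long the
// response writer spends loading and serializing the documents.
long start = System.currentTimeMillis();
responseWriter.write(writer, solrReq, solrRsp);  // the existing write call
long elapsed = System.currentTimeMillis() - start;
log.info("writeResponse took " + elapsed + " ms for rows="
    + solrReq.getParams().get("rows"));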
: I had a similar requirement in my project, where a user might ask for up
: to 3000 results. What I did was change SolrIndexSearcher.doc(int, Set)
: to retrieve the unique key from the field cache instead of retrieving it
: as a stored field from disk. This resulted in a massive speed
: improvement for t
: So why do you have this 2,000 requirement in the first
: place? This really sounds like an XY problem.
I would really suggest re-visiting this question. No single user is going
to look at 2000 docs on a single page, and in your previous email you said
there was a requirement to ask solr for 2
: For example I have 4 shards. Finally, I need 2000 docs. Now, when I'm using
: &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
: Solr gets 2000 docs from each shard (shard1,2,3,4 - 8000 docs in total),
: merges and sorts them,
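A rough sketch of what the coordinating node does in that case
(queryShardForIds and fetchStoredFields are hypothetical helper names;
ShardDoc is Solr's real merge-entry class):

// Rough sketch (hypothetical helper names) of the coordinator's work:
// phase 1 asks every shard for `rows` ids + sort values, the merge keeps
// only the global top `rows`, and phase 2 fetches stored fields for those.
List<ShardDoc> candidates = new ArrayList<ShardDoc>();
for (String shard : shards) {
  // with rows=2000 and 4 shards this pulls 2000 ids from each = 8000 total
  candidates.addAll(queryShardForIds(shard, query, rows));
}
Collections.sort(candidates, sortComparator);      // merge/sort the 8000
List<ShardDoc> top = candidates.subList(0, rows);  // keep the global top 2000
fetchStoredFields(top);                            // only these 2000 are loaded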
tomas.zerolo wrote
>
> But then the results would be wrong? Suppose the documents are not evenly
> distributed (wrt the sort criterion) across all the shards. In an extreme
> case, just imagine all 2000 top-most documents are on shard 3. You would
> get the 500 top-most (from shard 3) and some
On Mon, Dec 19, 2011 at 01:32:22PM -0800, ku3ia wrote:
> >>Uhm, either I misunderstand your question or you're doing
> >>a lot of extra work for nothing
>
> >>The whole point of sharding is exactly to collect the top N docs
> >>from each shard and merge them into a single result [...]
> >>
project2501 wrote
>
> I see what you are asking. This is an interesting question. It seems
> inefficient for Solr to apply the
> requested rows to all shards only to discard most of the results on merge.
> That would consume lots of resources not used in the final result set.
>
Yeah, like Erick
I had a similar requirement in my project, where a user might ask for up to
3000 results. What I did was change SolrIndexSearcher.doc(int, Set) to retrieve
the unique key from the field cache instead of retrieving it as a stored field
from disk. This resulted in a massive speed improvement for t
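A sketch of that kind of change, assuming the Lucene 3.x FieldCache API and an
"id" uniqueKey field (not the poster's actual patch; `reader` and `docid` are
assumed to be in scope):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

// Sketch: when only the unique key is needed, read it from the
// FieldCache (an in-memory array, populated once per reader) instead of
// loading the stored document from disk for every hit.
String[] keys = FieldCache.DEFAULT.getStrings(reader, "id"); // "id" = assumed uniqueKey
String uniqueKey = keys[docid]; // replaces searcher.doc(docid, fields) for this field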
I see what you are asking. This is an interesting question. It seems
inefficient for Solr to apply the
requested rows to all shards only to discard most of the results on merge.
That would consume lots of resources not used in the final result set.
On 12/19/2011 04:32 PM, ku3ia wrote:
Uhm, eith
>>Uhm, either I misunderstand your question or you're doing
>>a lot of extra work for nothing
>>The whole point of sharding is exactly to collect the top N docs
>>from each shard and merge them into a single result. So if
>>you want 10 docs, just specify rows=10. Solr will query all
>>the
Uhm, either I misunderstand your question or you're doing
a lot of extra work for nothing
The whole point of sharding is exactly to collect the top N docs
from each shard and merge them into a single result. So if
you want 10 docs, just specify rows=10. Solr will query all
the shards, get the
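In other words, for the 4-shard setup in this thread a request like the
following (one URL, wrapped here for readability) should be all that's needed:

http://127.0.0.1:8080/solr/shard1/select?q=(mainstreaming)&rows=10
  &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,
          127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4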
Hi, Erick. Thanks for your advice.
>>Here's another test. Add &debugQuery=on to your query and post the results.
Here is for 2K rows (response header and echoed params):
status: 0
QTime: 53153
debugQuery: on
fl: *,score
shards: 127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
true
start: 0
q: (mainstreaming)
rows: 2000
Here's another test. Add &debugQuery=on to your query and post the results.
I believe you'll find that the QTime parameter returned in the packet
is quite small. That is the amount of time spent in the query, NOT
the time spent reading the docs from disk to return. And if that number
remains small
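A quick way to see the gap he's describing is to compare the wall-clock time of
the full request against the QTime reported in the response, e.g.:

time curl "http://127.0.0.1:8080/solr/shard1/select?q=(mainstreaming)&rows=2000&fl=*,score"

If QTime stays small while the total curl time is large, the difference is
dominated by reading and serializing the 2000 stored documents.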
Hi, Erick!
>>Right, are you falling afoul of the recursive shard thing? That is,
>>if your shards point back to itself. As far as I understand, your
>>shards parameter in your request handler shouldn't point back
>>to itself
No, my request handler doesn't point to itself, because the default is false.
Right, are you falling afoul of the recursive shard thing? That is,
if your shards point back to itself. As far as I understand, your
shards parameter in your request handler shouldn't point back
to itself.
But I'm guessing here.
Best
Erick
On Fri, Dec 16, 2011 at 4:27 PM, ku3ia wrote:
>>> OK
>> OK, so your speed differences are pretty much dependent upon whether you
>> specify rows=2000 or rows=10, right? Why do you need 2,000 rows?
Yes, the big difference is 10 vs. 2K records. The limit of 2K rows was set by
my manager and I can't decrease it. It is the minimum row count needed to
process the data.
OK, so your speed differences are pretty much dependent upon whether you specify
rows=2000 or rows=10, right? Why do you need 2,000 rows?
Or is the root question why there's such a difference when you specify
qt=requestShards? In which case I'm curious to see that request
handler definition...
Best
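For readers of the archive, the kind of handler definition in question looks
like this (illustrative only, not the poster's actual config):

<requestHandler name="requestShards" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4</str>
  </lst>
</requestHandler>

The "recursive shard thing" Erick asks about above would arise if the shards
themselves answered the sub-requests through a handler carrying this same
shards default, so each sub-request would fan out again.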
Hi, Erick, thanks for your reply.
Yeah, you are right - the document cache is at its defaults, but I tried to
decrease and increase the values and I didn't get the desired result.
I tried the tests. Here are the results:
>>1> try with "&rows=10"
successfully started at 19:48:34
Queries interval is: 10 queries per minute
The thing that jumps out at me is "&rows=2000". If your documentCache in
solrconfig.xml is still at the defaults, it only holds 512 entries. So you're
running all over your disk gathering up the fields to return, especially since
you also specified "fl=*,score". And if you have large stored fields, you're
doin
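For reference, the default documentCache he's referring to, as it appears in
the stock 3.x example solrconfig.xml:

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

With rows=2000 and fl=*,score, a single request already overflows a 512-entry
cache, so most document fetches have to go to disk.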