Re: a bug of solr distributed search

2010-10-27 Thread Toke Eskildsen
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote: > And a third potential reason - it's arguably a feature instead of a bug > for some applications. Depending on how I organize my shards, "give me > the most relevant document from each shard for this search" seems like > it could be useful. You

Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer
Andrzej Bialecki wrote: > On 2010-10-25 11:22, Toke Eskildsen wrote: >> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: >>> But it shows a problem of distributed search without common idf. >>> A doc will get different score in different shard. >> Bingo. >> >> I really don't understand why this funda

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 13:37, Toke Eskildsen wrote: > On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: >> * there is an exact solution to this problem, namely to make two >> distributed calls instead of one (first call to collect per-shard IDFs >> for given query terms, second call to submit a que

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote: > * there is an exact solution to this problem, namely to make two > distributed calls instead of one (first call to collect per-shard IDFs > for given query terms, second call to submit a query rewritten with the > global IDF-s). This solu
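Editor's sketch of the two-call idea quoted above: the first call collects per-shard document frequencies for a term, and the global IDF is then computed from their sums. This is only an illustration under stated assumptions. The class and method names are hypothetical and not Solr API, and the formula is assumed to be the classic Lucene-style idf = 1 + ln(numDocs / (docFreq + 1)).

```java
import java.util.Arrays;
import java.util.List;

class GlobalIdfSketch {
    /** Per-shard statistics for one term. Hypothetical holder, not a Solr class. */
    public static final class ShardStats {
        public final long docFreq; // documents on this shard containing the term
        public final long numDocs; // total documents on this shard
        public ShardStats(long docFreq, long numDocs) {
            this.docFreq = docFreq;
            this.numDocs = numDocs;
        }
    }

    /** Assumed classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1)). */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Sums the per-shard counts gathered by the first call, then computes one IDF. */
    public static double globalIdf(List<ShardStats> shards) {
        long docFreq = 0;
        long numDocs = 0;
        for (ShardStats s : shards) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        return idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // The term is rare on shard A and common on shard B.
        List<ShardStats> shards = Arrays.asList(
                new ShardStats(1, 1000), new ShardStats(500, 1000));
        System.out.println("local idf, shard A: " + idf(1, 1000));
        System.out.println("local idf, shard B: " + idf(500, 1000));
        System.out.println("global idf:         " + globalIdf(shards));
    }
}
```

With purely local statistics the same term is weighted very differently on the two shards, which is why the same document can score differently depending on where it lives. The second call would send the query with the global value so that every shard scores against the same number.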

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 11:22, Toke Eskildsen wrote: > On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: >> But it shows a problem of distributed search without common idf. >> A doc will get different score in different shard. > > Bingo. > > I really don't understand why this fundamental problem with sharding

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote: > But it shows a problem of distributed search without common idf. > A doc will get different score in different shard. Bingo. I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice "use

Re: a bug of solr distributed search

2010-07-25 Thread MitchK
use doc_X from shard_A or >>> shard_B, since they will all have got the same scores. >> >> That only works if the docs are exactly the same - they may not be. >> >> -Yonik >> http://www.lucidimagination.com >> > > -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p995407.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
The Solr version I used is 1.4. 2010/7/26 Li Li : > where is the link of this patch? > > 2010/7/24 Yonik Seeley : >> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote: >>> why do we not send the output of TermsComponent of every node in the >>> cluster to a Hadoop instance? >>> Since TermsComponent

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
Where is the link to this patch? 2010/7/24 Yonik Seeley : > On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote: >> why do we not send the output of TermsComponent of every node in the >> cluster to a Hadoop instance? >> Since TermsComponent does the map-part of the map-reduce concept, Hadoop >> onl

Re: a bug of solr distributed search

2010-07-24 Thread MitchK
distributed IDF (as in the mentioned JIRA issue) to normalize your results' scoring. But the problem mentioned in this mailing-list posting has nothing to do with that... Regards - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-s

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote: > That only works if the docs are exactly the same - they may not be. > Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, > don't they? Documents aren't supposed to be duplicated across shards... so the presence of multiple

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
That only works if the docs are exactly the same - they may not be. Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, don't they? -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html Sent fro

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
>> elements. I guess we could rebuild the current priority queue if we >> detect a duplicate, but that will have an obvious performance impact. >> Any other suggestions? >> >> -Yonik >> http://www.lucidimagination.com >> > -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990551.html

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote: > why do we not send the output of TermsComponent of every node in the > cluster to a Hadoop instance? > Since TermsComponent does the map-part of the map-reduce concept, Hadoop > only needs to reduce the stuff. Maybe we even do not need Hadoop for

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
> http://www.lucidimagination.com > -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990506.html

Re: a bug of solr distributed search

2010-07-22 Thread Chris Hostetter
: As the comments suggest, it's not a bug, but just the best we can do : for now since our priority queues don't support removal of arbitrary FYI: I updated the DistributedSearch wiki to be more clear about this -- it previously didn't make it explicitly clear that docIds were supposed to be uni

Re: a bug of solr distributed search

2010-07-22 Thread Yonik Seeley
As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions?
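Editor's sketch of one possible alternative to rebuilding the queue: collapse duplicates into a map that keeps the best-scoring copy per uniqueKey, and only rank afterwards, so nothing ever has to be removed from a priority queue. This is not how Solr's merge is implemented. The `Hit` type and `mergeTopN` method are hypothetical names, and the sketch assumes scores from different shards are comparable, which the rest of this thread disputes when there is no common IDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BestScoreMergeSketch {
    /** One hit returned by a shard. Hypothetical holder, not Solr's ShardDoc. */
    public static final class Hit {
        public final String id;    // uniqueKey value
        public final String shard; // shard that returned the hit
        public final float score;
        public Hit(String id, String shard, float score) {
            this.id = id;
            this.shard = shard;
            this.score = score;
        }
    }

    /** Keeps the highest-scoring copy of each id, then returns the top n by score. */
    public static List<Hit> mergeTopN(List<Hit> hits, int n) {
        Map<String, Hit> best = new HashMap<>();
        for (Hit h : hits) {
            Hit prev = best.get(h.id);
            if (prev == null || h.score > prev.score) {
                best.put(h.id, h); // first copy seen, or a better-scoring duplicate
            }
        }
        List<Hit> ranked = new ArrayList<>(best.values());
        ranked.sort((a, b) -> Float.compare(b.score, a.score)); // descending score
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc_X", "shard_A", 0.2f),
                new Hit("doc_Y", "shard_A", 0.5f),
                new Hit("doc_X", "shard_B", 0.9f));
        for (Hit h : mergeTopN(hits, 2)) {
            System.out.println(h.id + " from " + h.shard + " score " + h.score);
        }
    }
}
```

The trade-off is memory: every candidate is held in the map before ranking, instead of streaming hits through a bounded queue.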

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
case, Solr sees > the doc_X firstly at shard_A and ignores it at shard_B. That means, that the > doc maybe would occur at page 10 in pagination, although it *should* occur > at page 1 or 2. > > Kind regards, > - Mitch > -- > View this message in context: > http://lucene

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
sees the doc_X firstly at shard_A and ignores it at shard_B. That means, that the doc maybe would occur at page 10 in pagination, although it *should* occur at page 1 or 2. Kind regards, - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search

Re: a bug of solr distributed search

2010-07-21 Thread Siva Kommuri
How about sorting over the score? Would that be possible? On Jul 21, 2010, at 12:13 AM, Li Li wrote: > in QueryComponent.mergeIds. It will remove document which has > duplicated uniqueKey with others. In current implementation, it use > the first encountered. > String prevShard = uniqueD

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983880.html

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
not sure, but I think you can't prevent this without custom coding or > making a document's occurrence unique. > > Kind regards, > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983771

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
que. Kind regards, - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983771.html

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
, which may not be intended by the user. > > Kind regards, > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983675.html >

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
Li Li, this is the intended behaviour, not a bug. Otherwise you could get back the same record in a response for several times, which may not be intended by the user. Kind regards, - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search

a bug of solr distributed search

2010-07-21 Thread Li Li
in QueryComponent.mergeIds. It removes any document whose uniqueKey duplicates another document's. The current implementation keeps the first one encountered. String prevShard = uniqueDoc.put(id, srsp.getShard()); if (prevShard != null) { // duplicate detected
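Editor's sketch of the behaviour described above, written as standalone Java rather than the actual Solr source: a map from uniqueKey to shard detects duplicates, and the first hit encountered for each key wins regardless of score. The `Hit` type and `dedupeFirstSeen` method are hypothetical. The sketch uses `Map.putIfAbsent` instead of the `put` call in the quoted fragment, so the map keeps recording the shard of the hit that was actually kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FirstSeenDedupSketch {
    /** One hit returned by a shard. Hypothetical holder, not a Solr class. */
    public static final class Hit {
        public final String id;    // uniqueKey value
        public final String shard; // shard that returned the hit
        public final float score;
        public Hit(String id, String shard, float score) {
            this.id = id;
            this.shard = shard;
            this.score = score;
        }
    }

    /** Returns the hits in arrival order, dropping any id that was already seen. */
    public static List<Hit> dedupeFirstSeen(List<Hit> hitsInArrivalOrder) {
        Map<String, String> shardById = new HashMap<>();
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hitsInArrivalOrder) {
            // putIfAbsent returns the shard already recorded for this id, or null.
            String prevShard = shardById.putIfAbsent(h.id, h.shard);
            if (prevShard != null) {
                continue; // duplicate detected: the earlier copy wins, whatever its score
            }
            kept.add(h);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("doc_X", "shard_A", 0.2f),
                new Hit("doc_Y", "shard_A", 0.5f),
                new Hit("doc_X", "shard_B", 0.9f));
        for (Hit h : dedupeFirstSeen(hits)) {
            System.out.println(h.id + " from " + h.shard + " score " + h.score);
        }
    }
}
```

In this example the copy of doc_X from shard_A is kept even though the copy from shard_B scores higher, which is the effect reported in this message.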