Re: a bug of solr distributed search

2010-10-27 Thread Toke Eskildsen
On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote:
> And a third potential reason - it's arguably a feature instead of a bug
> for some applications. Depending on how I organize my shards, "give me
> the most relevant document from each shard for this search" seems like
> it could be useful.

You

Re: a bug of solr distributed search

2010-10-26 Thread Ron Mayer
Andrzej Bialecki wrote:
> On 2010-10-25 11:22, Toke Eskildsen wrote:
>> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>>> But it shows a problem of distributed search without common idf.
>>> A doc will get a different score in different shards.
>> Bingo.
>>
>> I really don't understand why this funda

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 13:37, Toke Eskildsen wrote:
> On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
>> * there is an exact solution to this problem, namely to make two
>> distributed calls instead of one (first call to collect per-shard IDFs
>> for given query terms, second call to submit a que

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
> * there is an exact solution to this problem, namely to make two
> distributed calls instead of one (first call to collect per-shard IDFs
> for given query terms, second call to submit a query rewritten with the
> global IDF-s). This solu
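
A minimal sketch of the first of the two calls described above, in Python for brevity. It assumes each shard reports its document count and per-term document frequencies (rather than finished IDF values, which cannot be combined directly), and it uses the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)). The names ShardStats, classic_idf and global_idf are hypothetical and are not part of Solr.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ShardStats:
    """Statistics one shard would report in the first distributed call (assumed shape)."""
    num_docs: int              # number of documents in the shard
    doc_freq: Dict[str, int]   # term -> number of documents in the shard containing it


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def global_idf(shards: List[ShardStats], terms: Iterable[str]) -> Dict[str, float]:
    """Sum document counts and document frequencies over all shards,
    then compute one idf per term that every shard can share."""
    total_docs = sum(s.num_docs for s in shards)
    result = {}
    for term in terms:
        total_df = sum(s.doc_freq.get(term, 0) for s in shards)
        result[term] = classic_idf(total_df, total_docs)
    return result
```

With a small shard of 100 documents (49 containing the term) and a large shard of 900 documents (50 containing it), the local idf values differ considerably, while global_idf gives both shards the same value. The second call would then score the query with these shared values.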

Re: a bug of solr distributed search

2010-10-25 Thread Andrzej Bialecki
On 2010-10-25 11:22, Toke Eskildsen wrote:
> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>> But it shows a problem of distributed search without common idf.
>> A doc will get a different score in different shards.
>
> Bingo.
>
> I really don't understand why this fundamental problem with sharding

Re: a bug of solr distributed search

2010-10-25 Thread Toke Eskildsen
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
> But it shows a problem of distributed search without common idf.
> A doc will get a different score in different shards.

Bingo.

I really don't understand why this fundamental problem with sharding isn't mentioned more often. Every time the advice "use

Re: a bug of solr distributed search

2010-07-25 Thread MitchK
Good morning,

https://issues.apache.org/jira/browse/SOLR-1632

- Mitch

Li Li wrote:
>
> where is the link of this patch?
>
> 2010/7/24 Yonik Seeley:
>> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>>> why don't we send the output of TermsComponent of every node in the
>>> cluster to a

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
The Solr version I used is 1.4.

2010/7/26 Li Li:
> where is the link of this patch?
>
> 2010/7/24 Yonik Seeley:
>> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>>> why don't we send the output of TermsComponent of every node in the
>>> cluster to a Hadoop instance?
>>> Since TermsComponent

Re: a bug of solr distributed search

2010-07-25 Thread Li Li
Where is the link to this patch?

2010/7/24 Yonik Seeley:
> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>> why don't we send the output of TermsComponent of every node in the
>> cluster to a Hadoop instance?
>> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
>> onl

Re: a bug of solr distributed search

2010-07-24 Thread MitchK
Okay, but then Li Li did something wrong, right? I mean, if the document exists only at one shard, it should get the same score whenever one requests it, no? Of course, this only applies if nothing gets changed between the requests. The only remaining problem here would be that you need distribut

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote:
> That only works if the docs are exactly the same - they may not be.
> Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
> shouldn't they?

Documents aren't supposed to be duplicated across shards... so the presence of multiple

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
> That only works if the docs are exactly the same - they may not be.

Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same, shouldn't they?

--
View this message in context: http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html
Sent from the So

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
... In addition to my previous posting: to keep this in sync we could do two things. Wait for every server, to make sure that everyone uses the same values to compute the score, and then apply them. Or: let's say that we collect the new values every 15 minutes. To merge and send them over the netwo

Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
> why don't we send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we even do not need Hadoop for

Re: a bug of solr distributed search

2010-07-23 Thread MitchK
Yonik, why don't we send the output of TermsComponent of every node in the cluster to a Hadoop instance? Since TermsComponent does the map part of the map-reduce concept, Hadoop only needs to reduce the stuff. Maybe we do not even need Hadoop for this. After reducing, every node in the cluste

Re: a bug of solr distributed search

2010-07-22 Thread Chris Hostetter
: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary

FYI: I updated the DistributedSearch wiki to be more clear about this -- it previously didn't make it explicitly clear that docIds were supposed to be uni

Re: a bug of solr distributed search

2010-07-22 Thread Yonik Seeley
As the comments suggest, it's not a bug, but just the best we can do for now since our priority queues don't support removal of arbitrary elements. I guess we could rebuild the current priority queue if we detect a duplicate, but that will have an obvious performance impact. Any other suggestions?
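
A minimal sketch of one way around the priority-queue limitation, in Python for brevity. It is not the code in QueryComponent.mergeIds. It deduplicates by uniqueKey before ranking, keeping the highest-scoring copy, so nothing ever has to be removed from a queue. The function name merge_ids, the shard labels and the input shape are hypothetical, and the approach assumes all per-shard results fit in memory.

```python
import heapq
from typing import Dict, List, Tuple


def merge_ids(
    shard_results: Dict[str, List[Tuple[str, float]]],
    rows: int,
) -> List[Tuple[str, float, str]]:
    """Merge per-shard (uniqueKey, score) lists into the top `rows` documents.

    When the same uniqueKey comes back from several shards, the copy with the
    highest score wins, independent of the order in which shards respond.
    """
    best: Dict[str, Tuple[float, str]] = {}  # uniqueKey -> (score, shard)
    for shard, docs in shard_results.items():
        for doc_id, score in docs:
            prev = best.get(doc_id)
            if prev is None or score > prev[0]:
                best[doc_id] = (score, shard)

    # Rank only after deduplication; no element is ever removed from a heap.
    top = heapq.nlargest(rows, best.items(), key=lambda kv: kv[1][0])
    return [(doc_id, score, shard) for doc_id, (score, shard) in top]
```

This makes the result independent of shard response order, but it does not make scores from different shards comparable; without a common idf the "highest" score may still be an artifact of one shard's term statistics.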

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
I think what Siva means is that when there are docs with the same url, keep the doc whose score is highest. This is the right solution. But it shows a problem of distributed search without common idf. A doc will get a different score in different shards.

2010/7/22 MitchK:
>
> It already was sorted by sco

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
It was already sorted by score. The problem here is the following: shard_A and shard_B both contain doc_X. If you are querying for something, doc_X could have a score of 1.0 at shard_A and a score of 12.0 at shard_B. You can never be sure which doc Solr sees first. In the bad case, Solr see

Re: a bug of solr distributed search

2010-07-21 Thread Siva Kommuri
How about sorting over the score? Would that be possible?

On Jul 21, 2010, at 12:13 AM, Li Li wrote:
> in QueryComponent.mergeIds. It will remove document which has
> duplicated uniqueKey with others. In current implementation, it use
> the first encountered.
> String prevShard = uniqueD

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
I don't know much about the code. Maybe you can tell me which file you are referring to? However, from the comments one can see that the problem is known, but it was decided to let it happen because of system requirements in the Java version. - Mitch

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
Yes. This will make users think our search engine has a bug. From the comments in the code, there is more work to do:

    if (prevShard != null) {
      // For now, just always use the first encountered since we can't currently
      // remove the previous one added to the pri

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
Ah, okay. I understand your problem. Why should doc x be at position 1 when searching for the first time, and when I search for the 2nd time it occurs at position 8 - right? I am not sure, but I think you can't prevent this without custom coding or making a document's occurrence unique. Kind rega

Re: a bug of solr distributed search

2010-07-21 Thread Li Li
But users will think there is something wrong when they search with the same query but get different results.

2010/7/21 MitchK:
>
> Li Li,
>
> this is the intended behaviour, not a bug.
> Otherwise you could get back the same record in a response for several
> times, which may not be intended

Re: a bug of solr distributed search

2010-07-21 Thread MitchK
Li Li, this is the intended behaviour, not a bug. Otherwise you could get back the same record several times in a response, which may not be intended by the user. Kind regards, - Mitch