On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote:
> And a third potential reason - it's arguably a feature instead of a bug
> for some applications. Depending on how I organize my shards, "give me
> the most relevant document from each shard for this search" seems like
> it could be useful.
You
Andrzej Bialecki wrote:
> On 2010-10-25 11:22, Toke Eskildsen wrote:
>> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>>> But it shows a problem of distributed search without a common idf.
>>> A doc will get a different score in different shards.
>> Bingo.
>>
>> I really don't understand why this funda
On 2010-10-25 13:37, Toke Eskildsen wrote:
> On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
>> * there is an exact solution to this problem, namely to make two
>> distributed calls instead of one (first call to collect per-shard IDFs
>> for given query terms, second call to submit a que
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
> * there is an exact solution to this problem, namely to make two
> distributed calls instead of one (first call to collect per-shard IDFs
> for given query terms, second call to submit a query rewritten with the
> global IDFs). This solu
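A minimal sketch of that two-call protocol, under the assumption of a hypothetical Shard interface (none of these types are Solr APIs): the first call gathers per-shard document frequencies, they are merged, and the second call ships the resulting global IDFs along with the query.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical two-call protocol; none of these types are Solr APIs.
    class TwoPassIdfSketch {

        interface Shard {
            // Call 1: report this shard's docFreq for each query term plus its doc count.
            Map<String, Long> docFreqs(List<String> terms);
            long maxDoc();
            // Call 2: run the query, scoring with the supplied global idf values.
            List<String> search(String query, Map<String, Double> globalIdf);
        }

        // Merge the per-shard statistics and turn them into global idf weights.
        static Map<String, Double> globalIdf(List<Shard> shards, List<String> terms) {
            long totalDocs = 0;
            Map<String, Long> df = new HashMap<>();
            for (Shard shard : shards) {
                totalDocs += shard.maxDoc();
                shard.docFreqs(terms).forEach((t, n) -> df.merge(t, n, Long::sum));
            }
            Map<String, Double> idf = new HashMap<>();
            for (String t : terms) {
                // Classic Lucene formula: idf = 1 + ln(numDocs / (docFreq + 1)).
                idf.put(t, 1.0 + Math.log((double) totalDocs / (df.getOrDefault(t, 0L) + 1)));
            }
            return idf;
        }
    }

The extra round trip is the cost; the benefit is that every shard then scores with identical term statistics.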
On 2010-10-25 11:22, Toke Eskildsen wrote:
> On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
>> But it shows a problem of distributed search without a common idf.
>> A doc will get a different score in different shards.
>
> Bingo.
>
> I really don't understand why this fundamental problem with sharding
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
> But it shows a problem of distributed search without a common idf.
> A doc will get a different score in different shards.
Bingo.
I really don't understand why this fundamental problem with sharding
isn't mentioned more often. Every time the advice "use
Good morning,
https://issues.apache.org/jira/browse/SOLR-1632
- Mitch
Li Li wrote:
>
> Where is the link to this patch?
>
> 2010/7/24 Yonik Seeley :
>> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>>> why do we not send the output of TermsComponent of every node in the
>>> cluster to a
The Solr version I used is 1.4.
2010/7/26 Li Li :
> Where is the link to this patch?
>
> 2010/7/24 Yonik Seeley :
>> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>>> why do we not send the output of TermsComponent of every node in the
>>> cluster to a Hadoop instance?
>>> Since TermsComponent
Where is the link to this patch?
2010/7/24 Yonik Seeley :
> On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
>> why do we not send the output of TermsComponent of every node in the
>> cluster to a Hadoop instance?
>> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
>> onl
Okay, but then Li Li did something wrong, right?
I mean, if the document exists on only one shard, it should get the same
score whenever one requests it, no?
Of course, this only applies if nothing changes between the requests.
The only remaining problem here would be that you need distribut
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote:
> That only works if the docs are exactly the same - they may not be.
> Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
> shouldn't they?
Documents aren't supposed to be duplicated across shards... so the
presence of multiple
That only works if the docs are exactly the same - they may not be.
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
... In addition to my previous posting:
To keep this in sync we could do two things:
Wait for every server to make sure that everyone uses the same values to
compute the score, and then apply them.
Or: let's say that we collect the new values every 15 minutes. To merge and
send them over the netwo
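A rough illustration of the second option, a periodic refresh of cached global statistics rather than per-query synchronization; the 15-minute interval is taken from the proposal above, and the class and the fetch step are assumptions:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical cache of global document frequencies, refreshed on a fixed schedule.
    class GlobalStatsCache {
        private volatile Map<String, Long> globalDocFreq = new ConcurrentHashMap<>();
        private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

        void start() {
            // Collect and swap in freshly merged stats every 15 minutes.
            scheduler.scheduleAtFixedRate(this::refresh, 0, 15, TimeUnit.MINUTES);
        }

        private void refresh() {
            // Placeholder for asking each shard for its term -> docFreq map
            // and summing the counts into one global map.
            globalDocFreq = fetchAndMergeFromShards();
        }

        private Map<String, Long> fetchAndMergeFromShards() {
            return new ConcurrentHashMap<>();
        }

        long docFreq(String term) {
            return globalDocFreq.getOrDefault(term, 0L);
        }
    }

Scores would then drift slightly between refreshes, but all shards would at least agree on the numbers at any given moment.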
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
> why do we not send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we do not even need Hadoop for
Yonik,
why do we not send the output of TermsComponent of every node in the
cluster to a Hadoop instance?
Since TermsComponent does the map-part of the map-reduce concept, Hadoop
only needs to reduce the stuff. Maybe we do not even need Hadoop for this.
After reducing, every node in the cluste
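The reduce step being described is essentially a per-term sum of the document frequencies each node reports; a tiny sketch under that assumption (the class and the input maps standing in for TermsComponent output are made up):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical reduce step: combine each node's term -> docFreq output
    // into one global map.
    class TermFreqReducer {
        static Map<String, Long> reduce(List<Map<String, Long>> perNodeTermCounts) {
            Map<String, Long> global = new HashMap<>();
            for (Map<String, Long> node : perNodeTermCounts) {
                node.forEach((term, df) -> global.merge(term, df, Long::sum));
            }
            return global;
        }
    }

Whether Hadoop is involved or not, this is all the "reduce" has to do before the merged map is distributed back to the nodes.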
: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary
FYI: I updated the DistributedSearch wiki to be clearer about this --
it previously didn't make it explicitly clear that docIds were supposed to
be unique
As the comments suggest, it's not a bug, but just the best we can do
for now since our priority queues don't support removal of arbitrary
elements. I guess we could rebuild the current priority queue if we
detect a duplicate, but that will have an obvious performance impact.
Any other suggestions?
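One possibility along those lines, as a rough sketch only (not the actual QueryComponent code; the classes and field names are invented): deduplicate by uniqueKey before anything enters the priority queue, keeping the higher-scoring copy, so arbitrary removal is never needed.

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.PriorityQueue;

    // Hypothetical merge that keeps the higher-scoring copy of a duplicated
    // uniqueKey before the results ever reach the priority queue.
    class DedupMergeSketch {

        static class ShardDoc {
            final String uniqueKey;
            final float score;
            ShardDoc(String uniqueKey, float score) {
                this.uniqueKey = uniqueKey;
                this.score = score;
            }
        }

        static PriorityQueue<ShardDoc> merge(List<ShardDoc> allShardDocs, int rows) {
            // First pass: keep only the best-scoring doc per uniqueKey.
            Map<String, ShardDoc> best = new HashMap<>();
            for (ShardDoc doc : allShardDocs) {
                best.merge(doc.uniqueKey, doc,
                    (oldDoc, newDoc) -> newDoc.score > oldDoc.score ? newDoc : oldDoc);
            }
            // Second pass: a min-heap of size `rows` keeps the final top documents.
            PriorityQueue<ShardDoc> queue =
                new PriorityQueue<>(Comparator.comparingDouble((ShardDoc d) -> d.score));
            for (ShardDoc doc : best.values()) {
                queue.offer(doc);
                if (queue.size() > rows) {
                    queue.poll(); // drop the current lowest score
                }
            }
            return queue;
        }
    }

The extra HashMap pass costs memory proportional to the merged result set, which may be the same kind of performance trade-off mentioned above.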
I think what Siva means is that when there are docs with the same url,
we should keep the doc whose score is larger.
This is the right solution.
But it shows a problem of distributed search without a common idf. A doc
will get a different score in different shards.
2010/7/22 MitchK :
>
> It already was sorted by sco
It already was sorted by score.
The problem here is the following:
shard_A and shard_B both contain doc_X.
If you are querying for something, doc_X could have a score of 1.0 at
shard_A and a score of 12.0 at shard_B.
You can never be sure which doc Solr sees first. In the bad case, Solr see
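To make that gap concrete, here is a toy calculation (the shard sizes and document frequencies are invented) using Lucene's classic formula idf = 1 + ln(numDocs / (docFreq + 1)); the same term gets a very different weight on two shards with different statistics, and the same document inherits that difference in its score:

    // Toy numbers showing how per-shard statistics change the idf weight
    // for the very same term, and therefore the score of the same document.
    public class IdfPerShardExample {
        static double idf(long numDocs, long docFreq) {
            return 1.0 + Math.log((double) numDocs / (docFreq + 1));
        }

        public static void main(String[] args) {
            // shard_A: 1,000,000 docs, the term occurs in 50,000 of them.
            System.out.println("shard_A idf = " + idf(1_000_000, 50_000)); // ~4.0
            // shard_B: 1,000,000 docs, the term occurs in only 500 of them.
            System.out.println("shard_B idf = " + idf(1_000_000, 500));    // ~8.6
        }
    }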
How about sorting by score? Would that be possible?
On Jul 21, 2010, at 12:13 AM, Li Li wrote:
> in QueryComponent.mergeIds. It will remove documents whose uniqueKey
> is duplicated across shards. In the current implementation, it uses
> the first one encountered.
> String prevShard = uniqueD
I don't know much about the code.
Maybe you can tell me which file you are referring to?
However, from the comments one can see that the problem is known, but it
was decided to let it happen because of constraints of the Java version
being used.
- Mitch
Yes. This will make users think our search engine has a bug.
From the comments in the code, more work is needed:
    if (prevShard != null) {
      // For now, just always use the first encountered since we can't currently
      // remove the previous one added to the priority queue
Ah, okay. I understand your problem. Why should doc X be at position 1 when
I search the first time, and at position 8 when I search the second time -
right?
I am not sure, but I think you can't prevent this without custom coding or
without making a document's occurrence unique.
Kind rega
But users will think there is something wrong with it when they
search the same query but get different results.
2010/7/21 MitchK :
>
> Li Li,
>
> this is the intended behaviour, not a bug.
> Otherwise you could get back the same record several times in a
> response, which may not be intended
Li Li,
this is the intended behaviour, not a bug.
Otherwise you could get back the same record several times in a
response, which may not be intended by the user.
Kind regards,
- Mitch