On Tue, 2010-10-26 at 15:48 +0200, Ron Mayer wrote:
> And a third potential reason - it's arguably a feature instead of a bug
> for some applications. Depending on how I organize my shards, "give me
> the most relevant document from each shard for this search" seems like
> it could be useful.
You […]
On Mon, 2010-10-25 at 11:50 +0200, Andrzej Bialecki wrote:
> * there is an exact solution to this problem, namely to make two
> distributed calls instead of one (first call to collect per-shard IDFs
> for given query terms, second call to submit a query rewritten with the
> global IDF-s). This solution […]
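To make the two-call idea concrete, here is a minimal sketch of the first pass. "Shard", docFreq() and maxDoc() are hypothetical stand-ins for illustration, not Solr's actual API:

import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the two-call approach quoted above.
// The Shard interface is a hypothetical stand-in, not Solr's API.
interface Shard {
    long docFreq(String term); // per-shard document frequency
    long maxDoc();             // per-shard document count
}

class GlobalIdfSketch {
    // Call 1: collect per-shard statistics and merge them.
    static Map<String, Double> globalIdfs(Shard[] shards, String[] terms) {
        Map<String, Long> df = new HashMap<>();
        long numDocs = 0;
        for (Shard shard : shards) {
            numDocs += shard.maxDoc();
            for (String term : terms) {
                df.merge(term, shard.docFreq(term), Long::sum);
            }
        }
        // Classic Lucene-style IDF computed over the merged counts.
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Long> e : df.entrySet()) {
            idf.put(e.getKey(), 1.0 + Math.log((double) numDocs / (e.getValue() + 1)));
        }
        return idf;
    }
    // Call 2 would then submit the query together with these IDFs, so
    // every shard scores against the same global statistics.
}

The price, of course, is a second round trip per query.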
On Thu, 2010-07-22 at 04:21 +0200, Li Li wrote:
> But it shows a problem of distributed search without common IDF.
> A doc will get different scores in different shards.

Bingo.

I really don't understand why this fundamental problem with sharding
isn't mentioned more often. Every time the advice "use […]
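To put numbers on the problem, a small sketch using the classic Lucene IDF formula; the document counts below are invented purely for illustration:

// Classic Lucene IDF: idf = 1 + ln(numDocs / (docFreq + 1)).
// It depends on per-shard statistics, so identical documents on
// different shards get different scores.
class PerShardIdfDemo {
    static double idf(long numDocs, long docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        double shardA = idf(1_000_000, 10);      // term is rare on shard_A
        double shardB = idf(1_000_000, 100_000); // term is common on shard_B
        System.out.printf("shard_A idf=%.2f, shard_B idf=%.2f%n", shardA, shardB);
        // prints roughly 12.42 vs. 3.30: the same document scores very
        // differently depending on which shard it lives in.
    }
}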
>>> […] use doc_X from shard_A or
>>> shard_B, since they will all have got the same scores.
>>
>> That only works if the docs are exactly the same - they may not be.
>>
>> -Yonik
>> http://www.lucidimagination.com
The Solr version I used is 1.4.
where is the link of this patch?
2010/7/24 Yonik Seeley :
> […]
[…] distributed IDF (like in the mentioned JIRA issue) to normalize your
results' scoring. But the problem mentioned in this mailing-list posting
has nothing to do with that...

Regards
- Mitch
On Fri, Jul 23, 2010 at 2:40 PM, MitchK wrote:
>> That only works if the docs are exactly the same - they may not be.
>
> Ahm, what? Why? If the uniqueID is the same, the docs *should* be the
> same, shouldn't they?

Documents aren't supposed to be duplicated across shards... so the
presence of multiple […]
> That only works if the docs are exactly the same - they may not be.

Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
On Fri, Jul 23, 2010 at 2:23 PM, MitchK wrote:
> why do we not send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we even do not need Hadoop for […]

[…]

-Yonik
http://www.lucidimagination.com
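For illustration, the "reduce" step MitchK describes could be as simple as folding per-shard term statistics into one map; a sketch in plain Java, where modeling each shard's TermsComponent output as a term -> docFreq map is an assumption made for illustration:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Fold per-shard term statistics (term -> docFreq, an assumed model of
// TermsComponent output) into global document frequencies.
class TermStatsReducer {
    static Map<String, Long> reduce(List<Map<String, Long>> perShard) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> shardStats : perShard) {
            for (Map.Entry<String, Long> e : shardStats.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return global;
    }
}

At this size (one counter per query term per shard) a full Hadoop job does look like overkill, which seems to be where the quoted message was heading.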
: As the comments suggest, it's not a bug, but just the best we can do
: for now since our priority queues don't support removal of arbitrary
: elements.

FYI: I updated the DistributedSearch wiki to be more clear about this --
it previously didn't make it explicitly clear that docIds were supposed to
be unique.
As the comments suggest, it's not a bug, but just the best we can do
for now since our priority queues don't support removal of arbitrary
elements. I guess we could rebuild the current priority queue if we
detect a duplicate, but that will have an obvious performance impact.
Any other suggestions?
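As a rough sketch of what such a rebuild could look like (hypothetical ShardDoc type, not the actual QueryComponent code): java.util.PriorityQueue stands in for Lucene's priority queue, and its linear-time remove() is roughly the rebuild cost being discussed.

import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a merged-result entry.
class ShardDoc {
    final String id;   // uniqueKey
    final float score;
    ShardDoc(String id, float score) { this.id = id; this.score = score; }
}

class DedupQueue {
    // usage: new PriorityQueue<>(Comparator.comparingDouble((ShardDoc d) -> d.score))
    static void insertDedup(PriorityQueue<ShardDoc> queue, ShardDoc doc) {
        ShardDoc existing = null;
        for (ShardDoc d : queue) {          // O(n) scan for a duplicate key
            if (d.id.equals(doc.id)) { existing = d; break; }
        }
        if (existing == null) {
            queue.add(doc);                 // no duplicate: normal insert
        } else if (doc.score > existing.score) {
            queue.remove(existing);         // linear removal + re-heap: the
            queue.add(doc);                 // performance impact noted above
        }                                   // else keep the better copy
    }
}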
[…] In this case, Solr sees doc_X first at shard_A and ignores it at
shard_B. That means the doc might occur at page 10 in pagination, although
it *should* occur at page 1 or 2.

Kind regards,
- Mitch
How about sorting by score? Would that be possible?
On Jul 21, 2010, at 12:13 AM, Li Li wrote:
> […]
Not sure, but I think you can't prevent this without custom coding or
making a document's occurrence unique.

Kind regards,
- Mitch
Li Li,

this is the intended behaviour, not a bug.
Otherwise you could get back the same record several times in one
response, which may not be intended by the user.

Kind regards,
- Mitch
in QueryComponent.mergeIds. It will remove documents that have a
duplicated uniqueKey. In the current implementation, it uses the first
one encountered:

String prevShard = uniqueDoc.put(id, srsp.getShard());
if (prevShard != null) {
  // duplicate detected; per the above, the first-seen copy wins and
  // this later occurrence is skipped
}
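A toy run of that put()-based check shows the order dependence (shard names invented for illustration):

import java.util.HashMap;
import java.util.Map;

// With put(), whichever shard's response is processed first claims the
// uniqueKey; later copies come back non-null and are flagged as duplicates.
class FirstEncounteredDemo {
    public static void main(String[] args) {
        Map<String, String> uniqueDoc = new HashMap<>();

        // shard_A happens to respond first and claims doc "42"
        System.out.println(uniqueDoc.put("42", "shard_A")); // null: kept

        // shard_B's copy of "42" is then detected as a duplicate
        System.out.println(uniqueDoc.put("42", "shard_B")); // "shard_A"

        // Had shard_B responded first, its copy (and its score) would have
        // survived instead: the merged result depends on response order.
    }
}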