There is already a patch available that addresses this shortcoming in distributed search:

    http://issues.apache.org/jira/browse/SOLR-1632


On Feb 11, 2010, at 6:56 AM, abhishes wrote:


Thanks, really useful article.

I am wondering about this statement in the article:

"Keep in mind that Solr does not calculate universal term/doc frequencies. At a large scale, it's not likely to matter that tf/idf is calculated at the
shard level; however, if your collection is heavily skewed in its
distribution across servers, you might take issue with the relevance
results. It's probably best to randomly distribute documents to your shards."
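To make the skew concern concrete, here is a small Python sketch comparing the idf each shard computes from its own statistics with the idf a single unsharded index would compute. It assumes the classic Lucene DefaultSimilarity formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard names and document counts are made-up numbers for illustration.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical skewed collection: (numDocs, docFreq of one query term) per shard.
shards = {"shardA": (1000, 900), "shardB": (1000, 10)}

# What each shard computes on its own, from local statistics only.
local_idf = {name: classic_idf(n, df) for name, (n, df) in shards.items()}

# What a single, unsharded index would compute from global statistics.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = classic_idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With these numbers shard B's local idf is about three times the global value and about five times shard A's, so an otherwise identical match scores much higher on shard B. Distributing documents randomly tends to make per-shard document frequencies similar, which keeps each shard's local idf close to the global one.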

So if no universal tf/idf is kept, how does Solr determine the relative rank of two documents that came from different shards in a distributed
search query?
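Roughly speaking, in Solr's distributed search each shard scores its own matches using only its local statistics, and the node that received the request merges the shards' sorted hit lists by those raw scores. The Python below is a simplified sketch of that merge step only, not Solr's actual code; the function name `merge_shard_results` and the (score, doc_id) representation are hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# (score, doc_id) pairs; each shard's list is sorted by descending local score.
ShardHits = List[Tuple[float, str]]


def merge_shard_results(per_shard: List[ShardHits], rows: int) -> ShardHits:
    """Merge per-shard hit lists into one global top-`rows` list by raw score.

    Scores are compared as-is, so any difference in shard-level idf
    carries straight through into the final ordering.
    """
    merged = heapq.merge(*per_shard, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, rows))


shard_a = [(2.1, "a1"), (1.4, "a2")]
shard_b = [(5.3, "b1"), (0.9, "b2")]
print(merge_shard_results([shard_a, shard_b], rows=3))
```

Here the merged top three are b1, a1, a2. Because the scores are never renormalized, a shard whose local idf is inflated pushes its documents up the merged list, which is the problem the distributed-idf patch linked at the top of this thread targets.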

Regards,
Abhishek





Juan Pedro Danculovic-2 wrote:

To scale Solr, take a look at this article:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr



Juan Pedro Danculovic
CTO - www.linebee.com


On Thu, Feb 11, 2010 at 4:12 AM, abhishes <abhis...@gmail.com> wrote:


Suppose I am indexing a very large data set (5 billion rows in a database).

Now I want to use the Solr Core feature to split the index into
manageable chunks.

However, I have two questions:


1. Can cores reside on different physical servers?

2. When a query comes in, will it be answered by the index in one core,
or will the query be sent to all the cores?

My goal is a system that, from the outside, appears as a single large
index, but inside consists of multiple small indexes running on
different hardware machines.
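For reference, Solr's distributed search of this era is requested by sending an ordinary query to any one Solr instance with a `shards` parameter that lists the shards in host:port/path form; that instance forwards the query to every listed shard and returns one merged result list. A minimal Python sketch of building such a request URL follows; the host names, the `title:scalability` query, and the helper name `distributed_select_url` are placeholders assumed for illustration.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, rows=10):
    """Build a select URL asking `coordinator` to fan the query out to `shards`.

    Each shard entry uses the host:port/path form expected by the
    `shards` parameter (no "http://" prefix).
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return f"http://{coordinator}/solr/select?{urlencode(params)}"


# Hypothetical three-machine layout; any one node can act as coordinator.
url = distributed_select_url(
    "solr1:8983",
    ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"],
    "title:scalability",
)
print(url)
```

The client sends one request and gets one merged result list back, which matches the "single large index from outside" goal, while each entry in `shards` can live on a different physical server.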
--
View this message in context:
http://old.nabble.com/Question-on-Solr-Scalability-tp27543068p27543068.html
Sent from the Solr - User mailing list archive at Nabble.com.






