I don't think that will solve the relevance issues, given that the IDF (described at http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html) is computed over all the documents in the index, not just the ones matching a particular field value. In the end, though, the effect may be negligible. Can you test it out fairly quickly?
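To make the IDF concern concrete, here is a rough sketch in Python. It assumes the classic Lucene default formula, idf = 1 + ln(numDocs / (docFreq + 1)); the vertical names and all document counts are made up for illustration, and the function names are hypothetical.

```python
import math

def idf(doc_freq, num_docs):
    """Classic Lucene-style IDF (assumed): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical counts for one term (say "ford"): rare in news, common in autos.
COUNTS = {
    "news":  {"num_docs": 100_000, "doc_freq": 50},
    "autos": {"num_docs": 400_000, "doc_freq": 200_000},
}

def idf_separate_index(vertical):
    """IDF the term would get if the vertical lived in its own index."""
    c = COUNTS[vertical]
    return idf(c["doc_freq"], c["num_docs"])

def idf_single_index():
    """IDF the term gets in one shared index. Filtering on 'vertical'
    at query time does not change these index-wide totals."""
    total_docs = sum(c["num_docs"] for c in COUNTS.values())
    total_freq = sum(c["doc_freq"] for c in COUNTS.values())
    return idf(total_freq, total_docs)

if __name__ == "__main__":
    print("news-only index :", round(idf_separate_index("news"), 3))
    print("shared index    :", round(idf_single_index(), 3))
```

With these invented numbers the term looks much less distinctive in the shared index than it would in a news-only index, which is the kind of skew worth measuring on real data before deciding.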

One other thing to think about with multiple indexes is whether keeping them separate affords you some extra flexibility at the cost of some more up-front work. For instance, news is probably updated much more frequently than classifieds, so you may want to tune it for frequent updates and possibly even give it more hardware, whereas classifieds may not be as critical (or vice versa, I'm not in the news biz). Naturally, the tradeoff is that you need to develop tools to manage these various indexes, whereas the single index approach is already pretty well understood.

I would expect that as the multicore (https://issues.apache.org/jira/browse/SOLR-350) patch evolves, it is going to bring in more management tools for working with various indexes (perhaps you can donate your expertise if you go this route?)

-Grant

On Nov 5, 2007, at 10:27 AM, Tim Archambault wrote:

Good points Grant. I'm envisioning my front end working so that a user would
never be able to search across all the verticals at once.

EVERY query would inject "vertical:jobs" or "vertical:news" or
"vertical:Autos", etc.

This may detrimentally affect my faceted result sets, so I'll have to think
about this more.

Wouldn't this approach overcome my relevancy and scoring issues?
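The per-query vertical restriction described above could be sketched roughly as follows. This is only an illustration: the Solr URL, the list of allowed verticals, and the function name are assumptions, and it uses Solr's `fq` (filter query) parameter as one possible way to inject the restriction rather than appending it to the user's query string.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Assumed set of values stored in the "vertical" string field.
ALLOWED_VERTICALS = {"news", "jobs", "homes", "autos", "classifieds", "community"}

def build_search_url(user_query, vertical):
    """Build a select URL that always restricts results to one vertical."""
    if vertical not in ALLOWED_VERTICALS:
        raise ValueError(f"unknown vertical: {vertical!r}")
    params = [
        ("q", user_query),                 # what the user typed
        ("fq", f"vertical:{vertical}"),    # injected restriction
    ]
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"

if __name__ == "__main__":
    print(build_search_url("honda civic", "autos"))
```

A filter like this narrows which documents come back, but the term statistics used for scoring still come from the whole index, so it does not by itself isolate one vertical's rankings from another's.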

On 11/5/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

One reason to consider separate indexes is in terms of relevance.  Do
you want content from classifieds affecting the rankings of your news
searches?  May not be an issue for you depending on your term
distributions, but might be something to consider.  As you suspect,
though, having multiple indexes will require more management of the
various instances.  Perhaps you can logically group things to only
have a couple of indexes? For instance, maybe home, auto, classifieds
are similar in content and structure and news and community-generated
content are similar?

-Grant

On Nov 5, 2007, at 9:34 AM, Tim Archambault wrote:

Typical newspaper site with: news, jobs, homes, autos, classifieds,
community-generated content, guesstimate of 0.5 million documents

Do I really need to create a different Solr index for each vertical?
How inefficient is it to add a few additional fields for each content
type?

Thinking of having a string field named "vertical" that would be used
to segment by the verticals above.

My intuition is that most of the additional fields would be numbers:
integers, prices, decimals.
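As a rough picture of the single-index layout being asked about, the sketch below builds a Solr XML update message in which documents from different verticals share a "vertical" string field and carry only the extra fields their content type needs. All field names other than "vertical" (id, title, price, year, salary) and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_add_message(docs):
    """Serialize a list of dicts into a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical documents: common fields plus a few numeric, per-vertical ones.
SAMPLE_DOCS = [
    {"id": "autos-1", "vertical": "autos", "title": "Used sedan",
     "price": 8500, "year": 2003},
    {"id": "jobs-1", "vertical": "jobs", "title": "Night editor",
     "salary": 42000},
    {"id": "news-1", "vertical": "news", "title": "Council votes on budget"},
]

if __name__ == "__main__":
    print(build_add_message(SAMPLE_DOCS))
```

Fields a document does not use are simply left out of that document, so the schema has to declare the union of all per-vertical fields.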

Thanks,

Tim

--
True innovation is not just about changing a product, a service or
even a marketplace; it's also about recognizing and relishing the need
to change yourself.

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




