Re: solr instances for different content?

Grant Ingersoll Mon, 05 Nov 2007 09:01:10 -0800

I don't think that will solve the relevance issues, given that the IDF (described at http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html) is per document, not per field. In the end, though, it may be negligible. Can you test it out fairly quickly?
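
One way to read the IDF point above: the document counts behind IDF come from the whole index, so filtering on a vertical at query time does not change them. The sketch below uses the classic Lucene formula, idf = 1 + ln(numDocs / (docFreq + 1)), to compare a term's IDF in a news-only index against a combined index. All counts are made-up assumptions for illustration (only the .5 million total echoes the estimate in this thread), and the function name is hypothetical.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts for a term such as "sale": rare in news,
# very common in classifieds. These numbers are assumptions.
NEWS_DOCS = 100_000
NEWS_DOCS_WITH_TERM = 50
OTHER_DOCS = 400_000
OTHER_DOCS_WITH_TERM = 120_000

# Separate news index: only news documents are counted.
idf_news_only = classic_idf(NEWS_DOCS_WITH_TERM, NEWS_DOCS)

# Single combined index: counts span every vertical, even when the
# query itself is restricted to vertical:news.
idf_combined = classic_idf(
    NEWS_DOCS_WITH_TERM + OTHER_DOCS_WITH_TERM,
    NEWS_DOCS + OTHER_DOCS,
)

print(f"news-only index IDF: {idf_news_only:.2f}")
print(f"combined index IDF:  {idf_combined:.2f}")
```

With these assumed counts the term is weighted much more heavily in the news-only index than in the combined one. Whether the difference matters in practice depends on the real term distributions, which is why testing it is worthwhile.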

One other thing to think about with multiple indexes is whether or not keeping them separate affords you some extra flexibility at the cost of some more up front work? For instance, news is probably updated much more frequently than classifieds and so you may want to tune it for frequent updates and possibly even give it more hardware, whereas classifieds may not be as critical (or vice versa, I'm not in the news biz.) Naturally, the tradeoff is you need to develop tools to manage these various indexes, whereas the single index approach is already pretty well understood.

I would expect that as the multicore (https://issues.apache.org/jira/browse/SOLR-350) patch evolves, it is going to bring in more management tools for working with various indexes (perhaps you can donate your expertise if you go this route?)


-Grant

On Nov 5, 2007, at 10:27 AM, Tim Archambault wrote:

Good points Grant. I'm envisioning my front end working so that a user would
never be able to search across all the verticals at once.

EVERY query would inject "vertical:jobs" or "vertical:news" or
"vertical:Autos", etc.

This may detrimentally affect my faceted result sets so I'll have to think
about this more.
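
A minimal sketch of the query injection described above, assuming the restriction is sent as a Solr filter query (the fq parameter) alongside the user's q. The host and path, the helper name, and the "category" facet field are assumptions for illustration; only the "vertical" field comes from this thread.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust for your install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(user_query, vertical, facet_field=None):
    """Build a select URL that always restricts results to one vertical."""
    params = [
        ("q", user_query),
        # The filter query is added by the front end, never by the user.
        ("fq", f"vertical:{vertical}"),
    ]
    if facet_field:
        # Facet counts are computed over the filtered result set.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return SOLR_SELECT + "?" + urlencode(params)


# "category" is a hypothetical facet field.
url = build_query_url("used honda", "Autos", facet_field="category")
print(url)
```

Because the filter narrows which documents match and are counted in facets, but not the index-wide statistics used for scoring, this addresses the "never search across verticals" requirement without by itself changing term weights.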

Wouldn't this approach overcome my relevancy and scoring issues?

On 11/5/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:


One reason to consider separate indexes is in terms of relevance.  Do
you want content from classifieds affecting the rankings of your news
searches?  May not be an issue for you depending on your term
distributions, but might be something to consider.  As you suspect,
though, having multiple indexes will require more management of the
various instances.  Perhaps you can logically group things to only
have a couple of indexes? For instance, maybe home, auto, classifieds
are similar in content and structure and news and community-generated
content are similar?

-Grant

On Nov 5, 2007, at 9:34 AM, Tim Archambault wrote:

Typical newspaper site with: news, jobs, homes, autos, classifieds,
community-generated content, guesstimate of .5 million documents.

Do I really need to create a different solr index for each vertical?
How inefficient is it to add a few additional fields for each content
type?

Thinking of having a string field named "vertical" that would be used
to segment by the verticals above.

My intuition is that most of the additional fields would be numbers:
integers, prices, decimals.

Thanks,

Tim

--
True innovation is not just about changing a product, a service or
even a marketplace; it's also about recognizing and relishing the need
to change yourself.


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007.  Sign up now!  http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

