Re: how to balance index and search
2007/3/17, Chris Hostetter <[EMAIL PROTECTED]>:
> if your indexing while searching is causing problems, one way to reduce
> the impact is to index on a master instance and then use the replication
> scripts to sync it up with a slave instance (where all of your searches
> happen)

I think that is a problem for us because we use Win2003, and I remember the replication scripts also have problems on FreeBSD.

> if you are specifically seeing high CPU when indexing HTML, that's
> probably because the HTML analyzers have to do a lot of complex stuff to
> strip out the HTML ... another option might be to parse that HTML on the
> client side before sending it to Solr.

A spider crawls the HTML data into MS SQL Server. I just get the data from SQL Server and curl it to Solr. Tomorrow I will test this option.

> : I find that indexing HTML makes Tomcat use 100% CPU. It makes search
> : become slow.
> :
> : So how to balance index and search?
> :
> : For the web I use apache+php
> : For solr I use tomcat 6+java1.6
>
> -Hoss

--
regards
jl
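The "parse that HTML on the client side" suggestion could look roughly like the sketch below. It is only an illustration using Python's standard `html.parser`; the names `TextExtractor` and `strip_html` are made up here, and a real crawler would probably need more careful handling of malformed markup.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse the runs of whitespace left behind by removed tags.
        return " ".join(" ".join(self._chunks).split())


def strip_html(html):
    """Return the plain text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The plain text returned by `strip_html` is what would then be posted to Solr, so the server-side analyzer no longer has to strip markup while searches are running.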
Re: how to balance index and search
2007/3/17, Chris Hostetter <[EMAIL PROTECTED]>:
> : Can people from CNET tell how Solr is used on CNET.COM?
>
> I really don't understand your question, here's some links to CNET.com
> that use Solr...
>
> http://www.cnet.com/4244-5_1-0.html?query=ipod
> http://search.news.com/search?q=apple
> http://reviews.cnet.com/4566-3121-0.html

I just want to know CNET.com's index and search architecture, if it can be made public. Many people who use Solr, or want to use it, would like to know and learn from it.

> -Hoss

--
regards
jl
Score / Sort question
How do I force Solr to score documents that contain ALL terms first, before results that contain only some of the terms?

The problem is that I don't want to use an AND (since I am also interested in the OR results), but I do want documents that contain all terms to score higher.

Please advise,
Thanks
Re: how to balance index and search
: I think that is a problem for us because we use Win2003, and I remember replication

The scripts that come with Solr don't work on Windows because they rely on hardlinks to efficiently copy only things that have changed -- but the principle of indexing on one server, creating "snapshots" (which could be true copies instead of hardlinks) and then replicating those snapshots out to slave servers for searching is still a solid one.

The hooks Solr provides for triggering snapshot creation on the master and snapshot installation on the slave make it possible for you to implement those any way that makes sense for your environment.

-Hoss
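As a rough illustration of the "true copies instead of hardlinks" idea, the sketch below takes a snapshot by copying the whole index directory. This is not what the Solr scripts do; the function names and the timestamped directory naming are assumptions made for the example, and it ignores the question of making sure the index is not being written to while the copy runs.

```python
import shutil
import time
from pathlib import Path


def take_snapshot(index_dir, snapshot_root):
    """Copy the index directory into a new timestamped snapshot directory.

    A full copy is slower than hardlinking but works on any filesystem.
    """
    snapshot_root = Path(snapshot_root)
    snapshot_root.mkdir(parents=True, exist_ok=True)
    name = "snapshot." + time.strftime("%Y%m%d%H%M%S")
    tmp = snapshot_root / (name + ".tmp")
    target = snapshot_root / name
    # Copy under a temporary name first so a half-finished copy is never
    # mistaken for a complete snapshot, then rename it into place.
    shutil.copytree(Path(index_dir), tmp)
    tmp.rename(target)
    return target


def latest_snapshot(snapshot_root):
    """Return the newest complete snapshot directory, or None if there is none."""
    snaps = sorted(
        p for p in Path(snapshot_root).glob("snapshot.*")
        if p.is_dir() and not p.name.endswith(".tmp")
    )
    return snaps[-1] if snaps else None
```

A script on the master could call `take_snapshot` from the post-commit hook, and the slave side could copy over whatever `latest_snapshot` returns (for example with an rsync port) before installing it.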
Re: how to balance index and search
: I just want to know CNET.com's index and search architecture, if it can be
: made public.
: Many people who use Solr, or want to use it, would like to know and learn from it.

I'm not sure what to tell you: Solr *is* our search arch.

We have a dozen or so Solr indexes, all of them use the master/slave model -- but they are all configured in various ways based on the nature of the data and the types of queries we do. The news collection doesn't do faceted search, and surfacing new stories immediately is crucial, so they have small cache configs with very low autowarming, and replication cranked up to happen very frequently; meanwhile the product index, where update latency of 20 minutes isn't the end of the world but we do want to support faceted searching, does snapinstalls only every 15 minutes (I think), with big caches that are 100% autowarmed.

-Hoss
Re: Score / Sort question
: How do i force SOLR to score documents that contain ALL terms 1st
: before results that contain some of the terms?

Generally speaking, this is the result you will usually get on random data ... under the covers Lucene uses TF/IDF based weighting of terms, with a coord factor that penalizes documents that don't match all of the query's clauses -- but I'm sure it's possible that sometimes the tf is so high and the idf so low that the score from one term can dominate.

The only solution to your problem that I can think of is to write a custom Similarity class where tf and idf are fixed so only the coordFactor matters.

-Hoss
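To see why fixing tf and idf leaves the coord factor in charge, here is a toy model in Python. It is not Lucene's scoring code: it ignores field norms, boosts and query normalization, and the function names are invented for the example. The only part taken from Lucene's default behaviour is that coord is the fraction of query clauses a document matches.

```python
def coord(overlap, max_overlap):
    """Fraction of the query's clauses that the document matched."""
    return overlap / max_overlap


def toy_score(query_terms, doc_terms):
    """Score with tf and idf pinned to 1.0, so only the match count matters."""
    matched = [t for t in query_terms if t in doc_terms]
    per_term_weight = 1.0  # stands in for tf * idf, held constant
    return len(matched) * per_term_weight * coord(len(matched), len(query_terms))


def rank(query_terms, docs):
    """Return doc ids ordered by toy_score, best first."""
    return sorted(docs, key=lambda d: toy_score(query_terms, docs[d]), reverse=True)
```

With a two-term query, a document matching both terms scores 2.0 and a document matching one scores 0.5, no matter how often the single term occurs, so documents containing all terms always sort first in this model.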
Re: Returning xx number of each group in a single query?
There's nothing like that in Solr right now, but you could write a custom RequestHandler to do it.

In theory you could even write a request handler that just successively called another request handler by name (which could be a param), altering the request params each time based on its input, and consolidating all of the results.

: Date: Fri, 16 Mar 2007 17:59:56 -0700 (PDT)
: From: Brian Lucas <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Returning xx number of each group in a single query?
:
: Is there a way to fetch 5 records with group_id:1, 5 records with group_id:2,
: 5 records with group_id:3, and so forth in a single query?
:
: The facet features don't seem to give me what I need -- same with rows. Any
: ideas on how to do something like this?
:
: --
: View this message in context: http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9525144
: Sent from the Solr - User mailing list archive at Nabble.com.

-Hoss
Re: Score / Sort question
I assumed the tf/idf would behave like this, but it's behaving VERY differently/wrong, so I wonder whether something is wrong with my indexing strategy?

I think for a quicker solution (ok, hack :) ) I'll run two different queries (AND, OR) and merge them. Does Solr support some kind of merging I can leverage, or do I need to do it manually?

Thanks,

On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : How do i force SOLR to score documents that contain ALL terms 1st
> : before results that contain some of the terms?
>
> Generally speaking, this is the result you will usually get on random
> data ... under the covers Lucene uses TF/IDF based weighting of terms,
> with a coord factor that penalizes documents that don't match all of the
> query's clauses -- but I'm sure it's possible that sometimes the tf is so
> high and the idf so low that the score from one term can dominate.
>
> The only solution to your problem that I can think of is to write a
> custom Similarity class where tf and idf are fixed so only the
> coordFactor matters.
>
> -Hoss
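Doing the merge manually on the client is only a few lines. The sketch below assumes each result is a dict with a unique `id` field and that both lists are already in the order Solr returned them; `merge_results` is a made-up helper name, not a Solr feature.

```python
def merge_results(and_hits, or_hits, key="id"):
    """Put the AND hits first, then any OR hits that were not already listed.

    Relative order inside each list is preserved, so documents matching
    all terms always come before documents matching only some of them.
    """
    seen = set()
    merged = []
    for hit in list(and_hits) + list(or_hits):
        if hit[key] not in seen:
            seen.add(hit[key])
            merged.append(hit)
    return merged
```

Since every AND hit normally also appears in the OR results, the de-duplication step is what keeps the final list from showing those documents twice.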
Re: how to balance index and search
2007/3/19, Chris Hostetter <[EMAIL PROTECTED]>:
> : I think it have problem that we use win2003 and i remember replication
>
> The scripts that come with Solr don't work on Windows because they rely
> on hardlinks to efficiently copy only things that have changed -- but the
> principle of indexing on one server, creating "snapshots" (which could be
> true copies instead of hardlinks) and then replicating those snapshots
> out to slave servers for searching is still a solid one.

I am now reading about cwRsync, which is rsync for Windows.

> The hooks Solr provides for triggering snapshot creation on the master
> and snapshot installation on the slave make it possible for you to
> implement those any way that makes sense for your environment.
>
> -Hoss

--
regards
jl
Re: how to balance index and search
2007/3/19, Chris Hostetter <[EMAIL PROTECTED]>:
> : I just wana know CNET.com's index and search architecture if it can be
> : public.
> : Many people who use solr or wanna use,,they all wanna know and learn.
>
> I'm not sure what to tell you: Solr *is* our search arch.

The information below is what I wanted to learn. Thanks, Chris. Maybe this should be added to the wiki. I think people will be happy to read it.

> We have a dozen or so Solr indexes, all of them use the master/slave
> model -- but they are all configured in various ways based on the nature
> of the data and the types of queries we do. The news collection doesn't
> do faceted search, and surfacing new stories immediately is crucial, so
> they have small cache configs with very low autowarming, and replication
> cranked up to happen very frequently; meanwhile the product index, where
> update latency of 20 minutes isn't the end of the world but we do want to
> support faceted searching, does snapinstalls only every 15 minutes (I
> think), with big caches that are 100% autowarmed.
>
> -Hoss

--
regards
jl
Re: Returning xx number of each group in a single query?
How about returning at most 1 result from each group in a single query? For example, a website may have a lot of pages. When Google returns search results, it shows at most one result for each website. I have a similar situation. Is there an easy way to handle this kind of problem?

--
View this message in context: http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9546032
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Score / Sort question
Hey Chris,
wouldn't doing something like this in the query:

(field1:tag1 tag2) OR (field1:tag1 AND tag2)

achieve a similar effect? The documents that have all the tags (tag1 and tag2) will satisfy both conditions and get scores from both, while the documents that don't have both tags will only get a score from the first (the OR) condition and therefore won't score as high. Is this right?

On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : How do i force SOLR to score documents that contain ALL terms 1st
> : before results that contain some of the terms?
>
> Generally speaking, this is the result you will usually get on random
> data ... under the covers Lucene uses TF/IDF based weighting of terms,
> with a coord factor that penalizes documents that don't match all of the
> query's clauses -- but I'm sure it's possible that sometimes the tf is so
> high and the idf so low that the score from one term can dominate.
>
> The only solution to your problem that I can think of is to write a
> custom Similarity class where tf and idf are fixed so only the
> coordFactor matters.
>
> -Hoss
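A small helper for building that kind of query string might look like the sketch below. One detail worth noting: in Lucene query syntax a field prefix applies only to the term right after it, so `field1:tag1 tag2` searches `tag2` in the default field; the sketch therefore uses the grouped form `field1:(...)`. The helper name is made up, and it assumes the default operator is OR and that the terms contain no characters needing escaping.

```python
def all_terms_first_query(field, terms):
    """Build '<any term> OR <all terms>' so full matches score on both clauses."""
    any_clause = "{}:({})".format(field, " ".join(terms))
    all_clause = "{}:({})".format(field, " AND ".join(terms))
    return "{} OR {}".format(any_clause, all_clause)


print(all_terms_first_query("field1", ["tag1", "tag2"]))
# prints: field1:(tag1 tag2) OR field1:(tag1 AND tag2)
```

Whether the extra clause is enough to always rank full matches first still depends on the tf/idf values involved, as discussed earlier in the thread.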
Re: Returning xx number of each group in a single query?
You may want to take a look at the related discussion:
http://www.nabble.com/result-grouping--tf2910425.html#a8131895

Yonik suggested a dynamic priority queue... If the number of things you are grouping by is small, it is probably easier to make multiple calls to Solr.

ryan

On 3/16/07, Brian Lucas <[EMAIL PROTECTED]> wrote:
> Is there a way to fetch 5 records with group_id:1, 5 records with
> group_id:2, 5 records with group_id:3, and so forth in a single query?
>
> The facet features don't seem to give me what I need -- same with rows.
> Any ideas on how to do something like this?
>
> --
> View this message in context: http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9525144
> Sent from the Solr - User mailing list archive at Nabble.com.
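The "multiple calls" approach amounts to one request per group with `rows` set to the per-group limit. The sketch below only builds the request URLs; the base URL, the helper name and the way the group restriction is ANDed onto the user query are assumptions for illustration, while `q` and `rows` are the standard Solr request parameters.

```python
from urllib.parse import urlencode


def group_query_urls(base_url, group_field, group_ids, query, rows=5):
    """Return one select URL per group, each limited to `rows` results."""
    urls = []
    for gid in group_ids:
        params = {
            # Restrict the user's query to a single group.
            "q": "({}) AND {}:{}".format(query, group_field, gid),
            "rows": rows,
        }
        urls.append(base_url + "?" + urlencode(params))
    return urls


urls = group_query_urls(
    "http://localhost:8983/solr/select",  # assumed local Solr instance
    "group_id", [1, 2, 3], "ipod",
)
```

The client then fetches each URL in turn and concatenates the responses, which is simple but costs one round trip per group, so it only makes sense when the number of groups is small.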
Re: Score / Sort question
An example would help. A query and the results that you see.

wunder

On 3/18/07 6:48 PM, "shai deljo" <[EMAIL PROTECTED]> wrote:

> I assumed the tf/idf would behave like this but it's behaving VERY
> differently/wrong so i wonder maybe something is wrong with my
> indexing strategy ?
> I think for a quicker solution (ok, hack :) ) I'll run two different
> queries (AND, OR) and merge them.
> Does SOLR support some kind of merging i can leverage or do i need to
> do it manually ?
> Thanks,
>
> On 3/18/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>>
>> : How do i force SOLR to score documents that contain ALL terms 1st
>> : before results that contain some of the terms?
>>
>> generally speaking this is the result you will usually get on random
>> data ... under the covers Lucene uses TF/IDF based weighting of terms,
>> with a coord factor that penalizes documents that don't match all of
>> the query's clauses -- but i'm sure it's possible that sometimes the tf
>> is so high and the idf so low, that the score from one term can
>> dominate.
>>
>> the only solution to your problem that i can think of is to write a
>> custom Similarity class where tf and idf are fixed so only the
>> coordFactor matters.
>>
>> -Hoss
Re: Returning xx number of each group in a single query?
Thanks to everyone who responded thus far. Simple is good for right now.

Chris, is there a way to do what you describe here (write a request handler that successively calls another request handler by name) in the solrconfig.xml file, or does this require me to write a custom RequestHandler in Java?

Brian

Chris Hostetter wrote:
>
> there's nothing like that in Solr right now, but you could write a custom
> RequestHandler to do it.
>
> in theory you could even write a request handler that just successively
> called another request handler by name (which could be a param) altering
> the request params each time based on its input, and consolidating all of
> the results.
>
> : Date: Fri, 16 Mar 2007 17:59:56 -0700 (PDT)
> : From: Brian Lucas <[EMAIL PROTECTED]>
> : Reply-To: solr-user@lucene.apache.org
> : To: solr-user@lucene.apache.org
> : Subject: Returning xx number of each group in a single query?
> :
> : Is there a way to fetch 5 records with group_id:1, 5 records with
> : group_id:2, 5 records with group_id:3, and so forth in a single query?
> :
> : The facet features don't seem to give me what I need -- same with rows.
> : Any ideas on how to do something like this?
> :
> : --
> : View this message in context:
> : http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9525144
> : Sent from the Solr - User mailing list archive at Nabble.com.
>
> -Hoss

--
View this message in context: http://www.nabble.com/Returning-xx-number-of-each-group-in-a-single-query--tf3417627.html#a9546710
Sent from the Solr - User mailing list archive at Nabble.com.