phrase slop param in dismax handler

2008-01-05 Thread anuvenk
How does adding a phrase slop in the handler help? I tried ps=25 along with some pf values. I assumed it means this: for a search term such as 'child custody battle', documents that have the words 'child', 'custody', 'battle' within 25 words of one another will rank high. Is that correct?

Re: phrase slop param in dismax handler

2008-01-05 Thread Otis Gospodnetic
I'm not looking at the docs to double-check this, but the ps option lets you boost exact phrase matches higher. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
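For reference, in the dismax handler pf names the fields in which the whole query string is also tried as a phrase for boosting, and ps is the slop allowed on those phrase queries. The sketch below only assembles the request parameters discussed in this thread. The handler name passed as qt, the field names, and the boosts are illustrative assumptions rather than anyone's actual configuration.

```python
from urllib.parse import urlencode


def dismax_params(user_query, phrase_fields, phrase_slop):
    """Build request parameters for a dismax query with phrase boosting."""
    return {
        "qt": "dismax",                  # assumed handler name
        "q": user_query,                 # raw user input
        "pf": " ".join(phrase_fields),   # fields where the full phrase is boosted
        "ps": str(phrase_slop),          # slop applied to the pf phrase queries
    }


# Hypothetical fields and boosts, for illustration only.
params = dismax_params("child custody battle", ["text^0.8", "name^2.0"], 25)
query_string = urlencode(params)
```

A smaller ps requires the terms to sit closer to their original phrase order before the pf boost applies; ps=0 boosts only exact phrase matches.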

Re: solr word delimiter

2008-01-05 Thread Otis Gospodnetic
It sounds like you simply want to drop solr.WordDelimiterFilterFactory from your analyzer definition, no? Otis

Re: morelikethishandler

2008-01-05 Thread Otis Gospodnetic
MLT - give it an ID of a doc and it will return similar docs. DisMax - give it a query string and it will construct a "parametric" query with boosts defined in solrconfig.xml. Different beasts for different uses. Otis

Re: solr with hadoop

2008-01-05 Thread Otis Gospodnetic
Evgeniy, Two simple options: 1) take your index, put it on N Solr search servers, and put them behind a load balancer 2) take your index, split it in N (or create N smaller indices from scratch) and put it on N Solr search servers (and see SOLR-303) Each will help in a different way and it soun

Re: How the star operator works

2008-01-05 Thread Otis Gospodnetic
My first guess would be that this is related to Wildcard queries not being analyzed. Check the Lucene FAQ, I believe the explanation is there. Also, go to the Solr Admin page and run your query in the Analysis section of the Admin to see what's going on. Otis

Re: solr word delimiter

2008-01-05 Thread anuvenk
That's what I'm thinking too. If I remove the solr.worddelimiter filter from both index and query, the word h1-b will remain as is in the index, correct? So if someone searches for h1b (without hyphens), would it still return the h1-b doc?

How does solr rank multiple docs with same score

2008-01-05 Thread anuvenk
I noticed that the top 10 results for a particular search term had the same score. In such cases how does solr determine which should get the first place, second and so on?

what are tf,idf,fieldNorm,queryNorm.?

2008-01-05 Thread anuvenk
I understand tf means term frequency. For example, if the search term is 'chapter 7', does tf mean how frequently 'chapter 7' occurs in the docs? Does it take into account the total number of words in a doc to determine frequency? Also, what are idf, fieldNorm and queryNorm? Trying to understand how sol

java client for java 1.4 solr 2.0

2008-01-05 Thread Sean Laval
Can anyone offer any advice as to whether there is a Java client that will work on Java 1.4 against 2.0? I have seen various references to Java clients, but there doesn't seem to be one included in the solr 2.0 distribution. I think there is one intended for solr 3.0 but of course th

Re: solr word delimiter

2008-01-05 Thread Yonik Seeley
On Jan 5, 2008 2:28 PM, anuvenk <[EMAIL PROTECTED]> wrote: > Thats what i'm thinking too. If i remove solr.worddelimiter filter from both > index and query, the word h1-b will remain as is in the index correct, so if > someone searches for h1b (without hyphens) would it still return the h1-b > doc.

Re: How does solr rank multiple docs with same score

2008-01-05 Thread Yonik Seeley
On Jan 5, 2008 3:53 PM, anuvenk <[EMAIL PROTECTED]> wrote: > I noticed that the top 10 results for a particular search term had the same > score. In such cases how does solr determine which should get the first > place, second and so on? Ties are the same as in lucene... internal docid (equiv to t
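The tie-break described here can be sketched as a two-key sort: score descending first, then internal docid ascending. The (docid, score) pairs below are made-up illustration data, and this is a plain Python model of the ordering rule, not Lucene's actual collector code.

```python
def rank(hits):
    """Order (docid, score) pairs by score descending, then docid ascending."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical hits: three documents tie on score 1.5.
hits = [(7, 1.5), (2, 1.5), (9, 2.0), (4, 1.5)]
ranked = rank(hits)
```

With this data the single higher-scoring document comes first, and the three tied documents follow in docid order 2, 4, 7.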

Re: solr word delimiter

2008-01-05 Thread anuvenk
The worddelimiter filter is set to generatewordparts=1,generatenumberparts=1,catenatewords=1,catenatenumbers=1 both at index and querytime. Now i have this synonym mapping k-1 => k1 visa Here is the parsedquery_ToString +(text:"k (1 k) 1 visa"^0.8 | name:"k (1 k) 1 visa"^2.0)~0.01 (text:"k (1 k

Re: what are tf,idf,fieldNorm,queryNorm.?

2008-01-05 Thread Otis Gospodnetic
You should really look at Lucene first, if you want to know this type of stuff. TF - # of occurrences of a term in a single doc DF - # of docs in the corpus/index that contain the term (IDF is the inverse DF) But lookgoogle... http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/
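To make the two counts concrete, here is a minimal sketch over a toy corpus of pre-tokenized documents. DF is taken as the number of documents that contain the term, and the idf formula shown, 1 + ln(numDocs / (docFreq + 1)), is the one used by Lucene's classic default similarity. The corpus and function names are made up for illustration, and fieldNorm and queryNorm are not modeled.

```python
import math


def term_frequency(term, doc_tokens):
    """Raw number of occurrences of the term in one document."""
    return doc_tokens.count(term)


def document_frequency(term, corpus):
    """Number of documents in the corpus that contain the term."""
    return sum(1 for doc in corpus if term in doc)


def inverse_document_frequency(term, corpus):
    """Classic Lucene-style idf: rarer terms get a larger value."""
    return 1.0 + math.log(len(corpus) / (document_frequency(term, corpus) + 1))


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["chapter", "7", "bankruptcy", "chapter"],
    ["chapter", "11"],
    ["child", "custody"],
]

tf_chapter = term_frequency("chapter", corpus[0])
df_chapter = document_frequency("chapter", corpus)
idf_chapter = inverse_document_frequency("chapter", corpus)
idf_custody = inverse_document_frequency("custody", corpus)
```

Note that tf is computed per term, so a two-word query such as 'chapter 7' contributes a tf for each term (or a phrase frequency if it is run as a phrase query), not a single count for the whole string.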

Re: java client for java 1.4 solr 2.0

2008-01-05 Thread Otis Gospodnetic
Sean, There is no solr 2.0 nor 3.0 yet - 1.2 is the last release, while 1.3 is still baking in the oven. The only supported/official Solr Java client is solrj, and you can get it if you get Solr out of svn (and maybe some other way). If solrj doesn't work for you, I am guessing you'll have to r

queryResultCache

2008-01-05 Thread s d
What is the best approach to tuning queryResultCache? For example, the default size is size="512", but since a document id is just an int (it is an int, right?), i.e. 4 bytes, why not set size to 10,000,000, for example (it's only ~38 MB)? I sense there is something that I'm missing here :). any help wou
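The ~38 MB figure can be checked with a quick back-of-the-envelope calculation. One thing the estimate leaves out is that the cache size counts entries, and each queryResultCache entry holds an ordered list of document ids for one query, not a single id. The sketch below assumes 4 bytes per id and ignores keys and object overhead; the 50 ids per entry is an arbitrary illustrative number, not a Solr default.

```python
BYTES_PER_DOC_ID = 4  # a document id is a 32-bit int


def cache_payload_mib(entries, ids_per_entry):
    """Approximate size of the cached id lists alone, in MiB."""
    return entries * ids_per_entry * BYTES_PER_DOC_ID / (1024 * 1024)


# The estimate from the question: one id per cache entry.
one_id_per_entry = cache_payload_mib(10_000_000, 1)

# Hypothetical: each cached query result holds 50 ids.
fifty_ids_per_entry = cache_payload_mib(10_000_000, 50)
```

Under these assumptions the first figure is about 38.1 MiB and the second about 1907 MiB, before counting the cached query keys themselves.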

Boosting a Field (Standard Handler)

2008-01-05 Thread s d
How do I boost a field (not a term) using the standard handler syntax? I know I can do that with DisMax, but I'm trying to stay with the standard handler. Can this be done? Thanks,

Re: phrase slop param in dismax handler

2008-01-05 Thread anuvenk
The lower the ps, the better, or vice versa? I'm guessing lower. I think that'll make the search stricter. Is that correct?

RE: java client for java 1.4 solr 2.0

2008-01-05 Thread Sean Laval
Sorry. I meant 1.2 and 1.3. Thanks

Re: phrase slop param in dismax handler

2008-01-05 Thread Otis Gospodnetic
Yes.

Re: parsedquery_ToString

2008-01-05 Thread Chris Hostetter
: Is the parsedquery_ToString, the one passed to solr after all the tokenizing : and analyzing of the query? yes. : For the search term 'chapter 7' i have this parsedquery_ToString ... : I have these synonyms : chap 7 => bankruptcy ... : But seem to have a little bit of trouble

Re: solr results debugging

2008-01-05 Thread Chris Hostetter
: I've been using the solr admin form with debug=true to do some in-depth : analysis on some results. Could someone explain how to make sense of : this..This is the debugging info for the first result i got. there's more to the debugging info than just what's below ... this is known as a "score

Re: phrase slop param in dismax handler

2008-01-05 Thread Chris Hostetter
: How does adding a phrase slop in the handler help? : I tried ps=25 along with some pf values. I assumed that it means this..for : eg: a search term, 'child custody battle' means documents which have the : words 'child','custody','battle' within 25 words of one another will rank : high. Is that c