solr word delimiter

2008-01-04 Thread anuvenk
I have the word delimiter filter factory in the text field definition both at index and query time. But it does have some negative effects on some search terms like h1-b visa. It splits this into three tokens: h, 1, b. Now if I understand right, does Solr look for matches for 'h' separately, '1' sep

solr results debugging

2008-01-04 Thread anuvenk
I've been using the solr admin form with debug=true to do some in-depth analysis on some results. Could someone explain how to make sense of this..This is the debugging info for the first result i got. 10.201284 = (MATCH) sum of: 6.2467875 = (MATCH) max plus 0.01 times others of: 6.236769

morelikethishandler

2008-01-04 Thread anuvenk
How does the morelikethis handler work? Solr wiki doesn't seem to have an elaborate explanation. In which cases would it be better to use this instead of the dismax? -- View this message in context: http://www.nabble.com/morelikethishandler-tp14628416p14628416.html Sent from the Solr - User ma

Re: spellcheckhandler

2008-01-04 Thread John Stewart
The way we do this with Solr 1.2 (the current release), inspired by a discussion on the ML, is to build a spellcheck dictionary with the relevant collocations such as the one in your example, based on a custom field that is effectively not tokenized. We actually create dummy documents for th

spellcheckhandler

2008-01-04 Thread anuvenk
Is it possible to implement something like this with the spellcheckhandler, like how Google does? Say I search for 'chater 13 bakrupcy', it should be able to display: did you search for 'chapter 13 bankruptcy'? Has someone been able to do this? -- View this message in context: http://ww

Dealing with numbers in search terms

2008-01-04 Thread anuvenk
I seem to have problems with the results I get for this search term. Not sure if it's because of the synonym mappings I have for this search term. Search term: chapter 7 The first result doesn't even have any occurrence of chapter, bankruptcy. But just a few occurrences of 7. But I have the 'mm'

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Mike Klaas
It is the fraction of the score of the non-max terms that gets added to the score. Hence, 1.0 = sum everything. -Mike On 4-Jan-08, at 3:28 PM, anuvenk wrote: Could you elaborate on what the tie param does? I did read the definition in the solr wiki but still not crystal clear. Mike Klaas wrote:

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread anuvenk
Could you elaborate on what the tie param does? I did read the definition in the solr wiki but still not crystal clear. Mike Klaas wrote: > > > On 4-Jan-08, at 1:12 PM, s d wrote: > >> but i want to sum the scores and not use max, can i still do it >> with the >> DisMax? am i missing anythin

parsedquery_ToString

2008-01-04 Thread anuvenk
Is the parsedquery_ToString, the one passed to solr after all the tokenizing and analyzing of the query? For the search term 'chapter 7' i have this parsedquery_ToString +(text:"(bankruptci chap 7) (7 chapter chap) 7 bankruptci"^0.8 | ((name:bankruptci name:chap)^2.0))~0.01 (text:"(bankruptci ch

Re: Search terms with quotes

2008-01-04 Thread Mike Klaas
anuvenk, solr-dev is for discussion about the _development_ of Solr, not on usage or general questions. Also, your audience will be severely restricted compared to posting on solr-user. To answer your question, please provide more details about your setup, including what request handler

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Mike Klaas
On 4-Jan-08, at 1:12 PM, s d wrote: but i want to sum the scores and not use max, can i still do it with the DisMax? am i missing anything ? If you set tie=1.0, dismax functions like dissum. -Mike
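The tie behaviour described in this thread can be sketched in a few lines of Python. This is only an illustration of the arithmetic ("max plus tie times others"), not Solr's or Lucene's actual code; the function name `dismax_score` and the sample scores are made up for the example.

```python
def dismax_score(clause_scores, tie=0.0):
    """Combine per-field scores as 'max plus tie times the others'.

    Hypothetical helper for illustration only; it mirrors the formula
    discussed in the thread, not the real Lucene implementation.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best  # total of the non-max scores
    return best + tie * others


# Made-up per-field scores for one document.
scores = [3.0, 1.0, 0.5]
print(dismax_score(scores, tie=0.0))   # pure max
print(dismax_score(scores, tie=0.01))  # max plus a small tie-breaker
print(dismax_score(scores, tie=1.0))   # plain sum ("dissum")
```

With tie=0.0 only the best-matching field counts; with tie=1.0 every field contributes fully, which is the "dissum" behaviour mentioned above.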

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread s d
but i want to sum the scores and not use max, can i still do it with the DisMax? am i missing anything ? On Jan 4, 2008 2:32 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Jan 4, 2008, at 4:40 AM, s d wrote: > > Is there a simpler way to write this query (I'm using the standard > > handler) >

Re: correct escapes in csv-Update files

2008-01-04 Thread Yonik Seeley
Here's the commons-csv bug for those who want to follow along: http://issues.apache.org/jira/browse/SANDBOX-206 -Yonik On Jan 4, 2008 12:03 PM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > On 04.01.2008 17:35 Walter Underwood wrote: > > > I recommend the opencsv library for Java or the csv pack

Re: solr with hadoop

2008-01-04 Thread Ryan McKinley
Mike Klaas wrote: On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote: I have huge index base (about 110 millions documents, 100 fields each). But size of the index base is reasonable, it's about 70 Gb. All I need is increase performance, since some queries, which match big number of documents, a

Re: solr with hadoop

2008-01-04 Thread Mike Klaas
On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote: I have huge index base (about 110 millions documents, 100 fields each). But size of the index base is reasonable, it's about 70 Gb. All I need is increase performance, since some queries, which match big number of documents, are running slow.

solr with hadoop

2008-01-04 Thread Evgeniy Strokin
I have huge index base (about 110 millions documents, 100 fields each). But size of the index base is reasonable, it's about 70 Gb. All I need is increase performance, since some queries, which match big number of documents, are running slow. So I was thinking is any benefits to use hadoop for t

Re: Backup of a Solr index

2008-01-04 Thread Jörg Kiegeland
A postCommit hook (configured in solrconfig.xml) is called in a safe place for every commit. You could have a program as a hook that normally did nothing unless you had previously signaled to make a copy of the index. Then I will give the postCommit trigger a try and hope that while the trig

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 11:18 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > On 04.01.2008 16:55 Yonik Seeley wrote: > > > On Jan 4, 2008 10:25 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > >> If the fields value is: > >> 's-Gravenhage > >> I cannot get it into SOLR with CSV. > > > > This one works f

Re: SolrJ Javadoc?

2008-01-04 Thread Ryan McKinley
run: ant javadoc-solrj and that will build them... Yes, they should be built into the nightly distribution... Matthew Runo wrote: Hello! I've seen some SVN commits and heard some rumblings of SolrJ javadoc - but can't seem to find any. Is there any yet? I know that SolrJ is still pretty yo

SolrJ Javadoc?

2008-01-04 Thread Matthew Runo
Hello! I've seen some SVN commits and heard some rumblings of SolrJ javadoc - but can't seem to find any. Is there any yet? I know that SolrJ is still pretty young =p Thanks! Matthew Runo Software Developer Zappos.com 702.943.7833

Re: Duplicated Keyword

2008-01-04 Thread Robert Young
You can think of it as the latter but it's quite a bit more complicated than that. For details on how lucene stores its index check out the file formats page on lucene. http://lucene.apache.org/java/docs/fileformats.html Cheers Rob On Jan 4, 2008 4:59 PM, Jae Joo <[EMAIL PROTECTED]> wrote: > ti

Re: correct escapes in csv-Update files

2008-01-04 Thread Michael Lackhoff
On 04.01.2008 17:35 Walter Underwood wrote: > I recommend the opencsv library for Java or the csv package for Python. > Either one can write legal CSV files. > > There are lots of corner cases in CSV and some differences between > applications, like whether newlines are allowed inside a quoted fi

Re: Duplicated Keyword

2008-01-04 Thread Jae Joo
title of Document 1 - "This is document 1 regarding china" - fieldtype = text title of Document 2 - "This is document 2 regarding china" fieldtype=text Once it is indexed, will index hold 2 "china" text fields or just 1 china word which is pointing document1 and document2? Jae On Jan 4, 2008

Re: correct escapes in csv-Update files

2008-01-04 Thread Walter Underwood
I recommend the opencsv library for Java or the csv package for Python. Either one can write legal CSV files. There are lots of corner cases in CSV and some differences between applications, like whether newlines are allowed inside a quoted field. It is best to use a library for this instead of ha
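As a rough illustration of the advice above, here is a minimal sketch using Python's standard csv module. The sample values are borrowed from the other CSV threads in this listing; the helper name `to_csv` and the two-column layout are assumptions made for the example.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with the csv module's default (Excel-style) dialect."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # defaults: comma delimiter, quotes doubled
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["id", "name"],
    ["1", 'This is text with a "quoted" string'],  # embedded double quotes
    ["2", "'s-Gravenhage"],                        # leading apostrophe
]
print(to_csv(rows))
```

The writer doubles embedded double quotes rather than backslash-escaping them, and it leaves the leading apostrophe alone because an apostrophe is not special in this dialect.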

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Michael Lackhoff
On 04.01.2008 16:55 Yonik Seeley wrote: > On Jan 4, 2008 10:25 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: >> If the fields value is: >> 's-Gravenhage >> I cannot get it into SOLR with CSV. > > This one works for me fine. > > $ cat t2.csv > id,name > 12345,"'s-Gravenhage" > 12345,'s-Gravenha

How the star operator works

2008-01-04 Thread Leonardo Santagada
From both lucene and solr docs the star "*" operator used after a word should find the word plus 0 or more characters after word. I have some documents on a solr index (both in type text and string) and both don't work like that. For example I have a document called Test Document, if I sear

Re: correct escapes in csv-Update files

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 4:08 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > Thanks for the hint but the result is the same, that is, ""quoted"" > behaves exactly like \"quoted\": > - both leave the single unescaped quote in the record: "quoted" > - both have the problem with a backslash before the escape

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Ryan McKinley
Michael Lackhoff wrote: If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. I tried to double the single quote/apostrophe or escape it in several ways but I either get an error or another character (the "escape") in front of the single quote. Is it not possible to have a fie

Re: Another text I cannot get into SOLR with csv

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 10:25 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote: > If the fields value is: > 's-Gravenhage > I cannot get it into SOLR with CSV. This one works for me fine. $ cat t2.csv id,name 12345,"'s-Gravenhage" 12345,'s-Gravenhage 12345,"""s-Gravenhage" $ curl http://localhost:8983/solr

Re: Duplicated Keyword

2008-01-04 Thread Robert Young
I don't quite understand what you're getting at. What is the problem you're encountering or what are you trying to achieve? Cheers Rob On Jan 4, 2008 3:26 PM, Jae Joo <[EMAIL PROTECTED]> wrote: > Hi, > > Is there any way to dedup the keyword cross the document? > > Ex. > > "china" keyword is in d

Duplicated Keyword

2008-01-04 Thread Jae Joo
Hi, Is there any way to dedup a keyword across documents? Ex. "china" keyword is in doc1 and doc2. Will the Solr index have only 1 "china" keyword for both documents? Thanks, Jae Joo
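For readers following this thread, a toy inverted index shows the idea behind the replies: each distinct term is stored once and points at every document that contains it. This is a deliberately simplified sketch; the function name, the whitespace tokenizer, and the document ids are assumptions, and Lucene's real on-disk format is considerably more involved.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it.

    Toy model for illustration only; not how Lucene stores its index.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    "doc1": "This is document 1 regarding china",
    "doc2": "This is document 2 regarding china",
}
index = build_inverted_index(docs)
print(sorted(index["china"]))  # one entry for the term, listing both documents
```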

Another text I cannot get into SOLR with csv

2008-01-04 Thread Michael Lackhoff
If the fields value is: 's-Gravenhage I cannot get it into SOLR with CSV. I tried to double the single quote/apostrophe or escape it in several ways but I either get an error or another character (the "escape") in front of the single quote. Is it not possible to have a field that begins with an apo

Re: Backup of a Solr index

2008-01-04 Thread Yonik Seeley
On Jan 4, 2008 8:44 AM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote: > > If you want to copy the hard files from the data/index directory, yes, > > you'll probably want to shut down the server first. You may be able to get > > away with leaving the server up but stopping any index/commit operations,

Re: Backup of a Solr index

2008-01-04 Thread Jörg Kiegeland
If you want to copy the hard files from the data/index directory, yes, you'll probably want to shut down the server first. You may be able to get away with leaving the server up but stopping any index/commit operations, but I could be wrong. How do I stop remote clients to do index/commit

Re: Best practice for storing relational data in Solr

2008-01-04 Thread Robert Young
Short answer: It depends. Long answer: It depends on what you want to be able to search on. If you need to search by recruiter name then obviously you'll need to index it, if you don't you only really need to index the most relevant db identifier, then work out the relations from that in MySQL (

Best practice for storing relational data in Solr

2008-01-04 Thread steve.lillywhite
Hi all, This is a (possibly very naive) newbie question regarding Solr best practice... I run a website that displays/stores data on job applicants, together with information on where they came from (e.g. which recruiter), which office they are applying to, etc. This data is stored in a m

Re: Query Syntax (Standard handler) Question

2008-01-04 Thread Erik Hatcher
On Jan 4, 2008, at 4:40 AM, s d wrote: Is there a simpler way to write this query (I'm using the standard handler) ? field1:t1 field1:t2 field1:"t1 t2" field2:t1 field2:t2 field2:"t1 t2" Looks like you'd be better off using the DisMax handler for (without the brackets). Erik
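A DisMax request for the query in this thread might look roughly like the sketch below. The `qf` (query fields) and `pf` (phrase fields) parameters are standard DisMax parameters, but their use here is a guess at what the reply intended, since the parameter list did not survive in the archive; the base URL and the `qt=dismax` handler name are also assumptions about the local setup.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; adjust to the actual installation.
BASE_URL = "http://localhost:8983/solr/select"


def dismax_url(terms, fields):
    """Build a DisMax request searching the terms across the given fields."""
    params = {
        "qt": "dismax",          # assumed name of the registered DisMax handler
        "q": " ".join(terms),    # plain user terms, no field prefixes
        "qf": " ".join(fields),  # fields to match individual terms against
        "pf": " ".join(fields),  # fields to boost when the terms match as a phrase
    }
    return BASE_URL + "?" + urlencode(params)


print(dismax_url(["t1", "t2"], ["field1", "field2"]))
```

The handler then expands the two terms over both fields itself, which replaces the hand-written field1:/field2: clauses of the original query.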

Query Syntax (Standard handler) Question

2008-01-04 Thread s d
Is there a simpler way to write this query (I'm using the standard handler) ? field1:t1 field1:t2 field1:"t1 t2" field2:t1 field2:t2 field2:"t1 t2" Thanks,

Re: correct escapes in csv-Update files

2008-01-04 Thread Michael Lackhoff
On 03.01.2008 17:16 Yonik Seeley wrote: > CSV doesn't use backslash escaping. > http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm > > "This is text with a ""quoted"" string" Thanks for the hint but the result is the same, that is, ""quoted"" behaves exactly like \"quoted\": - both leave the s