multi-language searching with Solr

2008-05-05 Thread Eli K
Hello folks, Let me start by saying that I am new to Lucene and Solr. I am in the process of designing a search back-end for a system that receives 20k documents a day and needs to keep them available for 30 days. The documents should be searchable on a free text field and on about 8 other field

Re[2]: definition of field types?

2008-05-05 Thread JLIST
Thanks Otis. The schema.xml actually explains it very well! > A good place to look is the Wiki. Look for "Analyzer" substring on the main > Solr wiki page. >> I must be overlooking ... where can I find definitions of >> the built-in types such as textTight, text_ws, etc?

custom queries via plugins?

2008-05-05 Thread Phillip Rhodes
I am currently using lucene directly to build custom queries. Can I write a plugin to build these custom BooleanQueries, RangeQueries, etc...? As a simple example, we have documents that represent coupons, events and activities. Some searches may only be for coupons and events. Currently, I p

RE: dismax query handler ignoring qf entirely!

2008-05-05 Thread Ezra Epstein
I think the problem is that 'cat' is of type 'string' and we're querying as though it were of type 'text'. We get the expected results only when we quote the query string; otherwise the query string goes through stemming and, after that, no longer quite matches the literal string in the 'cat' field.

RE: multi-language searching with Solr

2008-05-05 Thread Binkley, Peter
I think you would have to declare a separate field for each language (freetext_en, freetext_fr, etc.), each with its own appropriate stemming. Your ingestion process would have to assign the free text content for each document to the appropriate field; so, for each document, only one of the freetex
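
The per-language field idea above could be sketched on the ingestion side roughly as follows. This is only an illustration: the `freetext_<lang>` naming comes from the message, while the `id` field, the `SUPPORTED_LANGUAGES` set, and the `to_solr_doc` helper are hypothetical.

```python
# Assumption: the schema declares one stemmed field per language,
# named freetext_<language code> as suggested in the message above.
SUPPORTED_LANGUAGES = {"en", "fr", "de"}  # hypothetical set of configured languages


def to_solr_doc(doc_id, language, text):
    """Return a flat document dict with the free text in exactly one language field."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"no stemmed field configured for language {language!r}")
    # Only the matching language field is populated; the others stay absent.
    return {"id": doc_id, f"freetext_{language}": text}


doc = to_solr_doc("42", "fr", "Les chevaux courent")
print(doc)
```

How the language of each document is determined is left out here; the sketch assumes it is already known at ingestion time.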

Re: multi-language searching with Solr

2008-05-05 Thread Eli K
Wouldn't this impact both indexing and search performance and the size of the index? It is also probable that I will have more than one free text field later on and with at least 20 languages this approach does not seem very manageable. Are there other options for making this work with stemming?

Re: multi-language searching with Solr

2008-05-05 Thread Erick Erickson
You might want to bounce over to the Lucene user's list and search for language. This topic has arisen many times and there's some good discussion. And have you searched the solr users list for "language"? I know it's turned up here as well. Best Erick On Mon, May 5, 2008 at 4:28 PM, Eli K <[EMAIL

RE: multi-language searching with Solr

2008-05-05 Thread Binkley, Peter
It won't make much difference to the index size, since you'll only be populating one of the language fields for each document, and empty fields cost nothing. The performance may suffer a bit but Lucene may surprise you with how good it is with that kind of boolean query. I agree that as the numbe
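
The "kind of boolean query" mentioned above might look like the following when the same term is searched across several language fields. This is a sketch under assumptions: the `freetext_<lang>` field names follow the earlier suggestion in this thread, and `escape_term` and `multilingual_query` are hypothetical helpers, not part of Solr.

```python
def escape_term(term):
    """Backslash-escape characters that the Lucene query syntax treats specially."""
    special = set('+-&|!(){}[]^"~*?:\\/')
    return "".join("\\" + ch if ch in special else ch for ch in term)


def multilingual_query(term, languages):
    """OR the same term across one field per language."""
    escaped = escape_term(term)
    return " OR ".join(f"freetext_{lang}:{escaped}" for lang in languages)


print(multilingual_query("running", ["en", "fr"]))
# prints: freetext_en:running OR freetext_fr:running
```

Each clause is analyzed by its own field's analyzer at query time, which is what lets every language keep its own stemming.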

Re: multi-language searching with Solr

2008-05-05 Thread Eli K
I searched the Solr list but not as much the Lucene list. I will look again to see if there is something there that might work with Solr. I would rather leverage Solr, but if I have no choice I will do this using Lucene only. Thanks, Eli On Mon, May 5, 2008 at 4:58 PM, Erick Erickson <[EMAIL PROT

Re: Help optimizing

2008-05-05 Thread Mike Klaas
On 3-May-08, at 10:06 AM, Daniel Andersson wrote: Our database/index is 3.5 GB and contains 4,352,471 documents. Most documents are less than 1 kb. When performing a search, the response time varies between 1.5 seconds and 60 seconds. I don't have a big problem with 1.5 seconds (even though belo

Re: Tokenize integers?

2008-05-05 Thread Mike Klaas
Just use fieldType="string", and send them to solr in a multivalued fashion: <field name="blah">1</field> <field name="blah">133</field> <field name="blah">999</field> Search: blah:133 +blah:999 +blah:1 [both must match] Just treat the numbers as untokenized text. -Mike On 4-May-08, at 2:30 AM, [EMAIL PROTECTED] wrote: Ok, thanks. However I am still a bit c
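
The preprocessing implied by the multivalued approach above could be sketched as follows: the client emits one `<field>` element per integer in the Solr XML update message. The field name `blah` is from the message; the `id` field and the `build_add_xml` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, field_name, values):
    """Build a Solr <add> message with one <field> element per value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")  # assumption: unique key is "id"
    id_field.text = str(doc_id)
    for value in values:
        # Repeating the same field name is how a multivalued field is sent.
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


xml = build_add_xml("doc1", "blah", [1, 133, 999])
print(xml)
```

For this to work, the field would need to be declared as multivalued in schema.xml.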

Re: Re[2]: startsWith?

2008-05-05 Thread Mike Klaas
On 3-May-08, at 10:44 PM, JLIST wrote: Hello Otis, Do you mean that if I index the URL as a "text" field, I'll be able to do * for a given prefix because the text will be tokenized at the "/" and should suffice for my need? I'm not sure what your needs are, but I use the following to index u

Re: custom queries via plugins?

2008-05-05 Thread Otis Gospodnetic
I'm not sure if you are after a custom query parsing component, but if that is what you are after, start by looking at these: $ ff \*QParser\*java ./src/test/org/apache/solr/search/FooQParserPlugin.java ./src/java/org/apache/solr/search/LuceneQParserPlugin.java <== here ./src/java/org/apache/

Multiple SpellCheckRequestHandlers

2008-05-05 Thread solr_user
Hi all, Is it possible in Solr to have multiple SpellCheckRequestHandlers? In my application I have got two different spell check indexes. I want the spell checker to check for a spelling suggestion in the first index and if it fails to get any suggestion from the first index only then it sho

Re: Tokenize integers?

2008-05-05 Thread Chris Hostetter
: Just use fieldType="string", and send them to solr in a multivalued fashion: : : <field name="blah">1</field><field name="blah">133</field><field name="blah">999</field> But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you. I've had some ideas in the back of my mind for a while now that: 1) schema.xml should support

Re: Multiple SpellCheckRequestHandlers

2008-05-05 Thread Otis Gospodnetic
Yes, just define two instances (with two distinct names) in solrconfig.xml and point each of them to a different index. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: solr_user <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tu

Re: Tokenize integers?

2008-05-05 Thread Mike Klaas
On 5-May-08, at 9:19 PM, Chris Hostetter wrote: : Just use fieldType="string", and send them to solr in a multivalued fashion: : : <field name="blah">1</field><field name="blah">133</field> : <field name="blah">999</field> But as the OP said: that requires preprocessing -- it would be nice if Solr would make this easier for you. Oh I see, I misinter

Your valuable suggestion on autocomplete

2008-05-05 Thread Rantjil Bould
Hi Group, I have already got some valuable suggestions from the group. Based on that, I have come up with the following process to finally implement an autocomplete-like feature in my system: 1- Index the whole documents 2- Extract all terms using indexReader's terms() method I am getting terms
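
Once the terms have been extracted, one possible way to serve completions is a prefix lookup over the sorted term list. The remaining steps of the poster's process are not shown here, so this is only an assumed approach: `complete` is a hypothetical helper, and the term list below is made-up sample data.

```python
import bisect


def complete(sorted_terms, prefix, limit=10):
    """Return up to `limit` terms starting with `prefix` from a sorted list."""
    # Binary search for the first term that is >= the prefix.
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    # All terms sharing the prefix are contiguous in sorted order.
    while (i < len(sorted_terms)
           and len(matches) < limit
           and sorted_terms[i].startswith(prefix)):
        matches.append(sorted_terms[i])
        i += 1
    return matches


terms = sorted(["solr", "solar", "search", "lucene", "sold"])  # sample data
print(complete(terms, "sol"))
# prints: ['solar', 'sold', 'solr']
```

The terms are returned in alphabetical order here; ranking suggestions by frequency would need the term counts as well.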