Re: Accented Search in Solr

2010-10-12 Thread Chris Hostetter
: Subject: Accented Search in Solr : References: : In-Reply-To: http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the

Re: Accented Search in Solr

2010-10-08 Thread Otis Gospodnetic
rom: "Sethi, Parampreet" > To: "solr-user@lucene.apache.org" > Sent: Fri, October 8, 2010 11:33:02 AM > Subject: Accented Search in Solr > > Hi All, > > I am using Solr 1.3 in my project. Just wanted to know if there is any other >way by which below

Re: Accented Search in Solr

2010-10-08 Thread Erick Erickson
not that I know of. Do note that whether the query has the accent filter active or not MUST be matched with the index-time filter. In other words, if you indexed with the filter but search without it or vice-versa you won't get the results you expect. Also note that no matter what, the original tex
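The point about matching index-time and query-time filtering can be illustrated with a minimal sketch that does not use Solr or Lucene at all. Everything here is an assumption made for illustration: `AnalysisMatchDemo`, `fold`, `index`, and `matches` are hypothetical names, and `fold` uses `java.text.Normalizer` as a stand-in for whatever accent filter is configured in the real analyzer chain.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnalysisMatchDemo {
    // Stand-in for an accent filter (not Solr's implementation):
    // decompose to NFD, then drop the combining marks.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Toy "index": the set of terms stored for one field.
    static Set<String> index(List<String> terms, boolean foldAtIndexTime) {
        Set<String> stored = new HashSet<>();
        for (String t : terms) {
            stored.add(foldAtIndexTime ? fold(t) : t);
        }
        return stored;
    }

    // Toy "search": exact term lookup, optionally folding the query first.
    static boolean matches(Set<String> stored, String query, boolean foldAtQueryTime) {
        return stored.contains(foldAtQueryTime ? fold(query) : query);
    }

    public static void main(String[] args) {
        // "gruy\u00e8re" is "gruyere" with a grave accent on the second e.
        Set<String> stored = index(Arrays.asList("gruy\u00e8re"), true);
        System.out.println(matches(stored, "gruy\u00e8re", true));  // true: filter on both sides
        System.out.println(matches(stored, "gruy\u00e8re", false)); // false: filter only at index time
    }
}
```

With the filter applied on both sides the accented query finds the folded term; with the filter applied only at index time the same query misses, which is the mismatch described above.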

Accented Search in Solr

2010-10-08 Thread Sethi, Parampreet
Hi All, I am using Solr 1.3 in my project. Just wanted to know if there is any other way by which below mentioned queries will return the same results: Gruyère-and-Zucchini Gruyere-and-Zucchini The first query has accented characters in it. I was just going through the Solr tokenizers and fi

Re: Accented search

2008-06-24 Thread Robert Haschart
climbingrose wrote: Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t = input.nex

Re: Accented search

2008-06-24 Thread climbingrose
Here is how I did it (the code is from memory so it might not be correct 100%): private boolean hasAccents; private Token filteredToken; public final Token next() throws IOException { if (hasAccents) { hasAccents = false; return filteredToken; } Token t = input.next(); String filte

Re: Accented search

2008-06-20 Thread Phillip Farber
Regarding indexing words with accented and unaccented characters with positionIncrement zero: Chris Hostetter wrote: you don't really need a custom tokenizer -- just a buffered TokenFilter that clones the original token if it contains accent chars, mutates the clone, and then emits it next w

Re: Accented search

2008-06-20 Thread Phillip Farber
I've seen mention of these filters: But I don't see them in the 1.2 distribution. Am I looking in the wrong place? What will the UnicodeNormalizationFilterFactory do for me? I can't find any documentation on it. Thanks, Phil

Re: Accented search

2008-03-11 Thread Chris Hostetter
: It looks like a very promising approach for us. I'm going to implement : a custom Tokeniser based on your suggestions and see how it goes. Thank : you all for your comments! you don't really need a custom tokenizer -- just a buffered TokenFilter that clones the original token if it contains

Re: Accented search

2008-03-11 Thread climbingrose
not stripped from queries. The effect is that an accented search matches > your Doc A, and an unaccented search matches Docs A and B. We do that after > lower-casing the token. > > There are some limitations: users might start to expect that they can > freely add accents to restr

Re: Accented search

2008-03-11 Thread Walter Underwood
okens to be useful? > > And how does this take care of scoring? Do you get a higher score with a > closer match? > > > > > -Original Message- > From: Binkley, Peter [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 11, 2008 8:37 AM > To: solr-user@lucene.apa

RE: Accented search

2008-03-11 Thread Renaud Waldura
lr-user@lucene.apache.org Subject: RE: Accented search We've done this in a pre-Solr Lucene context by using the position increment: when a token contains accented characters, you add a stripped version of that token with a zero increment, so that for matching purposes the original and the st
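The zero-increment idea quoted in this message can be sketched in plain Java, without the Lucene API. This is only an illustration under stated assumptions: `StackedTokenDemo`, `Tok`, `analyze`, `positions`, and `matches` are hypothetical names, whitespace splitting stands in for a real tokenizer, and `java.text.Normalizer` stands in for the accent-stripping step. The two sample documents are made up.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class StackedTokenDemo {
    // Hypothetical minimal token: term text plus position increment.
    public static final class Tok {
        public final String term;
        public final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Stand-in for accent stripping: decompose, then drop combining marks.
    static String strip(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Index-time analysis: lower-case, emit each token with increment 1, and
    // for accented tokens also emit the stripped form with increment 0.
    public static List<Tok> analyze(String text) {
        List<Tok> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) continue;
            out.add(new Tok(word, 1));
            String stripped = strip(word);
            if (!stripped.equals(word)) out.add(new Tok(stripped, 0));
        }
        return out;
    }

    // Absolute positions: running sum of increments, starting one before 0.
    public static List<Integer> positions(List<Tok> toks) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Tok t : toks) { p += t.posInc; pos.add(p); }
        return pos;
    }

    // Query side: lower-case only; accents are NOT stripped from queries.
    public static boolean matches(List<Tok> indexed, String queryTerm) {
        String q = queryTerm.toLowerCase(Locale.ROOT);
        for (Tok t : indexed) if (t.term.equals(q)) return true;
        return false;
    }

    public static void main(String[] args) {
        List<Tok> docA = analyze("Caf\u00e9 menu"); // accented spelling
        List<Tok> docB = analyze("Cafe menu");      // unaccented spelling
        System.out.println(positions(docA));            // [0, 0, 1]
        System.out.println(matches(docA, "caf\u00e9")); // true
        System.out.println(matches(docB, "caf\u00e9")); // false
        System.out.println(matches(docA, "cafe"));      // true
        System.out.println(matches(docB, "cafe"));      // true
    }
}
```

Because the stripped copy shares a position with the original, phrase positions are unaffected, and leaving the query unstripped gives the asymmetric behaviour described in the thread: the accented query matches only the accented document, while the unaccented query matches both.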

RE: Accented search

2008-03-11 Thread Binkley, Peter
e not stripped from queries. The effect is that an accented search matches your Doc A, and an unaccented search matches Docs A and B. We do that after lower-casing the token. There are some limitations: users might start to expect that they can freely add accents to restrict their search to acc

Re: Accented search

2008-03-11 Thread Peter Cline
I'm not sure about a way to boost scores in this case, but you can achieve the basic matching by applying a filter to the index and the queries. The ISOLatin1Accent Filter seems like it may work for you, though I'm not entirely certain if that will cover all the accent characters you need. M

Accented search

2008-03-10 Thread climbingrose
Hi guys, I'm running into some problems with accented (UTF-8) languages. I'd love to hear some ideas about how to use Solr with those languages. Basically, I want to achieve what Google did with UTF-8 languages. My requirements include: 1) Accent insensitive search and proper highlighting: For ex