: Subject: Accented Search in Solr
: References:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
Param,
Note that the original value will be stored even if ISOLatin1AccentFilter
removes the accent for indexing / matching purposes.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
> From: "S
Not that I know of. Do note that whether the query has the accent filter
active or not MUST be matched with the index-time filter. In other words,
if you indexed with the filter but search without it, or vice versa, you
won't get the results you expect.
Also note that no matter what, the original tex
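
The index-time / query-time symmetry described above can be sketched in
plain Java. This is only an illustration, not Lucene or Solr code: the class
SymmetricFoldingSketch and its methods are hypothetical names, and
java.text.Normalizer stands in for the accent filter (it handles characters
that decompose into a base letter plus combining marks, which is an
assumption about what you need covered).

```java
import java.text.Normalizer;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy index; it is NOT the Lucene/Solr API.
final class SymmetricFoldingSketch {
    // indexed term -> ids of documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original, unmodified text (the "stored" value)
    private final Map<Integer, String> stored = new HashMap<>();

    // Decompose to NFD, drop combining marks, lowercase.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Index a document; the stored text is kept as-is either way.
    void add(int id, String text, boolean foldAtIndexTime) {
        stored.put(id, text);
        for (String word : text.split("\\s+")) {
            String term = foldAtIndexTime ? fold(word) : word.toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(id);
        }
    }

    // Look up a single-word query, with or without folding.
    Set<Integer> search(String word, boolean foldAtQueryTime) {
        String term = foldAtQueryTime ? fold(word) : word.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(term, Collections.emptySet());
    }

    String storedValue(int id) {
        return stored.get(id);
    }
}
```

With a document indexed with folding on, a folded query for either "cafe" or
"caf\u00e9" finds it, while an unfolded query for "caf\u00e9" does not,
because the indexed term is "cafe". The stored text keeps its accent in all
cases.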
climbingrose wrote:
Here is how I did it (the code is from memory so it might not be correct
100%):
private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
  if (hasAccents) {
    hasAccents = false;
    return filteredToken;
  }
  Token t = input.nex
Here is how I did it (the code is from memory so it might not be correct
100%):
private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
  if (hasAccents) {
    hasAccents = false;
    return filteredToken;
  }
  Token t = input.next();
  String filte
Regarding indexing words with accented and unaccented characters with
positionIncrement zero:
Chris Hostetter wrote:
you don't really need a custom tokenizer -- just a buffered TokenFilter
that clones the original token if it contains accent chars, mutates the
clone, and then emits it next w
I've seen mention of these filters:
But I don't see them in the 1.2 distribution. Am I looking in the wrong
place? What will the UnicodeNormalizationFilterFactory do for me? I
can't find any documentation on it.
Thanks,
Phil
: It looks like a very promising approach for us. I'm going to implement
: a custom Tokeniser based on your suggestions and see how it goes. Thank
: you all for your comments!
you don't really need a custom tokenizer -- just a buffered TokenFilter
that clones the original token if it contains
Hi Peter,
It looks like a very promising approach for us. I'm going to implement a
custom Tokeniser based on your suggestions and see how it goes. Thank you
all for your comments!
Cheers
On Wed, Mar 12, 2008 at 2:37 AM, Binkley, Peter <[EMAIL PROTECTED]>
wrote:
> We've done this in a pre-Solr
okens to be useful?
>
> And how does this take care of scoring? Do you get a higher score with a
> closer match?
>
>
>
>
> -Original Message-
> From: Binkley, Peter [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 11, 2008 8:37 AM
> To: solr-user@lucene.apa
lr-user@lucene.apache.org
Subject: RE: Accented search
We've done this in a pre-Solr Lucene context by using the position
increment: when a token contains accented characters, you add a stripped
version of that token with a zero increment, so that for matching purposes
the original and the st
We've done this in a pre-Solr Lucene context by using the position increment:
when a token contains accented characters, you add a stripped version of that
token with a zero increment, so that for matching purposes the original and the
stripped version are at the same position. Accents are not s
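
The zero-increment idea can be sketched without Lucene at all. The following
is a plain-Java stand-in, not the Lucene TokenFilter API: AccentExpansionSketch,
Tok, and expand are hypothetical names, and stripping is done with
java.text.Normalizer, which only covers characters that decompose into a
base letter plus combining marks (letters such as "\u00f8" or "\u00df" pass
through unchanged).

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "stripped copy at the same position".
final class AccentExpansionSketch {
    // Minimal token: text plus the gap to the previous token's position.
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Decompose to NFD and drop the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Emit each word with increment 1; if stripping changes it, also emit
    // the stripped form with increment 0 so both share one position.
    static List<Tok> expand(List<String> words) {
        List<Tok> out = new ArrayList<>();
        for (String word : words) {
            out.add(new Tok(word, 1));
            String stripped = stripAccents(word);
            if (!stripped.equals(word)) {
                out.add(new Tok(stripped, 0));
            }
        }
        return out;
    }
}
```

Because the stripped copy sits at the same position as the original, phrase
and proximity matching see one word there, whichever spelling the query uses.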
I'm not sure about a way to boost scores in this case, but you can
achieve the basic matching by applying a filter to the index and the
queries. The ISOLatin1AccentFilter seems like it may work for you,
though I'm not entirely certain if that will cover all the accent
characters you need.
M