: Subject: Accented Search in Solr
: References:
: In-Reply-To:
http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
From: "Sethi, Parampreet"
> To: "solr-user@lucene.apache.org"
> Sent: Fri, October 8, 2010 11:33:02 AM
> Subject: Accented Search in Solr
>
> Hi All,
>
> I am using Solr 1.3 in my project. Just wanted to know if there is any other
>way by which below
Not that I know of. Do note that whether the query has the accent filter
active or not MUST be matched with the index-time filter. In other words,
if you indexed with the filter but search without it, or vice versa, you
won't get the results you expect.
Also note that no matter what, the original tex
Hi All,
I am using Solr 1.3 in my project. Just wanted to know if there is any other
way by which below mentioned queries will return the same results:
Gruyère-and-Zucchini
Gruyere-and-Zucchini
The first query has accented characters in it. I was just going through the
Solr tokenizers and fi
climbingrose wrote:
Here is how I did it (the code is from memory so it might not be correct
100%):
private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
    if (hasAccents) {
        hasAccents = false;
        return filteredToken;
    }
    Token t = input.nex
Here is how I did it (the code is from memory so it might not be correct
100%):
private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
    if (hasAccents) {
        hasAccents = false;
        return filteredToken;
    }
    Token t = input.next();
    String filte
Regarding indexing words with accented and unaccented characters with
positionIncrement zero:
Chris Hostetter wrote:
you don't really need a custom tokenizer -- just a buffered TokenFilter
that clones the original token if it contains accent chars, mutates the
clone, and then emits it next w
I've seen mention of these filters:
But I don't see them in the 1.2 distribution. Am I looking in the wrong
place? What will the UnicodeNormalizationFilterFactory do for me? I
can't find any documentation on it.
Thanks,
Phil
: It looks like a very promising approach for us. I'm going to implement
: a custom Tokeniser based on your suggestions and see how it goes. Thank
: you all for your comments!
you don't really need a custom tokenizer -- just a buffered TokenFilter
that clones the original token if it contains
not stripped from queries. The effect is that an accented search matches
> your Doc A, and an unaccented search matches Docs A and B. We do that after
> lower-casing the token.
>
> There are some limitations: users might start to expect that they can
> freely add accents to restr
okens to be useful?
>
> And how does this take care of scoring? Do you get a higher score with a
> closer match?
>
>
>
>
> -Original Message-
> From: Binkley, Peter [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 11, 2008 8:37 AM
> To: solr-user@lucene.apa
solr-user@lucene.apache.org
Subject: RE: Accented search
We've done this in a pre-Solr Lucene context by using the position
increment: when a token contains accented characters, you add a stripped
version of that token with a zero increment, so that for matching purposes
the original and the st
e not stripped from
queries. The effect is that an accented search matches your Doc A, and an
unaccented search matches Docs A and B. We do that after lower-casing the token.
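
The zero-increment idea described above can be tried out without Lucene at all. What follows is only a rough sketch under stated assumptions: `Tok`, `analyzeForIndex`, `strip` and `matches` are hypothetical names invented for illustration, whitespace splitting stands in for a real tokenizer, and `java.text.Normalizer` stands in for whatever accent-stripping the original filter used. It also assumes Doc A holds the accented spelling and Doc B the unaccented one.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class AccentExpandSketch {
    // Hypothetical stand-in for a Lucene token: term text plus position increment.
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    // Decompose to NFD, then drop the combining marks (assumed stripping rule).
    static String strip(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Index-time analysis: lower-case each token, keep it, and if it contains
    // accents also emit the stripped form at the same position (increment 0).
    static List<Tok> analyzeForIndex(String text) {
        List<Tok> out = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String term = raw.toLowerCase(Locale.ROOT);
            out.add(new Tok(term, 1));
            String stripped = strip(term);
            if (!stripped.equals(term)) {
                out.add(new Tok(stripped, 0));
            }
        }
        return out;
    }

    // Query-time analysis only lower-cases; accents are NOT stripped from queries.
    static boolean matches(String doc, String queryTerm) {
        String q = queryTerm.toLowerCase(Locale.ROOT);
        for (Tok t : analyzeForIndex(doc)) {
            if (t.term.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String docA = "Gruy\u00e8re and Zucchini"; // accented spelling (Gruyère)
        String docB = "Gruyere and Zucchini";      // unaccented spelling
        System.out.println(matches(docA, "Gruy\u00e8re")); // accented query vs Doc A
        System.out.println(matches(docB, "Gruy\u00e8re")); // accented query vs Doc B
        System.out.println(matches(docA, "Gruyere"));      // unaccented query vs Doc A
        System.out.println(matches(docB, "Gruyere"));      // unaccented query vs Doc B
    }
}
```

Because the stripped form shares a position with the original, phrase and proximity matching should see them as the same slot, which is the point of the zero increment. In a real index this logic would live in a TokenFilter rather than in a standalone class.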
There are some limitations: users might start to expect that they can freely
add accents to restrict their search to acc
I'm not sure about a way to boost scores in this case, but you can
achieve the basic matching by applying a filter to the index and the
queries. The ISOLatin1Accent Filter seems like it may work for you,
though I'm not entirely certain if that will cover all the accent
characters you need.
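
To see why the same filter has to run on both the index side and the query side, here is a small sketch in plain Java. It is not the ISOLatin1AccentFilter itself: `AccentFold` and `fold` are hypothetical names, and the folding uses `java.text.Normalizer` as a stand-in, so the set of characters it handles may differ from the Solr filter (letters with no canonical decomposition, such as "ø" or "ß", pass through unchanged here).

```java
import java.text.Normalizer;

final class AccentFold {
    // Decompose to NFD, then remove combining marks so that an accented
    // letter collapses to its base letter. Assumed stand-in for an accent filter.
    static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The same folding is applied on both sides, as the advice above requires.
        String indexed = fold("Gruy\u00e8re-and-Zucchini"); // index-time (Gruyère)
        String query = fold("Gruyere-and-Zucchini");        // query-time
        System.out.println(indexed.equals(query));

        // Folding only one side leaves the terms different, so no match.
        System.out.println("Gruy\u00e8re-and-Zucchini".equals(query));
    }
}
```

With folding on both sides the two spellings reduce to the same term; with folding on only one side they do not, which is the mismatch to avoid.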
M
Hi guys,
I'm running into some problems with accented (UTF-8) languages. I'd love to
hear some ideas about how to use Solr with those languages. Basically, I
want to achieve what Google did with UTF-8 languages.
My requirements include:
1) Accent insensitive search and proper highlighting:
For ex