Re: Query for German "Special Characters" (i.e., ä, ö, ß)

Marc Bechler Fri, 14 Sep 2007 11:39:31 -0700

Hi Tom,

thanks for your response -- and sorry for the newbie question, which may sound somewhat silly ;-) . Here is the quick result from the analysis UI:

Index for "really": 5* really. Query for "really": 5* really, 2* realli (from: EnglishPorterFilterFactory {protected=protwords.txt}, RemoveDuplicatesTokenFilterFactory {})


For "this" everything is completely fine.

Is a complete match required between index and query, or is a partial match also okay?


Thanks for helping me

 marc




Tom Hill wrote:

Hi Marc,

Are you using the same stemmer on your queries that you use when indexing?

Try the analysis function in the admin UI, to see how things are stemmed for
indexing vs. querying. If they don't match for really and fünny, and do
match for kraßen, then that's your problem.
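
To make the index-vs-query comparison concrete, here is a minimal Python sketch of a toy inverted index. It is not Solr code: `toy_stem`, `analyze`, `build_index` and `search` are hypothetical helpers, and the "stemmer" is a single made-up rule (trailing "y" becomes "i") that only imitates the "really" -> "realli" output reported by the analysis UI. It assumes the index holds unstemmed terms while the query side stems, which is just one possible reading of the symptoms.

```python
def toy_stem(token):
    # Toy rule only (NOT the real Porter algorithm): trailing "y" -> "i".
    return token[:-1] + "i" if token.endswith("y") else token


def analyze(text, stem):
    # Split on whitespace, strip simple punctuation, lowercase, optionally stem.
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return [toy_stem(t) if stem else t for t in tokens]


def build_index(docs, stem):
    # Map each analyzed term to the set of document ids containing it.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem):
    # A document is a hit only if an analyzed query term equals an indexed term.
    hits = set()
    for term in analyze(query, stem):
        hits |= index.get(term, set())
    return hits


docs = {1: "This is really fünny kraßen."}

# Assumption: indexed without stemming, queried with stemming.
unstemmed_index = build_index(docs, stem=False)
mismatched = search(unstemmed_index, "really", stem=True)  # looks up "realli"
matched = search(unstemmed_index, "kraßen", stem=True)     # term is unchanged

# Same analysis on both sides: the stemmed forms line up again.
stemmed_index = build_index(docs, stem=True)
consistent = search(stemmed_index, "really", stem=True)
```

In this sketch the lookup is an exact term comparison, so "realli" on the query side cannot find "really" in the index, while words the toy rule leaves alone ("this", "is", "kraßen") still match.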

Tom


On 9/14/07, Marc Bechler <[EMAIL PROTECTED]> wrote:

Hi,

oops, the URIEncoding was lost during the update to tomcat 6.0.14.
Thanks for the advice.

But now I am really curious. After indexing the document from scratch,
I have the effect that queries to "this" and "is" work fine, whereas
queries to "really" and "fünny" do not return the result. Fünnily ;-) ,
after extending my sometext to "This is really fünny kraßen.", queries
to "really" and "fünny" still do not work, but "kraßen" is found.
Now I am somewhat confused -- hopefully someone has a good explanation ;-)

Regards,

  marc

Tom Hill wrote:

If you are using tomcat, try adding URIEncoding="UTF-8" to your
tomcat connector.

<Connector port="8080" maxHttpHeaderSize="8192" maxThreads="150"
minSpareThreads="25" maxSpareThreads="75" enableLookups="false"
redirectPort="8443" acceptCount="100" connectionTimeout="20000"
disableUploadTimeout="true" URIEncoding="UTF-8" />
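
As a rough illustration of why this attribute matters, the following Python sketch (not Tomcat code) percent-decodes the same query string with two different charsets. It assumes that, without URIEncoding, a Tomcat 6 connector decodes the URI as ISO-8859-1; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

term = "fünny"

# What the admin page sends: UTF-8 bytes, percent-encoded.
encoded = quote(term, encoding="utf-8")

# Decoding with the charset that was used for encoding restores the term.
as_utf8 = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as ISO-8859-1 turns the two UTF-8 bytes of "ü"
# into two separate characters, so the term no longer equals "fünny".
as_latin1 = unquote(encoded, encoding="iso-8859-1")

print(encoded)
print(as_utf8)
print(as_latin1)
```

Under that assumption the search would be run for the mis-decoded string rather than for the word that was indexed, which would explain a correct-looking URL that still finds nothing.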

Use the analysis page of the admin interface to check to see what's
happening to your queries, too.

http://localhost:8080/solr/admin/analysis.jsp?highlight=on  (your
port # may vary)

Tom

On 9/13/07, Marc Bechler <[EMAIL PROTECTED]> wrote:

Hi SOLR kings,

I'm just playing around with queries, but I was not able to query
for any special characters like the German "Umlaute" (i.e., ä, ö,
ü). Maybe others might have the same effects and already found a
solution ;-)

Here is my example: I have one field called "sometext" of type
"text" (the one delivered with the SOLR example). I indexed a few
words similar to

<field name="sometext"> <![CDATA[ This is really fünny
]]></field>

Works fine, and searching for "really" shows the result and fünny
will be displayed correctly. However, the query for "fünny" using
the /solr/admin page is resolved (correctly) to the URL
...q=f%C3%BCnny... but does not find the document.

And now the question: Any ideas? ;-)

Cheers,

marc
