Just for the archives - removing "preserveOriginal" from the "query" analyzer solved my problem.

Thank you, Jack!

You need to have separate "index" and "query" analyzers for that field type. The "query" analyzer would not have preserveOriginal="1", which would generate an extra term that would not match the exact term sequence that was indexed.

A query of "123 2012" would not split any terms and hence not generate the extra "preserved" term.

But a query of "123/2012" would actually query "123/2012 123 2012", which is not a term sequence that was indexed.
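
A minimal sketch of what that could look like, reusing the "text_split" field type from the original message below. The tokenizer and filter attributes are copied from that config; the only change is splitting the single analyzer into "index" and "query" analyzers and turning preserveOriginal off on the query side. This has not been tested against Solr 3.5, so treat it as a starting point rather than a verified config.

```xml
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: unchanged from the original config, keeps the original token -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <!-- Query side: same chain, but without the extra "preserved" term -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="1" preserveOriginal="0" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
```

With this arrangement a query of "123/2012" should analyze to just "123 2012", which is a term sequence that was indexed.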

-- Jack Krupansky

-----Original Message-----
From: Farkas István
Sent: Wednesday, October 17, 2012 8:58 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter and the dot character

Hello,

I've run into an interesting problem. I am using Solr 3.5 on an Ubuntu
server.

I have some data with a code field, which contains some identifiers
(mostly) in the following format: E.123/2012.

I've set up a fieldType for this code field:

<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnNumerics="1" preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

If I search for the exact code ("E.123/2012."), I get the expected
result. If I search for "123 2012", I also get the expected results. If
I search for the string "123/2012", the result set is empty. I tried it
with catenateNumbers and catenateWords enabled, with the same results.

The interesting thing here is that in the field analysis tool,
123/2012 shows a match if I select the "highlight matches" option. But
the same query returns nothing when I run it in the query debug
tool in the Solr admin. The query works if I use a wildcard search
(*123/2012*), but I would like to avoid that. What am I missing here?

Regards,
  Istvan
