Index with ItalianStemmer

Tommaso Teofili Fri, 03 Sep 2010 05:05:01 -0700

Hi all,
I am experiencing strange behavior when indexing Italian text (an indexed,
not stored, text field) with Italian-language stemming:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>


If I index the text field with the value
"mi voglio documentare su Solr e sulla sua storia" (which means "I want to
study Solr and its history"),
my searches for "q=text:documentare" and "q=text:documento" both return
nothing.
The biggest issue is that the first query, which I expected to match whether
or not stemming was enabled, does not match any document.

If I change the stemmer language to English and then reindex, the first of
the queries above succeeds as expected because no stemming is applied.
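To make the expectation above concrete, here is a toy Python model of the index-time chain from the schema. It is only a sketch under loud assumptions: the stopword set, the crude suffix stripper `toy_italian_stem` and the helper names `analyze` and `matches` are all hypothetical, WordDelimiterFilterFactory is left out, and the stems it produces are not the real Snowball Italian output. It only shows that a word run through the same chain at index and query time reduces to the same term (so "documentare" and "documento" would both find the document), whereas the raw, unanalyzed word is not among the stemmed index terms.

```python
# Toy model of the index-time analysis chain from the schema above.
# ASSUMPTIONS (not Solr's real behaviour):
#   * STOPWORDS is a hypothetical stand-in for stopwords.txt;
#   * toy_italian_stem is a crude suffix stripper, NOT the Snowball Italian stemmer;
#   * WordDelimiterFilterFactory is left out of the model.

STOPWORDS = {"mi", "su", "e", "sulla", "sua"}


def toy_italian_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in ("are", "o", "a", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stem: bool = True) -> list[str]:
    """Apply the (simplified) chain: tokenize, drop stopwords, lowercase, stem."""
    tokens = text.split()                                       # WhitespaceTokenizer
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # StopFilter, ignoreCase="true"
    tokens = [t.lower() for t in tokens]                        # LowerCaseFilter
    if stem:
        tokens = [toy_italian_stem(t) for t in tokens]          # stemming step (toy)
    return tokens


def matches(index_terms: set[str], query: str, stem: bool = True) -> bool:
    """True if the query is analyzed to at least one term and all are indexed."""
    query_terms = analyze(query, stem)
    return bool(query_terms) and all(t in index_terms for t in query_terms)


if __name__ == "__main__":
    doc = "mi voglio documentare su Solr e sulla sua storia"
    stemmed = set(analyze(doc))
    print(sorted(stemmed))                  # ['document', 'solr', 'stori', 'vogli']
    print(matches(stemmed, "documentare"),  # True
          matches(stemmed, "documento"))    # True
    print("documentare" in stemmed)         # False: raw, unanalyzed term
```

With `stem=False` the same model mirrors the unstemmed case: "documentare" matches and "documento" does not.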

Does anyone know what could be the root cause or if I am missing something?
Thanks in advance for any help,
Tommaso
