Dear all,

before digging through the source code, maybe one of you can answer this easily:

When, and based on what, are the tokenizer and filters applied when copying fields? Can it happen that fields are analyzed twice (once for the source field, and a second time when the value is copied to another field)?


Here is an example from my current setup.
I have the following type defined in schema.xml:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.LengthFilterFactory" min="2" max="5000" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt" />
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.SnowballPorterFilterFactory" language="German" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt" />
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.SnowballPorterFilterFactory" language="German" />
            <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
</fieldType>

These are the fields involved:

<field name="title" type="keyword" indexed="true" stored="true" required="true" />
<field name="title_de" type="text_de" indexed="true" stored="false" required="false" />
<field name="subtitle_text_de" type="text_de" indexed="true" stored="true" required="false" />
<field name="dtext_de" type="text_de" indexed="true" stored="false" required="false" />

They are used to populate this field via copyField directives:

<field name="all_text_de" type="text_de" indexed="true" stored="false"
                        multiValued="true" />

like this (at least, that is what I do now):

<copyField source="title" dest="title_de" />
<copyField source="title" dest="all_text_de" />
<copyField source="subtitle_text_de" dest="all_text_de" />
<copyField source="dtext_de" dest="all_text_de" />
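
For illustration, this is what a document feeding these fields could look like in Solr's XML update format. The values are made up. Only the source fields are sent. title_de and all_text_de are not part of the document and are meant to be filled by the copyField rules.

```xml
<add>
  <doc>
    <!-- type "keyword"; copied to title_de and all_text_de -->
    <field name="title">Die Verwandlung</field>
    <!-- type text_de; copied to all_text_de -->
    <field name="subtitle_text_de">Eine Erzählung</field>
    <!-- type text_de; copied to all_text_de -->
    <field name="dtext_de">Gregor Samsa erwachte eines Morgens aus unruhigen Träumen.</field>
  </doc>
</add>
```

The question is then whether all_text_de receives "Die Verwandlung" as sent, or the tokens already produced by the "keyword" analyzer of title.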


I am copying fields of different types to all_text_de, e.g. title has a different type than subtitle_text_de. Is the value copied to the destination field the raw (input) value or the already analyzed one?


Thanks!
Chantal


--
Chantal Ackermann
