Dear all,
Before I start searching through the source code, maybe one of you can
answer this easily:
When, and based on which field type, are the tokenizer and filters applied
when copying fields? Can it happen that a value is analyzed twice (once
when the source field is created, and a second time when it is copied to
another field)?
Here is an example from my current setup.
I have the following type defined in schema.xml:
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LengthFilterFactory" min="2" max="5000" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_de.txt" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="German" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
These are the source fields (all but title use the text_de type):
<field name="title" type="keyword" index="true" stored="true"
       required="true" />
<field name="title_de" type="text_de" index="true" stored="false"
       required="false" />
<field name="subtitle_text_de" type="text_de" index="true" stored="true"
       required="false" />
<field name="dtext_de" type="text_de" index="true" stored="false"
       required="false" />
They are used to populate this field via the copyField directive:
<field name="all_text_de" type="text_de" indexed="true" stored="false"
       multiValued="true" />
like this (at least, that is what I do at the moment):
<copyField source="title" dest="title_de" />
<copyField source="title" dest="all_text_de" />
<copyField source="subtitle_text_de" dest="all_text_de" />
<copyField source="dtext_de" dest="all_text_de" />
I am copying fields of different types to all_text_de, e.g. title (type
keyword) has a different type than subtitle_text_de (type text_de). Is the
value copied to the destination field the raw (input) value or the already
analyzed one?
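To make the question concrete, here is a reduced sketch of the case I mean.
The field names probe_src and probe_raw are made up for illustration, and I
am assuming a "string" type (solr.StrField) is defined as in the example
schema. If the raw input is what gets copied, probe_raw should hold the
complete, unmodified input as a single term; if analyzed output were copied,
I would expect lowercased, stemmed tokens there instead.

```xml
<!-- Hypothetical test fields; probe_src and probe_raw are not part of my real schema. -->
<!-- Assumes a "string" fieldType (solr.StrField) exists, as in the example schema. -->
<field name="probe_src" type="text_de" indexed="true" stored="true" />
<field name="probe_raw" type="string" indexed="true" stored="false" />

<!-- Copy from the analyzed source field into the unanalyzed destination field. -->
<copyField source="probe_src" dest="probe_raw" />
```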
Thanks!
Chantal
--
Chantal Ackermann