Thanks, Mark!

Mark Miller wrote:
It's the pre-analyzed form (the raw input value) that's copied. The field
that it's copied to will determine the analyzer/filters for that field.
If you want to check out the code doing it, it's
in org.apache.solr.update.DocumentBuilder

--
- Mark

http://www.lucidimagination.com

On Mon, Aug 3, 2009 at 8:12 AM, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:

Dear all,

Before I search through the source code, maybe one of you can answer this
easily:

When, and based on what, are the tokenizer and filters applied when copying
fields? Can it happen that fields are analyzed twice (once when creating the
first field, and a second time when they are copied to another field)?


Here is an example from my current setup.
I have the following type defined in schema.xml:

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.LengthFilterFactory" min="2" max="5000" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_de.txt" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_de.txt" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SnowballPorterFilterFactory" language="German" />
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    </analyzer>
</fieldType>

It is used for these fields:

<field name="title" type="keyword" indexed="true" stored="true"
       required="true" />
<field name="title_de" type="text_de" indexed="true" stored="false"
       required="false" />
<field name="subtitle_text_de" type="text_de" indexed="true" stored="true"
       required="false" />
<field name="dtext_de" type="text_de" indexed="true" stored="false"
       required="false" />

These fields are used to populate the following field via copyField directives:

<field name="all_text_de" type="text_de" indexed="true" stored="false"
                       multiValued="true" />

like this (that is what I do now, at least):

<copyField source="title" dest="title_de" />
<copyField source="title" dest="all_text_de" />
<copyField source="subtitle_text_de" dest="all_text_de" />
<copyField source="dtext_de" dest="all_text_de" />
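
As a quick sanity check, the directives above can be grouped by destination with a few lines of Python. This is only a sketch using the standard library: the <schema> wrapper element is there so the fragment parses on its own, and sources_by_destination is a made-up helper name, not part of Solr.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The copyField directives from this message, wrapped in a root element
# (<schema> is only a wrapper here so the fragment parses on its own).
SCHEMA_FRAGMENT = """
<schema>
  <copyField source="title" dest="title_de" />
  <copyField source="title" dest="all_text_de" />
  <copyField source="subtitle_text_de" dest="all_text_de" />
  <copyField source="dtext_de" dest="all_text_de" />
</schema>
"""

def sources_by_destination(xml_text):
    """Return a dict mapping each copyField destination to its list of sources."""
    mapping = defaultdict(list)
    for node in ET.fromstring(xml_text).iter("copyField"):
        mapping[node.get("dest")].append(node.get("source"))
    return dict(mapping)

copy_map = sources_by_destination(SCHEMA_FRAGMENT)
for dest, sources in copy_map.items():
    print(dest, "<-", ", ".join(sources))
```

all_text_de receives values from three sources, which is why it is declared multiValued.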


I am copying fields with different types to all_text_de, e.g. the type of
title is different from that of subtitle_text_de. Is the value copied to the
destination field the raw (input) value or the already analyzed one?


Thanks!
Chantal


--
Chantal Ackermann

