Re: Applying Tokenizers and Filters to CopyFields

Martin Wunderlich Thu, 26 Mar 2015 07:11:25 -0700

Thanks so much, Erick and Michael, for all the additional explanation.
The crucial piece of information turned out to be the one about the default
search field ("df"). In solrconfig.xml this parameter was set to point to the
original text, which is why the expanded queries didn’t work. Now that I have
set the df parameter to one of the fields with the expanded text, the search
works fine. I have also removed the copyField declarations.
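
For anyone reading this later, the change amounts to something like the
following in solrconfig.xml. This is only a sketch: the /select handler
and the field name "text_expanded" are placeholders rather than the
actual configuration.

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- "text_expanded" is a placeholder for a field holding the expanded text -->
    <str name="df">text_expanded</str>
  </lst>
</requestHandler>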


It’s all working as expected now. Thanks again for the help. 

Cheers, 

Martin
 



> On 25.03.2015 at 23:43, Erick Erickson <[email protected]> wrote:
> 
> Martin:
> Perhaps this would help
> 
> indexed=true, stored=true
> field can be searched. The raw input (not analyzed in any way) can be
> shown to the user in the results list.
> 
> indexed=true, stored=false
> field can be searched. However, the field can't be returned in the
> results list with the document.
> 
> indexed=false, stored=true
> The field cannot be searched, but the contents can be returned in the
> results list with the document. There are some use-cases where this is
> desirable behavior.
> 
> indexed=false, stored=false
> The entire field is thrown out; it's just as if you didn't send the
> field to be indexed at all.
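> 
> As a rough sketch, the four combinations would look like this in
> schema.xml. The field names and the "text_general" type are made up
> for illustration; substitute whatever your schema defines.
> 
> <!-- hypothetical field names, one per combination above -->
> <field name="searchable_and_shown" type="text_general" indexed="true" stored="true"/>
> <field name="searchable_only" type="text_general" indexed="true" stored="false"/>
> <field name="shown_only" type="text_general" indexed="false" stored="true"/>
> <field name="discarded" type="text_general" indexed="false" stored="false"/>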
> 
> And one other thing, the copyField gets the _raw_ data not the
> analyzed data. Let's say you have two fields, "src" and "dst".
> Copying from src to dst in schema.xml is identical to
> <add>
>   <doc>
>     <field name="src">original text</field>
>     <field name="dst">original text</field>
>   </doc>
> </add>
> 
> that is, copyField directives are not chained.
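> 
> For reference, the directive in question would look something like
> this in schema.xml. It is a sketch only, assuming "src" and "dst" are
> declared elsewhere in the schema; "other" is a made-up third field
> added to show the no-chaining point.
> 
> <!-- copies the raw src value into dst before any analysis runs -->
> <copyField source="src" dest="dst"/>
> <!-- not chained: "other" (hypothetical) does not receive src's value via dst -->
> <copyField source="dst" dest="other"/>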
> 
> Also, watch out for your query syntax. Michael's comments are spot-on,
> I'd just add this:
> 
> http://localhost:8983/solr/windex/select?q=Sprache&fq=original&wt=json&indent=true
> 
> is kind of odd. Let's assume you mean "qf" rather than "fq". That
> _only_ matters if your query parser is "edismax"; it'll be ignored in
> this case, I believe.
> 
> You'd want something like
> q=src:Sprache
> or
> q=dst:Sprache
> or even
> http://localhost:8983/solr/windex/select?q=Sprache&df=src
> http://localhost:8983/solr/windex/select?q=Sprache&df=dst
> 
> where "df" is "default field" and the search is applied against that
> field in the absence of a field qualification like my first two
> examples.
> 
> Best,
> Erick
> 
> On Wed, Mar 25, 2015 at 2:52 PM, Michael Della Bitta
> <[email protected]> wrote:
>> I agree the terminology is possibly a little confusing.
>> 
>> Stored refers to values that are stored verbatim. You can retrieve them
>> verbatim. Analysis does not affect stored values.
>> Indexed values are tokenized/transformed and stored inverted. You can't
>> recover the literal analyzed version (at least, not easily).
>> 
>> If what you really want is to store and retrieve case folded versions of
>> your data as well as the original, you need to use something like an
>> UpdateRequestProcessor, which I personally am less familiar with.
>> 
>> 
>> On Wed, Mar 25, 2015 at 5:28 PM, Martin Wunderlich <[email protected]>
>> wrote:
>> 
>>> So, the pre-processing steps are applied under <analyzer type="index">.
>>> And this point is not quite clear to me: Assuming that I have a simple
>>> case-folding step applied to the target of the copyField: How or where are
>>> the lower-case tokens stored, if the text isn’t added to the index? How is
>>> the query supposed to retrieve the lower-case version?
>>> (sorry, if this sounds like a naive question, but I have a feeling that I
>>> am missing something really basic here).
>>> 
>> 
>> 
>> Michael Della Bitta
