Re: Is Running the Same Filters on Index and Query Redundant?

Andrea Gazzarini Wed, 15 Aug 2018 12:17:37 -0700

Hi Thomas,

as you know, the two analyzers run at different moments, each with a different input and a different goal for its output:


 * index analyzer: input is a field value, output is used for building
   the index
 * query analyzer: input is a (user) query string, output is used for
   building a (Solr) query

At index time a term dictionary is built, and at retrieval time the output query tries to find a match in that dictionary. I wouldn't call it "redundancy" because even if the filter is the same, it is applied to a different input and it has a different goal.

Some filters must be present at both index and query time, because otherwise you won't find any match: if you put a lowercase filter only on the index side, queries with uppercase chars won't find any match. Some others don't (one example is the SynonymGraphFilter you've used only at query time). In general, everything depends on your needs, and it's perfectly valid to have symmetric (index analyzer = query analyzer) and asymmetric text analysis (index analyzer != query analyzer).
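
To make the lowercase example concrete, here is a minimal sketch of a symmetric field type. The name "text_lower" is made up for illustration and is not part of your schema; an analyzer declared without a type attribute is used at both index and query time:

    <fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
      <!-- No type attribute: the same chain runs on field values and on query strings -->
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Applied on both sides, so "Solr" in a document and "SOLR" in a query both become "solr" -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

If the lowercase filter were declared only in an index analyzer, the dictionary would contain "solr" while the query term would stay "SOLR", and the lookup would find nothing.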

Without knowing your context it is very hard to guess if there's something wrong in the configuration. Which part of the analyzers do you think is redundant?

On top of that: in your chain, the HTMLStripCharFilterFactory applied at query time is unusual. While it makes perfect sense at index time (where I guess you index some HTML source), at query time I can't imagine a scenario where the user inputs queries containing HTML tags.
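
If that assumption holds for your users (no markup in query strings), the query analyzer could probably drop the char filter. A sketch only, with the rest of the chain copied unchanged from your configuration:

    <analyzer type="query">
      <!-- HTMLStripCharFilterFactory omitted here, assuming queries never contain HTML -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="0" catenateNumbers="0" catenateAll="0"
              splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"
              protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>

The index analyzer would keep its char filter, since that is where the HTML source is actually processed.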


Best,
Andrea

On 15/08/18 20:43, Zimmermann, Thomas wrote:

Hi,

We have the text field below configured on fields that are both stored and 
indexed. It seems to me that applying the same filters on both index and query 
would be redundant, and perhaps a waste of processing on the retrieval side if 
the filter work was already done on the index side. Is this a fair statement to 
make? Should I only be applying filters on one end of the transaction?

Thanks,
TZ


    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterGraphFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
