Re: Schema.xml, copyField, Slash, ignoreCase ?

Erick Erickson Fri, 11 Jan 2019 08:18:31 -0800

The admin UI>>(select a core)>>analysis page is your friend here. It'll
show you exactly what each filter in your analysis chain does and from
there you'll need to mix and match filters, your tokenizer and the like
to support the use-cases you need.


My guess is that the field type you're using contains
WordDelimiterFilterFactory which is splitting up on the slash.
Similarly for your airbag/airbags problem, probably you have
one of the stemmers in your analysis chain.

See "Filter Descriptions" in your version of the ref guide.
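
As a rough illustration only (untested against your data): a field type
that splits on whitespace alone and then lowercases would keep a code
like A62C21/02 as a single token and would not stem anything. The name
text_ws_ci is made up for this sketch; the two factories are standard
Solr classes. Punctuation stuck to a token, as in "{(B65D81/28", stays
attached with this chain, so check it on the analysis page first.

```xml
<!-- Hypothetical field type; the name "text_ws_ci" is an assumption. -->
<fieldType name="text_ws_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so the slash stays inside the token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Case-insensitive matching; no stemmer, so airbag will not match airbags -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Pointing the "text" field at a type like this means reindexing, since
the tokens already in the index were produced by the old chain.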

And one caution: The admin>>core>>analysis chain
shows you what happens _after_ query parsing. So if
you enter (without quotes) "bing bong" those tokens
will be shown. What fools people is that the query _parser_
gets in there first, so they'll then wonder why
field:bing bong
doesn't work. It's because the parser made it into
field:bing default_field:bong. So you'll still (potentially)
have to quote or escape some terms on input; which ones depends
on the query parser you're using.
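
For reference, default_field above is whatever the df parameter
resolves to. A sketch of where that is commonly set in solrconfig.xml,
assuming the stock /select handler and "text" as the default field
(both are assumptions, adjust to your own config):

```xml
<!-- Sketch only: handler name and default field are assumptions -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Field used for any term that has no explicit field: prefix -->
    <str name="df">text</str>
  </lst>
</requestHandler>
```

With that in place, field:bing bong is parsed as field:bing text:bong,
while field:"bing bong" (a phrase) or field:(bing bong) keeps both
terms on field.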

Best,
Erick

On Fri, Jan 11, 2019 at 1:40 AM Bruno Mannina
<bmann...@matheo-software.com> wrote:
>
> Hello,
>
>
>
> I’m facing a problem concerning the default field “text” (SOLR 5.4) and
> queries that contain / (slash)
>
>
>
> I need to have default “text” field with:
>
> - ignoreCase,
>
> - no auto truncation,
>
> - process slash char
>
>
>
> I would like to perform only query on the field “text”
>
> Queries can contain:  code or keywords or both.
>
>
>
> I have 2 fields named symbol and title, and 1 alias ti (old field that I
> can’t delete or modify)
>
>
>
> * Symbol contains a code with a slash (e.g. A62C21/02)
>
> <field name="symbol" type="string_ci" multiValued="false" indexed="true"
> required="true" stored="true"/>
>
>
>
> * Title contains English text and also symbol
>
>     <field name="title" type="text_en" multiValued="true" indexed="true"
> stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
>
>
>
> { "symbol": "B65D81/20",
>
> "title": [
>
>  "under vacuum or superatmospheric pressure, or in a special atmosphere,
> e.g. of inert gas  {(B65D81/28  takes precedence; containers with
> pressurising means for maintaining ball pressure A63B39/025)} "
>
> ]}
>
>
>
> * Ti is an alias of title
>
>     <field name="ti" type="text_general" multiValued="true" indexed="true"
> stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
>
>
>
> * Text is
>
> <field name="text" type="text_general" indexed="true" stored="false"
> multiValued="true"/>
>
>
>
> - Aliases are:
>
>
>
>     <copyField source="title"  dest="ti"/>
>
>     <!-- ALIAS TEXT -->
>
>     <copyField source="title"  dest="text"/>
>
>     <copyField source="symbol" dest="text"/>
>
>
>
>
>
> If I do these queries :
>
>
>
> * ti:airbag              -> it’s ok
>
> * title:airbag           -> not good for me because it found
> airbags
>
> * ti:b65D81/28           -> not good, debug shows ti:b65d81 OR ti:28
>
> * ti:"b65D81/28"         -> it’s ok
>
> * symbol:b65D81/28       -> it’s ok (even without " ")
>
>
>
> NOW with “text” field
>
> * b65D81/28              -> not good, debug shows text:b65d81 OR
> text:28
>
> * airbag                 -> it’s ok
>
> * "b65D81/28"            -> it’s ok
>
>
>
> It would be great if I could enter a symbol without " "
>
>
>
> Could you help me define a text field that solves this problem? (Please
> find all my field definitions below.)
>
>
>
> Many thanks for your help.
>
>
>
> String_ci is my own definition
>
>
>
>     <fieldType name="string_ci" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>
>     <analyzer>
>
>       <tokenizer class="solr.KeywordTokenizerFactory"/>
>
>       <filter class="solr.LowerCaseFilterFactory"/>
>
>     </analyzer>
>
>     </fieldType>
>
>
>
>     <fieldType name="text_general" class="solr.TextField"
> positionIncrementGap="100" multiValued="true">
>
>       <analyzer type="index">
>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>
>         <filter class="solr.LowerCaseFilterFactory"/>
>
>       </analyzer>
>
>       <analyzer type="query">
>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>
>         <filter class="solr.LowerCaseFilterFactory"/>
>
>       </analyzer>
>
>     </fieldType>
>
>
>
>     <fieldType name="text_en" class="solr.TextField"
> positionIncrementGap="100">
>
>       <analyzer type="index">
>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_en.txt"/>
>
>         <filter class="solr.LowerCaseFilterFactory"/>
>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>
>         <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>
>         <filter class="solr.PorterStemFilterFactory"/>
>
>       </analyzer>
>
>       <analyzer type="query">
>
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_en.txt"/>
>
>         <filter class="solr.LowerCaseFilterFactory"/>
>
>         <filter class="solr.EnglishPossessiveFilterFactory"/>
>
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>
>         <filter class="solr.PorterStemFilterFactory"/>
>
>       </analyzer>
>
>     </fieldType>
>
>
>
>
>
> Best Regards
>
> Bruno
>
>
>
>
>
