Hi Steve,

Many thanks for this field type; I will test it this afternoon on my dev server.

Thanks also for your explanation! Have a nice day!

Bruno

-----Original Message-----
From: Steve Rowe [mailto:sar...@gmail.com]
Sent: Friday, January 11, 2019 17:43
To: solr-user@lucene.apache.org
Subject: Re: Schema.xml, copyField, Slash, ignoreCase ?

Hi Bruno,

ignoreCase: It looks like you have already achieved this?

auto truncation: This is caused by the inclusion of PorterStemFilterFactory in your "text_en" field type. If you don't want its effects (i.e. treating different forms of the same word interchangeably, so that "airbag" also matches "airbags"), remove the filter.

process slash char: I think you want the slash to be included in symbol terms rather than interpreted as a term separator. One way to achieve this is to convert the slash, before tokenization, to a string that does not include a term separator, and then, after tokenization, convert the substituted string back to a slash.

Here's a version of your text_en that uses PatternReplaceCharFilterFactory[1] to convert slashes inside symbol-ish terms to "_" (the pattern is a guess based on the symbol text you've provided; you'll likely need to adjust it). "_" is a string unlikely to otherwise occur, and StandardTokenizer will not interpret it as a term separator. PatternReplaceFilterFactory[1] then converts "_" back to slashes, so B65D81/28 becomes B65D81_28 before tokenization and B65D81/28 again afterwards. Note that the patterns for the two are slightly different (\b word boundaries in the first, ^ and $ anchors in the second), since the *char filter* is given the entire field text as input, while the *filter* is given the text of single terms.
-----
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b([A-Za-z]\d+[A-Za-z]\d+)/(\d+)\b" replacement="$1_$2"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([A-Za-z]\d+[A-Za-z]\d+)_(\d+)$" replacement="$1/$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b([A-Za-z]\d+[A-Za-z]\d+)/(\d+)\b" replacement="$1_$2"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([A-Za-z]\d+[A-Za-z]\d+)_(\d+)$" replacement="$1/$2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
-----

[1] http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.4.pdf

--
Steve

> On Jan 11, 2019, at 4:18 AM, Bruno Mannina <bmann...@matheo-software.com> wrote:
>
> I need the default “text” field to have:
>
> - ignoreCase,
> - no auto truncation,
> - process slash char
>
> I would like to run queries only on the field “text”.
> Queries can contain: code or keywords or both.
>
> I have 2 fields named symbol and title, and 1 alias ti (an old field that
> I can’t delete or modify).
>
> * Symbol contains a code with a slash (e.g. A62C21/02)
>
> <field name="symbol" type="string_ci" multiValued="false" indexed="true"
> required="true" stored="true"/>
>
> * Title contains English text and also symbols
>
> <field name="title" type="text_en" multiValued="true" indexed="true"
> stored="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>
> { "symbol": "B65D81/20",
>   "title": [
>     "under vacuum or superatmospheric pressure, or in a special atmosphere, e.g. of inert gas {(B65D81/28 takes precedence; containers with pressurising means for maintaining ball pressure A63B39/025)}"
>   ]}
>
> * Ti is an alias of title
>
> <field name="ti" type="text_general" multiValued="true" indexed="true"
> stored="true" termVectors="true" termPositions="true"
> termOffsets="true"/>
>
> * Text is:
>
> <field name="text" type="text_general" indexed="true" stored="false"
> multiValued="true"/>
>
> - Aliases are:
>
> <copyField source="title" dest="ti"/>
> <!-- ALIAS TEXT -->
> <copyField source="title" dest="text"/>
> <copyField source="symbol" dest="text"/>
>
> If I run these queries:
>
> * ti:airbag → it’s OK
> * title:airbag → not good for me, because it finds airbags
> * ti:b65D81/28 → not good, debug shows ti:b65d81 OR ti:28
> * ti:"b65D81/28" → it’s OK
> * symbol:b65D81/28 → it’s OK (even without quotes)
>
> NOW with the “text” field:
>
> * b65D81/28 → not good, debug shows text:b65d81 OR text:28
> * airbag → it’s OK
> * "b65D81/28" → it’s OK
>
> It would be great if I could enter a symbol without quotes.
>
> Could you help me define a text field that solves this problem?
> (Please find below all my field definitions.)
>
> Many thanks for your help.
>
> String_ci is my own definition:
>
> <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPossessiveFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Best Regards
>
> Bruno
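One gap worth noting: the unquoted queries that fail in the original question run against "text" and "ti", which use text_general. The field type in Steve's reply modifies text_en, which is used by "title". The sketch below is not part of the thread. It assumes the same two-step slash substitution can simply be added to a copy of text_general and the "text" field pointed at that copy. The type name text_general_symbol is hypothetical, and the patterns carry the same caveat Steve gave: they are a guess at the symbol format.

```xml
<!-- Hypothetical type name: Bruno's text_general plus Steve's slash substitution -->
<fieldType name="text_general_symbol" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <!-- Before tokenization: B65D81/28 becomes B65D81_28, so StandardTokenizer keeps it as one term -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b([A-Za-z]\d+[A-Za-z]\d+)/(\d+)\b" replacement="$1_$2"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- After tokenization: the whole term B65D81_28 becomes B65D81/28 again -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([A-Za-z]\d+[A-Za-z]\d+)_(\d+)$" replacement="$1/$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Case-insensitive matching; there is no stemmer, so airbag does not match airbags -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same substitution at query time, so both sides produce b65d81/28 -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\b([A-Za-z]\d+[A-Za-z]\d+)/(\d+)\b" replacement="$1_$2"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^([A-Za-z]\d+[A-Za-z]\d+)_(\d+)$" replacement="$1/$2"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The catch-all field, pointed at the hypothetical type; "ti" is left untouched -->
<field name="text" type="text_general_symbol" indexed="true" stored="false" multiValued="true"/>
```

Because the index-time analyzer changes, documents already in the index keep their old terms until they are reindexed. If the standard query parser is in use, "/" is one of its special characters, so it is worth confirming with debugQuery that an unquoted b65D81/28 reaches the analyzer as expected.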