Re: Solr - Match whole word only in text fields

Kydryavtsev Andrey Thu, 26 Dec 2013 20:19:41 -0800

Hi everybody!

Ahmet, do I understand correctly: if I use this text_char_norm field type, then
for the input "myName=aaa bbb" I'll index the terms "myName", "aaa" and "bbb"?
So a query like "myName" or "bbb" will match, but "myName aaa" will not. I can
apply the same type to the query value, so that "myName aaa" is split into
("myName" && "aaa"), and that will work. However, this approach gives a
false-positive match for "myName bbb". How do you think I can handle this? One
approach is to use KeywordTokenizer+ShingleFilter in this field type instead of
WhitespaceTokenizerFactory, so that tokens like "myName", "myName aaa",
"myName aaa bbb", "aaa", "aaa bbb" and "bbb" are indexed, but that
significantly increases the index size for long values.
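To make the index-size concern concrete, here is a small sketch in plain Java that enumerates every contiguous run of whitespace-separated tokens in a value. It reproduces the list above for "myName aaa bbb". It is not Lucene's ShingleFilter, which caps shingle length through its maxShingleSize setting; the class and method names are made up for illustration. Indexing every run of a value with n tokens produces n(n+1)/2 terms, so the term count grows quadratically with the value length.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {

    // Enumerate every contiguous run of whitespace-separated tokens, in the
    // same order as the list above: all runs starting at token 0, then all
    // runs starting at token 1, and so on. (Illustration only, not ShingleFilter.)
    static List<String> allShingles(String value) {
        String[] tokens = value.trim().split("\\s+");
        List<String> shingles = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            StringBuilder sb = new StringBuilder();
            for (int end = start; end < tokens.length; end++) {
                if (end > start) {
                    sb.append(' ');
                }
                sb.append(tokens[end]);
                shingles.add(sb.toString());
            }
        }
        return shingles;
    }

    public static void main(String[] args) {
        // After '=' is mapped to a space, "myName=aaa bbb" becomes "myName aaa bbb".
        // prints [myName, myName aaa, myName aaa bbb, aaa, aaa bbb, bbb]
        System.out.println(allShingles("myName aaa bbb"));

        // n tokens yield n * (n + 1) / 2 shingles: 6, 5050, 50005000.
        for (int n : new int[] {3, 100, 10_000}) {
            System.out.println(n + " tokens -> " + (long) n * (n + 1) / 2 + " shingles");
        }
    }
}
```

A value the size of a text file with 10,000 words would therefore contribute about 50 million shingle terms if runs of every length were kept.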


26.12.2013, 03:20, "Ahmet Arslan" <iori...@yahoo.com>:
> Hi Haya,
>
> With MappingCharFilter you have full control over the set of characters you
> want to split on.
>
> In mappings.txt you will have:
>
> ":" => " "
> "=" => " "
>
> Use the following type and see whether it suits your needs. Update
> mappings.txt accordingly.
>
>     <fieldType name="text_char_norm" class="solr.TextField" positionIncrementGap="100">
>       <analyzer>
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> On Sunday, December 22, 2013 9:19 PM, haya.axelrod <haya.axel...@gmail.com> wrote:
> I have a text field that can contain very long values (like text files). I
> want to create a field type for it (text, not string) that behaves like
> "Match whole word only" in Notepad++, but the delimiters should not be only
> whitespace. If I have:
>
> myName=aaa bbb
>
> I would like it to match the following search strings: "aaa", "bbb",
> "aaa bbb", "myName=aaa bbb", "myName", but not "aa", "ame=a" or "a bb".
> Another example is:
>
> <myName>aaa bbb</myName>
>
> Can I do this somehow?
>
> What should be my field type definition?
>
> The text can contain any character. Before searching, I escape the search
> string using
> http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/util/ClientUtils.html
>
> Thanks
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Match-whole-word-only-in-text-fields-tp4107795.html
> Sent from the Solr - User mailing list archive at Nabble.com.
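Ahmet's analysis chain can be checked against Haya's match/no-match examples with the sketch below. It is plain Java that only approximates MappingCharFilterFactory, WhitespaceTokenizerFactory and LowerCaseFilterFactory; it is not Solr code. The mappings for '<', '>' and '/' are an assumed addition to mappings.txt for the <myName>aaa bbb</myName> example, and the hypothetical matches() helper requires the query tokens to appear at adjacent positions (the adjacency a quoted phrase query imposes), which also rejects the "myName bbb" false positive Andrey describes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class WholeWordSketch {

    // ':' and '=' come from Ahmet's mappings.txt; '<', '>' and '/' are an
    // ASSUMED extension for the <myName>aaa bbb</myName> example.
    static final Map<Character, String> MAPPINGS =
            Map.of(':', " ", '=', " ", '<', " ", '>', " ", '/', " ");

    // Rough emulation of text_char_norm:
    // char mapping -> split on whitespace -> lowercase.
    static List<String> analyze(String text) {
        StringBuilder mapped = new StringBuilder();
        for (char c : text.toCharArray()) {
            mapped.append(MAPPINGS.getOrDefault(c, String.valueOf(c)));
        }
        String trimmed = mapped.toString().trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Hypothetical "whole word" check: the query's tokens must occur in the
    // document as one contiguous run of adjacent tokens.
    static boolean matches(String document, String query) {
        List<String> doc = analyze(document);
        List<String> q = analyze(query);
        return !q.isEmpty() && Collections.indexOfSubList(doc, q) >= 0;
    }

    public static void main(String[] args) {
        String doc = "myName=aaa bbb";
        String[] queries = {"aaa", "bbb", "aaa bbb", "myName=aaa bbb", "myName",
                "aa", "ame=a", "a bb", "myName bbb"};
        // prints true for the first five queries and false for the last four
        for (String q : queries) {
            System.out.println(q + " -> " + matches(doc, q));
        }
        // prints true: the tag characters are mapped to spaces
        System.out.println(matches("<myName>aaa bbb</myName>", "aaa bbb"));
    }
}
```

Characters that are not in the mapping, such as ',' or '.', stay inside the token, so a document containing "aaa," would not match the query "aaa". Since the text can contain any character, each delimiter of interest has to be listed in mappings.txt.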
