I want the following searches to work:
MyField:SDD_Expedition_PCB
This should match only the word "SDD_Expedition_PCB", and not the
individual words "SDD", "Expedition", or "PCB".
And the following search:
MyField:SDD_Expedition*
Should match any word that starts with "SDD_Expedition", whatever follows it,
such as "SDD_Expedition_PBC", "SDD_Expedition_One", "SDD_Expedition_Two",
"SDD_ExpeditionSolr", "SDD_ExpeditionSolr1.4", etc., but not the individual
words "SDD" or "Expedition".
The field definition for "MyField" is (the actual field name is "Keywords"):
<field name="Keywords" type="text" indexed="true" stored="false"
required="false" multiValued="true"></field>
And here is the analyzer I'm using:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/> -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
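As a rough illustration of what this chain probably does to underscore-separated
values, here is a small Python sketch. It is not Solr code: the function name
approx_index_terms is made up, the splitting rules are only my reading of the
WordDelimiterFilter flags above (split on non-alphanumerics, on lower-to-upper
case changes, and between letters and digits), and stop words, stemming, and
duplicate removal are left out entirely.

```python
import re
from itertools import groupby


def approx_index_terms(value):
    """Approximate the terms the analyzer above might index (assumption, not Solr)."""
    terms = []
    for token in value.split():  # stands in for WhitespaceTokenizer
        parts = []
        # Runs of letters or digits; anything else ("_", ".") acts as a delimiter.
        for run in re.findall(r"[A-Za-z]+|[0-9]+", token):
            if run.isdigit():
                parts.append(run)
            else:
                # Split on lower-to-upper case changes: "ExpeditionSolr" -> 2 parts.
                parts.extend(re.findall(r"[A-Z]*[a-z]+|[A-Z]+", run))
        # Handle adjacent word parts and adjacent number parts as separate runs.
        for is_number, group in groupby(parts, key=str.isdigit):
            group = list(group)
            if is_number:
                terms.extend(group)               # generateNumberParts="1"
                if len(group) > 1:
                    terms.append("".join(group))  # catenateNumbers="1"
            else:
                # generateWordParts="0" with catenateWords="1":
                # only the concatenation of the word run is kept.
                terms.append("".join(group))
    return [t.lower() for t in terms]             # stands in for LowerCaseFilter
```

Under these assumptions, approx_index_terms("SDD_Expedition_PCB") gives
["sddexpeditionpcb"], so the literal prefix "SDD_Expedition" would never appear
in the index. If wildcard terms are not passed through the query analyzer, that
would be consistent with MyField:SDD_Expedition* finding nothing.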
Any help on how I can achieve the above is greatly appreciated.
Btw, if at all possible, I would like to be able to achieve this search without
having to change how I'm indexing / tokenizing the data. I'm looking for
search syntax to make this work.
-- JM
-----Original Message-----
From: Ahmet Arslan [mailto:[email protected]]
Sent: Tuesday, January 19, 2010 7:57 AM
To: [email protected]
Subject: Re: Tokenization and wild card search
> I have an issue and I'm not sure how to address it, so I
> hope someone can help me.
>
> I have the following text in one of my fields:
> "ABC_Expedition_ERROR". When I search on it
> like: "MyField:SDD_Expedition_PCB" (without quotes) it will
> fail to find me only this word "ABC_Expedition_ERROR"
> which I think is due to tokenization because of the
> underscore.
Do you want your query MyField:SDD_Expedition_PCB to return documents
containing ABC_Expedition_ERROR, or not?
> My solution is: "MyField:"SDD_Expedition_PCB"" (without the
> outer quotes, but quotes around the word
> "ABC_Expedition_ERROR"). This works fine.
> But then, how do I search on "SDD_Expedition_PCB" with wild
> card? For example: "MyField:SDD_Expedition*" will not
> work.
Can you paste the field type of MyField, and give some examples of which
queries should return which documents?