Re: matching "starts with" only

Shawn Heisey Wed, 09 Oct 2013 12:47:13 -0700

On 10/9/2013 12:57 PM, adm1n wrote:

My index contains documents that are either a single word or a short sentence
of up to 4-5 words. I need to return only the documents that start with the
searched pattern.
In regex terms it would be ^my_query.


For example, for these docs:

black
beautiful black cat
cat
cat is black
black cat

and for the query: "black"

only "black" and "black cat" should be returned.

The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2

thanks!

The presence of either the whitespace tokenizer or the NGram filter makes this impossible, because they both break the indexed value into smaller pieces. Together, they *really* break things up. Matching is done on a per-term basis, and these two components in your analysis chain ensure that "black" will be a term for all of those input documents, whether it appears at the beginning, middle, or end.

If you set up a copyField to a new field whose fieldType uses the Keyword tokenizer (which treats the entire string as a single token) and the lowercase filter, you would be able to use the regex support in Solr 4.x and have this as your query string:


newfield:/black.*/
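
For reference, a minimal sketch of what the schema.xml additions for that copyField setup might look like. The names here are assumptions for illustration only: "text_startswith" is a made-up fieldType name, "newfield" matches the query above, and "name" stands in for whatever your existing source field is actually called.

```xml
<!-- Hypothetical fieldType name. The Keyword tokenizer keeps the whole
     value as a single token; the lowercase filter makes matching
     case-insensitive. -->
<fieldType name="text_startswith" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- "newfield" only needs to be searchable, so it is indexed but not stored. -->
<field name="newfield" type="text_startswith" indexed="true" stored="false"/>

<!-- "name" is a placeholder for your existing source field. -->
<copyField source="name" dest="newfield"/>
```

Because copyField is applied at index time, existing documents would need to be reindexed before the new field can be queried. Note also that the regex query is not run through the analysis chain, so the pattern should be written in lowercase to match the lowercased terms.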

Thanks,
Shawn
