On 10/9/2013 12:57 PM, adm1n wrote:
My index contains documents which could be a single word or a short sentence
of up to 4-5 words. I need to return only the documents which "start with"
the searched pattern.
in regex it would be ^my_query.
for example, for the docs:
black
beautiful black cat
cat
cat is black
black cat
and for the query: "black"
only "black" and "black cat" should be returned.
The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
            maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
            maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2
thanks!
The presence of either the whitespace tokenizer or the NGram filter makes
this impossible, because they both break the indexed value into smaller
pieces. Together, they *really* break things up. Matching is done on a
per-term basis, and these two components in your analysis chain ensure
that "black" will be an indexed term for every input document that
contains it, whether it appears at the beginning, middle, or end.
If you set up a copyField to a new field whose fieldType uses the
Keyword tokenizer (which treats the entire string as a single token) and
the lowercase filter, you would be able to use the regex support in Solr
4.x and have this as your query string:
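newfield:/black.*/
Lucene regular expressions always match against the entire term and do
not support the "^" anchor, so the trailing ".*" is what makes this a
"starts with" match.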
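A minimal sketch of what that could look like in schema.xml. The names
"text_lower_exact", "newfield" and "yourfield" are placeholders I made up
for illustration; substitute whatever your source field is actually
called, and adjust indexed/stored to your needs:

```xml
<!-- Whole input value becomes one lowercased token: no splitting, no ngrams -->
<fieldType name="text_lower_exact" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field names; "yourfield" stands in for your existing field -->
<field name="newfield" type="text_lower_exact" indexed="true" stored="false"/>
<copyField source="yourfield" dest="newfield"/>
```

With this in place, "beautiful black cat" is indexed in newfield as the
single term "beautiful black cat", so a pattern that has to begin with
"black" cannot match it. You will need to reindex after the schema change.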
Thanks,
Shawn