What precisely do you mean by the term "exact search"? Solr (and Lucene)
do not have that concept for tokenized text fields.
Or did you simply mean "quoted phrase"? In that case, you need to be aware
that all the quotes do is ensure that the terms occur in that order, or in
close proximity according to the default or specified "phrase slop"
distance. Each term is still analyzed according to the analyzer for the
field.
Technically, Lucene will in fact analyze the full quoted phrase as one
stream, which for non-tokenized fields will be one term, but for any
tokenized fields which split on white space, the phrase will be broken into
separate tokens and special characters will tend to be removed as well. The
keyword tokenizer will indeed treat the entire phrase as a single token, and
the white space tokenizer will preserve special characters, but the standard
tokenizer will not preserve either white space or special characters.
Nominally, the keyword tokenizer does generate a single term, at least at
the tokenization stage, but the word delimiter filter then splits individual
terms into multiple terms, thus guaranteeing that a phrase with white space
will become multiple terms and that special characters are removed as well.
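As a hedged illustration of that point: a field type that keeps the keyword
tokenizer and the lower case filter but drops the word delimiter filter would
leave each value as one lower-cased term. This is only a sketch; the type
name "text_exact_ci" is hypothetical and is not part of the schema in this
thread.

```xml
<!-- Hypothetical field type: the whole value is kept as a single lower-cased term -->
<fieldType name="text_exact_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With such a type, a quoted query like "test host" should analyze to the
single term "test host", so Test_host, Test-host and Test $host would not
match it.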
The other technicality is that quoting a phrase does prevent the phrase from
being interpreted as query parser syntax, such as AND and OR operators or
treating special characters as query parser operators.
But, the fact remains that a quoted phrase is not treated as an "exact"
string literal for any normal tokenized fields.
Out of curiosity, what references have led you to believe that a quoted
phrase is an "exact match"?
Use a "string" (not "tokenized text") field if you wish to make an "exact
match" on a literal string, but the concept of "exact match" is not
supported for tokenized and filtered text fields.
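For instance, an exact-match copy of a field could be declared roughly as
follows. The field names "name" and "name_exact" are assumptions for
illustration; substitute whatever field actually holds your doc names.

```xml
<!-- String fields are indexed verbatim: no tokenizing, no filtering -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical exact-match field, filled from the hypothetical "name" field -->
<field name="name_exact" type="string" indexed="true" stored="true"/>
<copyField source="name" dest="name_exact"/>
```

A query such as name_exact:"Test host" would then match only the document
whose value is literally Test host. Note that a string field is
case-sensitive, and the quotes are still needed so that the query parser
does not split the value on the space.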
So, please describe, in plain English, plus examples, exactly what you
expect your analyzer to do, both in terms of how it treats text to be
indexed and how you expect to be able to query that text.
-- Jack Krupansky
-----Original Message-----
From: Shay Sofer
Sent: Sunday, August 24, 2014 5:58 AM
To: solr-user@lucene.apache.org
Subject: Exact search with special characters
Hi all,
I have docs that are indexed in a text field with the schema mentioned below.
I have those docs names:
- Test host
- Test_host
- Test-host
- Test $host
When I try to do an exact search like: "test host"
all of the docs above are returned as results.
How can I use exact match so that I'll get only one result?
I would prefer to make my changes at search time, but if I need to change my
schema please suggest how.
Thanks,
Shay.
This is my schema:
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="0" splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="0" splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>