Is this really a text field where you want to search for tokenized keywords?
Or is it a string field where you wish strictly to deal with equality of the
entire string or explicit wildcards for substring matches, as you've shown?
You haven't told us your full requirements for this field.
The standard tokenizer breaks the input into individual tokens or keywords.
Yes, you can use wildcards on those tokens, but only on one token at a time,
not two as you have shown.
You may want to consider two fields, such as cust and cust_str. The former
would be tokenized with the standard tokenizer and allow keyword search, but
the latter would be a single string or a single token. Either make the
latter a true string type, or use a TextField that uses the keyword
tokenizer, which preserves whitespace and special characters. You probably
shouldn't use the stop filter for the second field.
You'll have to explicitly escape the spaces in your queries using a
backslash. You can't enclose the query in quotes since that would disable
the wildcard.
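To make that concrete, here is a rough sketch of what the second field could
look like. The type name string_lc and the text_general type for cust are
placeholders of mine, not anything from your schema, so substitute whatever
you actually use and treat this as a starting point rather than tested config:

```xml
<!-- Hypothetical type: the whole value is kept as one lowercased token,
     with no stop filter, so whitespace inside the value is preserved -->
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- cust: tokenized keyword search (type name assumed) -->
<field name="cust" type="text_general" indexed="true" stored="true"/>
<!-- cust_str: single-token copy for starts-with / contains matching -->
<field name="cust_str" type="string_lc" indexed="true" stored="false"/>
<copyField source="cust" dest="cust_str"/>
```

With something like that in place, the two searches from your message would
become:

cust_str:SAN\ M*
cust_str:*SAN\ M*

Recent releases should lowercase wildcard terms for you when the analyzer has
a lowercase filter; if yours doesn't, lowercase the query terms yourself. Keep
in mind that a leading wildcard can be slow on a large index.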
You could also use regex queries on that field:
/.*san.m.*/
-- Jack Krupansky
-----Original Message-----
From: kobe.free.wo...@gmail.com
Sent: Friday, May 17, 2013 7:42 AM
To: solr-user@lucene.apache.org
Subject: Searching for terms having embedded white spaces like "word1 word2"
Hi Guys,
I have a field defined with the following custom data type,
<fieldType name="cust_str" class="solr.TextField" positionIncrementGap="100"
sortMissingLast="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
This field has values like "SAN MIGUEL","SAN JUAN","SAN DIEGO" etc. I wish
to perform a "Starts With" and "Contains" search on these values and I
perform the query in SOLR as follows,
-Starts With: field:SAN M*
-Contains: field:*SAN M*
But, the SOLR is not returning correct results because of the white space.
What modifications do I need to make in order to make the searches work for
the values with embedded white spaces?