RE: Partial Word Search

Teague James Thu, 06 Feb 2014 08:32:51 -0800

Update: RESOLVED

On a hunch I decided to forgo trying to separate the EdgeNGramFilterFactory
from this one column and apply it to all columns that are copied into the
'text' field that Solr uses for searching. I moved the filter factory into
fieldType 'text_general', which is the type that 'text' uses. Everything
worked! Thanks for your help, Jack!

-Teague

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Wednesday, February 05, 2014 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Partial Word Search

1. The ngramming occurs in the index, but does not modify the original,
"stored" value that a query will return. So, "Example" will be returned even
though the index will have all the sub-terms indexed (but not stored).

2. You need the ngram filters to be asymmetric with regard to indexing and
query: the index analyzer does the ngramming, but the query analyzer should
not. You have a single analyzer, which means that the query will also be
expanded into a sequence of sub-terms, which will be ORed or ANDed depending
on your default query operator. OR will generally work since it will query
for all the sub-terms, but AND will only work if all the sub-terms occur in
the document field.
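
As a minimal sketch of such an asymmetric setup (an illustration adapted from
the "partialWord" type quoted below, not a tested configuration), the index
analyzer applies the edge ngram filter while the query analyzer omits it:

<fieldType name="partialWord" class="solr.TextField"
positionIncrementGap="100">
<!-- Index time: lowercase, then emit leading ngrams of 3 to 10 characters -->
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
maxGramSize="10"/>
</analyzer>
<!-- Query time: lowercase only, so the query term is not split into ngrams -->
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

With a setup along these lines, a query for "exam" stays a single term and
can match the "exam" ngram produced at index time. Existing documents need to
be reindexed after the index analyzer changes.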

-- Jack Krupansky

-----Original Message-----
From: Teague James
Sent: Wednesday, February 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Partial Word Search

I cannot get Solr 4.6.0 to do partial word search on a particular field that
is used for faceting. Most of the information I have found suggests
modifying the fieldType "text" to include either the NGramFilterFactory or
EdgeNGramFilterFactory in the filter. However since I am copying many other
fields to "text" for searching my expectation is that the NGramFilterFactory
would create ngrams for everything sent to it, which is unnecessary and
probably costly - right?

In an effort to try and troubleshoot the issue I created a new field in the
schema and stored it so that I could see what was getting populated.
However, what I'm finding is that no ngrams are being generated, just the
actual data that gets indexed from the database.

Here's what my setup looks like:
NOTE: Every record in my test environment has the same value "Example"

<field name="PartialSubject" type="partialWord" indexed="true" stored="true"
multiValued="true" />

<copyField source="PartialSubject" dest="text"/>

<fieldType name="partialWord" class="solr.TextField"
positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
maxGramSize="10" side="front"/>
</analyzer>
</fieldType>

When I query Solr it reports:
<arr name="PartialSubject">
<str>Example</str>
</arr>

I was expecting exa, exam, examp, exampl, example to be the values for
PartialSubject so that a search for "exam" would turn up all of the records
in this test index. Instead I get 0 results.

Can anyone provide any guidance on this please?
