Re: Queries with wildcards

Chris Hostetter Fri, 18 Aug 2006 12:41:39 -0700

: For example, using the latest build (Aug 18) and the example documents, a
: search for Enterprise matches the SOLR1000 document, but a search for
: Enter* does not.


try searching for:    enter*

...this is a somewhat long-standing annoyance with Lucene, one that exists
because there's really no good way to deal with it -- when using
wildcards, the Lucene QueryParser does not analyze the input.  If you ask
for a wildcard search on Enter*, a PrefixQuery is constructed with that
exact prefix, case and all.  But in this case, the default search field is
"text", which uses the LowerCaseFilter -- so every indexed term is
lowercase, and you'll never get a match on a prefix with an uppercase
character.
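
To make the mismatch concrete, here is a minimal plain-Java sketch.  It does
not use Lucene at all: the class name, the method names, and the sample text
are made up for illustration, and a sorted set of strings stands in for the
index.  It only assumes what is described above, namely that terms are
lowercased at index time while the prefix is used exactly as typed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical stand-in for an index: not Lucene code, just a sorted set of terms.
class PrefixCaseDemo {

    // Mimics index-time analysis with a lowercasing filter:
    // split on whitespace, lowercase each token, keep it as a term.
    static TreeSet<String> indexTerms(String text) {
        TreeSet<String> terms = new TreeSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Mimics a prefix query: the prefix is compared against the stored
    // terms exactly as given, with no analysis applied to it.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous and
        // start at the first term >= the prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample text, invented for this sketch.
        TreeSet<String> terms = indexTerms("Solr the Enterprise Search Server");
        System.out.println(prefixMatches(terms, "Enter")); // prints []
        System.out.println(prefixMatches(terms, "enter")); // prints [enterprise]
    }
}
```

The uppercase "E" can never line up with a stored term, because nothing with
an uppercase character ever made it into the term set.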

The reason the QueryParser doesn't attempt to analyze the input you give it
when building a PrefixQuery is that the prefix might get analyzed in a
completely different way than the words that prefix "logically" matches.
Consider for example using a Porter stemmer on "enterprise" -- that
produces "enterpris" ... but if you asked for a prefix search for
"enterpris*", and the query parser analyzed "enterpris", then the
PorterStemmer would produce "enterpri".

The problem gets even worse when dealing with mid-word wildcards like
"Ent*prise" ... how can the QueryParser even approach trying to analyze
that input?  The * certainly isn't part of the text -- should it split the
input into two words, analyze them separately, and then rejoin them with a
star in the middle?

In general, wildcard queries are "hard" and only make sense on fields that
have very simplistic index-time analyzers (like WhitespaceAnalyzer)
-- even then you might want to use the LowerCaseFilter and override the
QueryParser's getPrefixQuery and getWildcardQuery methods to do things like
lowercase the input string for certain fields, so you don't get annoying
situations like enter* not matching Enterprise.
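
A rough sketch of the "lowercase the input for certain fields" idea, again in
plain Java rather than against the real Lucene classes.  Everything named
here (the class, LOWERCASED_FIELDS, normalizeWildcardTerm, the "id" field,
and the toy matcher) is hypothetical.  In an actual QueryParser subclass you
would presumably call something like normalizeWildcardTerm from your
getPrefixQuery and getWildcardQuery overrides before delegating to the
superclass; the exact override signatures depend on the Lucene version, so
they are not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Lucene or Solr.
class WildcardLowercaseSketch {

    // Assumption: the fields whose index-time analysis lowercases tokens.
    static final Set<String> LOWERCASED_FIELDS =
            new HashSet<>(Arrays.asList("text"));

    // Lowercase the raw wildcard term only for fields known to be
    // lowercased at index time. '*' and '?' are unaffected by lowercasing.
    static String normalizeWildcardTerm(String field, String termStr) {
        return LOWERCASED_FIELDS.contains(field)
                ? termStr.toLowerCase(Locale.ROOT)
                : termStr;
    }

    // Toy matcher used only to show the effect: '*' matches any run of
    // characters, '?' matches one character, everything else is literal.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), term);
    }

    public static void main(String[] args) {
        String indexed = "enterprise"; // what a lowercasing analyzer would store

        System.out.println(wildcardMatches("Enter*", indexed)); // prints false
        System.out.println(wildcardMatches(
                normalizeWildcardTerm("text", "Enter*"), indexed)); // prints true
        System.out.println(wildcardMatches(
                normalizeWildcardTerm("text", "Ent*prise"), indexed)); // prints true

        // A field that is not lowercased at index time is left alone.
        System.out.println(normalizeWildcardTerm("id", "SOLR*")); // prints SOLR*
    }
}
```

Note that this only papers over case.  It does nothing for the stemming
problem described earlier, which is why it is limited to fields with very
simple analysis.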


-Hoss
