Maybe you are running into the same problem I posted about on another
message thread: the hard-coded maxExpansions limit of 50. In other words,
once Lucene has found 50 terms that match, it won't find any additional
matches. And those are not necessarily the 50 best matches, just the first
50 in the index.
See if you can reproduce the problem with a small data set of no more than a
couple dozen documents.
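
To illustrate the cutoff I am describing, here is a toy model in Python. It
is only a sketch of the behavior as I understand it, not Lucene's code: the
function names are made up, the matcher is a simplified substitution-only
stand-in for real fuzzy matching, and the demo uses a limit of 2 instead of
50 so the effect shows up with five terms.

```python
MAX_EXPANSIONS = 50  # the hard-coded limit discussed above


def within_one_substitution(a, b):
    """Simplified stand-in for fuzzy matching: same length, at most one differing character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def expand(query, sorted_terms, limit=MAX_EXPANSIONS):
    """Walk the terms in index (sorted) order and stop once `limit` matches are collected."""
    matches = []
    for term in sorted_terms:
        if within_one_substitution(query, term):
            matches.append(term)
            if len(matches) == limit:
                break  # later terms are never examined, even if they match
    return matches


# Hypothetical term dictionary, kept in sorted order like an index would be.
terms = sorted(["bulia", "julia", "julib", "julic", "zulia"])

print(expand("julia", terms, limit=2))  # ['bulia', 'julia']
print(expand("julia", terms))           # all five terms match when the limit is not hit
```

With a couple dozen documents the real limit of 50 should never be reached,
so if the problem still reproduces, the limit is not the cause.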
-- Jack Krupansky
-----Original Message-----
From: Ryan Wilson
Sent: Thursday, May 16, 2013 9:28 AM
To: solr-user@lucene.apache.org
Subject: RE: Strange fuzzy behavior in 4.2.1
To answer your first questions: every change we’ve made has been followed
by a reindex.
The data that is being indexed generally looks something like this (<space>
indicating an actual space):
TIM <space> , <space> JULIO
JULIE <space> , <space> JIM
So, based on what we see in the top terms for the field and in the
analysis tool, at index time these records are being broken up such that
TIM , JULIO can be found with tim or julio.
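
As a rough picture of what that looks like, here is a minimal Python sketch.
It assumes the index chain amounts to a whitespace split plus lowercasing,
which is only a guess consistent with the terms we see; the function names
and record ids are invented for the example.

```python
from collections import defaultdict


def analyze(value):
    """Assumed index-time analysis: split on whitespace, then lowercase each token."""
    return [token.lower() for token in value.split()]


def build_index(records):
    """Map each indexed term to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in records.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


# The two sample values from above, with made-up ids.
records = {1: "TIM , JULIO", 2: "JULIE , JIM"}
index = build_index(records)

print(analyze("TIM , JULIO"))        # ['tim', ',', 'julio']
print(sorted(index))                 # [',', 'jim', 'julie', 'julio', 'tim']
print(index["tim"], index["julio"])  # {1} {1}
```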
Just to make sure I’m not misunderstanding something about Solr/Lucene:
when a record is indexed, the result of the index analysis chain (<tim> <,>
<julio>) is what is written to disk, correct? As far as I understand it,
it’s the query analysis chain that has the issue, with most filters not
being applied during wildcard and fuzzy queries.
Finally, some clarification, as I’ve realized my original email might not
have made this point well. I can have a particular record with a primary
key of X and a name value of LEWIS , JULIA and be able to find that exact
record with bulia~1 but not aulia~1, or GUERRERO , JULIAN , JULIAN can be
found with julan~1 but not julia~1. It’s not that records go missing under
fuzzy search, but rather that the fuzzy terms that will find them seem, to
my eyes, inconsistent.
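
To make the inconsistency concrete, here is a small Python sketch that
computes the edit distance for the four query/term pairs above. It uses
plain Levenshtein distance. I believe Lucene's fuzzy matching also counts a
transposition as a single edit by default, but that makes no difference for
these pairs.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# (query term, indexed term) pairs from the examples above
pairs = [("bulia", "julia"), ("aulia", "julia"),
         ("julan", "julian"), ("julia", "julian")]

for query, term in pairs:
    print(query, term, levenshtein(query, term))
```

All four pairs come out at distance 1, so on edit distance alone each of
those ~1 queries would be expected to match.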
Regards,
Ryan Wilson
rpwils...@gmail.com