On 10/5/2010 6:34 PM, Ken Krugler wrote:
> > Is there any existing way to remove empty terms during analysis? I
> > tried TrimFilterFactory but that made no difference.
>
> You could use LengthFilterFactory to restrict terms to being at least
> one character long.
>
> > Is this a bug in PatternReplaceFilterFactory?
>
> No, I don't believe so. PatternReplaceFilterFactory creates a
> PatternReplaceFilter, and the JavaDoc for that says:
>
>   Note: Depending on the input and the pattern used and the input
>   TokenStream, this TokenFilter may produce Tokens whose text is the
>   empty string.

That fixed it. Thank you. If I have time, I'll look at the
PatternReplaceFilter source code and see whether I can make it
optionally drop empty terms. For me it's not terribly critical: my
database is the bottleneck in my indexing process, so Solr easily keeps
up with the data coming in. For someone else, the time spent in an
extra filter step during analysis might actually matter.
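
For anyone reading this in the archives, an analyzer chain along these
lines would look roughly like the sketch below. It is only an
illustration: the field type name, the tokenizer choice and the regex
are hypothetical, and the only parts taken from this thread are the two
filter factories and the idea of requiring a minimum length of one.

```xml
<!-- Hypothetical field type; name, tokenizer and pattern are made up. -->
<fieldType name="text_noempty" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- A token made up entirely of matching characters becomes "". -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^A-Za-z0-9]" replacement="" replace="all"/>
    <!-- Drops any token shorter than min, which removes the empty ones.
         The max value here is an arbitrary upper bound. -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  </analyzer>
</fieldType>
```

The length filter has to come after the pattern replace filter, since
the empty terms do not exist until the replacement has run.
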
Shawn