On 10/5/2010 6:34 PM, Ken Krugler wrote:

Is there any existing way to remove empty terms during analysis? I tried TrimFilterFactory but that made no difference.

You could use LengthFilterFactory to restrict terms to being at least one character long.

Is this a bug in PatternReplaceFilterFactory?

No, I don't believe so. PatternReplaceFilterFactory creates a PatternReplaceFilter, and the JavaDoc for that says:
Note: Depending on the input and the pattern used and the input TokenStream, this TokenFilter may produce Tokens whose text is the empty string.
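
A minimal sketch of how the LengthFilterFactory suggestion could look in schema.xml, placed after the pattern replacement so that any empty terms it produces are dropped. The field type name, the tokenizer choice, the pattern, and the max value are illustrative assumptions, not taken from the original configuration:

```xml
<!-- Hypothetical field type; the name, tokenizer, and pattern are assumptions -->
<fieldType name="text_stripped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- May emit empty-string terms when a whole token matches the pattern -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-zA-Z0-9]" replacement="" replace="all"/>
    <!-- Keeps only terms at least one character long; max is an arbitrary upper bound -->
    <filter class="solr.LengthFilterFactory" min="1" max="255"/>
  </analyzer>
</fieldType>
```

Filter order matters here: the length check has to come after the replacement, since that is the step that can leave a term empty.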



That fixed it. Thank you. If I have time, I'll peek at the PatternReplaceFilter source code and see if I can figure out how to make it optionally remove empty terms. For me, it's not terribly critical: my database is the bottleneck in my indexing process, so Solr indexes documents much faster than the data comes in. For someone else, the time spent in an extra analysis step might actually matter.

Shawn
