On 10/5/2010 10:38 PM, Shawn Heisey wrote:
That fixed it. Thank you. If I have time, I'll peek at the patternfilter source code and see if I can figure out how to make it optionally remove empty terms. For me, it's not terribly critical, because my database is the bottleneck in my indexing process, so Solr is much faster than the data coming in. For someone else, the time involved in another analyzer step might actually matter.
I looked into the code for 1.4.1. PatternReplaceFilter works by overriding incrementToken, and I could not figure out a way to remove a token in that context. Looking at how other filters remove tokens, I found that LengthFilter uses a deprecated API call, and RemoveDuplicates does it by overriding TokenStream. Changing how PatternReplaceFilter is implemented is perhaps more than I am prepared to tackle.
If someone knows a way within incrementToken to remove a token, let me know and I will give it a try. I will also look into the branch_3x code and see if I can find something helpful there.
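For what it's worth, one pattern that might work is to loop inside incrementToken: keep pulling tokens from the input and only return true once the replaced term is non-empty. The sketch below is not written against the real Lucene/Solr classes. SimpleTokenStream, ListTokenStream, PatternReplaceSkipEmpty and analyze are hypothetical stand-ins so that the example is self-contained, and I have not tried this inside PatternReplaceFilter itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

public class SkipEmptyTokensSketch {

    /** Hypothetical stand-in for a token stream; NOT the real Lucene API. */
    static abstract class SimpleTokenStream {
        protected String term; // stands in for the term attribute

        /** Advances to the next token; returns false when the stream is exhausted. */
        abstract boolean incrementToken();

        String term() {
            return term;
        }
    }

    /** Hypothetical source stream that replays a fixed list of terms. */
    static class ListTokenStream extends SimpleTokenStream {
        private final Iterator<String> it;

        ListTokenStream(List<String> terms) {
            this.it = terms.iterator();
        }

        @Override
        boolean incrementToken() {
            if (!it.hasNext()) {
                return false;
            }
            term = it.next();
            return true;
        }
    }

    /** Hypothetical filter: applies a regex replacement and skips empty results. */
    static class PatternReplaceSkipEmpty extends SimpleTokenStream {
        private final SimpleTokenStream input;
        private final Pattern pattern;
        private final String replacement;

        PatternReplaceSkipEmpty(SimpleTokenStream input, String regex, String replacement) {
            this.input = input;
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }

        @Override
        boolean incrementToken() {
            // "Removing" a token here means not returning for it:
            // keep consuming input until a non-empty term turns up.
            while (input.incrementToken()) {
                String replaced = pattern.matcher(input.term()).replaceAll(replacement);
                if (replaced.length() > 0) {
                    term = replaced;
                    return true;
                }
            }
            return false; // input exhausted
        }
    }

    /** Runs the filter over the given terms and collects what survives. */
    static List<String> analyze(List<String> terms, String regex, String replacement) {
        SimpleTokenStream stream =
            new PatternReplaceSkipEmpty(new ListTokenStream(terms), regex, replacement);
        List<String> out = new ArrayList<String>();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }

    public static void main(String[] args) {
        // "---" and "" become empty after the replacement and are skipped.
        System.out.println(analyze(Arrays.asList("foo", "---", "bar-baz", ""), "[^a-z]", ""));
        // prints [foo, barbaz]
    }
}
```

In a real filter the skipped tokens would presumably also need their position increments folded into the next token that is returned, which this sketch ignores.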
Thanks, Shawn