The problem is that your transformation method needs Strings, but your
incrementToken method also has a serious bug: it does not respect the length of
the token in the buffer, so it may pick up additional garbage beyond the end of
the term!
The easiest way to do this with much less code and without those bugs:
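To see why, here is a minimal stand-alone sketch of the length bug (the 16-char buffer is a made-up example, not taken from the original code): CharTermAttribute's backing buffer is usually larger than the current token, so constructing a String from the whole array drags in the unused capacity.

```java
public final class BufferLengthBug {
    public static void main(String[] args) {
        // A term buffer with spare capacity, as CharTermAttribute typically has.
        char[] buffer = new char[16];
        "foo".getChars(0, 3, buffer, 0);

        // Wrong: uses the whole backing array, including 13 trailing '\0' chars.
        String wrong = new String(buffer);

        // Right: respect the token length.
        String right = new String(buffer, 0, 3);

        System.out.println(wrong.length()); // 16, not 3
        System.out.println(right);          // foo
    }
}
```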
@Override
public boolean incrementToken() throws IOException {
  if (!input.incrementToken()) {
    return false;
  }
  final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
  charTermAttr.setEmpty().append(normalizedLCcallnum);
  return true;
}
This fixes part of your performance problem: it no longer converts the result
of your transformation back and forth between char arrays and Strings
(CharTermAttribute.toString() also respects the token length, unlike new
String(buffer)).
To further improve speed, make the getLCShelfkey method operate directly on
char[] and length.
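A minimal sketch of that char[]-based approach, outside of Lucene so it runs stand-alone. The real LC shelf-key rules are replaced here by a trivial uppercase-in-place stand-in, purely for illustration; the class and method names are hypothetical:

```java
// Hypothetical sketch: a normalization working directly on char[] and length,
// so no String is allocated per token. The uppercase logic is a placeholder
// for the actual getLCShelfkey transformation.
public final class InPlaceNormalize {
    /** Normalizes buffer[0..length) in place and returns the new length. */
    static int normalize(char[] buffer, int length) {
        for (int i = 0; i < length; i++) {
            buffer[i] = Character.toUpperCase(buffer[i]);
        }
        return length;
    }

    public static void main(String[] args) {
        char[] buf = "qa76.73 j38".toCharArray();
        int len = normalize(buf, buf.length);
        System.out.println(new String(buf, 0, len)); // QA76.73 J38
    }
}
```

Inside incrementToken() you would then call it as `int newLen = normalize(charTermAttr.buffer(), charTermAttr.length());` followed by `charTermAttr.setLength(newLen);` - no intermediate String at all. If the transformation can grow the token, use charTermAttr.resizeBuffer() first.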
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
> -----Original Message-----
> From: Osullivan L. [mailto:[email protected]]
> Sent: Friday, September 14, 2012 11:58 AM
> To: [email protected]
> Subject: Custom Filter Indexing Slow
>
> Hi Folks,
>
> I have a custom filter which does everything I need it to but it has reduced
> my
> indexing speed to a crawl. Are there any methods I need to call to clear /
> clean
> things up once my script (details below) has done its work?
>
> Thanks,
>
> Luke
>
> public LCCNormalizeFilter(TokenStream input)
> {
> super(input);
> this.charTermAttr = addAttribute(CharTermAttribute.class);
> }
>
> public boolean incrementToken() throws IOException {
>
> if (!input.incrementToken()) {
> return false;
> }
>
> char[] buffer = charTermAttr.buffer();
> String rawLCcallnum = new String(buffer);
> String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
> char[] newBuffer = normalizedLCcallnum.toCharArray();
> charTermAttr.setEmpty();
> charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
> return true;
> }