In LUCENE-5472, Lucene was changed to throw an exception when a term is too 
long (more than 32766 bytes in UTF-8), rather than just logging a message. I 
have fields with terms that are too long, but I don't care: I just want to 
ignore those terms and move on.

The solution recommended in the docs is to use LengthFilterFactory, but it 
limits terms by the number of characters, not by the number of UTF-8 bytes. 
So you can't just do something clever like set max=32766, because multibyte 
characters can push a term over the byte limit.
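To make the mismatch concrete, here is a small standalone Java sketch (plain 
JDK, Java 11+ for String.repeat, no Lucene classes). It assumes the filter 
counts UTF-16 chars, which is what String.length() returns. A term of 32766 
euro signs passes max=32766 by character count but is three times the byte 
limit once encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class CharsVsBytes {
    // Lucene's per-term limit, measured in UTF-8 bytes.
    static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+20AC EURO SIGN: one UTF-16 char, but three UTF-8 bytes.
        String term = "\u20AC".repeat(MAX_TERM_BYTES);
        System.out.println(term.length());     // 32766 chars: passes max=32766
        System.out.println(utf8Length(term));  // 98298 bytes: over the limit
    }
}
```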

So, is there a way to use LengthFilterFactory such that the exception will 
never be thrown? I'm thinking I could use some max below 32766 / 3, but I 
want to be absolutely sure there isn't some edge case that will break it. I 
guess I could just set it to something sane like 1000. Or is there another, 
more direct solution to this problem?
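A quick sanity check of the 32766 / 3 idea, as a plain-JDK sketch that again 
assumes the length is counted in UTF-16 chars. A BMP char encodes to at most 
3 UTF-8 bytes, and a supplementary character is a surrogate pair (2 chars) 
encoding to 4 bytes, i.e. 2 bytes per char. So 3 bytes per char is the worst 
case, and max = 32766 / 3 = 10922 fills the limit exactly in that case.

```java
import java.nio.charset.StandardCharsets;

public class SafeMaxChars {
    static final int MAX_TERM_BYTES = 32766;
    // Worst-case UTF-8 bytes per UTF-16 char:
    //   U+0800..U+FFFF: 1 char -> 3 bytes
    //   supplementary:  2 chars (surrogate pair) -> 4 bytes, i.e. 2 per char
    static final int WORST_BYTES_PER_CHAR = 3;
    static final int SAFE_MAX_CHARS = MAX_TERM_BYTES / WORST_BYTES_PER_CHAR;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Worst BMP case: every char needs 3 bytes.
        String bmpWorst = "\uFFFD".repeat(SAFE_MAX_CHARS);
        // Supplementary case: U+1F600 as a surrogate pair, 4 bytes per 2 chars.
        String suppWorst = "\uD83D\uDE00".repeat(SAFE_MAX_CHARS / 2);

        System.out.println(SAFE_MAX_CHARS);        // 10922
        System.out.println(utf8Length(bmpWorst));  // 32766: exactly at the limit
        System.out.println(utf8Length(suppWorst)); // 21844: well under it
    }
}
```

One more char of the 3-byte kind (10923 chars) would need 32769 bytes, so 
10922 is the largest character limit that holds for every input under these 
assumptions.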

-Michael
