Hi, I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane), but it was indexed as "er l n": some characters were dropped during indexing.
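To see how each character of the term is classified, a small diagnostic sketch like the one below could help. It is not part of Lucene or of the analyzer in this message. The class name `CodePointInfo` and method `nonLetters` are hypothetical, chosen only for illustration. The sketch reports what the running JDK's own Unicode tables say through `Character.getType` and `Character.isLetter`. A JFlex-generated tokenizer such as ClassicTokenizer carries its own compiled-in character tables, which may not match the JDK's.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic class, not part of Lucene or of the analyzer in this message.
public class CodePointInfo {
    // Returns the code points in `text` that the running JDK does not classify as letters.
    // Character.isLetter(int) is true only for general categories Lu, Ll, Lt, Lm and Lo.
    public static List<Integer> nonLetters(String text) {
        return text.codePoints()
                .filter(cp -> !Character.isLetter(cp))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The term from the question, written with Unicode escapes so the source file encoding does not matter.
        String term = "\u24B6e\u0158\uA74B\uA752\u026B\u2C6F\u014B\u0247";
        // Print each code point with its Character.getType() value and whether it counts as a letter.
        term.codePoints().forEach(cp -> System.out.printf(
                "U+%04X type=%d letter=%b%n", cp, Character.getType(cp), Character.isLetter(cp)));
        System.out.println("non-letters: " + nonLetters(term));
    }
}
```

On a JDK with Unicode 6.0 or later tables, only U+24B6 (Ⓐ, general category "Other Symbol") should be reported as a non-letter. If a tokenizer still treats any of the remaining letters as separators, its own tables are a plausible cause worth checking.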
Here is my code:

    protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        // Split the text into tokens using the classic (pre-UAX#29) grammar
        final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
        src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
        // Remove possessive 's and dots from acronyms
        TokenStream tok = new ClassicFilter(src);
        tok = new LowerCaseFilter(getVersion(), tok);
        tok = new StopFilter(getVersion(), tok, stopwords);
        tok = new ASCIIFoldingFilter(tok); // fold accented characters to ASCII for accent-insensitive search
        return new Analyzer.TokenStreamComponents(src, tok) {
            @Override
            protected void setReader(final Reader reader) throws IOException {
                // Re-apply the max token length when the components are reused with a new reader
                src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
                super.setReader(reader);
            }
        };
    }

Am I missing something? Is this the expected behavior for my input, or is there a reason for it?

--
Regards,
Chitra