
Re: ClassicAnalyzer Behavior on accent character

2017-10-20 Thread Chitra

Hi,

So, is it not advisable to use ClassicTokenizer and ClassicAnalyzer?

On Thu, Oct 19, 2017 at 8:29 PM, Erick Erickson wrote:
> Have you looked at the specification to see how it's _supposed_ to work?
>
> From the javadocs:
> "implements Unicode text segmentation, as specified by UAX#29."
>

Re: ClassicAnalyzer Behavior on accent character

2017-10-19 Thread Erick Erickson

Have you looked at the specification to see how it's _supposed_ to work?

From the javadocs:
"implements Unicode text segmentation, as specified by UAX#29."

See http://unicode.org/reports/tr29/#Word_Boundaries

If you look at the spec and feel that ClassicAnalyzer incorrectly implements the wor

ClassicAnalyzer Behavior on accent character

2017-10-19 Thread Chitra

Hi,

I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane), and it was indexed as "er l n"; some characters were dropped during indexing.

Here is my code:

protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
    final ClassicTokenizer
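
When characters go missing from an indexed term like this, it may help to first check how each code point is classified before looking at the analyzer chain. The following is a minimal sketch that uses only the JDK and does not call Lucene at all. The class name CodePointInspector and the method describe are hypothetical and are not part of the original thread. It lists each code point of the term with its Unicode name and whether the JDK considers it a letter. The JDK's classification depends on the Unicode version the running JVM supports, and it may differ from the character classes a given tokenizer grammar was built with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the thread: lists the code points of a string.
public class CodePointInspector {

    // Returns one row per code point: "U+XXXX letter=<bool> <Unicode name>".
    static List<String> describe(String text) {
        List<String> rows = new ArrayList<>();
        text.codePoints().forEach(cp -> rows.add(String.format(
                "U+%04X letter=%b %s",
                cp,
                Character.isLetter(cp),      // JDK's view, not the tokenizer's
                Character.getName(cp))));    // Unicode character name
        return rows;
    }

    public static void main(String[] args) {
        // The term from the message, written with escapes so the source
        // file encoding does not matter.
        String term = "\u24B6e\u0158\uA74B\uA752\u026B\u2C6F\u014B\u0247";
        describe(term).forEach(System.out::println);
    }
}
```

Comparing this listing with the tokens that actually reach the index can show which code points the tokenizer treated as token characters and which it treated as separators.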