Hi,
So, isn't it advisable to use ClassicTokenizer and ClassicAnalyzer?
On Thu, Oct 19, 2017 at 8:29 PM, Erick Erickson wrote:
> Have you looked at the specification to see how it's _supposed_ to work?
>
> From the javadocs:
> "implements Unicode text segmentation, as specified by UAX#29."
>
See http://unicode.org/reports/tr29/#Word_Boundaries
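To experiment with word-boundary segmentation without a Lucene setup, here is a minimal stdlib-only sketch that uses the JDK's `java.text.BreakIterator.getWordInstance`. This is an assumption-laden stand-in: the JDK's word rules are not documented as a strict UAX#29 implementation and may differ from Lucene's tokenizers, so treat the results as a rough comparison only. The class `WordSegments` and method `words` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSegments {
    // Splits text at the boundaries reported by the JDK's word BreakIterator
    // (NOT guaranteed to match UAX#29 or Lucene exactly) and keeps only the
    // segments that contain an alphabetic character or a digit, which drops
    // the whitespace and punctuation segments between words.
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            boolean wordLike = segment.codePoints()
                    .anyMatch(cp -> Character.isAlphabetic(cp) || Character.isDigit(cp));
            if (wordLike) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Have you looked at the specification?"));
        // Compare this with what the Lucene analyzer chain produces for the same input.
        System.out.println(words("ⒶeŘꝋꝒɫⱯŋɇ"));
    }
}
```

For plain ASCII input the first call yields the six words without the trailing `?`; the second line is left uncommented on purpose, since its result depends on the JDK's rule set and Unicode version.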
If you look at the spec and feel that ClassicAnalyzer incorrectly
implements the wor
Hi,
I indexed the term 'ⒶeŘꝋꝒɫⱯŋɇ' (aeroplane), and it was
indexed as "er l n": some characters were dropped during indexing.
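A useful first step when a tokenizer drops characters is to check how Unicode classifies each one. The sketch below prints, for every code point, its hex value, its Unicode name, and two JDK predicates: `Character.isLetter` (general category Lu, Ll, Lt, Lm or Lo) and `Character.isAlphabetic` (letter categories, Nl, or the Other_Alphabetic property). For example, 'Ⓐ' (U+24B6) is a symbol (So) rather than a letter, but it is Alphabetic. A plausible hypothesis to check, not confirmed in this thread, is that the tokenizer's grammar does not treat some of these code points as letters and therefore breaks tokens on them. `CodePointReport` and `report` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointReport {
    // Builds one line per code point: hex value, Unicode name (from the JDK's
    // Unicode data), and whether the JDK considers it a letter / alphabetic.
    static List<String> report(String text) {
        List<String> lines = new ArrayList<>();
        text.codePoints().forEach(cp -> lines.add(String.format(
                "U+%04X %s letter=%b alphabetic=%b",
                cp, Character.getName(cp), Character.isLetter(cp), Character.isAlphabetic(cp))));
        return lines;
    }

    public static void main(String[] args) {
        report("ⒶeŘꝋꝒɫⱯŋɇ").forEach(System.out::println);
    }
}
```

Iterating with `codePoints()` rather than `charAt` keeps the sketch correct for characters outside the Basic Multilingual Plane, even though all nine characters here are BMP characters.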
Here is my code:
protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
{
final ClassicTokenizer