Hi all,

StandardTokenizer doesn't split text on an apostrophe (punctuation mark
' ) or on a colon (punctuation mark : ).

Just to be clear: according to the documentation, all punctuation marks are
delimiters, with an exception for periods (dots), so I would expect an
Italian expression like "nell'aria" to be split into two words, "nell" and
"aria".

For now I have worked around the problem using a WordDelimiterFilterFactory.
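
To make the expected result concrete, here is a minimal Python sketch of the
splitting I am after. It is not Solr or Lucene code: split_on_delims is a
hypothetical helper written only for this example, and it assumes the only
delimiters of interest are the apostrophe (straight or typographic) and the
colon.

```python
import re

# Hypothetical helper, not part of Lucene or Solr.
# Delimiters assumed for this sketch: ' (U+0027), ’ (U+2019) and :
_DELIMS = re.compile(r"[':\u2019]")

def split_on_delims(tokens):
    """Split each token on the delimiters above, dropping empty pieces."""
    out = []
    for tok in tokens:
        out.extend(piece for piece in _DELIMS.split(tok) if piece)
    return out

print(split_on_delims(["nell'aria", "foo:bar", "ciao"]))
# prints ['nell', 'aria', 'foo', 'bar', 'ciao']
```

This only mimics the split on those two characters; the real filter has
further options that the sketch does not try to model.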

Is this a bug or an undocumented behaviour? In any case, what should I do next?

Best regards,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251
