Hi all,

StandardTokenizer doesn't split text at an apostrophe (punctuation mark ' ) or at a colon (punctuation mark : ).
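To make the difference concrete, here is a minimal Python sketch. It is not Lucene code and does not reproduce how StandardTokenizer is implemented. It only imitates, with two regular expressions, the output I observe next to the output I would get if every punctuation mark were a delimiter. The names `observed_tokens` and `expected_tokens`, the patterns themselves, and the sample "foo:bar" are assumptions made up for illustration.

```python
import re

# Stand-in for the behaviour I observe: runs of word characters joined by an
# apostrophe or a colon stay together in one token.
# Assumption: this regex only approximates the output, it is NOT Lucene's logic.
OBSERVED = re.compile(r"\w+(?:[':]\w+)*")

# Stand-in for "every punctuation mark is a delimiter":
# only runs of word characters survive as tokens.
EXPECTED = re.compile(r"\w+")


def observed_tokens(text):
    """Tokens with apostrophes/colons between word characters kept inside."""
    return OBSERVED.findall(text)


def expected_tokens(text):
    """Tokens obtained when every punctuation mark splits the text."""
    return EXPECTED.findall(text)


if __name__ == "__main__":
    # "foo:bar" is a made-up sample for the colon case.
    for sample in ("nell'aria", "foo:bar"):
        print(sample, observed_tokens(sample), expected_tokens(sample))
```

With "nell'aria", the first function keeps the text as one token, while the second returns the two tokens "nell" and "aria".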
Just to be clear: according to the documentation, all punctuation marks are delimiters, with an exception for periods (dots), so I would expect an Italian expression like "nell'aria" to be split into two words, "nell" and "aria". For now I have worked around the problem using a WordDelimiterFilterFactory.

Is this a bug or an undocumented behaviour? In either case, what should I do next?

Best regards,
Vincenzo

--
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251