Does it make sense to apply WordDelimiterFilterFactory to non-English languages, such as Spanish? What about Asian languages?
The following are the typical use cases for WordDelimiterFilterFactory. Are 1, 2, 3, and 4 applicable to all Western languages (including Spanish)? For Asian languages such as Chinese, are 1, 2, and 4 applicable? Since 1 and 2 are based on alpha-numeric and letter-number distinctions, I am not sure whether Chinese characters count as alpha or letter characters at all.

1. split on intra-word delimiters (all non alpha-numeric characters).
   "Wi-Fi" -> "Wi", "Fi"
2. split on case transitions <-- only applicable to languages with case, right?
3. split on letter-number transitions.
   "SD500" -> "SD", "500"
4. leading and trailing intra-word delimiters on each subword are ignored.
   "//hello---there, 'dude'" -> "hello", "there", "dude"
5. a trailing "'s" is removed from each subword.
   "O'Neil's" -> "O", "Neil"

Appreciate your help.

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-WordDelimiterFilterFactory-applicable-to-non-english-language-tp2678199p2678199.html
Sent from the Solr - User mailing list archive at Nabble.com.
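To make the five rules above concrete, here is a rough Python sketch of them. It does not call Solr or Lucene; it only approximates the rules using Unicode general categories, and the names `char_class` and `split_subwords` are made up for this illustration. It also simplifies rule 5 by stripping the possessive once from the whole token instead of from each subword. The real filter's behavior and options may differ, so treat the results as an illustration of the idea rather than what Solr will produce.

```python
import unicodedata


def char_class(ch):
    """Coarse character class derived from the Unicode general category."""
    cat = unicodedata.category(ch)
    if cat == "Lu":
        return "upper"
    if cat == "Ll":
        return "lower"
    if cat.startswith("L"):
        # Lo, Lm, Lt: other letters, e.g. CJK ideographs, which have no case
        return "alpha"
    if cat == "Nd":
        return "digit"
    return "delim"


def split_subwords(token):
    """Hypothetical approximation of rules 1-5; not the Solr implementation."""
    # Compose accented letters so "n" + combining tilde is treated as one letter
    token = unicodedata.normalize("NFC", token)
    # Rule 5 (simplified): drop a trailing possessive "'s" from the token
    if token.lower().endswith("'s"):
        token = token[:-2]

    letters = ("upper", "lower", "alpha")
    parts, current, prev = [], "", None
    for ch in token:
        cls = char_class(ch)
        if cls == "delim":
            # Rules 1 and 4: delimiters end the current subword and are dropped
            if current:
                parts.append(current)
            current, prev = "", None
            continue
        case_change = prev == "lower" and cls == "upper"      # rule 2
        letter_number = (prev in letters and cls == "digit") or (
            prev == "digit" and cls in letters                # rule 3
        )
        if case_change or letter_number:
            parts.append(current)
            current = ""
        current += ch
        prev = cls
    if current:
        parts.append(current)
    return parts


if __name__ == "__main__":
    for sample in ["Wi-Fi", "SD500", "O'Neil's", "señor-García", "中文500"]:
        print(sample, "->", split_subwords(sample))
```

In this sketch, accented Spanish letters fall in the same upper/lower categories as unaccented ones, so rules 1 to 4 behave as they do for English. Chinese characters are classified as letters without case, so the case-transition rule never fires on them, while the delimiter and letter-number rules still can.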