Is WordDelimiterFilterFactory applicable to non-English languages?

cyang2010 Mon, 14 Mar 2011 18:32:40 -0700

Does it make sense to apply WordDelimiterFilterFactory to non-English
languages, such as Spanish?  What about Asian languages?



The following are the typical use cases for WordDelimiterFilterFactory.  Are
1, 2, 3, and 4 applicable to all Western languages (including Spanish)?  For
Asian languages such as Chinese, are 1, 2, and 4 applicable?  Since 1 and 2
are based on alphanumeric characters and letter-number transitions, I am not
sure whether Chinese characters count as alphabetic characters or letters.
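One way to check whether Chinese characters count as "letters" is to look at their Unicode general category. Below is a minimal Python sketch using the standard `unicodedata` module; `describe` is a hypothetical helper, and it only inspects Unicode properties. Whether Solr's filter classifies characters by exactly these categories is an assumption, not something stated in this thread.

```python
import unicodedata


def describe(ch):
    """Hypothetical helper: return (Unicode category, is it a letter?, does it have case?)."""
    # A character "has case" if its lowercase and uppercase forms differ.
    has_case = ch.lower() != ch.upper()
    return unicodedata.category(ch), ch.isalpha(), has_case


# Latin letters, a Spanish letter, a digit, a hyphen, and two Chinese characters.
for ch in ["a", "W", "ñ", "5", "-", "中", "文"]:
    print(repr(ch), describe(ch))
```

Chinese characters come back as category Lo ("Letter, other"): they are letters, but they have no case, so a case-transition rule has nothing to act on.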

1. split on intra-word delimiters (all non-alphanumeric characters).
      "Wi-Fi" -> "Wi", "Fi"

2. split on case transitions <-- only applicable to languages with letter
case, right?

3. split on letter-number transitions.   "SD500" -> "SD", "500"

4. leading and trailing intra-word delimiters on each subword are ignored

      "//hello---there, 'dude'" -> "hello", "there", "dude"

5. trailing "'s" is removed from each subword.   "O'Neil's" -> "O", "Neil"
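The five rules above can be sketched as a small, simplified Python function. This is a hypothetical illustration (`char_type` and `split_subwords` are made-up names), not Solr's WordDelimiterFilter code: it covers only the splitting behavior described in the list, treats any character whose Unicode category is not a letter or decimal digit as a delimiter, and only splits case on a lowercase-to-uppercase change. The real filter may differ in details.

```python
import re
import unicodedata


def char_type(ch):
    """Hypothetical helper: classify one character by its Unicode category."""
    cat = unicodedata.category(ch)
    if cat == "Lu":
        return "UPPER"
    if cat == "Ll":
        return "LOWER"
    if cat.startswith("L"):
        return "ALPHA"  # caseless letters, e.g. Chinese characters (category Lo)
    if cat == "Nd":
        return "DIGIT"
    return "DELIM"  # anything else counts as an intra-word delimiter


def split_subwords(token):
    """Simplified sketch of rules 1-5; not Solr's actual implementation."""
    # Rule 5: drop "'s" when it ends a subword (not followed by a letter or digit).
    token = re.sub(r"'[sS](?!\w)", "", token)
    subwords, current, prev = [], "", None
    for ch in token:
        t = char_type(ch)
        if t == "DELIM":
            # Rules 1 and 4: delimiters end the current subword and are dropped,
            # so leading/trailing delimiters never produce empty subwords.
            if current:
                subwords.append(current)
            current, prev = "", None
            continue
        if current:
            # Rule 2: a lowercase -> uppercase transition starts a new subword.
            case_break = prev == "LOWER" and t == "UPPER"
            # Rule 3: a letter <-> digit transition starts a new subword.
            digit_break = (prev == "DIGIT") != (t == "DIGIT")
            if case_break or digit_break:
                subwords.append(current)
                current = ""
        current += ch
        prev = t
    if current:
        subwords.append(current)
    return subwords


for word in ["Wi-Fi", "SD500", "//hello---there, 'dude'", "O'Neil's",
             "PowerShot", "Señor-García", "中文2011"]:
    print(word, "->", split_subwords(word))
```

The first four inputs reproduce the examples from the list, and "PowerShot" is an illustrative case-transition input. In "Señor-García" the accented letters are ordinary cased letters, so Spanish behaves like English here. In "中文2011" the letter-digit split (rule 3) still applies, giving "中文" and "2011", while the case rule (rule 2) never fires because Chinese characters have no case.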
    

Appreciate your help.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-WordDelimiterFilterFactory-applicable-to-non-english-language-tp2678199p2678199.html
Sent from the Solr - User mailing list archive at Nabble.com.