The alif and ayn can also be used as diacritic-like characters in romanized Korean; this is a known practice. But thanks anyway.
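Charles's point that alif is a letter rather than a diacritic can be checked from the Unicode character properties. The sketch below assumes the records encode alif as U+02BC MODIFIER LETTER APOSTROPHE and ayn as U+02BB MODIFIER LETTER TURNED COMMA (the usual ALA-LC choices, but not confirmed in this thread). `naive_fold` is a hypothetical, simplified model of diacritic folding (decompose, then drop combining marks), not ICU's actual folding algorithm. It shows why a breve disappears while an alif survives.

```python
import unicodedata

# Assumed code points (not confirmed in the thread): alif as U+02BC MODIFIER
# LETTER APOSTROPHE, ayn as U+02BB MODIFIER LETTER TURNED COMMA.
ALIF = "\u02bc"
AYN = "\u02bb"


def naive_fold(text: str) -> str:
    """Hypothetical toy diacritic folding: canonical decomposition (NFD),
    then drop every nonspacing combining mark (general category Mn).

    A simplified model for illustration only, NOT ICU's folding algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    # Code point, general category, and Unicode name of each character.
    for ch in (ALIF, AYN, "'", "\u0306"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

    # The breve on o (U+014F) becomes a combining mark (Mn) after NFD and is
    # dropped; the alif is a modifier *letter* (Lm), so it is kept.
    print(naive_fold("P" + ALIF + "y\u014fngyang"))
```

Both modifier letters report category `Lm` (letter, modifier), while the combining breve reports `Mn`. Any folding step that only strips combining marks will therefore leave alif and ayn in place unless it is tailored to map them explicitly.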
On May 24, 2012, at 9:30 AM, Charles Riley wrote:

> Hi Naomi,
>
> I don't have a conclusive answer for you on this yet, but let me pick up on a
> few points.
>
> First, the apostrophe is probably being handled through ignoring punctuation
> in the ICUCollationKeyFilterFactory.
>
> Alif isn't a diacritic but a letter, and its character properties would be
> handled as such, apparently also outside the scope of what the folding filter
> factory does unless it's tailored.
>
> From the Solr wiki, this looks like a helpful rule of thumb:
>
> "When To use a CharFilter vs a TokenFilter
> There are several pairs of CharFilters and TokenFilters that have related
> (e.g. MappingCharFilter and ASCIIFoldingFilter) or nearly identical
> functionality (e.g. PatternReplaceCharFilterFactory and
> PatternReplaceFilterFactory) and it may not always be obvious which is the
> best choice.
>
> The ultimate decision depends largely on what Tokenizer you are using, and
> whether you need to "out smart" it by preprocessing the stream of characters.
>
> For example, maybe you have a tokenizer such as StandardTokenizer and you are
> pretty happy with how it works overall, but you want to customize how some
> specific characters behave.
>
> In such a situation you could modify the rules and re-build your own
> tokenizer with javacc, but perhaps it's easier to simply map some of the
> characters before tokenization with a CharFilter."
>
>
> Charles
>
> On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay <ndus...@stanford.edu> wrote:
> > We are using the ICUFoldingFilterFactory with great success to fold
> > diacritics so searches with and without the diacritics get the same results.
> >
> > We recently discovered we have some Korean records that use an alif diacritic
> > instead of an apostrophe, and this diacritic is NOT getting folded. Has
> > anyone experienced this for alif or ayn characters? Do you have a solution?
> >
> > - Naomi
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "solrmarc-tech" group.
> > To post to this group, send email to solrmarc-t...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > solrmarc-tech+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/solrmarc-tech?hl=en.
>
> --
> Charles L. Riley
> Catalog Librarian for Africana
> Sterling Memorial Library, Yale University
> <zenodo...@gmail.com>
> 203-432-7566
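The wiki's advice (map some characters before tokenization with a CharFilter such as MappingCharFilter) can be sketched outside Solr. This is a minimal Python model under stated assumptions: `CHAR_MAP`, `char_filter`, `tokenize`, and `analyze` are hypothetical names; the alif/ayn code points (U+02BC, U+02BB) are assumed; and the regex tokenizer is a toy stand-in, not StandardTokenizer. It shows why the mapping has to run *before* the tokenizer: the toy tokenizer treats a modifier letter as part of a word but splits on an ASCII apostrophe, so the two spellings yield different tokens unless they are unified first.

```python
import re

# Hypothetical character map, analogous in spirit to a MappingCharFilter
# mapping file. Assumed code points: U+02BC (alif) and U+02BB (ayn) are
# both normalized to the ASCII apostrophe.
CHAR_MAP = str.maketrans({
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE (alif)
    "\u02bb": "'",  # MODIFIER LETTER TURNED COMMA (ayn)
})

# Toy tokenizer standing in for a real one: emits runs of word characters.
# Python's \w counts modifier letters (category Lm) as word characters,
# but not the ASCII apostrophe.
WORD_RE = re.compile(r"\w+")


def char_filter(text: str) -> str:
    """Map characters before tokenization (the CharFilter step)."""
    return text.translate(CHAR_MAP)


def tokenize(text: str) -> list[str]:
    return WORD_RE.findall(text)


def analyze(text: str) -> list[str]:
    """Char filter, then tokenizer, then lowercasing (a stand-in token filter)."""
    return [tok.lower() for tok in tokenize(char_filter(text))]


if __name__ == "__main__":
    with_alif = "P\u02bcyongyang"
    with_apostrophe = "P'yongyang"
    # Without the char filter, the two spellings tokenize differently.
    print(tokenize(with_alif), tokenize(with_apostrophe))
    # With it, both spellings produce the same tokens.
    print(analyze(with_alif), analyze(with_apostrophe))
```

Mapping alif and ayn to the apostrophe (rather than deleting them) is one design choice. It lets whatever downstream handling already applies to apostrophes, such as the punctuation-ignoring collation Charles mentions, apply to these characters too. Deleting them instead would be an equally simple mapping if that handling is not wanted.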