True, no argument there as to usage. I should have clarified that the character used for alif (U+02BE) is assigned the General_Category value Lm ('Modifier_Letter') in the Unicode Character Database. That contrasts with Sk ('Modifier_Symbol'), a property assigned to characters that are more commonly used as diacritics.
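For anyone who wants to check these properties locally, here is a minimal sketch using Python's standard unicodedata module. Only U+02BE comes from this thread. The code point shown for ayn (U+02BF) and the two contrast characters are my own assumptions, picked for illustration, and the describe helper is a hypothetical name.

```python
import unicodedata

# Only U+02BE is named in the thread; the other code points are
# assumptions chosen to show contrasting General_Category values.
SAMPLES = {
    "alif": "\u02be",                 # named in the thread
    "ayn (assumed)": "\u02bf",        # assumed code point for ayn
    "spacing modifier symbol": "\u02c2",
    "combining acute accent": "\u0301",
    "apostrophe": "'",
}


def describe(ch):
    """Return the code point, General_Category and name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


for label, ch in SAMPLES.items():
    print(f"{label}: {describe(ch)}")
```

On a standard Python build this should report Lm for U+02BE and U+02BF, Sk for U+02C2, Mn for the combining accent, and Po for the apostrophe. It only reports the assigned properties; it says nothing about how a particular Solr filter factory treats them.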
I think the choice of which characters the filter factories handle was based on these assigned properties, though yes, each character often has a broader range of actual uses.

Charles

On Thu, May 24, 2012 at 1:41 PM, Naomi Dushay <ndus...@stanford.edu> wrote:

> The alif and ayn can also be used as diacritic-like characters in Korean;
> this is a known practice. But thanks anyway.
>
> On May 24, 2012, at 9:30 AM, Charles Riley wrote:
>
> Hi Naomi,
>
> I don't have a conclusive answer for you on this yet, but let me pick up
> on a few points.
>
> First, the apostrophe is probably being handled through ignoring
> punctuation in the ICUCollationKeyFilterFactory.
>
> Alif isn't a diacritic but a letter, and its character properties would be
> handled as such, apparently also outside the scope of what the folding
> filter factory does unless it's tailored.
>
> From the solrwiki, this looks like a helpful rule of thumb:
>
> "When To use a CharFilter vs a TokenFilter
>
> There are several pairs of CharFilters and TokenFilters that have related
> (ie: MappingCharFilter and ASCIIFoldingFilter) or nearly identical
> functionality (ie: PatternReplaceCharFilterFactory and
> PatternReplaceFilterFactory) and it may not always be obvious which is the
> best choice.
>
> The ultimate decision depends largely on what Tokenizer you are using, and
> whether you need to "out smart" it by preprocessing the stream of
> characters.
>
> For example, maybe you have a tokenizer such as StandardTokenizer and you
> are pretty happy with how it works overall, but you want to customize how
> some specific characters behave.
> In such a situation you could modify the rules and re-build your own
> tokenizer with javacc, but perhaps its easier to simply map some of the
> characters before tokenization with a CharFilter."
>
>
> Charles
>
> On Tue, May 15, 2012 at 2:47 PM, Naomi Dushay <ndus...@stanford.edu> wrote:
>
>> We are using the ICUFoldingFilterFactory with great success to fold
>> diacritics so searches with and without the diacritics get the same results.
>>
>> We recently discovered we have some Korean records that use an alif
>> diacritic instead of an apostrophe, and this diacritic is NOT getting
>> folded. Has anyone experienced this for alif or ayn characters? Do you
>> have a solution?
>>
>>
>> - Naomi
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "solrmarc-tech" group.
>> To post to this group, send email to solrmarc-t...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> solrmarc-tech+unsubscr...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/solrmarc-tech?hl=en.
>>
>
> --
> Charles L. Riley
> Catalog Librarian for Africana
> Sterling Memorial Library, Yale University
> <zenodo...@gmail.com>
> 203-432-7566

--
Charles L. Riley
Catalog Librarian for Africana
Sterling Memorial Library, Yale University
<zenodo...@gmail.com>
203-432-7566