Thanks for the advice. I have tried the field type and it seems to do what it 
is supposed to in combination with a lower case filter.

However, that raises another slight problem:

German umlauts are supposed to be treated slightly differently for searching 
than for sorting. For sorting, a normal ICUCollationField with standard rules 
should suffice*. For searching, however, I cannot just replace an "ü" with a 
"u": "ü" is supposed to equal "ue", or, in terms of RuleBasedCollators, there 
is a secondary difference.

The rules for the collator include:

& ue , ü
& ae , ä
& oe , ö
& ss , ß

(Again, that applies to searching *only*; for sorting, the rule "& a , ä" 
would apply, which is implied in the default rules.)

I can of course write a filter that does these rudimentary replacements 
myself, ideally after the lower case filter but before the ASCIIFoldingFilter. 
I am just wondering whether there is some way to use collation keys for full 
text search.
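
To make the hand-written variant concrete, here is a rough sketch of the 
replacement order I have in mind, in plain Java without any Lucene classes. 
The class name GermanSearchFolding and the method fold are made up for this 
mail. The last step strips combining marks via java.text.Normalizer, which 
only approximates what the ASCIIFoldingFilter does (it does not handle "Æ" or 
"ø", for instance). A real solution would have to be wrapped in a TokenFilter.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Solr or Lucene: illustrates the order
// "lower case -> German replacements -> strip remaining diacritics".
final class GermanSearchFolding {

    static String fold(String input) {
        // Compose first, so that "u" + combining diaeresis becomes one character.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);

        // Step 1: lower case (stands in for the lower case filter).
        s = s.toLowerCase(Locale.ROOT);

        // Step 2: the German search rules from the collator rules above.
        s = s.replace("\u00fc", "ue")   // u-umlaut -> ue
             .replace("\u00e4", "ae")   // a-umlaut -> ae
             .replace("\u00f6", "oe")   // o-umlaut -> oe
             .replace("\u00df", "ss");  // sharp s  -> ss

        // Step 3: decompose and drop combining marks (assumption: a crude
        // stand-in for the ASCIIFoldingFilter, covering accents only).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("M\u00fcller"));  // prints "mueller"
        System.out.println(fold("Stra\u00dfe"));  // prints "strasse"
    }
}
```

The order matters: if the diacritics were stripped first, "ü" would already 
have become "u" before the German rules could turn it into "ue".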


________________

* Even though Latin script, and specifically German, is my primary concern, I 
want some rudimentary support for all European languages, including ones that 
use Cyrillic and Greek script, special symbols in Icelandic that are not 
strictly Latin, and ligatures like "Æ", which collation keys could easily 
provide.





Ahmet Arslan <iori...@yahoo.com.INVALID> wrote at 22:10 on Wednesday, 20 May 2015:
Hi Bjorn,

solr.ICUCollationField is useful for *sorting*, and you cannot sort on 
tokenized fields.

Your example looks like diacritics-insensitive search. 
Please see: ASCIIFoldingFilterFactory

Ahmet



On Wednesday, May 20, 2015 2:53 PM, Björn Keil <deeph...@web.de> wrote:
Hello,

might anyone suggest a field type with which I may do both a full text
search (i.e. there is an analyzer including a tokenizer) and apply a
collation?

An example for what I want to do:
There is a field "composer" for which I passed the value "Dvořák, Antonín".

I want the following queries to match:
composer:(antonín dvořák)
composer:dvorak
composer:"dvorak, antonin"

the latter case is possible using a solr.ICUCollationField, but that
type does not support an Analyzer and consequently no tokenizer, thus,
it is not helpful.

Unlike in earlier versions of Solr, there do not seem to be
CollationKeyFilters which you can plug into the analyzer of a
solr.TextField... so I am a bit at a loss as to how I can get *both* a
tokenizer and a collation at the same time.

Thanks for help,
Björn
