Hi Alex,

For me, ICUFoldingFilterFactory works very well. It does lowercasing and
removes diacritics (that is the term for umlauts and accents on letters;
punctuation means commas, periods, etc.). It works for any language, not
only German. And it will also handle apostrophes, as in
"C'est bien".

ICU requires additional libraries on the classpath. For a built-in Solr
solution, have a look at ASCIIFoldingFilterFactory.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory



Example configuration:
<fieldType name="text_sort" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.ICUFoldingFilterFactory" />
        </analyzer>
</fieldType>
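
For the built-in alternative, a comparable field type might look like the
sketch below. It assumes the same KeywordTokenizerFactory setup as above; the
name "text_sort_ascii" is made up for illustration. ASCIIFoldingFilterFactory
does not lowercase on its own, so LowerCaseFilterFactory is added in front
of it:
<!-- Sketch only: "text_sort_ascii" is a hypothetical name -->
<fieldType name="text_sort_ascii" class="solr.TextField"
        positionIncrementGap="100">
        <analyzer>
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <!-- ASCII folding does not change case, so lowercase first -->
                <filter class="solr.LowerCaseFilterFactory" />
                <!-- Folds e.g. "ü" to "u" where an ASCII equivalent exists -->
                <filter class="solr.ASCIIFoldingFilterFactory" />
        </analyzer>
</fieldType>

This variant should not need any extra dependencies, but it only covers
characters that have an ASCII equivalent.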

And dependencies (example for Maven) in addition to solr-core:
<dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-icu</artifactId>
        <version>${solr.version}</version>
        <scope>runtime</scope>
</dependency>
<dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-analysis-extras</artifactId>
        <version>${solr.version}</version>
        <scope>runtime</scope>
</dependency>

Cheers,
Chantal

On Fri, 2012-01-13 at 00:09 +0100, alx...@aim.com wrote:
> Hello,
> 
> I would like to know if solr has a functionality to automatically search for 
> a different punctuation of a word. 
> For example, if a user searches for the word Uber, and the stemmer is set to 
> German, then Solr looks for both Uber and Über, like in synonyms.
> 
> Is it possible to give a file with a list of possible substitutions of 
> letters to solr and have it search for all possible punctuations?
> 
> 
> Thanks.
> Alex.
