Hi Clemens,

I recently added typeahead functionality to something I'm playing with, and I used the EdgeNGramFilterFactory to help. I just tried it out after adding a doc with "Chamäleon" in the title.
"Chamäleon", with a capital C, was returned when I searched for chama, Chama, chamã, and Chamã. Here's what I have in my files:

-----------------
solrconfig.xml:

<requestHandler name="/suggest_movie" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="defType">edismax</str>
    <str name="rows">10</str>
    <str name="omitHeader">true</str> <!-- keeping the response as lean as possible, so not returning header info -->
    <str name="fl">value:title</str> <!-- only returning 'title', and I want that key to be called 'value' in the response -->
    <str name="qf">title^10 suggest_ngram</str> <!-- boosting title so an exact match with the query shows on top -->
  </lst>
</requestHandler>

-----------------
schema.xml:

<fieldType name="text_suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10" /> <!-- create edge n-grams of each term when indexing, not when querying -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.EnglishPossessiveFilterFactory" />
  </analyzer>
</fieldType>
...
<field name="suggest_ngram" type="text_suggest_ngram" indexed="true" stored="false" />
...
<copyField source="title" dest="suggest_ngram" />

-----------------
request:

http://localhost:8983/solr/movies/suggest_movie?q=chama

-----------------
response:

{
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      { "value": "Chamäleon" }
    ]
  }
}

Hope this helps.
Ryan

On Tue Dec 09 2014 at 7:21:02 AM Michael Sokolov <msoko...@safaribooksonline.com> wrote:

> Clemens --
>
> What I do (see the suggestions of book titles on $EMPLOYER's web
> site) is define a field with no analysis (type=keyword, using
> KeywordAnalyzer) and build the suggestions from that. Then tell AIS
> (AnalyzingInfixSuggester) to use an analyzer internally to pick out
> words from that (StandardAnalyzer or WhitespaceAnalyzer, with
> LowerCaseFilter - however you want the matching to work in the
> suggester). It will return the terms from the source field.
>
> You didn't show the definition of your "suggest" field - I expect it
> is analyzed, right? Just don't do that.
>
> -Mike
>
> On 12/09/2014 08:58 AM, Clemens Wyss DEV wrote:
> > Thanks for all the insightful links.
> > I tried http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr
> > but that approach returns search results instead of term suggestions.
> >
> > At the moment I have a solution based on http://wiki.apache.org/solr/TermsComponent.
> > But I might want multi-term suggestions (and fuzziness).
> > So I'd be very interested in how AnalyzingInfixLookupFactory (or any
> > other suggest component) would allow me to:
> > a) return case-sensitive suggestions (i.e. as indexed/stored), and
> > b) do case-insensitive suggestion lookup.
> > Is anybody else doing what I'd like to do?
> >
> > -----Original Message-----
> > From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
> > Sent: Monday, December 8, 2014 19:25
> > To: solr-user@lucene.apache.org
> > Subject: Re: AW: Keeping capitalization in suggestions?
> >
> > Hi Clemens,
> >
> > There are a number of ways to implement auto-complete/suggest.
> > Some of them pull data from indexed terms, so the suggestions will be
> > lowercased (if the field's analyzer lowercases). Some pull data from
> > stored values, so capitalisation is preserved.
> >
> > Here are some great resources on this topic:
> >
> > https://lucidworks.com/blog/auto-suggest-from-popular-queries-using-edgengrams/
> > http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> > http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/
> >
> > Ahmet
> >
> >
> > On Monday, December 8, 2014 5:43 PM, Clemens Wyss DEV <clemens...@mysign.ch> wrote:
> >
> > Although I'm using AnalyzingInfixSuggester, I'm still getting "either/or":
> >
> > When the lowercase filter is active, I always get suggestions, BUT they are lowercased (i.e. "chamäleon").
> > When the lowercase filter is not active, I only get suggestions when querying "Chamä".
> >
> > my solrconfig.xml
> > ...
> > <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
> >   <lst name="defaults">
> >     <str name="echoParams">none</str>
> >     <str name="wt">json</str>
> >     <str name="indent">false</str>
> >     <str name="spellcheck">true</str>
> >     <str name="spellcheck.dictionary">suggestDictionary</str>
> >     <str name="spellcheck.onlyMorePopular">true</str>
> >     <str name="spellcheck.count">5</str>
> >     <str name="spellcheck.collate">false</str>
> >   </lst>
> >   <arr name="components">
> >     <str>suggest</str>
> >   </arr>
> > </requestHandler>
> > ...
> > <searchComponent class="solr.SpellCheckComponent" name="suggest">
> >   <lst name="spellchecker">
> >     <str name="name">suggestDictionary</str>
> >     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
> >     <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
> >     <str name="field">suggest</str>
> >     <str name="buildOnCommit">true</str>
> >     <str name="storeDir">suggester</str>
> >     <str name="suggestAnalyzerFieldType">text_suggest</str>
> >     <str name="minPrefixChars">4</str>
> >   </lst>
> > </searchComponent>
> > ...
> >
> > my schema.xml
> > ...
> > <field indexed="true" multiValued="true" name="suggest" stored="false" type="text_suggest"/>
> > ...
> > <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <!-- <filter class="solr.LowerCaseFilterFactory"/> -->
> >   </analyzer>
> > </fieldType>
> > ...
> >
> >
> > -----Original Message-----
> > From: Michael Sokolov [mailto:msoko...@safaribooksonline.com]
> > Sent: Thursday, December 4, 2014 14:05
> > To: solr-user@lucene.apache.org
> > Subject: Re: Keeping capitalization in suggestions?
> >
> > Have a look at AnalyzingInfixSuggester - it does what you want.
> >
> > -Mike
> >
> > On 12/4/14 3:05 AM, Clemens Wyss DEV wrote:
> >> When I index a text such as "Chamäleon" and look for suggestions for "chamä" and/or "Chamä",
> >> I'd expect to get "Chamäleon" (capitalized).
> >> But what happens is:
> >>
> >> If the lowercase filter (see (1) below) is set:
> >> "chamä" returns "chamäleon"
> >> "Chamä" does not match
> >>
> >> If the lowercase filter (1) is not set:
> >> "Chamä" returns "Chamäleon"
> >> "chamä" does not match
> >>
> >> I guess the lowercase filter should not be set/active, but then how do I get
> >> matches when the search term is lowercased?
> >>
> >> Context:
> >> schema.xml
> >> ...
> >> <fieldType class="solr.TextField" name="text_de" positionIncrementGap="100">
> >>   <analyzer type="index">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
> >>     <filter class="solr.GermanLightStemFilterFactory"/>
> >>   </analyzer>
> >>   <analyzer type="query">
> >>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>     <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt"/>
> >>     <filter class="solr.GermanLightStemFilterFactory"/>
> >>   </analyzer>
> >> </fieldType>
> >> ...
> >> <fieldType class="solr.TextField" name="text_suggest" positionIncrementGap="100">
> >>   <analyzer>
> >>     <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
> >>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>     <filter class="solr.LowerCaseFilterFactory"/> <!-- (1) -->
> >>   </analyzer>
> >> </fieldType>
> >>
> >> solrconfig.xml
> >> -----------------
> >> ...
> >> <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
> >>   <lst name="defaults">
> >>     <str name="echoParams">none</str>
> >>     <str name="wt">json</str>
> >>     <str name="indent">false</str>
> >>     <str name="spellcheck">true</str>
> >>     <str name="spellcheck.dictionary">suggestDictionary</str>
> >>     <str name="spellcheck.onlyMorePopular">true</str>
> >>     <str name="spellcheck.count">5</str>
> >>     <str name="spellcheck.collate">false</str>
> >>   </lst>
> >>   <arr name="components">
> >>     <str>suggest</str>
> >>   </arr>
> >> </requestHandler>
> >> ...
> >> <searchComponent class="solr.SpellCheckComponent" name="suggest">
> >>   <lst name="spellchecker">
> >>     <str name="name">suggestDictionary</str>
> >>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookupFactory</str>
> >>     <str name="field">suggest</str>
> >>     <float name="threshold">0.</float>
> >>     <str name="buildOnCommit">true</str>
> >>   </lst>
> >> </searchComponent>
> >> ...
> >>
> >
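A minimal sketch of the setup Mike describes in this thread (case-insensitive matching, suggestions returned as stored), assuming Solr 4.7 or later, where solr.SuggestComponent is available, in place of the spellcheck-based Suggester in Clemens's config. It also assumes a stored "title" field like the one Ryan returns via fl=value:title. The names text_suggest_lc, titleSuggester and /suggest_title are hypothetical. DocumentDictionaryFactory builds the dictionary from the stored value, so the suggestion should come back as "Chamäleon", while the lowercasing analyzer named in suggestAnalyzerFieldType is what AnalyzingInfixSuggester uses for matching, so "chamä" and "Chamä" should both hit.

```xml
<!-- schema.xml: hypothetical field type, used only for matching inside the suggester -->
<fieldType name="text_suggest_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: build suggestions from the stored "title" values (assumed stored="true") -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">titleSuggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">org.apache.solr.spelling.suggest.DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <!-- analyzer applied to the suggestion text and to the query; lowercases both -->
    <str name="suggestAnalyzerFieldType">text_suggest_lc</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<!-- hypothetical handler wiring the component above -->
<requestHandler name="/suggest_title" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">titleSuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With something like this in place, a request such as http://localhost:8983/solr/movies/suggest_title?suggest.q=chamä (core name borrowed from Ryan's example) would be expected to list "Chamäleon" among the suggestions. DocumentDictionaryFactory reads stored fields, so the source field has to be stored; the "suggest" field in Clemens's schema is stored="false".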