Another approach to this problem is to use a separate Solr core that stores users' queries for autocomplete (see http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ ). Index not only the user_query field but also its transliterated and diff_layout (wrong keyboard layout) versions, and use the dismax query parser to search for suggestions across all of these fields.
This solution is only viable if you have a huge log of user queries (which I believe Google does).

HTH,
Alex

2010/10/29 Alexander Kanarsky <kanarsky2...@gmail.com>:
> Pavel,
>
> it depends on the size of your document corpus, the complexity and types of
> the queries you plan to use, etc. I would recommend searching for
> the discussions on synonym expansion in Lucene (index-time vs. query-time
> tradeoffs etc.), since your problem is quite similar to that
> (think Moskva vs. Moskwa). Unless you have a small corpus, I would go
> with the second approach and expand the terms at query time.
> However, the first approach might be useful, too: say, you may want to
> boost the score for the documents that naturally contain the word
> 'Moskva', so that such documents will be at the top of the result list.
> Having both forms indexed will allow you to achieve this easily by
> utilizing Solr's dismax query (to boost the results from the field
> with the original terms):
> http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1
> ('text' field has the original Cyrillic tokens, 'text_translit' is for
> transliterated ones)
>
> -Alexander
>
>
> 2010/10/28 Pavel Minchenkov <char...@gmail.com>:
>> Alexander,
>>
>> Thanks,
>> Which variant has better performance?
>>
>>
>> 2010/10/28 Alexander Kanarsky <kanarsky2...@gmail.com>
>>
>>> Pavel,
>>>
>>> I think there is no single way to implement this. Some ideas that
>>> might be helpful:
>>>
>>> 1. Consider adding additional terms while indexing. This assumes
>>> converting the Russian text to both "translit" and "wrong keyboard"
>>> forms and indexing the converted terms along with the original terms
>>> (i.e. your Analyzer/Filter should produce Moskva and Vjcrdf for the
>>> term Москва). You may re-use the same field (if you plan for simple
>>> term queries) or create separate fields for the generated terms
>>> (better for phrase, proximity queries etc. since it keeps the
>>> original text positional info). Then the query could use any of
>>> these forms to fetch the document. If you use separate fields,
>>> you'll need to expand/create your query to search for them, of course.
>>> 2. If you have to index just the original Russian text, you might
>>> generate all term forms while analyzing the query; then you could
>>> treat the converted terms as synonyms and use a combination of
>>> TermQuery for all term forms or the MultiPhraseQuery for the phrases.
>>> For Solr in this case you probably will need to add a custom filter
>>> similar to SynonymFilter.
>>>
>>> Hope this helps,
>>> -Alexander
>>>
>>> On Wed, Oct 27, 2010 at 1:31 PM, Pavel Minchenkov <char...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > When I'm trying to search Google with the wrong keyboard layout -- it
>>> > corrects my query, example: http://www.google.ru/search?q=vjcrdf
>>> > (I typed the word "Moscow" in Russian but in the English keyboard layout).
>>> > Also, when I'm searching using translit, it does the same:
>>> > http://www.google.ru/search?q=moskva
>>> >
>>> > What is the right way to implement this feature in Solr?
>>> >
>>> > --
>>> > Pavel Minchenkov
>>
>> --
>> Pavel Minchenkov
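
To make the term forms mentioned in this thread concrete (Moskva and Vjcrdf for the term Москва), here is a minimal Python sketch of how the "translit" and "wrong keyboard" variants might be generated, either before indexing or when expanding a query. The function names (transliterate, to_wrong_layout, term_forms) are hypothetical, the transliteration table is a simplified assumption rather than any standard scheme, and the keyboard table assumes the common ЙЦУКЕН/QWERTY layouts while ignoring shifted punctuation. In Solr itself this logic would live in a custom filter, as noted above; this is only an illustration of the mapping.

```python
# Assumed key positions: Russian (ЙЦУКЕН) letters and the US QWERTY keys
# found at the same physical positions. Shifted punctuation is ignored.
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"
EN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."

LAYOUT_TABLE = str.maketrans(RU_KEYS + RU_KEYS.upper(),
                             EN_KEYS + EN_KEYS.upper())

# Simplified transliteration table (an assumption, not a standard scheme).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def to_wrong_layout(text):
    """Return what you get by typing Russian text on an English layout."""
    return text.translate(LAYOUT_TABLE)


def transliterate(text):
    """Return a Latin transliteration of Russian text."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)  # leave non-Cyrillic characters untouched
        else:
            out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def term_forms(term):
    """Return the original term plus its generated variants, without duplicates."""
    forms = []
    for form in (term, transliterate(term), to_wrong_layout(term)):
        if form not in forms:
            forms.append(form)
    return forms


print(term_forms("Москва"))  # ['Москва', 'Moskva', 'Vjcrdf']
```

With the first approach these forms would be added at index time (in the same field or in separate fields); with the second they would be generated from the query terms and treated as synonyms.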