Pavel,

It depends on the size of your document corpus, the complexity and types of the queries you plan to use, and so on. I would recommend searching for the discussions on synonym expansion in Lucene (index-time vs. query-time tradeoffs etc.), since your problem is quite similar to that (think Moskva vs. Moskwa). Unless you have a small corpus, I would go with the second approach and expand the terms at query time.

However, the first approach might be useful, too: say, you may want to boost the score of documents that naturally contain the word 'Moskva', so that such documents appear at the top of the result list. Having both forms indexed lets you achieve this easily with Solr's dismax query, boosting the results from the field with the original terms:

http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1

(the 'text' field has the original Cyrillic tokens, 'text_translit' is for the transliterated ones)
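If you build this request from code, a minimal Python sketch could look like the one below. The helper name `dismax_url` and its `base` default are made up for illustration; the field names and boosts are the ones from the URL above, and the only library call used is `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode


def dismax_url(term, base="http://localhost:8983/solr/select/"):
    """Build a dismax request that boosts matches in the original-text field.

    `dismax_url` and `base` are hypothetical names; 'text' and
    'text_translit' are the fields from the example URL.
    """
    params = {
        "q": term,
        "defType": "dismax",
        # Strong boost for the original tokens, weak boost for the
        # transliterated ones; urlencode turns the space into '+'.
        "qf": "text^10.0 text_translit^0.1",
    }
    return base + "?" + urlencode(params)


print(dismax_url("Moskva"))
```

Note that `urlencode` percent-encodes the `^` as `%5E`, which is equivalent to the hand-written URL once the server decodes it.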
-Alexander

2010/10/28 Pavel Minchenkov <char...@gmail.com>:
> Alexander,
>
> Thanks,
> What variant has better performance?
>
>
> 2010/10/28 Alexander Kanarsky <kanarsky2...@gmail.com>
>
>> Pavel,
>>
>> I think there is no single way to implement this. Some ideas that
>> might be helpful:
>>
>> 1. Consider adding additional terms while indexing. This assumes
>> conversion of Russian text to both "translit" and "wrong keyboard"
>> forms and index converted terms along with original terms (i.e. your
>> Analyzer/Filter should produce Moskva and Vjcrdf for term Москва). You
>> may re-use the same field (if you plan for a simple term queries) or
>> create a separate fields for the generated terms (better for phrase,
>> proximity queries etc. since it keeps the original text positional
>> info). Then the query could use any of these forms to fetch the
>> document. If you use separate fields, you'll need to expand/create
>> your query to search for them, of course.
>> 2. If you have to index just an original Russian text, you might
>> generate all term forms while analyzing the query, then you could
>> treat the converted terms as a synonyms and use the combination of
>> TermQuery for all term forms or the MultiPhraseQuery for the phrases.
>> For Solr in this case you probably will need to add a custom filter
>> similar to SynonymFilter.
>>
>> Hope this helps,
>> -Alexander
>>
>> On Wed, Oct 27, 2010 at 1:31 PM, Pavel Minchenkov <char...@gmail.com> wrote:
>> > Hi,
>> >
>> > When I'm trying to search Google with wrong keyboard layout -- it corrects
>> > my query, example: http://www.google.ru/search?q=vjcrdf (I typed word
>> > "Moscow" in Russian but in English keyboard layout).
>> > Also, when I'm searching using
>> > translit, It does the same: http://www.google.ru/search?q=moskva
>> >
>> > What is the right way to implement this feature in Solr?
>> >
>> > --
>> > Pavel Minchenkov
>
> --
> Pavel Minchenkov
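To make the term conversion from idea 1 in the quoted message concrete (Москва producing Moskva and Vjcrdf), here is a rough Python sketch of the two mappings. It is not a Lucene Analyzer/Filter, only the character-level logic such a filter would need. The function names, the particular transliteration table (one of several possible schemes, cf. Moskva vs. Moskwa) and the handling of capitals are assumptions for illustration.

```python
# Standard Russian (ЙЦУКЕН) keys and the QWERTY characters on the same keys.
# 'ё' is left out for brevity, and shifted punctuation is not modelled.
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"
EN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
LAYOUT = str.maketrans(RU_KEYS + RU_KEYS.upper(), EN_KEYS + EN_KEYS.upper())

# One simple transliteration scheme (an assumption; others exist).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def wrong_layout(word):
    """Return what you get by typing a Russian word on an English layout."""
    return word.translate(LAYOUT)


def translit(word):
    """Transliterate Cyrillic letters to Latin, keeping capitalization."""
    out = []
    for ch in word:
        latin = TRANSLIT.get(ch.lower(), ch)  # non-Cyrillic passes through
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def term_forms(word):
    """Original term followed by its translit and wrong-keyboard forms."""
    return [word, translit(word), wrong_layout(word)]


print(term_forms("Москва"))
```

For index-time expansion these forms would be emitted alongside the original term (or into a separate field); for query-time expansion the same function would be applied to the query terms instead.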