Pavel,

It depends on the size of your document corpus, the complexity and types
of the queries you plan to use, etc. I would recommend searching for
discussions of synonym expansion in Lucene (index-time vs. query-time
tradeoffs, etc.), since your problem is quite similar to that one
(think Moskva vs. Moskwa). Roughly: index-time expansion grows the index
and requires reindexing whenever the conversion rules change, while
query-time expansion keeps the index small at the cost of larger
queries. Unless you have a small corpus, I would go with the second
approach and expand the terms at query time.
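To make the query-time option concrete, here is a minimal sketch of generating the alternative forms of one query term. It is in Python only for brevity; in Solr this logic would live in a custom Java filter. All function names are hypothetical, the transliteration table is a simplified assumption (one of several schemes), and the layout mapping assumes the standard Russian JCUKEN layout over a US QWERTY keyboard.

```python
# Hypothetical query-time expansion helpers (a sketch, not a Solr API).
# Assumptions: Russian JCUKEN layout on a US QWERTY keyboard, and a
# simplified transliteration table (real schemes vary: Moskva vs. Moskwa).

# Russian letters listed in the same order as the QWERTY keys they share.
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбюё"
EN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"
RU_TO_EN = str.maketrans(RU_KEYS, EN_KEYS)
EN_TO_RU = str.maketrans(EN_KEYS, RU_KEYS)
RU_LETTERS = set(RU_KEYS)

TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}
# Reverse table; on collisions ("e", "y") the first Cyrillic letter wins.
UNTRANSLIT = {}
for ru, lat in TRANSLIT.items():
    if lat:
        UNTRANSLIT.setdefault(lat, ru)


def wrong_layout(term):
    """Москва -> vjcrdf: what you get typing Russian on an English layout."""
    return term.lower().translate(RU_TO_EN)


def fix_layout(term):
    """vjcrdf -> москва: undo typing with the wrong layout."""
    return term.lower().translate(EN_TO_RU)


def translit(term):
    """Москва -> moskva, using the simplified table above."""
    return "".join(TRANSLIT.get(ch, ch) for ch in term.lower())


def untranslit(term):
    """moskva -> москва by greedy longest match (lossy and ambiguous)."""
    term, out, i = term.lower(), [], 0
    while i < len(term):
        for size in (4, 3, 2, 1):
            chunk = term[i:i + size]
            if chunk in UNTRANSLIT:
                out.append(UNTRANSLIT[chunk])
                i += len(chunk)
                break
        else:
            out.append(term[i])  # no mapping: keep the character as-is
            i += 1
    return "".join(out)


def is_cyrillic(term):
    return bool(term) and all(ch in RU_LETTERS for ch in term)


def expand_query_term(term):
    """All forms of one query term, to be OR-ed together like synonyms."""
    term = term.lower()
    forms = [term]
    if is_cyrillic(term):
        # Latin forms help only if documents may contain them.
        forms += [translit(term), wrong_layout(term)]
    else:
        # Latin input: maybe wrong layout, maybe translit; keep only
        # candidates that came out as pure Cyrillic.
        forms += [c for c in (fix_layout(term), untranslit(term)) if is_cyrillic(c)]
    return list(dict.fromkeys(forms))  # de-duplicate, keep order
```

For example, expand_query_term("vjcrdf") yields ["vjcrdf", "москва"], which you would OR together like synonyms. Some generated forms are junk (the layout fix of "moskva" is "ьщылмф"), but they simply match nothing.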
However, the first approach might be useful too: say, you may want to
boost the score of documents that naturally contain the word 'Moskva',
so that such documents appear at the top of the result list. Having
both forms indexed lets you do this easily with Solr's dismax query
parser (boosting the results from the field with the original terms):
http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1
('text' is the field with the original Cyrillic tokens; 'text_translit'
holds the transliterated ones)
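If you build that request from code, a small sketch using only the Python standard library might look like this. `dismax_url` is a hypothetical helper, and the host, path and field names are just those from the example URL above. `urlencode` takes care of escaping the space and `^` in `qf`, and of UTF-8 percent-encoding Cyrillic query text.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"  # from the example URL


def dismax_url(user_query):
    """Hypothetical helper: dismax request boosting original-token matches."""
    params = {
        "q": user_query,
        "defType": "dismax",
        # qf lists fields with per-field boosts, separated by spaces:
        # matches in 'text' are boosted 100x relative to 'text_translit'.
        "qf": "text^10.0 text_translit^0.1",
    }
    # urlencode escapes spaces, '^' and non-ASCII (UTF-8) query text.
    return SOLR_SELECT + "?" + urlencode(params)


print(dismax_url("Moskva"))
# prints http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text%5E10.0+text_translit%5E0.1
```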

-Alexander


2010/10/28 Pavel Minchenkov <char...@gmail.com>:
> Alexander,
>
> Thanks.
> Which variant has better performance?
>
>
> 2010/10/28 Alexander Kanarsky <kanarsky2...@gmail.com>
>
>> Pavel,
>>
>> I think there is no single way to implement this. Some ideas that
>> might be helpful:
>>
>> 1. Consider adding additional terms at index time. This means
>> converting the Russian text to both "translit" and "wrong keyboard"
>> forms and indexing the converted terms along with the original ones
>> (i.e., your Analyzer/Filter should produce Moskva and Vjcrdf for the
>> term Москва). You may reuse the same field (if you plan to use simple
>> term queries) or create separate fields for the generated terms
>> (better for phrase, proximity queries, etc., since this keeps the
>> positional info of the original text). Then the query could use any
>> of these forms to fetch the document. If you use separate fields,
>> you'll need to expand (or build) your query to search them as well,
>> of course.
>> 2. If you have to index just the original Russian text, you might
>> generate all term forms while analyzing the query, then treat the
>> converted terms as synonyms and use a BooleanQuery combining
>> TermQuery clauses for all term forms, or a MultiPhraseQuery for
>> phrases. For Solr, in this case you will probably need to add a
>> custom filter similar to SynonymFilter.
>>
>> Hope this helps,
>> -Alexander
>>
>> On Wed, Oct 27, 2010 at 1:31 PM, Pavel Minchenkov <char...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > When I search Google with the wrong keyboard layout, it corrects
>> > my query, for example: http://www.google.ru/search?q=vjcrdf (I typed
>> > the word "Moscow" in Russian but with the English keyboard layout).
>> > Also, when I search using translit, it does the same:
>> > http://www.google.ru/search?q=moskva
>> >
>> > What is the right way to implement this feature in Solr?
>> >
>> > --
>> > Pavel Minchenkov
>> >
>>
>
>
>
> --
> Pavel Minchenkov
>
