Another approach to this problem is to use a separate Solr core that
stores users' queries for the autocomplete functionality (see
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
) and to index not only the user_query field, but also its transliterated
and diff_layout versions, then use the dismax query parser to search for
suggestions across all of these fields.

This solution is only viable if you have a huge log of user queries
(which I believe Google does).
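A minimal Python sketch of what one document in such a suggestions core might look like. It assumes the user_query, transliterated and diff_layout field names mentioned above, the standard Russian ЙЦУКЕН and US QWERTY keyboard layouts, and an ad hoc transliteration table (not a formal standard). suggestion_doc() and suggest_params() are hypothetical helpers; how the dict actually gets sent to Solr depends on your client.

```python
# Hypothetical helpers for a separate "suggestions" core; the field names
# come from the message above, the mapping tables are simplified assumptions.
RU = "ёйцукенгшщзхъфывапролджэячсмитьбю"  # Russian ЙЦУКЕН keys
EN = "`qwertyuiop[]asdfghjkl;'zxcvbnm,."  # US QWERTY keys in the same places
RU_TO_EN = str.maketrans(RU, EN)

CYR = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
LAT = "a,b,v,g,d,e,e,zh,z,i,y,k,l,m,n,o,p,r,s,t,u,f,kh,ts,ch,sh,shch,,y,,e,yu,ya"
TRANSLIT = dict(zip(CYR, LAT.split(",")))  # ad hoc; ъ and ь map to ""


def suggestion_doc(logged_query: str) -> dict:
    """One document per popular query, with all three searchable forms."""
    q = logged_query.strip().lower()
    return {
        "user_query": q,
        "transliterated": "".join(TRANSLIT.get(ch, ch) for ch in q),
        "diff_layout": q.translate(RU_TO_EN),  # as typed with the EN layout on
    }


def suggest_params(user_input: str) -> dict:
    """dismax request parameters that look for suggestions in all three fields."""
    return {
        "q": user_input,
        "defType": "dismax",
        "qf": "user_query transliterated diff_layout",
    }


if __name__ == "__main__":
    print(suggestion_doc("Москва"))
    # {'user_query': 'москва', 'transliterated': 'moskva', 'diff_layout': 'vjcrdf'}
```

Note that a partial input such as q=vjcr would only match "vjcrdf" if these fields are analyzed with edge n-grams in the schema, which is the technique the linked post describes.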

HTH,
Alex



2010/10/29 Alexander Kanarsky <kanarsky2...@gmail.com>:
> Pavel,
>
> it depends on the size of your document corpus, the complexity and
> types of the queries you plan to use, etc. I would recommend searching
> for the discussions on synonym expansion in Lucene (index-time vs.
> query-time tradeoffs, etc.), since your problem is quite similar
> (think Moskva vs. Moskwa). Unless you have a small corpus, I would go
> with the second approach and expand the terms at query time.
> However, the first approach might be useful, too: say, you may want to
> boost the score of documents that naturally contain the word 'Moskva',
> so that such documents end up at the top of the result list. Having
> both forms indexed lets you achieve this easily with Solr's dismax
> query parser, boosting results from the field with the original terms:
> http://localhost:8983/solr/select/?q=Moskva&defType=dismax&qf=text^10.0+text_translit^0.1
> (the 'text' field holds the original Cyrillic tokens, 'text_translit'
> the transliterated ones)
>
> -Alexander
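The same dismax request can be built programmatically; here is a small Python sketch, assuming the default local Solr URL and the 'text' / 'text_translit' fields from the example above (dismax_url() is a hypothetical helper). urlencode() escapes the ^ as %5E and the space between the qf entries as +, which Solr decodes back to the same qf value.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"  # default example URL


def dismax_url(query: str, boosts: dict) -> str:
    """Build a dismax search URL; `boosts` maps field name -> qf boost."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    params = {"q": query, "defType": "dismax", "qf": qf}
    return SOLR_SELECT + "?" + urlencode(params)


# Weight matches on the original Cyrillic field well above transliterated ones.
print(dismax_url("Moskva", {"text": 10.0, "text_translit": 0.1}))
```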
>
>
> 2010/10/28 Pavel Minchenkov <char...@gmail.com>:
>> Alexander,
>>
>> Thanks,
>> Which variant has better performance?
>>
>>
>> 2010/10/28 Alexander Kanarsky <kanarsky2...@gmail.com>
>>
>>> Pavel,
>>>
>>> I think there is no single way to implement this. Some ideas that
>>> might be helpful:
>>>
>>> 1. Consider adding extra terms at index time. This means converting
>>> the Russian text to both its "translit" and "wrong keyboard layout"
>>> forms and indexing the converted terms along with the original ones
>>> (i.e. your Analyzer/Filter should produce Moskva and Vjcrdf for the
>>> term Москва). You may reuse the same field (if you only plan simple
>>> term queries) or create separate fields for the generated terms
>>> (better for phrase, proximity queries, etc., since this keeps the
>>> positional info of the original text intact). Then the query can use
>>> any of these forms to fetch the document. If you use separate fields,
>>> you will, of course, need to expand (or build) your query to search
>>> them as well.
>>> 2. If you have to index only the original Russian text, you might
>>> generate all term forms while analyzing the query, then treat the
>>> converted terms as synonyms and use a combination of TermQuery
>>> clauses for all term forms, or a MultiPhraseQuery for phrases. In
>>> Solr, you will probably need to add a custom filter similar to
>>> SynonymFilter for this.
>>>
>>> Hope this helps,
>>> -Alexander
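The mapping behind ideas 1 and 2 can be sketched in a few lines of Python; in Solr the same logic would sit inside a custom Java TokenFilter, which is not shown here. Assumptions: tokens arrive already lowercased, the layouts are the standard ЙЦУКЕН and US QWERTY ones, and the transliteration table is ad hoc. index_forms(), query_forms() and or_query() are hypothetical names. Reverse transliteration (moskva back to москва) is ambiguous, so query_forms() only undoes the wrong layout.

```python
# Sketch of ideas 1 and 2; the tables and function names are assumptions.
# Tokens are expected to be lowercased already (e.g. after a lowercase filter).
RU = "ёйцукенгшщзхъфывапролджэячсмитьбю"  # Russian ЙЦУКЕН keys
EN = "`qwertyuiop[]asdfghjkl;'zxcvbnm,."  # US QWERTY keys in the same places
RU_TO_EN = str.maketrans(RU, EN)  # Cyrillic typed with the EN layout on
EN_TO_RU = str.maketrans(EN, RU)  # undo it: vjcrdf -> москва

CYR = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
LAT = "a,b,v,g,d,e,e,zh,z,i,y,k,l,m,n,o,p,r,s,t,u,f,kh,ts,ch,sh,shch,,y,,e,yu,ya"
TRANSLIT = dict(zip(CYR, LAT.split(",")))  # ad hoc; ъ and ь map to ""


def transliterate(token):
    return "".join(TRANSLIT.get(ch, ch) for ch in token)


def unique(forms):
    return list(dict.fromkeys(forms))  # drop duplicates, keep order


def index_forms(token):
    """Idea 1: index the original token plus its translit and layout forms."""
    return unique([token, transliterate(token), token.translate(RU_TO_EN)])


def query_forms(token):
    """Idea 2: only Russian is indexed, so map wrong-layout input back."""
    return unique([token, token.translate(EN_TO_RU)])


def or_query(field, forms):
    """Lucene query syntax: any one of the forms may match."""
    return f"{field}:(" + " OR ".join(forms) + ")"


if __name__ == "__main__":
    print(index_forms("москва"))                    # ['москва', 'moskva', 'vjcrdf']
    print(or_query("text", query_forms("vjcrdf")))  # text:(vjcrdf OR москва)
```

The or_query() string relies on the standard query parser turning ORed terms into SHOULD clauses, which roughly matches the TermQuery combination from idea 2; phrases would still need a MultiPhraseQuery.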
>>>
>>> On Wed, Oct 27, 2010 at 1:31 PM, Pavel Minchenkov <char...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > When I search Google using the wrong keyboard layout, it corrects
>>> > my query, for example: http://www.google.ru/search?q=vjcrdf (I typed
>>> > the word "Moscow" in Russian but with the English keyboard layout).
>>> > Also, when I search using translit, it does the same:
>>> > http://www.google.ru/search?q=moskva
>>> >
>>> > What is the right way to implement this feature in Solr?
>>> >
>>> > --
>>> > Pavel Minchenkov
>>> >
>>>
>>
>>
>>
>> --
>> Pavel Minchenkov
>>
>
