Re: Russian stemmer

2010-07-27 Thread Dennis Gearon
TH all six cases and 3 genders.

Re: Russian stemmer

2010-07-27 Thread Robert Muir
right, but your problem is this is the current output:
Ковров -> Ковр
Коврову -> Ковров
Ковровом -> Ковров
Коврове -> Ковров
so, if Ковров was simply left alone, all your forms would match...
2010/7/27 Oleg Burlaca
> Thanks Robert for all your help,
>
> The idea of ы[A-Z].* stopwords is ideal

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
Thanks Robert for all your help. The idea of ы[A-Z].* stopwords is ideal for the English language, but in Russian nouns are inflected: Борис, Борису, Бориса, Борисом. I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned it's more accurate). Once again thanks, Oleg Bur

Re: Russian stemmer

2010-07-27 Thread Robert Muir
2010/7/27 Oleg Burlaca > Actually the situation with Немцов is OK, > I've just checked how Yandex works with Немцов and Немцова: > http://nano.yandex.ru/project/inflect/ > > I think there are two solutions: > a) manually search for both Немцов and then Немцова > b) use wildcard query: Немцов* >

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
Actually the situation with Немцов is OK, I've just checked how Yandex works with Немцов and Немцова: http://nano.yandex.ru/project/inflect/ I think there are two solutions: a) manually search for both Немцов and then Немцова b) use wildcard query: Немцов* Robert, thanks for the RussianLightStemF

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
A similar word is Немцов. The strange thing is that searching for "Немцова" will not find documents containing "Немцов".
Немцова: 14 articles http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
Немцов: 74 articles http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B

Re: Russian stemmer

2010-07-27 Thread Oleg Burlaca
ely, in trunk there is an alternative russian stemmer > (RussianLightStemFilterFactory), which might give you less problems on > average, but I noticed it has this same problem with the example you gave. > > On Tue, Jul 27, 2010 at 4:25 AM, Robert Muir wrote: > > > All of you

Re: Russian stemmer

2010-07-27 Thread Robert Muir
Another look: your problem is ковров itself... it's mapped to ковр. A workaround might be to use the protected words functionality to keep ковров and any other problematic people/geo names as-is. Separately, in trunk there is an alternative Russian stemmer (RussianLightStemFilterFactory), which
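
The protected-words workaround can be illustrated with a minimal, self-contained Java sketch. It does not use Lucene or Solr: `toyStem` and its suffix list are hypothetical stand-ins, chosen only so that the toy reproduces the outputs quoted in this thread, and they are not the Snowball Russian algorithm. The point is only the control flow: a token found in the protected set skips stemming.

```java
import java.util.Locale;
import java.util.Set;

/** Toy illustration of protected words; NOT the Snowball algorithm. */
final class ProtectedWordsSketch {

    // Hypothetical suffix list (an assumption for this sketch), picked so the
    // toy maps Ковров -> ковр and Коврова/Коврову/Ковровом/Коврове -> ковров.
    private static final String[] SUFFIXES = {"ом", "ов", "а", "у", "е"};

    /** Strips at most one suffix from an already lowercased token. */
    static String toyStem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Lowercases the token, then stems it unless it is protected. */
    static String analyze(String token, Set<String> protectedWords) {
        String lower = token.toLowerCase(Locale.ROOT);
        return protectedWords.contains(lower) ? lower : toyStem(lower);
    }

    public static void main(String[] args) {
        Set<String> protectedWords = Set.of("ковров");
        String[] forms = {"Ковров", "Коврова", "Коврову", "Ковровом", "Коврове"};
        for (String form : forms) {
            // With "ковров" protected, every form ends up as the same term.
            System.out.println(form + " -> " + analyze(form, protectedWords));
        }
    }
}
```

Without the protected set, the base form alone ends up as a different term (ковр) from its inflections (ковров); with it, all five forms share one term. The protected list has to be maintained by hand, so this suits a known set of problematic names rather than arbitrary surnames.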

Re: Russian stemmer

2010-07-27 Thread Robert Muir
All of your examples stem to "ковров":
    assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
        new String[] { "ковров", "ковров", "ковров", "ковров" });
  }
Are you sure you enabled this at *both* index and query time?
2010/7/27 Oleg Burlaca
> Hello,
>
> I'm using SnowballPorter
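
Why the stemmer has to run at both index and query time can be shown with a small plain-Java sketch. Everything here is an assumption made for illustration: `toyStem` is a hypothetical one-suffix stripper rather than the Snowball stemmer, and the "index" is just a set of terms.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of index-time vs. query-time analysis; not Lucene code. */
final class IndexQueryAnalysisSketch {

    // Hypothetical suffix list, an assumption of this sketch only.
    private static final String[] SUFFIXES = {"ом", "ов", "а", "у", "е"};

    /** Strips at most one suffix from an already lowercased token. */
    static String toyStem(String token) {
        for (String suffix : SUFFIXES) {
            if (token.endsWith(suffix) && token.length() > suffix.length() + 2) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    /** Lowercases the token and optionally stems it. */
    static String analyze(String token, boolean stem) {
        String lower = token.toLowerCase(Locale.ROOT);
        return stem ? toyStem(lower) : lower;
    }

    /** Builds the set of indexed terms for a whitespace-separated text. */
    static Set<String> index(String text, boolean stem) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(analyze(token, stem));
        }
        return terms;
    }

    /** A single-term query matches only if its analyzed form was indexed. */
    static boolean matches(Set<String> indexedTerms, String query, boolean stemQuery) {
        return indexedTerms.contains(analyze(query, stemQuery));
    }

    public static void main(String[] args) {
        Set<String> terms = index("Коврова Коврову Ковровом Коврове", true);
        System.out.println("stemmed query:   " + matches(terms, "Коврову", true));
        System.out.println("unstemmed query: " + matches(terms, "Коврову", false));
    }
}
```

In this toy, a stemmed index queried with an unstemmed term misses, because the index holds only ковров while the query term stays коврову. A query for the base form Ковров still misses even with stemming on both sides, since the toy reduces it to ковр, which mirrors the problem discussed in this thread.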

Russian stemmer

2010-07-27 Thread Oleg Burlaca
Hello, I'm using SnowballPorterFilterFactory with language="Russian". The stemming works OK except for people's names and geographical places. Here are some examples: searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове. Are there other stemming plugins for the Russian language that

Re: Problem with Russian stemmer in Solr 1.2

2007-07-17 Thread Daniel Alheiros
Hi Andrew. This is an example for one FilterFactory:
    public class RussianStemFilterFactory extends BaseTokenFilterFactory {
        private String charset;

        /**
         * @see org.apache.solr.analysis.BaseTokenFilterFactory#init(java.util.Map)
         */
        @Override
        public void init(Map arg0) {
            super.i

Re: Problem with Russian stemmer in Solr 1.2

2007-07-17 Thread Andrew Stromnov
asons to not use directly the RussianAnalyzer was that I need > to > use an WhitespaceTokenizer removing HTML code... So I created my > factories. > > Regards, > Daniel > -- View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11646823 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem with Russian stemmer in Solr 1.2

2007-07-10 Thread Daniel Alheiros
Hi Andrew. Yes, I saw that. As I'm not knowledgeable in Russian, I had to infer it was adequate. But as you have much more to add to it, it would be interesting if you could contribute that. The problem is that the Russian analyzer and its filters are all final classes, which does not allow an elegant extension. B

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov
but I saw something like this. > > Please tell me if it works as expected and if it solves your problem (I’m > indexing Russian content and as you seem to be knowledgeable of Russian > language your comments are very useful). > > Regards, > Daniel > -- View this message

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Daniel Alheiros
Hi Andrew. In fact I did it by creating all the Factories for Solr, but I think you can use it directly, changing your index like this: I’ve not tested that, but I saw something like this. Please tell me if it works as expected and if it solves your problem (I’

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov
View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11505646 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Daniel Alheiros
Hi Andrew. I'm using the RussianAnalyzer (part of the Lucene analyzers) and it reduces списки to списк. Do you want to try this other Analyzer? Regards, Daniel On 9/7/07 16:06, "Andrew Stromnov" <[EMAIL PROTECTED]> wrote: > списки arrondissement turvallisuuden

Problem with Russian stemmer in Solr 1.2

2007-07-09 Thread Andrew Stromnov
- View this message in context: http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11503583 Sent from the Solr - User mailing list archive at Nabble.com.