Re[2]: Implementing custom analyzer for multi-language stemming

2014-09-18 Thread roman-v1

Re: Implementing custom analyzer for multi-language stemming

2014-09-18 Thread roman-v1
Is there a way to set an attribute in the tokenizer for a document, so that I can search by both the word and this attribute?

Re: Implementing custom analyzer for multi-language stemming

2014-09-18 Thread atawfik
evelop that. In another project, I am following the same approach to develop an AutoAnalyzer for Lucene without using Solr. So, let me know if you want directions on how to do it. Regards, Ameer

Re: Implementing custom analyzer for multi-language stemming

2014-09-17 Thread roman-v1

Re: Implementing custom analyzer for multi-language stemming

2014-08-06 Thread Rich Cariens
Yes, each token could have a LanguageAttribute on it, just like ScriptAttributes. I didn't *think* a span would be necessary. I would also add a multivalued "lang" field to the document. Searching English documents for "die" might look like: "q=die&lang=eng". The "lang" param could tell the Reques
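The per-token language tag and the multivalued "lang" field described in this message can be sketched roughly as follows. This is a library-independent toy in Python, not Lucene or Solr code: `Token`, `TinyIndex`, and the "deu" code are hypothetical names chosen for illustration, and the `search` method only approximates what a query like "q=die&lang=eng" might do.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    lang: str  # hypothetical per-token language tag, e.g. "eng" or "deu"


class TinyIndex:
    """Toy inverted index keyed by (term, lang); not Lucene or Solr code."""

    def __init__(self):
        self.postings = defaultdict(set)   # (term, lang) -> doc ids
        self.doc_langs = defaultdict(set)  # doc id -> multivalued "lang" field

    def add(self, doc_id, tokens):
        for tok in tokens:
            self.postings[(tok.term.lower(), tok.lang)].add(doc_id)
            self.doc_langs[doc_id].add(tok.lang)

    def search(self, q, lang):
        # Rough analogue of "q=die&lang=eng": match the term only where it
        # was tagged with the requested language.
        return sorted(self.postings.get((q.lower(), lang), set()))


index = TinyIndex()
index.add(1, [Token("never", "eng"), Token("say", "eng"), Token("die", "eng")])
index.add(2, [Token("die", "deu"), Token("Katze", "deu")])

print(index.search("die", "eng"))  # [1]
print(index.search("die", "deu"))  # [2]
```

Keying the postings on both term and language keeps the English verb "die" and the German article "die" apart without needing a separate field per language.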

Re: Implementing custom analyzer for multi-language stemming

2014-08-05 Thread TK
On 8/5/14, 8:36 AM, Rich Cariens wrote:
> Of course this is extremely primitive and basic, but I think it would be possible to write a CharFilter or TokenFilter that inspects the entire TokenStream to guess the language(s), perhaps even noting where languages change. Language and position informat
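The quoted idea of scanning a whole token stream, guessing the language, and noting where it changes might look something like the sketch below. It is a deliberately primitive Python illustration, not a Lucene CharFilter or TokenFilter: the stopword lists, the `tag_languages` function, and the "unknown" default are all assumptions made for this example, and stopword lookup is only one possible guessing strategy.

```python
# Tiny hypothetical stopword lists; a real detector would use far more evidence.
STOPWORDS = {
    "eng": {"the", "is", "and", "on"},
    "deu": {"der", "ist", "und", "auf"},
}


def tag_languages(tokens, default="unknown"):
    """Return (token, lang) pairs plus the positions where the language changes."""
    tagged, changes = [], []
    current = default
    for pos, tok in enumerate(tokens):
        for lang, words in STOPWORDS.items():
            if tok.lower() in words:
                if lang != current:
                    changes.append((pos, lang))  # language switch noted here
                current = lang
                break
        # Tokens that are not stopwords inherit the most recent guess.
        tagged.append((tok, current))
    return tagged, changes


tokens = "the cat is here und der Hund ist dort".split()
tagged, changes = tag_languages(tokens)
print(changes)  # [(0, 'eng'), (4, 'deu')]
```

The `changes` list plays the role of the position information mentioned in the message; a later stage could use it to pick a stemmer for each run of tokens.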

Re: Implementing custom analyzer for multi-language stemming

2014-08-05 Thread Rich Cariens
I've started a GitHub project to try out some cross-lingual analysis ideas (https://github.com/whateverdood/cross-lingual-search). I haven't played over there for about 3 months, but plan on restarting work there shortly. In a nutshell, the interesting component ("SimplePolyGlotStemmingTokenFilter

Re: Implementing custom analyzer for multi-language stemming

2014-08-04 Thread TK
On 7/30/14, 10:47 AM, Eugene wrote:
> Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect language automatically and use Google Translate APIs a lot (so having arbitrary number of languages in our system doesn't

Re: Implementing custom analyzer for multi-language stemming

2014-08-02 Thread Umesh Prasad
> > Cheers,
> > -Chris.
> >
> > ------------
> > From: "Eugene"
> > Sent: Wednesday, July 30, 2014 1:48 PM
> > To: solr-user@lucene.apache.org
> > Subject: Implementing custom analyzer for multi-language s

Re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Sujit Pal
--
> From: "Eugene"
> Sent: Wednesday, July 30, 2014 1:48 PM
> To: solr-user@lucene.apache.org
> Subject: Implementing custom analyzer for multi-language stemming
>
> Hello, fellow Solr and Lucene users and developers!
>
> In our project we receive text fro

re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Chris Morley
ene"
Sent: Wednesday, July 30, 2014 1:48 PM
To: solr-user@lucene.apache.org
Subject: Implementing custom analyzer for multi-language stemming
Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect language automatically

Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Eugene
Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect the language automatically and use Google Translate APIs a lot (so having an arbitrary number of languages in our system doesn't concern us). However, we need to be able