Re: Arabic analyser

2015-11-11 Thread Mahmoud Almokadem
> > rafa...@gmail.com wrote:
> > > If this is for a significant project and you are ready to pay for it,
> > > BasisTech has commercial solutions in this area I believe.
> > >
> > > Regards,
> > >    Alex.

Re: Arabic analyser

2015-11-11 Thread David Murgatroyd
RPs and even a newsletter:
> > http://www.solr-start.com/
> >
> > On 10 November 2015 at 08:46, Mahmoud Almokadem wrote:
> > > Thanks Paul,
> > >
> > > Arabic analyser applying filters of normalisation and stemming only for

Re: Arabic analyser

2015-11-11 Thread Mahmoud Almokadem
> Regards,
>    Alex.
>
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
> On 10 November 2015 at 08:46, Mahmoud Almokadem wrote:
> > Thanks Paul,
> >
> > Arabic analyser applying filters of norma

Re: Arabic analyser

2015-11-10 Thread Alexandre Rafalovitch
wrote:
> Thanks Paul,
>
> Arabic analyser applying filters of normalisation and stemming only for
> single terms out of standard tokenizer.
> Gathering all synonyms will be hard work. Should I customise my Tokenizer
> to handle this case?
>
> Sincerely,
> Mahmoud

Re: Arabic analyser

2015-11-10 Thread Mahmoud Almokadem
Thanks Paul,
The Arabic analyser applies its normalisation and stemming filters only to the single terms that come out of the standard tokenizer. Gathering all synonyms will be hard work. Should I customise my Tokenizer to handle this case?
Sincerely,
Mahmoud
On Tue, Nov 10, 2015 at 3:06 PM, Paul Libbrecht wrote

Re: Arabic analyser

2015-11-10 Thread Paul Libbrecht
Mahmoud,
there is an Arabic analyzer: https://wiki.apache.org/solr/LanguageAnalysis#Arabic
Doesn't it do what you describe? Synonyms probably work there too.
Paul
> Mahmoud Almokadem
> 9 November 2015 17:47
> Thanks Jack,
>
> This is a good solution, but we have

Re: Arabic analyser

2015-11-09 Thread Mahmoud Almokadem
Thanks Jack, This is a good solution, but we have more combinations that I think can’t be handled as synonyms, such as every word that starts with ‘عبد’ ‘Abd’ or ‘أبو’ ‘Abo’. When the Standard tokenizer is used on ‘أبو بكر’ ‘Abo Bakr’, it’ll be tokenised to ‘أبو’ and ‘بكر’ and the filters will be applied fo
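The open-ended prefix case described in this message can be illustrated with a small sketch. This is only a plain-Python model of the idea, not Solr or Lucene code: the function name `add_joined_forms` and the `PREFIXES` set are hypothetical, and the set contains just the two prefixes named in the message. After whitespace tokenization, whenever a token is one of the prefixes, the sketch also emits the prefix joined to the following token, so the spaceless spelling becomes searchable without listing every name as a synonym.

```python
# Hypothetical sketch in plain Python; none of these names are Solr/Lucene APIs.
# Assumption: only the two prefixes mentioned in the thread are handled.
PREFIXES = {"عبد", "أبو"}


def add_joined_forms(tokens, prefixes=PREFIXES):
    """Return the tokens plus, after each prefix token, the prefix joined
    to the token that follows it (the spelling written without a space)."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        # Emit the joined form only when a following token exists.
        if tok in prefixes and i + 1 < len(tokens):
            out.append(tok + tokens[i + 1])
    return out


if __name__ == "__main__":
    # Whitespace splitting stands in for the real tokenizer here.
    for phrase in ("أبو بكر", "عبد الله"):
        print(add_joined_forms(phrase.split()))
```

In a real deployment the same logic would have to live in a custom token filter in the analysis chain; whether that is preferable to a generated synonym list is a design choice the thread leaves open.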

Re: Arabic analyser

2015-11-09 Thread Jack Krupansky
Use an index-time (but not query time) synonym filter with a rule like:
Abd Allah,Abdallah
This will index the combined word in addition to the separate words.
-- Jack Krupansky
On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem wrote:
> Hello,
>
> We are indexing Arabic content and facing a p
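The effect of the index-time synonym rule suggested in this message can be modelled in a few lines. This is a simplified plain-Python sketch, not Solr's synonym filter: the names `SYNONYM_RULES`, `tokenize`, `index_tokens` and `matches` are hypothetical, lowercasing plus whitespace splitting stands in for the real analysis chain, and token positions are ignored. It shows the combined word being indexed in addition to the separate words, while the query side is left unexpanded.

```python
# Hypothetical sketch in plain Python; none of these names are Solr/Lucene APIs.
# One rule, mirroring "Abd Allah,Abdallah" from the message.
SYNONYM_RULES = {("abd", "allah"): "abdallah"}


def tokenize(text):
    """Stand-in for the real tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def index_tokens(text, rules=SYNONYM_RULES):
    """Index-time analysis: keep the separate words and add the combined
    word wherever a multi-word rule matches."""
    tokens = tokenize(text)
    out = list(tokens)
    for i in range(len(tokens)):
        for phrase, combined in rules.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase and combined not in out:
                out.append(combined)
    return out


def matches(query, document):
    """Query-time analysis applies no synonym expansion: every query token
    must simply be present among the indexed tokens."""
    indexed = set(index_tokens(document))
    return all(tok in indexed for tok in tokenize(query))


if __name__ == "__main__":
    print(index_tokens("Abd Allah"))
    print(matches("Abdallah", "Abd Allah"))
    print(matches("Abd Allah", "Abd Allah"))
```

Because the expansion happens only at index time, both the spaced and the spaceless query find the document, at the cost of maintaining one rule per name.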

Arabic analyser

2015-11-09 Thread Mahmoud Almokadem
Hello, We are indexing Arabic content and facing a problem tokenizing multi-term phrases like 'عبد الله' 'Abd Allah': users will search for 'عبدالله' 'Abdallah' without a space and need to get the results for 'عبد الله' with a space. We are using StandardTokenizer. Are there any configurations