Re: Problems querying Russian content

2007-06-28 Thread funtick
I know Russian better than Russians ;) I currently use the default configuration for "dismax" provided by SOLR 1.1; I can add a few URLs to the crawler tonight to see what happens. As far as I know, Lucene/Nutch can even detect the language of a web page (pdf, txt, html) by checking the raw byte array (raw HTTP Respon

Re: Problems querying Russian content

2007-06-28 Thread Daniel Alheiros
Thanks. Yes, I will do it. So you may be the best person to talk to about Russian content indexing. :) My indexing process is as follows:
1. RussianTokenizer
2. RussianLowerCaseFilter
3. RussianStopFilter
4. RussianStemFilter
Seems OK to me as I'm using the same structure used by the

Re: Problems querying Russian content

2007-06-28 Thread Daniel Alheiros
Thanks a lot! Now it is working. It was the Tomcat connector setup.
Regards, Daniel
On 28.06.2007 17:19, "Chris Hostetter" <[EMAIL PROTECTED]> wrote: > > : You can also ensure the browser sends an utf8 encoded post by > : : It works even if the page the form is in is not an UTF-8 page. >

Re: Problems querying Russian content

2007-06-28 Thread funtick
Hi Daniel, Ensure that UTF-8 is used everywhere: SOLR, WebServer, AppServer, HTTP Headers, etc. Also, do not use q=Бамбарбиа Киркуду; use the URL-encoded form instead: q=%D0%91%D0%B0%D0%BC%D0%B1%D0%B0%D1%80%D0%B1%D0%B8%D0%B0+%D0%9A%D0%B8%D1%80%D0%BA%D1%83%D0%B4%D1%83 http://www.tokenizer.org is
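
The encoded form of the query can be produced programmatically rather than by hand. A minimal sketch in Python, assuming the client side has the standard library's `urllib.parse` available; the variable names are illustrative and not from the thread:

```python
from urllib.parse import quote_plus

# Query text taken from the message above.
query = "Бамбарбиа Киркуду"

# quote_plus percent-encodes the UTF-8 bytes of each non-ASCII character
# and replaces the space with "+", as expected in a query string.
encoded_q = quote_plus(query, encoding="utf-8")

print("q=" + encoded_q)
```

The resulting string is pure ASCII, so it survives any proxy or server that would otherwise mangle raw Cyrillic characters in a URL.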

Re: Problems querying Russian content

2007-06-28 Thread Chris Hostetter
: You can also ensure the browser sends an utf8 encoded post by : http://www.nabble.com/Cyrillic-characters-t1963293.html#a5402562 http://wiki.apache.org/solr/SolrTomcat (see URI charset section) -Hoss
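
Why the container's URI charset matters can be illustrated with a small sketch. It assumes the servlet container decodes percent-encoded bytes as ISO-8859-1 when no URI charset is configured; that is an assumption about the setup discussed here, not something stated in this thread. The Python calls below only simulate the decoding step and do not talk to Tomcat or Solr:

```python
from urllib.parse import quote_plus, unquote_plus

original = "Бамбарбиа Киркуду"

# What a UTF-8 aware client sends in the query string.
encoded = quote_plus(original, encoding="utf-8")

# Server decoding with the matching charset recovers the text.
decoded_utf8 = unquote_plus(encoded, encoding="utf-8")

# Server decoding with ISO-8859-1 (assumed misconfiguration) turns every
# two-byte UTF-8 sequence into two unrelated characters.
decoded_latin1 = unquote_plus(encoded, encoding="iso-8859-1")

print(decoded_utf8 == original)    # same text as the original query
print(decoded_latin1 == original)  # differs: the query no longer matches
```

With the mismatched charset the search term reaching the index is garbage, so no Russian document can match even though indexing itself is correct.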

Re: Problems querying Russian content

2007-06-28 Thread Jérôme Etévé
On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote: > I'm in trouble now about how to issue queries against Solr using in my "q" > parameter content in Russian (it applies to Chinese and Arabic as well). > > The problem is I can't send any Ru

Re: Problems querying Russian content

2007-06-28 Thread Yonik Seeley
On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote: I'm in trouble now about how to issue queries against Solr using in my "q" parameter content in Russian (it applies to Chinese and Arabic as well). The problem is I can't send any Russian special characters in URLs because they don't fit in