I know Russian better than Russians ;)
I currently use the default "dismax" configuration provided by SOLR
1.1; I can add a few URLs to the crawler tonight to see what happens. As
far as I know, Lucene/Nutch can even detect a web page's (pdf, txt, html)
language by checking the raw byte array (raw HTTP Respon
Thanks.
Yes I will do it.
So you may be the best person to talk about the Russian content indexing. :)
My indexing process is as follows:
1. RussianTokenizer
2. RussianLowerCaseFilter
3. RussianStopFilter
4. RussianStemFilter
Seems OK to me as I'm using the same structure used by the
Thanks a lot!
Now it is working. It was the Tomcat connector setup
Regards,
Daniel
On 28.06.2007 17:19, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:
>
> : You can also ensure the browser sends an utf8 encoded post by
> : : It works even if the page the form is in is not an UTF-8 page.
>
Hi Daniel,
Ensure that UTF-8 is everywhere... SOLR, WebServer, AppServer, HTTP
Headers, etc.
And do not use
q=Бамбарбиа Киркуду
use this instead (encoded URL):
q=%D0%91%D0%B0%D0%BC%D0%B1%D0%B0%D1%80%D0%B1%D0%B8%D0%B0+%D0%9A%D0%B8%D1%80%D0%BA%D1%83%D0%B4%D1%83
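
If you build the query string in client code rather than by hand, a standard URL-encoding routine should produce the same result. Below is a minimal sketch in Python (my choice of language, not something used in this thread; the variable names are just for illustration). It assumes the server side is set to decode URIs as UTF-8, as discussed for Tomcat.

```python
from urllib.parse import quote_plus, unquote_plus

# The raw query from the example above (illustrative variable name).
raw_query = "Бамбарбиа Киркуду"

# quote_plus encodes the text as UTF-8, percent-escapes every
# non-ASCII byte, and turns the space into "+".
encoded = quote_plus(raw_query, encoding="utf-8")
query_string = "q=" + encoded
print(query_string)

# Decoding with the same charset gives back the original text.
assert unquote_plus(encoded, encoding="utf-8") == raw_query
```

The printed value should match the encoded q parameter shown above. If the server decodes the URI with a different charset (for example ISO-8859-1), the same bytes will come out as garbage, which is why UTF-8 has to be configured everywhere.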
http://www.tokenizer.org is
: You can also ensure the browser sends an utf8 encoded post by
: http://www.nabble.com/Cyrillic-characters-t1963293.html#a5402562
http://wiki.apache.org/solr/SolrTomcat (see URI charset section)
-Hoss
On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> I'm in trouble now about how to issue queries against Solr using in my "q"
> parameter content in Russian (it applies to Chinese and Arabic as well).
>
> The problem is I can't send any Ru
On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
I'm in trouble now about how to issue queries against Solr using in my "q"
parameter content in Russian (it applies to Chinese and Arabic as well).
The problem is I can't send any Russian special characters in URLs because
they don't fit in