SOLR - problems with non-English symbols when extracting HTML

kushti Fri, 25 Mar 2011 01:14:54 -0700

When I send plain UTF-8 text (non-English text) to be indexed, everything is
OK. When I send HTML, however, the non-ASCII symbols come out as garbled
characters. So

$this->solr->extractContents($url, strip_tags($code),
    array("literal.url" => $url, "fmap.content" => "body"));

works well, but sending the raw HTML with just

$this->solr->extractContents($url, $code,
    array("literal.url" => $url, "fmap.content" => "body"));

does not! What's the problem?

I'm using the Solr PHP client (code.google.com/p/solr-php-client/), but I
don't think the problem is there.

In both cases the request specifies the "text/plain" content type (I've
modified the standard library code to do this).

SOLR 1.4.1 / Tomcat 6 / Fedora 12

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-problems-with-non-english-symbols-when-extracting-HTML-tp2729126p2729126.html
Sent from the Solr - User mailing list archive at Nabble.com.
