Re: How to indexing non-english html text in unicode with Solr?

Grant Ingersoll Fri, 24 Apr 2009 04:03:02 -0700

See the Solr Cell contrib: http://wiki.apache.org/solr/ExtractingRequestHandler. Note, it's 1.4-dev only. If you want it for 1.3, you'll have to use Tika on the client side.


Solr does support Unicode indexing.
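
For the 1.3 route, the client-side flow is roughly: pull the text out of the HTML, wrap it in a Solr <add> message, and send it as UTF-8. Below is a minimal sketch of that flow in Python. It is only an illustration under a few assumptions: the standard library's html.parser stands in for Tika as the extractor, the field names "id" and "text" are hypothetical and need to match whatever your schema.xml defines, and the helper names (html_to_text, build_add_xml, build_select_url) are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents.

    Stand-in for a real extractor such as Tika (assumption for this sketch).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string (already decoded to str)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def build_add_xml(doc_id, text):
    """Build a Solr <add><doc>...</doc></add> message as UTF-8 bytes.

    The field names "id" and "text" are hypothetical; use your schema's.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text
    return ET.tostring(add, encoding="utf-8")


def build_select_url(base_url, query):
    """Percent-encode a Unicode query as UTF-8 for the select handler."""
    return base_url + "?" + urlencode({"q": query})


if __name__ == "__main__":
    page = "<html><body><p>हिन्दी पाठ</p></body></html>"
    payload = build_add_xml("doc-1", html_to_text(page))
    print(payload.decode("utf-8"))
    # Host and port below are placeholders for your own Solr instance.
    print(build_select_url("http://localhost:8983/solr/select", "हिन्दी"))
```

The resulting bytes would then be POSTed to your Solr update URL with a "Content-Type: text/xml; charset=utf-8" header, followed by a <commit/> message. Decode each page using its declared charset before extraction, so that what reaches Solr is real Unicode text rather than raw bytes in a regional encoding.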


On Apr 24, 2009, at 2:22 AM, ahmed baseet wrote:

Hi All,
I'm trying to index some regional/non-English HTML pages with Solr. I thought of
indexing the corresponding Unicode text for each page, as Solr supports
Unicode indexing, right?
But I'm not able to extract XML from the HTML page, and posting to Solr
requires XML. Can anyone tell me a good method of extracting XML from HTML,
or just let me know how to index non-English HTML pages with Solr in a way
that will let me search with Unicode queries (for the corresponding
regional language)? Thanks in advance.

--Ahmed.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
