Is it possible that your text editor/display does not support UTF-8
encoding?
Assuming the data is properly encoded, do you have the encoding="UTF-8"
attribute in your DIH dataSource tag?
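For reference, here is a sketch of what a DIH data source carrying that attribute might look like. It is illustrative, not a tested configuration. The host, database name, credentials and the Connector/J driver class are assumptions. The `useUnicode=true&characterEncoding=UTF-8` URL parameters are a MySQL Connector/J option that is often set alongside the DIH attribute. Python's ElementTree is used only so that the `&` in the JDBC URL comes out escaped as `&amp;`, which it must be inside data-config.xml.

```python
import xml.etree.ElementTree as ET


def dih_datasource(host: str, db: str, user: str, password: str) -> str:
    """Build a hypothetical DIH <dataSource> element for a MySQL JDBC source.

    Assumed (not from this thread): host/db/user/password, the driver class,
    and the Connector/J useUnicode/characterEncoding URL parameters.
    """
    url = f"jdbc:mysql://{host}/{db}?useUnicode=true&characterEncoding=UTF-8"
    elem = ET.Element("dataSource", {
        "type": "JdbcDataSource",
        "driver": "com.mysql.jdbc.Driver",
        "url": url,
        "user": user,
        "password": password,
        "encoding": "UTF-8",  # the attribute asked about above
    })
    # ElementTree escapes "&" in attribute values as "&amp;", which keeps
    # data-config.xml well-formed.
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(dih_datasource("localhost", "library", "solr", "secret"))
```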
-- Jack Krupansky
-----Original Message-----
From: KP Sanjailal
Sent: Monday, May 21, 2012 7:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing & Searching MySQL table with Hindi and English data
Hi,
Thank you so much for replying.
The MySQL database server runs on a Fedora Core 12 machine with Hindi
language support enabled. The tables use ENGINE=MyISAM and
DEFAULT CHARSET=utf8.
Data is imported using the Solr DataImportHandler (MySQL JDBC driver).
In the schema.xml file the title field is defined as:
<field name="title" type="text_general" indexed="true" stored="true"/>
I tried saving the query results directly to a text file from the MySQL
command prompt, but the results are not stored correctly. The file
contains the following characters:
à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥<8d>à ¤Åà ¤¾ Saur oorja
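The characters above are consistent with the UTF-8 bytes of the Devanagari text being read back in a single-byte Western encoding such as Latin-1 or Windows-1252. For example, स is stored as the bytes E0 A4 B8, which Latin-1 shows as "à¤¸". The 0x8D byte in the virama (्) has no printable character, which is why `<8d>` appears. The sketch below reproduces this with Python's built-in codecs and then undoes it. The function names are illustrative only, and the repair works only if no bytes were lost or replaced along the way.

```python
def as_latin1_view(text: str) -> str:
    """Show what UTF-8 bytes look like when a viewer decodes them as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo one round of UTF-8-read-as-Latin-1 damage (if every byte survived)."""
    return mojibake.encode("latin-1").decode("utf-8")


original = "सौर ऊर्जा"
garbled = as_latin1_view(original)
print(repr(garbled))              # the "à¤..." sequences seen in the file
print(repair(garbled) == original)
```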
Please suggest how I can solve this issue.
Regards,
Sanjailal KP
--
On Sun, May 20, 2012 at 6:59 AM, Lance Norskog <goks...@gmail.com> wrote:
Also, try saving data from a query into a file and verify that it is
UTF-8 and the characters are correct.
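One way to run that check is sketched below. The helper name `check_export` and its messages are hypothetical. It decodes the exported file strictly as UTF-8 and then looks for Devanagari code points (U+0900 to U+097F). If the file is valid UTF-8 and contains Devanagari text, the garbage is probably coming from whatever is displaying it. If it contains sequences such as "à¤" instead, the text was probably converted twice before it reached the file.

```python
import sys
from pathlib import Path

DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block


def check_export(path: str) -> str:
    """Classify a query-export file (hypothetical helper, for illustration)."""
    data = Path(path).read_bytes()
    try:
        text = data.decode("utf-8")  # strict: raises on non-UTF-8 bytes
    except UnicodeDecodeError as err:
        return f"not valid UTF-8 (bad byte at offset {err.start})"
    if any(ord(ch) in DEVANAGARI for ch in text):
        return "valid UTF-8 with Devanagari text"
    if "à¤" in text or "à¥" in text:
        # UTF-8 Devanagari that was decoded as Latin-1 and re-encoded
        return "valid UTF-8, but Devanagari looks double-encoded"
    return "valid UTF-8, no Devanagari found"


if __name__ == "__main__" and len(sys.argv) > 1:
    print(check_export(sys.argv[1]))
```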
On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> Check the analyzers for the field types containing Hindi text to be sure
> that they are not using a character-mapping or "folding" filter that might
> mangle the Hindi characters. Post the field type definition for, say, the
> "title" field.
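A sketch of one way to see what the "title" analyzer does to Hindi text, using Solr's field analysis handler. The code assumes that `solr.FieldAnalysisRequestHandler` is registered at `/analysis/field`, as it is in the Solr 3.x example solrconfig.xml. The host and port are also assumptions. The JSON response lists the tokens produced after each tokenizer and filter stage, so a filter that drops or rewrites Devanagari characters should stand out.

```python
from urllib.parse import urlencode

# Assumption: example-style solrconfig.xml with /analysis/field registered,
# running on the default local host and port.
ANALYSIS_URL = "http://localhost:8983/solr/analysis/field"


def analysis_url(field: str, value: str) -> str:
    """URL asking Solr to show, stage by stage, how `field` analyzes `value`."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": value,  # index-time analysis
        "analysis.query": value,       # query-time analysis
        "wt": "json",
    }
    # urlencode percent-encodes the Devanagari text as UTF-8 bytes.
    return ANALYSIS_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(analysis_url("title", "सौर ऊर्जा"))
```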
>
> Also, try adding a single document that contains Hindi text manually (using
> curl or the post jar) and see whether that works.
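The same single-document test can be sketched in Python instead of curl. Several details are assumptions: the update URL is the default for a local Solr 3.6 example instance, and the schema's uniqueKey is `id`. The `build_add_request` helper is hypothetical. The point of the sketch is that the XML body is serialized as UTF-8 and the request says so in its Content-Type header. It only builds the request, and the comment shows how to send it.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: local Solr 3.6 example instance; commit so the doc is searchable.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def build_add_request(doc_id: str, title: str) -> urllib.request.Request:
    """Build an XML <add> request for one document (assumes uniqueKey "id")."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("title", title)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    # Serialize to UTF-8 bytes and declare the charset explicitly.
    body = ET.tostring(add, encoding="utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_add_request("hindi-test-1", "सौर ऊर्जा Saur oorja")
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```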
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: KP Sanjailal
> Sent: Thursday, May 17, 2012 5:55 AM
> To: solr-user@lucene.apache.org
> Subject: Indexing & Searching MySQL table with Hindi and English data
>
>
> Hi,
>
> I tried to set up indexing of MySQL tables in Apache Solr 3.6.
>
> Everything works fine, but text in Hindi script (only about 10% of the
> total records) is not being indexed properly.
>
> A search with a Hindi keyword returns an empty result set. Also,
> retrieved Hindi records display junk characters.
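On the query side, a Hindi keyword has to reach Solr as UTF-8 percent-encoded bytes. The sketch below builds such a select URL and checks that decoding the query string gives back the same keyword. The host, port and `select_url` helper are assumptions; `title` is the field from the schema quoted in this thread. If a Hindi query built this way still returns nothing, the indexed side (DIH import or analyzer) becomes the more likely culprit.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: default local Solr 3.6 example URL.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(field: str, keyword: str) -> str:
    """Build a select URL whose query string carries `keyword` as UTF-8."""
    return SELECT_URL + "?" + urlencode({"q": f"{field}:{keyword}", "wt": "json"})


url = select_url("title", "सौर")
# Decoding the query string as UTF-8 should give back the exact keyword; a
# server that decodes it with a different charset would see other characters.
assert parse_qs(urlsplit(url).query)["q"] == ["title:सौर"]
print(url)
```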
>
> The database tables contain bibliographic details of books such as
> title, author, publisher, ISBN, place of publication, series, etc. About
> 10% of the records contain Hindi text in the title, author and publisher
> fields.
>
> Example:
>
> *Search Results from MySQL using PHP*
>
> 1. <http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac>
> *Title:* सौर ऊर्जा Saur oorja
> *Author(s):* विनोद कुमार मिश्र MISHRA (VK)
> *Material:* Books
>
> *Search Results from Apache Solr (searched using an English keyword)*
>
> 1. <http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac>
> *Title:* सौर ऊरॠजा Saur oorja
> *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)
> *Material:* Books
>
>
> How do I go about solving this language problem?
>
> Thanks in advance.
>
> K. P. Sanjailal
> --
>
--
Lance Norskog
goks...@gmail.com