Hi, Thank you so much for replying.
The MySQL database server is running on a Fedora Core 12 machine with Hindi language support enabled. Details of the database are: ENGINE=MyISAM and DEFAULT CHARSET=utf8.

Data is imported using the Solr DataImportHandler (MySQL JDBC driver).
In the schema.xml file the title field is defined as:
<field name="title" type="text_general" indexed="true" stored="true"/>

I tried saving the query results directly to a text file from the MySQL command prompt, but the results are not stored correctly. The file contains the following characters:
à ¤¸à ¥Åà ¤° à ¤Šà ¤°à ¥<8d>à ¤Åà ¤¾ Saur oorja

The first line of data-config.xml is:
<?xml version="1.0" encoding="UTF-8"?>

Please suggest what I have to do to solve this issue.

Regards,

Sanjailal KP

On 5/22/12, KP Sanjailal <kpsanjai...@gmail.com> wrote:
> On 5/21/12, Jack Krupansky <j...@basetechnology.com> wrote:
>> Is it possible that your text editor/display does not support UTF-8
>> encoding?
>>
>> Assuming the data is properly encoded, do you have the encoding="UTF-8"
>> attribute in your DIH dataSource tag?
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: KP Sanjailal
>> Sent: Monday, May 21, 2012 7:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Indexing & Searching MySQL table with Hindi and English data
>>
>> On Sun, May 20, 2012 at 6:59 AM, Lance Norskog <goks...@gmail.com> wrote:
>>
>>> Also, try saving data from a query into a file and verify that it is
>>> UTF-8 and the characters are correct.
>>>
>>> On Fri, May 18, 2012 at 7:54 AM, Jack Krupansky
>>> <j...@basetechnology.com> wrote:
>>> > Check the analyzers for the field types containing Hindi text to be
>>> > sure that they are not using a character mapping or "folding" filter
>>> > that might mangle the Hindi characters. Post the field type, say for
>>> > the "title" field.
>>> >
>>> > Also, try manually (using curl or the post jar) adding a single
>>> > document that has Hindi data and see if that works.
>>> >
>>> > -- Jack Krupansky
>>> >
>>> > -----Original Message----- From: KP Sanjailal
>>> > Sent: Thursday, May 17, 2012 5:55 AM
>>> > To: solr-user@lucene.apache.org
>>> > Subject: Indexing & Searching MySQL table with Hindi and English data
>>> >
>>> > Hi,
>>> >
>>> > I tried to set up indexing of MySQL tables in Apache Solr 3.6.
>>> >
>>> > Everything works fine, but text in Hindi script (only some 10% of the
>>> > total records) is not getting indexed properly.
>>> >
>>> > A search with a keyword in Hindi retrieves an empty result set. Also,
>>> > a retrieved Hindi record displays junk characters.
>>> >
>>> > The database tables contain bibliographical details of books such as
>>> > title, author, publisher, ISBN, publishing place, series, etc. About
>>> > 10% of the records contain Hindi text in the title, author, and
>>> > publisher fields.
>>> >
>>> > Example:
>>> >
>>> > *Search Results from MySQL using PHP*
>>> >
>>> > 1. <http://192.168.0.132/shared/biblio_view.php?bibid=26913&tab=opac>
>>> > *Title:* सौर ऊर्जा Saur oorja
>>> > *Author(s):* विनोद कुमार मिश्र MISHRA (VK)
>>> > *Material:* Books
>>> >
>>> > *Search Results from Apache Solr (searched using keyword in English)*
>>> >
>>> > 1. <http://192.168.0.132/test/biblio_view.php?bibid=26913&tab=opac>
>>> > *Title:* सौर ऊरॠजा Saur oorja
>>> > *Author(s):* विनोद कॠमार मिशॠर MISHRA (VK)
>>> > *Material:* Books
>>> >
>>> > How do I go about solving this language problem?
>>> >
>>> > Thanks in advance.
>>> >
>>> > K. P. Sanjailal
>>> > --
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
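
A note on the garbled file contents quoted in this thread: sequences such as "à ¤¸" are consistent with UTF-8 bytes being displayed as a single-byte encoding such as Latin-1. If so, the stored data may be fine and only the display or the connection character set is wrong. That is an assumption; nothing in the thread confirms it. A minimal Python sketch of the effect, using the title from the original post (the helper name looks_like_utf8_read_as_latin1 is hypothetical):

```python
# Hindi title from the original post, as it displays correctly via PHP.
original = "सौर ऊर्जा"

# The bytes a utf8 MySQL column would hand to a UTF-8 aware client.
utf8_bytes = original.encode("utf-8")

# Reading those bytes as Latin-1 maps each byte to one character,
# which produces the "à¤¸..." style junk seen in the thread.
mojibake = utf8_bytes.decode("latin-1")

# The damage is reversible as long as no bytes were dropped on the way.
repaired = mojibake.encode("latin-1").decode("utf-8")


def looks_like_utf8_read_as_latin1(text):
    """Return True if text decodes to something different when its
    characters are treated as Latin-1 bytes holding UTF-8 data."""
    try:
        fixed = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (so it is real
        # Unicode already) or the bytes are not valid UTF-8.
        return False
    return fixed != text


print(repr(mojibake))
print(repaired)
```

If the helper returns True for a value coming back from Solr or from the mysql client, the bytes were probably decoded with the wrong character set somewhere between the database and the display, rather than stored incorrectly.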