Solr Japanese support

Bala Iyer Sat, 15 Mar 2014 09:51:07 -0700

Hi,

I am new to using Solr with Japanese text.
I have added Japanese support to schema.xml.
How can I insert Japanese text into that field, either with a Solr client
(Java / PHP / Ruby) or with curl?



schema.xml
====================================
    <field name="username" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
    <field name="timestamp" type="date" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
    <field name="jtxt" type="text_ja" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />

    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>

        <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
        <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <!-- Removes tokens with certain part-of-speech tags -->
        <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
        <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
        <filter class="solr.CJKWidthFilterFactory"/>
        <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
        <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <!-- Lower-cases romaji characters -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
====================================

My insert.csv file:

"id","username","timestamp","content","jtxt"
"999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
=========================
I am trying to insert it through curl, but it gives me an error:
curl "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true" \
  -H "Content-Type: text/plain; charset=utf-8" \
  --data-binary @insert.csv


ERROR
----------------------------
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">400</int><int name="QTime">23</int></lst>
<lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
</response>

I know I should not use "text/plain" as the Content-Type.
=========================
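
For comparison, here is a minimal Python sketch of the same upload, using only the standard library. It is not a confirmed fix. It makes two assumptions: that the CSV file might start with a UTF-8 byte order mark (which would make the first header read as something other than "id"), and that "application/csv" is an acceptable Content-Type for the CSV update handler. The helper names load_csv_bytes and build_request are made up for illustration; the URL and file name are the ones from the curl command above.

```python
import codecs
import urllib.request

# Same endpoint as in the curl command above.
SOLR_CSV_URL = (
    "http://localhost:8983/solr/collection1/update/csv"
    "?separator=,&commit=true"
)


def load_csv_bytes(raw: bytes) -> bytes:
    """Return the CSV payload as UTF-8 bytes without a leading BOM."""
    # Assumption: a BOM in front of the header row is one possible reason
    # the "id" column is not recognised.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Fail early if the file is not valid UTF-8 (e.g. saved as Shift_JIS).
    raw.decode("utf-8")
    return raw


def build_request(payload: bytes, url: str = SOLR_CSV_URL) -> urllib.request.Request:
    """Build a POST request that sends the CSV body as UTF-8."""
    return urllib.request.Request(
        url,
        data=payload,
        # Assumption: application/csv instead of text/plain.
        headers={"Content-Type": "application/csv; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    with open("insert.csv", "rb") as f:
        payload = load_csv_bytes(f.read())
    with urllib.request.urlopen(build_request(payload)) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```

Reading the file in binary mode keeps the Japanese text byte-for-byte as saved, so any encoding problem shows up in the decode check rather than silently on the Solr side.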


Thanks
