Your problem has nothing to do with Japanese. Perhaps a content-type for CSV would work better?
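
For what it's worth, here is a minimal Python sketch of that suggestion. It posts the same file to the same URL as the curl command quoted below, but declares the body as text/csv (the registered MIME type for CSV) instead of text/plain. The helper names build_csv_update_request and send_csv are made up for illustration, and I have not confirmed that the content type alone clears the "missing mandatory uniqueKey" error, so treat it as something to try rather than a confirmed fix.

```python
import urllib.request

# Assumed endpoint: copied from the curl command in the quoted message.
SOLR_CSV_URL = (
    "http://localhost:8983/solr/collection1/update/csv"
    "?separator=,&commit=true"
)


def build_csv_update_request(csv_bytes, url=SOLR_CSV_URL):
    """Build (but do not send) a POST that declares the body as UTF-8 CSV."""
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv; charset=utf-8"},
        method="POST",
    )


def send_csv(path, url=SOLR_CSV_URL):
    """Read the CSV file as raw bytes, post it, and return Solr's reply."""
    with open(path, "rb") as f:
        request = build_csv_update_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With Solr running locally, print(send_csv("insert.csv")) would show the response. The file is read in binary mode so the Japanese text goes over the wire exactly as saved; this assumes insert.csv is already stored as UTF-8.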

On Sat, Mar 15, 2014 at 12:50 PM, Bala Iyer <grb...@yahoo.com> wrote:
> Hi,
>
> I am new to Solr japanese.
> I added the support for japanese on schema.xml
> How can i insert Japanese text into that field either by solr client (java /
> php / ruby ) or by curl
>
>
> schema.xml
> ====================================
> <field name="username" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
> <field name="timestamp" type="date" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
> <field name="jtxt" type="text_ja" indexed="true" stored="true"
> multiValued="true" omitNorms="true" termVectors="true" />
>
> <fieldType name="text_ja" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="false">
> <analyzer>
> <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>
> <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search"
> userDictionary="lang/userdict_ja.txt"/>-->
> <!-- Reduces inflected verbs and adjectives to their base/dictionary
> forms (辞書形) -->
> <filter class="solr.JapaneseBaseFormFilterFactory"/>
> <!-- Removes tokens with certain part-of-speech tags -->
> <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
> tags="lang/stoptags_ja.txt" />
> <!-- Normalizes full-width romaji to half-width and half-width kana
> to full-width (Unicode NFKC subset) -->
> <filter class="solr.CJKWidthFilterFactory"/>
> <!-- Removes common tokens typically not useful for search, but have
> a negative effect on ranking -->
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="lang/stopwords_ja.txt" />
> <!-- Normalizes common katakana spelling variations by removing any
> last long sound character (U+30FC) -->
> <filter class="solr.JapaneseKatakanaStemFilterFactory"
> minimumLength="4"/>
> <!-- Lower-cases romaji characters -->
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
> ====================================
>
> my insert.csv file
>
> "id","username","timestamp","content","jtxt"
> "999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
> =========================
> I am trying to insert through curl it gives me error
> curl
> "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true"
> -H "Content-Type: text/plain; charset=utf-8" --data-binary @insert.csv
>
>
> ERROR
> ----------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int
> name="QTime">23</int></lst><lst name="error"><str name="msg">Document is
> missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
> </response>
>
> I know i should not use "Content-Type as text/plain"
> =========================
>
>
> Thanks