Re: Solr Japanese support

Benson Margulies Sun, 16 Mar 2014 15:25:59 -0700

Your problem has nothing to do with Japanese. Perhaps a content-type
for CSV would work better?
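
A minimal sketch of that suggestion, reusing the URL, file name, and sample row from the quoted message below. The `application/csv` content type is an assumption about what the CSV update handler expects, and `post_csv` is a hypothetical helper name introduced only for this example. The file is written from a here-document so that it is plain UTF-8 and its header row begins with the `id` column.

```shell
#!/bin/sh
# Sketch only: URL and file name are taken from the quoted message;
# the application/csv content type is an assumption, not a confirmed fix.
SOLR_URL="http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true"

# Write the sample CSV as UTF-8; the header row names the uniqueKey field "id".
cat > insert.csv <<'EOF'
"id","username","timestamp","content","jtxt"
"999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
EOF

# Hypothetical helper: post the file with a CSV content type.
# Defined but not called here, so the script runs without a Solr server.
post_csv() {
  curl "$SOLR_URL" \
    -H "Content-Type: application/csv; charset=utf-8" \
    --data-binary @insert.csv
}
```

With Solr running locally, calling `post_csv` sends the request. If the same 400 error comes back, the header row of the file is the next thing to inspect.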


On Sat, Mar 15, 2014 at 12:50 PM, Bala Iyer <grb...@yahoo.com> wrote:
> Hi,
>
> I am new to using Solr with Japanese text.
> I added Japanese support to schema.xml.
> How can I insert Japanese text into that field, either with a Solr client
> (Java / PHP / Ruby) or with curl?
>
>
> schema.xml
> ====================================
>     <field name="username" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>     <field name="timestamp" type="date" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>     <field name="jtxt" type="text_ja" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true" />
>
>     <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>       <analyzer>
>         <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>
>         <!--<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" userDictionary="lang/userdict_ja.txt"/>-->
>         <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
>         <filter class="solr.JapaneseBaseFormFilterFactory"/>
>         <!-- Removes tokens with certain part-of-speech tags -->
>         <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
>         <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
>         <filter class="solr.CJKWidthFilterFactory"/>
>         <!-- Removes common tokens typically not useful for search, but have a negative effect on ranking -->
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt" />
>         <!-- Normalizes common katakana spelling variations by removing any last long sound character (U+30FC) -->
>         <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
>         <!-- Lower-cases romaji characters -->
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
> ====================================
>
> My insert.csv file:
>
> "id","username","timestamp","content","jtxt"
> "999999999","xxxxx","2013-12-26T10:14:26Z","Hello ","マイ ドキュメント"
> =========================
> When I try to insert it with curl, I get an error:
> curl "http://localhost:8983/solr/collection1/update/csv?separator=,&commit=true" \
>   -H "Content-Type: text/plain; charset=utf-8" --data-binary @insert.csv
>
>
> ERROR
> ----------------------------
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">400</int><int name="QTime">23</int></lst>
> <lst name="error"><str name="msg">Document is missing mandatory uniqueKey field: id</str><int name="code">400</int></lst>
> </response>
>
> I know I should not use text/plain as the Content-Type.
> =========================
>
>
> Thanks
