Re: Can solr-langid(Solr3.5.0) detect multiple languages in one text?

Jan Høydahl Tue, 13 Mar 2012 01:56:17 -0700

Hi,

Language detection cannot do that as of now, though it would be a great
improvement. Language detectors are pluggable, so if you know of a Java
language detector that can do this, perhaps we could plug it in. Alternatively,
we could extend the current identifier so that it first splits the text into
chunks and then runs langid on each chunk. If you'd like to open a JIRA issue
for this, it will not be forgotten...
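
To make the chunking idea concrete, here is a minimal sketch in plain Java. It
is not part of solr-langid: the class name ChunkedLangId and its methods are
hypothetical, and guessLanguage is only a crude script-counting stand-in for a
real pluggable detector (it cannot tell Traditional from Simplified Chinese, or
English from other Latin-script languages). The field names are the ones from
your mail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: split a text into chunks, guess each chunk's language, group by field. */
class ChunkedLangId {

    /** Splits after sentence-ending punctuation (CJK or Latin) and on line breaks. */
    public static List<String> splitIntoChunks(String text) {
        List<String> chunks = new ArrayList<>();
        for (String piece : text.split("(?<=[。！？.!?])|\\R")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                chunks.add(trimmed);
            }
        }
        return chunks;
    }

    /**
     * Stand-in detector (an assumption, NOT the Solr langid API): returns "zh-tw"
     * when Han characters outnumber Latin letters in the chunk, otherwise "en".
     */
    public static String guessLanguage(String chunk) {
        int han = 0;
        int latin = 0;
        for (int i = 0; i < chunk.length(); ) {
            int cp = chunk.codePointAt(i);
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.HAN) {
                han++;
            } else if (script == Character.UnicodeScript.LATIN) {
                latin++;
            }
            i += Character.charCount(cp);
        }
        return han > latin ? "zh-tw" : "en";
    }

    /** Groups chunks under "text_<lang>" field names, keeping their original order. */
    public static Map<String, String> route(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String chunk : splitIntoChunks(text)) {
            String field = "text_" + guessLanguage(chunk);
            fields.merge(field, chunk, (a, b) -> a + " " + b);
        }
        return fields;
    }
}
```

Calling ChunkedLangId.route(text) on a mixed document returns a map from field
name to the concatenated chunks for that language. Deciding per sentence rather
than per character keeps short Latin words inside a Chinese sentence (such as
"kari") in the Chinese field, which is probably what you want for your example.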


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13 March 2012, at 04:25, bing wrote:

> Hi, all, 
> 
> I am using solr-langid (Solr 3.5.0) for language detection, and I would
> like it to detect multiple languages within a single text.
> 
> The example text is: 
> 咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創，由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷，此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中，「kari」是「醬」的意思。在馬來西亞，kari也稱dal（當在mamak檔）。早期印度被蒙古人所建立的莫臥兒帝國（Mughal
> Empire）所統治過，其間從波斯（現今的伊朗）帶來的飲食習慣，從而影響印度人的烹調風格直到現今。
> Curry (plural, Curries) is a generic term primarily employed in Western
> culture to denote a wide variety of dishes originating in Indian, Pakistani,
> Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their
> common feature is the incorporation of more or less complex combinations of
> spices and herbs, usually (but not invariably) including fresh or dried hot
> capsicum peppers, commonly called "chili" or "cayenne" peppers.
> 
> I want the text to be separated into two parts, with the Chinese part
> going to "text_zh-tw" and the other part to "text_en". Can I do something
> like that?
> 
> Thank you. 
> 
> Best Regards, 
> Bing 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-solr-langid-Solr3-5-0-detect-multiple-languages-in-one-text-tp3821210p3821210.html
> Sent from the Solr - User mailing list archive at Nabble.com.
