Hi,

Language detection cannot do that as of now, though it would be a great improvement. Language detectors are pluggable, so if you know of a Java language detector that can do this, perhaps we could plug it in? Alternatively, we could extend the current identifier to first split the text into chunks and then run langid on each chunk. If you'd like to open a JIRA issue for this, it will not be forgotten...
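To make the chunk-then-detect idea concrete, here is a rough sketch in plain Java. It is not the solr-langid API: the class ChunkedLangId and its methods detect and route are hypothetical names. The detect method is only a stand-in that guesses from the dominant Unicode script of a chunk; a real version would call whichever pluggable detector is configured. Mapping Han script to "zh-tw" is an assumption taken from the field names in the question, since script alone cannot tell Traditional Chinese from Simplified Chinese or Japanese.

```java
import java.lang.Character.UnicodeScript;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Solr or solr-langid. */
final class ChunkedLangId {

    /**
     * Stand-in for a real language detector: returns a language code based on
     * whether Han or Latin letters dominate the chunk. Assumption: Han maps to
     * "zh-tw" and Latin maps to "en", matching the fields in the question.
     */
    static String detect(String chunk) {
        int han = 0;
        int latin = 0;
        for (int i = 0; i < chunk.length(); ) {
            int cp = chunk.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip punctuation, digits and whitespace
            }
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.HAN) {
                han++;
            } else if (script == UnicodeScript.LATIN) {
                latin++;
            }
        }
        if (han == 0 && latin == 0) {
            return "unknown";
        }
        return han >= latin ? "zh-tw" : "en";
    }

    /**
     * Splits the text into chunks at line breaks, detects each chunk on its
     * own, and concatenates chunks of the same language into one field value
     * keyed by "text_" + language code.
     */
    static Map<String, String> route(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String chunk : text.split("\\R+")) {
            String trimmed = chunk.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String field = "text_" + detect(trimmed);
            fields.merge(field, trimmed, (a, b) -> a + "\n" + b);
        }
        return fields;
    }
}
```

Calling ChunkedLangId.route(text) on a text whose Chinese and English parts sit on separate lines would give one entry for "text_zh-tw" and one for "text_en". Chunking by line keeps short Latin words embedded in a Chinese paragraph (such as "kari") with that paragraph, although comparing raw letter counts favours Latin text, so a real detector per chunk would be more reliable.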
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 13. mars 2012, at 04:25, bing wrote:

> Hi, all,
>
> I am using solr-langid(Solr3.5.0) to do language detection, and I hope
> multiple languages in one text can be detected.
>
> The example text is:
> 咖哩起源於印度。印度民間傳說咖哩是佛祖釋迦牟尼所創,由於咖哩的辛辣與香味可以幫助遮掩羊肉的腥騷,此舉即為用以幫助不吃豬肉與牛肉的印度人。在泰米爾語中,「kari」是「醬」的意思。在馬來西亞,kari也稱dal(當在mamak檔)。早期印度被蒙古人所建立的莫臥兒帝國(Mughal
> Empire)所統治過,其間從波斯(現今的伊朗)帶來的飲食習慣,從而影響印度人的烹調風格直到現今。
> Curry (plural, Curries) is a generic term primarily employed in Western
> culture to denote a wide variety of dishes originating in Indian, Pakistani,
> Bangladeshi, Sri Lankan, Thai or other Southeast Asian cuisines. Their
> common feature is the incorporation of more or less complex combinations of
> spices and herbs, usually (but not invariably) including fresh or dried hot
> capsicum peppers, commonly called "chili" or "cayenne" peppers.
>
> I want the text can be separated into two parts, and the part in Chinese
> goes to "text_zh-tw" while the other one "text_en". Can I do something like
> that?
>
> Thank you.
>
> Best Regards,
> Bing
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Can-solr-langid-Solr3-5-0-detect-multiple-languages-in-one-text-tp3821210p3821210.html
> Sent from the Solr - User mailing list archive at Nabble.com.