Re: Indexing languages, dataimporthandler

Teruhiko Kurosaka Tue, 22 Feb 2011 14:23:12 -0800

Greg,

You could use copyField to copy the column in question to 6 fields, one
for each of your 6 languages,
and hope that each analyzer does something reasonable with text in the
other languages, without crashing.
Or apply the whitespace tokenizer and hope for the best?
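
A rough sketch of what the copyField approach might look like in
schema.xml. The field names (documentname_en, documentname_fr), the
field type names (text_en, text_fr, text_ws) and the two languages
shown are assumptions for illustration only; substitute the six field
types you have already defined.

```xml
<!-- Sketch only: all field and type names below are hypothetical. -->

<!-- Source field filled by the DataImportHandler; kept for display. -->
<field name="documentname" type="string" indexed="false" stored="true"/>

<!-- One indexed copy per language, each using that language's analyzer. -->
<field name="documentname_en" type="text_en" indexed="true" stored="false"/>
<field name="documentname_fr" type="text_fr" indexed="true" stored="false"/>
<!-- ...repeat for the remaining four languages... -->

<copyField source="documentname" dest="documentname_en"/>
<copyField source="documentname" dest="documentname_fr"/>
<!-- ...repeat for the remaining four languages... -->

<!-- Alternative: a single language-neutral field that only splits on
     whitespace, with no stemming or stop words. -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With the per-language copies, a query would have to search all six
fields at once, since the language of any given row is still unknown.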


If the column has long enough text, you could try a language detector.
My company, Basis Technology, sells one, and it can plug into Solr easily.
http://www.basistech.com/language-identification/


On 2/22/11 11:50 AM, "Greg Georges" <greg.geor...@biztree.com> wrote:

>Hello all,
>
>I have just gone through the mailing list and have set up my different
>field type analysers for my 6 different languages in my schema.xml. Here
>is my question. I am using the dataimporthandler to import data from my
>database into my index. In my table, the documentname column's data can
>be in any of the 6 languages. Let's say I want to index this data and
>apply the different language analysers for certain cases; what would be
>the best way in my case? The real problem is that I do not know the
>language of the string in the documentname column when I create my index,
>therefore I cannot apply the correct field type. Should I create a custom
>transformer?
>
>Thanks
>
>Greg

----
T. "Kuro" Kurosaka, 415-227-9600x122, 617-386-7122(direct)
