Hello, I am trying to use a part-of-speech tagger for Bahasa Indonesia (Indonesian) to filter tokens in Solr. The tagger takes the word list of a sentence as input and returns an array of tags.
I think the process should be like this:
- tokenize the text into sentences
- tokenize each sentence into words
- pass the words to the tagger
- set a token attribute from the tagger output
- pass the tokens into a FilteringTokenFilter implementation

Is it possible to do this in Solr/Lucene? If so, how? I've read about a similar solution for Japanese, but since I lack knowledge of Japanese, it didn't help much.

--
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
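Setting the Lucene classes aside, the per-token accept/reject decision described in the steps above can be sketched in plain Java. This is only a sketch of the logic, not the actual Lucene `FilteringTokenFilter` API: `PosTagger` here is a hypothetical stand-in for the Indonesian tagger (one tagger call per sentence, one tag per word), and `filterByTag` plays the role that an `accept()` override would play in a real filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the Indonesian POS tagger described above:
// input is the word list of one sentence, output is one tag per word.
interface PosTagger {
    String[] tag(String[] words);
}

class PosFilterSketch {
    // Keep only the tokens whose tag is NOT in the stop-tag set -- the same
    // per-token decision a FilteringTokenFilter's accept() would make,
    // after the tagger has been run once over the whole sentence.
    static List<String> filterByTag(String[] words, PosTagger tagger,
                                    Set<String> stopTags) {
        String[] tags = tagger.tag(words); // one tagger call per sentence
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (!stopTags.contains(tags[i])) {
                kept.add(words[i]);
            }
        }
        return kept;
    }
}
```

In a real analysis chain the sentence buffering would happen inside a custom TokenFilter (the filter reads tokens until a sentence boundary, runs the tagger, then emits the tokens with their tag attribute set), and the stop-tag check would live in the `accept()` method of a `FilteringTokenFilter` subclass.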