Sounds like a very ambitious project. I'm sure you COULD do it in Solr, but not in very short order.

Check out some discussion of simply searching within sentences:
http://markmail.org/message/aoiq62a4mlo25zzk?q=apache#query:apache+page:1+mid:aoiq62a4mlo25zzk+state:results

First, how do you expect to use/query the corpus? In other words, what are your user requirements? They will determine what structure the Solr index, analysis chains, and custom search components will need.

Also, check out the Solr OpenNLP wiki:
http://wiki.apache.org/solr/OpenNLP

And see "LUCENE-2899: Add OpenNLP Analysis capabilities as a module":
https://issues.apache.org/jira/browse/LUCENE-2899
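For a feel of what that work provides, an analysis chain built on the OpenNLP factories would look roughly like this in schema.xml (a sketch only -- factory names are from the Lucene/Solr OpenNLP integration, and the model file names are placeholders you'd replace with your own bahasa Indonesia models):

<fieldType name="text_opennlp_pos" class="solr.TextField">
  <analyzer>
    <!-- Sentence detection + word tokenization via OpenNLP models -->
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="id-sent.bin"
               tokenizerModel="id-token.bin"/>
    <!-- Tags each token with its part of speech (stored as token type) -->
    <filter class="solr.OpenNLPPOSFilterFactory"
            posTaggerModel="id-pos-maxent.bin"/>
  </analyzer>
</fieldType>

Once the POS tag is carried on each token, a type-based filter (or your own FilteringTokenFilter) can drop tokens by tag.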

-- Jack Krupansky

-----Original Message----- From: Rendy Bambang Junior
Sent: Monday, May 06, 2013 11:41 AM
To: solr-user@lucene.apache.org
Subject: Tokenize Sentence and Set Attribute

Hello,

I am trying to use a part-of-speech tagger for bahasa Indonesia to filter
tokens in Solr.
The tagger takes the word list of a sentence as input and returns an array of tags.

I think the process should be like this:
- tokenize the text into sentences
- tokenize each sentence into words
- pass the words to the tagger
- set an attribute on each token using the tagger output
- pass the tokens through a FilteringTokenFilter implementation
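The accept/reject step at the end of that list can be sketched in plain Java. This is only an illustration of the logic (the tagger interface and method names here are made up, not Lucene or OpenNLP APIs); in real Lucene code the filter would extend FilteringTokenFilter and read the tag from a custom Attribute:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PosFilterSketch {

    // Stand-in for the POS tagger: takes the words of one sentence,
    // returns one tag per word. (Hypothetical interface, not an
    // OpenNLP or Lucene API.)
    interface SentenceTagger {
        String[] tag(String[] words);
    }

    // Keep only the words whose tag is in keepTags -- the same
    // decision a FilteringTokenFilter's accept() would make per token.
    static List<String> filterByPos(String[] words, SentenceTagger tagger,
                                    Set<String> keepTags) {
        String[] tags = tagger.tag(words);
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (keepTags.contains(tags[i])) {
                kept.add(words[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy tagger: tags "saya" as a pronoun, everything else as a noun.
        SentenceTagger toyTagger = words -> {
            String[] tags = new String[words.length];
            for (int i = 0; i < words.length; i++) {
                tags[i] = words[i].equals("saya") ? "PRP" : "NN";
            }
            return tags;
        };
        List<String> kept = filterByPos(
                new String[] {"saya", "makan", "nasi"},
                toyTagger, Set.of("NN"));
        System.out.println(kept);  // [makan, nasi]
    }
}
```

The key point is that the tagger needs the whole sentence at once, which is why sentence boundaries must survive tokenization (as the markmail thread above discusses).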

Is it possible to do this in Solr/Lucene? If it is, how?

I've read about a similar solution for Japanese, but since I lack an
understanding of Japanese, it didn't help much.

--
Regards,
Rendy Bambang Junior
Informatics Engineering '09
Bandung Institute of Technology
