I find UpdateRequestProcessors (http://wiki.apache.org/solr/UpdateRequestProcessor) a handy way to add and remove NLP-related fields on a document as Solr processes it. This is also how UIMA integrates with Solr (http://wiki.apache.org/solr/SolrUIMA), so you might want to take a look at UIMA as well.
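Here is a minimal plain-Java sketch of the per-document work such a processor could do for the steps in the quoted question. It splits the text into sentences, splits each sentence into words, tags one sentence at a time, and stores only the words with a wanted tag in a new field. To keep it runnable without Solr on the classpath, the document is modelled as a `Map`. `PosTagger`, `DemoTagger`, the `body`/`body_nouns` field names and the `NN`/`VB`/`PRP` tags are all hypothetical stand-ins for the real Indonesian tagger and your schema. In an actual processor this logic would live in `processAdd(AddUpdateCommand cmd)` and read and write `cmd.getSolrInputDocument()` instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PosFieldEnricher {

    /** Hypothetical tagger contract: one tag per input word, in the same order. */
    public interface PosTagger {
        String[] tag(String[] words);
    }

    /** Toy lexicon lookup standing in for a real Bahasa Indonesia tagger (demo only). */
    public static class DemoTagger implements PosTagger {
        private static final Map<String, String> LEXICON = new HashMap<>();
        static {
            LEXICON.put("saya", "PRP");
            LEXICON.put("makan", "VB");
            LEXICON.put("nasi", "NN");
            LEXICON.put("membaca", "VB");
            LEXICON.put("buku", "NN");
        }

        @Override
        public String[] tag(String[] words) {
            String[] tags = new String[words.length];
            for (int i = 0; i < words.length; i++) {
                // Unknown words get a catch-all "X" tag.
                tags[i] = LEXICON.getOrDefault(words[i].toLowerCase(Locale.ROOT), "X");
            }
            return tags;
        }
    }

    /**
     * Reads text from field src, tags it sentence by sentence, and stores the
     * words whose tag equals keepTag in field dest (hypothetical field names).
     */
    public static void enrich(Map<String, Object> doc, String src, String dest,
                              PosTagger tagger, String keepTag) {
        Object text = doc.get(src);
        if (text == null) {
            return; // nothing to tag, leave the document unchanged
        }
        List<String> kept = new ArrayList<>();
        // 1. Naive sentence split: break after ., ! or ? followed by whitespace.
        for (String sentence : text.toString().split("(?<=[.!?])\\s+")) {
            // 2. Naive word split on runs of characters that are not letters or digits.
            String[] words = Arrays.stream(sentence.split("[^\\p{L}\\p{N}]+"))
                    .filter(w -> !w.isEmpty())
                    .toArray(String[]::new);
            if (words.length == 0) {
                continue;
            }
            // 3. The tagger sees one sentence's word list at a time.
            String[] tags = tagger.tag(words);
            // 4. Keep only the words carrying the wanted tag.
            for (int i = 0; i < words.length; i++) {
                if (keepTag.equals(tags[i])) {
                    kept.add(words[i]);
                }
            }
        }
        doc.put(dest, kept);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("body", "Saya makan nasi. Dia membaca buku!");
        enrich(doc, "body", "body_nouns", new DemoTagger(), "NN");
        System.out.println(doc.get("body_nouns")); // prints [nasi, buku]
    }
}
```

Doing the tagging in an update processor sidesteps having to buffer a whole sentence inside a TokenFilter's `incrementToken()`, at the cost of doing your own sentence and word splitting rather than reusing Solr's analyzers.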
On Mon, May 6, 2013 at 6:22 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Sounds like a very ambitious project. I'm sure you COULD do it in Solr,
> but not in very short order.
>
> Check out some discussion of simply searching within sentences:
> http://markmail.org/message/aoiq62a4mlo25zzk?q=apache#query:apache+page:1+mid:aoiq62a4mlo25zzk+state:results
>
> First, how do you expect to use/query the corpus? In other words, what
> are your user requirements? They will determine what structure the Solr
> index, analysis chains, and custom search components will need.
>
> Also, check out the Solr OpenNLP wiki:
> http://wiki.apache.org/solr/OpenNLP
>
> And see "LUCENE-2899: Add OpenNLP Analysis capabilities as a module":
> https://issues.apache.org/jira/browse/LUCENE-2899
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Rendy Bambang Junior
> Sent: Monday, May 06, 2013 11:41 AM
> To: solr-user@lucene.apache.org
> Subject: Tokenize Sentence and Set Attribute
>
> Hello,
>
> I am trying to use a part-of-speech tagger for Bahasa Indonesia to filter
> tokens in Solr.
> The tagger takes the words of one sentence as a list and returns an array of tags.
>
> I think the process should be like this:
> - tokenize sentences
> - tokenize words
> - pass them to the tagger
> - set an attribute from the tagger output
> - pass the tokens to a FilteringTokenFilter implementation
>
> Is it possible to do this in Solr/Lucene? If so, how?
>
> I've read about a similar solution for Japanese, but since I don't
> understand Japanese, it didn't help much.
>
> --
> Regards,
> Rendy Bambang Junior
> Informatics Engineering '09
> Bandung Institute of Technology

-- edge