I find UpdateRequestProcessors
(http://wiki.apache.org/solr/UpdateRequestProcessor) a handy way to add
NLP-related fields to, or remove them from, a document as it is processed by
Solr. This is also how UIMA integrates with Solr
(http://wiki.apache.org/solr/SolrUIMA), so you might want to take a look at
UIMA as well.
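
To make the steps from your message concrete, here is a minimal sketch in plain Java with no Solr dependencies. It is only an illustration under stated assumptions: the Tagger interface, the toy lexicon, the "NN" tag name, and the regex-based sentence and word splitting are all hypothetical stand-ins, not your tagger's real API and not anything Solr provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class PosFilterSketch {

    /** Hypothetical tagger contract: the words of one sentence in, one tag per word out. */
    interface Tagger {
        String[] tag(String[] words);
    }

    /** Toy tagger for illustration only; the two-word noun lexicon is an assumption. */
    static final Tagger TOY_TAGGER = words -> {
        Set<String> nouns = new HashSet<>(Arrays.asList("nasi", "air"));
        String[] tags = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            tags[i] = nouns.contains(words[i].toLowerCase(Locale.ROOT)) ? "NN" : "X";
        }
        return tags;
    };

    /** Step 1: naive sentence split on whitespace that follows '.', '!' or '?'. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                sentences.add(s);
            }
        }
        return sentences;
    }

    /** Step 2: naive word split on any run of non-letter, non-digit characters. */
    static String[] splitWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words.toArray(new String[0]);
    }

    /** Steps 3-5: tag each sentence, then keep only the words whose tag is in keepTags. */
    static List<String> filterByTag(String text, Tagger tagger, Set<String> keepTags) {
        List<String> kept = new ArrayList<>();
        for (String sentence : splitSentences(text)) {
            String[] words = splitWords(sentence);
            String[] tags = tagger.tag(words);
            for (int i = 0; i < words.length; i++) {
                if (keepTags.contains(tags[i])) {
                    kept.add(words[i]);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList("NN"));
        // prints [nasi, air]
        System.out.println(filterByTag("Saya makan nasi. Dia minum air.", TOY_TAGGER, keep));
    }
}
```

In an UpdateRequestProcessor, logic like filterByTag would presumably be called from processAdd, reading the text from one field of the incoming document and writing the kept words to another field; which fields to use is up to your schema.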


On Mon, May 6, 2013 at 6:22 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Sounds like a very ambitious project. I'm sure you COULD do it in Solr,
> but not in very short order.
>
> Check out some discussion of simply searching within sentences:
> http://markmail.org/message/aoiq62a4mlo25zzk?q=apache#query:apache+page:1+mid:aoiq62a4mlo25zzk+state:results
>
> First, how do you expect to use/query the corpus?  In other words, what
> are your user requirements? They will determine what structure the Solr
> index, analysis chains, and custom search components will need.
>
> Also, check out the Solr OpenNLP wiki:
> http://wiki.apache.org/solr/OpenNLP
>
> And see "LUCENE-2899: Add OpenNLP Analysis capabilities as a module":
> https://issues.apache.org/jira/browse/LUCENE-2899
>
> -- Jack Krupansky
>
> -----Original Message----- From: Rendy Bambang Junior
> Sent: Monday, May 06, 2013 11:41 AM
> To: solr-user@lucene.apache.org
> Subject: Tokenize Sentence and Set Attribute
>
>
> Hello,
>
> I am trying to use a part-of-speech tagger for Bahasa Indonesia to filter
> tokens in Solr.
> The tagger receives the word list of a sentence as input and returns an
> array of tags.
>
> I think the process should be like this:
> - tokenize sentence
> - tokenize word
> - pass it into the tagger
> - set attribute using tagger output
> - pass it into a FilteringTokenFilter implementation
>
> Is it possible to do this in Solr/Lucene? If it is, how?
>
> I've read about a similar solution for Japanese, but since I don't
> understand Japanese, it didn't help much.
>
> --
> Regards,
> Rendy Bambang Junior
> Informatics Engineering '09
> Bandung Institute of Technology
>



-- 
edge
