Actually, we dropped the idea of integrating NLP directly with Solr. Instead we took two different approaches:

* we're using NLP separately, not with Solr
* we're using UIMA for Solr. It's more advanced.

If you have a specific question, you can ask me. I'll tell you if I know.

-Vivek

On Wed, Sep 10, 2014 at 3:46 PM, Aman Tandon <amantandon...@gmail.com> wrote:

> Hi,
>
> What is the progress of the integration of NLP with Solr? If you have
> achieved this integration successfully, then please share it with us.
>
> With Regards
> Aman Tandon
>
> On Tue, Jun 10, 2014 at 11:04 AM, Vivekanand Ittigi <vi...@biginfolabs.com>
> wrote:
>
> > Hi Aman,
> >
> > Yeah, we are also thinking the same. Using UIMA is better. And thanks to
> > everyone. You guys really showed us the way (UIMA).
> >
> > We'll work on it.
> >
> > Thanks,
> > Vivek
> >
> > On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon <amantandon...@gmail.com>
> > wrote:
> >
> > > Hi Vivek,
> > >
> > > As everybody on the mailing list suggested using UIMA, you should go for
> > > it. OpenNLP issues are not being tracked properly, which could leave your
> > > development stuck if any issue comes up in the near future, so it's
> > > better to start investigating UIMA.
> > >
> > > With Regards
> > > Aman Tandon
> > >
> > > On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi <vi...@biginfolabs.com>
> > > wrote:
> > >
> > > > Can anyone please reply?
> > > >
> > > > Thanks,
> > > > Vivek
> > > >
> > > > ---------- Forwarded message ----------
> > > > From: Vivekanand Ittigi <vi...@biginfolabs.com>
> > > > Date: Wed, Jun 4, 2014 at 4:38 PM
> > > > Subject: Re: Integrate solr with openNLP
> > > > To: Tommaso Teofili <tommaso.teof...@gmail.com>
> > > > Cc: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>, Ahmet
> > > > Arslan <iori...@yahoo.com>
> > > >
> > > > Hi Tommaso,
> > > >
> > > > Yes, you are right. Version 4.4 will work. I'm able to compile now. I'm
> > > > trying to apply named entity recognition (person name) to the tokens,
> > > > but I'm not seeing any change.
> > > > My schema.xml looks like this:
> > > >
> > > > <field name="text" type="text_opennlp_pos_ner" indexed="true"
> > > >        stored="true" multiValued="true"/>
> > > >
> > > > <fieldType name="text_opennlp_pos_ner" class="solr.TextField"
> > > >            positionIncrementGap="100">
> > > >   <analyzer>
> > > >     <tokenizer class="solr.OpenNLPTokenizerFactory"
> > > >                tokenizerModel="opennlp/en-token.bin"/>
> > > >     <filter class="solr.OpenNLPFilterFactory"
> > > >             nerTaggerModels="opennlp/en-ner-person.bin"/>
> > > >     <filter class="solr.LowerCaseFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > Please guide.
> > > >
> > > > Thanks,
> > > > Vivek
> > > >
> > > > On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili <tommaso.teof...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Ahmet was suggesting to eventually use the UIMA integration because
> > > > > OpenNLP already has an integration with Apache UIMA, so you would
> > > > > just have to use that [1].
> > > > > And that's one of the main reasons the UIMA integration was done: it's
> > > > > a framework that you can easily hook into in order to plug in your NLP
> > > > > algorithm.
> > > > >
> > > > > If you want to just use OpenNLP, then it's up to you whether to write
> > > > > your own UpdateRequestProcessor plugin [2] to add metadata extracted by
> > > > > OpenNLP to your documents, or to write a dedicated analyzer /
> > > > > tokenizer / token filter.
> > > > >
> > > > > For the OpenNLP integration (LUCENE-2899), the patch is not up to date
> > > > > with the latest APIs in trunk. However, you should be able to apply it
> > > > > to (if I recall correctly) version 4.4 or so, and adapting it to the
> > > > > latest API shouldn't be too hard.
> > > > >
> > > > > Regards,
> > > > > Tommaso
> > > > >
> > > > > [1] : http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
> > > > > [2] : http://wiki.apache.org/solr/UpdateRequestProcessor
> > > > >
> > > > > 2014-06-03 15:34 GMT+02:00 Ahmet Arslan <iori...@yahoo.com.invalid>:
> > > > >
> > > > >> Can you extract names, locations etc. using OpenNLP in a
> > > > >> plain/straight Java program?
> > > > >>
> > > > >> If yes, here are two separate options:
> > > > >>
> > > > >> 1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an
> > > > >> example, integrate your NER code into it, and write your own indexing
> > > > >> code. You have the full power here. No Solr plugins are involved.
> > > > >>
> > > > >> 2) Use 'Implementing a conditional copyField' given here:
> > > > >> http://wiki.apache.org/solr/UpdateRequestProcessor
> > > > >> as an example and integrate your NER code into it.
> > > > >>
> > > > >> Please note that these are separate ways to enrich your incoming
> > > > >> documents; choose either (1) or (2).
> > > > >>
> > > > >> On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi <
> > > > >> vi...@biginfolabs.com> wrote:
> > > > >> Okay, but I didn't understand what you said. Can you please elaborate?
> > > > >>
> > > > >> Thanks,
> > > > >> Vivek
> > > > >>
> > > > >> On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> > > > >>
> > > > >> > Hi Vivekanand,
> > > > >> >
> > > > >> > I have never used UIMA+Solr before.
> > > > >> >
> > > > >> > Personally, I think it takes more time to learn how to configure/use
> > > > >> > this UIMA stuff.
> > > > >> >
> > > > >> > If you are familiar with Java, write a class that extends
> > > > >> > UpdateRequestProcessor(Factory).
> > > > >> > Use OpenNLP for NER, and add these new fields
> > > > >> > (organisation, city, person name, etc.) to your document. This phase
> > > > >> > is usually called 'enrichment'.
> > > > >> >
> > > > >> > Does that make sense?
> > > > >> >
> > > > >> > On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <
> > > > >> > vi...@biginfolabs.com> wrote:
> > > > >> > Hi Ahmet,
> > > > >> >
> > > > >> > I followed what you said:
> > > > >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > > >> > But how can I achieve my goal? I mean extracting only the name of the
> > > > >> > organization or person from the content field.
> > > > >> >
> > > > >> > I guess I'm almost there, but something is missing. Please guide me.
> > > > >> >
> > > > >> > Thanks,
> > > > >> > Vivek
> > > > >> >
> > > > >> > On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi <
> > > > >> > vi...@biginfolabs.com> wrote:
> > > > >> >
> > > > >> > > I can't describe the entire goal, but one of the tasks is like this:
> > > > >> > > we have a big document (it can be a website, a PDF, etc.) indexed
> > > > >> > > into Solr.
> > > > >> > > Let's say <field name=content> will store the contents of the
> > > > >> > > document.
> > > > >> > > All I want to do is pick the names of persons and places from it
> > > > >> > > using OpenNLP or some other means.
> > > > >> > >
> > > > >> > > Those names should be reflected in Solr itself.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Vivek
> > > > >> > >
> > > > >> > > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan <iori...@yahoo.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > >> Hi,
> > > > >> > >>
> > > > >> > >> Please tell us what you are trying to do in a new thread: your
> > > > >> > >> high-level goal.
> > > > >> > >> There may be some other ways/tools, such as
> > > > >> > >> ( https://stanbol.apache.org ), other than OpenNLP.
> > > > >> > >>
> > > > >> > >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
> > > > >> > >> vi...@biginfolabs.com> wrote:
> > > > >> > >>
> > > > >> > >> We'll surely look into the UIMA integration.
> > > > >> > >>
> > > > >> > >> But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP )
> > > > >> > >> the only link we've got for the integration? Isn't there any other
> > > > >> > >> article or link which may help us fix this problem?
> > > > >> > >>
> > > > >> > >> Thanks,
> > > > >> > >> Vivek
> > > > >> > >>
> > > > >> > >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan <iori...@yahoo.com>
> > > > >> > >> wrote:
> > > > >> > >>
> > > > >> > >> >Hi,
> > > > >> > >> >
> > > > >> > >> >I believe I answered it. Let me retry.
> > > > >> > >> >
> > > > >> > >> >There is no committed code for OpenNLP. There is an open ticket with
> > > > >> > >> >patches. They may not work with the current trunk.
> > > > >> > >> >
> > > > >> > >> >Confluence is the official documentation. The wiki is maintained by
> > > > >> > >> >the community, meaning the wiki can talk about some uncommitted
> > > > >> > >> >features/stuff, like this one: https://wiki.apache.org/solr/OpenNLP
> > > > >> > >> >
> > > > >> > >> >What I am suggesting is, have a look at
> > > > >> > >> >https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > > >> > >> >
> > > > >> > >> >And search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is
> > > > >> > >> >already doable with solr-uima. I am adding Tommaso (sorry for this,
> > > > >> > >> >but we need an authoritative answer here) to clarify this.
> > > > >> > >> >
> > > > >> > >> >Also consider indexing with SolrJ and doing the OpenNLP enrichment
> > > > >> > >> >outside of Solr. Use OpenNLP with plain Java, enrich your documents,
> > > > >> > >> >and index them with SolrJ. You don't have to do everything inside
> > > > >> > >> >Solr as Solr plugins.
> > > > >> > >> >
> > > > >> > >> >Hope this helps,
> > > > >> > >> >
> > > > >> > >> >Ahmet
> > > > >> > >> >
> > > > >> > >> >On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi <
> > > > >> > >> >vi...@biginfolabs.com> wrote:
> > > > >> > >> >Thanks, I will check the JIRA, but you didn't answer my first
> > > > >> > >> >question. Is there no way to integrate Solr with OpenNLP? Or is
> > > > >> > >> >there any committed code that I can go ahead with?
> > > > >> > >> >
> > > > >> > >> >Thanks,
> > > > >> > >> >Vivek
> > > > >> > >> >
> > > > >> > >> >On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan <iori...@yahoo.com>
> > > > >> > >> >wrote:
> > > > >> > >> >
> > > > >> > >> >> Hi,
> > > > >> > >> >>
> > > > >> > >> >> Here is the JIRA issue:
> > > > >> > >> >> https://issues.apache.org/jira/browse/LUCENE-2899
> > > > >> > >> >>
> > > > >> > >> >> Anyone can create an account.
> > > > >> > >> >>
> > > > >> > >> >> I haven't used UIMA myself and I have little knowledge about it.
> > > > >> > >> >> But I believe it is possible to use OpenNLP inside UIMA.
> > > > >> > >> >> You need to dig into the UIMA documentation.
> > > > >> > >> >>
> > > > >> > >> >> A Solr UIMA integration already exists; that's why I asked whether
> > > > >> > >> >> your requirement is possible with UIMA or not. I don't know the
> > > > >> > >> >> answer myself.
> > > > >> > >> >>
> > > > >> > >> >> Ahmet
> > > > >> > >> >>
> > > > >> > >> >> On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi <
> > > > >> > >> >> vi...@biginfolabs.com> wrote:
> > > > >> > >> >> Hi Arslan,
> > > > >> > >> >>
> > > > >> > >> >> If not the uncommitted code, then which code should be used to
> > > > >> > >> >> integrate?
> > > > >> > >> >>
> > > > >> > >> >> If I have to comment on my problems, on which JIRA issue, and how
> > > > >> > >> >> do I post it?
> > > > >> > >> >>
> > > > >> > >> >> And why are you suggesting the UIMA integration? My requirement is
> > > > >> > >> >> integrating with OpenNLP. Do you mean we can do all the activities
> > > > >> > >> >> through UIMA that we do using OpenNLP, like name finder, location
> > > > >> > >> >> finder, etc.?
> > > > >> > >> >>
> > > > >> > >> >> Thanks,
> > > > >> > >> >> Vivek
> > > > >> > >> >>
> > > > >> > >> >> On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan
> > > > >> > >> >> <iori...@yahoo.com.invalid> wrote:
> > > > >> > >> >>
> > > > >> > >> >> > Hi,
> > > > >> > >> >> >
> > > > >> > >> >> > Uncommitted code could have these kinds of problems. It is not
> > > > >> > >> >> > guaranteed to work with the latest trunk.
> > > > >> > >> >> >
> > > > >> > >> >> > You could comment about the problem you face on the JIRA ticket.
> > > > >> > >> >> >
> > > > >> > >> >> > By the way, maybe you are after something doable with the already
> > > > >> > >> >> > committed UIMA stuff?
> > > > >> > >> >> >
> > > > >> > >> >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> > > > >> > >> >> >
> > > > >> > >> >> > Ahmet
> > > > >> > >> >> >
> > > > >> > >> >> > On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi <
> > > > >> > >> >> > vi...@biginfolabs.com> wrote:
> > > > >> > >> >> > I followed this link to integrate:
> > > > >> > >> >> > https://wiki.apache.org/solr/OpenNLP
> > > > >> > >> >> >
> > > > >> > >> >> > Installation
> > > > >> > >> >> >
> > > > >> > >> >> > For English language testing: Until LUCENE-2899 is committed:
> > > > >> > >> >> >
> > > > >> > >> >> > 1. pull the latest trunk or 4.0 branch
> > > > >> > >> >> > 2. apply the latest LUCENE-2899 patch
> > > > >> > >> >> > 3. do 'ant compile'
> > > > >> > >> >> > cd solr/contrib/opennlp/src/test-files/training
> > > > >> > >> >> > .
> > > > >> > >> >> > .
> > > > >> > >> >> > .
> > > > >> > >> >> > I followed the first two steps but got the following error while
> > > > >> > >> >> > executing the third step:
> > > > >> > >> >> >
> > > > >> > >> >> > common.compile-core:
> > > > >> > >> >> >     [javac] Compiling 10 source files to
> > > > >> > >> >> > /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
> > > > >> > >> >> >     [javac] warning: [path] bad path element
> > > > >> > >> >> > "/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar":
> > > > >> > >> >> > no such file or directory
> > > > >> > >> >> >     [javac]
> > > > >> > >> >> > /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43:
> > > > >> > >> >> > error: cannot find symbol
> > > > >> > >> >> >     [javac]     super(Version.LUCENE_44, input);
> > > > >> > >> >> >     [javac]                  ^
> > > > >> > >> >> >     [javac]   symbol:   variable LUCENE_44
> > > > >> > >> >> >     [javac]   location: class Version
> > > > >> > >> >> >     [javac]
> > > > >> > >> >> > /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56:
> > > > >> > >> >> > error: no suitable constructor found for Tokenizer(Reader)
> > > > >> > >> >> >     [javac]     super(input);
> > > > >> > >> >> >     [javac]     ^
> > > > >> > >> >> >     [javac]     constructor Tokenizer.Tokenizer(AttributeFactory) is
> > > > >> > >> >> > not applicable
> > > > >> > >> >> >     [javac]       (actual argument Reader cannot be converted to
> > > > >> > >> >> > AttributeFactory by method invocation conversion)
> > > > >> > >> >> >     [javac]     constructor Tokenizer.Tokenizer() is not applicable
> > > > >> > >> >> >     [javac]       (actual and formal argument lists differ in length)
> > > > >> > >> >> >     [javac] 2 errors
> > > > >> > >> >> >     [javac] 1 warning
> > > > >> > >> >> >
> > > > >> > >> >> > I'm really stuck on how to get past this step. I wasted my entire
> > > > >> > >> >> > day trying to fix this but couldn't make any progress. Could
> > > > >> > >> >> > someone please help me?
> > > > >> > >> >> >
> > > > >> > >> >> > Thanks,
> > > > >> > >> >> > Vivek