Hi All,

I went through the possible solutions for my requirement of triggering a Stanbol enhancement during Solr indexing, and I have simplified the requirement.

I only need to process the field named "content" with the Stanbol enhancer to extract Person and Organization entities. So I think it will be easier to send the Stanbol request while indexing the "content" field, after the data has been imported by DIH.

I think the best solution would be to write a custom Analyzer that processes the content and posts it to Stanbol. The analyzer also needs to process the Stanbol enhancement response. The response should be processed as a new document, so that the identified Person and Organization entities are indexed and stored in a field called "extractedEntities".

My current idea is as follows. In schema.xml:

    <copyField source="content" dest="stanbolRequest" />

    <field name="stanbolRequest" type="stanbolRequestType" indexed="true" stored="true" docValues="true" required="false"/>

    <fieldType name="stanbolRequestType" class="solr.TextField">
      <analyzer class="MyCustomAnalyzer"/>
    </fieldType>

In the MyCustomAnalyzer class, the content will be posted to Stanbol and enhanced. The Person and Organization entities in the response should then be indexed into the Solr field "extractedEntities".

Am I on the right track for my requirement? Please share your ideas. I would appreciate any relevant pointers to samples or documentation.

Thanks,
Dileepa

On Wed, Oct 30, 2013 at 11:26 AM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:

> Thanks guys for your ideas.
>
> I will go through them and come back with questions.
>
> Regards,
> Dileepa
>
> On Wed, Oct 30, 2013 at 7:00 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> Third time tonight I've been able to paste this link....
>>
>> Also, you can consider just moving to SolrJ and taking DIH out of the process, see:
>> http://searchhub.org/2012/02/14/indexing-with-solrj/
>>
>> Whichever approach fits your needs of course.
>>
>> Best,
>> Erick
>>
>> On Tue, Oct 29, 2013 at 7:15 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>
>> > It's also possible to combine an Update Request Processor with DIH. That way,
>> > if a debug entry needs to be inserted, it can go through the same Stanbol
>> > process.
>> >
>> > Just define a processing chain for the DIH handler and write a custom URP to
>> > call out to the Stanbol web service. You have access to the full record in the
>> > URP, so you can add/delete/change the fields at will.
>> >
>> > Regards,
>> > Alex.
>> >
>> > On Wed, Oct 30, 2013 at 4:09 AM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
>> >
>> > > Hi Dileepa,
>> > >
>> > > You can write your own Transformers in Java. If it doesn't make sense to
>> > > run Stanbol calls in a Transformer, maybe setting up a web service that
>> > > grabs a record out of MySQL, sends the data to Stanbol, and displays the
>> > > results could be used in conjunction with HttpDataSource rather than
>> > > JdbcDataSource.
>> > >
>> > > http://wiki.apache.org/solr/DIHCustomTransformer
>> > >
>> > > http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2FHTTP_Datasource
>> > >
>> > > Michael Della Bitta
>> > >
>> > > On Tue, Oct 29, 2013 at 4:47 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > I'm a newbie to Solr, and I have a requirement to import data from a
>> > > > MySQL database, enhance the imported content to identify the Persons
>> > > > mentioned, and index them as a separate field in Solr along with the
>> > > > other fields defined for the original db query.
>> > > >
>> > > > I'm using Apache Stanbol [1] for the content enhancement requirement.
>> > > > I can get enhancement results for 'Person' type data in the content as
>> > > > the enhancement result.
>> > > >
>> > > > The data flow will be:
>> > > > mysql-db > Solr data-import handler > Stanbol enhancer > Solr index
>> > > >
>> > > > For the above requirement I need to perform additional processing at
>> > > > the data-import handler prior to indexing, to send a request to Stanbol
>> > > > and process the enhancement response. I found some related examples on
>> > > > modifying the MySQL data import handler to customize the query results
>> > > > in db-data-config.xml by using a transformer script.
>> > > > As per my requirement, in the data-import handler I need to send a
>> > > > request to Stanbol and process the response prior to indexing. But I'm
>> > > > not sure if this can be achieved using a simple JavaScript.
>> > > >
>> > > > Is there any other better way of achieving my requirement? Maybe
>> > > > writing a custom filter in Solr?
>> > > > Please share your thoughts. Appreciate any pointers as I'm a beginner
>> > > > for Solr.
>> > > >
>> > > > Thanks,
>> > > > Dileepa
>> > > >
>> > > > [1] https://stanbol.apache.org
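
For reference, here is a rough sketch of how the Update Request Processor route that Alex suggested in the thread above might be wired up in solrconfig.xml. It is only an illustration under assumptions: the chain name "stanbol" and the class com.example.StanbolEnhancerProcessorFactory are hypothetical placeholders (the custom URP itself would still have to be written), the handler name /dataimport is assumed, and db-data-config.xml is the DIH config file mentioned in the original message.

```xml
<!-- Hypothetical chain: the custom processor runs first, then the standard ones. -->
<updateRequestProcessorChain name="stanbol">
  <!-- Placeholder class name; this factory is NOT part of Solr and must be implemented. -->
  <processor class="com.example.StanbolEnhancerProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last; it performs the actual index update. -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Assumed DIH handler definition; update.chain routes imported documents through the chain. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
    <str name="update.chain">stanbol</str>
  </lst>
</requestHandler>
```

With a setup along these lines, each document imported by DIH would pass through the custom processor before reaching the index, so the processor could read "content", call Stanbol, and add the "extractedEntities" field to the same document.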