https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
Admin UI has the interface, so you can play there once you define it. You do have to use curl; there is no built-in scheduler.

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 January 2015 at 13:29, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
> Hi Alex,
>
> If I am understanding this correctly, I can define multiple entities like
> this?
>
> <document>
>   <entity/>
>   <entity/>
>   <entity/>
>   ...
> </document>
>
> How would I trigger loading certain entities during start?
>
> How would I trigger loading other entities during update?
>
> Is there a way to set an auto-update for certain entities so that I don't
> have to invoke an update via curl?
>
> Where / how do I specify the preImportDeleteQuery to avoid deleting
> everything upon each update?
>
> Is there an example or doc that shows how to do all this?
>
> Regards,
>
> Joe
>
> On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
>> You can define multiple entities in the same file, and also nested
>> entities if your list comes from an external source (e.g. a text file
>> of URLs).
>> You can also trigger DIH with the name of a specific entity to load just
>> that one.
>> You can even pass a DIH configuration file when you trigger the
>> import, so you can have completely different files for the initial
>> load and for updates, though you can achieve the same with entities.
>>
>> The only thing to be aware of is that before an entity definition is
>> processed, a delete command is run. By default it is "delete all", so
>> executing one entity will delete everything and then populate only
>> that one entity's results. You can avoid that by defining
>> preImportDeleteQuery and having a clear identifier on the content
>> generated by each entity (e.g. a source field, either extracted or
>> added with TemplateTransformer).
>>
>> Regards,
>>    Alex.
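[Editor's note: a minimal sketch of the per-entity delete guard Alex describes, extending the config quoted below. The "source" field, the TemplateTransformer constant, and the second-entity comment are illustrative assumptions, not from the thread; the schema would need a "source" field for this to work.]

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- The entity tags every document it emits with a constant
         "source" value via TemplateTransformer, and its
         preImportDeleteQuery deletes only its own documents, so
         re-importing this entity leaves the other entities' data
         intact. -->
    <entity name="nvd-rss"
            pk="link"
            url="https://nvd.nist.gov/download/nvd-rss.xml"
            processor="XPathEntityProcessor"
            forEach="/RDF/item"
            transformer="DateFormatTransformer,TemplateTransformer"
            preImportDeleteQuery="source:nvd-rss">
      <field column="source" template="nvd-rss" />
      <field column="link" xpath="/RDF/item/link" />
      <!-- remaining field mappings as in the working config -->
    </entity>
    <!-- Each additional entity follows the same pattern, with its own
         name, url, template constant, and matching
         preImportDeleteQuery. -->
  </document>
</dataConfig>
```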
>>
>> ----
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>> On 23 January 2015 at 11:15, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
>>> Hi,
>>>
>>> I have the RSS DIH example working with my own RSS feed - here is the
>>> configuration for it:
>>>
>>> <dataConfig>
>>>   <dataSource type="URLDataSource" />
>>>   <document>
>>>     <entity name="nvd-rss"
>>>             pk="link"
>>>             url="https://nvd.nist.gov/download/nvd-rss.xml"
>>>             processor="XPathEntityProcessor"
>>>             forEach="/RDF/item"
>>>             transformer="DateFormatTransformer">
>>>
>>>       <field column="id" xpath="/RDF/item/title" commonField="true" />
>>>       <field column="link" xpath="/RDF/item/link" commonField="true" />
>>>       <field column="summary" xpath="/RDF/item/description" commonField="true" />
>>>       <field column="date" xpath="/RDF/item/date" commonField="true" />
>>>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> However, my problem is that I also have to load multiple XML feeds into
>>> the same core. Here is one example (there are about 10 of them):
>>>
>>> http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip
>>>
>>> Is there any built-in functionality that would allow me to do this?
>>> Basically, the use case is to load and index all the XML ZIP files first,
>>> and then check the RSS feed every two hours and update the indexes with
>>> any new ones.
>>>
>>> Regards,
>>>
>>> Joe
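[Editor's note: since Alex's answer is that there is no built-in scheduler, the two-hour refresh asked about above has to come from outside Solr, e.g. cron driving curl. A sketch follows; the core name "nvd" and the localhost URL are assumptions, while `command=full-import`, `entity`, and `clean` are standard DIH request parameters. `clean=false` prevents the import from wiping documents loaded by the other entities.]

```shell
# External cron job to refresh the RSS entity every two hours.
# Core name "nvd" and host are assumptions; adjust to your setup.
CMD='curl -s "http://localhost:8983/solr/nvd/dataimport?command=full-import&entity=nvd-rss&clean=false"'

# Crontab line: minute 0 of every second hour.
echo "0 */2 * * * $CMD"
```

Install the printed line with `crontab -e`; the one-time initial load of the ZIP-feed entities can be triggered the same way with their entity names.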