If you add clean=false as a parameter to the full-import command, then deletion is disabled. Since you are ingesting RSS, I'd guess there is no need for deletion at all.
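For example, assuming your core is named "nvd" and Solr is running on
localhost:8983 (both are guesses - adjust to your setup), a full-import
that keeps the existing documents would look like this:

    # full-import without the "delete all" that normally precedes it
    curl "http://localhost:8983/solr/nvd/dataimport?command=full-import&clean=false"

You can combine it with the entity parameter to run only one of the
entities defined in your config:

    # import only the nvd-rss entity; documents from other entities are untouched
    curl "http://localhost:8983/solr/nvd/dataimport?command=full-import&entity=nvd-rss&clean=false"

As for where preImportDeleteQuery goes (your question below): it is an
attribute on the root <entity> element. A sketch only - the "source"
field and its value are illustrative, added here with TemplateTransformer
so that each entity's documents can be identified:

    <entity name="nvd-rss"
            pk="link"
            url="https://nvd.nist.gov/download/nvd-rss.xml"
            processor="XPathEntityProcessor"
            forEach="/RDF/item"
            transformer="DateFormatTransformer,TemplateTransformer"
            preImportDeleteQuery="source:nvd-rss">
        <!-- constant marker field; scopes the pre-import delete to this entity -->
        <field column="source" template="nvd-rss"/>
        <!-- plus your existing field definitions -->
    </entity>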
On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:

> OK - Thanks for the doc.
>
> Is it possible to just provide an empty value to preImportDeleteQuery to
> disable the delete prior to import?
>
> Will the data still be deleted for each entity during a delta-import
> instead of a full-import?
>
> Is there any capability in the handler to unzip an XML file from a URL
> prior to reading it, or can I perhaps hook in a custom pre-processing
> handler?
>
> Regards,
>
> Joe
>
>
> On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:
>
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler
>>
>> The Admin UI has the interface, so you can play with it there once you
>> define it.
>>
>> You do have to use curl; there is no built-in scheduler.
>>
>> Regards,
>>    Alex.
>> ----
>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>
>>
>> On 23 January 2015 at 13:29, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
>>
>>> Hi Alex,
>>>
>>> If I am understanding this correctly, I can define multiple entities
>>> like this?
>>>
>>> <document>
>>>     <entity/>
>>>     <entity/>
>>>     <entity/>
>>>     ...
>>> </document>
>>>
>>> How would I trigger loading certain entities during the initial load?
>>>
>>> How would I trigger loading other entities during an update?
>>>
>>> Is there a way to set up an auto-update for certain entities so that I
>>> don't have to invoke an update via curl?
>>>
>>> Where / how do I specify the preImportDeleteQuery to avoid deleting
>>> everything upon each update?
>>>
>>> Is there an example or doc that shows how to do all this?
>>>
>>> Regards,
>>>
>>> Joe
>>>
>>>
>>> On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
>>>
>>>> You can define both multiple entities in the same file and nested
>>>> entities if your list comes from an external source (e.g. a text file
>>>> of URLs).
>>>> You can also trigger DIH with the name of a specific entity to load
>>>> just that one.
>>>> You can even pass a DIH configuration file when you trigger the
>>>> processing, so you can have completely different files for the initial
>>>> load and for updates. Though you can also just do the same with
>>>> entities.
>>>>
>>>> The only thing to be aware of is that before an entity definition is
>>>> processed, a delete command is run. By default, it's "delete all", so
>>>> executing one entity will delete everything and then populate only
>>>> that one entity's results. You can avoid that by defining
>>>> preImportDeleteQuery and having a clear identifier on the content
>>>> generated by each entity (e.g. a source field, either extracted or
>>>> added manually with TemplateTransformer).
>>>>
>>>> Regards,
>>>>    Alex.
>>>> ----
>>>> Sign up for my Solr resources newsletter at http://www.solr-start.com/
>>>>
>>>>
>>>> On 23 January 2015 at 11:15, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have the RSS DIH example working with my own RSS feed - here is the
>>>>> configuration for it.
>>>>>
>>>>> <dataConfig>
>>>>>     <dataSource type="URLDataSource" />
>>>>>     <document>
>>>>>         <entity name="nvd-rss"
>>>>>                 pk="link"
>>>>>                 url="https://nvd.nist.gov/download/nvd-rss.xml"
>>>>>                 processor="XPathEntityProcessor"
>>>>>                 forEach="/RDF/item"
>>>>>                 transformer="DateFormatTransformer">
>>>>>
>>>>>             <field column="id" xpath="/RDF/item/title" commonField="true" />
>>>>>             <field column="link" xpath="/RDF/item/link" commonField="true" />
>>>>>             <field column="summary" xpath="/RDF/item/description" commonField="true" />
>>>>>             <field column="date" xpath="/RDF/item/date" commonField="true" />
>>>>>
>>>>>         </entity>
>>>>>     </document>
>>>>> </dataConfig>
>>>>>
>>>>> However, my problem is that I also have to load multiple XML feeds
>>>>> into the same core. Here is one example (there are about 10 of them):
>>>>>
>>>>> http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip
>>>>>
>>>>> Is there any built-in functionality that would allow me to do this?
>>>>> Basically, the use case is to load and index all the XML ZIP files
>>>>> first, and then check the RSS feed every two hours and update the
>>>>> index with any new items.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Joe

--
Regards,
Shalin Shekhar Mangar.