Right. You define three update handlers, something like /update-animal, /update-mineral, and /update-vegetable. Each one has a separate DIH config. Each config deletes documents of that type and loads documents of that type.
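A minimal sketch of that per-type setup, in Python. The handler paths are the ones suggested above. The host, the port (8983 is Solr's default) and the core name `my_index` (borrowed from Billy's message further down) are assumptions. Inside each DIH config, the top-level entity's `preImportDeleteQuery` attribute is one way to restrict the pre-import cleanup to `type:animal` and so on. The code does not talk to Solr. It only builds, for each type, the equivalent delete-by-query update message and the full-import URL for that type's handler.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical base URL: default Solr port plus the core name "my_index".
SOLR_CORE_URL = "http://localhost:8983/solr/my_index"

# One DIH handler per document type (paths as suggested in the thread).
HANDLERS = {
    "animal": "/update-animal",
    "mineral": "/update-mineral",
    "vegetable": "/update-vegetable",
}


def delete_query_for(doc_type: str) -> str:
    """Query that matches only documents of one type.

    Assumes a simple 'type' field holding plain-word values; values with
    spaces or query-syntax characters would need escaping.
    """
    return f"type:{doc_type}"


def delete_by_query_xml(doc_type: str) -> str:
    """Solr XML update message that deletes every document of one type."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = delete_query_for(doc_type)
    return ET.tostring(delete, encoding="unicode")


def full_import_url(doc_type: str) -> str:
    """URL that starts a DIH full-import on the handler for one type.

    clean=true is only safe because that handler's config is assumed to
    limit its pre-import delete to delete_query_for(doc_type).
    """
    params = urlencode({"command": "full-import", "clean": "true", "commit": "true"})
    return f"{SOLR_CORE_URL}{HANDLERS[doc_type]}?{params}"


if __name__ == "__main__":
    for t in HANDLERS:
        print(delete_by_query_xml(t))
        print(full_import_url(t))
```

Printing both for each type is a quick way to confirm that the three configs target disjoint sets of documents before pointing them at a live index.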

You will not want to run them at the same time, because a commit in one will commit all the pending changes from any other one. It would be much less confusing to run them separately.

wunder

On Oct 6, 2012, at 2:30 PM, Erick Erickson wrote:

> Sure, you need to define the appropriate delete query for each DIH entry.
>
> Best
> Erick
>
> On Fri, Oct 5, 2012 at 5:40 PM, Billy Newman <newman...@gmail.com> wrote:
>> Does DIH support only deleting/re-indexing docs of a certain type?
>>
>> I.e., can I have a DIH for type:vegetable and another for type:mineral
>> and each only deletes/recreates the right types?
>>
>> Thanks.
>>
>> On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>> Using the same unique key doesn't handle documents which disappear from one
>>> indexing to the next.
>>>
>>> Instead, add a field for the type of item, like type:animal,
>>> type:vegetable, or type:mineral. Then the query used to clean up before
>>> indexing can delete all items of that type.
>>>
>>> wunder
>>>
>>> On Oct 5, 2012, at 12:00 PM, Erick Erickson wrote:
>>>
>>>> DIH always gives me indigestion.....
>>>>
>>>> Couple of things:
>>>> See the 'clean' parameter here for full import:
>>>> http://wiki.apache.org/solr/DataImportHandler
>>>> It defaults to true. I think if you set it to "false"
>>>> _and_ assuming that your <uniqueKey> is
>>>> defined, it should work OK.
>>>>
>>>> The other approach would be to control the
>>>> indexing of your XML from, say, a SolrJ program
>>>> combined with a cron job....
>>>>
>>>> Does that work?
>>>> Erick
>>>>
>>>> On Fri, Oct 5, 2012 at 2:39 PM, Billy Newman <newman...@gmail.com> wrote:
>>>>> Erick,
>>>>>
>>>>> I did mention using the DIH to index the first two datasets; that is
>>>>> where the root of my problem lies.
>>>>>
>>>>> I do see the benefit of one index. However, the question still
>>>>> remains: can I use the DIH to index XML from data sets 1 and 2 every
>>>>> 15 minutes or so (full index) without wiping out all the indexed data
>>>>> from data set 3?
>>>>>
>>>>> I.e., from a couple of quick tests, the DIH full import destroys all
>>>>> data in the index before it repopulates it. Not sure I can just have
>>>>> it destroy/re-index data of a certain type. Basically, a DIH full-import
>>>>> on my_index for type 'dataset1' and a DIH full-import on my_index for
>>>>> type 'dataset2', both full-imports leaving the type 'dataset3'
>>>>> data in the index alone.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thanks,
>>>>> Billy
>>>>>
>>>>> On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com>
>>>>> wrote:
>>>>>> The very first question is "what form are your XML docs in?"
>>>>>> Solr does NOT index arbitrary XML, so I'm guessing
>>>>>> you're using DIH and some of the XML stuff there. Do note
>>>>>> that the XSLT is a subset of the full capabilities....
>>>>>>
>>>>>> Second, I'd recommend you just put it all in a single index; it'll be
>>>>>> simpler. Index a field indicating which of your three sources
>>>>>> the doc belongs to. Then you can group (aka Field Collapse) by
>>>>>> source and your result sets will contain the top N docs from each
>>>>>> type and you can do whatever you want with them at the app
>>>>>> level. See: http://wiki.apache.org/solr/FieldCollapsing
>>>>>>
>>>>>> By including a type, you can also do nifty things like delete all the
>>>>>> records for a particular type by query.
>>>>>>
>>>>>> Best
>>>>>> Erick
>>>>>>
>>>>>>
>>>>>> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com>
>>>>>> wrote:
>>>>>>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>>>>>>
>>>>>>> The first 2 are really small XML docs retrieved via URL, ~300 records
>>>>>>> each. The data behind both of these changes very frequently (every ~5
>>>>>>> minutes). The data itself does not have timestamps, so delta-import
>>>>>>> using DIH would not work (at least I don't think it would work). I am
>>>>>>> thinking about just re-indexing these 2 data sources every 15 minutes
>>>>>>> or so to keep the indexes up to date.
>>>>>>>
>>>>>>> The 3rd data set is a lot more complicated; I will probably
>>>>>>> have to use SolrJ and write some custom code to handle
>>>>>>> inserts/updates/deletes.
>>>>>>>
>>>>>>> I need to be able to search all the data sets in one search once
>>>>>>> they are indexed.
>>>>>>>
>>>>>>> A couple options:
>>>>>>>
>>>>>>> 1. Store the data from all 3 datasets in different indexes, allowing
>>>>>>> the DIH import handler to re-index datasets 1 and 2 without affecting
>>>>>>> indexed data from data set 3. I'm not sure this is advisable, or
>>>>>>> whether it is even possible to search multiple cores.
>>>>>>>
>>>>>>> 2. Store all the data from all 3 datasets in the same index. But this
>>>>>>> raises the question of how to re-index datasets 1 and 2 using a DIH
>>>>>>> full-import without losing indexed data from data set 3.
>>>>>>>
>>>>>>> Just starting with Solr, so please go easy ;). Thanks in advance.
>>>>>>>
>>>>>>> Billy
>>>
>>> --
>>> Walter Underwood
>>> wun...@wunderwood.org
>>>

--
Walter Underwood
wun...@wunderwood.org
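Walter's advice to run the per-type imports separately, combined with Billy's plan to re-index every 15 minutes, suggests a small driver that starts each full-import only after the previous one has finished. A minimal sketch in Python, assuming a core at `http://localhost:8983/solr/my_index` and handler paths like `/update-animal`, `/update-mineral` and `/update-vegetable` (all hypothetical). It relies on DIH's `command=status` response reporting `"status": "busy"` while an import runs. The HTTP call is passed in as `fetch` so the loop can be exercised without a live Solr.

```python
import json
import time
from typing import Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URL and handler paths; adjust for your installation.
SOLR_CORE_URL = "http://localhost:8983/solr/my_index"
HANDLERS = ["/update-animal", "/update-mineral", "/update-vegetable"]


def http_get_json(url: str) -> dict:
    """GET a Solr URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def run_imports_sequentially(
    fetch: Callable[[str], dict] = http_get_json,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start each DIH full-import only after the previous one has finished.

    Running them one at a time means a commit from one handler never
    picks up half-finished changes from another.
    """
    finished = []
    for handler in HANDLERS:
        base = SOLR_CORE_URL + handler
        fetch(base + "?" + urlencode({"command": "full-import", "wt": "json"}))
        status_url = base + "?" + urlencode({"command": "status", "wt": "json"})
        while True:
            # Wait before each check: DIH runs full-import in the background,
            # so a status call made immediately could still report "idle".
            sleep(poll_seconds)
            if fetch(status_url).get("status") != "busy":
                break
        finished.append(handler)
    return finished


if __name__ == "__main__":
    # For example, invoked from cron every 15 minutes.
    print(run_imports_sequentially())
```

Leaving the 15-minute schedule to cron, as Erick suggests for the SolrJ route, keeps the script itself free of any long-running loop.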