Sure, you need to define the appropriate delete query for each DIH entry.

Best
Erick

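A minimal sketch of the per-type cleanup discussed in this thread, assuming every document carries a `type` field as suggested in the quoted messages. It only builds the two requests (a delete-by-query update message and a DIH `full-import` URL with `clean=false`) and does not send them. The base URL, the `/dataimport` handler path, and the entity name `vegetables` are assumptions for illustration, not values from the thread.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumptions: base URL and DIH handler path depend on your solrconfig.xml.
SOLR_BASE = "http://localhost:8983/solr"
DIH_PATH = "/dataimport"


def delete_by_type_body(doc_type: str) -> str:
    """Build the XML update message that deletes every document of one type.

    Only XML escaping is applied; characters that are special in the Solr
    query syntax are not escaped here.
    """
    return "<delete><query>type:%s</query></delete>" % escape(doc_type)


def full_import_url(entity: str) -> str:
    """Build a DIH full-import URL that does not wipe the whole index."""
    params = [
        ("command", "full-import"),
        ("entity", entity),   # hypothetical entity name from data-config.xml
        ("clean", "false"),   # skip the default delete of all documents
        ("commit", "true"),
    ]
    return SOLR_BASE + DIH_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    print(delete_by_type_body("vegetable"))
    print(full_import_url("vegetables"))
```

The delete message would be posted to the update handler before the import runs, so documents that vanished from the source are removed while other types stay untouched. The DIH documentation also describes a `preImportDeleteQuery` entity attribute that replaces the default `*:*` cleanup when `clean` is left at true, which may be a simpler way to get the same effect.
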
On Fri, Oct 5, 2012 at 5:40 PM, Billy Newman <newman...@gmail.com> wrote:
> Does DIH support only deleting/re-indexing docs of a certain type?
>
> I.E. can I have a DIH for type:vegetable and another for type:mineral
> and each only deletes/recreates the right types?
>
> Thanks.
>
> On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>> Using the same unique key doesn't handle documents which disappear from one
>> indexing to the next.
>>
>> Instead, add a field for the type of item, like type:animal, type:vegetable,
>> or type:mineral. Then the query used to clean up before indexing can delete
>> all items of that type.
>>
>> wunder
>>
>> On Oct 5, 2012, at 12:00 PM, Erick Erickson wrote:
>>
>>> DIH always gives me indigestion.....
>>>
>>> Couple of things:
>>> See the 'clean' parameter here for full import:
>>> http://wiki.apache.org/solr/DataImportHandler
>>> it defaults to true. I think if you set it to "false"
>>> _and_ assuming that your <uniqueKey> is
>>> defined, it should work OK.
>>>
>>> The other approach would be to control the
>>> indexing of your XML from, say, a SolrJ program
>>> combined with a cron job....
>>>
>>> Does that work?
>>> Erick
>>>
>>> On Fri, Oct 5, 2012 at 2:39 PM, Billy Newman <newman...@gmail.com> wrote:
>>>> Erick,
>>>>
>>>> I did mention using the DIH to index the first two datasets; that is
>>>> where the root of my problem lies.
>>>>
>>>> I do see the benefit of one index. However the question still
>>>> remains, can I use the DIH to index xml from data set 1 and 2, every
>>>> 15 minutes or so (full index) without wiping out all the indexed data
>>>> in the index from data set 3.
>>>>
>>>> I.E. From a couple of quick tests the DIH full import destroys all
>>>> data in the index before it repopulates it. Not sure I can just have
>>>> it destroy/re-index data of a certain type. Basically DIH full-import
>>>> on my_index for type 'dataset1', and DIH full-import on my-index for
>>>> type 'dataset2'. Both full-imports leaving alone the type 'dataset3'
>>>> data in the index.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Billy
>>>>
>>>> On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com>
>>>> wrote:
>>>>> The very first question is "what form are your XML docs in?"
>>>>> Solr does NOT index arbitrary XML, so I'm guessing
>>>>> you're using DIH and some of the xml stuff there. Do note
>>>>> that the XSLT is a subset of the full capabilities....
>>>>>
>>>>> Second, I'd recommend you just put it all in a single index, it'll be
>>>>> simpler. Index a field indicating which of your three sources
>>>>> the doc belongs to. Then you can group (aka Field Collapse) by
>>>>> source and your result sets will contain the top N docs from each
>>>>> type and you can do whatever you want with them at the app
>>>>> level. See: http://wiki.apache.org/solr/FieldCollapsing
>>>>>
>>>>> By including a type, you can also do nifty things like delete all the
>>>>> records for a particular type by query.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>>
>>>>> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com> wrote:
>>>>>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>>>>>
>>>>>> The first 2 are really small xml docs retrieved via url, ~300 records
>>>>>> each. The data behind both of these changes very frequently ~5
>>>>>> minutes. The data itself does not have timestamps so delta-import
>>>>>> using DIH would not work (at least I don't think it would work). I am
>>>>>> thinking about just re-indexing these 2 data sources every 15 minutes
>>>>>> or so to keep the indexes up to date.
>>>>>>
>>>>>> The 3rd data set is a lot more complicated in which I will probably
>>>>>> have to use SolrJ and write some custom code to handle
>>>>>> inserts/updates/deletes.
>>>>>>
>>>>>> I need to be able to search all the data sets once they are indexed in
>>>>>> one search.
>>>>>>
>>>>>> A couple options:
>>>>>>
>>>>>> 1. Store the data from all 3 datasets in different indexes, allowing
>>>>>> the DIH import handler to re-index datasets 1 and 2 without affecting
>>>>>> indexed data from data set 3. Not sure this is advised as I am not
>>>>>> sure it is a good idea, or even possible to search multiple cores.
>>>>>>
>>>>>> 2. Store all the data from all 3 datasets in the same index. Yet this
>>>>>> brings the question of how to re-index datasets 1 and 2 using a DIH
>>>>>> full-import and not lose indexed data from data set 3.
>>>>>>
>>>>>> Just starting with Solr so please go easy ;). Thanks in advance.
>>>>>>
>>>>>> Billy
>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
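
The single-index approach suggested earlier in the thread (one field naming the source, results grouped by that field) could be queried roughly as follows. This is only a sketch that builds the query string using Solr's result grouping parameters. The field name `type` follows the thread's examples, and the default of 5 documents per group is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode


def grouped_search_params(user_query: str, field: str = "type", per_group: int = 5) -> str:
    """Build a query string that returns the top documents from each group."""
    params = [
        ("q", user_query),
        ("group", "true"),               # turn on result grouping (field collapsing)
        ("group.field", field),          # one group per distinct value of this field
        ("group.limit", str(per_group)), # documents returned inside each group
        ("wt", "json"),
    ]
    return urlencode(params)


if __name__ == "__main__":
    print(grouped_search_params("carrot"))
```

The resulting string would be appended to the select handler URL of the single shared index, and the application can then present each group however it likes.
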