Does DIH support deleting and re-indexing only docs of a certain type? I.e., can I have one DIH for type:vegetable and another for type:mineral, with each deleting and recreating only the docs of its own type?
Thanks.

On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> Using the same unique key doesn't handle documents which disappear from one
> indexing to the next.
>
> Instead, add a field for the type of item, like type:animal, type:vegetable,
> or type:mineral. Then the query used to clean up before indexing can delete
> all items of that type.
>
> wunder
>
> On Oct 5, 2012, at 12:00 PM, Erick Erickson wrote:
>
>> DIH always gives me indigestion.....
>>
>> Couple of things:
>> See the 'clean' parameter here for full import:
>> http://wiki.apache.org/solr/DataImportHandler
>> it defaults to true. I think if you set it to "false"
>> _and_ assuming that your <uniqueKey> is
>> defined, it should work OK.
>>
>> The other approach would be to control the
>> indexing of your XML from, say, a SolrJ program
>> combined with a cron job....
>>
>> Does that work?
>> Erick
>>
>> On Fri, Oct 5, 2012 at 2:39 PM, Billy Newman <newman...@gmail.com> wrote:
>>> Erick,
>>>
>>> I did mention using the DIH to index the first two datasets; that is
>>> where the root of my problem lies.
>>>
>>> I do see the benefit of one index. However, the question still
>>> remains: can I use the DIH to index xml from data sets 1 and 2 every
>>> 15 minutes or so (full index) without wiping out all the indexed data
>>> in the index from data set 3?
>>>
>>> I.e., from a couple of quick tests, the DIH full import destroys all
>>> data in the index before it repopulates it. Not sure I can just have
>>> it destroy/re-index data of a certain type. Basically DIH full-import
>>> on my_index for type 'dataset1', and DIH full-import on my_index for
>>> type 'dataset2'. Both full-imports leaving alone the type 'dataset3'
>>> data in the index.
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>> Billy
>>>
>>> On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>> The very first question is "what form are your XML docs in?"
>>>> Solr does NOT index arbitrary XML, so I'm guessing
>>>> you're using DIH and some of the xml stuff there. Do note
>>>> that the XSLT is a subset of the full capabilities....
>>>>
>>>> Second, I'd recommend you just put it all in a single index, it'll be
>>>> simpler. Index a field indicating which of your three sources
>>>> the doc belongs to. Then you can group (aka Field Collapse) by
>>>> source and your result sets will contain the top N docs from each
>>>> type and you can do whatever you want with them at the app
>>>> level. See: http://wiki.apache.org/solr/FieldCollapsing
>>>>
>>>> By including a type, you can also do nifty things like delete all the
>>>> records for a particular type by query.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>>
>>>> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com> wrote:
>>>>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>>>>
>>>>> The first 2 are really small xml docs retrieved via url, ~300 records
>>>>> each. The data behind both of these changes very frequently, every ~5
>>>>> minutes. The data itself does not have timestamps so delta-import
>>>>> using DIH would not work (at least I don't think it would work). I am
>>>>> thinking about just re-indexing these 2 data sources every 15 minutes
>>>>> or so to keep the indexes up to date.
>>>>>
>>>>> The 3rd data set is a lot more complicated, so I will probably
>>>>> have to use SolrJ and write some custom code to handle
>>>>> inserts/updates/deletes.
>>>>>
>>>>> I need to be able to search all the data sets once they are indexed in
>>>>> one search.
>>>>>
>>>>> A couple options:
>>>>>
>>>>> 1. Store the data from all 3 datasets in different indexes, allowing
>>>>> the DIH import handler to re-index datasets 1 and 2 without affecting
>>>>> indexed data from data set 3. Not sure this is advised, as I am not
>>>>> sure it is a good idea, or even possible, to search multiple cores.
>>>>>
>>>>> 2. Store all the data from all 3 datasets in the same index. Yet this
>>>>> brings the question of how to re-index datasets 1 and 2 using a DIH
>>>>> full-import and not lose indexed data from data set 3.
>>>>>
>>>>> Just starting with Solr so please go easy ;). Thanks in advance.
>>>>>
>>>>> Billy
>
> --
> Walter Underwood
> wun...@wunderwood.org
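
For anyone finding this thread later, the two suggestions above (delete everything of one type by query, then run the full import with clean=false) could be combined roughly as in the sketch below. It is only an illustration under stated assumptions: the base URL, the /dataimport handler path, the field name "type", and the helper names are all hypothetical and not taken from anyone's actual setup. Nothing here has been run against a real Solr instance.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

# Assumption: base URL of the Solr core; adjust to your installation.
SOLR_URL = "http://localhost:8983/solr"


def build_delete_by_type(type_value):
    """Return an XML delete-by-query body matching one value of the 'type' field."""
    # Quote the value and escape backslashes/quotes for the Solr query parser.
    quoted = type_value.replace("\\", "\\\\").replace('"', '\\"')
    query = 'type:"%s"' % quoted
    # escape() protects &, < and > so the body stays well-formed XML.
    return "<delete><query>%s</query></delete>" % escape(query)


def build_full_import_url(base_url, handler="/dataimport"):
    """Return a DIH full-import URL with clean=false so other types survive."""
    params = urllib.parse.urlencode(
        {"command": "full-import", "clean": "false", "commit": "true"}
    )
    return "%s%s?%s" % (base_url, handler, params)


def reindex_type(type_value, base_url=SOLR_URL, handler="/dataimport"):
    """Delete all docs of one type, then trigger a non-cleaning full import."""
    delete_request = urllib.request.Request(
        base_url + "/update?commit=true",
        data=build_delete_by_type(type_value).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    # Step 1: remove only the documents of this type.
    urllib.request.urlopen(delete_request).close()
    # Step 2: ask DIH to re-import without wiping the rest of the index.
    urllib.request.urlopen(build_full_import_url(base_url, handler)).close()
```

With one DIH handler registered per source (hypothetical paths such as /dataimport-vegetable), a cron job could call reindex_type("vegetable", handler="/dataimport-vegetable") every 15 minutes and leave the type:mineral docs untouched.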