DIH always gives me indigestion... Couple of things:

See the 'clean' parameter for full-import, which defaults to true: http://wiki.apache.org/solr/DataImportHandler

I think if you set it to "false" _and_ your <uniqueKey> is defined, it should work OK.
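A minimal sketch of the request this suggests. It assumes the DIH is registered at the common `/dataimport` path (check your solrconfig.xml), a host of `localhost:8983`, and a hypothetical entity named `dataset1`; `command`, `clean`, `commit` and `entity` are the DIH request parameters described on the wiki page above.

```python
from urllib.parse import urlencode


def full_import_url(base_url, core, entity=None, clean=False, commit=True):
    """Build a DIH full-import URL.

    Assumes the handler is mounted at /dataimport (a common solrconfig.xml
    choice, not a given). clean=False keeps existing documents instead of
    deleting the whole index before the import starts.
    """
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
        "commit": "true" if commit else "false",
    }
    if entity is not None:
        # Restrict the import to one <entity> from data-config.xml.
        params["entity"] = entity
    return f"{base_url.rstrip('/')}/{core}/dataimport?{urlencode(params)}"


print(full_import_url("http://localhost:8983/solr", "my_index", entity="dataset1"))
# prints http://localhost:8983/solr/my_index/dataimport?command=full-import&clean=false&commit=true&entity=dataset1
```

With clean=false, re-imported documents overwrite their old versions by <uniqueKey>, but records that have vanished from the source feed stay in the index until something else deletes them.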
The other approach would be to control the indexing of your XML from, say, a SolrJ program combined with a cron job.

Does that work?

Erick

On Fri, Oct 5, 2012 at 2:39 PM, Billy Newman <newman...@gmail.com> wrote:
> Erick,
>
> I did mention using the DIH to index the first two datasets; that is
> where the root of my problem lies.
>
> I do see the benefit of one index. However, the question still
> remains: can I use the DIH to index XML from data sets 1 and 2 every
> 15 minutes or so (full index) without wiping out the indexed data
> from data set 3?
>
> I.e., from a couple of quick tests, the DIH full import destroys all
> data in the index before it repopulates it. Not sure I can just have
> it destroy/re-index data of a certain type. Basically, I want a DIH
> full-import on my_index for type 'dataset1' and another on my_index for
> type 'dataset2', with both full-imports leaving the type 'dataset3'
> data in the index alone.
>
> Any ideas?
>
> Thanks,
> Billy
>
> On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> The very first question is "what form are your XML docs in?"
>> Solr does NOT index arbitrary XML, so I'm guessing
>> you're using DIH and some of the XML stuff there. Do note
>> that the XPath support there is only a subset of the full capabilities.
>>
>> Second, I'd recommend you just put it all in a single index; it'll be
>> simpler. Index a field indicating which of your three sources
>> the doc belongs to. Then you can group (aka Field Collapse) by
>> source, and your result sets will contain the top N docs from each
>> type, and you can do whatever you want with them at the app
>> level. See: http://wiki.apache.org/solr/FieldCollapsing
>>
>> By including a type, you can also do nifty things like delete all the
>> records for a particular type by query.
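A sketch of the two type-field tricks described above, assuming every document carries a string field named `type` holding `dataset1`, `dataset2` or `dataset3` (the field name and values are hypothetical). `group`, `group.field` and `group.limit` are the result-grouping parameters from the FieldCollapsing page; the `<delete><query>...</query></delete>` body is Solr's XML delete-by-query message.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def grouped_search_params(user_query, field="type", per_group=5):
    """Query string returning the top `per_group` docs for each value of `field`."""
    return urlencode({
        "q": user_query,
        "group": "true",
        "group.field": field,
        "group.limit": str(per_group),
    })


def delete_by_type_body(type_value, field="type"):
    """XML delete-by-query message that removes every doc of one type.

    escape() keeps characters such as & or < from breaking the XML; Lucene
    query syntax is NOT escaped, so this assumes simple values like dataset1.
    """
    return f"<delete><query>{field}:{escape(type_value)}</query></delete>"


print(grouped_search_params("solr", per_group=3))
# prints q=solr&group=true&group.field=type&group.limit=3
print(delete_by_type_body("dataset1"))
# prints <delete><query>type:dataset1</query></delete>
```

The grouped parameters go on a normal `/select` request; the delete body would be POSTed with a `text/xml` content type to the core's update handler and only becomes visible to searchers after a commit.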
>>
>> Best
>> Erick
>>
>> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com> wrote:
>>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>>
>>> The first 2 are really small XML docs retrieved via URL, ~300 records
>>> each. The data behind both of these changes very frequently (every ~5
>>> minutes). The data itself does not have timestamps, so delta-import
>>> using DIH would not work (at least I don't think it would). I am
>>> thinking about just re-indexing these 2 data sources every 15 minutes
>>> or so to keep the indexes up to date.
>>>
>>> The 3rd data set is a lot more complicated; I will probably
>>> have to use SolrJ and write some custom code to handle
>>> inserts/updates/deletes.
>>>
>>> I need to be able to search all the data sets in one search once
>>> they are indexed.
>>>
>>> A couple of options:
>>>
>>> 1. Store the data from the 3 datasets in different indexes, allowing
>>> the DIH to re-index datasets 1 and 2 without affecting the
>>> indexed data from data set 3. I'm not sure this is advisable, or
>>> even whether it is possible to search multiple cores at once.
>>>
>>> 2. Store all the data from all 3 datasets in the same index. But this
>>> raises the question of how to re-index datasets 1 and 2 using a DIH
>>> full-import without losing the indexed data from data set 3.
>>>
>>> Just starting with Solr, so please go easy ;). Thanks in advance.
>>>
>>> Billy
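Combining the suggestions in this thread for option 2, one refresh of data set 1 or 2 could be: delete that type by query, then run a DIH full-import with `clean=false` so data set 3 is never touched. This is a sketch under several assumptions: the `/update` and `/dataimport` paths, a `type` field, and a DIH entity named after each data set are all hypothetical, and it uses plain `urllib.request` rather than SolrJ. The HTTP call is injectable so the request sequence can be checked without a running Solr.

```python
import urllib.request
from urllib.parse import urlencode


def http_send(method, url, body=None):
    """Send one request to Solr and return the response body as text."""
    data = body.encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "text/xml; charset=utf-8")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def refresh_dataset(core_url, dataset, send=http_send):
    """Re-index one data set without touching documents of other types.

    1. Delete this type's docs by query, without committing yet.
    2. Trigger a DIH full-import of the matching (hypothetical) entity with
       clean=false, so DIH does not wipe the whole index first; commit=true
       makes DIH commit at the end, which also commits the pending delete.
    """
    base = core_url.rstrip("/")
    send("POST", f"{base}/update",
         f"<delete><query>type:{dataset}</query></delete>")
    params = urlencode({"command": "full-import", "entity": dataset,
                        "clean": "false", "commit": "true"})
    send("GET", f"{base}/dataimport?{params}")
```

Assuming no autoCommit fires in between, searchers keep seeing the old dataset1/dataset2 docs until the import commits. DIH runs the import in the background, so poll `command=status` if you need to know when it has finished. Calling `refresh_dataset` for `dataset1` and `dataset2` from a cron job every 15 minutes would match the schedule described above.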