Erick, I did mention using the DIH to index the first two datasets; that is where the root of my problem lies.
I do see the benefit of one index. However, the question still remains: can I use the DIH to index the XML from data sets 1 and 2 every 15 minutes or so (full import) without wiping out the data already indexed from data set 3?

From a couple of quick tests, a DIH full-import destroys all data in the index before it repopulates it. I am not sure whether I can have it destroy and re-index only data of a certain type. Basically, I want a DIH full-import on my_index for type 'dataset1' and a DIH full-import on my_index for type 'dataset2', with both full-imports leaving the type 'dataset3' data in the index alone.

Any ideas?

Thanks,
Billy

On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> The very first question is "what form are your XML docs in?"
> Solr does NOT index arbitrary XML, so I'm guessing
> you're using DIH and some of the xml stuff there. Do note
> that the XSLT is a subset of the full capabilities....
>
> Second, I'd recommend you just put it all in a single index, it'll be
> simpler. Index a field indicating which of your three sources
> the doc belongs to. Then you can group (aka Field Collapse) by
> source and your result sets will contain the top N docs from each
> type and you can do whatever you want with them at the app
> level. See: http://wiki.apache.org/solr/FieldCollapsing
>
> By including a type, you can also do nifty things like delete all the
> records for a particular type by query.
>
> Best
> Erick
>
>
> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com> wrote:
>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>
>> The first 2 are really small xml docs retrieved via url, ~300 records
>> each. The data behind both of these changes very frequently ~5
>> minutes. The data itself does not have timestamps so delta-import
>> using DIH would not work (at least I don't think it would work). I am
>> thinking about just re-indexing these 2 data sources every 15 minutes
>> or so to keep the indexes up to date.
>>
>> The 3rd data set is a lot more complicated in which I will probably
>> have to use SolrJ and write some custom code to handle
>> inserts/updates/deletes.
>>
>> I need to be able to search all the data sets once they are indexed in
>> one search.
>>
>> A couple options:
>>
>> 1. Store the data from all 3 datasets in different indexes, allowing
>> the DIH import handler to re-index datasets 1 and 2 without affecting
>> indexed data from data set 3. Not sure this is advised as I am not
>> sure it is a good idea, or even possible to search multiple cores.
>>
>> 2. Store all the data from all 3 datasets in the same index. Yet this
>> brings the question of how to re-index datasets 1 and 2 using a DIH
>> full-import and not lose indexed data from data set 3.
>>
>> Just starting with Solr so please go easy ;). Thanks in advance.
>>
>> Billy
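
A minimal sketch of one way the per-type re-index discussed in this thread might be approached, not a confirmed answer from the thread. It combines the delete-by-query idea Erick mentions with the DIH `clean=false` request parameter, which asks a full-import not to wipe the index first. The host and port, the `dataimport` handler path, and the field name `type` are assumptions; `my_index` and the `dataset1` value are taken from the message above. The code only builds the two requests and does not send them.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

SOLR_BASE = "http://localhost:8983/solr"  # assumed host and port
CORE = "my_index"                         # core name used in the message
TYPE_FIELD = "type"                       # assumed name of the source field


def delete_by_type_body(type_value):
    """Build the XML update message that deletes only docs of one type."""
    query = '{}:"{}"'.format(TYPE_FIELD, type_value)
    return "<delete><query>{}</query></delete>".format(escape(query))


def full_import_url(handler="dataimport"):
    """Build a DIH full-import URL with clean=false (index is not wiped)."""
    params = urlencode(
        {"command": "full-import", "clean": "false", "commit": "true"}
    )
    return "{}/{}/{}?{}".format(SOLR_BASE, CORE, handler, params)


if __name__ == "__main__":
    # Step 1: POST this body to <SOLR_BASE>/<CORE>/update (no commit yet).
    print(delete_by_type_body("dataset1"))
    # Step 2: request this URL; commit=true makes both steps visible together.
    print(full_import_url())
```

Sending the delete without a commit and letting the import commit should keep searches from seeing an empty 'dataset1' between the two steps, though that is worth verifying on a test core. DIH also documents a `preImportDeleteQuery` attribute on the root entity, which may achieve the same result from within data-config.xml.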