Sure, you need to define the appropriate delete query for each DIH entity.
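
A rough sketch of that idea follows. It is not from the thread, so treat it as an illustration under stated assumptions. On the DIH side, the root entity's preImportDeleteQuery attribute replaces the default *:* cleanup query with something like type:vegetable. The Python below only builds the two requests you could send instead: a delete-by-query for one type, then a full-import of a single entity with clean=false. The base URL, the core name my_index, the /dataimport handler path, and entity names that match the type values are all assumptions; adjust them to your own solrconfig.xml and data-config.xml.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Assumed location of the core; hypothetical, change to match your install.
SOLR_BASE = "http://localhost:8983/solr/my_index"


def delete_by_type_payload(doc_type: str) -> str:
    """Build the XML update message that deletes every doc of one type.

    Assumes doc_type is a simple token (no spaces or query syntax).
    """
    return "<delete><query>type:{}</query></delete>".format(escape(doc_type))


def full_import_url(entity: str, clean: bool = False) -> str:
    """Build a DIH full-import URL restricted to one root entity.

    clean=false keeps DIH from wiping the rest of the index first.
    """
    params = [
        ("command", "full-import"),
        ("entity", entity),
        ("clean", "true" if clean else "false"),
        ("commit", "true"),
    ]
    return "{}/dataimport?{}".format(SOLR_BASE, urlencode(params))


if __name__ == "__main__":
    # Print the two requests for the 'vegetable' type; nothing is sent.
    print(delete_by_type_payload("vegetable"))
    print(full_import_url("vegetable"))
```

Posting the delete payload to the core's /update handler and then requesting the import URL should refresh only that type and leave the other types alone.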

Best
Erick

On Fri, Oct 5, 2012 at 5:40 PM, Billy Newman <newman...@gmail.com> wrote:
> Does DIH support only deleting/re-indexing docs of a certain type?
>
> I.E. can I have a DIH for type:vegetable and another for type:mineral
> and each only deletes/recreates the right types?
>
> Thanks.
>
> On Fri, Oct 5, 2012 at 1:04 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>> Using the same unique key doesn't handle documents that disappear from one
>> indexing run to the next.
>>
>> Instead, add a field for the type of item, like type:animal, type:vegetable, 
>> or type:mineral. Then the query used to clean up before indexing can delete 
>> all items of that type.
>>
>> wunder
>>
>> On Oct 5, 2012, at 12:00 PM, Erick Erickson wrote:
>>
>>> DIH always gives me indigestion.....
>>>
>>> Couple of things:
>>> See the 'clean' parameter here for full import:
>>> http://wiki.apache.org/solr/DataImportHandler
>>> it defaults to true. I think if you set it to "false"
>>> _and_ assuming that your <uniqueKey> is
>>> defined, it should work OK.
>>>
>>> The other approach would be to control the
>>> indexing of your XML from, say, a SolrJ program
>>> combined with a cron job....
>>>
>>> Does that work?
>>> Erick
>>>
>>> On Fri, Oct 5, 2012 at 2:39 PM, Billy Newman <newman...@gmail.com> wrote:
>>>> Erick,
>>>>
>>>> I did mention using the DIH to index the first two datasets, that is
>>>> where the root of my problem lies.
>>>>
>>>> I do see the benefit of one index.  However the question still
>>>> remains, can I use the DIH to index xml from data set 1 and 2, every
>>>> 15 minutes or so (full index) without wiping out all the indexed data
>>>> in the index from data set 3?
>>>>
>>>> I.E. From a couple of quick tests the DIH full import destroys all
>>>> data in the index before it repopulates it.  Not sure I can just have
>>>> it destroy/re-index data of a certain type.  Basically DIH full-import
>>>> on my_index for type 'dataset1', and DIH full-import on my_index for
>>>> type 'dataset2'.  Both full-imports leaving alone the type 'dataset3'
>>>> data in the index.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks,
>>>> Billy
>>>>
>>>> On Fri, Oct 5, 2012 at 10:42 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>> The very first question is "what form are your XML docs in?"
>>>>> Solr does NOT index arbitrary XML, so I'm guessing
>>>>> you're using DIH and some of the xml stuff there. Do note
>>>>> that the XSLT is a subset of the full capabilities....
>>>>>
>>>>> Second, I'd recommend you just put it all in a single index, it'll be
>>>>> simpler. Index a field indicating which of your three sources
>>>>> the doc belongs to. Then you can group (aka Field Collapse) by
>>>>> source and your result sets will contain the top N docs from each
>>>>> type and you can do whatever you want with them at the app
>>>>> level. See: http://wiki.apache.org/solr/FieldCollapsing
>>>>>
>>>>> By including a type, you can also do nifty things like delete all the
>>>>> records for a particular type by query.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>>
>>>>> On Fri, Oct 5, 2012 at 11:22 AM, Billy Newman <newman...@gmail.com> wrote:
>>>>>> I am looking into Solr to index a few of my data sets, 3 to be exact.
>>>>>>
>>>>>> The first 2 are really small xml docs retrieved via url, ~300 records
>>>>>> each.  The data behind both of these changes very frequently, about every 5
>>>>>> minutes.  The data itself does not have timestamps so delta-import
>>>>>> using DIH would not work (at least I don't think it would work).  I am
>>>>>> thinking about just re-indexing these 2 data sources every 15 minutes
>>>>>> or so to keep the indexes up to date.
>>>>>>
>>>>>> The 3rd data set is a lot more complicated, so I will probably
>>>>>> have to use SolrJ and write some custom code to handle
>>>>>> inserts/updates/deletes.
>>>>>>
>>>>>> I need to be able to search all the data sets once they are indexed in
>>>>>> one search.
>>>>>>
>>>>>> A couple options:
>>>>>>
>>>>>> 1.  Store the data from all 3 datasets in different indexes, allowing
>>>>>> the DIH import handler to re-index datasets 1 and 2 without affecting
>>>>>> indexed data from data set 3.   Not sure this is advised, as I don't
>>>>>> know whether it is a good idea, or even possible, to search multiple cores.
>>>>>>
>>>>>> 2. Store all the data from all 3 datasets in the same index.  Yet this
>>>>>> raises the question of how to re-index datasets 1 and 2 using a DIH
>>>>>> full-import and not lose indexed data from data set 3.
>>>>>>
>>>>>> Just starting with Solr so please go easy ;).  Thanks in advance.
>>>>>>
>>>>>> Billy
>>
>> --
>> Walter Underwood
>> wun...@wunderwood.org
>>
>>
>>
