> no, I only thought you use one day :-)
> so you don't or do you have 31 shards?

No, we use 1 shard per month - e.g. 7 shards will hold 7 months of data. It can be set to 1 day, but you would need to have a huge amount of data in a single day to warrant doing that.
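For illustration, a distributed query across such monthly shards looks roughly like the sketch below. The 'shards' parameter is standard Solr distributed search, but the host and core names are made up, not our actual layout:

import json
import urllib.parse
import urllib.request

# Hypothetical layout: one core per month, e.g. shard-2010-05 .. shard-2010-11.
# Entries in the 'shards' parameter are host:port/path, without the http:// scheme.
months = ["2010-%02d" % m for m in range(5, 12)]
shards = ",".join("search.example.com:8983/solr/shard-%s" % m for m in months)

params = urllib.parse.urlencode({
    "q": "text:lucene",
    "shards": shards,  # fan the query out over all 7 monthly shards
    "wt": "json",
})
url = "http://search.example.com:8983/solr/shard-2010-11/select?" + params
with urllib.request.urlopen(url) as resp:
    print(json.load(resp)["response"]["numFound"])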
On Thu, Nov 18, 2010 at 8:20 PM, Peter Karich <peat...@yahoo.de> wrote:
>
>> Does yours need to be once a day?
>
> no, I only thought you use one day :-)
> so you don't or do you have 31 shards?
>
>> having a look at Solr Cloud or Katta - could be useful
>> here in dynamically allocating shards.
>
> ah, thx! I will take a look at it (after trying solr4)!
>
> Regards,
> Peter.
>
>>> Maybe I didn't fully understand what you explained: but doesn't this
>>> mean that you'll have one index per day?
>>> Or are you overwriting, via replicating, every shard and the number of
>>> shards is fixed?
>>> And why are you replicating from the local replica to the next shard?
>>> (why not directly from active to next shard?)
>>
>> Yes, you can have one index per day (for us, our boundary is typically
>> 1 month, so it is less of an issue).
>> The 'oldest' replica in the round robin is overwritten, yes. We use
>> fixed shard numbers, but you don't have to.
>> Does yours need to be once a day?
>> We used our own round robin code because it was pre-Solr Cloud...
>> I'm not too familiar with them, but I believe it's certainly worth
>> having a look at Solr Cloud or Katta - could be useful here in
>> dynamically allocating shards.
>>
>> Peter
>>
>> On Thu, Nov 18, 2010 at 5:41 PM, Peter Karich <peat...@yahoo.de> wrote:
>>>
>>> Hi Peter!
>>>
>>>> * I believe the NRT patches are included in the 4.x trunk. I don't
>>>> think there's any support as yet in 3x (uses features in Lucene 3.0).
>>>
>>> I'll investigate how much effort it is to update to solr4
>>>
>>>> * For merging, I'm talking about commits/writes. If you merge while
>>>> commits are going on, things can get a bit messy (maybe on source
>>>> cores this is ok, but I have a feeling it's not).
>>>
>>> ok
>>>
>>>> * For moving data to an 'offline' read-only core, this is the
>>>> trickiest bit.
>>>> We do this today by using a round-robin chain of remote shards and 2
>>>> local cores. At the boundary time (e.g. 1 day), the 'active' core is
>>>> replicated locally, then this local replica is replicated to the next
>>>> shard in the chain. Once everything is complete, the local replica is
>>>> discarded, and the 'active' core is cleaned, being careful not to
>>>> delete any new data since the replicated commit point.
>>>
>>> Maybe I didn't fully understand what you explained: but doesn't this
>>> mean that you'll have one index per day?
>>> Or are you overwriting, via replicating, every shard and the number of
>>> shards is fixed?
>>> And why are you replicating from the local replica to the next shard?
>>> (why not directly from active to next shard?)
>>>
>>> Regards,
>>> Peter.
>
> --
> http://jetwick.com twitter search prototype
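For anyone wanting to experiment with the rotation described above: here is a rough sketch using Solr's stock replication handler (command=fetchindex with an explicit masterUrl) plus a bounded delete-by-query. All host, core, and field names are invented, and our real code is custom round-robin logic, so treat this as an approximation, not what we actually run:

import urllib.parse
import urllib.request

def fetch_index(dest, src):
    # Pull-replicate: the destination core fetches the source core's index
    # via Solr's stock replication handler (command=fetchindex).
    master_url = urllib.parse.quote(src + "/replication", safe="")
    urllib.request.urlopen(
        dest + "/replication?command=fetchindex&masterUrl=" + master_url).read()

def delete_up_to(core, boundary):
    # Clean the core with a bounded delete-by-query, so documents indexed
    # after the replicated commit point survive. Assumes every document
    # carries an indexed 'timestamp' field (an invented name).
    body = "<delete><query>timestamp:[* TO %s]</query></delete>" % boundary
    req = urllib.request.Request(core + "/update?commit=true",
                                 data=body.encode("utf-8"),
                                 headers={"Content-Type": "text/xml"})
    urllib.request.urlopen(req).read()

ACTIVE  = "http://localhost:8983/solr/active"
REPLICA = "http://localhost:8983/solr/replica"
OLDEST  = "http://shard3.example.com:8983/solr/shard"  # next shard in the chain

# 1. Snapshot the 'active' core into the local replica core.
fetch_index(REPLICA, ACTIVE)

# 2. Overwrite the oldest shard in the round-robin chain with the snapshot.
fetch_index(OLDEST, REPLICA)

# 3. Clean 'active' only up to the snapshot boundary, so nothing indexed
#    since the replication is lost.
delete_up_to(ACTIVE, "2010-11-01T00:00:00Z")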