Re: Custom close to index metadata / pass commit data to writer.commit

Erick Erickson Sun, 24 Jun 2012 18:02:55 -0700

Yeah, it's a bit kludgy I admit. But it's usable right now, pragmatism
rules sometimes...


But never returning this doc is actually relatively easy: just put your
data in a field that no other document has. There's no requirement that
any document in Solr have any field in common with any other document,
except for fields declared required="true".

I guess you might see this doc in the case of *:* queries, but I _believe_
that you'd be pretty safe here because this document will be at the end of
your insertions. There is some possibility of order-shuffling when merging,
but in the worst case this doc would be at the end of its segment. So a
simple test for the "very special <uniqueKey>" at the app level should
ensure that nobody sees it.
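
A rough sketch of that app-level test, in plain Java with no Solr classes.
It assumes each result has already been turned into a Map keyed by field
name. The field name "id" and the key value "__feed_metadata__" are made up
for illustration; substitute whatever your schema actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hides the metadata document from app-level results.
class SentinelFilter {
    // Assumed names, not defined by Solr: the uniqueKey field is taken to be
    // "id" and the special document's key to be "__feed_metadata__".
    static final String UNIQUE_KEY_FIELD = "id";
    static final String SENTINEL_ID = "__feed_metadata__";

    // Returns a new list holding every document except the sentinel.
    // The input list is left untouched.
    static List<Map<String, Object>> withoutSentinel(List<Map<String, Object>> docs) {
        List<Map<String, Object>> visible = new ArrayList<Map<String, Object>>();
        for (Map<String, Object> doc : docs) {
            if (!SENTINEL_ID.equals(doc.get(UNIQUE_KEY_FIELD))) {
                visible.add(doc);
            }
        }
        return visible;
    }
}
```

Running every result list through withoutSentinel() before display keeps the
check in one place. Excluding the key on the Solr side with a negative filter
query on the uniqueKey field would also work, at the cost of maintaining that
default on every handler.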

You could also do something like encrypt it just in case, and live with the
resulting garbage display in the unlikely case someone actually saw it.

But personally I think that's all overkill, and if it's some simple
statistical data that the end user might be puzzled at but that wouldn't
compromise your app, it might be worth it.

FWIW
Erick

On Sun, Jun 24, 2012 at 3:44 PM, Jozef Vilcek <[email protected]> wrote:
> On Sun, Jun 24, 2012 at 1:18 AM, Erick Erickson <[email protected]> wrote:
>> see: https://issues.apache.org/jira/browse/SOLR-2701.
>>
>
> Hey, that is what I want :) Thanks for the reference. Unfortunately
> there seems to be no progress on this (as far as I can tell).
> I would be able to use commitData in a rather non-invasive way in the
> 3.6 release, but I worry about future releases ...
>
>
>> But there's an easier alternative. Just have a _very special_ document
>> with a known <uniqueKey> that you index at the end of the run that
>> 1> has no fields in common with any other document (except uniqueKey)
>> 2> contains whatever data you want to carry around in whatever format you want.
>>
>> Now whenever you query for that document by ID, you get your info. And
>> since you can't search the doc until after it's been committed, you know
>> that the preceding documents have all been persisted....
>>
>> Of course whenever you send a version of the doc it will overwrite the
>> one before since it has the same <uniqueKey>
>>
>
> Yes, we thought about having this data stored as a "special"
> document type, but conceptually it just does not feel right. Also, I
> am wary of the extra modifications and of maintaining query defaults
> so that this document is never returned for any kind of search query ...
>
>
>> Best
>> Erick
>>
>> On Fri, Jun 22, 2012 at 5:34 AM, Jozef Vilcek <[email protected]> wrote:
>>> Hi everyone,
>>>
>>> I am looking for a way to store some custom data very close to /
>>> within the index. I have found a possibility to pass commit "user"
>>> data to IndexWriter:
>>> http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/index/IndexWriter.html#commit(java.util.Map)
>>> which, from what I understand, is stored somewhere close to the segments
>>> "metadata" like index version, generation, ...
>>>
>>> Now, I see no easy way to accumulate and pass along such data with
>>> Solr 3.6. DirectUpdateHandler2 commits implicitly via close rather
>>> than by invoking the commit API. I can extend DirectUpdateHandler2 and
>>> alter the closeWriter method, but still ... I am not yet clear how to
>>> pass along request-level params, which are not available at the
>>> DirectUpdateHandler2 level. It seems that passing commitData is not
>>> supported (maybe by design) and is not going to be: when I look at
>>> Solr trunk, I see the implicit commit removed and writer.commit used
>>> with commitData, but no easy way to pass custom commit data or to
>>> hook in.
>>>
>>> Any recommendations for how to store some data close to index?
>>>
>>> To shed some light on why I want this ... Basically I want to store
>>> there some kind of time stamp, which defines what is already in the
>>> index with respect to feeding updates from the external world. Now, my
>>> index is replicated to another index instance in a different data
>>> center (serving traffic as well). When the default document feed in DC1
>>> goes south for some reason, the backup in DC2 kicks in to keep updates
>>> alive ... but it has to know where the feed should start from ... that
>>> would be the time stamp stored and replicated with the index.
>>>
>>> Many thanks in advance.
>>>
>>> Best,
>>> Jozef
