Yes, before indexing, we check whether the document is already in the
index, because each document also carries metadata that needs to be
appended to what is already stored.

We have a few multivalued metadata fields, which we update if the same
document is found again.
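
A minimal sketch of that check-then-merge flow, not our actual code. It
assumes the md5-to-serial mapping lives in a small SQLite table (the
lightweight relational table suggested in the quoted message below); the
table names `doc_ids` and `doc_meta` and the helpers `open_store` and
`upsert` are hypothetical, and the Solr add/update call itself is not shown.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the hypothetical side store for ids and metadata."""
    conn = sqlite3.connect(path)
    # One row per document: md5 is the unique key, serial is the extra id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_ids ("
        " serial INTEGER PRIMARY KEY AUTOINCREMENT,"
        " md5 TEXT NOT NULL UNIQUE)"
    )
    # One row per value of a multivalued metadata field; no duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_meta ("
        " md5 TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " UNIQUE (md5, value))"
    )
    return conn


def upsert(conn, md5, meta_values):
    """Return (serial, merged_values) for the document with this md5.

    A known md5 keeps its old serial number; a new md5 gets the next one.
    New metadata values are appended to the ones already stored.
    """
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT serial FROM doc_ids WHERE md5 = ?", (md5,)
        ).fetchone()
        if row is None:
            cur = conn.execute("INSERT INTO doc_ids (md5) VALUES (?)", (md5,))
            serial = cur.lastrowid
        else:
            serial = row[0]
        conn.executemany(
            "INSERT OR IGNORE INTO doc_meta (md5, value) VALUES (?, ?)",
            [(md5, v) for v in meta_values],
        )
    merged = [
        r[0]
        for r in conn.execute(
            "SELECT value FROM doc_meta WHERE md5 = ? ORDER BY rowid", (md5,)
        )
    ]
    return serial, merged
```

The returned serial and merged values would then be sent to the index along
with the rest of the document, so an update carries the old number again.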


On Fri, Apr 6, 2012 at 10:17 AM, Walter Underwood <wun...@wunderwood.org> wrote:

> So you will need to do a search for each document before adding it to the
> index, in case it is already there. That will be slow.
>
> And where do you store the last-assigned number?
>
> And there are plenty of other problems, like reloading after a corrupted
> index (disk failure), or deleted documents which are re-added later, or
> duplicates, splitting content across shards (requires a global lock across
> all shards to index each document), ...
>
> Two recommendations:
>
> 1. Having two different unique IDs is likely to cause problems, so choose
> one.
>
> 2. If you must have two IDs, use one table in a lightweight relational
> database to store the relationships between the md5 value and the serial
> number.
>
> wunder
>
> On Apr 5, 2012, at 9:37 PM, Manish Bafna wrote:
>
> > Actually not.
> > If I am updating an existing document, I need to keep the old number.
> >
> > Maybe we can do it this way:
> > if we pass the number in the field, it will take that value; if we don't
> > pass it, it will auto-increment.
> > Because if we update, I will have the old number and I will pass it as a
> > field again.
> >
> > On Fri, Apr 6, 2012 at 9:59 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> >
> >> Why?
> >>
> >> When you reindex, is it OK if they all change?
> >>
> >> If you reindex one document, is it OK if it gets a new sequential
> >> number?
> >>
> >> wunder
> >>
> >> On Apr 5, 2012, at 9:23 PM, Manish Bafna wrote:
> >>
> >>> We already have a unique key (we use the md5 value).
> >>> We need another id (sequential numbers).
> >>>
> >>> On Fri, Apr 6, 2012 at 9:47 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >>>
> >>>>
> >>>> : We need to have a document id available for every document (per
> >>>> : core).
> >>>>
> >>>> : We can pass docid as one of the parameters for fq, and it will return
> >>>> : the docid in the search result.
> >>>>
> >>>>
> >>>> So it sounds like you need a *unique* id, but nothing you described
> >>>> requires that it be a counter.
> >>>>
> >>>> Take a look at the UUIDField, or consider using the
> >>>> SignatureUpdateProcessor to generate a key based on a hash of all the
> >>>> field values.
> >>>>
> >>>> -Hoss
> >>>>
> >>
> >>
> >>
> >>
> >>
> >>
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>
