Yes, id is the unique key:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

> I bet that if you redefined your updateHandler to give it some name other
> than “/update” in solrconfig.xml two things would happen:

Hmm, nice. I didn't think of that but that would definitely identify the
problem. We do have other scripts writing to the index but they are not of
type "talk"; talk is handled completely by a single script (although my
suspicion has been that we have a rogue script somewhere).
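
One thing I notice in the logs quoted below: the original batch was sent with params json.nl=flat&omitHeader=false&wt=json, while the later adds carry only wt=json, which may point to a different client. A rough sketch (Python, assuming each log entry sits on a single line in the Solr log; the function name is my own and the sample lines are abridged from the logs below) for grouping added ids by the request params:

```python
import re
from collections import defaultdict

# Pull the params={...} and {add=[...]} parts out of a
# LogUpdateProcessorFactory line (format assumed from the logs in this thread).
LINE_RE = re.compile(r"params=\{(?P<params>[^}]*)\}\{add=\[(?P<adds>[^\]]*)\]")
# Each added doc is logged as "<id> (<version>)".
ID_RE = re.compile(r"(\S+)\s+\(\d+\)")


def group_adds_by_params(lines):
    """Map each distinct params string to the list of ids added with it."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an add entry
        groups[match.group("params")].extend(ID_RE.findall(match.group("adds")))
    return dict(groups)


# Abridged sample lines; ids are copied as they appear in the thread.
sample = [
    "2020-09-28 11:01:01.255 INFO  (qtp918312414-21) [   x:vnc] "
    "o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update "
    "params={json.nl=flat&omitHeader=false&wt=json}{add=["
    "talk.tq0rkem4pc.hayden.yo...@dev.vnc.de (1679075166125031424), "
    "talk.tq0rkem4pc.umesh.sarva...@dev.vnc.de (1679075166127128576)],commit=} 0 8",
    "2020-09-28 11:04:33.070 INFO  (qtp918312414-21) [   x:vnc] "
    "o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update "
    "params={wt=json}{add=[talk.tq0rkem4pc.hayden.yo...@dev.vnc.de "
    "(1679075388234399744)]} 0 1",
    "2020-09-28 11:04:33.150 INFO  (qtp918312414-21) [   x:vnc] "
    "o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update "
    "params={wt=json}{add=[talk.tq0rkem4pc (1679075388318285824)]} 0 0",
]

for params, ids in group_adds_by_params(sample).items():
    print(params, "->", ids)
```

Any params string I don't recognise from my own script would be a candidate for the rogue writer.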

Will definitely give a rename a try, at least for my own sanity.

Thanks again.

On Mon, 28 Sep 2020 at 21:26, Erick Erickson <erickerick...@gmail.com>
wrote:

> Is your “id” field your <uniqueKey>, and is it tokenized? It shouldn’t
> be; use something like “string” or keywordTokenizer. Definitely do NOT use,
> say, text_general.
>
> It’s very unlikely that records are not being flushed on commit, I’m
> 99.99% certain that’s a red herring and that this is a problem in your
> environment.
>
> Or that some process you don’t know about is sending documents that don’t
> have the information you expect. The fact that you say you’ve disabled your
> update scripts but see this second record being indexed 3 minutes later is
> strong evidence that _someone_ is updating records, is there a cron job
> somewhere that’s sending docs? Other??
>
> I bet that if you redefined your updateHandler to give it some name other
> than “/update” in solrconfig.xml two things would happen:
> 1> this problem will go away
> 2> you’ll get some error report from somewhere telling you that Solr is
> broken because it isn’t accepting documents for update ;)
>
> > On Sep 28, 2020, at 9:01 AM, Mr Havercamp <mrhaverc...@gmail.com> wrote:
> >
> > Thanks Erick. My knowledge is fairly limited but 1) sounds feasible. Some
> > logs:
> >
> > I write a bunch of records to Solr:
> >
> > 2020-09-28 11:01:01.255 INFO  (qtp918312414-21) [   x:vnc]
> > o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update
> > params={json.nl=flat&omitHeader=false&wt=json}{add=[
> > talk.tq0rkem4pc.jaydeep.pan...@dev.vnc.de (1679075166122934272),
> > talk.tq0rkem4pc.dmitry.zolotni...@dev.vnc.de (1679075166123982848),
> > talk.tq0rkem4pc.hayden.yo...@dev.vnc.de (1679075166125031424),
> > talk.tq0rkem4pc.nishant.j...@dev.vnc.de (1679075166125031425),
> > talk.tq0rkem4pc.macanh....@dev.vnc.de (1679075166126080000),
> > talk.tq0rkem4pc.kapil.nadiyap...@dev.vnc.de (1679075166126080001),
> > talk.tq0rkem4pc.sanjay.domad...@dev.vnc.de (1679075166126080002),
> > talk.tq0rkem4pc.umesh.sarva...@dev.vnc.de (1679075166127128576)],commit=} 0 8
> >
> > Selecting records looks good:
> >
> >      {
> >        "talk_id_s":"tq0rkem4pc",
> >        "talk_internal_id_s":"29896",
> >        "from_s":"from address",
> >        "content_txt":["test_1000016"],
> >        "raw_txt":["<body xmlns=\"http://www.w3.org/1999/xhtml\">test_1000016</body>"],
> >        "created_dt":"2020-09-28T11:00:02Z",
> >        "updated_dt":"2020-09-28T11:00:02Z",
> >        "type_s":"talk",
> >        "talk_type_s":"groupchat",
> >        "title_s":"role__change__1_talk@conference",
> >        "to_ss":["bunch", "of", "names"],
> >        "owner_s":"owner address",
> >        "id":"talk.tq0rkem4pc.email@address",
> >        "_version_":1679075166127128576}
> >
> > Then, a few minutes later:
> >
> > 2020-09-28 11:04:33.070 INFO  (qtp918312414-21) [   x:vnc]
> > o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update
> > params={wt=json}{add=[talk.tq0rkem4pc.hayden.yo...@dev.vnc.de
> > (1679075388234399744)]} 0 1
> > 2020-09-28 11:04:33.150 INFO  (qtp918312414-21) [   x:vnc]
> > o.a.s.u.p.LogUpdateProcessorFactory [vnc]  webapp=/solr path=/update
> > params={wt=json}{add=[talk.tq0rkem4pc (1679075388318285824)]} 0 0
> >
> > Checking the record again:
> >
> > {
> >        "id":"talk.tq0rkem4pc.email@address",
> >        "_version_":1679075388234399744},
> >      {
> >        "id":"talk.tq0rkem4pc",
> >        "_version_":1679075388318285824}
> >
> > A couple of strange things here:
> >
> > 1. my talk.tq0rkem4pc.email@address record no longer has any data in it.
> > Just id and version.
> >
> > 2. The second entry is really strange; this isn't a valid record at all and
> > I don't have any record of creating it.
> >
> > I've ruled out reindexing items both from my indexing script (I just don't
> > run it) and an external code snippet updating the record at a later time.
> >
> > Not sure if I've got the terminology right but would I be correct in
> > assuming that it is possible records are not being flushed from the buffer
> > when added? I'm assuming there is some kind of buffering or caching going
> > on before records are committed? Is it possible they are getting corrupted
> > under higher than usual load?
> >
> >
> > On Mon, 28 Sep 2020 at 20:41, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> There are several possibilities:
> >>
> >> 1> you simply have some process incorrectly updating documents.
> >>
> >> 2> you’ve changed your schema sometime without completely deleting your
> >> old index and re-indexing all documents from scratch. I recommend in fact
> >> indexing into a new collection and using collection aliasing if you can’t
> >> delete and recreate the collection before re-indexing. There’s some support
> >> for this idea because you say that the doc in question not only changes one
> >> way, but then changes back mysteriously. So seg1 (old def) merges with seg2
> >> (new def) into seg10 using the old def because merging saw seg1 first. Then
> >> sometime later seg3 (new def) merges with seg10 and the data mysteriously
> >> comes back because that merge uses seg3 (new def) as a template for how the
> >> index “should” look.
> >>
> >> But I’ve never heard of Solr (well, Lucene actually) doing this by itself,
> >> and I have heard of the merging process doing “interesting” stuff with
> >> segments created with changed schema definitions.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Sep 28, 2020, at 8:26 AM, Mr Havercamp <mrhaverc...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We're seeing strange behaviour when records have been committed. It doesn't
> >>> happen all the time but enough that the index is very inconsistent.
> >>>
> >>> What happens:
> >>>
> >>> 1. We commit a doc to Solr,
> >>> 2. The doc shows in the search results,
> >>> 3. Later (may be immediate, may take minutes, may take hours), the same
> >>> document is emptied of all data except version and id.
> >>>
> >>> We have custom scripts which add to the index but even without them being
> >>> executed we see records being updated in this way.
> >>>
> >>> For example committing:
> >>>
> >>> { id: talk.1234, from: "me", to: "you", "content": "some content", title: "some title"}
> >>>
> >>> will, after an initial successful search, suddenly end up as:
> >>>
> >>> { id: talk.1234, version: 1234}
> >>>
> >>> Not sure how to proceed on debugging this issue. It seems to settle in
> >>> after Solr has been running for a while but can just as quickly rectify
> >>> itself.
> >>>
> >>> At a loss how to debug and proceed.
> >>>
> >>> Any help much appreciated.
> >>
> >>
>
>
