Hi,
On 11/25/2017 02:25 AM, Tom Lane wrote:
> I wrote:
>> For me, this patch fixes the valgrind failures inside generation.c
>> itself, but I still see one more in the test_decoding run: ...
>> Not sure what to make of this: the stack traces make it look unrelated
>> to the GenerationContext changes, but if it's not related, how come
>> skink was passing before that patch went in?
>
> I've pushed fixes for everything that I could find wrong in generation.c
> (and there was a lot :-(). But I'm still seeing the "invalid read in
> SnapBuildProcessNewCid" failure when I run test_decoding under valgrind.
> Somebody who has more familiarity with the logical decoding stuff than
> I do needs to look into that.
>
> I tried to narrow down exactly which fetch in SnapBuildProcessNewCid was
> triggering the failure, with the attached patch. Weirdly, *it does not
> fail* with this. I have no explanation for that.

I have no explanation for that either. FWIW I don't think this is
related to the new memory contexts. I can reproduce it on 3bae43c (i.e.
before the Generation memory context was introduced), and with Slab
removed from ReorderBuffer.

I wonder if this might be a valgrind issue. I'm not sure which version
skink is using, but I'm running with valgrind-3.12.0-9.el7_4.x86_64.

BTW I also see these failures in hstore:

==15168== Source and destination overlap in memcpy(0x5d0fed0, 0x5d0fed0, 40)
==15168==    at 0x4C2E00C: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:1018)
==15168==    by 0x15419A06: hstoreUniquePairs (hstore_io.c:343)
==15168==    by 0x15419EE4: hstore_in (hstore_io.c:416)
==15168==    by 0x9ED11A: InputFunctionCall (fmgr.c:1635)
==15168==    by 0x9ED3C2: OidInputFunctionCall (fmgr.c:1738)
==15168==    by 0x6014A2: stringTypeDatum (parse_type.c:641)
==15168==    by 0x5E1ADC: coerce_type (parse_coerce.c:304)
==15168==    by 0x5E17A9: coerce_to_target_type (parse_coerce.c:103)
==15168==    by 0x5EDD6D: transformTypeCast (parse_expr.c:2724)
==15168==    by 0x5E8860: transformExprRecurse (parse_expr.c:203)
==15168==    by 0x5E8601: transformExpr (parse_expr.c:156)
==15168==    by 0x5FCF95: transformTargetEntry (parse_target.c:103)
==15168==    by 0x5FD15D: transformTargetList (parse_target.c:191)
==15168==    by 0x5A5EEC: transformSelectStmt (analyze.c:1214)
==15168==    by 0x5A4453: transformStmt (analyze.c:297)
==15168==    by 0x5A4381: transformOptionalSelectInto (analyze.c:242)
==15168==    by 0x5A423F: transformTopLevelStmt (analyze.c:192)
==15168==    by 0x5A4097: parse_analyze (analyze.c:112)
==15168==    by 0x87E0AF: pg_analyze_and_rewrite (postgres.c:664)
==15168==    by 0x87E6EE: exec_simple_query (postgres.c:1045)
Seems hstoreUniquePairs may call memcpy with the same pointers in some
cases (which looks a bit dubious). But the code is ancient, so it's
strange it didn't fail before.
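
To illustrate the kind of pattern that can produce this report, here is a
minimal sketch of an in-place "unique" pass over a sorted array. It is not
the actual hstore_io.c code: the Pair struct and unique_pairs() are
hypothetical stand-ins. As long as no duplicate has been dropped yet, the
write pointer and the read pointer are equal, so an unconditional memcpy
would be called with identical source and destination, which is what
valgrind complains about. Guarding the copy avoids that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key/value pair; not the real hstore struct. */
typedef struct
{
    int key;
    int val;
} Pair;

/*
 * Remove adjacent duplicates (by key) from a sorted array in place,
 * keeping the first pair of each run. Returns the new length.
 */
size_t
unique_pairs(Pair *a, size_t n)
{
    Pair *res = a;      /* last element kept so far */
    Pair *ptr;          /* element currently being examined */

    if (n < 2)
        return n;

    for (ptr = a + 1; ptr < a + n; ptr++)
    {
        if (ptr->key == res->key)
            continue;   /* duplicate key: drop it */

        res++;

        /*
         * Until the first duplicate has been dropped, res == ptr here, so
         * an unconditional memcpy(res, ptr, ...) would copy a region onto
         * itself. Skip the copy in that case.
         */
        if (res != ptr)
            memcpy(res, ptr, sizeof(Pair));
    }

    return (size_t) (res - a) + 1;
}
```

With input keys 1, 2, 2, 3 the first non-duplicate step hits the
res == ptr case, and only the last element is actually moved.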
regards
Tomas