Copilot commented on code in PR #1034:
URL: https://github.com/apache/skywalking-banyandb/pull/1034#discussion_r3009524936
##########
banyand/internal/sidx/sidx.go:
##########
@@ -366,7 +368,7 @@ func (b *blockCursorBuilder) processWithFilter(req QueryRequest, log *logger.Log
continue
}
- tags = b.collectTagsForFilter(tags, decoder, i)
+ tags = b.collectTagsForFilter(tags, decoder, i, req.TagProjection)
Review Comment:
`collectTagsForFilter` now uses `req.TagProjection` to decide both the
*order* and the *set* of tags passed into `req.TagFilter.Match`. However
`TagFilterMatcher` (via `logical.tagFilterAdapter`) resolves tag values by
`(TagFamilyIdx, TagIdx)` from its schema, so the slice passed to `Match` must
align exactly with the schema used to build the matcher. If `TagProjection`
contains any projected tags that are not part of the matcher schema (e.g.
user-selected projection tags placed before condition tags), the indices shift
and filters can be evaluated against the wrong tag values (or error on nil
values), turning the previous flake into a deterministic wrong result. Consider
passing an explicit ordered list of condition-tag names/refs for matching, or
building the matcher schema using the same projection order that SIDX uses here.
```suggestion
tags = b.collectTagsForFilter(tags, decoder, i, nil)
```
##########
test/cases/stream/stream.go:
##########
@@ -43,6 +43,9 @@ var _ = g.DescribeTable("Scanning Streams", func(args helpers.Args) {
gm.Eventually(func(innerGm gm.Gomega) {
verify(innerGm, args)
}, flags.EventuallyTimeout).Should(gm.Succeed())
+ gm.Consistently(func(innerGm gm.Gomega) {
+ verify(innerGm, args)
+ }, flags.ConsistentlyTimeout).Should(gm.Succeed())
Review Comment:
`Consistently` is called without an explicit polling interval, which means
Gomega will poll very frequently (default ~10ms). Since `verify` performs
integration queries, this can generate hundreds of requests per test case over
`flags.ConsistentlyTimeout`, increasing suite runtime and load. Consider
setting an explicit polling interval (similar to the `topn` suite) for the
`Consistently` assertion.
##########
test/cases/property/property.go:
##########
@@ -34,6 +34,9 @@ var (
gm.Eventually(func(innerGm gm.Gomega) {
propertyTestData.VerifyFn(innerGm, SharedContext, args)
}, flags.EventuallyTimeout).Should(gm.Succeed())
+ gm.Consistently(func(innerGm gm.Gomega) {
+ propertyTestData.VerifyFn(innerGm, SharedContext, args)
+ }, flags.ConsistentlyTimeout).Should(gm.Succeed())
Review Comment:
`Consistently` is called without an explicit polling interval, so it will
poll very frequently by default. Because `VerifyFn` performs integration
queries, this can create a large number of requests over
`flags.ConsistentlyTimeout`. Consider setting a larger polling interval (e.g.
200ms–2s) for the `Consistently` assertion to reduce load and runtime.
##########
banyand/internal/sidx/sidx.go:
##########
@@ -395,22 +397,66 @@ func (b *blockCursorBuilder) processWithoutFilter() {
}
}
-func (b *blockCursorBuilder) collectTagsForFilter(buf []*modelv1.Tag, decoder func(pbv1.ValueType, []byte, [][]byte) *modelv1.TagValue, index int) []*modelv1.Tag {
+func (b *blockCursorBuilder) collectTagsForFilter(
+ buf []*modelv1.Tag,
+ decoder func(pbv1.ValueType, []byte, [][]byte) *modelv1.TagValue,
+ index int,
+ projections []model.TagProjection,
+) []*modelv1.Tag {
buf = buf[:0]
- for tagName, tagData := range b.block.tags {
+ // IMPORTANT: The logical tag filter matcher relies on stable tag ordering,
+ // because it uses schema tag refs to locate tag values by index (TagFamily.Tags[idx]).
+ // Iterating a Go map is randomized, which makes results flaky.
+ if len(projections) > 0 {
+ seenTagNames := make(map[string]struct{}, len(b.block.tags))
+ for _, proj := range projections {
+ for _, tagName := range proj.Names {
+ if _, exists := seenTagNames[tagName]; exists {
+ continue
+ }
Review Comment:
`collectTagsForFilter` allocates a fresh `seenTagNames` map on every row
processed (this function is called inside the per-element loop in
`processWithFilter`). With large blocks this can become allocation/GC heavy.
Consider precomputing a deduplicated, ordered `[]string` of tag names once per
block/request and reusing it for all rows, instead of rebuilding the dedupe map
each time.
##########
test/cases/trace/trace.go:
##########
@@ -41,6 +41,9 @@ var _ = g.DescribeTable("Scanning Traces", func(args helpers.Args) {
gm.Eventually(func(innerGm gm.Gomega) {
verify(innerGm, args)
}, flags.EventuallyTimeout).Should(gm.Succeed())
+ gm.Consistently(func(innerGm gm.Gomega) {
+ verify(innerGm, args)
+ }, flags.ConsistentlyTimeout).Should(gm.Succeed())
Review Comment:
`Consistently` is called without an explicit polling interval, which means
Gomega will poll very frequently (default ~10ms). Since `verify` performs an
integration query/compare, this can result in hundreds of backend queries per
table entry over `flags.ConsistentlyTimeout`, significantly increasing suite
runtime and load. Consider setting an explicit polling interval (e.g. 200ms–2s) for
the `Consistently` assertion.
```suggestion
}, flags.ConsistentlyTimeout, 500*time.Millisecond).Should(gm.Succeed())
```
##########
test/cases/measure/measure.go:
##########
@@ -36,6 +36,9 @@ var (
gm.Eventually(func(innerGm gm.Gomega) {
measureTestData.VerifyFn(innerGm, SharedContext, args)
}, flags.EventuallyTimeout).Should(gm.Succeed())
+ gm.Consistently(func(innerGm gm.Gomega) {
+ measureTestData.VerifyFn(innerGm, SharedContext, args)
+ }, flags.ConsistentlyTimeout).Should(gm.Succeed())
Review Comment:
`Consistently` is called without an explicit polling interval, so it will
poll very frequently by default. Given `VerifyFn` is an integration query, this
can generate many requests over `flags.ConsistentlyTimeout` and slow the suite.
Consider specifying a polling interval (e.g. 200ms–2s) for the `Consistently`
assertion.
```suggestion
}, flags.ConsistentlyTimeout, 500*time.Millisecond).Should(gm.Succeed())
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]