Tanya-W opened a new pull request, #16569:
URL: https://github.com/apache/doris/pull/16569
# Proposed changes
Issue Number: close #xxx
## Problem summary
At the storage layer, the raw data of an indexed column is still read after
an index (bitmap_index or inverted_index) has been applied, even when that
column is not among the result columns returned by the query. This adds
unnecessary overhead for seeking and reading data.
In addition, a multi-table join query can push many in or not_in predicates
generated by runtime filters down to the storage layer. According to our
tests, applying those predicates through the inverted index degrades
performance because each in_predicate contains many conditions. Therefore,
the inverted index is not applied to in or not_in predicates produced by
runtime_filter.
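The rule described above can be sketched as a small decision function. This
is only an illustration in Python, not the code in this PR: the `Predicate`
type, its fields, and `should_use_inverted_index` are hypothetical names
chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical, simplified model of a predicate pushed to storage."""
    column: str
    op: str                            # e.g. "eq", "match", "in", "not_in"
    from_runtime_filter: bool = False  # True if produced by a runtime filter


def should_use_inverted_index(pred: Predicate) -> bool:
    """Return False for in / not_in predicates that come from a runtime filter.

    Such predicates tend to carry many values, so the assumption here
    (taken from the PR text) is that evaluating them through the
    inverted index is slower than evaluating them on the column data.
    """
    if pred.from_runtime_filter and pred.op in ("in", "not_in"):
        return False
    return True
```

Under this sketch, a user-written `in` predicate is still eligible for the
inverted index; only the runtime-filter variants are excluded.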
Based on this, the PR does the following:
1. Reduce the seek and read overhead for indexed columns that appear only in
the WHERE clause. This optimizes queries such as:
```
-- sql 1
SELECT timestamp FROM tb WHERE log MATCH 'error';
-- sql 2
SELECT timestamp FROM tb WHERE log MATCH 'error' ORDER BY timestamp LIMIT 2;
-- sql 3
SELECT timestamp FROM tb WHERE log MATCH 'error' AND status = 404;
-- sql 4
SELECT timestamp FROM tb WHERE log MATCH 'error' AND status = 404 ORDER BY timestamp DESC LIMIT 10;
-- sql 5
SELECT count() FROM tb WHERE log MATCH 'error';
```
Columns `log` and `status` have an inverted index or bitmap index, so the
queries above only need to seek and read the data of column `timestamp`.
2. Do not apply the inverted index to in or not_in predicates produced by
runtime_filter.
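To illustrate item 1, the sketch below shows one way to decide which columns
still need their raw data read once indexes have been applied. It is a
simplified Python model, not the storage-layer implementation: the function
`columns_to_read` and its parameters are hypothetical, and it assumes that
`index_resolved_columns` contains only columns whose predicates were all
fully evaluated by an index, and that columns used by ORDER BY are already
included in `output_columns`.

```python
def columns_to_read(output_columns, predicate_columns, index_resolved_columns):
    """Return the columns whose raw data must still be read, in stable order.

    A column is needed if the query returns it, or if it has a predicate
    that was not fully resolved by an index and must therefore be
    evaluated against the column data.
    """
    needed = []
    for col in list(output_columns) + list(predicate_columns):
        if col in needed:
            continue  # already scheduled for reading
        if col in output_columns or col not in index_resolved_columns:
            needed.append(col)
    return needed
```

For a query shaped like sql 3, calling
`columns_to_read(["timestamp"], ["log", "status"], {"log", "status"})`
yields only `timestamp`. If the predicate on `status` could not be served by
an index, `status` would be read as well.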
## Checklist(Required)
1. Does it affect the original behavior:
- [ ] Yes
- [ ] No
- [ ] I don't know
4. Has unit tests been added:
- [ ] Yes
- [ ] No
- [ ] No Need
5. Has document been added or modified:
- [ ] Yes
- [ ] No
- [ ] No Need
6. Does it need to update dependencies:
- [ ] Yes
- [ ] No
7. Are there any changes that cannot be rolled back:
- [ ] Yes (If Yes, please explain WHY)
- [ ] No
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[[email protected]](mailto:[email protected]) by explaining why you
chose the solution you did and what alternatives you considered, etc...