compasses opened a new issue, #10733: URL: https://github.com/apache/doris/issues/10733
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.

### Description

To speed up LIKE queries, we pushed the like function down to the storage layer in PR #10355, which gives a 2x to 3x performance gain in both the vectorized and non-vectorized engines. We want to go the extra mile and make LIKE queries faster still, with less resource overhead. Building on that work, we are going to implement a new index for LIKE queries.

We have researched several solutions, such as pg_trgm from PostgreSQL, ngrambf from ClickHouse, and FST from Elasticsearch. Since Doris already has a bloom filter index, and considering complexity, functional scope, and compatibility, we will follow the ClickHouse approach, `ngrambf_v1(n, size_of_bloom_filter_in_bytes, number_of_hash_functions, random_seed)`: the input column string is split into n-grams (the first parameter is the n-gram size), which are then stored in a bloom filter. At query time, the LIKE pattern is also split into n-grams, and a bloom filter built from them is used to skip granules that cannot match.

For Doris, here are the details:

1. Reuse the existing bloom filter index read/write process, so the storage layer is unaffected.
2. Add a new kind of bloom filter index, for example `"ngram_bloom_filter_columns" = "(col1,n,512), (col2,n,512)"`, where `n` is the n-gram size and `512` is the bloom filter size in bytes. Both values are configurable and have defaults such as (3,512).
3. Add a new algorithm type, NGRAM_BLOOM_FILTER, which extracts the n-grams and computes the bloom filter.
4. For the new algorithm, the HashStrategy will follow ClickHouse.
5. When an n-gram bloom filter exists, LIKE queries will use the index to filter pages, building on #10355.
6. Support adding the index to historical data: `ALTER TABLE <db.table_name> SET ("ngram_bloom_filter_columns" = "(col1,n,512), (col2,n,512)")`.

That's all, thanks.
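To illustrate the split-and-probe idea described above, here is a minimal Python sketch. It is not the Doris or ClickHouse implementation: the class name `NgramBloomFilter`, the `may_match_like` method, and the SHA-256 based double hashing are all assumptions made for illustration (the proposal says the real HashStrategy will follow ClickHouse). N-grams are taken per character, and escaped wildcards in the pattern are not handled.

```python
import hashlib
import re


class NgramBloomFilter:
    """Toy n-gram bloom filter (hypothetical; hashing scheme is an assumption)."""

    def __init__(self, n=3, size_bytes=512, num_hashes=2):
        self.n = n
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bytes)

    def ngrams(self, text):
        # Sliding window of n characters; strings shorter than n yield nothing.
        return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

    def _positions(self, gram):
        # Double hashing: derive the bit positions from two halves of one digest.
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        # Index side: set the bits for every n-gram of a column value.
        for gram in self.ngrams(value):
            for pos in self._positions(gram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def _contains_gram(self, gram):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(gram))

    def may_match_like(self, pattern):
        # Query side: only literal fragments between wildcards give n-grams.
        # Simplification: escaped wildcards such as \% are not handled.
        grams = []
        for fragment in re.split(r"[%_]", pattern):
            grams.extend(self.ngrams(fragment))
        # With no usable n-grams the block cannot be skipped, so return True.
        return all(self._contains_gram(g) for g in grams)


# One filter would cover one block (page or granule) of column values.
bf = NgramBloomFilter(n=3, size_bytes=512, num_hashes=2)
for value in ["apache doris", "bloom filter"]:
    bf.add(value)

skip_block = not bf.may_match_like("%doris%")
```

A `False` result from `may_match_like` means the block certainly contains no matching row and can be skipped. A `True` result only means the block has to be read and the LIKE predicate evaluated as usual, since bloom filters can return false positives.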
### Use case

_No response_

### Related issues

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)