EmmyMiao87 edited a comment on issue #4370:
URL: 
https://github.com/apache/incubator-doris/issues/4370#issuecomment-684818724


   # New Feature
   
   ## Query spill to disk
   
   Doris supports spilling query data to disk. When `enable_spilling` is true
and the memory limit is reached, the query spills intermediate data to disk
instead of failing because of insufficient memory. Version 0.13 supports
spilling in sort and window functions.
   
   [#3820] [#4151] [#4152]
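
The idea behind spilling a sort can be sketched as an external merge sort. This is a conceptual illustration only, not Doris's implementation: the function name `external_sort` and the row-count threshold are assumptions made for the example, whereas Doris triggers spilling from a memory limit.

```python
import heapq
import pickle
import tempfile


def external_sort(rows, max_rows_in_memory):
    """Yield rows in sorted order while buffering at most max_rows_in_memory.

    Hypothetical sketch: the threshold here is a row count, whereas a real
    engine would spill based on memory usage.
    """
    runs = []    # temp files, each holding one sorted run
    buffer = []  # rows currently held in memory

    def spill():
        # Sort the in-memory buffer and write it out as one sorted run.
        buffer.sort()
        run = tempfile.TemporaryFile()
        for row in buffer:
            pickle.dump(row, run)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for row in rows:
        buffer.append(row)
        if len(buffer) >= max_rows_in_memory:
            spill()

    def read_run(run):
        # Stream rows back from a spilled run one at a time.
        try:
            while True:
                yield pickle.load(run)
        except EOFError:
            run.close()

    buffer.sort()
    # Merge the spilled runs with whatever is still in memory.
    yield from heapq.merge(buffer, *(read_run(run) for run in runs))
```

For example, `list(external_sort([5, 3, 9, 1, 7, 2, 8], 3))` spills two runs of three rows each and merges them with the one row left in memory.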
   
   ## Support `bitmap_union`, `hll_union` and `count` in materialized view
   
   Materialized views support richer aggregate functions: `bitmap_union`,
`hll_union` and `count`. In an order scenario, users need to analyze the
number of orders across different dimensions with `count`. Bitmap and HLL
results can also be pre-calculated for deduplication analysis, such as
analyzing PV and UV data in website traffic. Doris automatically matches the
user's query to the optimal materialized view to speed up the query.
   
   [#3651] [#3677] [#3705] [#3873] [#4014]
   
   ## Spark load
   
   Spark load preprocesses imported data with external Spark resources, which
improves import performance for large data volumes and saves computing
resources of the Doris cluster. It is mainly used when a large amount of data
is imported into Doris during the initial migration.
   
   [#3418] [#3712] [#3715] [#3716]
   
   ## Support loading JSON data into Doris by RoutineLoad or StreamLoad
   
   RoutineLoad and StreamLoad support a new data format: JSON. Data in JSON
format is imported into Doris through the transform rules in the load
statement. This is especially useful for log services whose original data
format is JSON, because users no longer need to convert the data to CSV before
loading it.
   
   [#3553]
   
   ## Modify routine load
   
   Properties of a routine load job, such as concurrency and Kafka consumption
progress, can be modified with the `ALTER ROUTINE LOAD` statement. Only jobs
in the PAUSED state can be modified. After a job is modified, the newly set
properties are used to plan the task the next time it is scheduled.
   
   [#4158]
   
   ## Support fetching `_id` from ES and creating tables with a wildcard or alias index of ES
   
   Each native ES document has an `_id` field, which is the primary key of the
ES index. Doris on ES can now fetch this field. Doris also supports creating
an external table on `aliases` or a `wildcard index` such as `log_*`, so users
can easily search all the indexes that the alias or wildcard matches.
   
   [#3900] [#3968]
   
   ## Logstash Doris output plugin
   
   The Logstash output plugin sends data from Logstash to Doris. It uses the
HTTP protocol to interact with the Doris FE HTTP interface and loads data
through Doris's stream load.
   
   [#3800]
   
   ## Support `SELECT INTO OUTFILE`
   
   Doris now supports exporting query results to a third-party file system
such as HDFS, S3 or BOS. The syntax follows the MySQL syntax, and the export
format is CSV. The exported results can be provided to other users for
download or processed further by other systems. This is especially useful when
the result set is too large to return through the MySQL protocol, such as a
large number of ids produced by `bitmap_to_string`.
   
   [#3584]
   
   ## Support `IN` predicate in delete statement
   
   The delete statement supports `IN` and `NOT IN` predicates in its
conditions. Users can delete rows that match, or do not match, a set of values
in a single statement.
   
   [#4006]
   
   # Enhancement
   
   ## Compaction rules optimization
   
   This optimization updates the strategy for triggering compaction. The new
version merging strategy balances write amplification, space amplification,
and read performance (it tends to merge files of similar sizes). For the same
number of versions, both the number of merges and the total number of files
are reduced.
   
   [#4212]  
   
   ## Simplify the delete process to make it fast
   
   The polling load checker used during deletion is removed and replaced by a
txn callback, which reduces the response time of the delete command to the
millisecond level.
   
   [#3191]
   
   ## Support simple transitivity on join predicate pushdown 
   
   When the columns in a query's filter predicate are the same as the columns
in the join condition, the filter predicate can be propagated through the join
column and applied to the other table in the join as well. This reduces the
amount of data to process and speeds up the query.
   
   [#3453]
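
The rule can be sketched as follows for a single equi-join. This is a simplified illustration rather than the planner's actual code; `propagate_filters` and the tuple representation of predicates are assumptions made for the example.

```python
def propagate_filters(join_equalities, filters):
    """Derive extra filters implied by equi-join conditions.

    join_equalities: list of (left_column, right_column) pairs, for example
        ("t1.id", "t2.id") for `t1.id = t2.id`.
    filters: list of (column, operator, value) triples, for example
        ("t1.id", ">", 100).
    Returns the original filters plus the ones derived in a single step.
    (Hypothetical representation, chosen only for this sketch.)
    """
    derived = list(filters)
    for left, right in join_equalities:
        for column, op, value in filters:
            if column == left:
                candidate = (right, op, value)
            elif column == right:
                candidate = (left, op, value)
            else:
                continue  # the filter does not touch this join column
            if candidate not in derived:
                derived.append(candidate)
    return derived
```

With the join condition `t1.id = t2.id` and the filter `t1.id > 100`, the sketch adds `t2.id > 100`, so rows of `t2` can be filtered before the join.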
   
   ## Non blocking OlapTableSink 
   
   In this optimization, the sending process and the adding row process run
concurrently in `OlapTableSink`, which improves load performance. In a test
with a 56G broker load, the original version ran for 4 hours, and the new
version halved that time.
   
   [#3143]
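
The general pattern, adding rows on one thread while another thread sends completed batches, can be sketched with a bounded queue. This is only an illustration of the pattern: `run_sink`, `send_batch` and the batch parameters are hypothetical names, and the real `OlapTableSink` is part of the BE and is not written in Python.

```python
import queue
import threading


def run_sink(rows, send_batch, batch_size=2, max_pending=4):
    """Batch rows on the caller's thread while a worker thread sends them.

    send_batch is a hypothetical callable standing in for the network send.
    """
    pending = queue.Queue(maxsize=max_pending)

    def sender():
        # Send batches as they become available; None marks the end.
        while True:
            batch = pending.get()
            if batch is None:
                return
            send_batch(batch)

    worker = threading.Thread(target=sender)
    worker.start()

    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            pending.put(batch)  # blocks only when max_pending batches wait
            batch = []
    if batch:
        pending.put(batch)
    pending.put(None)
    worker.join()
```

The bounded queue keeps memory use in check: the row-adding side waits only when the sender has fallen `max_pending` batches behind.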
   
   ## Support txn management in db level and use ArrayDeque to improve txn task performance
   
   Transaction management is now divided at the db level, so transactions in
different dbs no longer block each other. This improves the execution
efficiency of transaction tasks.
   
   [#3369]
   
   ## Improve the performance of query with IN predicate
   
   A new config, `max_pushdown_conditions_per_column`, limits the number of
conditions on a single column that can be pushed down to the storage engine.
It is separate from the previous configuration that controls the split scan
key, and its default value is 1024. After the two configurations were
separated, the QPS of Doris improved and CPU usage decreased.
   
   [#3694]
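
A rough sketch of how such a limit might be applied to an `IN` list is shown below. The exact behavior when the limit is exceeded is an assumption here (the whole list is kept out of the storage engine and evaluated after the scan), and `split_in_conditions` is a hypothetical helper, not a Doris function.

```python
def split_in_conditions(values, max_pushdown_conditions_per_column=1024):
    """Decide whether an IN list on one column is pushed down to storage.

    Returns (pushed_down, evaluated_later). Assumption for illustration: a
    list within the limit is pushed down whole, and a longer list is not
    pushed down at all.
    """
    values = list(values)
    if len(values) <= max_pushdown_conditions_per_column:
        return values, []
    return [], values
```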
   
   ## Optimize load reading parquet format file
   
   The broker keeps a cache buffer array while reading a parquet file. When
the broker is about to seek to a position and fetch data from the remote
parquet file, it first looks up that position in the cache buffer array. If
the expected data is in the cache, the remote read is skipped. In testing, the
load time of parquet files in broker load or spark load was halved.
   
   [#3878]
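
The lookup can be sketched as a reader that checks a few cached byte ranges before going to the remote file. This is a minimal sketch under assumed names: `BufferedRemoteReader`, `read_remote`, and the buffer sizes are illustrative and do not come from the Doris code.

```python
class BufferedRemoteReader:
    """Serve reads from a small set of cached byte ranges when possible."""

    def __init__(self, read_remote, buffer_size=4096, max_buffers=4):
        # read_remote(offset, length) -> bytes is a hypothetical callable
        # standing in for a read from the remote parquet file.
        self._read_remote = read_remote
        self._buffer_size = buffer_size
        self._max_buffers = max_buffers
        self._buffers = []  # list of (start_offset, data)
        self.remote_reads = 0

    def read_at(self, offset, length):
        for start, data in self._buffers:
            if start <= offset and offset + length <= start + len(data):
                # Cache hit: no remote read needed.
                begin = offset - start
                return data[begin:begin + length]
        # Cache miss: fetch at least buffer_size bytes and remember them.
        data = self._read_remote(offset, max(length, self._buffer_size))
        self.remote_reads += 1
        self._buffers.append((offset, data))
        if len(self._buffers) > self._max_buffers:
            self._buffers.pop(0)  # drop the oldest buffer
        return data[:length]
```

Reads that fall inside a range fetched earlier are answered from memory, which is where the saving comes from when a parquet reader seeks repeatedly within the same region of the file.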
   
   # New Built-in Functions
   
   + `bitmap_intersect` [#3571]
   + `orthogonal_bitmap_intersect` in UDAF [#4198]
   + `orthogonal_bitmap_intersect_count` in UDAF [#4198]
   + `orthogonal_bitmap_union_count` in UDAF [#4198]
   
   
   # Other
   
   + Support modifying BE configs at runtime without restarting (#3264)
   + Support setting replica quota in db level (#3283)
   + [Doris On ES][Bug-fix] Solve the problem of time format processing.(#3941)
   + [Doris On ES][Bug-Fix] Incorrect result for docvalue scan mode.(#3751)
   + [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) 
(#4352)
   + [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with 
text type(#4300)
   + [CodeRefactor] Modify FE modules (#4146)
   + [CodeRefactor] Generate java files using maven (#4133)
   + [Compaction] Add delayed deletion of rowsets function, fix -230 error. 
(#4039)
   + [DOCS] documents rebuild with Vuepress (#3414)
   + [Webserver] Make BE webserver more pretty (#4050)
   + [Webserver] Introduce mustache to simplify BE's website render (#4062)
   + [Doris On ES][Enhancement] Add docvalue limitation for doc_values scan and 
enable doc_values scan default (#4055)
   + [Doris On ES][Enhancement] Refactor and enhance ES sync meta logic. 
(#4012)
   + [Doris On ES][Enhancement] Ignore _total node for efficiency and fully 
trusted document count (#3932)
   + [ColocateJoin] Support table join itself by colocate join (#4231)
   
   # API Change
   
   + [DynamicPartition] Optimize the rule of creating dynamic partition (#3679)
   + [SegmentV2] Change the default storage format to SegmentV2 (#4387)
   + [License] Organize and modify the license of the code (#4371)
   + [UDF] Fix large string val allocation failure (#3724)
   
   # Credits
   
   @ZhangYu0123
   @wfjcmcb
   @Fullstop000
   @sduzh
   @stalary
   @worker24h
   @chaoyli
   @vagetablechicken
   @jmk1011
   @funyeah
   @wutiangan
   @gengjun-git
   @xinghuayu007
   @EmmyMiao87
   @songenjie
   @acelyc111
   @yangzhg
   @Seaven
   @hexian55
   @ChenXiaoFei
   @WingsGo
   @kangpinghuang
   @wangbo
   @weizuo93
   @sdgshawn
   @skyduy
   @wyb
   @gaodayue
   @HappenLee
   @kangkaisen
   @wuyunfeng
   @HangyuanLiu
   @xy720
   @liutang123
   @caiconghui
   @liyuance
   @spaces-X
   @hffariel
   @decster
   @blackfox1983
   @Astralidea
   @morningman
   @hf200012
   @xbyang18
   @Youngwb
   @imay
   @marising
   @caoyang10


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


