This is an automated email from the ASF dual-hosted git repository.

kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
     new f97c83201ec [update] Update alt description of images for seo (#2244)
f97c83201ec is described below

commit f97c83201ec0b0fcc63672eeec89cefeaafa91a9
Author: KassieZ <139741991+kass...@users.noreply.github.com>
AuthorDate: Mon Mar 31 15:33:14 2025 +0800

    [update] Update alt description of images for seo (#2244)

    ## Versions

    - [ ] dev
    - [ ] 3.0
    - [ ] 2.1
    - [ ] 2.0

    ## Languages

    - [ ] Chinese
    - [ ] English

    ## Docs Checklist

    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/admin-manual/auth/authorization/ranger.md         | 14 +++++++-------
 docs/admin-manual/auth/ranger.md                       | 18 +++++++++---------
 docs/admin-manual/maint-monitor/monitor-alert.md       |  8 ++++----
 .../trouble-shooting/memory-management/overview.md     |  4 ++--
 docs/admin-manual/workload-management/compute-group.md |  2 +-
 .../admin-manual/workload-management/resource-group.md |  2 +-
 docs/admin-manual/workload-management/spill-disk.md    |  5 +++--
 .../admin-manual/workload-management/workload-group.md |  2 +-
 docs/benchmark/tpcds.md                                |  2 +-
 docs/benchmark/tpch.md                                 |  2 +-
 docs/compute-storage-decoupled/overview.md             |  6 +++---
 docs/data-operate/import/group-commit-manual.md        |  4 ++--
 .../import/import-way/stream-load-manual.md            |  2 +-
 docs/db-connect/arrow-flight-sql-connect.md            |  2 +-
 docs/db-connect/database-connect.md                    |  8 ++++----
 docs/ecosystem/flink-doris-connector.md                |  2 +-
 docs/gettingStarted/what-is-apache-doris.md            | 12 ++++++------
 .../integrated-storage-compute-deploy-manually.md      |  2 +-
 docs/lakehouse/lakehouse-overview.md                   |  8 ++++----
 docs/log-storage-analysis.md                           |  4 ++--
 docs/releasenotes/v2.0/release-2.0.0.md                |  6 +++---
 .../version-2.1/gettingStarted/what-is-apache-doris.md | 12 ++++++------
 .../version-2.1/table-design/index/bloomfilter.md      |  2 +-
 .../version-3.0/compute-storage-decoupled/overview.md  |  6 +++---
 .../import/import-way/stream-load-manual.md            |  2 +-
 .../version-3.0/gettingStarted/what-is-apache-doris.md | 10 +++++-----
 .../lakehouse/lakehouse-best-practices/doris-paimon.md |  2 +-
 versioned_docs/version-3.0/log-storage-analysis.md     |  2 +-
 .../version-3.0/table-design/index/bloomfilter.md      |  2 +-
 29 files changed, 77 insertions(+), 76 deletions(-)

diff --git a/docs/admin-manual/auth/authorization/ranger.md b/docs/admin-manual/auth/authorization/ranger.md
index 7820e2f9265..2dd1f4eeac2 100644
--- a/docs/admin-manual/auth/authorization/ranger.md
+++ b/docs/admin-manual/auth/authorization/ranger.md
@@ -80,41 +80,41 @@ Equivalent to the internal Doris authorization statement `grant select_priv on *
 - The global option can be found in the dropdown box at the same level as the catalog.
 - Only `*` can be entered in the input box.

-
+

 #### Catalog Permissions

 Equivalent to the internal Doris authorization statement `grant select_priv on hive.*.* to user1`;

-
+

 #### Database Permissions

 Equivalent to the internal Doris authorization statement `grant select_priv on hive.db1.* to user1`;

-
+

 #### Table Permissions

 > Here, the term "table" generally refers to tables, views, and asynchronous
 > materialized views.

 Equivalent to the internal Doris authorization statement `grant select_priv on hive.db1.tbl1 to user1`;

-
+

 #### Column Permissions

 Equivalent to the internal Doris authorization statement `grant select_priv(col1,col2) on hive.db1.tbl1 to user1`;

-
+

 #### Resource Permissions

 Equivalent to the internal Doris authorization statement `grant usage_priv on resource 'resource1' to user1`;

 - The resource option can be found in the dropdown box at the same level as the catalog.

-
+

 #### Workload Group Permissions

 Equivalent to the internal Doris authorization statement `grant usage_priv on workload group 'group1' to user1`;

 - The workload group option can be found in the dropdown box at the same level as the catalog.
-
+

 ### Row-Level Permissions Example

diff --git a/docs/admin-manual/auth/ranger.md b/docs/admin-manual/auth/ranger.md
index 2a761efc2a3..f6bc3989ca1 100644
--- a/docs/admin-manual/auth/ranger.md
+++ b/docs/admin-manual/auth/ranger.md
@@ -114,11 +114,11 @@ In version 2.1.0, Doris supports unified permission management by integrating Ap
 After the installation is complete, open the Ranger WebUI and you can see the Apache Doris plug-in in the Service Manger interface:

-
+

 Click the `+` button next to the plugin to add a Doris service:

-
+

 The meaning of some parameters of Config Properties is as follows:

@@ -248,39 +248,39 @@ Equivalent to Doris' internal authorization statement `grant select_priv on *.*.
 - The global option can be found in the dropdown menu of the same level in the catalog
 - Only `*` can be entered in the input box

-
+

 #### Catalog Privileges

 Equivalent to Doris' internal authorization statement `grant select_priv on hive.*.* to user1`;

-
+

 #### Database Privileges

 Equivalent to Doris' internal authorization statement `grant select_priv on hive.tpch.* to user1`;

-
+

 #### Table Privileges

 Equivalent to Doris' internal authorization statement `grant select_priv on hive.tpch.user to user1`;

-
+

 #### Column Privileges

 Equivalent to Doris' internal authorization statement `grant select_priv(name,age) on hive.tpch.user to user1`;

-
+

 #### Resource Privileges

 Equivalent to Doris' internal authorization statement `grant usage_priv on resource 'resource1' to user1`;

 - The resource option can be found in the dropdown menu of the same level in the catalog

-
+

 #### Workload Group Privileges

 Equivalent to Doris' internal authorization statement `grant usage_priv on workload group 'group1' to user1`;

 - The workload group option can be found in the dropdown menu of the same level in the catalog

-
+

 ### Row Policy Example

diff --git a/docs/admin-manual/maint-monitor/monitor-alert.md b/docs/admin-manual/maint-monitor/monitor-alert.md
index 6bde0166c1e..3f713d42d77 100644
--- a/docs/admin-manual/maint-monitor/monitor-alert.md
+++ b/docs/admin-manual/maint-monitor/monitor-alert.md
@@ -42,7 +42,7 @@ Welcome to provide better dashboard.
 Doris uses [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/) to collect and display input monitoring items.

-
+

 1. Prometheus

@@ -263,7 +263,7 @@ Here we briefly introduce Doris Dashboard. The content of Dashboard may change w
 1. Top Bar

-
+

 * The upper left corner is the name of Dashboard.
 * The upper right corner shows the current monitoring time range. You can choose different time ranges by dropping down. You can also specify a regular refresh page interval.
@@ -275,7 +275,7 @@ Here we briefly introduce Doris Dashboard. The content of Dashboard may change w
 2. Row.

-
+

 In Grafana, the concept of Row is a set of graphs. As shown in the figure above, Overview and Cluster Overview are two different Rows. Row can be folded by clicking Row.

 Currently Dashboard has the following Rows (in continuous updates):
@@ -288,7 +288,7 @@ Here we briefly introduce Doris Dashboard. The content of Dashboard may change w
 3. Charts

-
+

 A typical icon is divided into the following parts:

diff --git a/docs/admin-manual/trouble-shooting/memory-management/overview.md b/docs/admin-manual/trouble-shooting/memory-management/overview.md
index 7a65b95c6f0..8a47194c28a 100644
--- a/docs/admin-manual/trouble-shooting/memory-management/overview.md
+++ b/docs/admin-manual/trouble-shooting/memory-management/overview.md
@@ -32,7 +32,7 @@ When facing complex calculations and large-scale operations with huge memory res
 ## Doris BE memory structure

-
+

 ```
 Server physical memory: The physical memory used by all processes on the server, MemTotal seen by `cat /proc/meminfo` or `free -h`.
@@ -97,7 +97,7 @@ For more information about Memory Tracker, refer to [Memory Tracker](./memory-fe
 Historical memory statistics can be viewed through Doris BE's Bvar page `http://{be_host}:{brpc_port}/vars/*memory_*`.
 Use the real-time memory statistics page `http://{be_host}:{be_web_server_port}/mem_tracker` to search for the Bvar page under the Memory Tracker Label to get the memory size change trend tracked by the corresponding Memory Tracker. `brpc_port` defaults to 8060.

-
+

 When the error process memory exceeds the limit or the available memory is insufficient, you can find the `Memory Tracker Summary` in the `be/log/be.INFO` log, which contains all the Memory Trackers of `Type=overview` and `Type=global`, to help users analyze the memory status at that time. For details, please refer to [Memory Log Analysis](./memory-analysis/memory-log-analysis.md)

diff --git a/docs/admin-manual/workload-management/compute-group.md b/docs/admin-manual/workload-management/compute-group.md
index e31257384c8..24a246d5087 100644
--- a/docs/admin-manual/workload-management/compute-group.md
+++ b/docs/admin-manual/workload-management/compute-group.md
@@ -26,7 +26,7 @@ under the License.
 Compute Group is a mechanism for physical isolation between different workloads in a storage-compute separation architecture. The basic principle of Compute Group is illustrated in the diagram below:

-
+

 - One or more BE nodes can form a Compute Group.

diff --git a/docs/admin-manual/workload-management/resource-group.md b/docs/admin-manual/workload-management/resource-group.md
index 2991a25a1ed..c656ce9db21 100644
--- a/docs/admin-manual/workload-management/resource-group.md
+++ b/docs/admin-manual/workload-management/resource-group.md
@@ -26,7 +26,7 @@ under the License.
 Resource Group is a mechanism under the compute-storage integration architecture that achieves physical isolation between different workloads. Its basic principle is illustrated in this diagram:

-
+

 - By using tags, BEs are divided into different groups, each identified by the tag's name. For example, in the diagram above, host1, host2, and host3 are all set to group a, while host4 and host5 are set to group b.
diff --git a/docs/admin-manual/workload-management/spill-disk.md b/docs/admin-manual/workload-management/spill-disk.md
index 809a7e8b950..3a5f5dfec0e 100644
--- a/docs/admin-manual/workload-management/spill-disk.md
+++ b/docs/admin-manual/workload-management/spill-disk.md
@@ -43,9 +43,10 @@ Currently, the operators that support spilling include:
 When a query triggers spilling, additional disk read/write operations may significantly increase query time. It is recommended to increase the FE Session variable query_timeout. Additionally, spilling can generate significant disk I/O, so it is advisable to configure a separate disk directory or use SSD disks to reduce the impact of query spilling on normal data ingestion or queries. The query spilling feature is currently disabled by default.

-##Memory Management Mechanism
+## Memory Management Mechanism

 Doris's memory management is divided into three levels: process level, Workload Group level, and Query level.

-
+
+

 ### BE Process Memory Configuration

 The memory of the entire BE process is controlled by the mem_limit parameter in be.conf. Once Doris's memory usage exceeds this threshold, Doris cancels the current query that is requesting memory. Additionally, a background task asynchronously kills some queries to release memory or cache. Therefore, Doris's internal management operations (such as spilling to disk, flushing memtable, etc.) need to run when approaching this threshold to avoid reaching it. Once the threshold is reached, t [...]

diff --git a/docs/admin-manual/workload-management/workload-group.md b/docs/admin-manual/workload-management/workload-group.md
index 1d24e813e77..0d52f52e00b 100644
--- a/docs/admin-manual/workload-management/workload-group.md
+++ b/docs/admin-manual/workload-management/workload-group.md
@@ -29,7 +29,7 @@ Workload Group is an in-process mechanism for isolating workloads.
 It achieves resource isolation by finely partitioning or limiting resources (CPU, IO, Memory) within the BE process. Its principle is illustrated in the diagram below:

-
+

 The currently supported isolation capabilities include:

diff --git a/docs/benchmark/tpcds.md b/docs/benchmark/tpcds.md
index 149421addcf..47f5d0d83d8 100644
--- a/docs/benchmark/tpcds.md
+++ b/docs/benchmark/tpcds.md
@@ -35,7 +35,7 @@ This document mainly introduces the performance of Doris on the TPC-DS 1000G tes
 On 99 queries on the TPC-DS standard test data set, we conducted a comparison test based on Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1 versions.

-
+

 ## 1. Hardware Environment

diff --git a/docs/benchmark/tpch.md b/docs/benchmark/tpch.md
index 6d5e2229d06..9d3a6e9b118 100644
--- a/docs/benchmark/tpch.md
+++ b/docs/benchmark/tpch.md
@@ -32,7 +32,7 @@ This document mainly introduces the performance of Doris on the TPC-H 1000G test
 On 22 queries on the TPC-H standard test data set, we conducted a comparison test based on Apache Doris 2.1.7-rc03 and Apache Doris 2.0.15.1 versions.

-
+

 ## 1. Hardware Environment

diff --git a/docs/compute-storage-decoupled/overview.md b/docs/compute-storage-decoupled/overview.md
index 18b6ee343a3..54aa05bba2b 100644
--- a/docs/compute-storage-decoupled/overview.md
+++ b/docs/compute-storage-decoupled/overview.md
@@ -32,17 +32,17 @@ The following sections will describe in detail how to deploy and use Apache Dori
 The overall architecture of Doris consists of two types of processes: Frontend (FE) and Backend (BE). The FE is primarily responsible for user request access, query parsing and planning, metadata management, and node management. The BE is responsible for data storage and query plan execution.
 ([More information](../gettingStarted/what-is-apache-doris))

-### **Compute-storage coupled**
+### Compute-storage coupled

 In the compute-storage coupled mode, the BE nodes perform both data storage and computation, and multiple BE nodes forms a massively parallel processing (MPP) distributed computing architecture.

-
+

 ### **Compute-storage decoupled**

 The BE nodes no longer store the primary data. Instead, the shared storage layer serves as the unified primary data storage. Additionally, to overcome the performance loss caused by the limitations of the underlying object storage system and the overhead of network transmission, Doris introduces a high-speed cache on the local compute nodes.

-
+

 **Meta data layer:**

diff --git a/docs/data-operate/import/group-commit-manual.md b/docs/data-operate/import/group-commit-manual.md
index d87ec074069..315b10359e3 100644
--- a/docs/data-operate/import/group-commit-manual.md
+++ b/docs/data-operate/import/group-commit-manual.md
@@ -609,8 +609,8 @@ PROPERTIES (
 JMeter Parameter Settings as Shown in the Images

-
-
+
+

 1. Set the Init Statement Before Testing:

diff --git a/docs/data-operate/import/import-way/stream-load-manual.md b/docs/data-operate/import/import-way/stream-load-manual.md
index 9763a166814..e986b081936 100644
--- a/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/data-operate/import/import-way/stream-load-manual.md
@@ -54,7 +54,7 @@ When using Stream Load, it is necessary to initiate an import job through the HT
 The following figure shows the main flow of Stream Load, omitting some import details.

-
+

 1. The client submits a Stream Load imports job request to the FE (Frontend).
 2. The FE selects a BE (Backend) as the Coordinator node in a round-robin manner, which is responsible for scheduling the import job, and then returns an HTTP redirect to the client.
diff --git a/docs/db-connect/arrow-flight-sql-connect.md b/docs/db-connect/arrow-flight-sql-connect.md
index 949f4a70631..d2ba18e81ce 100644
--- a/docs/db-connect/arrow-flight-sql-connect.md
+++ b/docs/db-connect/arrow-flight-sql-connect.md
@@ -30,7 +30,7 @@ Since Doris 2.1, a high-speed data link based on the Arrow Flight SQL protocol h
 In Doris, query results are organized in columnar format as Blocks. In versions prior to 2.1, data could be transferred to the target client via MySQL Client or JDBC/ODBC drivers, but this required deserializing row-based Bytes into columnar format. By building a high-speed data transfer link based on Arrow Flight SQL, if the target client also supports Arrow columnar format, the entire transfer process avoids serialization and deserialization operations, completely eliminating the time [...]

-
+

 To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

diff --git a/docs/db-connect/database-connect.md b/docs/db-connect/database-connect.md
index 7032e6ac787..54bd17cd120 100644
--- a/docs/db-connect/database-connect.md
+++ b/docs/db-connect/database-connect.md
@@ -83,11 +83,11 @@ jdbc:mysql://FE_IP:FE_PORT/demo?sessionVariables=key1=val1,key2=val2
 Create a MySQL connection to Apache Doris:

-
+

 Query in DBeaver:

-
+

 ## Built-in Web UI of Doris

@@ -97,7 +97,7 @@ To access the Web UI, simply enter the URL in a web browser: http://fe_ip:fe_por
 The built-in Web console is primarily intended for use by the root account of the cluster. By default, the root account password is empty after installation.

-
+

 For example, you can execute the following command in the Playground to add a BE node.
@@ -105,7 +105,7 @@ For example, you can execute the following command in the Playground to add a BE
 ALTER SYSTEM ADD BACKEND "be_host_ip:heartbeat_service_port";
 ```

-
+

 :::tip
 For successful execution of statements that are not related to specific databases/tables in the Playground, it is necessary to randomly select a database from the left-hand database panel. This limitation will be removed later.

diff --git a/docs/ecosystem/flink-doris-connector.md b/docs/ecosystem/flink-doris-connector.md
index f12a93a2f6b..c8ef9ed37bf 100644
--- a/docs/ecosystem/flink-doris-connector.md
+++ b/docs/ecosystem/flink-doris-connector.md
@@ -76,7 +76,7 @@ To use it with Maven, simply add the following dependency to your Pom file:
 ### Reading Data from Doris

-
+

 When reading data, Flink Doris Connector offers higher performance compared to Flink JDBC Connector and is recommended for use:

diff --git a/docs/gettingStarted/what-is-apache-doris.md b/docs/gettingStarted/what-is-apache-doris.md
index 112e86d8463..924f97a154d 100644
--- a/docs/gettingStarted/what-is-apache-doris.md
+++ b/docs/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in production environments o
 As shown in the figure below, after various data integrations and processing, data sources are typically ingested into the real-time data warehouse Doris and offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in OLAP analysis scenarios.

-
+

 Apache Doris is widely used in the following scenarios:

@@ -74,7 +74,7 @@ The storage-compute integrated architecture of Apache Doris is streamlined and e
 - **Backend (BE):** Primarily responsible for data storage and query execution. Data is partitioned into shards and stored with multiple replicas across BE nodes.

-
+

 In a production environment, multiple FE nodes can be deployed for disaster recovery. Each FE node maintains a full copy of the metadata.
 The FE nodes are divided into three roles:

@@ -96,7 +96,7 @@ Starting from version 3.0, a compute-storage decoupled deployment architecture c
 - **Storage Layer**: The storage layer can use shared storage solutions such as S3, HDFS, OSS, COS, OBS, Minio, and Ceph to store Doris's data files, including Segment files and inverted index files.

-
+

 ## Core Features of Apache Doris

@@ -146,15 +146,15 @@ Apache Doris also supports strongly consistent single-table materialized views a
 Apache Doris has an MPP-based query engine for parallel execution between and within nodes. It supports distributed shuffle join for large tables to better handle complicated queries.

-
+

 The query engine of Apache Doris is fully vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual function calls, increase cache hit rates, and make efficient use of SIMD instructions. Apache Doris delivers a 5~10 times higher performance in wide table aggregation scenarios than non-vectorized engines.

-
+

 Apache Doris uses adaptive query execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the data amount to be processed and increases join performance. The runtime filter of Apache Doris supports In/Min/Max/Bloom Filter.

-
+

 Apache Doris uses a Pipeline execution engine that breaks down queries into multiple sub-tasks for parallel execution, fully leveraging multi-core CPU capabilities. It simultaneously addresses the thread explosion problem by limiting the number of query threads. The Pipeline execution engine reduces data copying and sharing, optimizes sorting and aggregation operations, thereby significantly improving query efficiency and throughput.
diff --git a/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md b/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
index 9c0ebc341aa..be80f0b9b65 100644
--- a/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
+++ b/docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md
@@ -27,7 +27,7 @@ After completing the preliminary checks and planning, such as environment checks
 The integrated storage-compute architecture is shown below, and the deployment of the integrated storage-compute cluster involves four steps:

-[integrated-storage-compute-architecture](/images/getting-started/apache-doris-technical-overview.png)
+[MPP-based integrated storage compute architecture](/images/getting-started/apache-doris-technical-overview.png)

 1. **Deploy FE Master Node**: Deploy the first FE node as the Master node;

diff --git a/docs/lakehouse/lakehouse-overview.md b/docs/lakehouse/lakehouse-overview.md
index 0dbc631b480..9f5cb882118 100644
--- a/docs/lakehouse/lakehouse-overview.md
+++ b/docs/lakehouse/lakehouse-overview.md
@@ -30,7 +30,7 @@ under the License.
 Doris provides an excellent lakehouse solution for users through an extensible connector framework, a compute-storage decoupled architecture, a high-performance data processing engine, and data ecosystem openness.

-
+

 ### Flexible Data Access

@@ -137,7 +137,7 @@ In the lakehouse solution, Doris is mainly used for **lakehouse query accelerati
 In this scenario, Doris acts as a **compute engine**, accelerating query analysis on lakehouse data.

-
+

 #### Cache Acceleration

@@ -153,7 +153,7 @@ This feature can significantly improve query performance by reducing runtime com
 Doris can act as a **unified SQL query engine**, connecting different data sources for federated analysis, solving data silos.

-
+

 Users can dynamically create multiple catalogs in Doris to connect different data sources.
 They can use SQL statements to perform arbitrary join queries on data from different data sources. For details, refer to the [Catalog Overview](catalog-overview.md).

@@ -161,7 +161,7 @@ Users can dynamically create multiple catalogs in Doris to connect different dat
 In this scenario, **Doris acts as a data processing engine**, processing lakehouse data.

-
+

 #### Task Scheduling

diff --git a/docs/log-storage-analysis.md b/docs/log-storage-analysis.md
index 068786df212..cdf261395fb 100644
--- a/docs/log-storage-analysis.md
+++ b/docs/log-storage-analysis.md
@@ -40,7 +40,7 @@ Focused on this solution, this chapter contains the following 3 sections:
 The following figure illustrates the architecture of the log storage and analysis platform built on Apache Doris:

-
+

 The architecture contains the following 3 parts:

@@ -577,7 +577,7 @@ ORDER BY ts DESC LIMIT 10;
 Some third-party vendors offer visual log analysis development platforms based on Apache Doris, which include a log search and analysis interface similar to Kibana Discover. These platforms provide an intuitive and user-friendly exploratory log analysis interaction.

-
+

 - Support for full-text search and SQL modes

diff --git a/docs/releasenotes/v2.0/release-2.0.0.md b/docs/releasenotes/v2.0/release-2.0.0.md
index 85d0ea43dab..782eaf7fbe5 100644
--- a/docs/releasenotes/v2.0/release-2.0.0.md
+++ b/docs/releasenotes/v2.0/release-2.0.0.md
@@ -44,7 +44,7 @@ This new version highlights:
 In SSB-Flat and TPC-H benchmarking, Apache Doris 2.0.0 delivered **over 10-time faster query performance** compared to an early version of Apache Doris.

-
+

 This is realized by the introduction of a smarter query optimizer, inverted index, a parallel execution model, and a series of new functionalities to support high-concurrency point queries.
@@ -54,7 +54,7 @@ The brand new query optimizer, Nereids, has a richer statistical base and adopts
 TPC-H tests showed that Nereids, with no human intervention, outperformed the old query optimizer by a wide margin. Over 100 users have tried Apache Doris 2.0.0 in their production environment and the vast majority of them reported huge speedups in query execution.

-
+

 **Doc**: https://doris.apache.org/docs/dev/query-acceleration/nereids/

@@ -66,7 +66,7 @@ In Apache Doris 2.0.0, we introduced inverted index to better support fuzzy keyw
 A smartphone manufacturer tested Apache Doris 2.0.0 in their user behavior analysis scenarios. With inverted index enabled, v2.0.0 was able to finish the queries within milliseconds and maintain stable performance as the query concurrency level went up. In this case, it is 5 to 90 times faster than its old version.

-
+

 ### 20 times higher concurrency capability

diff --git a/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md b/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
index 4f3356dc923..bab0c6add59 100644
--- a/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
+++ b/versioned_docs/version-2.1/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in production environments o
 As shown in the figure below, after various data integrations and processing, data sources are typically ingested into the real-time data warehouse Doris and offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in OLAP analysis scenarios.

-
+

 Apache Doris is widely used in the following scenarios:

@@ -73,7 +73,7 @@ The storage-compute integrated architecture of Apache Doris is streamlined and e
 - **Backend (BE):** Primarily responsible for data storage and query execution. Data is partitioned into shards and stored with multiple replicas across BE nodes.
-
+

 In a production environment, multiple FE nodes can be deployed for disaster recovery. Each FE node maintains a full copy of the metadata. The FE nodes are divided into three roles:

@@ -133,15 +133,15 @@ Apache Doris also supports strongly consistent single-table materialized views a
 Apache Doris has an MPP-based query engine for parallel execution between and within nodes. It supports distributed shuffle join for large tables to better handle complicated queries.

-
+

 The query engine of Apache Doris is fully vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual function calls, increase cache hit rates, and make efficient use of SIMD instructions. Apache Doris delivers a 5~10 times higher performance in wide table aggregation scenarios than non-vectorized engines.

-
-
+
+

 Apache Doris uses adaptive query execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the data amount to be processed and increases join performance. The runtime filter of Apache Doris supports In/Min/Max/Bloom Filter.

-
+

 Apache Doris uses a Pipeline execution engine that breaks down queries into multiple sub-tasks for parallel execution, fully leveraging multi-core CPU capabilities. It simultaneously addresses the thread explosion problem by limiting the number of query threads. The Pipeline execution engine reduces data copying and sharing, optimizes sorting and aggregation operations, thereby significantly improving query efficiency and throughput.
diff --git a/versioned_docs/version-2.1/table-design/index/bloomfilter.md b/versioned_docs/version-2.1/table-design/index/bloomfilter.md
index 49094e3b853..63552039260 100644
--- a/versioned_docs/version-2.1/table-design/index/bloomfilter.md
+++ b/versioned_docs/version-2.1/table-design/index/bloomfilter.md
@@ -38,7 +38,7 @@ A BloomFilter consists of a very long binary bit array and a series of hash func
 The figure below shows an example of a BloomFilter with m=18 and k=3 (where m is the size of the bit array and k is the number of hash functions). Elements x, y, and z in the set are hashed by 3 different hash functions into the bit array. When querying element w, if any bit calculated by the hash functions is 0, then w is not in the set. Conversely, if all bits are 1, it only indicates that w may be in the set, but not definitely, due to possible hash collisions.

-
+

 Thus, if all bits at the calculated positions are 1, it only indicates that the element may be in the set, not definitely, due to possible hash collisions. This is the "false positive" nature of BloomFilter. Therefore, a BloomFilter-based index can only skip data that does not meet the conditions but cannot precisely locate data that does.

diff --git a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
index 7f0b5cc1f75..25c71e65337 100644
--- a/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
+++ b/versioned_docs/version-3.0/compute-storage-decoupled/overview.md
@@ -28,17 +28,17 @@ This article introduces the differences, advantages, and applicable scenarios of
 The following sections will describe in detail how to deploy and use Apache Doris in the compute-storage decoupled mode. For information on deployment in compute-storage coupled mode, please refer to the [Cluster Deployment](../../../docs/install/deploy-manually/integrated-storage-compute-deploy-manually) section.
-## **Compute-storage coupled VS decoupled**
+## Compute-storage coupled VS decoupled

 The overall architecture of Doris consists of two types of processes: Frontend (FE) and Backend (BE). The FE is primarily responsible for user request access, query parsing and planning, metadata management, and node management. The BE is responsible for data storage and query plan execution.

 ([More information](../gettingStarted/what-is-apache-doris))

-### **Compute-storage coupled**
+### Compute-storage coupled

 In the compute-storage coupled mode, the BE nodes perform both data storage and computation, and multiple BE nodes forms a massively parallel processing (MPP) distributed computing architecture.

-### **Compute-storage decoupled**
+### Compute-storage decoupled

 The BE nodes no longer store the primary data. Instead, the shared storage layer serves as the unified primary data storage. Additionally, to overcome the performance loss caused by the limitations of the underlying object storage system and the overhead of network transmission, Doris introduces a high-speed cache on the local compute nodes.

diff --git a/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md b/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
index 9763a166814..988a8a38ba4 100644
--- a/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
+++ b/versioned_docs/version-3.0/data-operate/import/import-way/stream-load-manual.md
@@ -54,7 +54,7 @@ When using Stream Load, it is necessary to initiate an import job through the HT
 The following figure shows the main flow of Stream Load, omitting some import details.

-
+

 1. The client submits a Stream Load imports job request to the FE (Frontend).
 2. The FE selects a BE (Backend) as the Coordinator node in a round-robin manner, which is responsible for scheduling the import job, and then returns an HTTP redirect to the client.
diff --git a/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md b/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
index 15847a98acd..76b0afe75f6 100644
--- a/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
+++ b/versioned_docs/version-3.0/gettingStarted/what-is-apache-doris.md
@@ -37,7 +37,7 @@ Apache Doris has a wide user base. It has been used in production environments o
 
 As shown in the figure below, after various data integrations and processing, data sources are typically ingested into the real-time data warehouse Doris and offline lakehouses (such as Hive, Iceberg, and Hudi). These are widely used in OLAP analysis scenarios.
 
-
+
 
 Apache Doris is widely used in the following scenarios:
@@ -74,7 +74,7 @@ The storage-compute integrated architecture of Apache Doris is streamlined and e
 
 - **Backend (BE):** Primarily responsible for data storage and query execution. Data is partitioned into shards and stored with multiple replicas across BE nodes.
 
-
+
 
 In a production environment, multiple FE nodes can be deployed for disaster recovery. Each FE node maintains a full copy of the metadata. The FE nodes are divided into three roles:
@@ -146,15 +146,15 @@ Apache Doris also supports strongly consistent single-table materialized views a
 
 Apache Doris has an MPP-based query engine for parallel execution between and within nodes. It supports distributed shuffle join for large tables to better handle complicated queries.
 
-
+
 
 The query engine of Apache Doris is fully vectorized, with all memory structures laid out in a columnar format. This can largely reduce virtual function calls, increase cache hit rates, and make efficient use of SIMD instructions. Apache Doris delivers 5~10 times higher performance in wide table aggregation scenarios than non-vectorized engines.
 
-
+
 
 Apache Doris uses adaptive query execution technology to dynamically adjust the execution plan based on runtime statistics.
 For example, it can generate a runtime filter and push it to the probe side. Specifically, it pushes the filters to the lowest-level scan node on the probe side, which largely reduces the amount of data to be processed and increases join performance. The runtime filter of Apache Doris supports In/Min/Max/Bloom Filter.
 
-
+
 
 Apache Doris uses a Pipeline execution engine that breaks down queries into multiple sub-tasks for parallel execution, fully leveraging multi-core CPU capabilities. It also addresses the thread explosion problem by limiting the number of query threads. The Pipeline execution engine reduces data copying and sharing and optimizes sorting and aggregation operations, thereby significantly improving query efficiency and throughput.
diff --git a/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md b/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
index 8a7a5ac79f1..21756f9b7d8 100644
--- a/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
+++ b/versioned_docs/version-3.0/lakehouse/lakehouse-best-practices/doris-paimon.md
@@ -226,7 +226,7 @@ mysql> select * from customer where c_nationkey=1 limit 2;
 
 We conducted a simple test on the TPCDS 1000 dataset with Paimon 0.8, using Apache Doris 2.1.5 and Trino 422, both with the Primary Key Table Read Optimized feature enabled.
 
-
+
 
 From the test results, it can be seen that Doris' average query performance on the standard static test set is 3-5 times that of Trino. In the future, we will optimize the Deletion Vector to further improve query efficiency in real business scenarios.
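The runtime-filter passage quoted above can be illustrated with a toy sketch. This is a conceptual model only — Doris builds and pushes these filters inside its C++ execution engine, not like this — showing how a Min/Max filter derived from the join build side prunes probe-side rows before the join:

```python
def min_max_runtime_filter(build_keys, probe_rows, key):
    """Toy Min/Max runtime filter: derive [min, max] from the build side
    of a join and drop probe rows outside that range at 'scan' time,
    before they ever reach the join operator."""
    lo, hi = min(build_keys), max(build_keys)
    return [row for row in probe_rows if lo <= row[key] <= hi]
```

An In- or Bloom-filter variant would replace the range test with set membership; the point in all cases is the same — filtering at the lowest-level scan shrinks the data volume the join must process.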
diff --git a/versioned_docs/version-3.0/log-storage-analysis.md b/versioned_docs/version-3.0/log-storage-analysis.md
index 74e10aae94d..3562be7f0ec 100644
--- a/versioned_docs/version-3.0/log-storage-analysis.md
+++ b/versioned_docs/version-3.0/log-storage-analysis.md
@@ -40,7 +40,7 @@ Focused on this solution, this chapter contains the following 3 sections:
 
 The following figure illustrates the architecture of the log storage and analysis platform built on Apache Doris:
 
-
+
 
 The architecture contains the following 3 parts:
diff --git a/versioned_docs/version-3.0/table-design/index/bloomfilter.md b/versioned_docs/version-3.0/table-design/index/bloomfilter.md
index 49094e3b853..245f446458c 100644
--- a/versioned_docs/version-3.0/table-design/index/bloomfilter.md
+++ b/versioned_docs/version-3.0/table-design/index/bloomfilter.md
@@ -38,7 +38,7 @@ A BloomFilter consists of a very long binary bit array and a series of hash func
 
 The figure below shows an example of a BloomFilter with m=18 and k=3 (where m is the size of the bit array and k is the number of hash functions). Elements x, y, and z in the set are hashed by 3 different hash functions into the bit array. When querying element w, if any bit calculated by the hash functions is 0, then w is not in the set. Conversely, if all bits are 1, it only indicates that w may be in the set, but not definitely, due to possible hash collisions.
 
-
+
 
 Thus, if all bits at the calculated positions are 1, it only indicates that the element may be in the set, not definitely, due to possible hash collisions. This is the "false positive" nature of BloomFilter. Therefore, a BloomFilter-based index can only skip data that does not meet the conditions but cannot precisely locate data that does.

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org