(doris) 01/01: Update README.md

kassiez Mon, 17 Mar 2025 21:12:36 -0700

This is an automated email from the ASF dual-hosted git repository.

kassiez pushed a commit to branch KassieZ-patch-1
in repository https://gitbox.apache.org/repos/asf/doris.git


commit a7de0fccfb482d93e9580d3b43645acc9dff82bc
Author: KassieZ <139741991+kass...@users.noreply.github.com>
AuthorDate: Tue Mar 18 12:11:02 2025 +0800

    Update README.md
---
 README.md | 119 ++++++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 78 insertions(+), 41 deletions(-)

diff --git a/README.md b/README.md
index d850df18001..3d8e4233a85 100644
--- a/README.md
+++ b/README.md
@@ -25,8 +25,8 @@ under the License.
 [![GitHub 
release](https://img.shields.io/github/release/apache/doris.svg)](https://github.com/apache/doris/releases)
 
[![OSSRank](https://shields.io/endpoint?url=https://ossrank.com/shield/516)](https://ossrank.com/p/516)
 [![Commit 
activity](https://img.shields.io/github/commit-activity/m/apache/doris)](https://github.com/apache/doris/commits/master/)
-[![EN 
doc](https://img.shields.io/badge/Docs-English-blue.svg)](https://doris.apache.org/docs/get-starting/quick-start)
-[![CN 
doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://doris.apache.org/zh-CN/docs/get-starting/quick-start/)
+[![EN 
doc](https://img.shields.io/badge/Docs-English-blue.svg)](https://doris.apache.org/docs/gettingStarted/what-is-apache-doris)
+[![CN 
doc](https://img.shields.io/badge/文档-中文版-blue.svg)](https://doris.apache.org/zh-CN/docs/gettingStarted/what-is-apache-doris)
 
 <div>
 
@@ -42,7 +42,7 @@ under the License.
     &nbsp;
     <a href="https://github.com/apache/doris/discussions";><img 
src="https://img.shields.io/badge/- Discussion 
-red?style=social&logo=discourse" height=25></a>
     &nbsp;
-    <a 
href="https://apachedoriscommunity.slack.com/join/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA";><img
 src="https://img.shields.io/badge/-Slack-red?style=social&logo=slack"; 
height=25></a>
+    <a 
href="https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw";
 height=25></a>
     &nbsp;
     <a href="https://medium.com/@ApacheDoris";><img 
src="https://img.shields.io/badge/-Medium-red?style=social&logo=medium"; 
height=25></a>
 
@@ -73,65 +73,104 @@ As shown in the figure below, after various data 
integration and processing, the
 
 <br />
 
+
 Apache Doris is widely used in the following scenarios:
 
-- Reporting Analysis
+- **Real-time Data Analysis**:
 
-    - Real-time dashboards
-    - Reports for in-house analysts and managers
-    - Highly concurrent user-oriented or customer-oriented report analysis: 
such as website analysis and ad reporting that usually require thousands of QPS 
and quick response times measured in milliseconds. A successful user case is 
that Doris has been used by the Chinese e-commerce giant JD.com in ad 
reporting, where it receives 10 billion rows of data per day, handles over 
10,000 QPS, and delivers a 99 percentile query latency of 150 ms.
+  - **Real-time Reporting and Decision-making**: Doris provides real-time 
updated reports and dashboards for both internal and external enterprise use, 
supporting real-time decision-making in automated processes.
+  
+  - **AD Hoc Analysis**: Doris offers multidimensional data analysis 
capabilities, enabling rapid business intelligence analysis and ad hoc queries 
to help users quickly uncover insights from complex data.
+  
+  - **User Profiling and Behavior Analysis**: Doris can analyze user behavior 
such as participation, retention, and conversion, while also supporting 
scenarios like population insights and crowd selection for behavior analysis.
 
-- Ad-Hoc Query. Analyst-oriented self-service analytics with irregular query 
patterns and high throughput requirements. XiaoMi has built a growth analytics 
platform (Growth Analytics, GA) based on Doris, using user behavior data for 
business growth analysis, with an average query latency of 10 seconds and a 
95th percentile query latency of 30 seconds or less, and tens of thousands of 
SQL queries per day.
+- **Lakehouse Analytics**:
 
-- Unified Data Warehouse Construction. Apache Doris allows users to build a 
unified data warehouse via one single platform and save the trouble of handling 
complicated software stacks. Chinese hot pot chain Haidilao has built a unified 
data warehouse with Doris to replace its old complex architecture consisting of 
Apache Spark, Apache Hive, Apache Kudu, Apache HBase, and Apache Phoenix.
+  - **Lakehouse Query Acceleration**: Doris accelerates lakehouse data queries 
with its efficient query engine.
+  
+  - **Federated Analytics**: Doris supports federated queries across multiple 
data sources, simplifying architecture and eliminating data silos.
+  
+  - **Real-time Data Processing**: Doris combines real-time data streams and 
batch data processing capabilities to meet the needs of high concurrency and 
low-latency complex business requirements.
 
-- Data Lake Query. Apache Doris avoids data copying by federating the data in 
Apache Hive, Apache Iceberg, and Apache Hudi using external tables, and thus 
achieves outstanding query performance.
+- **SQL-based Observability**:
 
-## 🖥️ Core Concepts
+  - **Log and Event Analysis**: Doris enables real-time or batch analysis of 
logs and events in distributed systems, helping to identify issues and optimize 
performance.
 
-### 📂 Architecture of Apache Doris
 
-The overall architecture of Apache Doris is shown in the following figure. The 
Doris architecture is very simple, with only two types of processes.
+## Overall Architecture
 
-- Frontend (FE): user request access, query parsing and planning, metadata 
management, node management, etc.
+Apache Doris uses the MySQL protocol, is highly compatible with MySQL syntax, 
and supports standard SQL. Users can access Apache Doris through various client 
tools, and it seamlessly integrates with BI tools.
 
-- Backend (BE): data storage and query plan execution
+### Storage-Compute Integrated Architecture
 
-Both types of processes are horizontally scalable, and a single cluster can 
support up to hundreds of machines and tens of petabytes of storage capacity. 
And these two types of processes guarantee high availability of services and 
high reliability of data through consistency protocols. This highly integrated 
architecture design greatly reduces the operation and maintenance cost of a 
distributed system.
+The storage-compute integrated architecture of Apache Doris is streamlined and 
easy to maintain. As shown in the figure below, it consists of only two types 
of processes:
 
-<br />
+- **Frontend (FE):** Primarily responsible for handling user requests, query 
parsing and planning, metadata management, and node management tasks.
+
+- **Backend (BE):** Primarily responsible for data storage and query 
execution. Data is partitioned into shards and stored with multiple replicas 
across BE nodes.
 
 ![The overall architecture of Apache 
Doris](https://cdn.selectdb.com/static/What_is_Apache_Doris_adb26397e2.png)
 
 <br />
 
-In terms of interfaces, Apache Doris adopts MySQL protocol, supports standard 
SQL, and is highly compatible with MySQL dialect. Users can access Doris 
through various client tools and it supports seamless connection with BI tools.
+In a production environment, multiple FE nodes can be deployed for disaster 
recovery. Each FE node maintains a full copy of the metadata. The FE nodes are 
divided into three roles:
+
+| Role      | Function                                                     |
+| --------- | ------------------------------------------------------------ |
+| Master    | The FE Master node is responsible for metadata read and write 
operations. When metadata changes occur in the Master, they are synchronized to 
Follower or Observer nodes via the BDB JE protocol. |
+| Follower  | The Follower node is responsible for reading metadata. If the 
Master node fails, a Follower node can be selected as the new Master. |
+| Observer  | The Observer node is responsible for reading metadata and is 
mainly used to increase query concurrency. It does not participate in cluster 
leadership elections. |
+
+Both FE and BE processes are horizontally scalable, enabling a single cluster 
to support hundreds of machines and tens of petabytes of storage capacity. The 
FE and BE processes use a consistency protocol to ensure high availability of 
services and high reliability of data. The storage-compute integrated 
architecture is highly integrated, significantly reducing the operational 
complexity of distributed systems.
+
+
+## Core Features of Apache Doris
+
+- **High Availability**: In Apache Doris, both metadata and data are stored 
with multiple replicas, synchronizing data logs via the quorum protocol. Data 
write is considered successful once a majority of replicas have completed the 
write, ensuring that the cluster remains available even if a few nodes fail. 
Apache Doris supports both same-city and cross-region disaster recovery, 
enabling dual-cluster master-slave modes. When some nodes experience failures, 
the cluster can automatically i [...]
+
+- **High Compatibility**: Apache Doris is highly compatible with the MySQL 
protocol and supports standard SQL syntax, covering most MySQL and Hive 
functions. This high compatibility allows users to seamlessly migrate and 
integrate existing applications and tools. Apache Doris supports the MySQL 
ecosystem, enabling users to connect Doris using MySQL Client tools for more 
convenient operations and maintenance. It also supports MySQL protocol 
compatibility for BI reporting tools and data tr [...]
+
+- **Real-Time Data Warehouse**: Based on Apache Doris, a real-time data 
warehouse service can be built. Apache Doris offers second-level data ingestion 
capabilities, capturing incremental changes from upstream online transactional 
databases into Doris within seconds. Leveraging vectorized engines, MPP 
architecture, and Pipeline execution engines, Doris provides sub-second data 
query capabilities, thereby constructing a high-performance, low-latency 
real-time data warehouse platform.
 
-### 💾 Storage Engine
+- **Unified Lakehouse**: Apache Doris can build a unified lakehouse 
architecture based on external data sources such as data lakes or relational 
databases. The Doris unified lakehouse solution enables seamless integration 
and free data flow between data lakes and data warehouses, helping users 
directly utilize data warehouse capabilities to solve data analysis problems in 
data lakes while fully leveraging data lake data management capabilities to 
enhance data value.
 
-Doris uses a columnar storage engine, which encodes, compresses, and reads 
data by column. This enables a very high compression ratio and largely reduces 
irrelavant data scans, thus making more efficient use of IO and CPU resources. 
Doris supports various index structures to minimize data scans:
+- **Flexible Modeling**: Apache Doris offers various modeling approaches, such 
as wide table models, pre-aggregation models, star/snowflake schemas, etc. 
During data import, data can be flattened into wide tables and written into 
Doris through compute engines like Flink or Spark, or data can be directly 
imported into Doris, performing data modeling operations through views, 
materialized views, or real-time multi-table joins.
 
-- Sorted Compound Key Index: Users can specify three columns at most to form a 
compound sort key. This can effectively prune data to better support highly 
concurrent reporting scenarios.
-- MIN/MAX Indexing: This enables effective filtering of equivalence and range 
queries for numeric types.
-- Bloom Filter: very effective in equivalence filtering and pruning of high 
cardinality columns
-- Invert Index: This enables fast search for any field.
+## Technical overview
 
+Doris provides an efficient SQL interface and is fully compatible with the 
MySQL protocol. Its query engine is based on an MPP (Massively Parallel 
Processing) architecture, capable of efficiently executing complex analytical 
queries and achieving low-latency real-time queries. Through columnar storage 
technology for data encoding and compression, it significantly optimizes query 
performance and storage compression ratio.
 
-### 💿 Storage Models
+### Interface
 
-Doris supports a variety of storage models and has optimized them for 
different scenarios:
+Apache Doris adopts the MySQL protocol, supports standard SQL, and is highly 
compatible with MySQL syntax. Users can access Apache Doris through various 
client tools and seamlessly integrate it with BI tools, including but not 
limited to Smartbi, DataEase, FineBI, Tableau, Power BI, and Apache Superset. 
Apache Doris can work as the data source for any BI tools that support the 
MySQL protocol.
 
-- Aggregate Key Model: able to merge the value columns with the same keys and 
significantly improve performance
+### Storage engine
 
-- Unique Key Model: Keys are unique in this model and data with the same key 
will be overwritten to achieve row-level data updates.
+Apache Doris has a columnar storage engine, which encodes, compresses, and 
reads data by column. This enables a very high data compression ratio and 
largely reduces unnecessary data scanning, thus making more efficient use of IO 
and CPU resources.
 
-- Duplicate Key Model: This is a detailed data model capable of detailed 
storage of fact tables.
+Apache Doris supports various index structures to minimize data scans:
 
-Doris also supports strongly consistent materialized views. Materialized views 
are automatically selected and updated, which greatly reduces maintenance costs 
for users.
+- **Sorted Compound Key Index**: Users can specify three columns at most to 
form a compound sort key. This can effectively prune data to better support 
highly concurrent reporting scenarios.
+
+- **Min/Max Index**: This enables effective data filtering in equivalence and 
range queries of numeric types.
+
+- **BloomFilter Index**: This is very effective in equivalence filtering and 
pruning of high-cardinality columns.
+
+- **Inverted Index**: This enables fast searching for any field.
+
+Apache Doris supports a variety of data models and has optimized them for 
different scenarios:
+
+- **Detail Model (Duplicate Key Model):** A detail data model designed to meet 
the detailed storage requirements of fact tables.
+
+- **Primary Key Model (Unique Key Model):** Ensures unique keys; data with the 
same key is overwritten, enabling row-level data updates.
+
+- **Aggregate Model (Aggregate Key Model):** Merges value columns with the 
same key, significantly improving performance through pre-aggregation.
+
+Apache Doris also supports strongly consistent single-table materialized views 
and asynchronously refreshed multi-table materialized views. Single-table 
materialized views are automatically refreshed and maintained by the system, 
requiring no manual intervention from users. Multi-table materialized views can 
be refreshed periodically using in-cluster scheduling or external scheduling 
tools, reducing the complexity of data modeling.
 
 ### 🔍 Query Engine
 
-Doris adopts the MPP model in its query engine to realize parallel execution 
between and within nodes. It also supports distributed shuffle join for 
multiple large tables so as to handle complex queries.
+Apache Doris has an MPP-based query engine for parallel execution between and 
within nodes. It supports distributed shuffle join for large tables to better 
handle complicated queries.
 
 <br />
 
@@ -139,7 +178,7 @@ Doris adopts the MPP model in its query engine to realize 
parallel execution bet
 
 <br />
 
-The Doris query engine is vectorized, with all memory structures laid out in a 
columnar format. This can largely reduce virtual function calls, improve cache 
hit rates, and make efficient use of SIMD instructions. Doris delivers a 5–10 
times higher performance in wide table aggregation scenarios than 
non-vectorized engines.
+The query engine of Apache Doris is fully vectorized, with all memory 
structures laid out in a columnar format. This can largely reduce virtual 
function calls, increase cache hit rates, and make efficient use of SIMD 
instructions. Apache Doris delivers a 5~10 times higher performance in wide 
table aggregation scenarios than non-vectorized engines.
 
 <br />
 
@@ -147,14 +186,12 @@ The Doris query engine is vectorized, with all memory 
structures laid out in a c
 
 <br />
 
-Apache Doris uses Adaptive Query Execution technology to dynamically adjust 
the execution plan based on runtime statistics. For example, it can generate 
runtime filter, push it to the probe side, and automatically penetrate it to 
the Scan node at the bottom, which drastically reduces the amount of data in 
the probe and increases join performance. The runtime filter in Doris supports 
In/Min/Max/Bloom filter.
-
-### 🚅 Query Optimizer
+Apache Doris uses adaptive query execution technology to dynamically adjust 
the execution plan based on runtime statistics. For example, it can generate a 
runtime filter and push it to the probe side. Specifically, it pushes the 
filters to the lowest-level scan node on the probe side, which largely reduces 
the data amount to be processed and increases join performance. The runtime 
filter of Apache Doris supports In/Min/Max/Bloom Filter.
 
-In terms of optimizers, Doris uses a combination of CBO and RBO. RBO supports 
constant folding, subquery rewriting, predicate pushdown and CBO supports Join 
Reorder. The Doris CBO is under continuous optimization for more accurate 
statistical information collection and derivation, and more accurate cost model 
prediction.
+Apache Doris uses a Pipeline execution engine that breaks down queries into 
multiple sub-tasks for parallel execution, fully leveraging multi-core CPU 
capabilities. It simultaneously addresses the thread explosion problem by 
limiting the number of query threads. The Pipeline execution engine reduces 
data copying and sharing, optimizes sorting and aggregation operations, thereby 
significantly improving query efficiency and throughput.
 
+In terms of the optimizer, Apache Doris employs a combined optimization 
strategy of CBO (Cost-Based Optimizer), RBO (Rule-Based Optimizer), and HBO 
(History-Based Optimizer). RBO supports constant folding, subquery rewriting, 
predicate pushdown, and more. CBO supports join reordering and other 
optimizations. HBO recommends the optimal execution plan based on historical 
query information. These multiple optimization measures ensure that Doris can 
enumerate high-performance query plans acr [...]
 
-**Technical Overview**: 🔗[Introduction to Apache 
Doris](https://doris.apache.org/docs/dev/summary/basic-summary)
 
 ## 🎆 Why choose Apache Doris?
 
@@ -190,7 +227,7 @@ Add your company logo at Apache Doris Website: 🔗[Add Your 
Company](https://gi
 
 ### 📚 Docs
 
-All Documentation   
🔗[Docs](https://doris.apache.org/docs/get-starting/quick-start)  
+All Documentation   
🔗[Docs](https://doris.apache.org/docs/gettingStarted/what-is-apache-doris)  
 
 ### ⬇️ Download 
 
@@ -198,11 +235,11 @@ All release and binary version 
🔗[Download](https://doris.apache.org/download)
 
 ### 🗄️ Compile
 
-See how to compile  
🔗[Compilation](https://doris.apache.org/docs/dev/install/source-install/compilation-general)
+See how to compile  
🔗[Compilation](https://doris.apache.org/community/source-install/compilation-with-docker))
 
 ### 📮 Install
 
-See how to install and deploy 🔗[Installation and 
deployment](https://doris.apache.org/docs/dev/install/cluster-deployment/standard-deployment)
 
+See how to install and deploy 🔗[Installation and 
deployment](https://doris.apache.org/docs/install/preparation/env-checking) 
 
 ## 🧩 Components
 
@@ -248,7 +285,7 @@ Contact us through the following mailing list.
 
 * Apache Doris Official Website - [Site](https://doris.apache.org)
 * Developer Mailing list - <d...@doris.apache.org>. Mail to 
<dev-subscr...@doris.apache.org>, follow the reply to subscribe the mail list.
-* Slack channel - [Join the 
Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-28il1o2wk-DD6LsLOz3v4aD92Mu0S0aQ)
+* Slack channel - [Join the 
Slack](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw)
 * Twitter - [Follow @doris_apache](https://twitter.com/doris_apache)
 
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

(doris) 01/01: Update README.md

Reply via email to