This is an automated email from the ASF dual-hosted git repository. yiguolei pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push: new 55e0c7e0fd2 [docs] improve decimal docs (#2491) 55e0c7e0fd2 is described below commit 55e0c7e0fd22a4e62afe75bd4d7c4d94645f7519 Author: TengJianPing <tengjianp...@selectdb.com> AuthorDate: Mon Jun 16 21:48:13 2025 +0800 [docs] improve decimal docs (#2491) --- .../sql-data-types/numeric/DECIMAL.md | 65 ++++++++++++---------- docs/table-design/data-type.md | 2 +- .../sql-data-types/numeric/DECIMAL.md | 45 ++++++++------- .../current/table-design/data-type.md | 2 +- .../sql-data-types/numeric/DECIMAL.md | 45 ++++++++------- .../version-2.1/table-design/data-type.md | 2 +- .../sql-data-types/numeric/DECIMAL.md | 45 ++++++++------- .../version-3.0/table-design/data-type.md | 2 +- .../sql-data-types/numeric/DECIMAL.md | 65 ++++++++++++---------- .../version-2.1/table-design/data-type.md | 14 ++--- .../sql-data-types/numeric/DECIMAL.md | 65 ++++++++++++---------- .../version-3.0/table-design/data-type.md | 14 ++--- 12 files changed, 198 insertions(+), 168 deletions(-) diff --git a/docs/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/docs/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 5be27fc1e69..54972f4f8ed 100644 --- a/docs/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/docs/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,59 +29,64 @@ under the License. DECIMAL ### Description -DECIMAL (M [,D]) + DECIMAL(P[,S]) + High-precision fixed-point number, where P represents the total count of significant digits (precision), and S is the count of decimal digits in the fractional part, to the right of the decimal point. + The range of significant digits P is [1, MAX_P], where MAX_P=38 when enable_decimal256=false, and MAX_P=76 when enable_decimal256=true. + The range of decimal places S is [0, P]. -High-precision fixed-point number, M represents the total number of significant digits, and D represents the scale. + By default, precision is 38, and scale is 9(that is DECIMAL(38, 9)). -The range of M is [1, 38], and the range of D is [0, precision]. - -The default value is DECIMAL(9, 0). + The default value of enable_decimal256 is false. Setting it to true can get more accurate results, but it will bring some performance loss. ### Precision Deduction DECIMAL has a very complex set of type inference rules. For different expressions, different rules will be applied for precision inference. -#### Arithmetic Expressions +#### Arithmetic Operations + +Assuming e1(p1, s1) and e2(p2, s2) are two DECIMAL numbers, the precision deduction rules for operation results are as follows: + +|Operation|Result precision|Result scale|Result precision if overflow|Result scale if overflow|Intermediate e1 type|Intermediate e2 type| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>Integer part less than 32 bits: min(scale, 38 - (precision - scale))</li><li>Integer part greater than 32 bits, and decimal part less than 6 bits: s1 + s2</li><li>Integer part greater than 32 bits, decimal part greater than or equal to 6 bits: 6</li></ol>|Unchanged|Unchanged| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1 less than max_precision - `decimal_overflow_scale`: (max_precision - (precision - s1)) + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 less than `decimal_overflow_scale`: s1 + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 greater than or equal to `decimal_o [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2)|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| + +`div_precision_increment` is a configuration parameter of FE, see [div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment). -* Plus / Minus: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y)). -* Multiply: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y). -* Divide: DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment).div_precision_increment default 4. -It is worth noting that the process of division calculation is as follows: -DECIMAL(p1, s1) / DECIMAL(p2, s2) is first converted to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2) / DECIMAL(p2, s2) and then the calculation is performed. Therefore, it is possible that DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) satisfies the range of DECIMAL, -but due to the conversion to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2), -it exceeds the range. Currently, Doris handles this by converting it to Double for calculation. -#### Aggregation functions +`decimal_overflow_scale` is a session variable of FE, which indicates the maximum number of decimal places that can be retained in the calculation result when the precision of the decimal value calculation result overflows. The default value is 6. -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +#### Aggregation Operations -#### Default rules +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). + +#### Default Rules Except for the expressions mentioned above, other expressions use default rules for precision deduction. That is, for the expression `expr(DECIMAL(a, b))`, the result type is also DECIMAL(a, b). -#### Adjust the result precision +#### Adjusting Result Precision -Different users have different accuracy requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different accuracy requirements, they can adjust the accuracy in the following ways**: +Different users have different precision requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different precision requirements, they can adjust the precision in the following ways**: * If the expected result precision is greater than the default precision, you can adjust the result precision by adjusting the parameter's precision. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL (a, b), the expression can be rewritten to `AVG(CAST(col as DECIMAL (x, y))`. * If the expected result precision is less than the default precision, the desired precision can be obtained by approximating the output result. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL(a, b), the expression can be rewritten as `ROUND(AVG(col), y)`. -### Why DECIMAL is required +### Why DECIMAL is Required DECIMAL in Doris is a real high-precision fixed-point number. Decimal has the following core advantages: 1. It can represent a wider range. The value ranges of both precision and scale in DECIMAL have been significantly expanded. 2. Higher performance. The old version of DECIMAL requires 16 bytes in memory and 12 bytes in storage, while DECIMAL has made adaptive adjustments as shown below. -``` -+----------------------+------------------------------+ + | precision | Space occupied (memory/disk) | -+----------------------+------------------------------+ -| 0 < precision <= 9 | 4 bytes | -+----------------------+------------------------------+ -| 9 < precision <= 18 | 8 bytes | -+----------------------+------------------------------+ -| 18 < precision <= 38 | 16 bytes | -+----------------------+------------------------------+ -``` +|----------------------|-------------------| +| 0 < precision <= 9 | 4 bytes | +| 9 < precision <= 18 | 8 bytes | +| 18 < precision <= 38 | 16 bytes | +| 38 < precision <= 76 | 32 bytes | + 3. More complete precision deduction. For different expressions, different precision inference rules are applied to deduce the precision of the results. ### keywords diff --git a/docs/table-design/data-type.md b/docs/table-design/data-type.md index 830cc956f4a..baab27ce65b 100644 --- a/docs/table-design/data-type.md +++ b/docs/table-design/data-type.md @@ -40,7 +40,7 @@ The list of data types supported by Doris is as follows: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Integer value, range is [-2^127 + 1 to 2^127 - 1]. | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | Single precision floating point number, range is [-3.4 * 10^38 to 3.4 * 10^38]. | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | Double precision floating point number, range is [-1.79 * 10^308 to 1.79 * 10^308]. | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(M[,D]), where M is precision and D is scale. The range for M is [1, 38], and for D is [0, precision]. Storage requirements: - 4 bytes for 0 < precision <= 9, - 8 bytes for 9 < precision <= 18, - 16 bytes for 18 < precision <= 38. | +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(P[,S]), where P is precision and S is scale. The range for P is [1, MAX_P], where MAX_P=38 when `enable_decimal256`=false, and MAX_P=76 when `enable_decimal256`=true, and for S is [0, P]. <br>The default value of `enable_decimal256` is f [...] ## [Datetime data type](../sql-manual/basic-element/sql-data-types/data-type-overview#date-types) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 395c81209d8..4798494f3ff 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,11 +29,14 @@ under the License. DECIMAL ## 描述 - DECIMAL(M[,D]) - 高精度定点数,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale), - 有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。 + DECIMAL(P[,S]) + 高精度定点数,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。 + 有效数字 P 的范围是 [1, MAX_P],enable_decimal256=false时,MAX_P=38, enable_decimal256=true时,MAX_P=76。 + 小数位数字数量 S 的范围是 [0, P]。 - 默认值为 DECIMAL(9, 0)。 + P默认值是38,S默认是9(DECIMAL(38, 9))。 + + enable_decimal256 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。 ### 精度推演 @@ -41,19 +44,24 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 #### 四则运算 -* 加法 / 减法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y))。 -* 乘法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y)。 -* 除法:DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment)。div_precision_increment 默认为 4。 - 值得注意的是,除法计算的过程是 - DECIMAL(p1, s1) / DECIMAL(p2, s2) 先转换成 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) / DECIMAL(p2, s2) 然后再进行计算,所以可能会出现 - DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) 是满足 DECIMAL 的范围,但是由于先转换成了 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) - 导致超出范围,目前 Doris 的处理是转成 Double 进行计算 +假定e1(p1, s1)和e2(p2, s2)是两个DECIMAL类型的数字,运算结果精度推演规则如下: + +|运算|结果precision|结果scale|溢出时结果precision|溢出时结果scale|中间结果e1类型|中间e2类型| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|按照结果cast|按照结果cast| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>整数部分小于32位:min(scale, 38 - (precision - scale))</li><li>整数部分大于32位,且小数部分小于6位:s1 + s2</li><li>整数部分大于32位,小数部分大于等于6位:6</li></ol>|不变|不变| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1小于max_precision - `decimal_overflow_scale`:(max_precision - (precision - s1))+ `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1小于`decimal_overflow_scale`:s1 + `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1大于等于`decimal_overflow_scale`:`decimal_overflow_scale` + `div_precision_increment`</li> [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2) |max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| + +`div_precision_increment`是FE的配置参数,参见[div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment)。 +`decimal_overflow_scale`是FE的session variable,表示当decimal数值计算结果精度溢出时,计算结果最多可保留的小数位数,默认值是6。 #### 聚合运算 -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). #### 默认规则 @@ -70,17 +78,14 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 Doris 中的 DECIMAL 是真正意义上的高精度定点数,Decimal 有以下核心优势: 1. 可表示范围更大。DECIMAL 中 precision 和 scale 的取值范围都进行了明显扩充。 2. 性能更高。老版本的 DECIMAL 在内存中需要占用 16 bytes,在存储中占用 12 bytes,而 DECIMAL 进行了自适应调整(如下表格)。 -``` -+----------------------+-------------------+ + | precision | 占用空间(内存/磁盘)| -+----------------------+-------------------+ +|----------------------|-------------------| | 0 < precision <= 9 | 4 bytes | -+----------------------+-------------------+ | 9 < precision <= 18 | 8 bytes | -+----------------------+-------------------+ | 18 < precision <= 38 | 16 bytes | -+----------------------+-------------------+ -``` +| 38 < precision <= 76 | 32 bytes | + 3. 更完备的精度推演。对于不同的表达式,应用不同的精度推演规则对结果的精度进行推演。 ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-type.md index eb4dd212f39..f1cf1140831 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-type.md @@ -39,7 +39,7 @@ Apache Doris 已支持的数据类型列表如下: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | 有符号整数,范围 [-2^127 + 1 ~ 2^127 - 1]。 | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | 浮点数,范围 [-3.4*10^38 ~ 3.4*10^38]。 | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | 浮点数,范围 [-1.79*10^308 ~ 1.79*10^308]。 | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | 高精度定点数,格式:DECIMAL(M[,D])。其中,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale)。有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。0 < precision <= 9 的场合,占用 4 字节。9 < precision <= 18 的场合,占用 8 字节。16 < precision <= 38 的场合,占用 16 字节。| +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | 高精度定点数,格式:DECIMAL(P[,S])。其中,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。有效数字 P 的范围是 [1, MAX_P],`enable_decimal256`=false时,MAX_P=38,`enable_decimal256`=true时,MAX_P=76。小数位数字数量 S 的范围是 [0, P]。<br>`enable_decimal256` 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。<br>存储空间:<ul><li>0 < precision <= 9 时,占用 4 字节。<li>9 < precision <= 18 时,占用 8 字节。<li>16 < precision <= 38 时,占用 16 字节。<li>38 < precision <= 7 [...] ### [日期类型](../sql-manual/basic-element/sql-data-types/data-type-overview#日期类型) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 395c81209d8..4798494f3ff 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,11 +29,14 @@ under the License. DECIMAL ## 描述 - DECIMAL(M[,D]) - 高精度定点数,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale), - 有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。 + DECIMAL(P[,S]) + 高精度定点数,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。 + 有效数字 P 的范围是 [1, MAX_P],enable_decimal256=false时,MAX_P=38, enable_decimal256=true时,MAX_P=76。 + 小数位数字数量 S 的范围是 [0, P]。 - 默认值为 DECIMAL(9, 0)。 + P默认值是38,S默认是9(DECIMAL(38, 9))。 + + enable_decimal256 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。 ### 精度推演 @@ -41,19 +44,24 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 #### 四则运算 -* 加法 / 减法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y))。 -* 乘法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y)。 -* 除法:DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment)。div_precision_increment 默认为 4。 - 值得注意的是,除法计算的过程是 - DECIMAL(p1, s1) / DECIMAL(p2, s2) 先转换成 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) / DECIMAL(p2, s2) 然后再进行计算,所以可能会出现 - DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) 是满足 DECIMAL 的范围,但是由于先转换成了 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) - 导致超出范围,目前 Doris 的处理是转成 Double 进行计算 +假定e1(p1, s1)和e2(p2, s2)是两个DECIMAL类型的数字,运算结果精度推演规则如下: + +|运算|结果precision|结果scale|溢出时结果precision|溢出时结果scale|中间结果e1类型|中间e2类型| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|按照结果cast|按照结果cast| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>整数部分小于32位:min(scale, 38 - (precision - scale))</li><li>整数部分大于32位,且小数部分小于6位:s1 + s2</li><li>整数部分大于32位,小数部分大于等于6位:6</li></ol>|不变|不变| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1小于max_precision - `decimal_overflow_scale`:(max_precision - (precision - s1))+ `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1小于`decimal_overflow_scale`:s1 + `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1大于等于`decimal_overflow_scale`:`decimal_overflow_scale` + `div_precision_increment`</li> [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2) |max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| + +`div_precision_increment`是FE的配置参数,参见[div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment)。 +`decimal_overflow_scale`是FE的session variable,表示当decimal数值计算结果精度溢出时,计算结果最多可保留的小数位数,默认值是6。 #### 聚合运算 -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). #### 默认规则 @@ -70,17 +78,14 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 Doris 中的 DECIMAL 是真正意义上的高精度定点数,Decimal 有以下核心优势: 1. 可表示范围更大。DECIMAL 中 precision 和 scale 的取值范围都进行了明显扩充。 2. 性能更高。老版本的 DECIMAL 在内存中需要占用 16 bytes,在存储中占用 12 bytes,而 DECIMAL 进行了自适应调整(如下表格)。 -``` -+----------------------+-------------------+ + | precision | 占用空间(内存/磁盘)| -+----------------------+-------------------+ +|----------------------|-------------------| | 0 < precision <= 9 | 4 bytes | -+----------------------+-------------------+ | 9 < precision <= 18 | 8 bytes | -+----------------------+-------------------+ | 18 < precision <= 38 | 16 bytes | -+----------------------+-------------------+ -``` +| 38 < precision <= 76 | 32 bytes | + 3. 更完备的精度推演。对于不同的表达式,应用不同的精度推演规则对结果的精度进行推演。 ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-type.md index eb4dd212f39..f1cf1140831 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/table-design/data-type.md @@ -39,7 +39,7 @@ Apache Doris 已支持的数据类型列表如下: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | 有符号整数,范围 [-2^127 + 1 ~ 2^127 - 1]。 | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | 浮点数,范围 [-3.4*10^38 ~ 3.4*10^38]。 | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | 浮点数,范围 [-1.79*10^308 ~ 1.79*10^308]。 | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | 高精度定点数,格式:DECIMAL(M[,D])。其中,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale)。有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。0 < precision <= 9 的场合,占用 4 字节。9 < precision <= 18 的场合,占用 8 字节。16 < precision <= 38 的场合,占用 16 字节。| +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | 高精度定点数,格式:DECIMAL(P[,S])。其中,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。有效数字 P 的范围是 [1, MAX_P],`enable_decimal256`=false时,MAX_P=38,`enable_decimal256`=true时,MAX_P=76。小数位数字数量 S 的范围是 [0, P]。<br>`enable_decimal256` 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。<br>存储空间:<ul><li>0 < precision <= 9 时,占用 4 字节。<li>9 < precision <= 18 时,占用 8 字节。<li>16 < precision <= 38 时,占用 16 字节。<li>38 < precision <= 7 [...] ### [日期类型](../sql-manual/basic-element/sql-data-types/data-type-overview#日期类型) diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 395c81209d8..4798494f3ff 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,11 +29,14 @@ under the License. DECIMAL ## 描述 - DECIMAL(M[,D]) - 高精度定点数,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale), - 有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。 + DECIMAL(P[,S]) + 高精度定点数,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。 + 有效数字 P 的范围是 [1, MAX_P],enable_decimal256=false时,MAX_P=38, enable_decimal256=true时,MAX_P=76。 + 小数位数字数量 S 的范围是 [0, P]。 - 默认值为 DECIMAL(9, 0)。 + P默认值是38,S默认是9(DECIMAL(38, 9))。 + + enable_decimal256 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。 ### 精度推演 @@ -41,19 +44,24 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 #### 四则运算 -* 加法 / 减法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y))。 -* 乘法:DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y)。 -* 除法:DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment)。div_precision_increment 默认为 4。 - 值得注意的是,除法计算的过程是 - DECIMAL(p1, s1) / DECIMAL(p2, s2) 先转换成 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) / DECIMAL(p2, s2) 然后再进行计算,所以可能会出现 - DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) 是满足 DECIMAL 的范围,但是由于先转换成了 DECIMAL(p1 + s2 + div_precision_increment, s1 + s2 ) - 导致超出范围,目前 Doris 的处理是转成 Double 进行计算 +假定e1(p1, s1)和e2(p2, s2)是两个DECIMAL类型的数字,运算结果精度推演规则如下: + +|运算|结果precision|结果scale|溢出时结果precision|溢出时结果scale|中间结果e1类型|中间e2类型| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|按照结果cast|按照结果cast| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>整数部分小于32位:min(scale, 38 - (precision - scale))</li><li>整数部分大于32位,且小数部分小于6位:s1 + s2</li><li>整数部分大于32位,小数部分大于等于6位:6</li></ol>|不变|不变| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1小于max_precision - `decimal_overflow_scale`:(max_precision - (precision - s1))+ `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1小于`decimal_overflow_scale`:s1 + `div_precision_increment`</li><li>precision - s1大于max_precision - `decimal_overflow_scale`,且s1大于等于`decimal_overflow_scale`:`decimal_overflow_scale` + `div_precision_increment`</li> [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2) |max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2) |按照结果cast|按照结果cast| + +`div_precision_increment`是FE的配置参数,参见[div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment)。 +`decimal_overflow_scale`是FE的session variable,表示当decimal数值计算结果精度溢出时,计算结果最多可保留的小数位数,默认值是6。 #### 聚合运算 -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). #### 默认规则 @@ -70,17 +78,14 @@ DECIMAL 有一套很复杂的类型推演规则,针对不同的表达式,会 Doris 中的 DECIMAL 是真正意义上的高精度定点数,Decimal 有以下核心优势: 1. 可表示范围更大。DECIMAL 中 precision 和 scale 的取值范围都进行了明显扩充。 2. 性能更高。老版本的 DECIMAL 在内存中需要占用 16 bytes,在存储中占用 12 bytes,而 DECIMAL 进行了自适应调整(如下表格)。 -``` -+----------------------+-------------------+ + | precision | 占用空间(内存/磁盘)| -+----------------------+-------------------+ +|----------------------|-------------------| | 0 < precision <= 9 | 4 bytes | -+----------------------+-------------------+ | 9 < precision <= 18 | 8 bytes | -+----------------------+-------------------+ | 18 < precision <= 38 | 16 bytes | -+----------------------+-------------------+ -``` +| 38 < precision <= 76 | 32 bytes | + 3. 更完备的精度推演。对于不同的表达式,应用不同的精度推演规则对结果的精度进行推演。 ### keywords diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-type.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-type.md index eb4dd212f39..f1cf1140831 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-type.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/table-design/data-type.md @@ -39,7 +39,7 @@ Apache Doris 已支持的数据类型列表如下: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | 有符号整数,范围 [-2^127 + 1 ~ 2^127 - 1]。 | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | 浮点数,范围 [-3.4*10^38 ~ 3.4*10^38]。 | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | 浮点数,范围 [-1.79*10^308 ~ 1.79*10^308]。 | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | 高精度定点数,格式:DECIMAL(M[,D])。其中,M 代表一共有多少个有效数字(precision),D 代表小数位有多少数字(scale)。有效数字 M 的范围是 [1, 38],小数位数字数量 D 的范围是 [0, precision]。0 < precision <= 9 的场合,占用 4 字节。9 < precision <= 18 的场合,占用 8 字节。16 < precision <= 38 的场合,占用 16 字节。| +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | 高精度定点数,格式:DECIMAL(P[,S])。其中,P 代表一共有多少个有效数字(precision),S 代表小数位有多少数字(scale)。有效数字 P 的范围是 [1, MAX_P],`enable_decimal256`=false时,MAX_P=38,`enable_decimal256`=true时,MAX_P=76。小数位数字数量 S 的范围是 [0, P]。<br>`enable_decimal256` 的默认值是false,设置为true 可以获得更加精确的结果,但是会带来一些性能损失。<br>存储空间:<ul><li>0 < precision <= 9 时,占用 4 字节。<li>9 < precision <= 18 时,占用 8 字节。<li>16 < precision <= 38 时,占用 16 字节。<li>38 < precision <= 7 [...] ### [日期类型](../sql-manual/basic-element/sql-data-types/data-type-overview#日期类型) diff --git a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 5be27fc1e69..54972f4f8ed 100644 --- a/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/versioned_docs/version-2.1/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,59 +29,64 @@ under the License. DECIMAL ### Description -DECIMAL (M [,D]) + DECIMAL(P[,S]) + High-precision fixed-point number, where P represents the total count of significant digits (precision), and S is the count of decimal digits in the fractional part, to the right of the decimal point. + The range of significant digits P is [1, MAX_P], where MAX_P=38 when enable_decimal256=false, and MAX_P=76 when enable_decimal256=true. + The range of decimal places S is [0, P]. -High-precision fixed-point number, M represents the total number of significant digits, and D represents the scale. + By default, precision is 38, and scale is 9(that is DECIMAL(38, 9)). -The range of M is [1, 38], and the range of D is [0, precision]. - -The default value is DECIMAL(9, 0). + The default value of enable_decimal256 is false. Setting it to true can get more accurate results, but it will bring some performance loss. ### Precision Deduction DECIMAL has a very complex set of type inference rules. For different expressions, different rules will be applied for precision inference. -#### Arithmetic Expressions +#### Arithmetic Operations + +Assuming e1(p1, s1) and e2(p2, s2) are two DECIMAL numbers, the precision deduction rules for operation results are as follows: + +|Operation|Result precision|Result scale|Result precision if overflow|Result scale if overflow|Intermediate e1 type|Intermediate e2 type| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>Integer part less than 32 bits: min(scale, 38 - (precision - scale))</li><li>Integer part greater than 32 bits, and decimal part less than 6 bits: s1 + s2</li><li>Integer part greater than 32 bits, decimal part greater than or equal to 6 bits: 6</li></ol>|Unchanged|Unchanged| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1 less than max_precision - `decimal_overflow_scale`: (max_precision - (precision - s1)) + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 less than `decimal_overflow_scale`: s1 + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 greater than or equal to `decimal_o [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2)|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| + +`div_precision_increment` is a configuration parameter of FE, see [div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment). -* Plus / Minus: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y)). -* Multiply: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y). -* Divide: DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment).div_precision_increment default 4. -It is worth noting that the process of division calculation is as follows: -DECIMAL(p1, s1) / DECIMAL(p2, s2) is first converted to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2) / DECIMAL(p2, s2) and then the calculation is performed. Therefore, it is possible that DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) satisfies the range of DECIMAL, -but due to the conversion to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2), -it exceeds the range. Currently, Doris handles this by converting it to Double for calculation. -#### Aggregation functions +`decimal_overflow_scale` is a session variable of FE, which indicates the maximum number of decimal places that can be retained in the calculation result when the precision of the decimal value calculation result overflows. The default value is 6. -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +#### Aggregation Operations -#### Default rules +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). + +#### Default Rules Except for the expressions mentioned above, other expressions use default rules for precision deduction. That is, for the expression `expr(DECIMAL(a, b))`, the result type is also DECIMAL(a, b). -#### Adjust the result precision +#### Adjusting Result Precision -Different users have different accuracy requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different accuracy requirements, they can adjust the accuracy in the following ways**: +Different users have different precision requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different precision requirements, they can adjust the precision in the following ways**: * If the expected result precision is greater than the default precision, you can adjust the result precision by adjusting the parameter's precision. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL (a, b), the expression can be rewritten to `AVG(CAST(col as DECIMAL (x, y))`. * If the expected result precision is less than the default precision, the desired precision can be obtained by approximating the output result. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL(a, b), the expression can be rewritten as `ROUND(AVG(col), y)`. -### Why DECIMAL is required +### Why DECIMAL is Required DECIMAL in Doris is a real high-precision fixed-point number. Decimal has the following core advantages: 1. It can represent a wider range. The value ranges of both precision and scale in DECIMAL have been significantly expanded. 2. Higher performance. The old version of DECIMAL requires 16 bytes in memory and 12 bytes in storage, while DECIMAL has made adaptive adjustments as shown below. -``` -+----------------------+------------------------------+ + | precision | Space occupied (memory/disk) | -+----------------------+------------------------------+ -| 0 < precision <= 9 | 4 bytes | -+----------------------+------------------------------+ -| 9 < precision <= 18 | 8 bytes | -+----------------------+------------------------------+ -| 18 < precision <= 38 | 16 bytes | -+----------------------+------------------------------+ -``` +|----------------------|-------------------| +| 0 < precision <= 9 | 4 bytes | +| 9 < precision <= 18 | 8 bytes | +| 18 < precision <= 38 | 16 bytes | +| 38 < precision <= 76 | 32 bytes | + 3. More complete precision deduction. For different expressions, different precision inference rules are applied to deduce the precision of the results. ### keywords diff --git a/versioned_docs/version-2.1/table-design/data-type.md b/versioned_docs/version-2.1/table-design/data-type.md index 8ba8a5b7fcb..baab27ce65b 100644 --- a/versioned_docs/version-2.1/table-design/data-type.md +++ b/versioned_docs/version-2.1/table-design/data-type.md @@ -40,7 +40,7 @@ The list of data types supported by Doris is as follows: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Integer value, range is [-2^127 + 1 to 2^127 - 1]. | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | Single precision floating point number, range is [-3.4 * 10^38 to 3.4 * 10^38]. | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | Double precision floating point number, range is [-1.79 * 10^308 to 1.79 * 10^308]. | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(M[,D]), where M is precision and D is scale. The range for M is [1, 38], and for D is [0, precision]. Storage requirements: - 4 bytes for 0 < precision <= 9, - 8 bytes for 9 < precision <= 18, - 16 bytes for 18 < precision <= 38. | +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(P[,S]), where P is precision and S is scale. The range for P is [1, MAX_P], where MAX_P=38 when `enable_decimal256`=false, and MAX_P=76 when `enable_decimal256`=true, and for S is [0, P]. <br>The default value of `enable_decimal256` is f [...] ## [Datetime data type](../sql-manual/basic-element/sql-data-types/data-type-overview#date-types) @@ -68,12 +68,12 @@ The list of data types supported by Doris is as follows: ## [Aggregation data type](../sql-manual/basic-element/sql-data-types/data-type-overview#aggregation-types) -| Type name | Storeage (bytes)| Description | -|-------------------------------------------------------------------------| --------------- | ------------------------------------------------------------ | -| [HLL](../sql-manual/basic-element/sql-data-types/aggregate/HLL) | Variable Length | HLL stands for HyperLogLog, is a fuzzy deduplication. It performs better than Count Distinct when dealing with large datasets. The error rate of HLL is typically around 1%, and sometimes it can reach 2%. HLL cannot be used as a key column, and the aggregation type is HLL_UNION when creating a table. Users do not need to specify the length or default value as it is internally con [...] -| [BITMAP](../sql-manual/basic-element/sql-data-types/aggregate/BITMAP) | Variable Length | BITMAP type can be used in Aggregate tables, Unique tables or Duplicate tables. - When used in a Unique table or a Duplicate table, BITMAP must be employed as non-key columns. - When used in an Aggregate table, BITMAP must also serve as non-key columns, and the aggregation type must be set to BITMAP_UNION during table creation. Users do not need to specify the length or default [...] -| [QUANTILE_STATE](../sql-manual/basic-element/sql-data-types/aggregate/QUANTILE-STATE) | Variable Length | A type used to calculate approximate quantile values. When loading, it performs pre-aggregation for the same keys with different values. When the number of values does not exceed 2048, it records all data in detail. When the number of values is greater than 2048, it employs the TDigest algorithm to aggregate (cluster) the data and store the centroid points after clustering. QUAN [...] -| [AGG_STATE](../sql-manual/basic-element/sql-data-types/aggregate/AGG-STATE) | Variable Length | Aggregate function can only be used with state/merge/union function combiners. AGG_STATE cannot be used as a key column. When creating a table, the signature of the aggregate function needs to be declared alongside. Users do not need to specify the length or default value. The actual data storage size depends on the function's implementation. | +| Type name | Storeage (bytes)| Description | +| -------------- | --------------- | ------------------------------------------------------------ | +| [HLL](../sql-manual/basic-element/sql-data-types/aggregate/HLL) | Variable Length | HLL stands for HyperLogLog, is a fuzzy deduplication. It performs better than Count Distinct when dealing with large datasets. The error rate of HLL is typically around 1%, and sometimes it can reach 2%. HLL cannot be used as a key column, and the aggregation type is HLL_UNION when creating a table. Users do not need to specify the length or default value as it is internally controlled bas [...] +| [BITMAP](../sql-manual/basic-element/sql-data-types/aggregate/BITMAP) | Variable Length | BITMAP type can be used in Aggregate tables, Unique tables or Duplicate tables. - When used in a Unique table or a Duplicate table, BITMAP must be employed as non-key columns. - When used in an Aggregate table, BITMAP must also serve as non-key columns, and the aggregation type must be set to BITMAP_UNION during table creation. Users do not need to specify the length or default value as [...] +| [QUANTILE_STATE](../sql-manual/sql-data-types/aggregate/QUANTILE_STATE) | Variable Length | A type used to calculate approximate quantile values. When loading, it performs pre-aggregation for the same keys with different values. When the number of values does not exceed 2048, it records all data in detail. When the number of values is greater than 2048, it employs the TDigest algorithm to aggregate (cluster) the data and store the centroid points after clustering. QUANTILE_STATE can [...] +| [AGG_STATE](../sql-manual/sql-data-types/aggregate/AGG_STATE) | Variable Length | Aggregate function can only be used with state/merge/union function combiners. AGG_STATE cannot be used as a key column. When creating a table, the signature of the aggregate function needs to be declared alongside. Users do not need to specify the length or default value. The actual data storage size depends on the function's implementation. | ## [IP types](../sql-manual/basic-element/sql-data-types/data-type-overview#ip-types) diff --git a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md index 5be27fc1e69..54972f4f8ed 100644 --- a/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md +++ b/versioned_docs/version-3.0/sql-manual/basic-element/sql-data-types/numeric/DECIMAL.md @@ -29,59 +29,64 @@ under the License. DECIMAL ### Description -DECIMAL (M [,D]) + DECIMAL(P[,S]) + High-precision fixed-point number, where P represents the total count of significant digits (precision), and S is the count of decimal digits in the fractional part, to the right of the decimal point. + The range of significant digits P is [1, MAX_P], where MAX_P=38 when enable_decimal256=false, and MAX_P=76 when enable_decimal256=true. + The range of decimal places S is [0, P]. -High-precision fixed-point number, M represents the total number of significant digits, and D represents the scale. + By default, precision is 38, and scale is 9(that is DECIMAL(38, 9)). -The range of M is [1, 38], and the range of D is [0, precision]. - -The default value is DECIMAL(9, 0). + The default value of enable_decimal256 is false. Setting it to true can get more accurate results, but it will bring some performance loss. ### Precision Deduction DECIMAL has a very complex set of type inference rules. For different expressions, different rules will be applied for precision inference. -#### Arithmetic Expressions +#### Arithmetic Operations + +Assuming e1(p1, s1) and e2(p2, s2) are two DECIMAL numbers, the precision deduction rules for operation results are as follows: + +|Operation|Result precision|Result scale|Result precision if overflow|Result scale if overflow|Intermediate e1 type|Intermediate e2 type| +|-------|------------|---------|-----------------|--------------|--------|------| +|e1 + e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 - e2|max(p1 - s1,p2 - s2) + max(s1, s2) + 1|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| +|e1 * e2|p1 + p2|s1 + s2|MAX_P|<ol><li>Integer part less than 32 bits: min(scale, 38 - (precision - scale))</li><li>Integer part greater than 32 bits, and decimal part less than 6 bits: s1 + s2</li><li>Integer part greater than 32 bits, decimal part greater than or equal to 6 bits: 6</li></ol>|Unchanged|Unchanged| +|e1 / e2|p1 + s2 + `div_precision_increment`|s1 + `div_precision_increment`|MAX_P|<ol><li>precision - s1 less than max_precision - `decimal_overflow_scale`: (max_precision - (precision - s1)) + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 less than `decimal_overflow_scale`: s1 + `div_precision_increment`</li><li>precision - s1 greater than max_precision - `decimal_overflow_scale`, and s1 greater than or equal to `decimal_o [...] +|e1 % e2|max(p1 - s1,p2 - s2) + max(s1, s2)|max(s1, s2)|MAX_P|min(MAX_P, p) - max(p1 - s1,p2 - s2)|Cast according to result|Cast according to result| + +`div_precision_increment` is a configuration parameter of FE, see [div_precision_increment](../../../../admin-manual/config/fe-config#div_precision_increment). -* Plus / Minus: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(max(a - b, x - y) + max(b, y) + 1, max(b, y)). -* Multiply: DECIMAL(a, b) + DECIMAL(x, y) -> DECIMAL(a + x, b + y). -* Divide: DECIMAL(p1, s1) + DECIMAL(p2, s2) -> DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment).div_precision_increment default 4. -It is worth noting that the process of division calculation is as follows: -DECIMAL(p1, s1) / DECIMAL(p2, s2) is first converted to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2) / DECIMAL(p2, s2) and then the calculation is performed. Therefore, it is possible that DECIMAL(p1 + s2 + div_precision_increment, s1 + div_precision_increment) satisfies the range of DECIMAL, -but due to the conversion to DECIMAL(p1 + s2 + div_precision_increment, s1 + s2), -it exceeds the range. Currently, Doris handles this by converting it to Double for calculation. -#### Aggregation functions +`decimal_overflow_scale` is a session variable of FE, which indicates the maximum number of decimal places that can be retained in the calculation result when the precision of the decimal value calculation result overflows. The default value is 6. -* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(38, b). -* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(38, max(b, 4)). +#### Aggregation Operations -#### Default rules +* SUM / MULTI_DISTINCT_SUM: SUM(DECIMAL(a, b)) -> DECIMAL(MAX_P, b). +* AVG: AVG(DECIMAL(a, b)) -> DECIMAL(MAX_P, max(b, 4)). + +#### Default Rules Except for the expressions mentioned above, other expressions use default rules for precision deduction. That is, for the expression `expr(DECIMAL(a, b))`, the result type is also DECIMAL(a, b). -#### Adjust the result precision +#### Adjusting Result Precision -Different users have different accuracy requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different accuracy requirements, they can adjust the accuracy in the following ways**: +Different users have different precision requirements for DECIMAL. The above rules are the default behavior of Doris. If users **have different precision requirements, they can adjust the precision in the following ways**: * If the expected result precision is greater than the default precision, you can adjust the result precision by adjusting the parameter's precision. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL (a, b), the expression can be rewritten to `AVG(CAST(col as DECIMAL (x, y))`. * If the expected result precision is less than the default precision, the desired precision can be obtained by approximating the output result. For example, if the user expects to calculate `AVG(col)` and get DECIMAL(x, y) as the result, where the type of `col` is DECIMAL(a, b), the expression can be rewritten as `ROUND(AVG(col), y)`. -### Why DECIMAL is required +### Why DECIMAL is Required DECIMAL in Doris is a real high-precision fixed-point number. Decimal has the following core advantages: 1. It can represent a wider range. The value ranges of both precision and scale in DECIMAL have been significantly expanded. 2. Higher performance. The old version of DECIMAL requires 16 bytes in memory and 12 bytes in storage, while DECIMAL has made adaptive adjustments as shown below. -``` -+----------------------+------------------------------+ + | precision | Space occupied (memory/disk) | -+----------------------+------------------------------+ -| 0 < precision <= 9 | 4 bytes | -+----------------------+------------------------------+ -| 9 < precision <= 18 | 8 bytes | -+----------------------+------------------------------+ -| 18 < precision <= 38 | 16 bytes | -+----------------------+------------------------------+ -``` +|----------------------|-------------------| +| 0 < precision <= 9 | 4 bytes | +| 9 < precision <= 18 | 8 bytes | +| 18 < precision <= 38 | 16 bytes | +| 38 < precision <= 76 | 32 bytes | + 3. More complete precision deduction. For different expressions, different precision inference rules are applied to deduce the precision of the results. ### keywords diff --git a/versioned_docs/version-3.0/table-design/data-type.md b/versioned_docs/version-3.0/table-design/data-type.md index 8ba8a5b7fcb..baab27ce65b 100644 --- a/versioned_docs/version-3.0/table-design/data-type.md +++ b/versioned_docs/version-3.0/table-design/data-type.md @@ -40,7 +40,7 @@ The list of data types supported by Doris is as follows: | [LARGEINT](../sql-manual/basic-element/sql-data-types/numeric/LARGEINT) | 16 | Integer value, range is [-2^127 + 1 to 2^127 - 1]. | | [FLOAT](../sql-manual/basic-element/sql-data-types/numeric/FLOAT) | 4 | Single precision floating point number, range is [-3.4 * 10^38 to 3.4 * 10^38]. | | [DOUBLE](../sql-manual/basic-element/sql-data-types/numeric/DOUBLE) | 8 | Double precision floating point number, range is [-1.79 * 10^308 to 1.79 * 10^308]. | -| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(M[,D]), where M is precision and D is scale. The range for M is [1, 38], and for D is [0, precision]. Storage requirements: - 4 bytes for 0 < precision <= 9, - 8 bytes for 9 < precision <= 18, - 16 bytes for 18 < precision <= 38. | +| [DECIMAL](../sql-manual/basic-element/sql-data-types/numeric/DECIMAL) | 4/8/16/32 | An exact fixed-point number defined by precision (total number of digits) and scale (number of digits to the right of the decimal point). Format: DECIMAL(P[,S]), where P is precision and S is scale. The range for P is [1, MAX_P], where MAX_P=38 when `enable_decimal256`=false, and MAX_P=76 when `enable_decimal256`=true, and for S is [0, P]. <br>The default value of `enable_decimal256` is f [...] ## [Datetime data type](../sql-manual/basic-element/sql-data-types/data-type-overview#date-types) @@ -68,12 +68,12 @@ The list of data types supported by Doris is as follows: ## [Aggregation data type](../sql-manual/basic-element/sql-data-types/data-type-overview#aggregation-types) -| Type name | Storeage (bytes)| Description | -|-------------------------------------------------------------------------| --------------- | ------------------------------------------------------------ | -| [HLL](../sql-manual/basic-element/sql-data-types/aggregate/HLL) | Variable Length | HLL stands for HyperLogLog, is a fuzzy deduplication. It performs better than Count Distinct when dealing with large datasets. The error rate of HLL is typically around 1%, and sometimes it can reach 2%. HLL cannot be used as a key column, and the aggregation type is HLL_UNION when creating a table. Users do not need to specify the length or default value as it is internally con [...] -| [BITMAP](../sql-manual/basic-element/sql-data-types/aggregate/BITMAP) | Variable Length | BITMAP type can be used in Aggregate tables, Unique tables or Duplicate tables. - When used in a Unique table or a Duplicate table, BITMAP must be employed as non-key columns. - When used in an Aggregate table, BITMAP must also serve as non-key columns, and the aggregation type must be set to BITMAP_UNION during table creation. Users do not need to specify the length or default [...] -| [QUANTILE_STATE](../sql-manual/basic-element/sql-data-types/aggregate/QUANTILE-STATE) | Variable Length | A type used to calculate approximate quantile values. When loading, it performs pre-aggregation for the same keys with different values. When the number of values does not exceed 2048, it records all data in detail. When the number of values is greater than 2048, it employs the TDigest algorithm to aggregate (cluster) the data and store the centroid points after clustering. QUAN [...] -| [AGG_STATE](../sql-manual/basic-element/sql-data-types/aggregate/AGG-STATE) | Variable Length | Aggregate function can only be used with state/merge/union function combiners. AGG_STATE cannot be used as a key column. When creating a table, the signature of the aggregate function needs to be declared alongside. Users do not need to specify the length or default value. The actual data storage size depends on the function's implementation. | +| Type name | Storeage (bytes)| Description | +| -------------- | --------------- | ------------------------------------------------------------ | +| [HLL](../sql-manual/basic-element/sql-data-types/aggregate/HLL) | Variable Length | HLL stands for HyperLogLog, is a fuzzy deduplication. It performs better than Count Distinct when dealing with large datasets. The error rate of HLL is typically around 1%, and sometimes it can reach 2%. HLL cannot be used as a key column, and the aggregation type is HLL_UNION when creating a table. Users do not need to specify the length or default value as it is internally controlled bas [...] +| [BITMAP](../sql-manual/basic-element/sql-data-types/aggregate/BITMAP) | Variable Length | BITMAP type can be used in Aggregate tables, Unique tables or Duplicate tables. - When used in a Unique table or a Duplicate table, BITMAP must be employed as non-key columns. - When used in an Aggregate table, BITMAP must also serve as non-key columns, and the aggregation type must be set to BITMAP_UNION during table creation. Users do not need to specify the length or default value as [...] +| [QUANTILE_STATE](../sql-manual/sql-data-types/aggregate/QUANTILE_STATE) | Variable Length | A type used to calculate approximate quantile values. When loading, it performs pre-aggregation for the same keys with different values. When the number of values does not exceed 2048, it records all data in detail. When the number of values is greater than 2048, it employs the TDigest algorithm to aggregate (cluster) the data and store the centroid points after clustering. QUANTILE_STATE can [...] +| [AGG_STATE](../sql-manual/sql-data-types/aggregate/AGG_STATE) | Variable Length | Aggregate function can only be used with state/merge/union function combiners. AGG_STATE cannot be used as a key column. When creating a table, the signature of the aggregate function needs to be declared alongside. Users do not need to specify the length or default value. The actual data storage size depends on the function's implementation. | ## [IP types](../sql-manual/basic-element/sql-data-types/data-type-overview#ip-types) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org