This is an automated email from the ASF dual-hosted git repository. lihaopeng pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push: new 43b19fd99e3 [docs](timezone) refactor docs of timezone 43b19fd99e3 is described below commit 43b19fd99e33c83e54a711b64a15e58f630110fc Author: zclllyybb <zhaochan...@selectdb.com> AuthorDate: Thu Jan 4 20:20:40 2024 +0800 [docs](timezone) refactor docs of timezone --- docs/en/docs/advanced/time-zone.md | 150 ++++++++++++++++----- docs/en/docs/advanced/variables.md | 4 +- .../Load/STREAM-LOAD.md | 10 +- .../sql-reference/Data-Types/DATETIME.md | 35 ++++- .../SET-VARIABLE.md | 22 +-- docs/zh-CN/docs/advanced/time-zone.md | 140 ++++++++++++++++--- docs/zh-CN/docs/advanced/variables.md | 4 +- .../Load/STREAM-LOAD.md | 13 +- .../sql-reference/Data-Types/DATETIME.md | 35 ++++- .../SET-VARIABLE.md | 22 +-- .../datatype_p0/datetimev2/test_tz_streamload.out | 3 + .../datatype_p0/datetimev2/test_tz_streamload2.csv | 5 +- 12 files changed, 338 insertions(+), 105 deletions(-) diff --git a/docs/en/docs/advanced/time-zone.md b/docs/en/docs/advanced/time-zone.md index e7297047146..e18d06c632a 100644 --- a/docs/en/docs/advanced/time-zone.md +++ b/docs/en/docs/advanced/time-zone.md @@ -26,24 +26,14 @@ under the License. # Time Zone -Doris supports multiple time zone settings - -## Noun Interpretation - -* FE: Frontend, the front-end node of Doris. Responsible for metadata management and request access. -* BE: Backend, Doris's back-end node. Responsible for query execution and data storage. +Doris supports custom time zone settings ## Basic concepts -There are multiple time zone related parameters in Doris - -* `system_time_zone`: - -When the server starts, it will be set automatically according to the time zone set by the machine, which cannot be modified after setting. +The following two time zone related parameters exist within Doris: -* `time_zone`: - -Server current time zone, set it at session level or global level. +- `system_time_zone` : When the server starts up, it will be set automatically according to the time zone set by the machine, and cannot be modified after it is set. +- `time_zone` : The current time zone of the cluster. ## Specific operations @@ -51,21 +41,27 @@ Server current time zone, set it at session level or global level. View the current time zone related configuration -2. `SET time_zone = 'Asia/Shanghai'` +2. `SET [global] time_zone = 'Asia/Shanghai'` + + This command sets the time zone at the session level. If the `global` keyword is used, Doris FE persists the parameter and it takes effect for all new sessions afterwards. + +## Data source + +The time zone data contains the name of the time zone, the corresponding time offset, and the change of daylight saving time. On the machine where the BE is located, the sources of the data are as follows: - This command can set the session level time zone, which will fail after disconnection. +1. the directory returned by the `TZDIR` command +2. the `/usr/share/zoneinfo` directory +3. the `zoneinfo` directory generated under the doris BE deployment directory. The `resource/zoneinfo.tar.gz` directory from the doris repository. -3. `SET global time_zone = 'Asia/Shanghai'` - - This command can set time zone parameters at the global level. The FE will persist the parameters and will not fail when the connection is disconnected. +Look up the above data sources in order and use the current item if found. If none of the three are found, the doris BE will fail to start, please rebuild the BE correctly or get the distribution. -### Impact of time zone +## Impact of time zone -Time zone setting affects the display and storage of time zone sensitive values. +### 1. functions -It includes the values displayed by time functions such as `NOW()` or `CURTIME()`, as well as the time values in `SHOW LOAD` and `SHOW BACKENDS` statements. +Includes values displayed by time functions such as `NOW()` or `CURTIME()`, and also time values in `show load`, `show backends`. -However, it does not affect the `LESS THAN VALUE` of the time-type partition column in the `CREATE TABLE` statement, nor does it affect the display of values stored as `DATE/DATETIME` type. +However, it does not affect the less than value of the time-type partitioned columns in `create table`, nor does it affect the display of values stored as `date/datetime` types. Functions affected by time zone: @@ -79,18 +75,110 @@ Functions affected by time zone: * `CONVERT_TZ`: Converts a date and time from one specified time zone to another. -## Restrictions +### 2. Values of time types + +For `DATE`, `DATEV2`, `DATETIME`, `DATETIMEV2` types, we support time zone conversion when inserting data. + +- If the data comes with a time zone, such as "2020-12-12 12:12:12+08:00", and the current Doris `time_zone = +00:00`, you get the actual value "2020-12-12 04:12:12". +- If the data does not come with a time zone, such as "2020-12-12 12:12:12", the time is considered to be absolute and no conversion occurs. + +### 3. Daylight Saving Time + +Daylight Saving Time is essentially the actual time offset of a named time zone, which changes on certain dates. + +For example, the `America/Los_Angeles` time zone contains a Daylight Saving Time adjustment that begins and ends approximately in March and November of each year. That is, the `America/Los_Angeles` actual time zone offset changes from `-08:00` to `-07:00` at the start of Daylight Savings Time in March, and from `-07:00` to `-08:00` at the end of Daylight Savings Time in November. +If you do not want Daylight Saving Time to be turned on, set `time_zone` to `-08:00` instead of `America/Los_Angeles`. + +## Usage + +Time zone values can be given in a variety of formats. The following standard formats are well supported in Doris: + +1. standard named time zone formats, such as "Asia/Shanghai", "America/Los_Angeles". +2. standard offset formats, such as "+02:30", "-10:00". +3. abbreviated time zone formats, currently only support: + 1. "GMT", "UTC", equivalent to "+00:00" time zone + 2. "CST", which is equivalent to the "Asia/Shanghai" time zone +4. single letter Z, for Zulu time zone, equivalent to "+00:00" time zone + +Note: Some other formats are currently supported in some imports in Doris due to different implementations. **Production environments should not rely on these formats that are not listed here, and their behaviour may change at any time**, so keep an eye on the relevant changelog for version updates. + +## Best Practices + +### Time Zone Sensitive Data + +The time zone issue involves three main influences: + +1. session variable `time_zone` -- cluster timezone +2. header `timezone` specified during import(Stream Load, Broker Load etc.) -- importing timezone +3. timezone type literal "+08:00" in "2023-12-12 08:00:00+08:00" -- data timezone + +We can understand it as follows: + +Doris is currently compatible with importing data into Doris under all time zones. Since time types such as `DATETIME` do not contain time zone information, the time type data in the Doris cluster can be divided into two categories: + +1. absolute time +2. time in a specific time zone + +Absolute time means that it is associated with a data scenario that is independent of the time zone. For this type of data, it should be imported without any time zone suffix and they will be stored as is. For this type of time, since it is not associated with an actual time zone, taking the result of a function such as `unix_timestamp` is meaningless. Changes to the cluster `time_zone` will not affect its use. + +The time in a particular time zone. This "specific time zone" is our session variable `time_zone`. As a matter of best practice, this variable should be set before data is imported **and never changed**. At this point in time, this type of time data in the Doris cluster will actually mean: time in the `time_zone` time zone. Example: + +```sql +mysql> select @@time_zone; ++----------------+ +| @@time_zone | ++----------------+ +| Asia/Hong_Kong | ++----------------+ +1 row in set (0.12 sec) + +mysql> insert into dtv23 values('2020-12-12 12:12:12+02:00'); --- absolute timezone is +02:00 +Query OK, 1 row affected (0.27 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 18:12:12.000 | --- converted to Doris' cluster timezone Asia/Hong_Kong. This semantics should be maintained. ++-------------------------+ +1 row in set (0.19 sec) + +mysql> set time_zone = 'America/Los_Angeles'; +Query OK, 0 rows affected (0.15 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 18:12:12.000 | --- If time_zone is modified, the time value does not change and its meaning is disturbed. ++-------------------------+ +1 row in set (0.18 sec) + +mysql> insert into dtv23 values('2020-12-12 12:12:12+02:00'); +Query OK, 1 row affected (0.17 sec) -Time zone values can be given in several formats, case-insensitive: +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 02:12:12.000 | +| 2020-12-12 18:12:12.000 | ++-------------------------+ --- the data has been misplaced. +2 rows in set (0.19 sec) +``` -* A string representing UTC offset, such as '+10:00' or '-6:00'. +In summary, the best practice for dealing with time zone issues is to: -* Standard time zone formats, such as "Asia/Shanghai", "America/Los_Angeles" +1. Confirm the timezone characterised by the cluster and set the `time_zone` before use, and do not change it afterwards. +2. Set the header `timezone` on import to match the cluster `time_zone`. +3. For absolute time, import without a time zone suffix; for time in a time zone, import with a specific time zone suffix, which will be converted to the Doris `time_zone` time zone after import. -* Abbreviated time zone formats such as MET and CTT are not supported. Because the abbreviated time zone is ambiguous in different scenarios, it is not recommended to use it. +### Daylight Saving Time -* In order to be compatible with Doris and support CST abbreviated time zone, CST will be internally transferred to "Asia/Shanghai", which is Chinese standard time zone. +The start and end times for Daylight Saving Time are taken from the [current time zone data source](#data-source) and may not necessarily correspond exactly to the actual officially recognised times for the current year's time zone location. This data is maintained by ICANN. If you need to ensure that Daylight Saving Time behaves as specified for the current year, please make sure that data source selected by Doris is the latest ICANN published time zone data, which could be downloaded a [...] -## Time zone format list +## Extended Reading -[List of TZ database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) +- [List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) +- [IANA Time Zone Database](https://www.iana.org/time-zones) +- [The tz-announce Archives](https://mm.icann.org/pipermail/tz-announce/) diff --git a/docs/en/docs/advanced/variables.md b/docs/en/docs/advanced/variables.md index 030e228f523..459f1f5601e 100644 --- a/docs/en/docs/advanced/variables.md +++ b/docs/en/docs/advanced/variables.md @@ -393,11 +393,11 @@ Note that the comment must start with /*+ and can only follow the SELECT. * `system_time_zone` - Displays the current system time zone. Cannot be changed. + Set to the current system time zone when the cluster is initialised. It cannot be changed. * `time_zone` - Used to set the time zone of the current session. The time zone has an effect on the results of certain time functions. For the time zone, see [here](./time-zone.md). + Used to set the time zone for the current session. Defaults to the value of `system_time_zone`. It affects the results of certain time functions. For more information, see the [time zone](./time-zone) documentation. * `tx_isolation` diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md index 2431a824d77..86601d58b76 100644 --- a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md +++ b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md @@ -95,7 +95,7 @@ Parameter introduction: 9. strict_mode: The user specifies whether to enable strict mode for this import. The default is off. The enable mode is -H "strict_mode: true". -10. timezone: Specify the time zone used for this import. The default is Dongba District. This parameter affects the results of all time zone-related functions involved in the import. +10. timezone: Specifies the timezone used for this import. The default is "+08:00". This variable replaces the session variable `time_zone` in this import transaction. See the section "Importing with timezones" in [Best Practice](#best-practice) for more information. 11. exec_mem_limit: Import memory limit. Default is 2GB. The unit is bytes. @@ -462,3 +462,11 @@ separated by commas. Doris also limits the number of import tasks running at the same time in the cluster, usually ranging from 10-20. Import jobs submitted after that will be rejected. +10. Importing with timezones + + Since Doris currently has no built-in time types for time zones, all `DATETIME` related types only represent absolute points in time, and do not contain time zone information, which does not change due to time zone changes in the Doris system. Therefore, for importing data with a time zone, we uniformly handle it as **converting it to data in a specific target time zone**. In the Doris system, this is the time zone represented by the session variable `time_zone`. + + In the import, on the other hand, our target timezone is specified by the parameter `timezone`, which will replace the session variable `time_zone` when timezone conversions occur, and when computing timezone-sensitive functions. Therefore, if there are no special circumstances, `timezone` should be set in the import transaction to match the `time_zone` of the current Doris cluster. This means that all time data with a time zone will be converted to that time zone. + For example, if the Doris system timezone is "+08:00", and the time column in the imported data contains two pieces of data, "2012-01-01 01:00:00Z" and "2015-12-12 12:12:12-08:00", then after we specify the timezone of the imported transaction via `-H "timezone: +08:00"` during import, both pieces of data will be converted to that timezone, resulting in the results "2012-01-01 09:00:00" and "2015-12-13 04:12:12". + + For a more detailed understanding, see [time-zone](../../../../advanced/time-zone) document. diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Types/DATETIME.md b/docs/en/docs/sql-manual/sql-reference/Data-Types/DATETIME.md index 78ea8f5499d..554c9e634a0 100644 --- a/docs/en/docs/sql-manual/sql-reference/Data-Types/DATETIME.md +++ b/docs/en/docs/sql-manual/sql-reference/Data-Types/DATETIME.md @@ -41,7 +41,36 @@ The form of printing is 'yyyy-MM-dd HH:mm:ss.SSSSSS' ### note -DATETIME supports precision up to microseconds. + DATETIME supports temporal precision up to microseconds. When parsing imported DATETIME type data using the BE side (e.g. using Stream load, Spark load, etc.), or using the FE side with the [Nereids](../../../query-acceleration/nereids) on, decimals exceeding the current precision will be **rounded**. + DATETIME reads support resolving the time zone in the format of the original DATETIME literal followed by the time zone: +```sql +<date> <time>[<timezone>] +``` -### keywords -DATETIME + For the specific supported formats for `<timezone>`, see [timezone](../../../advanced/time-zone). Note that the `DATE`, `DATEV2`, `DATETIME`, and `DATETIMEV2` types **don't** contain time zone information. For example, if an input time string "2012-12-12 08:00:00+08:00" is parsed and converted to the current time zone "+02:00", and the actual value "2012-12-12 02:00:00" is stored in the DATETIME column, the value itself will not change, no matter how much the cluster environment vari [...] + +### example + +```sql +mysql> select @@time_zone; ++----------------+ +| @@time_zone | ++----------------+ +| Asia/Hong_Kong | ++----------------+ +1 row in set (0.11 sec) + +mysql> insert into dtv23 values ("2020-12-12 12:12:12Z"), ("2020-12-12 12:12:12GMT"), ("2020-12-12 12:12:12+02:00"), ("2020-12-12 12:12:12America/Los_Angeles"); +Query OK, 4 rows affected (0.17 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 20:12:12.000 | +| 2020-12-12 20:12:12.000 | +| 2020-12-13 04:12:12.000 | +| 2020-12-12 18:12:12.000 | ++-------------------------+ +4 rows in set (0.15 sec) +``` diff --git a/docs/en/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md b/docs/en/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md index 70b23582365..75d0ca806cb 100644 --- a/docs/en/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md +++ b/docs/en/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md @@ -49,25 +49,7 @@ illustrate: > Note: > > 1. Only ADMIN users can set variables to take effect globally -> 2. The globally effective variable does not affect the variable value of the current session, but only affects the variable in the new session. - -Variables that support both the current session and the global effect include: - -- `time_zone` -- `wait_timeout` -- `sql_mode` -- `enable_profile` -- `query_timeout` -- <version since="dev" type="inline">`insert_timeout`</version> -- `exec_mem_limit` -- `batch_size` -- `allow_partition_column_nullable` -- `insert_visible_timeout_ms` -- `enable_fold_constant_by_be` - -Variables that only support global effects include: - -- `default_rowset_type` +> 2. The globally effective variable affects the current session and new sessions thereafter, but does not affect other sessions that currently exist. ### Example @@ -87,5 +69,3 @@ Variables that only support global effects include: SET, VARIABLE -### Best Practice - diff --git a/docs/zh-CN/docs/advanced/time-zone.md b/docs/zh-CN/docs/advanced/time-zone.md index 7d4dc9aa23f..847572fe610 100644 --- a/docs/zh-CN/docs/advanced/time-zone.md +++ b/docs/zh-CN/docs/advanced/time-zone.md @@ -26,36 +26,42 @@ under the License. # 时区 -Doris 支持多时区设置 +Doris 支持自定义时区设置 ## 基本概念 -Doris 内部存在多个时区相关参数 +Doris 内部存在以下两个时区相关参数: - system_time_zone : 当服务器启动时,会根据机器设置时区自动设置,设置后不可修改。 -- time_zone : 服务器当前时区,区分session级别和global级别 +- time_zone : 集群当前时区,可以修改。 ## 具体操作 -1. show variables like '%time_zone%' +1. `show variables like '%time_zone%'` 查看当前时区相关配置 -2. SET time_zone = 'Asia/Shanghai' +2. `SET [global] time_zone = 'Asia/Shanghai'` - 该命令可以设置session级别的时区,连接断开后失效 + 该命令可以设置session级别的时区,如使用`global`关键字,则Doris FE会将参数持久化,之后对所有新session生效。 -3. SET global time_zone = 'Asia/Shanghai' +## 数据来源 - 该命令可以设置global级别的时区参数,fe会将参数持久化,连接断开后不失效 +时区数据包含时区名、对应时间偏移量、夏令时变化情况等。在 BE 所在机器上,其数据来源依次为: -### 时区的影响 +1. `TZDIR` 命令返回的目录 +2. `/usr/share/zoneinfo` 目录 +3. doris BE 部署目录下生成的 `zoneinfo` 目录。来自 doris repository 下的 `resource/zoneinfo.tar.gz` -时区设置会影响对时区敏感的时间值的显示和存储。 +按顺序查找以上数据源,如果找到则使用当前项。三项均未找到,则 doris BE 将启动失败,请重新正确构建 BE 或获取发行版。 -包括NOW()或CURTIME()等时间函数显示的值,也包括show load, show backends中的时间值。 +## 时区的影响 -但不会影响 create table 中时间类型分区列的 less than 值,也不会影响存储为 date/datetime 类型的值的显示。 +### 1. 函数 + +包括`NOW()`或`CURTIME()`等时间函数显示的值,也包括`show load`, `show backends`中的时间值。 + +但不会影响 `create table` 中时间类型分区列的 less than 值,也不会影响存储为 `date/datetime` 类型的值的显示。 受时区影响的函数: @@ -65,16 +71,110 @@ Doris 内部存在多个时区相关参数 - `NOW`:返指定地时区日期时间。 - `CONVERT_TZ`:将一个日期时间从一个指定时区转换到另一个指定时区。 -## 使用限制 +### 2. 时间类型的值 + +对于`DATE`, `DATEV2`, `DATETIME`, `DATETIMEV2`类型,我们支持插入数据时对时区进行转换。 + +- 如果数据带有时区,如 "2020-12-12 12:12:12+08:00",而当前 Doris `time_zone = +00:00`,则得到实际值 "2020-12-12 04:12:12"。 +- 如果数据不带有时区,如 "2020-12-12 12:12:12",则认为该时间为绝对时间,不发生任何转换。 + +### 3. 夏令时 + +夏令时的本质是具名时区的实际时间偏移量,在一定日期内发生改变。 + +例如,`America/Los_Angeles`时区包含一次夏令时调整,起止时间为约为每年3月至11月。即,三月份夏令时开始时,`America/Los_Angeles`实际时区偏移由`-08:00`变为`-07:00`,11月夏令时结束时,又从`-07:00`变为`-08:00`。 +如果不希望开启夏令时,则应设定 `time_zone` 为 `-08:00` 而非 `America/Los_Angeles`。 + +## 使用方式 + +时区值可以使用多种格式给出,以下是 Doris 中完善支持的标准格式: + +1. 标准具名时区格式,如 "Asia/Shanghai", "America/Los_Angeles" +2. 标准偏移格式,如 "+02:30", "-10:00" +3. 缩写时区格式,当前仅支持: + 1. "GMT", "UTC",等同于 "+00:00" 时区 + 2. "CST", 等同于 "Asia/Shanghai" 时区 +4. 单字母Z,代表Zulu时区,等同于 "+00:00" 时区 + +注意:由于实现方式的不同,当前 Doris 存在部分其他格式在部分导入方式中得到了支持。**生产环境不应当依赖这些未列于此的格式,它们的行为随时可能发生变化**,请关注版本更新时的相关 changelog。 + +## 最佳实践 + +### 时区敏感数据 + +时区问题主要涉及三个影响因素: + +1. session variable `time_zone` —— 集群时区 +2. Stream Load、Broker Load 等导入时指定的 header `timezone` —— 导入时区 +3. 时区类型字面量 "2023-12-12 08:00:00+08:00" 中的 "+08:00" —— 数据时区 + +我们可以做如下理解: + +Doris 目前兼容各时区下的数据向 Doris 中进行导入。而由于 `DATETIME` 等各个时间类型本身不内含时区信息,因此 Doris 集群内的时间类型数据,可以分为两类: + +1. 绝对时间 +2. 特定时区下的时间 + +所谓绝对时间是指,它所关联的数据场景与时区无关。对于这类数据,在导入时应该不带有任何时区后缀,它们将被原样存储。对于这类时间,因为不关联到实际的时区,取其 `unix_timestamp` 等函数结果是无实际意义的。而集群 `time_zone` 的改变不会影响它的使用。 + +所谓“某个特定时区下”的时间。这个“特定时区”就是我们的 session variable `time_zone`。就最佳实践而言,该变量应当在数据导入前确定,**且不再更改**。此时 Doris 集群中的该类时间数据,其实际意义为:在 `time_zone` 时区下的时间。例如: + +```sql +mysql> select @@time_zone; ++----------------+ +| @@time_zone | ++----------------+ +| Asia/Hong_Kong | ++----------------+ +1 row in set (0.12 sec) + +mysql> insert into dtv23 values('2020-12-12 12:12:12+02:00'); --- 绝对时区为+02:00 +Query OK, 1 row affected (0.27 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 18:12:12.000 | --- 被转换为 Doris 集群时区 Asia/Hong_Kong,应当保持此语义。 ++-------------------------+ +1 row in set (0.19 sec) + +mysql> set time_zone = 'America/Los_Angeles'; +Query OK, 0 rows affected (0.15 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 18:12:12.000 | --- 如果修改 time_zone,时间值不会随之改变,其意义发生紊乱。 ++-------------------------+ +1 row in set (0.18 sec) + +mysql> insert into dtv23 values('2020-12-12 12:12:12+02:00'); +Query OK, 1 row affected (0.17 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 02:12:12.000 | +| 2020-12-12 18:12:12.000 | ++-------------------------+ --- 此时可以发现,数据已经发生错乱。 +2 rows in set (0.19 sec) +``` + +综上所述,处理时区问题最佳的实践是: -时区值可以使用几种格式给出,不区分大小写: +1. 在使用前确认该集群所表征的时区并设置 `time_zone`,在此之后不再更改。 +2. 在导入时设定 header `timezone` 同集群 `time_zone` 一致。 +3. 对于绝对时间,导入时不带时区后缀;对于有时区的时间,导入时带具体时区后缀,导入后将被转化至 Doris `time_zone` 时区。 -- 表示UTC偏移量的字符串,如'+10:00'或'-6:00' -- 标准时区格式,如"Asia/Shanghai"、"America/Los_Angeles" -- 不支持缩写时区格式,如"MET"、"CTT"。因为缩写时区在不同场景下存在歧义,不建议使用。 -- 为了兼容Doris,支持CST缩写时区,内部会将CST转移为"Asia/Shanghai"的中国标准时区 +### 夏令时 -## 时区格式列表 +夏令时的起讫时间来自于[当前时区数据源](#数据来源),不一定与当年度时区所在地官方实际确认时间完全一致。该数据由 ICANN 进行维护。如果需要确保夏令时表现与当年度实际规定一致,请保证 Doris 所选择的数据源为最新的 ICANN 所公布时区数据,下载途径见于[拓展阅读](#拓展阅读)中。 -[List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) +## 拓展阅读 +- 时区格式列表:[List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) +- IANA 时区数据库:[IANA Time Zone Database](https://www.iana.org/time-zones) +- ICANN 时区数据库:[The tz-announce Archives](https://mm.icann.org/pipermail/tz-announce/) diff --git a/docs/zh-CN/docs/advanced/variables.md b/docs/zh-CN/docs/advanced/variables.md index 2a697b01bf7..99166b8293f 100644 --- a/docs/zh-CN/docs/advanced/variables.md +++ b/docs/zh-CN/docs/advanced/variables.md @@ -389,11 +389,11 @@ SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3); - `system_time_zone` - 显示当前系统时区。不可更改。 + 集群初始化时设置为当前系统时区。不可更改。 - `time_zone` - 用于设置当前会话的时区。时区会对某些时间函数的结果产生影响。关于时区,可以参阅 [这里](./time-zone.md)。 + 用于设置当前会话的时区。默认值为 `system_time_zone` 的值。时区会对某些时间函数的结果产生影响。关于时区,可以参阅 [时区](./time-zone)文档。 - `tx_isolation` diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md index 3149002f2a0..3e3388fa1be 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md @@ -95,7 +95,7 @@ curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_h 9. strict_mode: 用户指定此次导入是否开启严格模式,默认为关闭。开启方式为 -H "strict_mode: true"。 -10. timezone: 指定本次导入所使用的时区。默认为东八区。该参数会影响所有导入涉及的和时区有关的函数结果。 +10. timezone: 指定本次导入所使用的时区。默认为东八区。在本次导入事务中,该变量起到了替代session variable `time_zone` 的作用。详情请见[最佳实践](#best-practice)中“涉及时区的导入”一节。 11. exec_mem_limit: 导入内存限制。默认为 2GB。单位为字节。 @@ -123,7 +123,7 @@ curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_h 16. merge_type: 数据的合并类型,一共支持三种类型APPEND、DELETE、MERGE 其中,APPEND是默认值,表示这批数据全部需要追加到现有数据中,DELETE 表示删除与这批数据key相同的所有行,MERGE 语义 需要与delete 条件联合使用,表示满足delete 条件的数据按照DELETE 语义处理其余的按照APPEND 语义处理, 示例:`-H "merge_type: MERGE" -H "delete: flag=1"` -17. delete: 仅在 MERGE下有意义, 表示数据的删除条件 +17. delete: 仅在 MERGE下有意义,表示数据的删除条件 18. function_column.sequence_col: 只适用于UNIQUE_KEYS,相同key列下,保证value列按照source_sequence列进行REPLACE, source_sequence可以是数据源中的列,也可以是表结构中的一列。 @@ -446,3 +446,12 @@ curl --location-trusted -u user:passwd [-H ""...] -T data.file -XPUT http://fe_h Stream Load 适合导入几个GB以内的数据,因为数据为单线程传输处理,因此导入过大的数据性能得不到保证。当有大量本地数据需要导入时,可以并行提交多个导入任务。 Doris 同时会限制集群内同时运行的导入任务数量,通常在 10-20 个不等。之后提交的导入作业会被拒绝。 + +10. 涉及时区的导入 + + 由于 Doris 目前没有内置时区的时间类型,所有 `DATETIME` 相关类型均只表示绝对的时间点,而不包含时区信息,不因 Doris 系统时区变化而发生变化。因此,对于带时区数据的导入,我们统一的处理方式为**将其转换为特定目标时区下的数据**。在 Doris 系统中,即 session variable `time_zone` 所代表的时区。 + + 而在导入中,我们的目标时区通过参数 `timezone` 指定,该变量在发生时区转换、运算时区敏感函数时将会替代 session variable `time_zone`。因此,如果没有特殊情况,在导入事务中应当设定 `timezone` 与当前 Doris 集群的 `time_zone` 一致。此时意味着所有带时区的时间数据,均会发生向该时区的转换。 + 例如,Doris 系统时区为 "+08:00",导入数据中的时间列包含两条数据,分别为 "2012-01-01 01:00:00Z" 和 "2015-12-12 12:12:12-08:00",则我们在导入时通过 `-H "timezone: +08:00"` 指定导入事务的时区后,这两条数据都会向该时区发生转换,从而得到结果 "2012-01-01 09:00:00" 和 "2015-12-13 04:12:12"。 + + 更详细的理解,请参阅[时区](../../../../advanced/time-zone)文档。 diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/DATETIME.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/DATETIME.md index f149bd1ae49..d819db31baa 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/DATETIME.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Types/DATETIME.md @@ -41,7 +41,40 @@ DATETIMEV2 ### note - DATETIME支持了最多到微秒的时间精度。 + DATETIME 支持了最多到微秒的时间精度。在使用 BE 端解析导入的 DATETIME 类型数据时(如使用Stream load、Spark load等),或开启[新优化器](../../../query-acceleration/nereids)后在 FE 端解析 DATETIME 类型数据时,将会对超出当前精度的小数进行**四舍五入**。 + + DATETIME 读入时支持解析时区,格式为原本 DATETIME 字面量后紧贴时区: +```sql +<date> <time>[<timezone>] +``` + + 关于`<timezone>`的具体支持格式,请见[时区](../../../advanced/time-zone)。需要注意的是,`DATE`, `DATEV2`, `DATETIME`, `DATETIMEV2` 类型均**不**包含时区信息。例如,一个输入的时间字符串 "2012-12-12 08:00:00+08:00" 经解析并转换至当前时区 "+02:00",得到实际值 "2012-12-12 02:00:00" 后存储于 DATETIME 列中,则之后无论本集群环境变量如何改变,该值本身都不会发生变化。 + +### example + +```sql +mysql> select @@time_zone; ++----------------+ +| @@time_zone | ++----------------+ +| Asia/Hong_Kong | ++----------------+ +1 row in set (0.11 sec) + +mysql> insert into dtv23 values ("2020-12-12 12:12:12Z"), ("2020-12-12 12:12:12GMT"), ("2020-12-12 12:12:12+02:00"), ("2020-12-12 12:12:12America/Los_Angeles"); +Query OK, 4 rows affected (0.17 sec) + +mysql> select * from dtv23; ++-------------------------+ +| k0 | ++-------------------------+ +| 2020-12-12 20:12:12.000 | +| 2020-12-12 20:12:12.000 | +| 2020-12-13 04:12:12.000 | +| 2020-12-12 18:12:12.000 | ++-------------------------+ +4 rows in set (0.15 sec) +``` ### keywords diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md b/docs/zh-CN/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md index b8c7963da35..c049215e8f1 100644 --- a/docs/zh-CN/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md +++ b/docs/zh-CN/docs/sql-manual/sql-reference/Database-Administration-Statements/SET-VARIABLE.md @@ -49,25 +49,7 @@ SET variable_assignment [, variable_assignment] ... > 注意: > > 1. 只有 ADMIN 用户可以设置变量的全局生效 -> 2. 全局生效的变量不影响当前会话的变量值,仅影响新的会话中的变量。 - -既支持当前会话生效又支持全局生效的变量包括: - -- `time_zone` -- `wait_timeout` -- `sql_mode` -- `enable_profile` -- `query_timeout` -- <version since="dev" type="inline">`insert_timeout`</version> -- `exec_mem_limit` -- `batch_size` -- `allow_partition_column_nullable` -- `insert_visible_timeout_ms` -- `enable_fold_constant_by_be` - -只支持全局生效的变量包括: - -- `default_rowset_type` +> 2. 全局生效的变量影响当前会话和此后的新会话,不影响当前已经存在的其他会话。 ### Example @@ -87,5 +69,3 @@ SET variable_assignment [, variable_assignment] ... SET, VARIABLE -### Best Practice - diff --git a/regression-test/data/datatype_p0/datetimev2/test_tz_streamload.out b/regression-test/data/datatype_p0/datetimev2/test_tz_streamload.out index cc87e0fdfa0..c7fd50e3bb4 100644 --- a/regression-test/data/datatype_p0/datetimev2/test_tz_streamload.out +++ b/regression-test/data/datatype_p0/datetimev2/test_tz_streamload.out @@ -15,4 +15,7 @@ 3 2023-08-17T17:41:18 4 2023-08-17T14:41:18 5 2023-08-17T09:41:18 +6 2023-08-18T01:41:18 +7 2023-08-17T17:41:18 +8 2023-08-17T19:41:18 diff --git a/regression-test/data/datatype_p0/datetimev2/test_tz_streamload2.csv b/regression-test/data/datatype_p0/datetimev2/test_tz_streamload2.csv index 9d94561af4c..22490c28b8d 100644 --- a/regression-test/data/datatype_p0/datetimev2/test_tz_streamload2.csv +++ b/regression-test/data/datatype_p0/datetimev2/test_tz_streamload2.csv @@ -2,4 +2,7 @@ 2,2023-08-17T01:41:18uTc 3,2023-08-17T01:41:18UTC 4,2023-08-17T01:41:18UTC+3 -5,2023-08-17T01:41:18Asia/Shanghai \ No newline at end of file +5,2023-08-17T01:41:18Asia/Shanghai +6,2023-08-17T01:41:18America/Los_Angeles +7,2023-08-17T01:41:18GMT +8,2023-08-17T01:41:18GMT-2 \ No newline at end of file --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org