[
https://issues.apache.org/jira/browse/SPARK-51162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Max Gekk updated SPARK-51162:
-----------------------------
Summary: SPIP: Add the TIME data type (was: [WIP] SPIP: Add the TIME data
type)
> SPIP: Add the TIME data type
> ----------------------------
>
> Key: SPARK-51162
> URL: https://issues.apache.org/jira/browse/SPARK-51162
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: SPIP
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely
> no jargon.*
> Add a new data type *TIME* to Spark SQL that represents a time of day with
> the fields hour, minute, and second, with precision up to microseconds. All
> operations over the type are performed without taking any time zone into
> account. The new data type should conform to the type *TIME\(n\) WITHOUT TIME ZONE* defined by the SQL standard.
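> As an illustration of the intended precision, a TIME value can be modeled as
> microseconds elapsed since midnight. The sketch below uses plain Python with
> hypothetical helper names; it is only an illustration of the semantics, not
> part of any Spark API.

```python
from datetime import time

MICROS_PER_SECOND = 1_000_000

def time_to_micros(t: time) -> int:
    """Encode a time of day (hour, minute, second, microsecond)
    as microseconds since midnight. Illustrative helper, not a Spark API."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return seconds * MICROS_PER_SECOND + t.microsecond

def micros_to_time(micros: int) -> time:
    """Decode microseconds since midnight back into a time of day."""
    seconds, microsecond = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)
```

> The two helpers round-trip any time of day without loss up to microsecond
> precision, which is the precision stated above.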
> *Q2. What problem is this proposal NOT designed to solve?*
> This proposal does not cover the TIME type with time zone defined by the SQL
> standard: *TIME\(n\) WITH TIME ZONE*.
> *Q3. How is it done today, and what are the limits of current practice?*
> The TIME type can be emulated via the TIMESTAMP_NTZ data type by setting the
> date part to some constant value such as 1970-01-01, 0001-01-01, or
> 0000-00-00 (though the last is outside the supported range of dates).
> Although the type can be emulated via TIMESTAMP_NTZ, Spark SQL cannot
> recognize it in data sources, and for instance cannot load TIME values
> from parquet files.
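> The emulation described above can be sketched in plain Python (the helper
> names are illustrative, not Spark APIs): the date part is pinned to a
> constant such as 1970-01-01 so that only the time-of-day fields carry
> information.

```python
from datetime import datetime, time

# Constant date part used for the emulation (one of the choices above).
EPOCH_DATE = (1970, 1, 1)

def emulate_time(t: time) -> datetime:
    """Wrap a time of day into a TIMESTAMP_NTZ-like datetime whose
    date part is pinned to 1970-01-01. Illustrative helper only."""
    return datetime(*EPOCH_DATE, t.hour, t.minute, t.second, t.microsecond)

def extract_time(ts: datetime) -> time:
    """Recover the original time of day from the emulated timestamp."""
    return ts.time()
```

> The round trip preserves the time fields, but as noted above, a data source
> reading such values sees only a timestamp, not a TIME value.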
> *Q4. What is new in your approach and why do you think it will be successful?*
> The approach is not new, and we have a clear picture of how to split the work
> into sub-tasks based on our experience of adding the new types ANSI intervals
> and TIMESTAMP_NTZ.
> *Q5. Who cares? If you are successful, what difference will it make?*
> The new type simplifies migrations to Spark SQL from other DBMSs such as
> PostgreSQL, Snowflake, Google SQL, Amazon Redshift, Teradata, and DB2. Such
> users don't have to rewrite their SQL code to emulate the TIME type. The new
> functionality also benefits existing Spark SQL users who need to load data
> with TIME values that were stored by other systems.
> *Q6. What are the risks?*
> Additional handling of the new type in operators, expressions, and data
> sources can cause performance regressions. This risk can be mitigated by
> developing time benchmarks in parallel with supporting the new type in
> different places in Spark SQL.
>
> *Q7. How long will it take?*
> In total it might take around *9 months*. The estimate is based on similar
> tasks: ANSI intervals
> ([SPARK-27790|https://issues.apache.org/jira/browse/SPARK-27790]) and
> TIMESTAMP_NTZ
> ([SPARK-35662|https://issues.apache.org/jira/browse/SPARK-35662]). We can
> split the work into the following functional blocks:
> # Base functionality - *3 weeks*
> Add new type TimeType, forming/parsing time literals, type constructor, and
> external types.
> # Persistence - *3.5 months*
> The ability to create tables of the TIME type, read/write from/to Parquet and
> other built-in data sources, partitioning, stats, and predicate push down.
> # Time operators - *2 months*
> Arithmetic ops, field extract, sorting, and aggregations.
> # Clients support - *1 month*
> JDBC, Hive, Thrift server, Spark Connect
> # PySpark integration - *1 month*
> DataFrame support, pandas API, Python UDFs, Arrow column vectors
> # Docs + testing/benchmarking - *1 month*
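> For the "Time operators" block, one plausible semantics for arithmetic on a
> time-zone-free time of day is wrap-around at midnight. The sketch below is a
> hedged illustration in plain Python (illustrative names, not a committed
> design or a Spark API):

```python
from datetime import time

MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000

def add_micros(t: time, delta_micros: int) -> time:
    """Add an interval (in microseconds, possibly negative) to a time of
    day, wrapping around midnight. Illustrative semantics only."""
    total = (t.hour * 3600 + t.minute * 60 + t.second) * 1_000_000 + t.microsecond
    # Python's % always yields a non-negative result, so negative
    # intervals wrap backwards past midnight correctly.
    total = (total + delta_micros) % MICROS_PER_DAY
    seconds, microsecond = divmod(total, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)
```

> For example, adding one hour to 23:30 yields 00:30 under this semantics;
> whether Spark adopts wrap-around or raises an overflow error is a design
> decision for the sub-tasks above.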
> *Q8. What are the mid-term and final “exams” to check for success?*
> The mid-term is in 4 months: basic functionality, reading/writing the new
> type from/to built-in data sources, and basic time operations such as
> arithmetic ops and casting. The final "exam" is to support the same
> functionality as the other date-time types: TIMESTAMP_NTZ, DATE, TIMESTAMP.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)