Nikola Mandic created SPARK-47617:
-------------------------------------
Summary: Add TPC-DS testing infrastructure for collations
Key: SPARK-47617
URL: https://issues.apache.org/jira/browse/SPARK-47617
Project: Spark
Issue Type: Task
Components: SQL
Affects Versions: 4.0.0
Reporter: Nikola Mandic
As collation support grows across all SQL features and new collation types are
added, we need a reliable testing model that covers as many standard SQL
capabilities as possible.
We can utilize the TPC-DS testing infrastructure already present in Spark. The
idea is to vary the string columns of the TPC-DS tables by applying multiple
collations with different ordering rules and case sensitivity, producing new
tables. For certain batches of collations, these tables should yield the same
results against the predefined TPC-DS queries. For example, when comparing
query runs on a table whose columns are collated first as UTF8_BINARY and then
as UTF8_BINARY_LCASE, we should get the same results after converting them to
lowercase.
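The equivalence described above can be illustrated with a small, Spark-free
sketch. This is only one possible reading of the check, not the proposed
implementation: the toy "query" (GROUP BY city, COUNT(*)), the key functions
binary_key and lcase_key, and the helper normalize are all hypothetical
stand-ins for a TPC-DS query and for the two collations.

```python
def binary_key(s: str) -> str:
    # Stand-in for UTF8_BINARY: strings compare exactly as written.
    return s


def lcase_key(s: str) -> str:
    # Stand-in for UTF8_BINARY_LCASE: strings compare by their lowercase form.
    return s.lower()


def count_by_city(rows, collation_key):
    """Toy stand-in for a query: GROUP BY city, COUNT(*)."""
    groups = {}
    for city in rows:
        k = collation_key(city)
        # Keep the first-seen value as the group's representative.
        rep, n = groups.get(k, (city, 0))
        groups[k] = (rep, n + 1)
    return list(groups.values())


def normalize(result):
    # Lowercase the string output and sort, so results can be compared.
    return sorted((rep.lower(), n) for rep, n in result)


rows = ["Oslo", "OSLO", "oslo", "Bergen", "bergen"]
lowered_rows = [r.lower() for r in rows]

# Binary collation over lowercased data vs. lowercase collation over raw data.
binary_on_lowered = normalize(count_by_city(lowered_rows, binary_key))
lcase_on_original = normalize(count_by_city(rows, lcase_key))
# Binary collation over raw data, shown for contrast.
binary_on_original = normalize(count_by_city(rows, binary_key))

assert binary_on_lowered == lcase_on_original
```

In this sketch, binary collation over the raw mixed-case data forms five
single-row groups instead of two, which is why the comparison only makes sense
for matching batches of collation and case conversion.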
Introduce a new query suite that tests the described behavior with the
available collations (utf8_binary and unicode) combined with case conversions
(lowercase, uppercase, and randomized case for fuzz testing).
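As a rough illustration of the case conversions listed above, the sketch below
derives lowercase, uppercase, and randomized-case variants of a string column.
The names randomize_case and case_variants are hypothetical, and the fixed
seed is an assumption made so that a failing run could be reproduced. It uses
ASCII sample values; some non-ASCII characters do not round-trip through
upper/lower conversion.

```python
import random


def randomize_case(s: str, rng: random.Random) -> str:
    # Flip each character to upper or lower case with equal probability.
    return "".join(c.upper() if rng.random() < 0.5 else c.lower() for c in s)


def case_variants(values, seed=0):
    # A seeded generator keeps the randomized variant reproducible.
    rng = random.Random(seed)
    return {
        "lowercase": [v.lower() for v in values],
        "uppercase": [v.upper() for v in values],
        "random": [randomize_case(v, rng) for v in values],
    }


variants = case_variants(["Oslo", "Bergen"])

# Every variant must collapse to the same data once lowercased.
for name, column in variants.items():
    assert [v.lower() for v in column] == ["oslo", "bergen"], name
```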