This is an automated email from the ASF dual-hosted git repository. xxyu pushed a commit to branch doc5.0 in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 69ee8500bb27e6585f877772b2da0f6921e5e949 Author: Mukvin <boyboys...@163.com> AuthorDate: Tue Aug 16 14:33:59 2022 +0800 KYLIN-5221 add configuration docs
---
 website/docs/configuration/configuration.md        | 220 +++++++++++++++++++++
 website/docs/configuration/hadoop_queue_config.md  |  53 +++++
 website/docs/configuration/https.md                |  74 +++++++
 .../docs/configuration/images/hadoop_queue/1.png   | Bin 0 -> 126858 bytes
 .../docs/configuration/images/hadoop_queue/2.png   | Bin 0 -> 44273 bytes
 .../docs/configuration/images/hadoop_queue/3.png   | Bin 0 -> 183045 bytes
 .../configuration/images/spark_executor_max.jpg    | Bin 0 -> 71527 bytes
 .../configuration/images/spark_executor_min.jpg    | Bin 0 -> 35309 bytes
 .../images/spark_executor_original.jpg             | Bin 0 -> 53356 bytes
 website/docs/configuration/intro.md                |  35 +++-
 website/docs/configuration/log_rotate.md           |  36 ++++
 website/docs/configuration/query_cache.md          |  72 +++++++
 .../docs/configuration/spark_dynamic_allocation.md |  93 +++++++++
 website/docs/configuration/spark_rpc_encryption.md |  43 ++++
 14 files changed, 619 insertions(+), 7 deletions(-)

diff --git a/website/docs/configuration/configuration.md b/website/docs/configuration/configuration.md new file mode 100644 index 0000000000..979e92303f --- /dev/null +++ b/website/docs/configuration/configuration.md @@ -0,0 +1,220 @@
---
title: Basic Configuration
language: en
sidebar_label: Basic Configuration
pagination_label: Basic Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - Basic Configuration
draft: true
last_update:
  date: 08/16/2022
---

This chapter introduces some common configurations. The main contents are as follows:

- [Common Configuration](#conf)
- [Configuration Override](#override)
- [JVM Configuration Setting](#jvm)
- [Kylin Warm Start after Config Parameters Modified](#update)
- [Recommended Configurations for Production](#min_prod)
- [Spark-related Configuration](#spark)
- [Spark Context Canary Configuration](#spark_canary)


### <span id="conf">Common Configuration</span>

The file **kylin.properties** contains some of the most important configurations in Kylin. This section gives detailed explanations of some common properties.

| Properties | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| server.port | This parameter specifies the port used by the Kylin service. The default is `7070`. |
| server.address | This parameter specifies the address used by the Kylin service. The default is `0.0.0.0`. |
| kylin.env.ip-address | When the node where the Kylin service is located has an IPv6 address, you can specify an IPv4 address through this configuration item. The default is `0.0.0.0` |
| kylin.env.hdfs-working-dir | The working path of the Kylin instance on HDFS is specified by this property. The default value is `/kylin` on HDFS, with the table name in the metadata path as the sub-directory. For example, suppose the metadata path is `kylin_metadata@jdbc`, the HDFS default path should be `/kylin/kylin_metadata`. Please make sure the user running the Kylin instance has read/write permissions on that directory. |
| kylin.env.zookeeper-connect-string | This parameter specifies the address of ZooKeeper. There is no default value. **This parameter must be manually configured before starting a Kylin instance**, otherwise Kylin will not start. |
| kylin.metadata.url | The Kylin metadata path is specified by this property. The default value is the `kylin_metadata` table in PostgreSQL, while users can customize it to store metadata in any other table. When deploying multiple Kylin instances on a cluster, it's necessary to specify a unique path for each of them to guarantee isolation among them. For example, the value of this property for the Production instance could be `kylin_metadata_prod`, whi [...] |
| kylin.metadata.ops-cron | This parameter specifies the cron expression of the scheduled task for metadata backup and garbage cleanup. The default value is `0 0 0 * * *`. |
| kylin.metadata.audit-log.max-size | This parameter specifies the maximum number of rows in the audit log. The default value is `500000`. |
| kylin.metadata.compress.enabled | This parameter specifies whether to compress the contents of metadata and the audit log. The default value is `true`. |
| kylin.server.mode | There are three modes in Kylin: `all`, `query` and `job`; you can change it by modifying this property. The default value is `all`. In `query` mode, the instance only serves queries. In `job` mode, it can run building jobs and execute metadata operations, but cannot serve queries. `all` mode can handle both. |
| kylin.web.timezone | The time zone used by the Kylin REST service is specified by this property. The default value is the time zone of the local machine's system. You can change it according to the requirements of your application. For more details, please refer to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, the `TZ database name` column. |
| kylin.web.export-allow-admin | Whether to allow Admin users to export query results to a CSV file; the default is true. |
| kylin.web.export-allow-other | Whether to allow non-Admin users to export query results to a CSV file; the default is true. |
| kylin.web.stack-trace.enabled | Whether the error popup window displays detailed information. The default value is false. Introduced in: 4.1.1 |
| kylin.env | The usage of the Kylin instance is specified by this property. Optional values include `DEV`, `PROD` and `QA`; `PROD` is the default. In `DEV` mode some developer functions are enabled. |
| kylin.circuit-breaker.threshold.project | The maximum number of projects allowed to be created; the default value is `100` |
| kylin.circuit-breaker.threshold.model | The maximum number of models allowed to be created in a single project; the default value is `100` |
| kylin.query.force-limit | Some BI tools always send queries like `select * from fact_table`, but the process may get stuck if the table is extremely large. A `LIMIT` clause helps in this case, and setting the value of this property to a positive integer makes Kylin append a `LIMIT` clause if there isn't one. For instance, with the value `1000`, the query `select * from fact_table` will be transformed to `select * from fact_table limit 1000`. This configuration ca [...] |
| kylin.query.max-result-rows | This property specifies the maximum number of rows that a query can return. It applies to all ways of executing queries, including the Web UI, Asynchronous Query, JDBC Driver and ODBC Driver. This configuration can be overridden at the **project** level. For this property to take effect, it needs to be a positive integer less than or equal to 2147483647. The default value is 0, meaning no limit on the result. <br />Below [...] |
| kylin.query.init-sparder-async | The default value is `true`, which means that sparder starts asynchronously, so the Kylin web service and the Spark query service start separately. If set to `false`, the Kylin web service will only be available after the sparder service has started. |
| kylin.circuit-breaker.threshold.query-result-row-count | This parameter is the maximum number of rows in the result set returned by a SQL query. The default is `2000000`. If the maximum number of rows is exceeded, the backend throws an exception. |
| kylin.query.timeout-seconds | Query timeout, in seconds. The default value is `300` seconds. If query execution exceeds 300 seconds, an error is returned: `Query timeout after: 300s`. The minimum value is `30` seconds; a configured value of less than `30` seconds also takes effect as `30` seconds. |
| kylin.query.convert-create-table-to-with | Some BI software sends Create Table statements to create a permanent or temporary table in the data source. If this setting is set to `true`, the create table statement in the query will be converted to a with statement; when a later query utilizes the table created in the previous step, the create table statement will be converted into a subquery, which can hit an index if there is one to serve the query. |
| kylin.query.replace-count-column-with-count-star | The default value is `false`, which means that a COUNT(column) measure will hit a model only after it has been set up in the model. If a COUNT(column) measure is called in SQL without having been set up in the model, this parameter can be set to `true`; the system will then use a COUNT(constant) measure to approximate the COUNT(column) measure. The COUNT(constant) measure takes all Null values into calculation. |
| kylin.query.match-partial-inner-join-model | The default value is `false`, which means that a multi-table inner join model does not support SQL that matches the inner join part only partially. For example: assume there are three tables A, B, and C. By default, the SQL `A inner join B` can only be answered by the model A inner join B or the model A inner join B left join C; the model A inner join B inner join C cannot answer it. If this parameter is set to [...] |
| kylin.query.match-partial-non-equi-join-model | The default is `false`. Currently, if the model contains non-equi joins, a query can be matched with the model only if it contains all the non-equi joins defined in the model. If the config is set to `true`, the query is allowed to contain only part of the non-equi joins. E.g. model: A left join B non-equi left join C. When the config is set to `false`, only a query with the complete join relations of the model can be matched wi [...] |
| kylin.query.use-tableindex-answer-non-raw-query | The default value is `false`, which means that aggregate queries can only be answered with aggregate indexes. If the parameter is set to `true`, the system allows the corresponding table index to be used to answer aggregate queries. |
| kylin.query.layout.prefer-aggindex | The default value is `true`, which means that when index comparison selections are made between aggregate indexes and detail indexes, aggregate indexes are preferred. |
| kylin.storage.columnar.spark-conf.spark.yarn.queue | This property specifies the YARN queue used by the Spark query cluster. |
| kylin.storage.columnar.spark-conf.spark.master | Spark deployment is normally divided into **Spark on YARN**, **Spark on Mesos**, and **standalone**. We usually use Spark on YARN by default. This property enables Kylin to use a standalone deployment, which submits jobs to the specified spark-master-url. |
| kylin.job.retry | This property specifies the number of automatic retries for error jobs. The default value is 0, which means a job will not automatically retry when it is in error. Set a value greater than 0 to enable this property; it applies to every step within a job and is reset when that step finishes. |
| kylin.job.retry-interval | This property specifies the time interval for retrying an error job; the default value is `30000` ms. This property is valid only when the job retry property is set to 1 or above. |
| kylin.job.max-concurrent-jobs | Kylin has a default concurrency limit of **20** for jobs in a single project. If the running jobs have already reached the limit, a newly submitted job will be added to the job queue. Once a running job finishes, jobs in the queue are scheduled using a FIFO mechanism. |
| kylin.scheduler.schedule-job-timeout-minute | Job execution timeout period. The default is `0` minutes. This property is valid only when it is set to 1 or above. When job execution exceeds the timeout period, the job will change to the Error status. |
| kylin.garbage.storage.cuboid-layout-survival-time-threshold | This property specifies the threshold for invalid files on HDFS. When the command line tool is executed to clean up garbage, invalid files on HDFS that exceed this threshold will be cleaned up. The default value is `7d`, which means 7 days. Invalid files on HDFS include expired indexes, expired snapshots, expired dictionaries, etc. At the same time, indexes with lower cost performance will be cleaned up according to the in [...] |
| kylin.garbage.storage.executable-survival-time-threshold | This property specifies the threshold for expired jobs. The metadata of jobs that have exceeded this threshold and have been completed will be cleaned up. The default is `30d`, which means 30 days. |
| kylin.storage.quota-in-giga-bytes | This property specifies the storage quota for each project. The default is `10240`, in gigabytes. |
| kylin.influxdb.address | This property specifies the address of InfluxDB. The default is `localhost:8086`. |
| kylin.influxdb.username | This property specifies the username of InfluxDB. The default is `root`. |
| kylin.influxdb.password | This property specifies the password of InfluxDB. The default is `root`. |
| kylin.metrics.influx-rpc-service-bind-address | If the property `# bind-address = "127.0.0.1:8088"` was modified in InfluxDB's configuration file, the value of this property should be modified at the same time. This parameter influences whether the diagnostic package can contain system metrics. |
| kylin.security.user-password-encoder | The encryption algorithm for user passwords. The default is the BCrypt algorithm. If you want to use the Pbkdf2 algorithm, configure the value to <br />org.springframework.security.crypto.<br />password.Pbkdf2PasswordEncoder. <br />Note: Please do not change this configuration item arbitrarily, otherwise users may not be able to log in |
| kylin.web.session.secure-random-create-enabled | The default is false, meaning the sessionId is generated from a UUID. When enabled, the sessionId is generated from a JDK SecureRandom random number and MD5-encrypted. Before enabling it, please use the upgrade session table tool to upgrade the session table first, otherwise users will get an error when logging in. |
| kylin.web.session.jdbc-encode-enabled | The default is false: the sessionId is saved directly into the database without encryption; after this option is enabled, the sessionId will be encrypted before being saved to the database. Note: if the encryption function is configured, please use the upgrade session table tool to upgrade the session table first, otherwise users will get an error when logging in. |
| kylin.server.cors.allow-all | Allow all cross-origin requests (CORS): `true` allows any CORS request, `false` refuses all CORS requests. The default is `false`. |
| kylin.server.cors.allowed-origin | Specifies a whitelist of domains allowed for cross-origin requests; the default is all domain names (*). Use commas (,) to separate multiple domain names. This parameter is valid when `kylin.server.cors.allow-all`=true |
| kylin.storage.columnar.spark-conf.spark.driver.host | Configure the IP of the node where Kylin is located |
| kylin.engine.spark-conf.spark.driver.host | Configure the IP of the node where Kylin is located |
| kylin.engine.sanity-check-enabled | Whether Kylin performs a sanity check during index building. The default value is `true` |
| kylin.job.finished-notifier-url | When a building job is completed, its status information will be sent to this URL via an HTTP request |
| kylin.diag.obf.level | The desensitization level of the diagnostic package. `RAW` means no desensitization; `OBF` means desensitization. Configuring `OBF` will desensitize sensitive information such as usernames and passwords in the `kylin.properties` file (please refer to the [Diagnosis Kit Tool](../operations/cli_tool/diagnosis.md) chapter). The default value is `OBF`. |
| kylin.diag.task-timeout | The subtask timeout for the diagnostic package; the default value is 3 minutes |
| kylin.diag.task-timeout-black-list | Diagnostic package subtask timeout blacklist (values separated by commas). The subtasks in the blacklist are exempt from the timeout settings and run until they finish. The default value is `METADATA`, `LOG` <br />The optional values are as below: <br />METADATA, AUDIT_LOG, CLIENT, JSTACK, CONF, HADOOP_CONF, BIN, HADOOP_ENV, CATALOG_INFO, SYSTEM_METRICS, MONITOR_METRICS, SPARK_LOGS, SPARDER_HISTORY, KG_LOGS, L [...] |
| kylin.query.queryhistory.max-size | The total number of query history records kept across all projects; the default is 10000000 |
| kylin.query.queryhistory.project-max-size | The number of query history records retained for a single project; the default is 1000000 |
| kylin.query.queryhistory.survival-time-threshold | How long query history records are retained for all projects; the default is 30d, which means 30 days. Other units are also supported: millisecond (ms), microsecond (us), minute (m or min), hour (h) |
| kylin.query.engine.spark-scheduler-mode | The scheduling strategy of the query engine; the default is FAIR (Fair scheduler). The optional value is SJF (Smallest Job First scheduler). Other values are illegal, in which case the FAIR strategy is used as the default. |
| kylin.query.realization.chooser.thread-core-num | The number of core threads of the model matching thread pool in the query engine; the default is 5. Note that when the number of core threads is set to less than 0, this thread pool becomes unavailable, which will make the entire query engine unavailable |
| kylin.query.realization.chooser.thread-max-num | The maximum number of threads in the model matching thread pool of the query engine; the default is 50. Note that when the maximum number of threads is set to less than or equal to 0, or less than the number of core threads, this thread pool becomes unavailable, which will make the entire query engine unavailable |
| kylin.query.memory-limit-during-collect-mb | Limits the memory usage when collecting query results in Kylin; the unit is megabytes, and the default is 5400 MB |
| kylin.query.auto-model-view-enabled | Automatically generate views for models. When this config is on, a view is generated for each model and users can query that view. The view is named {project_name}.{model_name} and contains all the tables defined in the model and all the columns referenced by the dimensions and measures of the tables. |
| kylin.streaming.job.max-concurrent-jobs | Only for Kylin Realtime. The maximum number of tasks used for ingesting real-time data and merging segments. |
| kylin.streaming.kafka-conf.maxOffsetsPerTrigger | Only for Kylin Realtime. The maximum number of records ingested at one time; -1 stands for no limitation. |
| kylin.streaming.job-status-watch-enabled | Only for Kylin Realtime. Whether to enable task monitoring; "true" stands for enabled and "false" stands for disabled. |
| kylin.streaming.job-retry-enabled | Only for Kylin Realtime. Whether to retry after a task fails; "true" stands for enabled and "false" stands for disabled. |
| kylin.streaming.job-retry-interval | Only for Kylin Realtime. The interval, in minutes, before a failed task is retried. |
| kylin.streaming.job-retry-max-interval | Only for Kylin Realtime. The maximum interval, in minutes, between task retries. |
| kylin.engine.streaming-metrics-enabled | Only for Kylin Realtime. Whether to enable task metrics monitoring; "true" stands for enabled and "false" stands for disabled. |
| kylin.engine.streaming-segment-merge-interval | Only for Kylin Realtime. The interval, in seconds, between segment merges. |
| kylin.engine.streaming-segment-clean-interval | Only for Kylin Realtime. The number of hours after merging before the merged segments are cleaned up. |
| kylin.engine.streaming-segment-merge-ratio | Only for Kylin Realtime. The ratio that the total size of segments must reach to trigger segment merging. |
| kylin.streaming.jobstats.survival-time-threshold | Only for Kylin Realtime. How many days the real-time data statistics are kept. The default value is 7. |
| kylin.streaming.spark-conf.spark.yarn.queue | Only for Kylin Realtime. The name of the YARN queue used exclusively by real-time tasks. |
| kylin.streaming.spark-conf.spark.port.maxRetries | Only for Kylin Realtime. The number of retries when the port is occupied. |
| kylin.streaming.kafka.starting-offsets | Only for Kylin Realtime. The offset from which to consume Kafka messages. The default value is 'earliest'. |
| kylin.storage.columnar.spark-conf.spark.sql.view-truncate-enabled | Allow Spark views to lose precision when loading tables and running queries; the default value is false |
| kylin.engine.spark-conf.spark.sql.view-truncate-enabled | Allow Spark views to lose precision during builds; the default value is false |
| kylin.source.hive.databases | Configure the list of databases loaded by the data source. There is no default value. It can be configured at both the system level and the project level; the project level takes precedence over the system level. |
| kylin.query.spark-job-trace-enabled | Enable the Spark job tracking log. Records additional information about Spark: submission waiting time, execution waiting time, execution time and result acquisition time are displayed in the timeline of the query history. |
| kylin.query.spark-job-trace-timeout-ms | Only for the Spark job tracking log. The longest waiting time for query history; if it is exceeded, the Spark job tracking log will not be recorded. |
| kylin.query.spark-job-trace-cache-max | Only for the Spark job tracking log. The maximum number of Spark job tracking log caches. The eviction strategy is LRU; the TTL is kylin.query.spark-job-trace-timeout-ms + 20000 ms. |
| kylin.query.spark-job-trace-parallel-max | Only for the Spark job tracking log. The concurrency limit of Spark job tracking log processing; "additional information about Spark" will be lost if the concurrency exceeds this limit. |
| kylin.query.replace-dynamic-params-enabled | Whether to enable dynamic parameter binding for JDBC queries; the default value is false, which means it is not enabled. For more, please refer to [Kylin JDBC Driver](#TODO) |
| kylin.second-storage.route-when-ch-fail | When tiered storage is enabled, whether a query matching the base table index is answered only by tiered storage. The default value is `0`, which means that when tiered storage cannot answer, the query is answered by the base table index on HDFS; configured as `1`, it indicates that when the tiered storage cannot answer the query, the query will be pushed down; configured as `2`, it indicates that the query fails when the tiered storage c [...] |
| kylin.second-storage.query-pushdown-limit | When query result sets are large, the performance of queries using tiered storage may degrade. This parameter indicates whether to use the limit statement to restrict whether detailed queries use tiered storage; the default value is `0`, which means it is not enabled. If you need to enable it, you can configure a specific value. For example, if it is configured as `100000`, it means that a detailed query with the value afte [...] |

### <span id="override">Configuration Override</span>

There are many configurations available in the file `kylin.properties`. If you need to modify several of them, you can create a new file named `kylin.properties.override` in the `$KYLIN_HOME/conf` directory. Then you can put the customized config items into `kylin.properties.override`; the items in this file will override the default values in `kylin.properties` at runtime.
This also makes upgrades easier: during a system upgrade, keep `kylin.properties.override` together with the new version of `kylin.properties`.


### <span id="jvm">JVM Configuration Setting</span>

In `$KYLIN_HOME/conf/setenv.sh.template`, a sample setting for the `KYLIN_JVM_SETTINGS` environment variable is given. The default setting uses relatively little memory; you can always adjust it according to your own environment.
The default configuration is:

```properties
export KYLIN_JVM_SETTINGS="-server -Xms1g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${KYLIN_HOME}/logs"
```

If you need to change it, make a copy, name it `setenv.sh`, put it in the `$KYLIN_HOME/conf/` folder, and then modify the configuration in it. The parameters `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${KYLIN_HOME}/logs` generate a heap dump when an OutOfMemoryError occurs. The default dump file path is ${KYLIN_HOME}/logs; you can modify it if needed.

```bash
export JAVA_VM_XMS=1g #The initial JVM memory when Kylin starts.
export JAVA_VM_XMX=8g #The maximum JVM memory when Kylin starts.
export JAVA_VM_TOOL_XMS=1g #The initial JVM memory when a tool class is started.
export JAVA_VM_TOOL_XMX=8g #The maximum JVM memory when a tool class is started.
```

If JAVA_VM_TOOL_XMS is not set, it falls back to the value of JAVA_VM_XMS. Similarly, when JAVA_VM_TOOL_XMX is not set, it falls back to the value of JAVA_VM_XMX.

Note: 1. Some special tool classes, such as guardian.sh, check-2100-hive-acl.sh and get-properties.sh, are not affected by the JAVA_VM_TOOL_XMS and JAVA_VM_TOOL_XMX configurations.
      2. The configuration items JAVA_VM_TOOL_XMS and JAVA_VM_TOOL_XMX are newly added; you need to configure them manually when upgrading from an old version.

### <span id="update">Kylin Warm Start after Config Parameters Modified</span>

The parameters defined in `kylin.properties` (global) are loaded by default when Kylin starts. Once they are modified, restart Kylin for the changes to take effect.


### <span id="min_prod">Recommended Configurations for Production</span>

Under `$KYLIN_HOME/conf/`, there are two sets of configurations ready for use: `production` and `minimal`. The former is the default configuration and is recommended for production environments. The latter uses minimal resources and is suitable for a sandbox or other single node with limited resources. You can switch to the `minimal` configuration if your environment has only limited resources. To switch to `minimal`, please uncomment the following configuration items in `$KYLIN_HOME/conf/ky [...]

```properties
# KAP provides two configuration profiles: minimal and production (by default).
# To switch to minimal: uncomment the properties
# kylin.storage.columnar.spark-conf.spark.driver.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.memoryOverhead=512m
# kylin.storage.columnar.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${kylin.env.hdfs-working-dir} -Dkylin.metadata.identifier=${kylin.metadata.url.identifier} -Dkylin.spark.category=sparder -Dkylin.spark.project=${job.project} -XX:MaxDirectMemorySize=512M
# kylin.storage.columnar.spark-conf.spark.yarn.am.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.cores=1
# kylin.storage.columnar.spark-conf.spark.executor.instances=1
```


### <span id="spark">Spark-related Configuration</span>

For a detailed explanation of the Spark configuration, please refer to the official documentation, [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html). The following are some configurations related to the query and build tasks in Kylin.

Parameters starting with `kylin.storage.columnar.spark-conf` configure Spark for query tasks; the part of the name after this prefix is the Spark parameter itself. The default parameters in the recommended configuration file `kylin.properties` are as follows:

| Properties Name | Min | Prod |
| ---------------------------------------------------------------- | ------- | ------ |
| kylin.storage.columnar.spark-conf.spark.driver.memory | 512m | 4096m |
| kylin.storage.columnar.spark-conf.spark.executor.memory | 512m | 12288m |
| kylin.storage.columnar.spark-conf.spark.executor.memoryOverhead | 512m | 3072m |
| kylin.storage.columnar.spark-conf.spark.yarn.am.memory | 512m | 1024m |
| kylin.storage.columnar.spark-conf.spark.executor.cores | 1 | 5 |
| kylin.storage.columnar.spark-conf.spark.executor.instances | 1 | 4 |

Kylin provides customized Spark configurations. These configurations affect how the Spark execution plan is generated. The default parameters in the recommended configuration file `kylin.properties` are as follows:

| Properties Name | Default | Description |
| ---------------------------------------------------------------- | ------- | ------ |
| kylin.storage.columnar.spark-conf.spark.sql.cartesianPartitionNumThreshold | -1 | Threshold for the Cartesian partition number in the Spark execution plan. A query will be terminated if its Cartesian partition number reaches or exceeds the threshold. If this value is set to empty or a negative number, the threshold will be set to spark.executor.cores * spark.executor.instances * 100. |

Parameters starting with `kylin.engine.spark-conf` configure Spark for build tasks; the part of the name after this prefix is the Spark parameter itself. These parameters are not configured by default and will be automatically adjusted according to the cluster environment during the build task. If you configure these parameters in `kylin.properties`, Kylin will use the configuration in `kylin.properties` first.
```properties
kylin.engine.spark-conf.spark.executor.instances
kylin.engine.spark-conf.spark.executor.cores
kylin.engine.spark-conf.spark.executor.memory
kylin.engine.spark-conf.spark.executor.memoryOverhead
kylin.engine.spark-conf.spark.sql.shuffle.partitions
kylin.engine.spark-conf.spark.driver.memory
kylin.engine.spark-conf.spark.driver.memoryOverhead
kylin.engine.spark-conf.spark.driver.cores
```

If you need to enable Spark RPC communication encryption, you can refer to the [Spark RPC Communication Encryption](spark_rpc_encryption.md) chapter.


### <span id="spark_canary">Spark Context Canary Configuration</span>
Sparder Canary is a component used to monitor the running status of Sparder. It periodically checks whether the current Sparder is running normally. If the running status is abnormal, for example Sparder unexpectedly exits or becomes unresponsive, Sparder Canary will create a new Sparder instance.

| Properties | Description |
| ----------------------------------------------------------- | ------------------------------------------------------------ |
| kylin.canary.sqlcontext-enabled | Whether to enable the Sparder Canary function; the default is `false` |
| kylin.canary.sqlcontext-threshold-to-restart-spark | When the number of abnormal detections exceeds this threshold, the Spark context is restarted |
| kylin.canary.sqlcontext-period-min | Check interval; the default is `3` minutes |
| kylin.canary.sqlcontext-error-response-ms | Timeout for a single detection; the default is `3` minutes. A single detection timing out means the Spark context is not responding |
| kylin.canary.sqlcontext-type | The detection method; the default is `file`, which confirms whether the Spark context is still running normally by writing a parquet file to the directory configured by `kylin.env.hdfs-working-dir`. It can also be configured as `count`, which confirms whether the Spark context is running normally by performing an accumulation operation |

diff --git a/website/docs/configuration/hadoop_queue_config.md b/website/docs/configuration/hadoop_queue_config.md new file mode 100644 index 0000000000..b252ec2350 --- /dev/null +++ b/website/docs/configuration/hadoop_queue_config.md @@ -0,0 +1,53 @@
---
title: Hadoop Queue Configuration
language: en
sidebar_label: Hadoop Queue Configuration
pagination_label: Hadoop Queue Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - hadoop queue configuration
draft: true
last_update:
  date: 08/16/2022
---


## Hadoop Queue Configuration

In a multi-tenant environment, to securely share a large cluster, each tenant needs to receive its allocated resources in a timely manner, within the constraints of the allocated capacities. To achieve computing resource allocation and separation, each Kylin instance or project can be configured to use a different YARN queue.


### <span id="instance">Instance-level YARN Queue Setting</span>

To achieve this, first create a new YARN capacity scheduler queue. By default, the jobs sent out by Kylin go to the default YARN queue.

In the screenshot below, a new YARN queue `learn_kylin` has been set up.

Then you may modify `kylin.properties` to configure the YARN queue used by Kylin for building or querying (replace YOUR_QUEUE_NAME with your queue name).
```shell
# Building configuration
kylin.engine.spark-conf.spark.yarn.queue=YOUR_QUEUE_NAME
# Querying configuration
kylin.storage.columnar.spark-conf.spark.yarn.queue=YOUR_QUEUE_NAME
```

In this example, the queue for querying has been changed to `learn_kylin` (as shown above). You can test this change by triggering a query job.

Now, go to the YARN Resource Manager on the cluster. You will see this job has been submitted under the queue `learn_kylin`.

Similarly, you may set up YARN queues for other Kylin instances to achieve computing resource separation.


### <span id="project">Project-level YARN Queue Setting</span>

The system admin user can set the YARN Application Queue of a project in **Setting -> Advanced Settings -> YARN Application Queue**; please refer to [Project Settings](../operations/project-maintenance/project_settings.md) for more information.

diff --git a/website/docs/configuration/https.md b/website/docs/configuration/https.md new file mode 100644 index 0000000000..dd41340f55 --- /dev/null +++ b/website/docs/configuration/https.md @@ -0,0 +1,74 @@
---
title: HTTPS Configuration
language: en
sidebar_label: HTTPS Configuration
pagination_label: HTTPS Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - https configuration
draft: false
last_update:
  date: 08/16/2022
---

Kylin 5.x provides an HTTPS connection. It is disabled by default. If you need to enable it, please follow the steps below.

### Use Default Certificate

Kylin ships with an HTTPS certificate. If you want to enable this function with the default certificate, you just need to add or modify the following properties in `$KYLIN_HOME/conf/kylin.properties`.

```properties
# enable HTTPS connection
kylin.server.https.enable=true
# port number
kylin.server.https.port=7443
```

The default port is `7443`; please check that the port is not taken by another process. You can run the command below to check. If the port is in use, please use an available port number.

```
netstat -tlpn | grep 7443
```

After modifying the above properties, please restart Kylin for the changes to take effect. Assuming you set the HTTPS port to 7443, the access URL would be `https://localhost:7443/kylin/index.html`.

**Note:** Because the certificate is generated automatically, you may see a browser notice about certificate installation when you access the URL. Please ignore it.

### Use Other Certificates

Kylin also supports third-party certificates. You just need to provide the certificate file and make the following changes in the `$KYLIN_HOME/conf/kylin.properties` file:

```properties
# enable HTTPS connection
kylin.server.https.enable=true
# port number
kylin.server.https.port=7443
# format of the keystore; Tomcat 8 supports JKS, PKCS11 or PKCS12 format
kylin.server.https.keystore-type=JKS
# location of your certificate file
kylin.server.https.keystore-file=${KYLIN_HOME}/server/.keystore
# password
kylin.server.https.keystore-password=changeit
# alias name for the keystore entry, which is optional. Please skip it if you don't need it.
kylin.server.https.key-alias=tomcat
```

### Encrypt kylin.server.https.keystore-password
If you need to encrypt `kylin.server.https.keystore-password`, you can do it like this:

i. Run the following command in `${KYLIN_HOME}`; it will print the encrypted password:
```shell
./bin/kylin.sh io.kyligence.kap.tool.general.CryptTool -e AES -s <password>
```

ii. Configure `kylin.server.https.keystore-password` like this:
```properties
kylin.server.https.keystore-password=ENC('${encrypted_password}')
```

After modifying the properties above, please restart Kylin for the changes to take effect. Assuming you set the HTTPS port to 7443, the access URL would be `https://localhost:7443/kylin/index.html`.

> **Note**: If you are not using the default SSL certificate and have put your certificate under `$KYLIN_HOME`, please back up your certificate before upgrading your instance, and specify the file path in the new Kylin configuration file. We recommend putting the certificate under an independent path.

diff --git a/website/docs/configuration/images/hadoop_queue/1.png b/website/docs/configuration/images/hadoop_queue/1.png new file mode 100644 index 0000000000..96562495aa Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/1.png differ
diff --git a/website/docs/configuration/images/hadoop_queue/2.png b/website/docs/configuration/images/hadoop_queue/2.png new file mode 100644 index 0000000000..42dad34da8 Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/2.png differ
diff --git a/website/docs/configuration/images/hadoop_queue/3.png b/website/docs/configuration/images/hadoop_queue/3.png new file mode 100644 index 0000000000..a63b446fb2 Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/3.png differ
diff --git a/website/docs/configuration/images/spark_executor_max.jpg b/website/docs/configuration/images/spark_executor_max.jpg new file mode 100644 index 0000000000..96adbf72f3 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_max.jpg differ
diff --git a/website/docs/configuration/images/spark_executor_min.jpg b/website/docs/configuration/images/spark_executor_min.jpg new file mode 100644 index 0000000000..4544a426f0 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_min.jpg differ
diff --git a/website/docs/configuration/images/spark_executor_original.jpg b/website/docs/configuration/images/spark_executor_original.jpg new file mode 100644 index 0000000000..5b0e783873 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_original.jpg differ

diff --git a/website/docs/configuration/intro.md b/website/docs/configuration/intro.md index 0cca415b79..ff4769bade 100644 --- a/website/docs/configuration/intro.md +++ b/website/docs/configuration/intro.md @@ -1,13 +1,34 @@
 ---
-sidebar_position: 1
+title: System Configuration
+language: en
+sidebar_label: System Configuration
+pagination_label: System Configuration
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - system configuration
+draft: false
+last_update:
+  date: 08/16/2022
 ---
-# Tutorial Intro
+After deploying Kylin on your cluster, configure Kylin so that it can interact with Apache Hadoop and Apache Hive. You can also optimize the performance of Kylin by tuning the configuration to your own environment.
-Let's discover ** Kylin 5.0 in than 15 minutes **.
+This chapter introduces some configurations for Kylin.
-## Basic Configuration
-
-| asda | asdas|
-|--|--|
-|sdasda|dasda|
+| Component | File | Description |
+| -------------------- | --------------------------- | ------------------------------------------------------------ |
+| Kylin | conf/kylin.properties | This is the global configuration file, with all configuration properties for Kylin in it. Details will be discussed in the subsequent chapter [Basic Configuration](configuration.md). |
+| Hadoop | hadoop_conf/core-site.xml | Global configuration file used by Hadoop, which defines system-level parameters such as HDFS URLs and Hadoop temporary directories, etc. |
+| Hadoop | hadoop_conf/hdfs-site.xml | HDFS configuration file, which defines HDFS parameters such as the storage locations of the NameNode and DataNodes and the number of file copies, etc. |
+| Hadoop | hadoop_conf/yarn-site.xml | Yarn configuration file, which defines Hadoop cluster resource management system parameters, such as the ResourceManager and NodeManager communication ports and web monitoring ports, etc. |
+| Hadoop | hadoop_conf/mapred-site.xml | MapReduce configuration file used in Hadoop, which defines the default number of reduce tasks, the default upper and lower limits of the memory that tasks can use, etc. |
+| Hive | hadoop_conf/hive-site.xml | Hive configuration file, which defines Hive parameters such as the Hive data storage directory and database address, etc. |
+
+>Note:
+>
+>+ Unless otherwise specified, the configuration file `kylin.properties` mentioned in this manual refers to the corresponding configuration file in the list.

diff --git a/website/docs/configuration/log_rotate.md b/website/docs/configuration/log_rotate.md new file mode 100644 index 0000000000..cd163c2bd8 --- /dev/null +++ b/website/docs/configuration/log_rotate.md @@ -0,0 +1,36 @@
---
title: Log Rotate Configuration
language: en
sidebar_label: Log Rotate Configuration
pagination_label: Log Rotate Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - log rotate configuration
draft: false
last_update:
  date: 08/16/2022
---

The three log files `shell.stderr`, `shell.stdout`, and `kylin.out` under the Kylin log directory `$KYLIN_HOME/logs/` trigger log rolling checks regularly by default.

> **Caution:** Any change to the configurations below requires a restart to take effect.

| Properties | Description | Default | Options |
|------------------------------------------| --------------------------------|----------------------|---------|
| kylin.env.max-keep-log-file-number | Maximum number of files to keep for log rotation | 10 | |
| kylin.env.max-keep-log-file-threshold-mb | Log files are rotated when they grow bigger than this, in MB | 256 | |
| kylin.env.log-rotate-check-cron | The `crontab` time configuration | 33 * * * * | |
| kylin.env.log-rotate-enabled | Whether to enable `crontab` to check log rotation | true | false |

### Default Scheduled Rotation Strategy

To use the default scheduled rotation strategy, you need to set the parameter `kylin.env.log-rotate-enabled=true` (the default), and also ensure that the user running Kylin can use the `logrotate` and `crontab` commands to add a scheduled task.

When using this strategy, Kylin adds or updates the `crontab` task according to the `kylin.env.log-rotate-check-cron` parameter on startup or restart, and removes the added `crontab` task on exit.
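For illustration, a minimal sketch of the related settings in `kylin.properties` (the values below are examples, not recommendations):

```properties
# Enable the scheduled rotation check (the default)
kylin.env.log-rotate-enabled=true
# Keep at most 5 rotated files per log
kylin.env.max-keep-log-file-number=5
# Rotate a log once it grows beyond 128 MB
kylin.env.max-keep-log-file-threshold-mb=128
# Run the rotation check at minute 33 of every hour
kylin.env.log-rotate-check-cron=33 * * * *
```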
### Known Limitations
- If the conditions for the default scheduled rotation strategy are not met, Kylin only triggers the log rolling check at startup: every time the `kylin.sh start` command is executed, logs are rolled according to the parameters `kylin.env.max-keep-log-file-number` and `kylin.env.max-keep-log-file-threshold-mb`. If Kylin runs for a long time, the log files may grow too large.
- When using `crontab` to control log rotation, the rolling operation is implemented by the `logrotate` command. If the log file is too large, log entries may be lost during the rotation.

diff --git a/website/docs/configuration/query_cache.md b/website/docs/configuration/query_cache.md new file mode 100644 index 0000000000..27e09396f1 --- /dev/null +++ b/website/docs/configuration/query_cache.md @@ -0,0 +1,72 @@
---
title: Query Cache Settings
language: en
sidebar_label: Query Cache Settings
pagination_label: Query Cache Settings
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - query cache settings
draft: false
last_update:
  date: 08/16/2022
---

By default, Kylin enables the query cache in each process to improve query performance.

> **Note**: In order to ensure data consistency, the query cache is not available for pushdown queries.


### Use Default Cache

Kylin enables the query cache by default at each node/process level. The configuration details are described below. You can change them in `$KYLIN_HOME/conf/kylin.properties` under the Kylin installation directory.

> **Caution:** A restart is required for any configuration change to take effect.

| Properties | Description | Default | Options |
| ------------------------- | ------------------------------------------------------------ | ------- | ------- |
| kylin.query.cache-enabled | Whether to enable the query cache. When this property is enabled, the following properties take effect. | true | false |


### Query Cache Criteria
Kylin doesn't cache the results of all SQL queries by default (because memory might be limited). It only caches slow queries whose result size is appropriate. The criteria are configured by the following parameters.
A query that satisfies any one of configurations No. 1, No. 2 and No. 3, and also satisfies configuration No. 4, will be cached.

|No | Properties | Description | Default | Default unit |
| ----| ---------------------------------- | ------------------------------------------------------------ | -------------- | ------- |
| 1|kylin.query.cache-threshold-duration | Queries whose duration is above this value | 2000 | millisecond |
| 2|kylin.query.cache-threshold-scan-count | Queries whose scanned row count is above this value | 10240 | row |
| 3|kylin.query.cache-threshold-scan-bytes | Queries whose scanned bytes are above this value | 1048576 | byte |
| 4|kylin.query.large-query-threshold | Queries whose result set size is below this value | 1000000 | cell |

### Ehcache Cache Configuration

By default, Kylin uses Ehcache as the query cache. You can configure Ehcache to control the query cache size and policy. You can replace the default query cache configuration by modifying the following configuration item. For more Ehcache configuration items, please refer to the official [ehcache documentation](https://www.ehcache.org/generated/2.9.0/html/ehc-all/#page/Ehcache_Documentation_Set%2Fehcache_all.1.017.html%23).

| Properties | Description | Default |
| ----- | ---- | ----- |
| kylin.cache.config | The path to ehcache.xml. To replace the default query cache configuration file, you can create a new XML file, for example `ehcache2.xml`, in the directory `${KYLIN_HOME}/conf/`, and change the value of this configuration item to `file://${KYLIN_HOME}/conf/ehcache2.xml`. | classpath:ehcache.xml |
### Redis Cache Configuration

The default query cache cannot be shared among different nodes or processes because it is process-level. Because of this, in cluster deployment mode, when subsequent identical queries are routed to different Kylin nodes, the cached result of the first query cannot be used. Therefore, you can configure a Redis cluster as a distributed cache that can be shared across all Kylin nodes. The detailed configurations are described below (Redis 5.0 or 5.0.5 is recommended):

| Properties | Description | Default | Options |
| ---------------------------------- | ------------------------------------------------------------ | -------------- | ------- |
| kylin.cache.redis.enabled | Whether to enable the query cache using a Redis cluster. | false | true |
| kylin.cache.redis.cluster-enabled | Whether to enable Redis cluster mode. | false | true |
| kylin.cache.redis.hosts | Redis host. If you need to connect to a Redis cluster, please use commas to separate the hosts, such as kylin.cache.redis.hosts=localhost:6379,localhost:6380 | localhost:6379 | |
| kylin.cache.redis.expire-time-unit | Time unit for the cache period. EX means seconds and PX means milliseconds. | EX | PX |
| kylin.cache.redis.expire-time | Valid cache period. | 86400 | |
| kylin.cache.redis.reconnection.enabled | Whether to enable Redis reconnection when the cache degrades to Ehcache | true | false |
| kylin.cache.redis.reconnection.interval | Automatic reconnection interval, in minutes | 60 | |
| kylin.cache.redis.password | Redis password | | |

#### Limitation
Due to metadata inconsistency between Query nodes and All/Job nodes, the Redis cache switch `kylin.cache.redis.enabled=true` should be configured along with `kylin.server.store-type=jdbc`.

> **Caution:** Redis passwords can be encrypted; please refer to: [Use MySQL as Metastore](../deployment/rdbms_metastore/mysql/mysql_metastore.md)

diff --git a/website/docs/configuration/spark_dynamic_allocation.md b/website/docs/configuration/spark_dynamic_allocation.md new file mode 100644 index 0000000000..c1f4d2c2eb --- /dev/null +++ b/website/docs/configuration/spark_dynamic_allocation.md @@ -0,0 +1,93 @@
---
title: Spark Dynamic Allocation
language: en
sidebar_label: Spark Dynamic Allocation
pagination_label: Spark Dynamic Allocation
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - spark dynamic allocation
draft: false
last_update:
  date: 08/16/2022
---

In Spark, the resource unit is the executor, similar to a container in YARN. Under Spark on YARN, num-executors specifies the number of executors, while executor-memory and executor-cores limit the memory and virtual CPU cores each executor consumes.

Take Kylin as an example: if a user chooses the fixed resource allocation strategy and sets num-executors to 3, each Kylin instance will always keep 4 YARN containers (1 for the application master and 3 for executors). These 4 containers are occupied until the user logs out. With Dynamic Resource Allocation, Spark dynamically increases and reduces executors according to the Kylin query engine workload, which can dramatically save resources.
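As a quick illustration of the difference, here is a sketch in `kylin.properties` (the values are examples only; the full setup, including the shuffle service, is covered in the configuration steps below):

```properties
# Fixed allocation: always hold 3 executors (plus 1 application master container)
# kylin.storage.columnar.spark-conf.spark.executor.instances=3

# Dynamic allocation: let Spark scale executors between 1 and 5 with the workload
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.enabled=true
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.maxExecutors=5
```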
Please refer to the official document for details of Spark Dynamic Allocation:

http://spark.apache.org/docs/2.4.1/job-scheduling.html#dynamic-resource-allocation

### Spark Dynamic Allocation Config

#### Overview
There are two parts we need to configure for Spark Dynamic Allocation:
1. Resource management for the cluster, which differs between resource managers (YARN, Mesos, Standalone).
2. The configuration file spark-defaults.conf, which is independent of the environment.

#### Resource Manager Configuration
##### CDH

1. Log into Cloudera Manager, choose the YARN configuration, find NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml, and configure it as follows:

```
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

2. Copy `$KYLIN_HOME/spark/yarn/spark-<version>-yarn-shuffle.jar` and put it under the path /opt/lib/kylin/ on the Hadoop nodes.

   Find NodeManager Environment Advanced Configuration Snippet (Safety Valve) in Cloudera Manager and configure:

   `YARN_USER_CLASSPATH=/opt/lib/kylin/*`

   Then the yarn-shuffle.jar will be added to the startup classpath of the NodeManager.

3. Save the config and restart.
   In Cloudera Manager, choose Actions --> Deploy Client Configuration, save and restart all services.

##### HDP
1. Log into the Ambari management page, choose Yarn -> Configs -> Advanced, find the following configuration via the filter and update it:
   `yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService`

2. Save the config and restart all services.


#### Kylin configuration
To enable Spark Dynamic Allocation, we need to add some configuration items to the Spark config files. Since we can override the Spark configuration in kylin.properties, we add the following configuration items there:

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.enabled=true`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.maxExecutors=5`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.minExecutors=1`

`kylin.storage.columnar.spark-conf.spark.shuffle.service.enabled=true`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.initialExecutors=3`

For more configurations, please refer to:
http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

### Spark Dynamic Allocation Verification
After the above configuration, start Kylin and monitor the current number of executors on the Spark Executors page.

Idle executors will be reduced after a while until the configured minimum number is reached.

Submit multi-threaded queries to Kylin via the REST API. The number of executors will increase but never exceed the configured maximum.
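As a rough sketch of such a multi-threaded test (the host, port, credentials, project and SQL below are placeholders for your own environment, and it assumes Kylin's standard `POST /kylin/api/query` REST endpoint):

```shell
# Fire 20 concurrent queries at Kylin and watch the executor count grow on the Spark Executors page
for i in $(seq 1 20); do
  curl -s -X POST "http://localhost:7070/kylin/api/query" \
    -H "Content-Type: application/json" \
    -u ADMIN:KYLIN \
    -d '{"sql": "select count(*) from fact_table", "project": "learn_kylin"}' &
done
wait
```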
diff --git a/website/docs/configuration/spark_rpc_encryption.md b/website/docs/configuration/spark_rpc_encryption.md new file mode 100644 index 0000000000..0e696b829e --- /dev/null +++ b/website/docs/configuration/spark_rpc_encryption.md @@ -0,0 +1,43 @@
---
title: Spark RPC Communication Encryption
language: en
sidebar_label: Spark RPC Communication Encryption
pagination_label: Spark RPC Communication Encryption
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - spark rpc communication encryption
draft: false
last_update:
  date: 08/16/2022
---

Kylin supports the configuration of communication encryption between Spark nodes, which can improve the security of internal communication and prevent specific security attacks.

For more details about Spark RPC communication encryption, please see [Spark Security](https://spark.apache.org/docs/latest/security.html).

This function is disabled by default. If you need to enable it, please refer to the following method for configuration.

### Spark RPC Communication Encryption Configuration
1. Please refer to [Spark Security](https://spark.apache.org/docs/latest/security.html) to ensure that RPC communication encryption is enabled in the Spark cluster.
2. Add the following configurations in `$KYLIN_HOME/conf/kylin.properties` to enable communication encryption between Kylin nodes and the Spark cluster:
```
### spark rpc encryption for query jobs
kylin.storage.columnar.spark-conf.spark.authenticate=true
kylin.storage.columnar.spark-conf.spark.authenticate.secret=kylin
kylin.storage.columnar.spark-conf.spark.network.crypto.enabled=true
kylin.storage.columnar.spark-conf.spark.network.crypto.keyLength=256
kylin.storage.columnar.spark-conf.spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA256

### spark rpc encryption for build jobs
kylin.engine.spark-conf.spark.authenticate=true
kylin.engine.spark-conf.spark.authenticate.secret=kylin
kylin.engine.spark-conf.spark.network.crypto.enabled=true
kylin.engine.spark-conf.spark.network.crypto.keyLength=256
kylin.engine.spark-conf.spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA256
```

### Spark RPC Communication Encryption Verification
After the configuration is complete, start Kylin and verify that the query and build tasks can be executed normally.