This is an automated email from the ASF dual-hosted git repository. xxyu pushed a commit to branch doc5.0 in repository https://gitbox.apache.org/repos/asf/kylin.git
commit 69ee8500bb27e6585f877772b2da0f6921e5e949 Author: Mukvin <boyboys...@163.com> AuthorDate: Tue Aug 16 14:33:59 2022 +0800 KYLIN-5221 add configuration docs
---
 website/docs/configuration/configuration.md        | 220 +++++++++++++++++++++
 website/docs/configuration/hadoop_queue_config.md  |  53 +++++
 website/docs/configuration/https.md                |  74 +++++++
 .../docs/configuration/images/hadoop_queue/1.png   | Bin 0 -> 126858 bytes
 .../docs/configuration/images/hadoop_queue/2.png   | Bin 0 -> 44273 bytes
 .../docs/configuration/images/hadoop_queue/3.png   | Bin 0 -> 183045 bytes
 .../configuration/images/spark_executor_max.jpg    | Bin 0 -> 71527 bytes
 .../configuration/images/spark_executor_min.jpg    | Bin 0 -> 35309 bytes
 .../images/spark_executor_original.jpg             | Bin 0 -> 53356 bytes
 website/docs/configuration/intro.md                |  35 +++-
 website/docs/configuration/log_rotate.md           |  36 ++++
 website/docs/configuration/query_cache.md          |  72 +++++++
 .../docs/configuration/spark_dynamic_allocation.md |  93 +++++++++
 website/docs/configuration/spark_rpc_encryption.md |  43 ++++
 14 files changed, 619 insertions(+), 7 deletions(-)

diff --git a/website/docs/configuration/configuration.md b/website/docs/configuration/configuration.md new file mode 100644 index 0000000000..979e92303f --- /dev/null +++ b/website/docs/configuration/configuration.md @@ -0,0 +1,220 @@
---
title: Basic Configuration
language: en
sidebar_label: Basic Configuration
pagination_label: Basic Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - Basic Configuration
draft: true
last_update:
  date: 08/16/2022
---

This chapter introduces some common configurations. The main contents are as follows:

- [Common Configuration](#conf)
- [Configuration Override](#override)
- [JVM Configuration Setting](#jvm)
- [Kylin Warm Start after Config Parameters Modified](#update)
- [Recommended Configurations for Production](#min_prod)
- [Spark-related Configuration](#spark)
- [Spark Context Canary Configuration](#spark_canary)


### <span id="conf">Common Configuration</span>

The file **kylin.properties** contains some of the most important configurations in Kylin. This section gives detailed explanations of some common properties.

| Properties | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| server.port | This parameter specifies the port used by the Kylin service. The default is `7070`. |
| server.address | This parameter specifies the address used by the Kylin service. The default is `0.0.0.0`. |
| kylin.env.ip-address | When the node where the Kylin service is located has an IPv6 address, you can specify an IPv4 address through this configuration item. The default is `0.0.0.0` |
| kylin.env.hdfs-working-dir | The working path of the Kylin instance on HDFS is specified by this property. The default value is `/kylin` on HDFS, with the table name in the metadata path as the sub-directory. For example, suppose the metadata path is `kylin_metadata@jdbc`, the HDFS default path should be `/kylin/kylin_metadata`. Please make sure the user running the Kylin instance has read/write permissions on that directory. |
| kylin.env.zookeeper-connect-string | This parameter specifies the address of ZooKeeper. There is no default value. **This parameter must be manually configured before starting a Kylin instance**, otherwise Kylin will not start. |
| kylin.metadata.url | The Kylin metadata path is specified by this property. The default value is the `kylin_metadata` table in PostgreSQL, while users can customize it to store metadata in any other table. When deploying multiple Kylin instances on a cluster, it's necessary to specify a unique path for each of them to guarantee isolation among them. For example, the value of this property for the Production instance could be `kylin_metadata_prod`, whi [...] |
| kylin.metadata.ops-cron | This parameter specifies the cron expression of the scheduled task for metadata backup and garbage cleanup. The default value is `0 0 0 * * *`. |
| kylin.metadata.audit-log.max-size | This parameter specifies the maximum number of rows in the audit log. The default value is `500000`. |
| kylin.metadata.compress.enabled | This parameter specifies whether to compress the contents of metadata and the audit log. The default value is `true`. |
| kylin.server.mode | There are three modes in Kylin: `all`, `query` and `job`; you can change it by modifying this property. The default value is `all`. In `query` mode, the instance only serves queries. In `job` mode, it can run building jobs and execute metadata operations, but cannot serve queries. `all` mode can handle both. |
| kylin.web.timezone | The time zone used by the Kylin REST service is specified by this property. The default value is the time zone of the local machine's system. You can change it according to the requirements of your application. For more details, please refer to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones, the `TZ database name` column. |
| kylin.web.export-allow-admin | Whether to allow Admin users to export query results to a CSV file; the default is true. |
| kylin.web.export-allow-other | Whether to allow non-Admin users to export query results to a CSV file; the default is true. |
| kylin.web.stack-trace.enabled | Whether the error popup window displays detailed information. The default value is false. Introduced in: 4.1.1 |
| kylin.env | The usage of the Kylin instance is specified by this property. Optional values include `DEV`, `PROD` and `QA`; `PROD` is the default. In `DEV` mode some developer functions are enabled. |
| kylin.circuit-breaker.threshold.project | The maximum number of projects allowed to be created; the default value is `100` |
| kylin.circuit-breaker.threshold.model | The maximum number of models allowed to be created in a single project; the default value is `100` |
| kylin.query.force-limit | Some BI tools always send queries like `select * from fact_table`, but the process may get stuck if the table is extremely large. A `LIMIT` clause helps in this case, and setting the value of this property to a positive integer makes Kylin append a `LIMIT` clause if there isn't one. For instance, with the value `1000`, the query `select * from fact_table` will be transformed to `select * from fact_table limit 1000`. This configuration ca [...] |
| kylin.query.max-result-rows | This property specifies the maximum number of rows that a query can return. It applies to all ways of executing queries, including the Web UI, Asynchronous Query, JDBC Driver and ODBC Driver. This configuration can be overridden at the **project** level. For this property to take effect, it needs to be a positive integer less than or equal to 2147483647. The default value is 0, meaning no limit on the result. <br />Below [...] |
| kylin.query.init-sparder-async | The default value is `true`, which means that sparder starts asynchronously, so the Kylin web service and the Spark query service start separately. If set to `false`, the Kylin web service will only be available after the sparder service has started. |
| kylin.circuit-breaker.threshold.query-result-row-count | This parameter is the maximum number of rows in the result set returned by a SQL query. The default is `2000000`. If the maximum number of rows is exceeded, the backend throws an exception. |
| kylin.query.timeout-seconds | Query timeout, in seconds. The default value is `300` seconds. If query execution exceeds 300 seconds, an error is returned: `Query timeout after: 300s`. The minimum value is `30` seconds; a configured value of less than `30` seconds also takes effect as `30` seconds. |
| kylin.query.convert-create-table-to-with | Some BI software sends Create Table statements to create a permanent or temporary table in the data source. If this setting is set to `true`, the create table statement in the query will be converted to a with statement; when a later query utilizes the table created in the previous step, the create table statement will be converted into a subquery, which can hit an index if there is one to serve the query. |
| kylin.query.replace-count-column-with-count-star | The default value is `false`, which means that a COUNT(column) measure will hit a model only after it has been set up in the model. If a COUNT(column) measure is called in SQL without having been set up in the model, this parameter can be set to `true`; the system will then use a COUNT(constant) measure to approximate the COUNT(column) measure. The COUNT(constant) measure takes all Null values into calculation. |
| kylin.query.match-partial-inner-join-model | The default value is `false`, which means that a multi-table inner join model does not support SQL that matches the inner join part only partially. For example: assume there are three tables A, B, and C. By default, the SQL `A inner join B` can only be answered by the model A inner join B or the model A inner join B left join C; the model A inner join B inner join C cannot answer it. If this parameter is set to [...] |
| kylin.query.match-partial-non-equi-join-model | The default is `false`. Currently, if the model contains non-equi joins, a query can be matched with the model only if it contains all the non-equi joins defined in the model. If the config is set to `true`, the query is allowed to contain only part of the non-equi joins. E.g. model: A left join B non-equi left join C. When the config is set to `false`, only a query with the complete join relations of the model can be matched wi [...] |
| kylin.query.use-tableindex-answer-non-raw-query | The default value is `false`, which means that aggregate queries can only be answered with aggregate indexes. If the parameter is set to `true`, the system allows the corresponding table index to be used to answer aggregate queries. |
| kylin.query.layout.prefer-aggindex | The default value is `true`, which means that when index comparison selections are made between aggregate indexes and detail indexes, aggregate indexes are preferred. |
| kylin.storage.columnar.spark-conf.spark.yarn.queue | This property specifies the YARN queue used by the Spark query cluster. |
| kylin.storage.columnar.spark-conf.spark.master | Spark deployment is normally divided into **Spark on YARN**, **Spark on Mesos**, and **standalone**. We usually use Spark on YARN by default. This property enables Kylin to use a standalone deployment, which submits jobs to the specified spark-master-url. |
| kylin.job.retry | This property specifies the number of automatic retries for error jobs. The default value is 0, which means a job will not automatically retry when it is in error. Set a value greater than 0 to enable this property; it applies to every step within a job and is reset when that step finishes. |
| kylin.job.retry-interval | This property specifies the time interval for retrying an error job; the default value is `30000` ms. This property is valid only when the job retry property is set to 1 or above. |
| kylin.job.max-concurrent-jobs | Kylin has a default concurrency limit of **20** for jobs in a single project. If the running jobs have already reached the limit, a newly submitted job will be added to the job queue. Once a running job finishes, jobs in the queue are scheduled using a FIFO mechanism. |
| kylin.scheduler.schedule-job-timeout-minute | Job execution timeout period. The default is `0` minutes. This property is valid only when it is set to 1 or above. When job execution exceeds the timeout period, the job will change to the Error status. |
| kylin.garbage.storage.cuboid-layout-survival-time-threshold | This property specifies the threshold for invalid files on HDFS. When the command line tool is executed to clean up garbage, invalid files on HDFS that exceed this threshold will be cleaned up. The default value is `7d`, which means 7 days. Invalid files on HDFS include expired indexes, expired snapshots, expired dictionaries, etc. At the same time, indexes with lower cost performance will be cleaned up according to the in [...] |
| kylin.garbage.storage.executable-survival-time-threshold | This property specifies the threshold for expired jobs. The metadata of jobs that have exceeded this threshold and have been completed will be cleaned up. The default is `30d`, which means 30 days. |
| kylin.storage.quota-in-giga-bytes | This property specifies the storage quota for each project. The default is `10240`, in gigabytes. |
| kylin.influxdb.address | This property specifies the address of InfluxDB. The default is `localhost:8086`. |
| kylin.influxdb.username | This property specifies the username of InfluxDB. The default is `root`. |
| kylin.influxdb.password | This property specifies the password of InfluxDB. The default is `root`. |
| kylin.metrics.influx-rpc-service-bind-address | If the property `# bind-address = "127.0.0.1:8088"` was modified in InfluxDB's configuration file, the value of this property should be modified at the same time. This parameter influences whether the diagnostic package can contain system metrics. |
| kylin.security.user-password-encoder | The encryption algorithm for user passwords. The default is the BCrypt algorithm. If you want to use the Pbkdf2 algorithm, configure the value to <br />org.springframework.security.crypto.<br />password.Pbkdf2PasswordEncoder. <br />Note: Please do not change this configuration item arbitrarily, otherwise users may not be able to log in |
| kylin.web.session.secure-random-create-enabled | The default is false, meaning the sessionId is generated from a UUID. When enabled, the sessionId is generated from a JDK SecureRandom random number and MD5-encrypted. Before enabling it, please use the upgrade session table tool to upgrade the session table first, otherwise users will get an error when logging in. |
| kylin.web.session.jdbc-encode-enabled | The default is false: the sessionId is saved directly into the database without encryption; after this option is enabled, the sessionId will be encrypted before being saved to the database. Note: if the encryption function is configured, please use the upgrade session table tool to upgrade the session table first, otherwise users will get an error when logging in. |
| kylin.server.cors.allow-all | Allow all cross-origin requests (CORS): `true` allows any CORS request, `false` refuses all CORS requests. The default is `false`. |
| kylin.server.cors.allowed-origin | Specifies a whitelist of domains allowed for cross-origin requests; the default is all domain names (*). Use commas (,) to separate multiple domain names. This parameter is valid when `kylin.server.cors.allow-all`=true |
| kylin.storage.columnar.spark-conf.spark.driver.host | Configure the IP of the node where Kylin is located |
| kylin.engine.spark-conf.spark.driver.host | Configure the IP of the node where Kylin is located |
| kylin.engine.sanity-check-enabled | Whether Kylin performs a sanity check during index building. The default value is `true` |
| kylin.job.finished-notifier-url | When a building job is completed, its status information will be sent to this URL via an HTTP request |
| kylin.diag.obf.level | The desensitization level of the diagnostic package. `RAW` means no desensitization; `OBF` means desensitization. Configuring `OBF` will desensitize sensitive information such as usernames and passwords in the `kylin.properties` file (please refer to the [Diagnosis Kit Tool](../operations/cli_tool/diagnosis.md) chapter). The default value is `OBF`. |
| kylin.diag.task-timeout | The subtask timeout for the diagnostic package; the default value is 3 minutes |
| kylin.diag.task-timeout-black-list | Diagnostic package subtask timeout blacklist (values separated by commas). The subtasks in the blacklist are exempt from the timeout settings and run until they finish. The default value is `METADATA`, `LOG` <br />The optional values are as below: <br />METADATA, AUDIT_LOG, CLIENT, JSTACK, CONF, HADOOP_CONF, BIN, HADOOP_ENV, CATALOG_INFO, SYSTEM_METRICS, MONITOR_METRICS, SPARK_LOGS, SPARDER_HISTORY, KG_LOGS, L [...] |
| kylin.query.queryhistory.max-size | The total number of query history records kept across all projects; the default is 10000000 |
| kylin.query.queryhistory.project-max-size | The number of query history records retained for a single project; the default is 1000000 |
| kylin.query.queryhistory.survival-time-threshold | How long query history records are retained for all projects; the default is 30d, which means 30 days. Other units are also supported: millisecond (ms), microsecond (us), minute (m or min), hour (h) |
| kylin.query.engine.spark-scheduler-mode | The scheduling strategy of the query engine; the default is FAIR (Fair scheduler). The optional value is SJF (Smallest Job First scheduler). Other values are illegal, in which case the FAIR strategy is used as the default. |
| kylin.query.realization.chooser.thread-core-num | The number of core threads of the model matching thread pool in the query engine; the default is 5. Note that when the number of core threads is set to less than 0, this thread pool becomes unavailable, which will make the entire query engine unavailable |
| kylin.query.realization.chooser.thread-max-num | The maximum number of threads in the model matching thread pool of the query engine; the default is 50. Note that when the maximum number of threads is set to less than or equal to 0, or less than the number of core threads, this thread pool becomes unavailable, which will make the entire query engine unavailable |
| kylin.query.memory-limit-during-collect-mb | Limits the memory usage when collecting query results in Kylin; the unit is megabytes, and the default is 5400 MB |
| kylin.query.auto-model-view-enabled | Automatically generate views for models. When this config is on, a view is generated for each model and users can query that view. The view is named {project_name}.{model_name} and contains all the tables defined in the model and all the columns referenced by the dimensions and measures of the tables. |
| kylin.streaming.job.max-concurrent-jobs | Only for Kylin Realtime. The maximum number of tasks used for ingesting real-time data and merging segments. |
| kylin.streaming.kafka-conf.maxOffsetsPerTrigger | Only for Kylin Realtime. The maximum number of records ingested at one time; -1 stands for no limitation. |
| kylin.streaming.job-status-watch-enabled | Only for Kylin Realtime. Whether to enable task monitoring; "true" stands for enabled and "false" stands for disabled. |
| kylin.streaming.job-retry-enabled | Only for Kylin Realtime. Whether to retry after a task fails; "true" stands for enabled and "false" stands for disabled. |
| kylin.streaming.job-retry-interval | Only for Kylin Realtime. The interval, in minutes, before a failed task is retried. |
| kylin.streaming.job-retry-max-interval | Only for Kylin Realtime. The maximum interval, in minutes, between task retries. |
| kylin.engine.streaming-metrics-enabled | Only for Kylin Realtime. Whether to enable task metrics monitoring; "true" stands for enabled and "false" stands for disabled. |
| kylin.engine.streaming-segment-merge-interval | Only for Kylin Realtime. The interval, in seconds, between segment merges. |
| kylin.engine.streaming-segment-clean-interval | Only for Kylin Realtime. The number of hours after merging before the merged segments are cleaned up. |
| kylin.engine.streaming-segment-merge-ratio | Only for Kylin Realtime. The ratio that the total size of segments must reach to trigger segment merging. |
| kylin.streaming.jobstats.survival-time-threshold | Only for Kylin Realtime. How many days the real-time data statistics are kept. The default value is 7. |
| kylin.streaming.spark-conf.spark.yarn.queue | Only for Kylin Realtime. The name of the YARN queue used exclusively by real-time tasks. |
| kylin.streaming.spark-conf.spark.port.maxRetries | Only for Kylin Realtime. The number of retries when the port is occupied. |
| kylin.streaming.kafka.starting-offsets | Only for Kylin Realtime. The offset from which to consume Kafka messages. The default value is 'earliest'. |
| kylin.storage.columnar.spark-conf.spark.sql.view-truncate-enabled | Allow Spark views to lose precision when loading tables and running queries; the default value is false |
| kylin.engine.spark-conf.spark.sql.view-truncate-enabled | Allow Spark views to lose precision during builds; the default value is false |
| kylin.source.hive.databases | Configure the list of databases loaded by the data source. There is no default value. It can be configured at both the system level and the project level; the project level takes precedence over the system level. |
| kylin.query.spark-job-trace-enabled | Enable the Spark job tracking log. Records additional information about Spark: submission waiting time, execution waiting time, execution time and result acquisition time are displayed in the timeline of the query history. |
| kylin.query.spark-job-trace-timeout-ms | Only for the Spark job tracking log. The longest waiting time for query history; if it is exceeded, the Spark job tracking log will not be recorded. |
| kylin.query.spark-job-trace-cache-max | Only for the Spark job tracking log. The maximum number of Spark job tracking log caches. The eviction strategy is LRU; the TTL is kylin.query.spark-job-trace-timeout-ms + 20000 ms. |
| kylin.query.spark-job-trace-parallel-max | Only for the Spark job tracking log. The concurrency limit of Spark job tracking log processing; "additional information about Spark" will be lost if the concurrency exceeds this limit. |
| kylin.query.replace-dynamic-params-enabled | Whether to enable dynamic parameter binding for JDBC queries; the default value is false, which means it is not enabled. For more, please refer to [Kylin JDBC Driver](#TODO) |
| kylin.second-storage.route-when-ch-fail | When tiered storage is enabled, whether a query matching the base table index is answered only by tiered storage. The default value is `0`, which means that when tiered storage cannot answer, the query is answered by the base table index on HDFS; configured as `1`, it indicates that when the tiered storage cannot answer the query, the query will be pushed down; configured as `2`, it indicates that the query fails when the tiered storage c [...] |
| kylin.second-storage.query-pushdown-limit | When query result sets are large, the performance of queries using tiered storage may degrade. This parameter indicates whether to use the limit statement to restrict whether detailed queries use tiered storage; the default value is `0`, which means it is not enabled. If you need to enable it, you can configure a specific value. For example, if it is configured as `100000`, it means that a detailed query with the value afte [...] |

### <span id="override">Configuration Override</span>

There are many configurations available in the file `kylin.properties`. If you need to modify several of them, you can create a new file named `kylin.properties.override` in the `$KYLIN_HOME/conf` directory. Then you can put the customized config items into `kylin.properties.override`; the items in this file will override the default values in `kylin.properties` at runtime.
This also makes upgrades easier: during a system upgrade, keep `kylin.properties.override` together with the new version of `kylin.properties`.


### <span id="jvm">JVM Configuration Setting</span>

In `$KYLIN_HOME/conf/setenv.sh.template`, a sample setting for the `KYLIN_JVM_SETTINGS` environment variable is given. The default setting uses relatively little memory; you can always adjust it according to your own environment.
The default configuration is:

```properties
export KYLIN_JVM_SETTINGS="-server -Xms1g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:G1HeapRegionSize=16m -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -Xloggc:$KYLIN_HOME/logs/kylin.gc.$$ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${KYLIN_HOME}/logs"
```

If you need to change it, make a copy, name it `setenv.sh`, put it in the `$KYLIN_HOME/conf/` folder, and then modify the configuration in it. The parameters `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${KYLIN_HOME}/logs` generate a heap dump when an OutOfMemoryError occurs. The default dump file path is ${KYLIN_HOME}/logs; you can modify it if needed.

```bash
export JAVA_VM_XMS=1g #The initial JVM memory when Kylin starts.
export JAVA_VM_XMX=8g #The maximum JVM memory when Kylin starts.
export JAVA_VM_TOOL_XMS=1g #The initial JVM memory when a tool class is started.
export JAVA_VM_TOOL_XMX=8g #The maximum JVM memory when a tool class is started.
```

If JAVA_VM_TOOL_XMS is not set, it falls back to the value of JAVA_VM_XMS. Similarly, when JAVA_VM_TOOL_XMX is not set, it falls back to the value of JAVA_VM_XMX.

Note: 1. Some special tool classes, such as guardian.sh, check-2100-hive-acl.sh and get-properties.sh, are not affected by the JAVA_VM_TOOL_XMS and JAVA_VM_TOOL_XMX configurations.
      2. The configuration items JAVA_VM_TOOL_XMS and JAVA_VM_TOOL_XMX are newly added; you need to configure them manually when upgrading from an old version.

### <span id="update">Kylin Warm Start after Config Parameters Modified</span>

The parameters defined in `kylin.properties` (global) are loaded by default when Kylin starts. Once they are modified, restart Kylin for the changes to take effect.


### <span id="min_prod">Recommended Configurations for Production</span>

Under `$KYLIN_HOME/conf/`, there are two sets of configurations ready for use: `production` and `minimal`. The former is the default configuration and is recommended for production environments. The latter uses minimal resources and is suitable for a sandbox or other single node with limited resources. You can switch to the `minimal` configuration if your environment has only limited resources. To switch to `minimal`, please uncomment the following configuration items in `$KYLIN_HOME/conf/ky [...]

```properties
# KAP provides two configuration profiles: minimal and production (by default).
# To switch to minimal: uncomment the properties
# kylin.storage.columnar.spark-conf.spark.driver.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.memoryOverhead=512m
# kylin.storage.columnar.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${kylin.env.hdfs-working-dir} -Dkylin.metadata.identifier=${kylin.metadata.url.identifier} -Dkylin.spark.category=sparder -Dkylin.spark.project=${job.project} -XX:MaxDirectMemorySize=512M
# kylin.storage.columnar.spark-conf.spark.yarn.am.memory=512m
# kylin.storage.columnar.spark-conf.spark.executor.cores=1
# kylin.storage.columnar.spark-conf.spark.executor.instances=1
```


### <span id="spark">Spark-related Configuration</span>

For a detailed explanation of the Spark configuration, please refer to the official documentation, [Spark Configuration](https://spark.apache.org/docs/latest/configuration.html). The following are some configurations related to the query and build tasks in Kylin.

Parameters starting with `kylin.storage.columnar.spark-conf` configure Spark for query tasks; the part of the name after this prefix is the Spark parameter itself. The default parameters in the recommended configuration file `kylin.properties` are as follows:

| Properties Name | Min | Prod |
| ---------------------------------------------------------------- | ------- | ------ |
| kylin.storage.columnar.spark-conf.spark.driver.memory | 512m | 4096m |
| kylin.storage.columnar.spark-conf.spark.executor.memory | 512m | 12288m |
| kylin.storage.columnar.spark-conf.spark.executor.memoryOverhead | 512m | 3072m |
| kylin.storage.columnar.spark-conf.spark.yarn.am.memory | 512m | 1024m |
| kylin.storage.columnar.spark-conf.spark.executor.cores | 1 | 5 |
| kylin.storage.columnar.spark-conf.spark.executor.instances | 1 | 4 |

Kylin provides customized Spark configurations. These configurations affect how the Spark execution plan is generated. The default parameters in the recommended configuration file `kylin.properties` are as follows:

| Properties Name | Default | Description |
| ---------------------------------------------------------------- | ------- | ------ |
| kylin.storage.columnar.spark-conf.spark.sql.cartesianPartitionNumThreshold | -1 | Threshold for the Cartesian partition number in the Spark execution plan. A query will be terminated if its Cartesian partition number reaches or exceeds the threshold. If this value is set to empty or a negative number, the threshold will be set to spark.executor.cores * spark.executor.instances * 100. |

Parameters starting with `kylin.engine.spark-conf` configure Spark for build tasks; the part of the name after this prefix is the Spark parameter itself. These parameters are not configured by default and will be automatically adjusted according to the cluster environment during the build task. If you configure these parameters in `kylin.properties`, Kylin will use the configuration in `kylin.properties` first.
```properties
kylin.engine.spark-conf.spark.executor.instances
kylin.engine.spark-conf.spark.executor.cores
kylin.engine.spark-conf.spark.executor.memory
kylin.engine.spark-conf.spark.executor.memoryOverhead
kylin.engine.spark-conf.spark.sql.shuffle.partitions
kylin.engine.spark-conf.spark.driver.memory
kylin.engine.spark-conf.spark.driver.memoryOverhead
kylin.engine.spark-conf.spark.driver.cores
```

If you need to enable Spark RPC communication encryption, you can refer to the [Spark RPC Communication Encryption](spark_rpc_encryption.md) chapter.


### <span id="spark_canary">Spark Context Canary Configuration</span>
Sparder Canary is a component used to monitor the running status of Sparder. It periodically checks whether the current Sparder is running normally. If the running status is abnormal, for example Sparder unexpectedly exits or becomes unresponsive, Sparder Canary will create a new Sparder instance.

| Properties | Description |
| ----------------------------------------------------------- | ------------------------------------------------------------ |
| kylin.canary.sqlcontext-enabled | Whether to enable the Sparder Canary function; the default is `false` |
| kylin.canary.sqlcontext-threshold-to-restart-spark | When the number of abnormal detections exceeds this threshold, the Spark context is restarted |
| kylin.canary.sqlcontext-period-min | Check interval; the default is `3` minutes |
| kylin.canary.sqlcontext-error-response-ms | Timeout for a single detection; the default is `3` minutes. A single detection timing out means the Spark context is not responding |
| kylin.canary.sqlcontext-type | The detection method; the default is `file`, which confirms whether the Spark context is still running normally by writing a parquet file to the directory configured by `kylin.env.hdfs-working-dir`. It can also be configured as `count`, which confirms whether the Spark context is running normally by performing an accumulation operation |

diff --git a/website/docs/configuration/hadoop_queue_config.md b/website/docs/configuration/hadoop_queue_config.md new file mode 100644 index 0000000000..b252ec2350 --- /dev/null +++ b/website/docs/configuration/hadoop_queue_config.md @@ -0,0 +1,53 @@
---
title: Hadoop Queue Configuration
language: en
sidebar_label: Hadoop Queue Configuration
pagination_label: Hadoop Queue Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - hadoop queue configuration
draft: true
last_update:
  date: 08/16/2022
---


## Hadoop Queue Configuration

In a multi-tenant environment, to securely share a large cluster, each tenant needs to receive its allocated resources in a timely manner, within the constraints of the allocated capacities. To achieve computing resource allocation and separation, each Kylin instance or project can be configured to use a different YARN queue.


### <span id="instance">Instance-level YARN Queue Setting</span>

To achieve this, first create a new YARN capacity scheduler queue. By default, the jobs sent out by Kylin go to the default YARN queue.

In the screenshot below, a new YARN queue `learn_kylin` has been set up.

Then you may modify `kylin.properties` to configure the YARN queue used by Kylin for building or querying (replace YOUR_QUEUE_NAME with your queue name).
```shell
# Building configuration
kylin.engine.spark-conf.spark.yarn.queue=YOUR_QUEUE_NAME
# Querying configuration
kylin.storage.columnar.spark-conf.spark.yarn.queue=YOUR_QUEUE_NAME
```

In this example, the queue for querying has been changed to `learn_kylin` (as shown above). You can test this change by triggering a query job.

Now, go to the YARN Resource Manager on the cluster. You will see this job has been submitted under the queue `learn_kylin`.

Similarly, you may set up YARN queues for other Kylin instances to achieve computing resource separation.


### <span id="project">Project-level YARN Queue Setting</span>

The system admin user can set the YARN Application Queue of a project in **Setting -> Advanced Settings -> YARN Application Queue**; please refer to [Project Settings](../operations/project-maintenance/project_settings.md) for more information.

diff --git a/website/docs/configuration/https.md b/website/docs/configuration/https.md new file mode 100644 index 0000000000..dd41340f55 --- /dev/null +++ b/website/docs/configuration/https.md @@ -0,0 +1,74 @@
---
title: HTTPS Configuration
language: en
sidebar_label: HTTPS Configuration
pagination_label: HTTPS Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - https configuration
draft: false
last_update:
  date: 08/16/2022
---

Kylin 5.x provides an HTTPS connection. It is disabled by default. If you need to enable it, please follow the steps below.

### Use Default Certificate

Kylin ships with an HTTPS certificate. If you want to enable this function with the default certificate, you just need to add or modify the following properties in `$KYLIN_HOME/conf/kylin.properties`.

```properties
# enable HTTPS connection
kylin.server.https.enable=true
# port number
kylin.server.https.port=7443
```

The default port is `7443`; please check that the port is not taken by another process. You can run the command below to check. If the port is in use, please use an available port number.

```
netstat -tlpn | grep 7443
```

After modifying the above properties, please restart Kylin for the changes to take effect. Assuming you set the HTTPS port to 7443, the access URL would be `https://localhost:7443/kylin/index.html`.

**Note:** Because the certificate is generated automatically, you may see a browser notice about certificate installation when you access the URL. Please ignore it.

### Use Other Certificates

Kylin also supports third-party certificates. You just need to provide the certificate file and make the following changes in the `$KYLIN_HOME/conf/kylin.properties` file:

```properties
# enable HTTPS connection
kylin.server.https.enable=true
# port number
kylin.server.https.port=7443
# format of the keystore; Tomcat 8 supports JKS, PKCS11 or PKCS12 format
kylin.server.https.keystore-type=JKS
# location of your certificate file
kylin.server.https.keystore-file=${KYLIN_HOME}/server/.keystore
# password
kylin.server.https.keystore-password=changeit
# alias name for the keystore entry, which is optional. Please skip it if you don't need it.
kylin.server.https.key-alias=tomcat
```

### Encrypt kylin.server.https.keystore-password
If you need to encrypt `kylin.server.https.keystore-password`, you can do it like this:

i. Run the following command in `${KYLIN_HOME}`; it will print the encrypted password:
```shell
./bin/kylin.sh io.kyligence.kap.tool.general.CryptTool -e AES -s <password>
```

ii. Configure `kylin.server.https.keystore-password` like this:
```properties
kylin.server.https.keystore-password=ENC('${encrypted_password}')
```

After modifying the properties above, please restart Kylin for the changes to take effect. Assuming you set the HTTPS port to 7443, the access URL would be `https://localhost:7443/kylin/index.html`.

> **Note**: If you are not using the default SSL certificate and have put your certificate under `$KYLIN_HOME`, please back up your certificate before upgrading your instance, and specify the file path in the new Kylin configuration file. We recommend putting the certificate under an independent path.

diff --git a/website/docs/configuration/images/hadoop_queue/1.png b/website/docs/configuration/images/hadoop_queue/1.png new file mode 100644 index 0000000000..96562495aa Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/1.png differ
diff --git a/website/docs/configuration/images/hadoop_queue/2.png b/website/docs/configuration/images/hadoop_queue/2.png new file mode 100644 index 0000000000..42dad34da8 Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/2.png differ
diff --git a/website/docs/configuration/images/hadoop_queue/3.png b/website/docs/configuration/images/hadoop_queue/3.png new file mode 100644 index 0000000000..a63b446fb2 Binary files /dev/null and b/website/docs/configuration/images/hadoop_queue/3.png differ
diff --git a/website/docs/configuration/images/spark_executor_max.jpg b/website/docs/configuration/images/spark_executor_max.jpg new file mode 100644 index 0000000000..96adbf72f3 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_max.jpg differ
diff --git a/website/docs/configuration/images/spark_executor_min.jpg b/website/docs/configuration/images/spark_executor_min.jpg new file mode 100644 index 0000000000..4544a426f0 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_min.jpg differ
diff --git a/website/docs/configuration/images/spark_executor_original.jpg b/website/docs/configuration/images/spark_executor_original.jpg new file mode 100644 index 0000000000..5b0e783873 Binary files /dev/null and b/website/docs/configuration/images/spark_executor_original.jpg differ

diff --git a/website/docs/configuration/intro.md b/website/docs/configuration/intro.md index 0cca415b79..ff4769bade 100644 --- a/website/docs/configuration/intro.md +++ b/website/docs/configuration/intro.md @@ -1,13 +1,34 @@
 ---
-sidebar_position: 1
+title: System Configuration
+language: en
+sidebar_label: System Configuration
+pagination_label: System Configuration
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - system configuration
+draft: false
+last_update:
+  date: 08/16/2022
 ---
-# Tutorial Intro
+After deploying Kylin on your cluster, configure Kylin so that it can interact with Apache Hadoop and Apache Hive. You can also optimize the performance of Kylin by tuning the configuration to your own environment.
-Let's discover ** Kylin 5.0 in than 15 minutes **.
+This chapter introduces some configurations for Kylin.
-## Basic Configuration
-
-| asda | asdas|
-|--|--|
-|sdasda|dasda|
+| Component | File | Description |
+| -------------------- | --------------------------- | ------------------------------------------------------------ |
+| Kylin | conf/kylin.properties | This is the global configuration file, with all configuration properties for Kylin in it. Details will be discussed in the subsequent chapter [Basic Configuration](configuration.md). |
+| Hadoop | hadoop_conf/core-site.xml | Global configuration file used by Hadoop, which defines system-level parameters such as HDFS URLs and Hadoop temporary directories, etc. |
+| Hadoop | hadoop_conf/hdfs-site.xml | HDFS configuration file, which defines HDFS parameters such as the storage locations of the NameNode and DataNodes and the number of file copies, etc. |
+| Hadoop | hadoop_conf/yarn-site.xml | Yarn configuration file, which defines Hadoop cluster resource management system parameters, such as the ResourceManager and NodeManager communication ports and web monitoring ports, etc. |
+| Hadoop | hadoop_conf/mapred-site.xml | MapReduce configuration file used in Hadoop, which defines the default number of reduce tasks, the default upper and lower limits of the memory that tasks can use, etc. |
+| Hive | hadoop_conf/hive-site.xml | Hive configuration file, which defines Hive parameters such as the Hive data storage directory and database address, etc. |
+
+>Note:
+>
+>+ Unless otherwise specified, the configuration file `kylin.properties` mentioned in this manual refers to the corresponding configuration file in the list.

diff --git a/website/docs/configuration/log_rotate.md b/website/docs/configuration/log_rotate.md new file mode 100644 index 0000000000..cd163c2bd8 --- /dev/null +++ b/website/docs/configuration/log_rotate.md @@ -0,0 +1,36 @@
---
title: Log Rotate Configuration
language: en
sidebar_label: Log Rotate Configuration
pagination_label: Log Rotate Configuration
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - log rotate configuration
draft: false
last_update:
  date: 08/16/2022
---

The three log files `shell.stderr`, `shell.stdout`, and `kylin.out` under the Kylin log directory `$KYLIN_HOME/logs/` trigger log rolling checks regularly by default.

> **Caution:** Any change to the configurations below requires a restart to take effect.

| Properties | Description | Default | Options |
|------------------------------------------| --------------------------------|----------------------|---------|
| kylin.env.max-keep-log-file-number | Maximum number of files to keep for log rotation | 10 | |
| kylin.env.max-keep-log-file-threshold-mb | Log files are rotated when they grow bigger than this, in MB | 256 | |
| kylin.env.log-rotate-check-cron | The `crontab` time configuration | 33 * * * * | |
| kylin.env.log-rotate-enabled | Whether to enable `crontab` to check log rotation | true | false |

### Default Scheduled Rotation Strategy

To use the default scheduled rotation strategy, you need to set the parameter `kylin.env.log-rotate-enabled=true` (the default), and also ensure that the user running Kylin can use the `logrotate` and `crontab` commands to add a scheduled task.

When using this strategy, Kylin adds or updates the `crontab` task according to the `kylin.env.log-rotate-check-cron` parameter on startup or restart, and removes the added `crontab` task on exit.
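For illustration, a minimal sketch of the related settings in `kylin.properties` (the values below are examples, not recommendations):

```properties
# Enable the scheduled rotation check (the default)
kylin.env.log-rotate-enabled=true
# Keep at most 5 rotated files per log
kylin.env.max-keep-log-file-number=5
# Rotate a log once it grows beyond 128 MB
kylin.env.max-keep-log-file-threshold-mb=128
# Run the rotation check at minute 33 of every hour
kylin.env.log-rotate-check-cron=33 * * * *
```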
### Known Limitations
- If the conditions for the default scheduled rotation strategy are not met, Kylin only triggers the log rolling check at startup: every time the `kylin.sh start` command is executed, logs are rolled according to the parameters `kylin.env.max-keep-log-file-number` and `kylin.env.max-keep-log-file-threshold-mb`. If Kylin runs for a long time, the log files may grow too large.
- When using `crontab` to control log rotation, the rolling operation is implemented by the `logrotate` command. If the log file is too large, log entries may be lost during the rotation.

diff --git a/website/docs/configuration/query_cache.md b/website/docs/configuration/query_cache.md new file mode 100644 index 0000000000..27e09396f1 --- /dev/null +++ b/website/docs/configuration/query_cache.md @@ -0,0 +1,72 @@
---
title: Query Cache Settings
language: en
sidebar_label: Query Cache Settings
pagination_label: Query Cache Settings
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - query cache settings
draft: false
last_update:
  date: 08/16/2022
---

By default, Kylin enables the query cache in each process to improve query performance.

> **Note**: In order to ensure data consistency, the query cache is not available for pushdown queries.


### Use Default Cache

Kylin enables the query cache by default at each node/process level. The configuration details are described below. You can change them in `$KYLIN_HOME/conf/kylin.properties` under the Kylin installation directory.

> **Caution:** A restart is required for any configuration change to take effect.

| Properties | Description | Default | Options |
| ------------------------- | ------------------------------------------------------------ | ------- | ------- |
| kylin.query.cache-enabled | Whether to enable the query cache. When this property is enabled, the following properties take effect. | true | false |


### Query Cache Criteria
Kylin doesn't cache the results of all SQL queries by default (because memory might be limited). It only caches slow queries whose result size is appropriate. The criteria are configured by the following parameters.
A query that satisfies any one of configurations No. 1, No. 2 and No. 3, and also satisfies configuration No. 4, will be cached.

|No | Properties | Description | Default | Default unit |
| ----| ---------------------------------- | ------------------------------------------------------------ | -------------- | ------- |
| 1|kylin.query.cache-threshold-duration | Queries whose duration is above this value | 2000 | millisecond |
| 2|kylin.query.cache-threshold-scan-count | Queries whose scanned row count is above this value | 10240 | row |
| 3|kylin.query.cache-threshold-scan-bytes | Queries whose scanned bytes are above this value | 1048576 | byte |
| 4|kylin.query.large-query-threshold | Queries whose result set size is below this value | 1000000 | cell |

### Ehcache Cache Configuration

By default, Kylin uses Ehcache as the query cache. You can configure Ehcache to control the query cache size and policy. You can replace the default query cache configuration by modifying the following configuration item. For more Ehcache configuration items, please refer to the official [ehcache documentation](https://www.ehcache.org/generated/2.9.0/html/ehc-all/#page/Ehcache_Documentation_Set%2Fehcache_all.1.017.html%23).

| Properties | Description | Default |
| ----- | ---- | ----- |
| kylin.cache.config | The path to ehcache.xml. To replace the default query cache configuration file, you can create a new XML file, for example `ehcache2.xml`, in the directory `${KYLIN_HOME}/conf/`, and change the value of this configuration item to `file://${KYLIN_HOME}/conf/ehcache2.xml`. | classpath:ehcache.xml |
### Redis Cache Configuration

The default query cache cannot be shared among different nodes or processes because it is process-level. Because of this, in cluster deployment mode, when subsequent identical queries are routed to different Kylin nodes, the cached result of the first query cannot be used. Therefore, you can configure a Redis cluster as a distributed cache that can be shared across all Kylin nodes. The detailed configurations are described below (Redis 5.0 or 5.0.5 is recommended):

| Properties | Description | Default | Options |
| ---------------------------------- | ------------------------------------------------------------ | -------------- | ------- |
| kylin.cache.redis.enabled | Whether to enable the query cache using a Redis cluster. | false | true |
| kylin.cache.redis.cluster-enabled | Whether to enable Redis cluster mode. | false | true |
| kylin.cache.redis.hosts | Redis host. If you need to connect to a Redis cluster, please use commas to separate the hosts, such as kylin.cache.redis.hosts=localhost:6379,localhost:6380 | localhost:6379 | |
| kylin.cache.redis.expire-time-unit | Time unit for the cache period. EX means seconds and PX means milliseconds. | EX | PX |
| kylin.cache.redis.expire-time | Valid cache period. | 86400 | |
| kylin.cache.redis.reconnection.enabled | Whether to enable Redis reconnection when the cache degrades to Ehcache | true | false |
| kylin.cache.redis.reconnection.interval | Automatic reconnection interval, in minutes | 60 | |
| kylin.cache.redis.password | Redis password | | |

#### Limitation
Due to metadata inconsistency between Query nodes and All/Job nodes, the Redis cache switch `kylin.cache.redis.enabled=true` should be configured along with `kylin.server.store-type=jdbc`.

> **Caution:** Redis passwords can be encrypted; please refer to: [Use MySQL as Metastore](../deployment/rdbms_metastore/mysql/mysql_metastore.md)

diff --git a/website/docs/configuration/spark_dynamic_allocation.md b/website/docs/configuration/spark_dynamic_allocation.md new file mode 100644 index 0000000000..c1f4d2c2eb --- /dev/null +++ b/website/docs/configuration/spark_dynamic_allocation.md @@ -0,0 +1,93 @@
---
title: Spark Dynamic Allocation
language: en
sidebar_label: Spark Dynamic Allocation
pagination_label: Spark Dynamic Allocation
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - spark dynamic allocation
draft: false
last_update:
  date: 08/16/2022
---

In Spark, the resource unit is the executor, similar to a container in YARN. Under Spark on YARN, num-executors specifies the number of executors, while executor-memory and executor-cores limit the memory and virtual CPU cores each executor consumes.

Take Kylin as an example: if a user chooses the fixed resource allocation strategy and sets num-executors to 3, each Kylin instance will always keep 4 YARN containers (1 for the application master and 3 for executors). These 4 containers are occupied until the user logs out. With Dynamic Resource Allocation, Spark dynamically increases and reduces executors according to the Kylin query engine workload, which can dramatically save resources.
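As a quick illustration of the difference, here is a sketch in `kylin.properties` (the values are examples only; the full setup, including the shuffle service, is covered in the configuration steps below):

```properties
# Fixed allocation: always hold 3 executors (plus 1 application master container)
# kylin.storage.columnar.spark-conf.spark.executor.instances=3

# Dynamic allocation: let Spark scale executors between 1 and 5 with the workload
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.enabled=true
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.storage.columnar.spark-conf.spark.dynamicAllocation.maxExecutors=5
```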
Please refer to the official document for details of Spark Dynamic Allocation:

http://spark.apache.org/docs/2.4.1/job-scheduling.html#dynamic-resource-allocation

### Spark Dynamic Allocation Config

#### Overview
There are two parts we need to configure for Spark Dynamic Allocation:
1. Resource management for the cluster, which differs between resource managers (YARN, Mesos, Standalone).
2. The configuration file spark-defaults.conf, which is independent of the environment.

#### Resource Manager Configuration
##### CDH

1. Log into Cloudera Manager, choose the YARN configuration, find NodeManager Advanced Configuration Snippet (Safety Valve) for yarn-site.xml, and configure it as follows:

```
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

2. Copy `$KYLIN_HOME/spark/yarn/spark-<version>-yarn-shuffle.jar` and put it under the path /opt/lib/kylin/ on the Hadoop nodes.

   Find NodeManager Environment Advanced Configuration Snippet (Safety Valve) in Cloudera Manager and configure:

   `YARN_USER_CLASSPATH=/opt/lib/kylin/*`

   Then the yarn-shuffle.jar will be added to the startup classpath of the NodeManager.

3. Save the config and restart.
   In Cloudera Manager, choose Actions --> Deploy Client Configuration, save and restart all services.

##### HDP
1. Log into the Ambari management page, choose Yarn -> Configs -> Advanced, find the following configuration via the filter and update it:
   `yarn.nodemanager.aux-services.spark_shuffle.class=org.apache.spark.network.yarn.YarnShuffleService`

2. Save the config and restart all services.


#### Kylin configuration
To enable Spark Dynamic Allocation, we need to add some configuration items to the Spark config files. Since we can override the Spark configuration in kylin.properties, we add the following configuration items there:

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.enabled=true`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.maxExecutors=5`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.minExecutors=1`

`kylin.storage.columnar.spark-conf.spark.shuffle.service.enabled=true`

`kylin.storage.columnar.spark-conf.spark.dynamicAllocation.initialExecutors=3`

For more configurations, please refer to:
http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation

### Spark Dynamic Allocation Verification
After the above configuration, start Kylin and monitor the current number of executors on the Spark Executors page.

Idle executors will be reduced after a while until the configured minimum number is reached.

Submit multi-threaded queries to Kylin via the REST API. The number of executors will increase but never exceed the configured maximum.
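As a rough sketch of such a multi-threaded test (the host, port, credentials, project and SQL below are placeholders for your own environment, and it assumes Kylin's standard `POST /kylin/api/query` REST endpoint):

```shell
# Fire 20 concurrent queries at Kylin and watch the executor count grow on the Spark Executors page
for i in $(seq 1 20); do
  curl -s -X POST "http://localhost:7070/kylin/api/query" \
    -H "Content-Type: application/json" \
    -u ADMIN:KYLIN \
    -d '{"sql": "select count(*) from fact_table", "project": "learn_kylin"}' &
done
wait
```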
diff --git a/website/docs/configuration/spark_rpc_encryption.md b/website/docs/configuration/spark_rpc_encryption.md new file mode 100644 index 0000000000..0e696b829e --- /dev/null +++ b/website/docs/configuration/spark_rpc_encryption.md @@ -0,0 +1,43 @@
---
title: Spark RPC Communication Encryption
language: en
sidebar_label: Spark RPC Communication Encryption
pagination_label: Spark RPC Communication Encryption
toc_min_heading_level: 2
toc_max_heading_level: 6
pagination_prev: null
pagination_next: null
keywords:
  - spark rpc communication encryption
draft: false
last_update:
  date: 08/16/2022
---

Kylin supports the configuration of communication encryption between Spark nodes, which can improve the security of internal communication and prevent specific security attacks.

For more details about Spark RPC communication encryption, please see [Spark Security](https://spark.apache.org/docs/latest/security.html).

This function is disabled by default. If you need to enable it, please refer to the following method for configuration.

### Spark RPC Communication Encryption Configuration
1. Please refer to [Spark Security](https://spark.apache.org/docs/latest/security.html) to ensure that RPC communication encryption is enabled in the Spark cluster.
2. Add the following configurations in `$KYLIN_HOME/conf/kylin.properties` to enable communication encryption between Kylin nodes and the Spark cluster:
```
### spark rpc encryption for query jobs
kylin.storage.columnar.spark-conf.spark.authenticate=true
kylin.storage.columnar.spark-conf.spark.authenticate.secret=kylin
kylin.storage.columnar.spark-conf.spark.network.crypto.enabled=true
kylin.storage.columnar.spark-conf.spark.network.crypto.keyLength=256
kylin.storage.columnar.spark-conf.spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA256

### spark rpc encryption for build jobs
kylin.engine.spark-conf.spark.authenticate=true
kylin.engine.spark-conf.spark.authenticate.secret=kylin
kylin.engine.spark-conf.spark.network.crypto.enabled=true
kylin.engine.spark-conf.spark.network.crypto.keyLength=256
kylin.engine.spark-conf.spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA256
```

### Spark RPC Communication Encryption Verification
After the configuration is complete, start Kylin and verify that the query and build tasks can be executed normally.