This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch doc5.0
in repository https://gitbox.apache.org/repos/asf/kylin.git

The following commit(s) were added to refs/heads/doc5.0 by this push:
     new 1954e12c72 KYLIN-5221 add apache hadoop installation

1954e12c72 is described below

commit 1954e12c72dc594a011b5a7d7efc471407a45998
Author: Mukvin <boyboys...@163.com>
AuthorDate: Tue Aug 23 19:17:34 2022 +0800

    KYLIN-5221 add apache hadoop installation
---
 .../platform/install_on_apache_hadoop.md           |  52 ++++
 .../docs/deployment/installation/platform/intro.md |  18 ++
 website/docs/development/how_to_package.md         |  24 +-
 website/docs/quickstart/expert_mode_tutorial.md    |   8 +-
 website/docs/quickstart/images/gss_negotiate.png   | Bin 0 -> 19292 bytes
 .../images/installation_query_result.png           | Bin 0 -> 127355 bytes
 website/docs/quickstart/images/list.png            | Bin 0 -> 153847 bytes
 website/docs/quickstart/quick_start.md             | 268 +++++++++++++++++++++
 website/sidebars.js                                |  38 ++-
 9 files changed, 399 insertions(+), 9 deletions(-)

diff --git a/website/docs/deployment/installation/platform/install_on_apache_hadoop.md b/website/docs/deployment/installation/platform/install_on_apache_hadoop.md
new file mode 100644
index 0000000000..2a5bdcc28e
--- /dev/null
+++ b/website/docs/deployment/installation/platform/install_on_apache_hadoop.md
@@ -0,0 +1,52 @@
+---
+title: Install on Apache Hadoop Platform
+language: en
+sidebar_label: Install on Apache Hadoop Platform
+pagination_label: Install on Apache Hadoop Platform
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - install
+  - hadoop
+draft: false
+last_update:
+  date: 08/12/2022
+---
+
+### Prepare Environment
+
+First, **make sure you allocate sufficient resources for the environment**. Please refer to [Prerequisites](../../../deployment/on-premises/prerequisite.md) for detailed resource requirements for Kylin. Also make sure that `HDFS`, `YARN`, `Hive`, `ZooKeeper`, and other components are running normally without any warnings.
+
+
+
+#### Apache Hadoop Supported Version
+
+The following Apache Hadoop versions are supported by Kylin:
+
+- Apache Hadoop 3.2.1
+
+**Note**: Apache Hadoop 3.2.1 environments with Kerberos enabled are not currently supported.
+
+#### Additional configuration required for Apache Hadoop version
+
+Add the following two properties to `$KYLIN_HOME/conf/kylin.properties`:
+
+- `kylin.env.apache-hadoop-conf-dir`: the Hadoop configuration directory of your Hadoop environment
+- `kylin.env.apache-hive-conf-dir`: the Hive configuration directory of your Hadoop environment
+
+
+
+#### Jar package required by Apache Hadoop version
+
+For Apache Hadoop 3.2.1, you also need to provide a MySQL JDBC driver in the environment where Kylin runs.
+
+Here is a download link for the MySQL 5.1 JDBC driver jar: https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.41/mysql-connector-java-5.1.41.jar. Drivers for other MySQL versions must be obtained separately. Place the JDBC driver that matches your MySQL version in the `$KYLIN_HOME/lib/ext` directory.
+
+
+
+### Install Kylin
+
+After setting up the environment, please refer to [Quick Start](../../../quickstart/quick_start.md) to continue.
diff --git a/website/docs/deployment/installation/platform/intro.md b/website/docs/deployment/installation/platform/intro.md
new file mode 100644
index 0000000000..fb44cb572c
--- /dev/null
+++ b/website/docs/deployment/installation/platform/intro.md
@@ -0,0 +1,18 @@
+---
+title: Install On Platforms
+language: en
+sidebar_label: Install On Platforms
+pagination_label: Install On Platforms
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - install
+  - platforms
+draft: false
+last_update:
+  date: 08/12/2022
+---
+
+This chapter introduces how to install Kylin on different platforms.
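
The Apache Hadoop page in this commit asks users to set `kylin.env.apache-hadoop-conf-dir` and `kylin.env.apache-hive-conf-dir` in `$KYLIN_HOME/conf/kylin.properties` and to place a MySQL JDBC driver in `$KYLIN_HOME/lib/ext`. Below is a minimal POSIX shell sketch of those two steps. The helper names `set_property`, `configure_apache_hadoop` and `install_mysql_driver` are hypothetical and are not part of Kylin's own scripts.

```shell
# Sketch only: hypothetical helpers, not shipped with Kylin.

# Set KEY=VALUE in a .properties file, replacing any existing line for KEY.
set_property() {
  file=$1 key=$2 value=$3
  tmp="$file.tmp.$$"
  touch "$file"
  # Keep every line that does not start with the literal prefix "KEY=".
  awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Write the two Apache Hadoop-specific properties described on the page.
configure_apache_hadoop() {
  kylin_home=$1 hadoop_conf=$2 hive_conf=$3
  mkdir -p "$kylin_home/conf"
  props="$kylin_home/conf/kylin.properties"
  set_property "$props" kylin.env.apache-hadoop-conf-dir "$hadoop_conf"
  set_property "$props" kylin.env.apache-hive-conf-dir "$hive_conf"
}

# Copy an already-downloaded MySQL JDBC driver jar into $KYLIN_HOME/lib/ext.
install_mysql_driver() {
  kylin_home=$1 jar=$2
  [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
  mkdir -p "$kylin_home/lib/ext"
  cp "$jar" "$kylin_home/lib/ext/"
}
```

Assuming the driver was downloaded with `curl -fLO https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.41/mysql-connector-java-5.1.41.jar`, a run might look like `configure_apache_hadoop "$KYLIN_HOME" /etc/hadoop/conf /etc/hive/conf` followed by `install_mysql_driver "$KYLIN_HOME" mysql-connector-java-5.1.41.jar`. The `/etc/...` paths are placeholders for wherever your cluster keeps its Hadoop and Hive configuration. Because `set_property` replaces an existing entry instead of appending a duplicate, re-running the sketch is safe.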
diff --git a/website/docs/development/how_to_package.md b/website/docs/development/how_to_package.md
index 72d0708f45..a52c95a8c0 100644
--- a/website/docs/development/how_to_package.md
+++ b/website/docs/development/how_to_package.md
@@ -1,5 +1,17 @@
 ---
-sidebar_position: 1
+title: How to package
+language: en
+sidebar_label: How to package
+pagination_label: How to package
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - package
+draft: false
+last_update:
+  date: 08/22/2022
 ---
 
 # How to package
@@ -24,6 +36,11 @@ sidebar_position: 1
 | -skipFront | If add this option, front-end won't be build and packaging |
 | -skipCompile | Add this option will assume java source code no need be compiled again |
+
+### Other Options for Packaging Script
+| Option              | Comment                                            |
+|-------------------- | ---------------------------------------------------|
+| -P hadoop3          | Build a Kylin 5.0 package for running on the Apache Hadoop 3 platform. |
 
 ### Package Content
 
 | Option | Comment |
@@ -46,6 +63,9 @@ For example, an unofficial package could be `apache-kylin-5.0.0-SNAPSHOT.2022081
 ## Case 2: Official apache release, kylin binary for deploy on Hadoop3+ and Hive2.3+,
 # and third party cannot be distributed because of apache distribution policy(size and license)
 ./build/release/release.sh -noSpark -official
+
+## Case 3: A package for running on the Apache Hadoop 3 platform
+./build/release/release.sh -P hadoop3
 ```
 
 ### How to switch to older node.js
@@ -60,4 +80,4 @@ nvm use 12.14.0
 
 ## switch to original version
 nvm use system
-```
\ No newline at end of file
+```
diff --git a/website/docs/quickstart/expert_mode_tutorial.md b/website/docs/quickstart/expert_mode_tutorial.md
index 66905f9150..cbafa0496e 100644
--- a/website/docs/quickstart/expert_mode_tutorial.md
+++ b/website/docs/quickstart/expert_mode_tutorial.md
@@ -1,14 +1,14 @@
 ---
-title: Quick Start
+title: Expert Mode Tutorial
 language: en
-sidebar_label: Quick Start
-pagination_label: Quick Start
+sidebar_label: Expert Mode Tutorial
+pagination_label: Expert Mode Tutorial
 toc_min_heading_level: 2
 toc_max_heading_level: 6
 pagination_prev: null
 pagination_next: null
 keywords:
-  - quick start
+  - expert mode tutorial
 draft: true
 last_update:
   date: 08/12/2022
diff --git a/website/docs/quickstart/images/gss_negotiate.png b/website/docs/quickstart/images/gss_negotiate.png
new file mode 100644
index 0000000000..2eca44b918
Binary files /dev/null and b/website/docs/quickstart/images/gss_negotiate.png differ
diff --git a/website/docs/quickstart/images/installation_query_result.png b/website/docs/quickstart/images/installation_query_result.png
new file mode 100644
index 0000000000..f2bd43f594
Binary files /dev/null and b/website/docs/quickstart/images/installation_query_result.png differ
diff --git a/website/docs/quickstart/images/list.png b/website/docs/quickstart/images/list.png
new file mode 100644
index 0000000000..937e7782c0
Binary files /dev/null and b/website/docs/quickstart/images/list.png differ
diff --git a/website/docs/quickstart/quick_start.md b/website/docs/quickstart/quick_start.md
new file mode 100644
index 0000000000..69c91df583
--- /dev/null
+++ b/website/docs/quickstart/quick_start.md
@@ -0,0 +1,268 @@
+---
+title: Quick Start
+language: en
+sidebar_label: Quick Start
+pagination_label: Quick Start
+toc_min_heading_level: 2
+toc_max_heading_level: 6
+pagination_prev: null
+pagination_next: null
+keywords:
+  - quick start
+draft: true
+last_update:
+  date: 08/12/2022
+---
+
+In this guide, we explain how to quickly install and start Kylin 5.
+
+Before proceeding, please make sure the [Prerequisites](../deployment/on-premises/prerequisite.md) are met.
+
+
+### <span id="install">Download and Install</span>
+
+1. Get the Kylin installation package.
+
+   Please refer to [How To Package](../development/how_to_package.md).
+
+2. Decide the installation location and the Linux account that will run Kylin. All the examples below are based on the following assumptions:
+
+   - The installation location is `/usr/local/`.
+   - The Linux account that runs Kylin is `KyAdmin`. It is called the **Linux account** hereafter.
+   - **For all commands in the rest of the document**, please replace the above parameters with your real installation location and Linux account.
+
+3. Copy and uncompress the Kylin software package to your server or virtual machine.
+
+   ```shell
+   cd /usr/local
+   tar -zxvf Kylin5.0-Beta-[Version].tar.gz
+   ```
+   The decompressed directory is referred to as **$KYLIN_HOME** or the **root directory**.
+
+4. Prepare the RDBMS metastore.
+
+   If PostgreSQL or MySQL is already installed in your environment, you can choose one of them as the metastore.
+
+   **Note**:
+
+   + For the production environment, we recommend setting up a dedicated metastore. You can use the PostgreSQL that is shipped with Kylin 5.x.
+   + The database name of the metastore **must start with an English letter**.
+
+   Please refer to the links below for the complete installation and configuration steps:
+
+   * [Use PostgreSQL as Metastore](../deployment/on-premises/rdbms_metastore/postgresql/default_metastore.md).
+   * [Use MySQL as Metastore](../deployment/on-premises/rdbms_metastore/mysql/mysql_metastore.md).
+
+5. (Optional) Install InfluxDB.
+
+   Kylin uses InfluxDB to save various system monitoring information. If you do not need to view this information, you can skip this step. In a production environment, it is strongly recommended to complete this step and use the related monitoring functions.
+
+   ```sh
+   cd $KYLIN_HOME/influxdb
+
+   # install influxdb
+   rpm -ivh influxdb-1.6.5.x86_64.rpm
+   ```
+
+   For more details, please refer to [Use InfluxDB as Time-Series Database](../operations/monitoring/influxdb/influxdb.md).
+
+6. Create a working directory on HDFS and grant permissions.
+
+   The default working directory is `/kylin`. Also ensure the Linux account has access to its home directory on HDFS. In addition, create the directory `/kylin/spark-history` to store the Spark log files.
+
+   ```sh
+   hadoop fs -mkdir -p /kylin
+   hadoop fs -chown KyAdmin /kylin
+   hadoop fs -mkdir -p /kylin/spark-history
+   hadoop fs -chown KyAdmin /kylin/spark-history
+   ```
+
+   If necessary, you can change the path of the Kylin working directory in `$KYLIN_HOME/conf/kylin.properties`.
+
+   **Note**: If you do not have permission to create `/kylin/spark-history`, you can set `kylin.engine.spark-conf.spark.eventLog.dir` and `kylin.engine.spark-conf.spark.history.fs.logDirectory` to a directory you can write to.
+
+### <span id="configuration">Quick Configuration</span>
+
+In the `conf` directory under the Kylin root directory, configure the following parameters in `kylin.properties`:
+
+1. Configure the metadata parameter according to your PostgreSQL setup, replacing `{metadata_name}`, `{host}`, `{port}`, `{user}` and `{password}` with your actual values. The maximum allowed length of `metadata_name` is 28 characters.
+
+   ```properties
+   kylin.metadata.url={metadata_name}@jdbc,driverClassName=org.postgresql.Driver,url=jdbc:postgresql://{host}:{port}/kylin,username={user},password={password}
+   ```
+   For more PostgreSQL configuration, please refer to [Use PostgreSQL as Metastore](../deployment/on-premises/rdbms_metastore/postgresql/default_metastore.md). For MySQL configuration, please refer to [Use MySQL as Metastore](../deployment/on-premises/rdbms_metastore/mysql/mysql_metastore.md).
+
+   > **Note**: `{metadata_name}` may contain only letters, numbers, and underscores, and it cannot start with a number. For example, `1a` is illegal and `a1` is legal.
+
+2. When executing jobs, Kylin submits build tasks to YARN. To submit build tasks to a specific queue, set the following parameter and replace `{queue_name}` with the queue you actually use.
+
+   ```properties
+   kylin.engine.spark-conf.spark.yarn.queue={queue_name}
+   ```
+
+
+3. Configure the ZooKeeper service.
+
+   Kylin uses ZooKeeper for service discovery: in a cluster deployment, when an instance starts, stops, or unexpectedly loses communication, the other instances in the cluster can automatically detect the change and update their status. For more details, please refer to [Service Discovery](../deployment/on-premises/deploy_mode/service_discovery.md).
+
+   Add the ZooKeeper connection configuration `kylin.env.zookeeper-connect-string=host:port`, adjusting the cluster addresses and ports as in the following example:
+
+   ```properties
+   kylin.env.zookeeper-connect-string=10.1.2.1:2181,10.1.2.2:2181,10.1.2.3:2181
+   ```
+
+4. (Optional) Configure Spark client node information.
+   Since Spark is started in yarn-client mode, if the IP address of the Kylin node is not configured in the hosts file of the Hadoop cluster, please add the following configurations to `kylin.properties`:
+   `kylin.storage.columnar.spark-conf.spark.driver.host={hostIp}`
+   `kylin.engine.spark-conf.spark.driver.host={hostIp}`
+
+   Replace `{hostIp}` with the IP address of the Kylin node, as in the following example:
+   ```properties
+   kylin.storage.columnar.spark-conf.spark.driver.host=10.1.3.71
+   kylin.engine.spark-conf.spark.driver.host=10.1.3.71
+   ```
+
+
+
+### <span id="start">Start Kylin</span>
+
+1. Check the version of `curl`.
+
+   Because `check-env.sh` relies on curl's GSS-Negotiate support during installation, we recommend checking your curl build first. Run the following command in your environment:
+
+   ```shell
+   curl --version
+   ```
+   If GSS-Negotiate appears in the output, your curl can be used. If not, reinstall curl or add GSS-Negotiate support.
+
+
+2. Start Kylin with the startup script.
+   Run the following command to start Kylin. On the first start, the system runs a series of scripts to check whether the system environment meets the requirements. For details, please refer to the [Environment Dependency Check](../operations/system-operation/cli_tool/environment_dependency_check.md) chapter.
+
+   ```shell
+   ${KYLIN_HOME}/bin/kylin.sh start
+   ```
+   > **Note**: If you want to observe the detailed startup progress, run:
+   >
+   > ```shell
+   > tail -f $KYLIN_HOME/logs/kylin.log
+   > ```
+
+
+   Once startup is complete, a message is displayed in the console. You can run the command below to check the Kylin process at any time.
+
+   ```shell
+   ps -ef | grep kylin
+   ```
+
+3. Get login information.
+
+   After the startup script has finished, the random password of the default user `ADMIN` is displayed on the console. We highly recommend saving this password. If it is lost, please refer to [ADMIN User Reset Password](../operations/access-control/user_management.md).
+
+### <span id="use">How to Use</span>
+
+After Kylin is started, open the web UI at `http://{host}:7070/kylin`. Replace `{host}` with your host name, IP address, or domain name. The default port is `7070`.
+
+The default user name is `ADMIN`. The randomly generated password is displayed on the console when Kylin is started for the first time. After the first login, please reset the administrator password according to the following password rules:
+
+- At least 8 characters.
+- Contains at least one number, one letter, and one special character ```(~!@#$%^&*(){}|:"<>?[];',./`)```.
+
+Kylin uses the open source **SSB** (Star Schema Benchmark) dataset, designed for star schema OLAP scenarios, as its test dataset. You can verify whether the installation is successful by running a script that imports the SSB dataset into Hive. The SSB dataset consists of multiple CSV files.
+
+**Import Sample Data**
+
+Run the following command to import the sample data:
+
+```shell
+$KYLIN_HOME/bin/sample.sh
+```
+
+The script creates one database, **SSB**, and 6 Hive tables, then imports data into them.
+
+After it runs successfully, you should see the following message in the console:
+
+```shell
+Sample hive tables are created successfully
+```
+
+
+We will use the SSB dataset as the data sample to introduce Kylin in several sections of this product manual. The SSB dataset simulates transaction data for an online store; see [Sample Dataset](sample_dataset.md) for more details. Below is a brief introduction.
+
+
+| Table       | Description                           | Introduction                                                 |
+| ----------- | ------------------------------------- | ------------------------------------------------------------ |
+| CUSTOMER    | customer information                  | includes customer name, address, contact information, etc.   |
+| DATES       | order date                            | includes an order's specific date, week, month, year, etc.   |
+| LINEORDER   | order information                     | includes basic information such as order date, order amount, order revenue, supplier ID, commodity ID, customer ID, etc. |
+| PART        | product information                   | includes basic information such as product name, category, brand, etc. |
+| P_LINEORDER | view based on order information table | includes all content in the order information table plus new content in the view |
+| SUPPLIER    | supplier information                  | includes supplier name, address, contact information, etc.   |
+
+
+**Validate Product Functions**
+
+You can create a sample project and model according to [Expert Mode Tutorial](expert_mode_tutorial.md). Use the project to validate basic features such as source table loading, model creation, and index building.
+
+On the **Data Asset -> Model** page, you should see an example model whose storage size is greater than 0.00 KB, which indicates that data has been loaded for this model.
+
+
+
+On the **Monitor** page, you can see that all jobs have completed successfully on the **Batch Job** and **Streaming Job** pages.
+
+
+
+**Validate Query Analysis**
+
+Once the metadata is loaded successfully, the 6 sample Hive tables are shown in the left panel of the **Insight** page, and you can run queries against them. For example, the following SQL statement calculates the total revenue of each product (`LO_PARTKEY`) for orders dated between `19930601` and `19940601`, sorted by total revenue in descending order:
+
+```sql
+SELECT LO_PARTKEY, SUM(LO_REVENUE) AS TOTAL_REVENUE
+FROM SSB.P_LINEORDER
+WHERE LO_ORDERDATE BETWEEN '19930601' AND '19940601'
+GROUP BY LO_PARTKEY
+ORDER BY SUM(LO_REVENUE) DESC
+```
+
+
+The query result is displayed on the **Insight** page, showing that the query hits the sample model.
+
+
+
+You can also run the same SQL statement in Hive to compare the result and performance.
+
+
+
+### <span id="stop">Stop Kylin</span>
+
+Run the following command to stop Kylin:
+
+```shell
+$KYLIN_HOME/bin/kylin.sh stop
+```
+
+You can run the following command to check whether the Kylin process has stopped.
+
+```shell
+ps -ef | grep kylin
+```
+
+### <span id="faq">FAQ</span>
+
+**Q: How do I change the default service port?**
+
+Modify the following configuration in `$KYLIN_HOME/conf/kylin.properties`. Here is an example that sets the server port to 7070:
+
+```properties
+server.port=7070
+```
+
+**Q: Does Kylin support Kerberos integration?**
+
+Yes. If your cluster uses Kerberos authentication, the Spark embedded in Kylin needs proper configuration to access cluster resources securely. For more information, please refer to [Integrate with Kerberos](#TODO) (a detailed document will come soon).
+
+**Q: Is the query pushdown engine turned on by default?**
+
+Yes. If you want to turn it off, please refer to [Pushdown to SparkSQL](../query/pushdown/pushdown_to_embedded_spark.md).
+
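
Step 1 of Quick Configuration in the new Quick Start page gives a `kylin.metadata.url` template and the rules for `{metadata_name}`: letters, numbers or underscores only, not starting with a number, and at most 28 characters. Below is a minimal POSIX shell sketch that checks a name against exactly those documented rules and then fills in the PostgreSQL template. `is_valid_metadata_name` and `metadata_url_line` are hypothetical helpers, not Kylin tooling, and Kylin may apply further checks of its own.

```shell
# Sketch only: mirrors the documented {metadata_name} rules.
is_valid_metadata_name() {
  name=$1
  # Length must be between 1 and 28 characters.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 28 ] || return 1
  case $name in
    [0-9]*) return 1 ;;           # must not start with a number
    *[!A-Za-z0-9_]*) return 1 ;;  # only letters, numbers, underscores
  esac
  return 0
}

# Print a kylin.metadata.url line using the PostgreSQL template from the page
# (the database name in the JDBC URL is fixed to "kylin", as in the template).
metadata_url_line() {
  name=$1 host=$2 port=$3 user=$4 password=$5
  is_valid_metadata_name "$name" || {
    echo "invalid metadata name: $name" >&2
    return 1
  }
  printf 'kylin.metadata.url=%s@jdbc,driverClassName=org.postgresql.Driver,url=jdbc:postgresql://%s:%s/kylin,username=%s,password=%s\n' \
    "$name" "$host" "$port" "$user" "$password"
}
```

For example, `metadata_url_line kylin_meta 10.1.2.1 5432 postgres secret >> "$KYLIN_HOME/conf/kylin.properties"` would append a line for the metadata name `kylin_meta`; every value in that command is a placeholder for your own settings.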
diff --git a/website/sidebars.js b/website/sidebars.js
index fd175df36d..7307aa5b33 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -35,6 +35,10 @@ const sidebars = {
                 id: 'quickstart/intro',
             },
             items: [
+                {
+                    type: 'doc',
+                    id: 'quickstart/quick_start',
+                },
                 {
                     type: 'doc',
                     id: 'quickstart/expert_mode_tutorial',
@@ -214,9 +218,37 @@ const sidebars = {
                     ],
                 },
                 {
-                    type: 'doc',
-                    id: 'deployment/installation/uninstallation'
-                }
+                    type: 'category',
+                    label: 'Install and Uninstall',
+                    link: {
+                        type: 'doc',
+                        id: 'deployment/installation/intro',
+                    },
+                    items: [
+                        {
+                            type: 'category',
+                            label: 'Install On Platforms',
+                            link: {
+                                type: 'doc',
+                                id: 'deployment/installation/platform/intro',
+                            },
+                            items: [
+                                {
+                                    type: 'doc',
+                                    id: 'deployment/installation/platform/install_on_apache_hadoop',
+                                },
+                            ],
+                        },
+                        {
+                            type: 'doc',
+                            id: 'deployment/installation/uninstallation',
+                        },
+                        {
+                            type: 'doc',
+                            id: 'deployment/installation/install_validation',
+                        },
+                    ],
+                },
             ],
         },
         {
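
The How to Use section of the new Quick Start page lists the rules for the reset `ADMIN` password: at least 8 characters, with at least one number, one letter, and one character from the documented special-character set. If you want to pre-check a candidate password locally before entering it in the web UI, a minimal POSIX shell sketch could look like this. `password_meets_rules` is a hypothetical helper that only mirrors the documented rules; it is not Kylin's own validator.

```shell
# Sketch only: hypothetical pre-check for the documented ADMIN password rules.
password_meets_rules() {
  pw=$1
  # Rule 1: at least 8 characters.
  [ "${#pw}" -ge 8 ] || return 1
  # Rule 2: at least one number.
  case $pw in *[0-9]*) ;; *) return 1 ;; esac
  # Rule 3: at least one letter.
  case $pw in *[A-Za-z]*) ;; *) return 1 ;; esac
  # Rule 4: at least one character from the documented special set,
  # checked one character at a time (quoted patterns match literally).
  rest='~!@#$%^&*(){}|:"<>?[];'"'"',./`'
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}   # first remaining special character
    rest=${rest#?}          # drop it from the remaining set
    case $pw in *"$c"*) return 0 ;; esac
  done
  return 1
}
```

For instance, `password_meets_rules 'Kylin@2022' && echo ok` prints `ok`, while `Kylin2022` is rejected because it contains no special character.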