This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit 5a0dfdb95eb892fc9d3f51a2cd33331b40fd24f6
Author: xuekaiqi <kaiqi....@qq.com>
AuthorDate: Wed Nov 11 13:32:54 2020 +0800

    add sample dataset introduction
---
 website/_data/docs-cn.yml                |   1 +
 website/_data/docs.yml                   |   1 +
 website/_docs/howto/sample_dataset.cn.md |  98 +++++++++++++++++++++++++++++++
 website/_docs/howto/sample_dataset.md    |  84 ++++++++++++++++++++++++++
 website/images/SampleDataset/dataset.png | Bin 0 -> 67621 bytes
 5 files changed, 184 insertions(+)

diff --git a/website/_data/docs-cn.yml b/website/_data/docs-cn.yml
index 290f255..764477c 100644
--- a/website/_data/docs-cn.yml
+++ b/website/_data/docs-cn.yml
@@ -73,3 +73,4 @@
   - howto/howto_cleanup_storage
   - howto/howto_use_cli
   - howto/howto_use_hive_mr_dict
+  - howto/sample_dataset
diff --git a/website/_data/docs.yml b/website/_data/docs.yml
index 75ddcee..90af718 100644
--- a/website/_data/docs.yml
+++ b/website/_data/docs.yml
@@ -91,6 +91,7 @@
   - howto/howto_enable_zookeeper_acl
   - howto/howto_use_health_check_cli
   - howto/howto_use_hive_mr_dict
+  - howto/sample_dataset
 
 - title: Security
   docs:
diff --git a/website/_docs/howto/sample_dataset.cn.md b/website/_docs/howto/sample_dataset.cn.md
new file mode 100644
index 0000000..d974d38
--- /dev/null
+++ b/website/_docs/howto/sample_dataset.cn.md
@@ -0,0 +1,98 @@
+---
+layout: docs-cn
+title:  Sample Dataset
+categories: howto
+permalink: /cn/docs/howto/sample_dataset.html
+---
+
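+# Sample Dataset
+
+The Kylin binary package ships with a sample dataset of five tables; the fact table contains 10,000 rows. Once Kylin is deployed, you can use this dataset for testing by running a script that imports the bundled sample data into Hive.
+
+### Import Sample Dataset into Hive
+
+The executable script that imports the sample dataset is **sample.sh**. By default it is located in the **/bin** directory under the Kylin installation directory: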
+
+```sh
+$KYLIN_HOME/bin/sample.sh
+```
+
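+After the script finishes successfully, run the **hive** command in a terminal on the server to enter the Hive shell, then run a few queries to verify the import: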
+
+```sh
+hive
+```
+
+By default, the five tables are imported into Hive's `default` database. You can list the imported tables or query a specific table:
+
+```sql
+hive> use default;
+hive> show tables;
+hive> select count(*) from kylin_sales;
+```
+
+> Tip: To import the tables into a different Hive database, set the configuration item `kylin.source.hive.database-for-flat-table` in the Kylin configuration file `$KYLIN_HOME/conf/kylin.properties` to that database.
+
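+### Table Introduction
+
+Kylin supports both star schema and snowflake schema data models. The sample dataset used here is a typical snowflake schema consisting of five tables: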
+
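+- **KYLIN_SALES**
+
+  The fact table. It stores the details of sales orders, one row per transaction. Each record includes the seller, the commodity category, the order amount, the item quantity, and so on.
+
+- **KYLIN_CATEGORY_GROUPINGS**
+
+  A dimension table. It stores details about commodity categories, such as category names.
+
+- **KYLIN_CAL_DT**
+
+  A dimension table. It stores extended date information, such as the year-beginning date, month-beginning date, week-beginning date, year, and month of each date.
+
+- **KYLIN_ACCOUNT**
+
+  A dimension table of user accounts, one row per user. A user can appear in the fact table as a buyer or as a seller; ACCOUNT_ID links to BUYER_ID or SELLER_ID in **KYLIN_SALES**.
+
+- **KYLIN_COUNTRY**
+
+  A dimension table of the countries where users reside. It links to **KYLIN_ACCOUNT**.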
+
+Together, these five tables form the snowflake schema. The entity-relationship (ER) diagram is shown below:
+
+![Sample tables](/images/SampleDataset/dataset.png)
+
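+### Tables and Relationships
+
+The Hive tables generated by the script `sample.sh` contain many columns. The table below introduces the main ones, focusing on the columns defined as dimensions in the model `kylin_sales_models` of the sample project `learn_kylin`.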
+
+| Table                    | Field                | Description                      |
+| :----------------------- | :------------------- | :------------------------------- |
+| KYLIN_SALES              | TRANS_ID             | Order ID                         |
+| KYLIN_SALES              | PART_DT              | Order date                       |
+| KYLIN_SALES              | LEAF_CATEG_ID        | Commodity category ID            |
+| KYLIN_SALES              | LSTG_SITE_ID         | Site ID                          |
+| KYLIN_SALES              | SELLER_ID            | Seller ID                        |
+| KYLIN_SALES              | BUYER_ID             | Buyer ID                         |
+| KYLIN_SALES              | PRICE                | Order amount                     |
+| KYLIN_SALES              | ITEM_COUNT           | Number of items purchased        |
+| KYLIN_SALES              | LSTG_FORMAT_NAME     | Order transaction type           |
+| KYLIN_SALES              | OPS_USER_ID          | System user ID                   |
+| KYLIN_SALES              | OPS_REGION           | System user region               |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD1  | User-defined field 1             |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD3  | User-defined field 3             |
+| KYLIN_CATEGORY_GROUPINGS | UPD_DATE             | Update date                      |
+| KYLIN_CATEGORY_GROUPINGS | UPD_USER             | User responsible for the update  |
+| KYLIN_CATEGORY_GROUPINGS | META_CATEG_NAME      | Level 1 category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL2_NAME      | Level 2 category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL3_NAME      | Level 3 category                 |
+| KYLIN_CAL_DT             | CAL_DT               | Date                             |
+| KYLIN_CAL_DT             | WEEK_BEG_DT          | Week beginning date              |
+| KYLIN_CAL_DT             | MONTH_BEG_DT         | Month beginning date             |
+| KYLIN_CAL_DT             | YEAR_BEG_DT          | Year beginning date              |
+| KYLIN_ACCOUNT            | ACCOUNT_ID           | User account ID                  |
+| KYLIN_ACCOUNT            | ACCOUNT_COUNTRY      | Country ID where account resides |
+| KYLIN_ACCOUNT            | ACCOUNT_BUYER_LEVEL  | Buyer account level              |
+| KYLIN_ACCOUNT            | ACCOUNT_SELLER_LEVEL | Seller account level             |
+| KYLIN_ACCOUNT            | ACCOUNT_CONTACT      | Account contact information      |
+| KYLIN_COUNTRY            | COUNTRY              | Country ID                       |
+| KYLIN_COUNTRY            | NAME                 | Country name                     |
\ No newline at end of file
diff --git a/website/_docs/howto/sample_dataset.md b/website/_docs/howto/sample_dataset.md
new file mode 100644
index 0000000..e8726bb
--- /dev/null
+++ b/website/_docs/howto/sample_dataset.md
@@ -0,0 +1,84 @@
+---
+layout: docs
+title:  Sample Dataset
+categories: tutorial
+permalink: /docs/howto/sample_dataset.html
+---
+
+## Sample Dataset
+
+The Kylin binary package contains a sample dataset for testing. It consists of five tables, including a fact table with 10,000 rows. Because the dataset is small, it is convenient to test with, even in a virtual machine. You can import the built-in sample data into Hive by running an executable script.
+
+### Import Sample Dataset into Hive
+
+The script is `sample.sh`. By default it is located in `$KYLIN_HOME/bin`:
+
+```sh
+$KYLIN_HOME/bin/sample.sh
+```
+
+Once the script completes, run the following command to enter the Hive shell, where you can confirm that the tables were imported successfully.
+
+```sh
+hive
+```
+
+By default, the script imports five tables into Hive's `default` database. You can list the imported tables or query a specific table:
+
+```sql
+hive> use default;
+hive> show tables;
+hive> select count(*) from kylin_sales;
+```
+
+> Tip: To import the tables into a different Hive database, set the configuration item `kylin.source.hive.database-for-flat-table` in the Kylin configuration file `$KYLIN_HOME/conf/kylin.properties` to that database.
+
+### Table Introduction
+
+Kylin supports both star schema and snowflake schema data models. The sample dataset used in this manual is a typical snowflake schema that contains five tables:
+
+- **KYLIN_SALES** This is the fact table. It contains the details of sales orders, one row per transaction. Each row holds information such as the seller, the commodity category, the order amount, and the quantity of goods.
+- **KYLIN_CATEGORY_GROUPINGS** This is a dimension table. It holds the details of commodity categories, such as category names.
+- **KYLIN_CAL_DT** This is a dimension table that holds extended date information, such as the beginning date of the year, month, and week for each date.
+- **KYLIN_ACCOUNT** This is the user account dimension table. Each row represents a user, who can be the buyer and/or the seller of a transaction. It links to **KYLIN_SALES** through BUYER_ID or SELLER_ID.
+- **KYLIN_COUNTRY** This is the country dimension table. It links to **KYLIN_ACCOUNT**.
+
+Together, the five tables form the snowflake data model. Their entity-relationship diagram is shown below.
+
+![Sample Table](/images/SampleDataset/dataset.png)
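+
+The relationships described above can be sketched as a Hive query that walks the snowflake from the fact table out to the country table. This is an illustrative sketch, not part of the shipped sample. The link from **KYLIN_SALES** to **KYLIN_ACCOUNT** through BUYER_ID is stated above; the join keys `PART_DT` = `CAL_DT` and `ACCOUNT_COUNTRY` = `COUNTRY` are assumptions inferred from the column names, and the table aliases are arbitrary.
+
+```sql
+-- Count orders and sum the order amount per buyer country and year.
+-- Assumed join keys (inferred from column names, not confirmed by this page):
+--   kylin_sales.part_dt           = kylin_cal_dt.cal_dt
+--   kylin_account.account_country = kylin_country.country
+SELECT
+  c.name        AS buyer_country,
+  d.year_beg_dt AS year_begin,
+  COUNT(*)      AS order_count,
+  SUM(s.price)  AS total_amount
+FROM kylin_sales s
+JOIN kylin_cal_dt  d ON s.part_dt = d.cal_dt
+JOIN kylin_account a ON s.buyer_id = a.account_id
+JOIN kylin_country c ON a.account_country = c.country
+GROUP BY c.name, d.year_beg_dt
+ORDER BY buyer_country, year_begin;
+```
+
+Joining on `s.seller_id = a.account_id` instead would give the same breakdown by seller country.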
+
+### Data Dictionary
+
+The tables generated by the script `sample.sh` contain many columns. The table below introduces the key ones, focusing on the columns defined as dimensions in the model `kylin_sales_models` of the sample project `learn_kylin`.
+
+| Table                    | Field                | Description                      |
+| :----------------------- | :------------------- | :------------------------------- |
+| KYLIN_SALES              | TRANS_ID             | Order ID                         |
+| KYLIN_SALES              | PART_DT              | Order Date                       |
+| KYLIN_SALES              | LEAF_CATEG_ID        | ID Of Commodity Category         |
+| KYLIN_SALES              | LSTG_SITE_ID         | Site ID                          |
+| KYLIN_SALES              | SELLER_ID            | Account ID Of Seller             |
+| KYLIN_SALES              | BUYER_ID             | Account ID Of Buyer              |
+| KYLIN_SALES              | PRICE                | Order Amount                     |
+| KYLIN_SALES              | ITEM_COUNT           | The Number Of Purchased Goods    |
+| KYLIN_SALES              | LSTG_FORMAT_NAME     | Order Transaction Type           |
+| KYLIN_SALES              | OPS_USER_ID          | System User ID                   |
+| KYLIN_SALES              | OPS_REGION           | System User Region               |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD1  | User Defined Fields 1            |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD3  | User Defined Fields 3            |
+| KYLIN_CATEGORY_GROUPINGS | UPD_DATE             | Update Date                      |
+| KYLIN_CATEGORY_GROUPINGS | UPD_USER             | Update User                      |
+| KYLIN_CATEGORY_GROUPINGS | META_CATEG_NAME      | Level 1 Category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL2_NAME      | Level 2 Category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL3_NAME      | Level 3 Category                 |
+| KYLIN_CAL_DT             | CAL_DT               | Date                             |
+| KYLIN_CAL_DT             | WEEK_BEG_DT          | Week Beginning Date              |
+| KYLIN_CAL_DT             | MONTH_BEG_DT         | Month Beginning Date             |
+| KYLIN_CAL_DT             | YEAR_BEG_DT          | Year Beginning Date              |
+| KYLIN_ACCOUNT            | ACCOUNT_ID           | ID Number Of Account             |
+| KYLIN_ACCOUNT            | ACCOUNT_COUNTRY      | Country ID Where Account Resides |
+| KYLIN_ACCOUNT            | ACCOUNT_BUYER_LEVEL  | Buyer Account Level              |
+| KYLIN_ACCOUNT            | ACCOUNT_SELLER_LEVEL | Seller Account Level             |
+| KYLIN_ACCOUNT            | ACCOUNT_CONTACT      | Contact of Account               |
+| KYLIN_COUNTRY            | COUNTRY              | Country ID                       |
+| KYLIN_COUNTRY            | NAME                 | Descriptive Name Of Country      |
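+
+As a quick check of the columns listed above, the following query is a minimal sketch that aggregates the fact table by one of its dimension columns. It uses only columns from the table above and assumes the sample data was imported into Hive's `default` database; the output aliases are arbitrary.
+
+```sql
+-- Total order amount and item count per transaction type.
+SELECT
+  lstg_format_name,
+  SUM(price)      AS total_amount,
+  SUM(item_count) AS total_items
+FROM default.kylin_sales
+GROUP BY lstg_format_name;
+```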
\ No newline at end of file
diff --git a/website/images/SampleDataset/dataset.png b/website/images/SampleDataset/dataset.png
new file mode 100644
index 0000000..2c20652
Binary files /dev/null and b/website/images/SampleDataset/dataset.png differ
