Repository: kylin Updated Branches: refs/heads/document fe0d56898 -> ed810ebea
minor changes on documents Project: http://git-wip-us.apache.org/repos/asf/kylin/repo Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/ed810ebe Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/ed810ebe Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/ed810ebe Branch: refs/heads/document Commit: ed810ebea8f06bbeeb432469866da56a43762caf Parents: fe0d568 Author: honma <ho...@ebay.com> Authored: Thu Feb 11 20:44:56 2016 +0800 Committer: honma <ho...@ebay.com> Committed: Wed Feb 17 10:36:03 2016 +0800 ---------------------------------------------------------------------- website/_dev/howto_test.md | 35 +++++++++++++++---- .../_posts/blog/2016-02-03-streaming-cubing.md | 28 +++++++++++++++ website/images/develop/streaming.png | Bin 0 -> 211683 bytes 3 files changed, 56 insertions(+), 7 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kylin/blob/ed810ebe/website/_dev/howto_test.md ---------------------------------------------------------------------- diff --git a/website/_dev/howto_test.md b/website/_dev/howto_test.md index 150ece8..6aaa056 100644 --- a/website/_dev/howto_test.md +++ b/website/_dev/howto_test.md @@ -7,21 +7,19 @@ permalink: /development/howto_test.html In general, there should be unit tests to cover individual classes; there must be integration test to cover end-to-end scenarios like build, merge, and query. Unit test must run independently (does not require an external sandbox). - -## 2.x branches +## Test 2.x branches * `mvn test` to run unit tests, which has a limited test coverage. * Unit tests has no external dependency and can run on any machine. * The unit tests do not cover end-to-end scenarios like build, merge, and query. * The unit tests take a few minutes to complete. * `dev-support/test_all_against_hdp_2_2_4_2_2.sh` to run integration tests, which has the best test coverage. - * Integration tests __must run on a Hadoop sandbox__. 
Make sure all changes you want to test are avaiable on sandbox. + * Integration tests __are better run on a Hadoop sandbox__. We suggest checking out a copy of the code on your sandbox and running test_all_against_hdp_2_2_4_2_2.sh there directly. If you don't want to put the code on the sandbox, refer to __More on 2.x UT/IT separation__ * As the name indicates, the script is only for hdp 2.2.4.2, but you get the idea of how the integration tests run from it. * The integration tests start by generating random data, then build the cube, merge the cube, and finally query the result and compare it to H2 DB. - * The integration tests take a few hours to complete. - + * The integration tests take one to two hours to complete. -## 1.x branches +## Test 1.x branches * `mvn test` to run unit tests, which has a limited test coverage. * What's special about 1.x is that a hadoop/hbase mini cluster is used to cover queries in unit tests. @@ -32,12 +30,34 @@ In general, there should be unit tests to cover individual classes; there must b * `mvn test -fae -P sandbox` * `mvn test -fae -Dtest=org.apache.kylin.query.test.IIQueryTest -Dhdp.version=2.2.0.0-2041 -DfailIfNoTests=false -P sandbox` +## More on 2.x UT/IT separation + +From Kylin 2.0 you can run UT (unit test), environment cube provision, and IT (integration test) separately. +Running `mvn verify -Dhdp.version=2.2.4.2-2` (assuming you're on your sandbox) is all you need to run the complete test suite. + +It will execute the following steps sequentially: + + 1. Build artifacts + 2. Run all UTs (takes a few minutes) + 3. Provision cubes on the sandbox environment for IT usage (takes 1~2 hours) + 4. Run all ITs (takes tens of minutes) + 5. 
Verify the built jars + +If your code change is minor and merely requires running the UTs, use: +`mvn test` +If your sandbox is already provisioned and your code change will not affect the result of sandbox provision (and you don't want to wait hours for provisioning), just run the following commands to run UT and IT separately: +`mvn test` +`mvn failsafe:integration-test` + +### Cube Provision + +Environment cube provision essentially runs Kylin cubing jobs to prepare example cubes in the sandbox. These prepared cubes will be used by the ITs. Currently the provision step is bound to the maven pre-integration-test phase, and it includes running BuildCubeWithEngine (HBase required), BuildCubeWithStream (Kafka required) and BuildIIWithStream (Kafka required). You can run the mvn commands on your sandbox or your development computer. For the latter case you need to set kylin.job.run.as.remote.cmd=true in __$KYLIN_HOME/examples/test_case_data/sandbox/kylin.properties__. +Try appending `-DfastBuildMode=true` to the mvn verify command to speed up provision by skipping incremental cubing. ## More on 1.x Mini Cluster Kylin 1.x tried to move as many unit test cases as possible from the sandbox to an HBase mini cluster (no longer the case in 2.x), so that users can easily run tests locally without a Hadoop sandbox. Two maven profiles are created in the root pom.xml, "default" and "sandbox". The default profile will start up an HBase Mini Cluster to prepare the test data and run the unit tests (the test cases that are not supported by the Mini cluster have been added to the "exclude" list). 
If you want to keep using the sandbox to run tests, just run `mvn test -P sandbox` - ### When using the "default" profile, Kylin will * Start up an HBase mini cluster and update KylinConfig with the dynamic HBase configurations @@ -46,4 +66,5 @@ Kylin 1.x used to move as many as possible unit test cases from sandbox to HBase * After all test cases complete, shut down the mini cluster and clean up the KylinConfig cache ### To ensure the Mini cluster can run successfully, you need + * Make sure JAVA_HOME is properly set http://git-wip-us.apache.org/repos/asf/kylin/blob/ed810ebe/website/_posts/blog/2016-02-03-streaming-cubing.md ---------------------------------------------------------------------- diff --git a/website/_posts/blog/2016-02-03-streaming-cubing.md b/website/_posts/blog/2016-02-03-streaming-cubing.md new file mode 100644 index 0000000..525f4b8 --- /dev/null +++ b/website/_posts/blog/2016-02-03-streaming-cubing.md @@ -0,0 +1,28 @@ +--- +layout: post-blog +title: Streaming cubing (Prototype) +date: 2016-02-03 16:30:00 +author: Hongbin Ma +categories: blog +--- + + +One of the most important features in the 2.x branches is streaming cubing, which enables OLAP analysis on streaming data. Streaming cubing delivers faster insights on the data to support more prompt business decisions. Even though there are already many real-time analysis tools in the open source community, Kylin streaming cubing still differs from them in multiple respects: +Firstly, Kylin Streaming Cubing aligns with Kylin traditional cubing to provide a unified, ANSI SQL interface. Kylin Streaming actually shares the storage engine and query engine with traditional Kylin cubes, so in theory all of the optimization techniques to save storage and speed up query performance can also be applied to streaming cubes. Besides, all the supported aggregations/filters/UDFs still work for streaming cubes. By unifying the storage engine and query engine we are also freed from doubling the maintenance work. 
+ +Secondly, Kylin Streaming Cubing does not require a large amount of memory to store real-time data, nor does it attempt to provide truly "real time" analysis. From our customer survey we found that minutes of visualization latency is acceptable for OLAP analysts, so our streaming cubing adopts the micro-batch approach. Incoming streaming data are partitioned into different time windows and we build a micro batch for each time window. The cube output for each micro batch is directly saved to HBase. The query engine goes to HBase to retrieve data instead of going to the data ingestion server. The benefit of such a design is that we don't have to maintain a large in-memory index, which could easily require tens of gigabytes of memory. In the future Kylin might need to consider truly "real time" support, too. + +Thirdly, Kylin Streaming Cubing data is persisted and gradually converted to traditional cubes, so customers can still query "cold data" without any compromise on performance. As discussed above, the output of streaming cubing is directly saved to HBase as a new segment. The traditional job engine will be notified of the new segment and take over to schedule merge jobs as the segments accumulate. Day after day the segments of the streaming cube get merged and become a very large traditional cube. + + + +With these major differences in mind we will introduce the modules of Kylin Streaming cubing. Kylin Streaming cubing consists of three major parts: + +* Streaming Input to retrieve data from a replayable data queue (currently Kafka) within a given time window. Streaming Input is also responsible for primary data cleaning and normalization. Kylin Streaming provides a default implementation to parse the messages from the source queue. Customers can choose to configure the parser or provide a brand new one based on their requirements. +* Streaming Batch Ingestion to ingest the incoming data batch and transform it into a micro cube. 
Thanks to the latest Kylin in-memory cubing technology, this step is now several times faster and more space-efficient than before. The micro cube is directly saved to HBase. +* Job Scheduling Module to trigger Streaming Batch Ingestion. Kylin does not put too much effort into job scheduling, and streaming cubing is no exception. Currently we provide a simple implementation based on Linux Crontab. + +We'll publish more detailed documents on how to use Kylin Streaming soon. In the latest 2.x branch we are also working on more sophisticated load balancing schemes for streaming cubing. Please stay tuned. + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/kylin/blob/ed810ebe/website/images/develop/streaming.png ---------------------------------------------------------------------- diff --git a/website/images/develop/streaming.png b/website/images/develop/streaming.png new file mode 100644 index 0000000..1123e14 Binary files /dev/null and b/website/images/develop/streaming.png differ
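The 2.x test entry points added to howto_test.md above can be summarized as a small shell sketch. This wrapper is hypothetical (not part of the Kylin repo); it only echoes the mvn invocations quoted in the document, and the hdp.version value is specific to the hdp 2.2.4.2 sandbox:

```shell
#!/bin/sh
# Hypothetical helper summarizing the 2.x test entry points from howto_test.md.
# It only prints the mvn commands quoted in the document; run them on the
# sandbox (or set kylin.job.run.as.remote.cmd=true for remote provisioning).
run_tests() {
  case "$1" in
    ut)   echo "mvn test" ;;                           # unit tests only, a few minutes
    full) echo "mvn verify -Dhdp.version=2.2.4.2-2" ;; # UT + cube provision + IT, hours
    it)   echo "mvn failsafe:integration-test" ;;      # ITs on an already provisioned sandbox
    *)    echo "usage: run_tests ut|full|it" >&2; return 1 ;;
  esac
}

run_tests full
```

Per the document, `-DfastBuildMode=true` may be appended to the `full` invocation to skip incremental cubing during provision.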