Author: lidong
Date: Thu Jul 19 14:07:53 2018
New Revision: 1836274

URL: http://svn.apache.org/viewvc?rev=1836274&view=rev
Log:
update spark cubing cn doc

Modified:
    kylin/site/cn/docs/tutorial/cube_spark.html
    kylin/site/feed.xml

Modified: kylin/site/cn/docs/tutorial/cube_spark.html
URL: http://svn.apache.org/viewvc/kylin/site/cn/docs/tutorial/cube_spark.html?rev=1836274&r1=1836273&r2=1836274&view=diff
==============================================================================
--- kylin/site/cn/docs/tutorial/cube_spark.html (original)
+++ kylin/site/cn/docs/tutorial/cube_spark.html Thu Jul 19 14:07:53 2018
@@ -183,34 +183,26 @@ export KYLIN_HOME=/usr/local/apache-kyli
 
 <h2 id="kylinenvhadoop-conf-dir">Prepare “kylin.env.hadoop-conf-dir”</h2>
 
-<p>To run Spark on Yarn, you need to specify the <strong>HADOOP_CONF_DIR</strong> environment variable, which is the directory that contains the (client-side) Hadoop configuration files. In many Hadoop distributions this directory is “/etc/hadoop/conf”; but Kylin needs to access not only HDFS, Yarn and Hive but also HBase, so the default directory might not contain all the necessary files. In this case, you need to create a new directory and then copy or link those client files (core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml and hbase-site.xml) into it. In HDP 2.4 there is a conflict between hive-tez and Spark, so when copying the files for Kylin you need to change the default engine from “tez” to “mr”.</p>
+<p>To run Spark on Yarn, you need to specify the <strong>HADOOP_CONF_DIR</strong> environment variable, which is the directory that contains the (client-side) Hadoop configuration files, usually <code class="highlighter-rouge">/etc/hadoop/conf</code>.</p>
 
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">mkdir $KYLIN_HOME/hadoop-conf
-ln -s /etc/hadoop/conf/core-site.xml $KYLIN_HOME/hadoop-conf/core-site.xml
-ln -s /etc/hadoop/conf/hdfs-site.xml $KYLIN_HOME/hadoop-conf/hdfs-site.xml
-ln -s /etc/hadoop/conf/yarn-site.xml $KYLIN_HOME/hadoop-conf/yarn-site.xml
-ln -s /etc/hbase/2.4.0.0-169/0/hbase-site.xml $KYLIN_HOME/hadoop-conf/hbase-site.xml
-cp /etc/hive/2.4.0.0-169/0/hive-site.xml $KYLIN_HOME/hadoop-conf/hive-site.xml
-vi $KYLIN_HOME/hadoop-conf/hive-site.xml (change "hive.execution.engine" value from "tez" to "mr")</code></pre></div>
+<p>Normally Kylin detects the Hadoop configuration directory from the Java classpath at startup and uses it to launch Spark. If the directory is not discovered correctly in your environment, you can specify it explicitly: set the property “kylin.env.hadoop-conf-dir” in <code class="highlighter-rouge">kylin.properties</code> so that Kylin knows about this directory:</p>
 
-<p>Now, set the property “kylin.env.hadoop-conf-dir” in kylin.properties so that Kylin knows about this directory:</p>
-
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.env.hadoop-conf-dir=/usr/local/apache-kylin-2.1.0-bin-hbase1x/hadoop-conf</code></pre></div>
-
-<p>If this property is not set, Kylin will use the default directory from “hive-site.xml”; however, that folder may not contain “hbase-site.xml”, which will cause HBase/ZK connection errors in Spark.</p>
+<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.env.hadoop-conf-dir=/etc/hadoop/conf</code></pre></div>
 
 <h2 id="spark-">Check the Spark configuration</h2>
 
-<p>Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark. All Spark configuration properties prefixed with <em>“kylin.engine.spark-conf.”</em> can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin will pass “--conf spark.executor.memory=4G” as a parameter when it runs “spark-submit”.</p>
+<p>Kylin embeds a Spark binary (v2.1.2) in $KYLIN_HOME/spark. All Spark configuration properties prefixed with <em>“kylin.engine.spark-conf.”</em> can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin will pass “--conf spark.executor.memory=4G” as a parameter when it runs “spark-submit”.</p>
 
 <p>Before running Spark cubing, it is recommended to review these configurations and customize them for your cluster. Below is the default configuration, which is also the minimum configuration for a sandbox (1 executor with 1GB memory); a normal cluster usually needs many more executors, each with at least 4GB memory and 2 cores:</p>
 
 <div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
 kylin.engine.spark-conf.spark.yarn.queue=default
-kylin.engine.spark-conf.spark.executor.memory=1G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
 kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.executor.instances=40
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
 kylin.engine.spark-conf.spark.eventLog.enabled=true
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
@@ -222,9 +214,9 @@ kylin.engine.spark-conf.spark.history.fs
 #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
 #kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current</code></pre></div>
 
-<p>To run on the Hortonworks platform, “hdp.version” must be specified as a Java option for the Yarn containers, so please undo the last three lines of kylin.properties.</p>
+<p>To run on the Hortonworks platform, “hdp.version” must be specified as a Java option for the Yarn containers, so please uncomment the last three lines of kylin.properties.</p>
 
-<p>In addition, to avoid repeatedly uploading the Spark jars to Yarn, you can upload them manually once and then configure the HDFS path of the jar; note that the HDFS path must be a fully qualified name.</p>
+<p>In addition, to avoid repeatedly uploading the Spark jars to Yarn, you can upload them manually once and then configure the HDFS path of the jar; note that the HDFS path must be the full path name.</p>
 
 <div class="highlight"><pre><code class="language-groff" data-lang="groff">jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
 hadoop fs -mkdir -p /kylin/spark/
@@ -232,12 +224,9 @@ hadoop fs -put spark-libs.jar /kylin/spa
 
 <p>Then, configure the following in kylin.properties:</p>
 
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar
-kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
-kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
-kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current</code></pre></div>
+<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar</code></pre></div>
 
-<p>All “kylin.engine.spark-conf.*” parameters can be overridden at the Cube or Project level, which gives users great flexibility.</p>
+<p>All “kylin.engine.spark-conf.*” parameters can be overridden at the Cube or Project level, which gives users flexibility.</p>
 
 <h2 id="cube">Create and modify the sample cube</h2>
 
@@ -254,7 +243,9 @@ $KYLIN_HOME/bin/kylin.sh start</code></p
 
 <p><img src="/images/tutorial/2.0/Spark-Cubing-Tutorial/2_overwrite_partition.png" alt="" /></p>
 
-<p>The sample cube has two memory-hungry measures: “COUNT DISTINCT” and “TOPN(100)”. When the source data is small, their size estimates are inaccurate: the estimated size is much larger than the real size, which causes more RDD partitions to be split and slows down the build. 100 is a more reasonable number for that setting. Click “Next” and “Save” to save the cube.</p>
+<p>The sample cube has two memory-hungry measures: “COUNT DISTINCT” and “TOPN(100)”. When the source data is small, their size estimates are inaccurate: the estimated size is much larger than the real size, which causes more RDD partitions to be split and slows down the build. 500 is a more reasonable number for that setting. Click “Next” and “Save” to save the cube.</p>
+
+<p>For cubes without “COUNT DISTINCT” and “TOPN”, please keep the default configuration.</p>
 
 <h2 id="spark--cube">Build Cube with Spark</h2>
 
@@ -294,7 +285,7 @@ $KYLIN_HOME/bin/kylin.sh start</code></p
 
 <h2 id="section-2">Going further</h2>
 
-<p>If you are a Kylin administrator but new to Spark, it is recommended that you go through the <a href="https://spark.apache.org/docs/2.1.0/">Spark documentation</a>, and don't forget to update the configurations accordingly. You can make Spark's <a href="https://spark.apache.org/docs/2.1.0/job-scheduling.html#dynamic-resource-allocation">Dynamic Resource Allocation</a> take effect so that it scales automatically for different workloads. Spark's performance depends on the cluster's memory and CPU resources, and Kylin's Cube build can be a heavy task when a complex data model and a huge dataset are built in one go. If your cluster resources cannot cope, Spark executors will throw errors such as “OutOfMemory”, so please use it sensibly. For Cubes that have UHC dimensions, too many combinations (for example, a cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), it is recommended to use the MapReduce engine. If your Cube model is simple, everything is SUM/MIN/MAX/COUNT, and the source data is small to medium in size, the Spark engine will be a good choice. In addition, Streaming build is not yet supported in the engine (KYLIN-2484).</p>
+<p>If you are a Kylin administrator but new to Spark, it is recommended that you go through the <a href="https://spark.apache.org/docs/2.1.2/">Spark documentation</a>, and don't forget to update the configurations accordingly. You can enable Spark's <a href="https://spark.apache.org/docs/2.1.2/job-scheduling.html#dynamic-resource-allocation">Dynamic Resource Allocation</a> so that it scales automatically for different workloads. Spark's performance depends on the cluster's memory and CPU resources, and Kylin's Cube build can be a heavy task when a complex data model and a huge dataset are built in one go. If your cluster resources cannot cope, Spark executors will throw errors such as “OutOfMemory”, so please use it sensibly. For Cubes that have UHC dimensions, too many combinations (for example, a cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), it is recommended to use the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT, and the source data is small to medium in size, the Spark engine will be a good choice.</p>
 
 <p>If you have any questions, comments, or bug fixes, you are welcome to discuss them at d...@kylin.apache.org.</p>
 

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1836274&r1=1836273&r2=1836274&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Thu Jul 19 14:07:53 2018
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 19 Jul 2018 00:27:24 -0700</pubDate>
-    <lastBuildDate>Thu, 19 Jul 2018 00:27:24 -0700</lastBuildDate>
+    <pubDate>Thu, 19 Jul 2018 06:59:26 -0700</pubDate>
+    <lastBuildDate>Thu, 19 Jul 2018 06:59:26 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     <item>