Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Mar 18 14:13:30 2022
@@ -19,11 +19,739 @@
 <description>Apache Kylin Home</description>
 <link>http://kylin.apache.org/</link>
 <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-<pubDate>Thu, 10 Mar 2022 20:07:16 -0800</pubDate>
-<lastBuildDate>Thu, 10 Mar 2022 20:07:16 -0800</lastBuildDate>
+<pubDate>Fri, 18 Mar 2022 06:59:44 -0700</pubDate>
+<lastBuildDate>Fri, 18 Mar 2022 06:59:44 -0700</lastBuildDate>
 <generator>Jekyll v2.5.3</generator>
 <item>
+ <title>Kylin 4 now supports AWS Glue Catalog</title>
+ <description><h2 id="emr--kylin--glue-">Why does deploying Kylin on EMR require Glue support?</h2>
+
+<h3 id="aws-glue">What is AWS Glue?</h3>
+
+<p>AWS Glue is a fully managed ETL (extract, transform and load) service that makes it easy and cost-effective for AWS users to categorize, clean and enrich data and to move it reliably between various data stores. AWS Glue consists of a central metadata repository called the AWS Glue Data Catalog, an ETL engine that generates code automatically, and a flexible scheduler that handles dependency resolution, job monitoring and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p>
+
+<h3 id="kylin--aws-glue-catalog">Why does Kylin need to support the AWS Glue Catalog?</h3>
+
+<p>Many Kylin community users run AWS EMR, whose components mainly include Hadoop, Spark, Hive and Presto. If the AWS Glue Data Catalog is not configured, a table created in one data warehouse component such as Hive, Spark or Presto cannot be found, and therefore cannot be used, in the other components. Because a company's underlying data warehouse serves many business departments, the AWS Glue Data Catalog can be used to store metadata when the AWS EMR cluster is created. Data sources are then shared across components and across departments, and the data of every department can be built into one large data cube that responds quickly to the needs of a fast-growing business.<br />
+Modern companies build their data platforms in the cloud, and big data teams use AWS EMR for data processing, data analysis and model training. As data volumes explode, extracting data becomes hard and responses become slow, and EMR/Spark/Hive struggle to meet the need of data analysts, operations staff and sales teams for fast queries. Some users have therefore chosen Apache Kylin as their open-source OLAP solution.<br />
+Recently, however, community users told us that Kylin 4 could not yet read table metadata from Glue. We worked with them to investigate and eventually solve the problem, so Kylin 4 now supports the AWS Glue Catalog. As a result, tables and data can be shared among Hive, Presto, Spark and Kylin, linking every subject area into one large data analysis platform and breaking down metadata silos.</p>
+
+<h3 id="apache-kylin--aws-glue-">Does Apache Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th> </th>
+ <th>Kylin versions that support Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br /> 3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br />https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="section">Prerequisites for deployment</h2>
+
+<h3 id="section-1">Software versions</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td>Must be 4.0.1 or higher; for details, see <a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td>Covers recent EMR 6 and EMR 5 releases; see <a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="glue-">Prepare the Glue database and tables</h3>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<ul>
+ <li>Create an AWS EMR cluster.</li>
+</ul>
+
+<p>Launch an EMR cluster. Note that the Glue external metastore is enabled by setting <code class="highlighter-rouge">hive.metastore.client.factory.class</code>. The following command can be used as a reference.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications <span class="nv">Name</span><span class="o">=</span>Hadoop <span class="nv">Name</span><span class="o">=</span>Hive <span class="nv">Name</span><span class="o">=</span>Spark <span class="nv">Name</span><span class="o">=</span>ZooKeeper <span class="nv">Name</span><span class="o">=</span>Tez <span class="nv">Name</span><span class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span> <span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
+
+<ul>
+ <li>Log in to the Master node and check the Hadoop version and whether the Hadoop cluster has started successfully.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<h3 id="optional">Get environment information (Optional)</h3>
+
+<blockquote>
+ <p>If you use RDS or another metadata store, feel free to skip this step.</p>
+</blockquote>
+
+<p>Kylin 4.X recommends an RDBMS as its metadata store. For testing purposes, this article uses the MariaDB instance that comes with the Master node as the metadata store. The MariaDB hostname, account and password can be found in <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>kylin.metadata.url<span class="o">=</span>kylin4_on_cloud@jdbc,url<span class="o">=</span>jdbc:mysql://<span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span class="o">=</span>hive,password<span class="o">=</span><span class="k">${</span><span class="nv">PASSWORD</span><span class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span class="o">=</span>10,driverClassName<span class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>Once you have this information, replace the variables in the Kylin configuration items above, such as <code class="highlighter-rouge">${PASSWORD}</code>, and save the result locally. It will be used to start the Kylin process in a later step.</p>
+
+<h3 id="spark-sql--aws-glue-">Test the connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use spark-sql to test whether AWS's Spark SQL can obtain database and table metadata through Glue. On the first attempt, the CLI fails to start with an error.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png" alt="" /></p>
+
+<p>Replace the <code class="highlighter-rouge">hive-site.xml</code> used by Spark with the following commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Then set <code class="highlighter-rouge">hive.execution.engine</code> in <code class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code class="highlighter-rouge">mr</code> and start the Spark SQL CLI again. This time, a query against the Glue table data succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png" alt="" /></p>
+
+<h3 id="kylin-spark-enginejaroptional">Prepare kylin-spark-engine.jar (Optional)</h3>
+
+<blockquote>
+ <p>If Apache Kylin 4.0.2 has been released, this issue should already be fixed and you can skip this step. Otherwise, follow the steps below to replace <code class="highlighter-rouge">kylin-spark-engine.jar</code>:</p>
+</blockquote>
+
+<p>Using the commands below, clone the Kylin repository and run <code class="highlighter-rouge">mvn clean package -DskipTests</code> to obtain <code class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="kylin--glue">Deploy Kylin and connect to Glue</h2>
+
+<h3 id="kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+ <p>Download and extract Kylin. Choose the Kylin package that matches your EMR version: use the spark2 package for EMR 5.X and the spark3 package for EMR 6.X.</p>
+ <div class="highlighter-rouge"><pre class="highlight"><code># aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+# wget apache-kylin-4.0.1-bin-spark3.tar.gz
+tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+cd apache-kylin-4.0.1-bin-spark3
+export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></pre>
+ </div>
+ </li>
+ <li>
+ <p>Get the RDBMS driver jar (Optional)</p>
+
+ <blockquote>
+ <p>If you use a different RDBMS as the metadata store, skip this step.</p>
+ </blockquote>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></pre>
+ </div>
+ </li>
+</ol>
+
+<h3 id="spark">Prepare Spark</h3>
+
+<p>Because AWS Spark has built-in support for AWS Glue, <strong>loading table metadata and running build jobs require AWS Spark</strong>. However, Kylin 4.0.1 supports Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark that make the two poorly compatible, so <strong>querying cubes requires Apache Spark</strong>. In short, you need to switch the Spark that Kylin uses depending on whether it is running query jobs or build jobs.</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+<span class="c"># EMR ships AWS Spark under /usr/lib/spark; expose it to Kylin as spark-aws</span>
+ln -s /usr/lib/spark spark-aws
+</code></pre>
+</div>
+
+<ul>
+ <li>Prepare Apache Spark
+ <ul>
+ <li>Choose the Spark installation package that matches your EMR version: <code class="highlighter-rouge">Spark 2.4.7</code> for EMR 5.X and <code class="highlighter-rouge">Spark 3.1.2</code> for EMR 6.X.
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # or download spark-2.4.7-bin-hadoop2.7.tgz from the official Apache Spark website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark-apache
+</code></pre>
+</div></li>
+ </ul>
+ </li>
+ <li>Because the Glue tables must be loaded first, point <code class="highlighter-rouge">$KYLIN_HOME/spark</code> at AWS Spark with a symbolic link. You do not need to set <code class="highlighter-rouge">SPARK_HOME</code>: when <code class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code class="highlighter-rouge">SPARK_HOME</code> is not set, Kylin uses <code class="highlighter-rouge">$KYLIN_HOME/spark</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark-aws spark
+</code></pre>
+</div>
+
+<h3 id="kylin-">Modify the Kylin startup script</h3>
+
+<ol>
+ <li>Start the Spark SQL CLI and leave it running.</li>
+ <li>
+ <p>Use <code class="highlighter-rouge">jps -ml</code> to find the PID of <code class="highlighter-rouge">SparkSQLCLIDriver</code>, then read the Driver's <code class="highlighter-rouge">spark.driver.extraClassPath</code>. Alternatively, you can take it from <code class="highlighter-rouge">/etc/spark/conf/spark-defaults.conf</code>.</p>
+ <div class="highlighter-rouge"><pre class="highlight"><code>jps -ml | grep SparkSubmit
+jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code></pre>
+ </div>
+ <p><img src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png" alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code> and modify the <code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> variable to append <code class="highlighter-rouge">kylin_driver_classpath</code>. After saving <code class="highlighter-rouge">bin/kylin.sh</code>, exit the Spark SQL CLI.</li>
+</ol>
+
+<ul>
+ <li>kylin.sh before the modification</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0, the modified kylin.sh puts <code class="highlighter-rouge">kylin_driver_classpath</code> at the end.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1, the modified kylin.sh puts <code class="highlighter-rouge">kylin_driver_classpath</code> before <code class="highlighter-rouge">$SPARK_HOME/jars</code>.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png" alt="" /></p>
+
+<h3 id="kylin-1">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value (Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+ <td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.env.zookeeper-connect-string</td>
+ <td>${HOSTNAME}</td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td>
+ <td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td>
+ <td>Copied from <code class="highlighter-rouge">spark.driver.extraClassPath</code> in /etc/spark/conf/spark-defaults.conf</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="kylin--1">Start Kylin and verify the build</h3>
+
+<h4 id="kylin-2">Start Kylin</h4>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+ln -s spark-aws spark <span class="c"># skip this step if the 'spark' symlink already exists</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png" alt="" /></p>
+
+<h4 id="kylin-spark-enginejar-optional">Replace kylin-spark-engine.jar (Optional)</h4>
+
+<blockquote>
+ <p>This step is only required for 4.0.1.</p>
+</blockquote>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/
+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span class="c"># back up the original jar</span>
+cp /path/to/kylin/kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar . <span class="c"># jar built earlier; adjust /path/to/kylin to your clone</span>
+
+<span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so that the new jar is loaded</span>
+</code></pre>
+</div>
+
+<h4 id="glue--1">Load Glue tables and build</h4>
+
+<ul>
+ <li>Load Glue table metadata</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png" alt="" /></p>
+
+<ul>
+ <li>Create a Model and a Cube, then trigger a build</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png" alt="" /></p>
+
+<h3 id="section-2">Verify queries</h3>
+
+<p>Switch the Spark used by Kylin and restart Kylin.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+rm spark <span class="c"># 'spark' is a symlink that currently points to AWS Spark</span>
+ln -s spark-apache spark <span class="c"># switch from AWS Spark to Apache Spark</span>
+bin/kylin.sh restart
+</code></pre>
+</div>
+
+<p>Run a test query; the query succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png" alt="" /></p>
+
+<h2 id="section-3">Discussion and Q&amp;A</h2>
+
+<h3 id="sparkaws-spark--apache-spark">Why are two Sparks (AWS Spark &amp; Apache Spark) required?</h3>
+
+<p>AWS Spark has built-in support for the AWS Glue Catalog, and both table loading and the build engine need to read the tables, so <strong>loading table metadata and running build jobs require AWS Spark</strong>. However, Kylin 4.0.1 supports Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark that make the two poorly compatible, so <strong>querying cubes requires Apache Spark</strong>. In short, you need to switch the Spark that Kylin uses depending on whether it is running query jobs or build jobs.<br />
+In practice, you can consider using AWS Spark on Job Nodes (build jobs) and Apache Spark on Query Nodes (query jobs).</p>
+
+<h3 id="kylinsh">Why does kylin.sh need to be modified?</h3>
+
+<p>The Kylin process acts as the Spark Driver and needs <code class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code> to load table metadata. kylin.sh must therefore also be modified to add the relevant jars to the Kylin process's classpath.</p>
+</description>
+ <pubDate>Thu, 17 Mar 2022 04:00:00 -0700</pubDate>
+ <link>http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</link>
+ <guid isPermaLink="true">http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</guid>
+ 
+ 
+ <category>cn_blog</category>
+ 
+ </item>
+ 
+ <item>
+ <title>Kylin 4 now supports AWS Glue Catalog</title>
+ <description><h2 id="why-does-installing-kylin-on-emr-need-to-support-aws-glue">Why does Kylin on EMR need to support AWS Glue?</h2>
+
+<h3 id="what-is-aws-glue">What is AWS Glue?</h3>
+
+<p>AWS Glue is a fully managed ETL (Extract, Transform, and Load) service that enables AWS users to easily and cost-effectively classify, cleanse and enrich data and to move it between various data stores. AWS Glue consists of a central metadata repository called the AWS Glue Data Catalog, an ETL engine that automatically generates code, and a flexible scheduler that handles dependency resolution, job monitoring and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p>
+
+<h3 id="why-does-kylin-need-aws-glue-catalog">Why does Kylin need AWS Glue Catalog?</h3>
+
+<p>Many users in the Kylin community use AWS EMR to run large-scale distributed data processing jobs on Hadoop, Spark, Hive, Presto, etc. Without the AWS Glue Data Catalog, a table created in one of these data warehouse components (such as Hive, Spark or Presto) is not visible to, and therefore cannot be used by, the others.
Because a company's data warehouse serves many business departments, these users choose the AWS Glue Data Catalog as the metadata store when creating their AWS EMR clusters, so that data sources are shared among components and departments. Data from every department can then be built into one large data cube that responds quickly to changing business requirements.<br />
+In modern companies, data is stored in cloud object storage, and big data teams use AWS EMR for data processing, data analysis and model training. As data volumes explode, however, extracting data becomes difficult and query response times become too long: EMR with Spark/Hive can no longer meet the demand of data analysts, O&amp;M staff and sales teams for fast queries. Some users have therefore turned to Apache Kylin as their open-source OLAP solution.<br />
+Recently, community users told us that Kylin 4 could not yet read table metadata directly from AWS Glue. After we investigated and fixed the problem together with them, Kylin 4 now supports the AWS Glue Catalog, making it possible to share tables and data among Hive, Presto, Spark and Kylin.
This breaks down metadata silos, so that different subject areas can be combined into one big data analysis platform.</p>
+
+<h3 id="does-kylin-support-aws-glue">Does Kylin support AWS Glue?</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th> </th>
+ <th>Kylin versions that support Glue</th>
+ <th>Issue Link</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Kylin on HBase (Before Kylin 4)</td>
+ <td>2.6.6 or higher<br />3.1.0 or higher</td>
+ <td>https://issues.apache.org/jira/browse/KYLIN-4206<br />https://zhuanlan.zhihu.com/p/99481373</td>
+ </tr>
+ <tr>
+ <td>Kylin on Parquet</td>
+ <td>4.0.1 or higher</td>
+ <td>This article.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h2 id="prerequisites-for-deployment">Prerequisites for deployment</h2>
+
+<h3 id="software-version">Software Version</h3>
+
+<table>
+ <thead>
+ <tr>
+ <th><strong>Software</strong></th>
+ <th><strong>Version</strong></th>
+ <th>Reference</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Apache Kylin</td>
+ <td>4.0.1 or higher</td>
+ <td><a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td>
+ </tr>
+ <tr>
+ <td>AWS EMR</td>
+ <td>6.5.0 or higher<br />5.33.1 or higher</td>
+ <td><a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td>
+ </tr>
+ </tbody>
+</table>
+
+<h3 id="prepare-aws-glue-database-and-tables">Prepare AWS Glue database and tables</h3>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png" alt="" /></p>
+
+<ul>
+ <li>Create an EMR cluster.</li>
+</ul>
+
+<p>Note that the <code class="highlighter-rouge">hive.metastore.client.factory.class</code> parameter is set to enable AWS Glue.
The following command can be used as a reference.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications <span class="nv">Name</span><span class="o">=</span>Hadoop <span class="nv">Name</span><span class="o">=</span>Hive <span class="nv">Name</span><span class="o">=</span>Spark <span class="nv">Name</span><span class="o">=</span>ZooKeeper <span class="nv">Name</span><span class="o">=</span>Tez <span class="nv">Name</span><span class="o">=</span>Ganglia <span class="se">\</span>
+ --ec2-attributes <span class="k">${}</span> <span class="se">\</span>
+ --release-label emr-6.5.0 <span class="se">\</span>
+ --log-uri <span class="k">${}</span> <span class="se">\</span>
+ --instance-groups <span class="k">${}</span> <span class="se">\</span>
+ --configurations <span class="s1">'[{"Classification":"hive-site","Properties":{"hive.metastore.client.factory.class":"com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]'</span> <span class="se">\</span>
+ --auto-scaling-role EMR_AutoScaling_DefaultRole <span class="se">\</span>
+ --ebs-root-volume-size 100 <span class="se">\</span>
+ --service-role EMR_DefaultRole <span class="se">\</span>
+ --enable-debugging <span class="se">\</span>
+ --name <span class="s1">'Kylin4_on_EMR65_with_Glue'</span> <span class="se">\</span>
+ --region cn-northwest-1
+</code></pre>
+</div>
+
+<ul>
+ <li>Log in to the Master node and check the Hadoop version and whether the Hadoop cluster has started successfully.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png" alt="" /></p>
+
+<h3 id="optionalget-environmental-information">(Optional) Get environment information</h3>
+
+<blockquote>
+ <p>If you are using RDS or another metadata store, you may skip this step.</p>
+</blockquote>
+
+<p>Kylin 4 recommends an RDBMS as its metadata store.
For testing purposes, this article uses the MariaDB instance that comes with the Master node as the metadata store. The MariaDB hostname, account and password can be found in <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>kylin.metadata.url<span class="o">=</span>kylin4_on_cloud@jdbc,url<span class="o">=</span>jdbc:mysql://<span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span class="o">=</span>hive,password<span class="o">=</span><span class="k">${</span><span class="nv">PASSWORD</span><span class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span class="o">=</span>10,driverClassName<span class="o">=</span>org.mariadb.jdbc.Driver
+kylin.env.zookeeper-connect-string<span class="o">=</span><span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>
+</code></pre>
+</div>
+
+<p>Replace the variables with your actual values (for example, replace <code class="highlighter-rouge">${PASSWORD}</code> with the real password) and save the result locally. It will be used to start Kylin in a later step.</p>
+
+<h3 id="test-the-connectivity-between-spark-sql-and-aws-glue">Test the connectivity between Spark SQL and AWS Glue</h3>
+
+<p>Use spark-sql to test whether AWS Spark SQL can access database and table metadata through AWS Glue. On the first attempt, the CLI fails to start with an error.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png" alt="" /></p>
+
+<p>Replace the <code class="highlighter-rouge">hive-site.xml</code> used by Spark with the following commands.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+</code></pre>
+</div>
+
+<p>Then change the value of <code class="highlighter-rouge">hive.execution.engine</code> in <code class="highlighter-rouge">/etc/spark/conf/hive-site.xml</code> to <code class="highlighter-rouge">mr</code>, start the Spark SQL CLI again, and verify that a query on the AWS Glue table data succeeds.</p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png" alt="" /></p>
+
+<h3 id="optional-prepare-kylin-spark-enginejar">(Optional) Prepare kylin-spark-engine.jar</h3>
+
+<blockquote>
+ <p>This issue is expected to be fixed in Apache Kylin 4.0.2, so you can skip this step on 4.0.2 or later. Kylin 4.0.1 users should follow the steps below to replace kylin-spark-engine.jar:</p>
+</blockquote>
+
+<p>Clone the Kylin Git repository and run <code class="highlighter-rouge">mvn clean package -DskipTests</code> to build a new <code class="highlighter-rouge">kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar</code>.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone https://github.com/hit-lacus/kylin.git
+<span class="nb">cd </span>kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+<span class="c"># find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar</span>
+</code></pre>
+</div>
+
+<p>Patch link: <a href="https://github.com/apache/kylin/pull/1819">https://github.com/apache/kylin/pull/1819</a></p>
+
+<h2 id="deploy-kylin-and-connect-to-aws-glue">Deploy Kylin and connect to AWS Glue</h2>
+
+<h3 id="download-kylin">Download Kylin</h3>
+
+<ol>
+ <li>
+ <p>Download and extract Kylin, choosing the Kylin package that matches your EMR version.
That is, use the Spark 2 package with EMR 5.X and the Spark 3 package with EMR 6.X.</p>
+ <div class="highlighter-rouge"><pre class="highlight"><code># aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+# wget apache-kylin-4.0.1-bin-spark3.tar.gz
+tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+cd apache-kylin-4.0.1-bin-spark3
+export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+</code></pre>
+ </div>
+ </li>
+ <li>
+ <p>(Optional) Get the MariaDB driver jar</p>
+
+ <blockquote>
+ <p>If you are using another database as the metadata store, skip this step.</p>
+ </blockquote>
+
+ <div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+</code></pre>
+ </div>
+ </li>
+</ol>
+
+<h3 id="prepare-spark">Prepare Spark</h3>
+
+<p>AWS Spark has built-in support for AWS Glue, so you need AWS Spark to load table metadata and run build jobs. Kylin 4.0.1, however, officially supports Apache Spark, and AWS Spark is not fully compatible with Apache Spark, so cube queries use Apache Spark. In short, you need to switch between AWS Spark and Apache Spark depending on whether Kylin runs build jobs or query jobs.</p>
+
+<ul>
+ <li>Prepare AWS Spark</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+<span class="c"># EMR ships AWS Spark under /usr/lib/spark; expose it to Kylin as spark-aws</span>
+ln -s /usr/lib/spark spark-aws
+</code></pre>
+</div>
+
+<ul>
+ <li>Download Apache Spark
+ <ul>
+ <li>Download the Spark installation package that matches your EMR version.
That is, use Spark 2.4.7 with EMR 5.X and Spark 3.1.2 with EMR 6.X.
+<div class="highlighter-rouge"><pre class="highlight"><code>cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # or download spark-2.4.7-bin-hadoop2.7.tgz from the official Apache Spark website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark-apache
+</code></pre>
+</div></li>
+ </ul>
+ </li>
+ <li>Because AWS Glue tables are loaded first, point <code class="highlighter-rouge">$KYLIN_HOME/spark</code> at AWS Spark with a symbolic link. You do not need to set <code class="highlighter-rouge">SPARK_HOME</code>: if <code class="highlighter-rouge">$KYLIN_HOME/spark</code> exists and <code class="highlighter-rouge">SPARK_HOME</code> is not set, Kylin uses <code class="highlighter-rouge">$KYLIN_HOME/spark</code> as <code class="highlighter-rouge">SPARK_HOME</code> by default.</li>
+</ul>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ln -s spark-aws spark
+</code></pre>
+</div>
+
+<h3 id="modify-kylin-startup-script">Modify Kylin startup script</h3>
+
+<ol>
+ <li>Start the Spark SQL CLI and leave it running.</li>
+ <li>
+ <p>Find the PID of <code class="highlighter-rouge">SparkSQLCLIDriver</code> with <code class="highlighter-rouge">jps -ml</code>, then read the <code class="highlighter-rouge">spark.driver.extraClassPath</code> of the <strong>Driver</strong>.
Alternatively, you can read it from <code class="highlighter-rouge">/etc/spark/conf/spark-defaults.conf</code>.</p>
+ <div class="highlighter-rouge"><pre class="highlight"><code>jps -ml | grep SparkSubmit
+jinfo ${PID} | grep "spark.driver.extraClassPath"
+</code></pre>
+ </div>
+ <p><img src="/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png" alt="" /></p>
+ </li>
+ <li>Edit <code class="highlighter-rouge">bin/kylin.sh</code> and modify <code class="highlighter-rouge">KYLIN_TOMCAT_CLASSPATH</code> to append <code class="highlighter-rouge">kylin_driver_classpath</code>. Save <code class="highlighter-rouge">bin/kylin.sh</code>, then exit the Spark SQL CLI.</li>
+</ol>
+
+<ul>
+ <li>kylin.sh before modification</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 6.5.0, the modified <code class="highlighter-rouge">kylin.sh</code> places <code class="highlighter-rouge">kylin_driver_classpath</code> at the end.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png" alt="" /></p>
+
+<ul>
+ <li>For EMR 5.33.1, the modified <code class="highlighter-rouge">kylin.sh</code> places <code class="highlighter-rouge">kylin_driver_classpath</code> before <code class="highlighter-rouge">$SPARK_HOME/jars</code>.</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png" alt="" /></p>
+
+<h3 id="configure-kylin">Configure Kylin</h3>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>
+vim conf/kylin.properties
+</code></pre>
+</div>
+
+<h4 id="minimal-kylin-configuration">Minimal Kylin Configuration</h4>
+
+<table>
+ <thead>
+ <tr>
+ <th>Property Key</th>
+ <th>Property Value (Example)</th>
+ <th>Notes</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>kylin.metadata.url</td>
+ <td>kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver</td>
+ <td>N/A</td> + </tr> + <tr> + <td>kylin.env.zookeeper-connect-string</td> + <td>${HOSTNAME}</td> + <td>N/A</td> + </tr> + <tr> + <td>kylin.engine.spark-conf.spark.driver.extraClassPath</td> + <td>/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar</td> + <td>Copied from spark.driver.extraClassPath in /etc/spark/conf/spark-defaults.conf</td> + </tr> + </tbody> +</table> + +<h3 id="start-kylin-and-verify-the-building-job">Start Kylin and verify the build job</h3> + +<h4 id="start-kylin">Start Kylin</h4> + +<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span> +ln -s spark spark_aws <span class="c"># skip this step if soft link 'spark' exists </span> +bin/kylin.sh restart +</code></pre> +</div> + +<p><img src="/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png" alt="" /></p> + +<p><img src="/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png" alt="" /></p> + +<h4 id="optional-replace-kylin-spark-enginejar">(Optional) Replace kylin-spark-engine.jar</h4> + +<blockquote> + <p>This step is only required for Kylin 4.0.1 users.</p> +</blockquote> + +<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span>/tomcat/webapps/kylin/WEB-INF/lib/ +mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak <span class="c"># back up the old jar</span> +cp kylin-spark-engine-4.0.0-SNAPSHOT.jar .
+ + <span class="nv">$KYLIN_HOME</span>/bin/kylin.sh restart <span class="c"># restart Kylin so that the new jar is loaded</span> +</code></pre> +</div> + +<h4 id="load-aws-glue-table-and-build">Load AWS Glue table and build</h4> + +<ul> + <li>Load AWS Glue table metadata.</li> +</ul> + +<p><img src="/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png" alt="" /></p> + +<p><img src="/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png" alt="" /></p> + +<ul> + <li>Create a model and a cube, then trigger a build job.</li> +</ul> + +<p><img src="/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png" alt="" /></p> + +<h3 id="verify-the-query">Verify the query</h3> + +<p>Switch the Spark used by Kylin to Apache Spark and restart Kylin.</p> + +<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span> +rm spark <span class="c"># 'spark' is a soft link pointing to AWS Spark</span> +ln -s spark_apache spark <span class="c"># switch from AWS Spark to Apache Spark</span> +bin/kylin.sh restart +</code></pre> +</div> + +<p>Run a test query; the query succeeds.</p> + +<p><img src="/images/blog/kylin4_support_aws_glue/17_verify_query_en.png" alt="" /></p> + +<h2 id="discussion-and-qa">Discussion and Q&amp;A</h2> + +<h3 id="why-we-must-use-both-aws-spark-and-apache-spark">Why must we use both AWS Spark and Apache Spark?</h3> + +<p>AWS Spark has built-in support for AWS Glue, so you use AWS Spark when loading table metadata and running build jobs. Kylin 4.0.1, however, supports Apache Spark, and because AWS Spark is not fully compatible with Apache Spark, Apache Spark is used for cube queries.
To sum up, you need to switch between AWS Spark and Apache Spark depending on the task (build or query).</p> + +<h3 id="why-do-users-need-to-modify-kylinsh">Why do users need to modify kylin.sh?</h3> + +<p>As the Spark driver, Kylin needs to load table metadata through <code class="highlighter-rouge">aws-glue-datacatalog-spark-client.jar</code>, so you need to modify kylin.sh to add the relevant jars to the classpath of the Kylin process.</p> + +<h3 id="if-i-faced-more-questions-where-should-i-asked">Where can I ask further questions?</h3> + +<p>If you have any questions about using Kylin on AWS, please contact us via the mailing list (<a href="&#109;&#097;&#105;&#108;&#116;&#111;:&#117;&#115;&#101;&#114;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;">&#117;&#115;&#101;&#114;&#064;&#107;&#121;&#108;&#105;&#110;&#046;&#097;&#112;&#097;&#099;&#104;&#101;&#046;&#111;&#114;&#103;</a>). For details, see <a href="https://kylin.apache.org/community/">https://kylin.apache.org/community/</a>.</p> +</description> + <pubDate>Thu, 17 Mar 2022 04:00:00 -0700</pubDate> + <link>http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</link> + <guid isPermaLink="true">http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</guid> + + + <category>blog</category> + + </item> + + <item> <title>The future of Apache Kylin: More powerful and easy-to-use OLAP</title> <description><h2 id="apache-kylin-today">01 Apache Kylin Today</h2> @@ -287,6 +1015,137 @@ If users use cloud object storage as Kyl </item> <item> + <title>How Meituan Dominates Online Shopping with Apache Kylin</title> + <description><p>Let's face it, online shopping now affects nearly every part of our shopping lives. From ordering groceries to <a href="https://www.carvana.com/">purchasing a car</a>, we're living in an age of limitless choices when it comes to online commerce.
Nowhere is this more the case than in the world's second-largest consumer market: China.</p> + +<p>Leading the online shopping revolution in China is Meituan, which since 2016 has grown to support nearly 460 million consumers from over 2,000 industries, regularly processing hundreds of billions of dollars in transactions. To support these staggering operations, Meituan has invested heavily in its data analytics system and employs more than 10,000 engineers to ensure a stable and reliable experience for its customers.</p> + +<p>But the driving force behind Meituan's success is not simply a robust analytics system. While the organization's executives might think so, its engineers understand that it is the OLAP engine underlying that system that has empowered the company to move quickly and win in the market.</p> + +<h2 id="meituans-secret-weapon-apache-kylin"><strong>Meituan's Secret Weapon: Apache Kylin</strong></h2> + +<p>Since 2016, Meituan's technical team has relied on <a href="https://kyligence.io/apache-kylin-overview/">Apache Kylin</a> to power their <a href="https://kyligence.io/resources/extreme-olap-with-apache-kylin/">OLAP engine</a>. Apache Kylin, an open source OLAP engine built on the Hadoop platform, resolves complex queries at sub-second speeds through multidimensional precomputation, allowing for blazing-fast analysis on even the largest datasets.</p> + +<p>However, the limitations of this open source solution became apparent as the company's business grew: it became less and less efficient as cubes and queries grew larger and more complex.
To solve this problem, the engineering team leveraged Kylin's open source foundations to dig into the engine, understand its underlying principles, and develop an implementation strategy that other organizations using Kylin can adopt to greatly improve their data output efficiency.</p> + +<p>Meituan's technical team has graciously shared the story of this process below so that you can apply it to your own big data challenges.</p> + +<h2 id="a-global-pandemic-and-a-new-normal-for-business"><strong>A Global Pandemic and a New Normal for Business</strong></h2> + +<p>For the last four years, Meituan's Qingtian sales system has served as the company's data processing workhorse, handling massive amounts of daily sales data across a wide range of highly complex technical scenarios. The stability and efficiency of this system are paramount, which is why Meituan's engineers have invested significantly in optimizing the OLAP engine Qingtian is built upon.</p> + +<p>After a thorough investigation, the team identified Apache Kylin as the only OLAP engine that could meet their needs and scale with anticipated growth. The engine was rolled out in 2016 and, over the next few years, Kylin played an important role in the company's evolving data analytics system.</p> + +<p>Growth, however, turned out to have been severely underestimated, as a global pandemic quickly drove major changes in how consumers shopped and how businesses sold their goods. Such a massive shift to online shopping led to even faster growth for Meituan, as well as a nearly untenable amount of new business data.</p> + +<p>This caused efficiency bottlenecks that even their Kylin-based system started to struggle with.
Cube building and query performance could not keep up with these changes in consumer behavior, slowing down data analysis and decision-making and creating a major obstacle to improving the user experience.</p> + +<p>Meituan's technical team would spend the next six months carrying out optimizations and iterations for Kylin, including dimension pruning, model design, resource adaptation, and improving SLA compliance.</p> + +<h2 id="responding-to-new-consumer-behaviors-with-apache-kylin"><strong>Responding to New Consumer Behaviors with Apache Kylin</strong></h2> + +<p>To understand the approach taken when optimizing Meituan's data architecture, it's important to understand how the business is managed. The company's sales force operates with two business models (in-store sales and phone sales) and is further broken down by territory and corporate department. All analytics data must be shared across both business models.</p> + +<p>With this in mind, Meituan engineers incorporated Kylin into their data architecture as follows:</p> + +<p><img src="/images/blog/meituan/chart-01.jpeg" alt="" /></p> + +<p>Figure 3. Apache Kylin's layer-by-layer building data flow</p> + +<p>While this design addressed many of Meituan's initial concerns around scalability and efficiency, continued shifts in consumer behavior and the organization's response to dramatic changes in the market put enormous pressure on Kylin when it came to building cubes.
This led to unsustainable consumption of both resources and time.</p> + +<p>It became clear that Kylin's MOLAP model was presenting the following challenges:</p> + +<ul> + <li>The build process involved many highly interdependent steps, making it difficult to find the root cause of problems.</li> + <li>MapReduce, rather than the more efficient Spark, was still being used as the build engine for historical tasks.</li> + <li>The platform's default dynamic resource adaptation method demanded considerable resources even for small tasks. Data was sharded unnecessarily and a large number of small files were generated, wasting resources.</li> + <li>The data volumes Meituan now had to handle were well beyond the original architectural plan, resulting in two hours of cube building every day.</li> + <li>The overall SLA fulfillment rate remained lower than expected.</li> +</ul> + +<p>Recognizing these problems, the team set a goal of improving the platform's efficiency (the quantitative targets are shown below). Finding a solution would involve classifying Kylin's build process, digging into how Kylin worked under the hood, breaking down that process, and finally implementing a solution.</p> + +<p><img src="/images/blog/meituan/chart-02.png" alt="" /></p> + +<p>Figure 4. Implementation path diagram</p> + +<h2 id="optimization-understanding-how-apache-kylin-builds-cubes"><strong>Optimization: Understanding How Apache Kylin Builds Cubes</strong></h2> + +<p>Understanding the cube building process is critical for pinpointing efficiency and performance issues. In the case of Kylin, a solid grasp of its precomputation approach and its "by layer" cubing algorithm is necessary when formulating a solution.</p> + +<p><strong>Precomputation with Apache Kylin</strong></p> + +<p>Apache Kylin generates all possible dimension combinations and pre-calculates the metrics that may be used in future multidimensional analysis, saving the results as a cube.
Metric aggregation results are saved in <em>cuboids</em> (logical branches of the cube); during a query, the relevant cuboids are located from the SQL statement, then read, and their metric values are quickly returned.</p> + +<p><img src="/images/blog/meituan/chart-03.jpeg" alt="" /></p> + +<p>Figure 5. Example of precomputation across four dimensions</p> + +<p><strong>Apache Kylin's By-Layer Cubing Algorithm</strong></p> + +<p>An N-dimensional cube is composed of one N-dimensional sub-cube, N (N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, …, N 1-dimensional sub-cubes, and one 0-dimensional sub-cube. Because the layer with k dimensions contains C(N, k) sub-cubes, the total is 2^N sub-cubes. In Kylin's by-layer cubing algorithm, each layer has one dimension fewer than the layer before it, and each layer is calculated from the results of its parent layer (except the first layer, which is calculated from the source data).</p> + +<p><img src="/images/blog/meituan/chart-04.png" alt="" /></p> + +<p>Figure 6. Cuboid example</p> + +<h2 id="the-proof-is-in-the-process"><strong>The Proof Is in the Process</strong></h2> + +<p>With these principles in mind, the Meituan team identified five key areas to focus on for optimization: engine selection, data reading, dictionary building, layer-by-layer build, and file conversion. Addressing these areas would yield the greatest gains in reducing the resources required for calculation and shortening processing time.</p> + +<p>The team outlined the challenges, their solutions, and key objectives in the following table:</p> + +<p><img src="/images/blog/meituan/chart-05.jpeg" alt="" /></p> + +<p>Figure 7. Breakdown of Apache Kylin's process</p> + +<h2 id="putting-apache-kylin-to-the-test"><strong>Putting Apache Kylin to the Test</strong></h2> + +<p>With their solutions in place, the next step was to test whether Kylin's build process had actually improved.
To do this, the team selected a set of critical sales tasks and ran a pilot (outlined below):</p> + +<p><img src="/images/blog/meituan/chart-06.jpeg" alt="" /></p> + +<p>Figure 8. Meituan's pilot program for their Apache Kylin optimizations</p> + +<p>The results of the pilot were astonishing. Ultimately, the team achieved a significant reduction in resource consumption, as shown in the following chart:</p> + +<p><img src="/images/blog/meituan/chart-07.jpeg" alt="" /></p> + +<p>Figure 9. Resource usage and performance of Apache Kylin before and after the pilot</p> + +<h2 id="analytics-optimized"><strong>Analytics Optimized</strong></h2> + +<p>Today, Meituan's Qingtian system runs over 20 different Kylin tasks, and after six months of continuous optimization, both the monthly CU usage of Kylin's resource queue and the CU usage of pending tasks have dropped significantly.</p> + +<p><img src="/images/blog/meituan/chart-08.jpeg" alt="" /></p> + +<p>Figure 10. Current performance of Apache Kylin after solution implementation</p> + +<p>Resource usage isn't the only area of impressive improvement. The Qingtian system's SLA compliance also reached 100% as of June 2020.</p> + +<p><img src="/images/blog/meituan/chart-09.jpeg" alt="" /></p> + +<p>Figure 11. Meituan SLA compliance after Apache Kylin optimization</p> + +<h2 id="taking-on-the-future-with-apache-kylin"><strong>Taking on the Future with Apache Kylin</strong></h2> + +<p>Over the past four years, Meituan's technical team has accumulated a great deal of experience in optimizing query performance and build efficiency with Apache Kylin.
But Meituan's success is also the story of open source's success.</p> + +<p>The <a href="http://kylin.apache.org/community/">Apache Kylin community</a> has many active and outstanding code contributors (<a href="https://kyligence.io/comparing-kylin-vs-kyligence/">including Kyligence</a>), who are relentlessly working to expand the Kylin ecosystem and add new features. It's by sharing success stories like this that Apache Kylin remains the leading open source solution for analytics on massive datasets.</p> + +<p>Together with the entire Apache Kylin community, Meituan is making sure critical analytics work can remain unburdened by growing datasets, and that when the next major shift in business takes place, industry leaders like Meituan will be able to analyze what's happening and quickly take action.</p> +</description> + <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate> + <link>http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link> + <guid isPermaLink="true">http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid> + + + <category>blog</category> + + </item> + + <item> <title>Practice and Optimization of Kylin at Meituan In-Store Dining</title> <description><p>Since 2016, Meituan's in-store dining technical team has used Apache Kylin as its OLAP engine. However, as the business grew rapidly, efficiency problems emerged in both cube building and querying. The team therefore started by studying Kylin's underlying principles, broke the process down layer by layer, and drew up an implementation roadmap that progressed from individual pain points to the system as a whole. This article summarizes the lessons learned, in the hope of helping more technical teams in the industry improve their data output efficiency.</p> @@ -516,137 +1375,6 @@ If users use cloud object storage as Kyl </item> <item> - <title>How Meituan Dominates Online Shopping with Apache Kylin</title> - <description><p>Let's face it, online shopping now affects nearly every part of our shopping lives. From ordering groceries to <a href="https://www.carvana.com/">purchasing a car</a>, we're living in an age of limitless choices when it comes to online commerce.
Nowhere is this more the case than in the world's second-largest consumer market: China.</p> - -<p>Leading the online shopping revolution in China is Meituan, which since 2016 has grown to support nearly 460 million consumers from over 2,000 industries, regularly processing hundreds of billions of dollars in transactions. To support these staggering operations, Meituan has invested heavily in its data analytics system and employs more than 10,000 engineers to ensure a stable and reliable experience for its customers.</p> - -<p>But the driving force behind Meituan's success is not simply a robust analytics system. While the organization's executives might think so, its engineers understand that it is the OLAP engine underlying that system that has empowered the company to move quickly and win in the market.</p> - -<h2 id="meituans-secret-weapon-apache-kylin"><strong>Meituan's Secret Weapon: Apache Kylin</strong></h2> - -<p>Since 2016, Meituan's technical team has relied on <a href="https://kyligence.io/apache-kylin-overview/">Apache Kylin</a> to power their <a href="https://kyligence.io/resources/extreme-olap-with-apache-kylin/">OLAP engine</a>. Apache Kylin, an open source OLAP engine built on the Hadoop platform, resolves complex queries at sub-second speeds through multidimensional precomputation, allowing for blazing-fast analysis on even the largest datasets.</p> - -<p>However, the limitations of this open source solution became apparent as the company's business grew: it became less and less efficient as cubes and queries grew larger and more complex.
To solve this problem, the engineering team leveraged Kylin's open source foundations to dig into the engine, understand its underlying principles, and develop an implementation strategy that other organizations using Kylin can adopt to greatly improve their data output efficiency.</p> - -<p>Meituan's technical team has graciously shared the story of this process below so that you can apply it to your own big data challenges.</p> - -<h2 id="a-global-pandemic-and-a-new-normal-for-business"><strong>A Global Pandemic and a New Normal for Business</strong></h2> - -<p>For the last four years, Meituan's Qingtian sales system has served as the company's data processing workhorse, handling massive amounts of daily sales data across a wide range of highly complex technical scenarios. The stability and efficiency of this system are paramount, which is why Meituan's engineers have invested significantly in optimizing the OLAP engine Qingtian is built upon.</p> - -<p>After a thorough investigation, the team identified Apache Kylin as the only OLAP engine that could meet their needs and scale with anticipated growth. The engine was rolled out in 2016 and, over the next few years, Kylin played an important role in the company's evolving data analytics system.</p> - -<p>Growth, however, turned out to have been severely underestimated, as a global pandemic quickly drove major changes in how consumers shopped and how businesses sold their goods. Such a massive shift to online shopping led to even faster growth for Meituan, as well as a nearly untenable amount of new business data.</p> - -<p>This caused efficiency bottlenecks that even their Kylin-based system started to struggle with.
Cube building and query performance could not keep up with these changes in consumer behavior, slowing down data analysis and decision-making and creating a major obstacle to improving the user experience.</p> - -<p>Meituan's technical team would spend the next six months carrying out optimizations and iterations for Kylin, including dimension pruning, model design, resource adaptation, and improving SLA compliance.</p> - -<h2 id="responding-to-new-consumer-behaviors-with-apache-kylin"><strong>Responding to New Consumer Behaviors with Apache Kylin</strong></h2> - -<p>To understand the approach taken when optimizing Meituan's data architecture, it's important to understand how the business is managed. The company's sales force operates with two business models (in-store sales and phone sales) and is further broken down by territory and corporate department. All analytics data must be shared across both business models.</p> - -<p>With this in mind, Meituan engineers incorporated Kylin into their data architecture as follows:</p> - -<p><img src="/images/blog/meituan/chart-01.jpeg" alt="" /></p> - -<p>Figure 3. Apache Kylin's layer-by-layer building data flow</p> - -<p>While this design addressed many of Meituan's initial concerns around scalability and efficiency, continued shifts in consumer behavior and the organization's response to dramatic changes in the market put enormous pressure on Kylin when it came to building cubes.
This led to unsustainable consumption of both resources and time.</p> - -<p>It became clear that Kylin's MOLAP model was presenting the following challenges:</p> - -<ul> - <li>The build process involved many highly interdependent steps, making it difficult to find the root cause of problems.</li> - <li>MapReduce, rather than the more efficient Spark, was still being used as the build engine for historical tasks.</li> - <li>The platform's default dynamic resource adaptation method demanded considerable resources even for small tasks. Data was sharded unnecessarily and a large number of small files were generated, wasting resources.</li> - <li>The data volumes Meituan now had to handle were well beyond the original architectural plan, resulting in two hours of cube building every day.</li> - <li>The overall SLA fulfillment rate remained lower than expected.</li> -</ul> - -<p>Recognizing these problems, the team set a goal of improving the platform's efficiency (the quantitative targets are shown below). Finding a solution would involve classifying Kylin's build process, digging into how Kylin worked under the hood, breaking down that process, and finally implementing a solution.</p> - -<p><img src="/images/blog/meituan/chart-02.png" alt="" /></p> - -<p>Figure 4. Implementation path diagram</p> - -<h2 id="optimization-understanding-how-apache-kylin-builds-cubes"><strong>Optimization: Understanding How Apache Kylin Builds Cubes</strong></h2> - -<p>Understanding the cube building process is critical for pinpointing efficiency and performance issues. In the case of Kylin, a solid grasp of its precomputation approach and its "by layer" cubing algorithm is necessary when formulating a solution.</p> - -<p><strong>Precomputation with Apache Kylin</strong></p> - -<p>Apache Kylin generates all possible dimension combinations and pre-calculates the metrics that may be used in future multidimensional analysis, saving the results as a cube.
Metric aggregation results are saved in <em>cuboids</em> (logical branches of the cube); during a query, the relevant cuboids are located from the SQL statement, then read, and their metric values are quickly returned.</p> - -<p><img src="/images/blog/meituan/chart-03.jpeg" alt="" /></p> - -<p>Figure 5. Example of precomputation across four dimensions</p> - -<p><strong>Apache Kylin's By-Layer Cubing Algorithm</strong></p> - -<p>An N-dimensional cube is composed of one N-dimensional sub-cube, N (N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, …, N 1-dimensional sub-cubes, and one 0-dimensional sub-cube. Because the layer with k dimensions contains C(N, k) sub-cubes, the total is 2^N sub-cubes. In Kylin's by-layer cubing algorithm, each layer has one dimension fewer than the layer before it, and each layer is calculated from the results of its parent layer (except the first layer, which is calculated from the source data).</p> - -<p><img src="/images/blog/meituan/chart-04.png" alt="" /></p> - -<p>Figure 6. Cuboid example</p> - -<h2 id="the-proof-is-in-the-process"><strong>The Proof Is in the Process</strong></h2> - -<p>With these principles in mind, the Meituan team identified five key areas to focus on for optimization: engine selection, data reading, dictionary building, layer-by-layer build, and file conversion. Addressing these areas would yield the greatest gains in reducing the resources required for calculation and shortening processing time.</p> - -<p>The team outlined the challenges, their solutions, and key objectives in the following table:</p> - -<p><img src="/images/blog/meituan/chart-05.jpeg" alt="" /></p> - -<p>Figure 7. Breakdown of Apache Kylin's process</p> - -<h2 id="putting-apache-kylin-to-the-test"><strong>Putting Apache Kylin to the Test</strong></h2> - -<p>With their solutions in place, the next step was to test whether Kylin's build process had actually improved.
To do this, the team selected a set of critical sales tasks and ran a pilot (outlined below):</p> - -<p><img src="/images/blog/meituan/chart-06.jpeg" alt="" /></p> - -<p>Figure 8. Meituan's pilot program for their Apache Kylin optimizations</p> - -<p>The results of the pilot were astonishing. Ultimately, the team achieved a significant reduction in resource consumption, as shown in the following chart:</p> - -<p><img src="/images/blog/meituan/chart-07.jpeg" alt="" /></p> - -<p>Figure 9. Resource usage and performance of Apache Kylin before and after the pilot</p> - -<h2 id="analytics-optimized"><strong>Analytics Optimized</strong></h2> - -<p>Today, Meituan's Qingtian system runs over 20 different Kylin tasks, and after six months of continuous optimization, both the monthly CU usage of Kylin's resource queue and the CU usage of pending tasks have dropped significantly.</p> - -<p><img src="/images/blog/meituan/chart-08.jpeg" alt="" /></p> - -<p>Figure 10. Current performance of Apache Kylin after solution implementation</p> - -<p>Resource usage isn't the only area of impressive improvement. The Qingtian system's SLA compliance also reached 100% as of June 2020.</p> - -<p><img src="/images/blog/meituan/chart-09.jpeg" alt="" /></p> - -<p>Figure 11. Meituan SLA compliance after Apache Kylin optimization</p> - -<h2 id="taking-on-the-future-with-apache-kylin"><strong>Taking on the Future with Apache Kylin</strong></h2> - -<p>Over the past four years, Meituan's technical team has accumulated a great deal of experience in optimizing query performance and build efficiency with Apache Kylin.
But Meituan's success is also the story of open source's success.</p> - -<p>The <a href="http://kylin.apache.org/community/">Apache Kylin community</a> has many active and outstanding code contributors (<a href="https://kyligence.io/comparing-kylin-vs-kyligence/">including Kyligence</a>), who are relentlessly working to expand the Kylin ecosystem and add new features. It's by sharing success stories like this that Apache Kylin remains the leading open source solution for analytics on massive datasets.</p> - -<p>Together with the entire Apache Kylin community, Meituan is making sure critical analytics work can remain unburdened by growing datasets, and that when the next major shift in business takes place, industry leaders like Meituan will be able to analyze what's happening and quickly take action.</p> -</description> - <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate> - <link>http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link> - <guid isPermaLink="true">http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid> - - - <category>blog</category> - - </item> - - <item> <title>Sharing the New Architecture of Apache Kylin 4</title> <description><p>This article is divided into the following parts:<br /> - Apache Kylin use cases<br /> @@ -836,314 +1564,6 @@ For example, a query joins two subquerie <category>blog</category> - - </item> - - <item> - <title>Why Youzan Chose Kylin 4</title> - <description><p>At the QCon Global Software Development Conference held on May 29, 2021, Zheng Shengjun, head of Youzan's data infrastructure platform, shared Youzan's experience using and optimizing Kylin 4.0 in the Big Data Open Source Frameworks and Applications track. For the many long-time Kylin users, it also serves as a practical guide to upgrading to Kylin 4.</p> - -<p>The talk is divided into four parts:</p> - -<ul> - <li>Why Youzan chose Kylin 4</li> - <li>How Kylin 4 works</li> - <li>Kylin 4 performance optimization</li> - <li>Kylin 4 in practice at Youzan</li> -</ul> - -<h2 id="kylin-4-">01 Why Youzan Chose Kylin 4</h2> -<p>First, let me explain why Youzan chose to upgrade to Kylin 4, starting with a brief look back at how OLAP evolved at Youzan. In its early days, Youzan adopted precomputation + MySQL in order to iterate quickly. In 2018, Druid was introduced for its query flexibility and development efficiency, but it suffered from a low degree of pre-aggregation and lacked support for precise count distinct and detail-level OLAP. Against this background, Youzan introduced Apache Kylin, which offers a high degree of aggregation, precise count distinct and the lowest response time (RT), together with ClickHouse, a highly flexible ROLAP engine.</p> - -<p>Youzan has been using Kylin for more than three years since introducing it in 2018. As business scenarios have grown richer and data has kept accumulating, Youzan now has 6 million existing merchants, its GMV in 2020 was 107.3 billion, and its daily build volume exceeds 10 billion. Kylin now covers essentially all of Youzan's business areas.</p> - -<p>As Youzan grew rapidly and used Kylin ever more deeply, we ran into several challenges:<br /> -- First, the build performance of Kylin on HBase could not meet Youzan's expectations, and build performance affects users' experience of failure recovery time and stability;<br /> -- Second, as more large merchants (with memberships in the tens of millions per store and hundreds of thousands of products) came on board, our queries faced major challenges. Kylin on HBase is limited by single-point querying through the QueryServer and could not support these complex scenarios well;<br /> -- Finally, because HBase is not a cloud-native system, elastic resource scaling is hard to achieve. As data volumes keep growing, and since merchants use the system with clear peaks and troughs, average resource utilization stays low.</p> - -<p>Facing these challenges, Youzan chose to upgrade to the more cloud-native Apache Kylin 4.</p> - -<h2 id="kylin-4--1">02 How Kylin 4 Works</h2> -<p>First, a look at the main advantages of Kylin 4. Apache Kylin 4 relies entirely on Spark for both cube building and querying, so it can make full use of Spark technologies such as parallelization, vectorization and whole-stage code generation to improve the efficiency of large queries.<br />
[... 282 lines stripped ...]