images...

lidong Fri, 18 Mar 2022 07:13:35 -0700

Modified: kylin/site/feed.xml
URL: 
http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1899035&r1=1899034&r2=1899035&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Fri Mar 18 14:13:30 2022
@@ -19,11 +19,739 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 10 Mar 2022 20:07:16 -0800</pubDate>
-    <lastBuildDate>Thu, 10 Mar 2022 20:07:16 -0800</lastBuildDate>
+    <pubDate>Fri, 18 Mar 2022 06:59:44 -0700</pubDate>
+    <lastBuildDate>Fri, 18 Mar 2022 06:59:44 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>
+        <title>Hands-on: Kylin 4 Now Supports AWS Glue Catalog</title>
+        <description>&lt;h2 id=&quot;emr--kylin--glue-&quot;&gt;Why does deploying Kylin on EMR require Glue support?&lt;/h2&gt;
+
+&lt;h3 id=&quot;aws-glue&quot;&gt;What is AWS Glue?&lt;/h3&gt;
+
+&lt;p&gt;AWS Glue is a fully managed ETL (extract, transform, and load) service that lets AWS users classify, cleanse, and enrich data easily and cost-effectively, and move it reliably between data stores. AWS Glue consists of a central metadata repository called the AWS Glue Data Catalog, an ETL engine that generates code automatically, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.&lt;/p&gt;
+
+&lt;h3 id=&quot;kylin--aws-glue-catalog&quot;&gt;Why does Kylin need to support AWS Glue Catalog?&lt;/h3&gt;
+
+&lt;p&gt;Many Kylin users in the community run AWS EMR, mainly with Hadoop, Spark, Hive, Presto, and similar components. If AWS Glue Data Catalog is not configured, tables created in one data warehouse component (such as Hive, Spark, or Presto) cannot be found, and therefore cannot be used, in the others. A company&#39;s underlying data warehouse has to serve every business department. To solve this problem, AWS Glue Data Catalog can be used to store metadata when the AWS EMR cluster is created, which shares the data sources across components and across business departments. The data of all departments can then be built into one large data cube that responds quickly to fast-growing business needs.&lt;br /&gt;
+Modern companies build their data on cloud platforms, and big data teams use AWS EMR for data processing, data analysis, and model training. As data volumes surge, extracting data becomes slow and difficult, and EMR/Spark/Hive can hardly meet the fast-query needs of data analysts, operations staff, and sales. Some users have therefore chosen Apache Kylin as their open-source OLAP solution.&lt;br /&gt;
+Recently, however, community users told us that Kylin 4 did not yet support reading table metadata from Glue. We worked with them to investigate the problems they hit and eventually fixed them, so Kylin 4 now supports AWS Glue Catalog. The benefit is that Hive, Presto, Spark, and Kylin can share tables and data, which links every subject area into one large data analysis platform and breaks down the metadata barrier.&lt;/p&gt;
+
+&lt;h3 id=&quot;apache-kylin--aws-glue-&quot;&gt;Does Apache Kylin support AWS Glue?&lt;/h3&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;&lt;/th&gt;
+      &lt;th&gt;Kylin versions that support Glue&lt;/th&gt;
+      &lt;th&gt;Issue Link&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Kylin on HBase (Before Kylin 4)&lt;/td&gt;
+      &lt;td&gt;2.6.6 or higher&lt;br /&gt; 3.1.0 or higher&lt;/td&gt;
+      &lt;td&gt;https://issues.apache.org/jira/browse/KYLIN-4206&lt;br 
/&gt;https://zhuanlan.zhihu.com/p/99481373&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Kylin on Parquet&lt;/td&gt;
+      &lt;td&gt;4.0.1 or higher&lt;/td&gt;
+      &lt;td&gt;This article.&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h2 id=&quot;section&quot;&gt;Prerequisites for deployment&lt;/h2&gt;
+
+&lt;h3 id=&quot;section-1&quot;&gt;Software versions at a glance&lt;/h3&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;&lt;strong&gt;Software&lt;/strong&gt;&lt;/th&gt;
+      &lt;th&gt;&lt;strong&gt;Version&lt;/strong&gt;&lt;/th&gt;
+      &lt;th&gt;Reference&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Apache Kylin&lt;/td&gt;
+      &lt;td&gt;4.0.1 or higher&lt;/td&gt;
+      &lt;td&gt;Must be 4.0.1 or later; for details, see &lt;a href=&quot;https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency&quot;&gt;KIP 10 refactor hive and hadoop dependency&lt;/a&gt;.&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;AWS EMR&lt;/td&gt;
+      &lt;td&gt;6.5.0 or higher&lt;br /&gt;5.33.1 or higher&lt;/td&gt;
+      &lt;td&gt;Covers recent EMR 6 and EMR 5 releases; see &lt;a href=&quot;https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html&quot;&gt;Amazon EMR release 6.5.0 - Amazon EMR&lt;/a&gt;.&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h3 id=&quot;glue-&quot;&gt;Prepare the Glue database and tables&lt;/h3&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Create an AWS EMR cluster.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Start an EMR cluster. Note that setting &lt;code class=&quot;highlighter-rouge&quot;&gt;hive.metastore.client.factory.class&lt;/code&gt; enables Glue as the external metastore. The following command can serve as a reference.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;aws emr create-cluster 
--applications &lt;span class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Hadoop &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Hive &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Spark &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;ZooKeeper &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Tez &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Ganglia &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --ec2-attributes &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --release-label emr-6.5.0 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --log-uri &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --instance-groups &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --configurations &lt;span 
class=&quot;s1&quot;&gt;&#39;[{&quot;Classification&quot;:&quot;hive-site&quot;,&quot;Properties&quot;:{&quot;hive.metastore.client.factory.class&quot;:&quot;com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory&quot;}}]&#39;&lt;/span&gt;
 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --auto-scaling-role EMR_AutoScaling_DefaultRole &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --ebs-root-volume-size 100 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --service-role EMR_DefaultRole &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --enable-debugging &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --name &lt;span 
class=&quot;s1&quot;&gt;&#39;Kylin4_on_EMR65_with_Glue&#39;&lt;/span&gt; 
&lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --region cn-northwest-1
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Log in to the Master node, then check the Hadoop version and whether the Hadoop cluster started successfully.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;optional&quot;&gt;Get environment information (Optional)&lt;/h3&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;If you use RDS or another metadata store, skip this step as appropriate.&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;p&gt;Kylin 4.X recommends an RDBMS for metadata storage. For testing purposes, this article uses the MariaDB instance that ships with the Master node; its hostname, account, and password can be found in &lt;code class=&quot;highlighter-rouge&quot;&gt;/etc/hive/conf/hive-site.xml&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;kylin.metadata.url&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;kylin4_on_cloud@jdbc,url&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;jdbc:mysql://&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;HOSTNAME&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;:3306/hue,username&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;hive,password&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;PASSWORD&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;,maxActive&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;10,maxIdle&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;10,driverClassName&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;org.mariadb.jdbc.Driver  
+kylin.env.zookeeper-connect-string&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;HOSTNAME&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;After obtaining this information, replace the variables in the Kylin configuration entries above, such as &lt;code class=&quot;highlighter-rouge&quot;&gt;${PASSWORD}&lt;/code&gt;, and save the result locally for starting the Kylin process in a later step.&lt;/p&gt;
+
+&lt;h3 id=&quot;spark-sql--aws-glue-&quot;&gt;Test the connectivity between Spark SQL and AWS Glue&lt;/h3&gt;
+
+&lt;p&gt;Use spark-sql to test whether the Spark SQL shipped with AWS can read database and table metadata through Glue. The first attempt fails with a startup error.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Replace the &lt;code class=&quot;highlighter-rouge&quot;&gt;hive-site.xml&lt;/code&gt; used by Spark with the following commands.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Then, in &lt;code class=&quot;highlighter-rouge&quot;&gt;/etc/spark/conf/hive-site.xml&lt;/code&gt;, set &lt;code class=&quot;highlighter-rouge&quot;&gt;hive.execution.engine&lt;/code&gt; to &lt;code class=&quot;highlighter-rouge&quot;&gt;mr&lt;/code&gt;. Start the Spark-SQL CLI again and verify that queries against Glue tables succeed.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;kylin-spark-enginejaroptional&quot;&gt;Prepare kylin-spark-engine.jar (Optional)&lt;/h3&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;If Apache Kylin 4.0.2 has been released, this problem should already be fixed and you can skip this step. Otherwise, follow the steps below to replace &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin-spark-engine.jar&lt;/code&gt;:&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;p&gt;Following the commands below, clone the kylin repository and run &lt;code class=&quot;highlighter-rouge&quot;&gt;mvn clean package -DskipTests&lt;/code&gt; to obtain &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;git clone 
https://github.com/hit-lacus/kylin.git
+&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+&lt;span class=&quot;c&quot;&gt;# find kylin-spark-project/kylin-spark-engine/target -name kylin-spark-engine-4.0.0-SNAPSHOT.jar&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Patch link: &lt;a 
href=&quot;https://github.com/apache/kylin/pull/1819&quot;&gt;https://github.com/apache/kylin/pull/1819&lt;/a&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;kylin--glue&quot;&gt;Deploy Kylin and connect to Glue&lt;/h2&gt;
+
+&lt;h3 id=&quot;kylin&quot;&gt;Download Kylin&lt;/h3&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;
+    &lt;p&gt;Download and unpack Kylin. Choose the Kylin package that matches your EMR version: EMR 5.X uses the spark2 package and EMR 6.X uses the spark3 package.&lt;br /&gt;
+ &lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ # aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+ # wget apache-kylin-4.0.1-bin-spark3.tar.gz
+ tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+ cd apache-kylin-4.0.1-bin-spark3
+ export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+&lt;/code&gt;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Get the RDBMS driver jar (Optional)&lt;/p&gt;
+
+    &lt;blockquote&gt;
+      &lt;p&gt;If you use a different RDBMS as the metadata store, skip this step.&lt;/p&gt;
+    &lt;/blockquote&gt;
+
+    &lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ cd $KYLIN_HOME
+ mkdir ext
+ cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+&lt;/code&gt;&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ol&gt;
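+
+&lt;p&gt;The package choice described above can be sketched as a small shell function. This is only an illustration and not part of the original article: the function name kylin_spark_flavor is hypothetical, and it assumes the cluster is identified by a release label of the form emr-6.5.0, as used in the create-cluster command earlier.&lt;/p&gt;

```shell
# Hypothetical helper: map an EMR release label to the Kylin package flavor
# named in the article (EMR 5.X uses spark2, EMR 6.X uses spark3).
kylin_spark_flavor() {
  case "$1" in
    emr-5.*) echo "spark2" ;;
    emr-6.*) echo "spark3" ;;
    *) echo "unsupported release label: $1"; return 1 ;;
  esac
}

# Example: pick the flavor for the release used in this article.
kylin_spark_flavor emr-6.5.0
```

+&lt;p&gt;Any other release label is rejected with a non-zero status, because the article only covers EMR 5 and EMR 6.&lt;/p&gt;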
+
+&lt;h3 id=&quot;spark&quot;&gt;Prepare Spark&lt;/h3&gt;
+
+&lt;p&gt;Because AWS Spark has built-in support for AWS Glue, &lt;strong&gt;loading table metadata and running builds require AWS Spark&lt;/strong&gt;. However, Kylin 4.0.1 is built to support Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark, so the two are poorly compatible and &lt;strong&gt;querying a Cube requires Apache Spark&lt;/strong&gt;. In short, you need to switch the Spark in use depending on whether Kylin is going to run query tasks or build tasks.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Prepare AWS Spark&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;/ext
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Prepare Apache Spark
+    &lt;ul&gt;
+      &lt;li&gt;Choose the Spark distribution that matches your EMR version: EMR 5.X uses the &lt;code class=&quot;highlighter-rouge&quot;&gt;Spark 2.4.7&lt;/code&gt; distribution and EMR 6.X uses the &lt;code class=&quot;highlighter-rouge&quot;&gt;Spark 3.1.2&lt;/code&gt; distribution.&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # Or download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark-apache
+&lt;/code&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;Because the Glue tables must be loaded first, point &lt;code class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; at AWS Spark with a soft link. You do not need to set &lt;code class=&quot;highlighter-rouge&quot;&gt;SPARK_HOME&lt;/code&gt;: when &lt;code class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; exists and &lt;code class=&quot;highlighter-rouge&quot;&gt;SPARK_HOME&lt;/code&gt; is unset, Kylin uses &lt;code class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; by default.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;ln -s spark-aws spark
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h3 id=&quot;kylin-&quot;&gt;Modify the Kylin startup script&lt;/h3&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Start the Spark SQL CLI and leave it running.&lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Use &lt;code class=&quot;highlighter-rouge&quot;&gt;jps -ml&lt;/code&gt; to find the PID of &lt;code class=&quot;highlighter-rouge&quot;&gt;SparkSQLCLIDriver&lt;/code&gt;, then read the &lt;code class=&quot;highlighter-rouge&quot;&gt;spark.driver.extraClassPath&lt;/code&gt; of the driver. Alternatively, take the value from &lt;code class=&quot;highlighter-rouge&quot;&gt;/etc/spark/conf/spark-defaults.conf&lt;/code&gt;.&lt;br /&gt;
+ &lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ jps -ml | grep SparkSubmit
+ jinfo ${PID} | grep &quot;spark.driver.extraClassPath&quot;
+&lt;/code&gt;&lt;br /&gt;
+ &lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;Edit &lt;code class=&quot;highlighter-rouge&quot;&gt;bin/kylin.sh&lt;/code&gt;: modify the &lt;code class=&quot;highlighter-rouge&quot;&gt;KYLIN_TOMCAT_CLASSPATH&lt;/code&gt; variable to append &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt;. After saving &lt;code class=&quot;highlighter-rouge&quot;&gt;bin/kylin.sh&lt;/code&gt;, exit the Spark SQL CLI.&lt;/li&gt;
+&lt;/ol&gt;
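+
+&lt;p&gt;As an alternative to inspecting the running driver, the classpath can be read from the configuration file mentioned in step 2. The following is a minimal sketch, not part of the original article: the function name extra_classpath is hypothetical, and it assumes the file uses the whitespace-separated key and value layout that is common in spark-defaults.conf.&lt;/p&gt;

```shell
# Hypothetical helper: print the value of spark.driver.extraClassPath from a
# spark-defaults.conf style file (assumes "key<whitespace>value" lines).
extra_classpath() {
  conf_file="$1"
  awk '$1 == "spark.driver.extraClassPath" { print $2 }' "$conf_file"
}

# Example (path taken from the article; adjust for your cluster):
# extra_classpath /etc/spark/conf/spark-defaults.conf
```

+&lt;p&gt;If your file separates keys and values with an equals sign instead, the awk field separator would need to be adjusted.&lt;/p&gt;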
+
+&lt;ul&gt;
+  &lt;li&gt;kylin.sh before modification&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;For EMR 6.5.0, the modified kylin.sh: &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt; is placed at the end.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;For EMR 5.33.1, the modified kylin.sh: position &lt;code class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt; relative to &lt;code class=&quot;highlighter-rouge&quot;&gt;$SPARK_HOME/jars&lt;/code&gt; as shown in the screenshot.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;kylin-1&quot;&gt;Configure Kylin&lt;/h3&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+vim conf/kylin.properties 
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h4 id=&quot;minimal-kylin-configuration&quot;&gt;Minimal Kylin 
Configuration&lt;/h4&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;Property Key&lt;/th&gt;
+      &lt;th&gt;Property Value(Example)&lt;/th&gt;
+      &lt;th&gt;Notes&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.metadata.url&lt;/td&gt;
+      
&lt;td&gt;kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver&lt;/td&gt;
+      &lt;td&gt;N/A&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.env.zookeeper-connect-string&lt;/td&gt;
+      &lt;td&gt;${HOSTNAME}&lt;/td&gt;
+      &lt;td&gt;N/A&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.engine.spark-conf.spark.driver.extraClassPath&lt;/td&gt;
+      
&lt;td&gt;/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar&lt;/td&gt;
+      &lt;td&gt;Copied from spark.driver.extraClassPath in 
/etc/spark/conf/spark-defaults.conf&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h3 id=&quot;kylin--1&quot;&gt;Start Kylin and verify the build&lt;/h3&gt;
+
+&lt;h4 id=&quot;kylin-2&quot;&gt;Start Kylin&lt;/h4&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+ln -s spark-aws spark &lt;span class=&quot;c&quot;&gt;# skip this step if the soft link &#39;spark&#39; already exists&lt;/span&gt;
+bin/kylin.sh restart
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;kylin-spark-enginejar-optional&quot;&gt;Replace kylin-spark-engine.jar (Optional)&lt;/h4&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;This step is only needed for 4.0.1.&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;/tomcat/webapps/kylin/WEB-INF/lib/
+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak &lt;span class=&quot;c&quot;&gt;# back up the old jar&lt;/span&gt;
+cp kylin-spark-engine-4.0.0-SNAPSHOT.jar  .
+
+$KYLIN_HOME/bin/kylin.sh restart &lt;span class=&quot;c&quot;&gt;# restart Kylin so that the new jar is loaded&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h4 id=&quot;glue--1&quot;&gt;Load Glue tables and build&lt;/h4&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Load Glue table metadata&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Create a Model and a Cube, then trigger a build&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;section-2&quot;&gt;Verify queries&lt;/h3&gt;
+
+&lt;p&gt;Switch the Spark that Kylin uses, then restart Kylin.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+rm spark &lt;span class=&quot;c&quot;&gt;# &#39;spark&#39; is a soft link that points to AWS Spark&lt;/span&gt;
+ln -s spark-apache spark &lt;span class=&quot;c&quot;&gt;# switch from AWS Spark to Apache Spark&lt;/span&gt;
+bin/kylin.sh restart
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Run a test query; the query succeeds.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/17_verify_query_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;section-3&quot;&gt;Discussion and Q&amp;amp;A&lt;/h2&gt;
+
+&lt;h3 id=&quot;sparkaws-spark--apache-spark&quot;&gt;Why must two Sparks (AWS Spark &amp;amp; Apache Spark) be used?&lt;/h3&gt;
+
+&lt;p&gt;AWS Spark has built-in support for AWS Glue Catalog, and both table loading and the build engine need to access the tables, so &lt;strong&gt;loading table metadata and running builds require AWS Spark&lt;/strong&gt;. However, Kylin 4.0.1 is built to support Apache Spark, and AWS Spark carries substantial code changes relative to Apache Spark, which makes the two poorly compatible, so &lt;strong&gt;querying a Cube requires Apache Spark&lt;/strong&gt;. In short, you need to switch the Spark in use depending on whether Kylin is going to run query tasks or build tasks.&lt;br /&gt;
+In practice, consider running Job Nodes (build tasks) on AWS Spark and Query Nodes (query tasks) on Apache Spark.&lt;/p&gt;
+
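+&lt;p&gt;On a single node, the switch between the two Sparks can be wrapped in a small function. This is a hedged sketch rather than anything shipped with Kylin: the function name switch_spark is hypothetical, and it assumes that both $KYLIN_HOME/spark-aws and $KYLIN_HOME/spark-apache already exist, following the directory names used earlier in the article.&lt;/p&gt;

```shell
# Hypothetical helper: repoint $KYLIN_HOME/spark at the Spark needed next.
# Assumed layout: spark-aws for loading tables and building,
# spark-apache for querying.
switch_spark() {
  case "$1" in
    build) target="spark-aws" ;;
    query) target="spark-apache" ;;
    *) echo "usage: switch_spark build|query"; return 1 ;;
  esac
  if [ ! -d "$KYLIN_HOME/$target" ]; then
    echo "missing directory: $KYLIN_HOME/$target"
    return 1
  fi
  # -f replaces an existing link; -n stops ln from following the old link
  # and creating the new one inside the directory it points to.
  ln -sfn "$target" "$KYLIN_HOME/spark"
}

# Typical use, followed by the restart shown in the article:
# switch_spark query
# $KYLIN_HOME/bin/kylin.sh restart
```

+&lt;p&gt;The link target is relative, so it resolves next to the link itself inside $KYLIN_HOME. Kylin still has to be restarted after each switch.&lt;/p&gt;
+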
+&lt;h3 id=&quot;kylinsh&quot;&gt;Why does kylin.sh need to be modified?&lt;/h3&gt;
+
+&lt;p&gt;As a Spark driver, the Kylin process needs &lt;code class=&quot;highlighter-rouge&quot;&gt;aws-glue-datacatalog-spark-client.jar&lt;/code&gt; to load table metadata, so kylin.sh must be modified to add the relevant jars to the classpath of the Kylin process.&lt;/p&gt;
+</description>
+        <pubDate>Thu, 17 Mar 2022 04:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/cn_blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</guid>
+        
+        
+        <category>cn_blog</category>
+        
+      </item>
+    
+      <item>
+        <title>Kylin 4 now supports AWS Glue Catalog</title>
+        <description>&lt;h2 
id=&quot;why-does-installing-kylin-on-emr-need-to-support-aws-glue&quot;&gt;Why 
does installing Kylin on EMR need to support AWS Glue?&lt;/h2&gt;
+
+&lt;h3 id=&quot;what-is-aws-glue&quot;&gt;What is AWS Glue?&lt;/h3&gt;
+
+&lt;p&gt;AWS Glue is a fully managed ETL (extract, transform, and load) service that lets AWS users classify, cleanse, and enrich data easily and cost-effectively, and move it between data stores. AWS Glue consists of a central metastore called the AWS Glue Data Catalog, an ETL engine that generates code automatically, and a flexible scheduler that handles dependency resolution, job monitoring, and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.&lt;/p&gt;
+
+&lt;h3 id=&quot;why-does-kylin-need-aws-glue-catalog&quot;&gt;Why does Kylin 
need AWS Glue Catalog?&lt;/h3&gt;
+
+&lt;p&gt;Many users in the Kylin community run large-scale distributed data processing jobs on AWS EMR with Hadoop, Spark, Hive, Presto, and similar components. Without AWS Glue Data Catalog, tables created in one of these data warehouse components (such as Hive, Spark, or Presto) cannot be used by the others. Because the data warehouse has to serve many business departments, these users store metadata in AWS Glue Data Catalog when they create their AWS EMR clusters, which shares the data sources across components and departments. Data from every department can then be built into one data cube, so the team can respond quickly to different business requirements.&lt;br /&gt;
+In modern companies, data is stored in cloud object storage, and big data teams use AWS EMR for data processing, data analysis, and model training. As data volumes explode, extracting data becomes difficult and response times grow too long: EMR with Spark or Hive cannot meet the fast-query requirements of data analysts, O&amp;amp;M personnel, and sales. Some users therefore turn to Apache Kylin as their open-source OLAP solution.&lt;br /&gt;
+Recently, our users asked for Kylin 4 to read table metadata directly from AWS Glue. After some collaboration, Kylin 4 now supports AWS Glue Catalog, so tables and data can be shared among Hive, Presto, Spark, and Kylin. This breaks down the metadata barrier and lets different subject areas be combined into one big data analysis platform.&lt;/p&gt;
+
+&lt;h3 id=&quot;does-kylin-support-aws-glue&quot;&gt;Does Kylin support AWS 
Glue?&lt;/h3&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;&lt;/th&gt;
+      &lt;th&gt;Kylin version which supports Glue&lt;/th&gt;
+      &lt;th&gt;Issue Link&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Kylin on HBase (Before Kylin 4)&lt;/td&gt;
+      &lt;td&gt;2.6.6 or higher&lt;br /&gt;3.1.0 or higher&lt;/td&gt;
+      &lt;td&gt;https://issues.apache.org/jira/browse/KYLIN-4206&lt;br 
/&gt;https://zhuanlan.zhihu.com/p/99481373&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Kylin on Parquet&lt;/td&gt;
+      &lt;td&gt;4.0.1 or higher&lt;/td&gt;
+      &lt;td&gt;This article.&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h2 id=&quot;prerequisites-for-deployment&quot;&gt;Prerequisites for 
deployment&lt;/h2&gt;
+
+&lt;h3 id=&quot;software-version&quot;&gt;Software Version&lt;/h3&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;&lt;strong&gt;Software&lt;/strong&gt;&lt;/th&gt;
+      &lt;th&gt;&lt;strong&gt;Version&lt;/strong&gt;&lt;/th&gt;
+      &lt;th&gt;Reference&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;Apache Kylin&lt;/td&gt;
+      &lt;td&gt;4.0.1 or higher&lt;/td&gt;
+      &lt;td&gt;&lt;a 
href=&quot;https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency&quot;&gt;KIP
 10 refactor hive and hadoop dependency&lt;/a&gt;.&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;AWS EMR&lt;/td&gt;
+      &lt;td&gt;6.5.0 or higher&lt;br /&gt;5.33.1 or higher&lt;/td&gt;
+      &lt;td&gt;&lt;a 
href=&quot;https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html&quot;&gt;Amazon
 EMR release 6.5.0 - Amazon EMR&lt;/a&gt;.&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h3 id=&quot;prepare-aws-glue-database-and-tables&quot;&gt;Prepare AWS Glue 
database and tables&lt;/h3&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Create an EMR cluster.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Note: the hive.metastore.client.factory.class parameter is set to enable AWS Glue as the metastore. The command below can serve as a reference.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;aws emr create-cluster 
--applications &lt;span class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Hadoop &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Hive &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Spark &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;ZooKeeper &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Tez &lt;span 
class=&quot;nv&quot;&gt;Name&lt;/span&gt;&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;Ganglia &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --ec2-attributes &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --release-label emr-6.5.0 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --log-uri &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --instance-groups &lt;span class=&quot;k&quot;&gt;${}&lt;/span&gt; &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --configurations &lt;span 
class=&quot;s1&quot;&gt;&#39;[{&quot;Classification&quot;:&quot;hive-site&quot;,&quot;Properties&quot;:{&quot;hive.metastore.client.factory.class&quot;:&quot;com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory&quot;}}]&#39;&lt;/span&gt;
 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --auto-scaling-role EMR_AutoScaling_DefaultRole &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --ebs-root-volume-size 100 &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --service-role EMR_DefaultRole &lt;span 
class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --enable-debugging &lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --name &lt;span 
class=&quot;s1&quot;&gt;&#39;Kylin4_on_EMR65_with_Glue&#39;&lt;/span&gt; 
&lt;span class=&quot;se&quot;&gt;\&lt;/span&gt;
+  --region cn-northwest-1
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Log in to the Master node. Check the Hadoop version and whether 
the Hadoop cluster is successfully started.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;optionalget-environmental-information&quot;&gt;(Optional) Get environment information&lt;/h3&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;If you are using RDS or other metadata storage, you may skip this 
step.&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;p&gt;An RDBMS is recommended for the Kylin 4 metastore. For testing 
purposes, this article uses the MariaDB instance that ships with the Master 
node; its hostname, account, and password can be found in &lt;code 
class=&quot;highlighter-rouge&quot;&gt;/etc/hive/conf/hive-site.xml&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;kylin.metadata.url&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;kylin4_on_cloud@jdbc,url&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;jdbc:mysql://&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;HOSTNAME&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;:3306/hue,username&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;hive,password&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;PASSWORD&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;,maxActive&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;10,maxIdle&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;10,driverClassName&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;org.mariadb.jdbc.Driver  
+kylin.env.zookeeper-connect-string&lt;span 
class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span 
class=&quot;nv&quot;&gt;HOSTNAME&lt;/span&gt;&lt;span 
class=&quot;k&quot;&gt;}&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Replace the variables with your actual values (for example, replace 
${PASSWORD} with the real password) and save the result locally; it will be 
used to start Kylin.&lt;/p&gt;
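+
+&lt;p&gt;As a minimal sketch of that substitution, the shell function below prints both properties with the placeholders filled in. The function name render_kylin_metadata_conf and its two arguments are hypothetical and only for illustration; the property keys and values are the ones shown above.&lt;/p&gt;
+

```shell
# Hypothetical helper (not part of Kylin): print the two metadata
# properties with ${HOSTNAME} and ${PASSWORD} filled in.
# Arguments: 1 = MariaDB/ZooKeeper host name, 2 = MariaDB password.
render_kylin_metadata_conf() {
  local host="$1" password="$2"
  # The password is passed as a printf argument, so characters such as
  # '%' in it are printed literally.
  printf 'kylin.metadata.url=kylin4_on_cloud@jdbc,url=jdbc:mysql://%s:3306/hue,username=hive,password=%s,maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver\n' "$host" "$password"
  printf 'kylin.env.zookeeper-connect-string=%s\n' "$host"
}
```

+
+&lt;p&gt;You could redirect the output of this function into a local file and copy it into conf/kylin.properties later.&lt;/p&gt;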
+
+&lt;h3 
id=&quot;test-the-connectivity-between-spark-sql-and-aws-glue&quot;&gt;Test the 
connectivity between Spark SQL and AWS Glue&lt;/h3&gt;
+
+&lt;p&gt;Use the Spark SQL CLI to test whether AWS Spark can read database and 
table metadata from AWS Glue. On the first attempt, the CLI fails to start 
and reports an error.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/5_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Replace the &lt;code 
class=&quot;highlighter-rouge&quot;&gt;hive-site.xml&lt;/code&gt; used by Spark 
with the one used by Hive, using the following commands.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; /etc/spark/conf
+sudo mv hive-site.xml hive-site.xml.bak
+sudo cp /etc/hive/conf/hive-site.xml .
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Then change the value of &lt;code 
class=&quot;highlighter-rouge&quot;&gt;hive.execution.engine&lt;/code&gt; in 
&lt;code 
class=&quot;highlighter-rouge&quot;&gt;/etc/spark/conf/hive-site.xml&lt;/code&gt;
 to &lt;code class=&quot;highlighter-rouge&quot;&gt;mr&lt;/code&gt;, restart the 
Spark SQL CLI, and verify that a query against an AWS Glue table 
succeeds.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/6_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/7_test_sparksql_glue_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;optional-prepare-kylin-spark-enginejar&quot;&gt;(Optional) 
Prepare kylin-spark-engine.jar&lt;/h3&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;This issue will be fixed in Apache Kylin 4.0.2. So you can skip 
this step after updating to Apache Kylin 4.0.2. For users with Kylin 4.0.1, 
please refer to the following steps to replace kylin-spark-engine.jar:&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;p&gt;Clone the Kylin Git repository and run &lt;code 
class=&quot;highlighter-rouge&quot;&gt;mvn clean package 
-DskipTests&lt;/code&gt; to build a new &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin-spark-project/kylin-spark-engine/target/kylin-spark-engine-4.0.0-SNAPSHOT.jar&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;git clone 
https://github.com/hit-lacus/kylin.git
+&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;kylin
+git checkout KYLIN-5160
+mvn clean package -DskipTests
+
+&lt;span class=&quot;c&quot;&gt;# find 
kylin-spark-project/kylin-spark-engine/target -name 
kylin-spark-engine-4.0.0-SNAPSHOT.jar&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;Patch link: &lt;a 
href=&quot;https://github.com/apache/kylin/pull/1819&quot;&gt;https://github.com/apache/kylin/pull/1819&lt;/a&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;deploy-kylin-and-connect-to-aws-glue&quot;&gt;Deploy Kylin and 
connect to AWS Glue&lt;/h2&gt;
+
+&lt;h3 id=&quot;download-kylin&quot;&gt;Download Kylin&lt;/h3&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;
+    &lt;p&gt;Download and decompress Kylin. Choose the Kylin package that 
matches your EMR version: the Spark 2 package for EMR 5.X, or the Spark 3 
package for EMR 6.X.&lt;br 
/&gt;
+ &lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ # aws s3 cp s3://${BUCKET}/apache-kylin-4.0.1-bin-spark3.tar.gz .
+ # wget apache-kylin-4.0.1-bin-spark3.tar.gz
+ tar zxvf apache-kylin-4.0.1-bin-spark3.tar.gz
+ cd apache-kylin-4.0.1-bin-spark3
+ export KYLIN_HOME=/home/hadoop/apache-kylin-4.0.1-bin-spark3
+&lt;/code&gt;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;(Optional) Get MariaDB driver jar&lt;br /&gt;
+ &amp;gt; If you are using other databases for metastore, please skip this 
step.&lt;/p&gt;
+
+    &lt;p&gt;&lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ cd $KYLIN_HOME
+ mkdir ext
+ cp /usr/lib/hive/lib/mariadb-connector-java.jar $KYLIN_HOME/ext
+&lt;/code&gt;&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ol&gt;
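+
+&lt;p&gt;The package choice described above can be sketched as a small lookup on the EMR release label. The function name kylin_package_for_emr is hypothetical, and the Spark 2 file name is an assumption modeled on the Spark 3 file name used in this article; check the Kylin download page for the exact names.&lt;/p&gt;
+

```shell
# Hypothetical helper: map an EMR release label to a Kylin 4.0.1 package.
# The spark2 file name is an assumption derived from the spark3 name above.
kylin_package_for_emr() {
  case "$1" in
    emr-5.*) echo "apache-kylin-4.0.1-bin-spark2.tar.gz" ;;
    emr-6.*) echo "apache-kylin-4.0.1-bin-spark3.tar.gz" ;;
    *) echo "unsupported release label: $1"; return 1 ;;
  esac
}
```

+
+&lt;p&gt;For the cluster created in this article, kylin_package_for_emr emr-6.5.0 selects the Spark 3 package.&lt;/p&gt;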
+
+&lt;h3 id=&quot;prepare-spark&quot;&gt;Prepare Spark&lt;/h3&gt;
+
+&lt;p&gt;AWS Spark has built-in support for AWS Glue, so you will use AWS Spark 
to load table metadata and run build jobs. Kylin 4.0.1 officially supports 
Apache Spark, and because AWS Spark is not fully compatible with Apache Spark, 
you will use Apache Spark for cube queries. In short, you need to switch 
between AWS Spark and Apache Spark depending on the task (build task or query 
task).&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Prepare AWS Spark&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+mkdir ext
+cp /usr/lib/hive/lib/mariadb-connector-java.jar &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;/ext
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Download Apache Spark
+    &lt;ul&gt;
+      &lt;li&gt;Download the Spark package that matches your EMR version: 
Spark 2.4.7 for EMR 5.X, or Spark 3.1.2 for EMR 6.X. The commands below use 
Spark 2.4.7; adjust the file names if you use Spark 3.1.2.&lt;br /&gt;
+&lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+cd $KYLIN_HOME
+aws s3 cp s3://${BUCKET}/spark-2.4.7-bin-hadoop2.7.tgz $KYLIN_HOME # or 
download spark-2.4.7-bin-hadoop2.7.tgz from the official website
+tar zxvf spark-2.4.7-bin-hadoop2.7.tgz
+mv spark-2.4.7-bin-hadoop2.7 spark-apache
+&lt;/code&gt;&lt;/li&gt;
+    &lt;/ul&gt;
+  &lt;/li&gt;
+  &lt;li&gt;You first need to load AWS Glue tables, so point &lt;code 
class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; at AWS 
Spark with a soft link. Note: you do not need to set &lt;code 
class=&quot;highlighter-rouge&quot;&gt;SPARK_HOME&lt;/code&gt;, because if 
&lt;code class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; 
exists and &lt;code 
class=&quot;highlighter-rouge&quot;&gt;SPARK_HOME&lt;/code&gt; is not set, 
Kylin uses &lt;code 
class=&quot;highlighter-rouge&quot;&gt;$KYLIN_HOME/spark&lt;/code&gt; as 
&lt;code class=&quot;highlighter-rouge&quot;&gt;SPARK_HOME&lt;/code&gt; by 
default.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;ln -s spark-aws spark
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h3 id=&quot;modify-kylin-startup-script&quot;&gt;Modify Kylin startup 
script&lt;/h3&gt;
+
+&lt;ol&gt;
+  &lt;li&gt;Start the Spark SQL CLI and keep it running.&lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Find the PID of &lt;code 
class=&quot;highlighter-rouge&quot;&gt;SparkSQLCLIDriver&lt;/code&gt; with 
&lt;code class=&quot;highlighter-rouge&quot;&gt;jps -ml&lt;/code&gt;. 
Then read &lt;code 
class=&quot;highlighter-rouge&quot;&gt;spark.driver.extraClassPath&lt;/code&gt; 
of the &lt;strong&gt;Driver&lt;/strong&gt; with jinfo. Alternatively, you can 
read this value from /etc/spark/conf/spark-defaults.conf.&lt;br /&gt;
+ &lt;code class=&quot;highlighter-rouge&quot;&gt;shell
+ jps -ml | grep SparkSubmit
+ jinfo ${PID} | grep &quot;spark.driver.extraClassPath&quot;
+&lt;/code&gt;&lt;br /&gt;
+ &lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/8_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;Edit &lt;code 
class=&quot;highlighter-rouge&quot;&gt;bin/kylin.sh&lt;/code&gt;: modify 
&lt;code 
class=&quot;highlighter-rouge&quot;&gt;KYLIN_TOMCAT_CLASSPATH&lt;/code&gt; to 
add &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt;. 
Save bin/kylin.sh, then exit the Spark SQL CLI.&lt;/li&gt;
+&lt;/ol&gt;
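+
+&lt;p&gt;If you take the value from spark-defaults.conf rather than from jinfo, a sketch like the following could extract it. The function name driver_extra_classpath is hypothetical, and the sketch assumes each line of the file has the form key, whitespace, value, with no spaces inside the value.&lt;/p&gt;
+

```shell
# Hypothetical helper: print the value of spark.driver.extraClassPath
# from a spark-defaults.conf style file.
# Assumes "key<whitespace>value" lines; a "key=value" line would not match.
driver_extra_classpath() {
  awk '$1 == "spark.driver.extraClassPath" { print $2 }' "$1"
}
```

+
+&lt;p&gt;On the Master node you would call it as driver_extra_classpath /etc/spark/conf/spark-defaults.conf and use the printed value for kylin_driver_classpath.&lt;/p&gt;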
+
+&lt;ul&gt;
+  &lt;li&gt;kylin.sh before modification&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/9_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;For EMR 6.5.0, in the modified &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin.sh&lt;/code&gt;, &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt; is 
at the end of the code.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/10_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;For EMR 5.33.1, in the modified &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin.sh&lt;/code&gt;, &lt;code 
class=&quot;highlighter-rouge&quot;&gt;kylin_driver_classpath&lt;/code&gt; is 
placed before &lt;code 
class=&quot;highlighter-rouge&quot;&gt;$SPARK_HOME/jars&lt;/code&gt;.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/11_kylin_start_up_script_en.png&quot;
 alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;configure-kylin&quot;&gt;Configure Kylin&lt;/h3&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+vim conf/kylin.properties 
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h4 id=&quot;minimal-kylin-configuration&quot;&gt;Minimal Kylin 
Configuration&lt;/h4&gt;
+
+&lt;table&gt;
+  &lt;thead&gt;
+    &lt;tr&gt;
+      &lt;th&gt;Property Key&lt;/th&gt;
+      &lt;th&gt;Property Value (Example)&lt;/th&gt;
+      &lt;th&gt;Notes&lt;/th&gt;
+    &lt;/tr&gt;
+  &lt;/thead&gt;
+  &lt;tbody&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.metadata.url&lt;/td&gt;
+      
&lt;td&gt;kylin4_on_cloud@jdbc,url=jdbc:mysql://${HOSTNAME}:3306/hue,username=hive,password=${PASSWORD},maxActive=10,maxIdle=10,driverClassName=org.mariadb.jdbc.Driver&lt;/td&gt;
+      &lt;td&gt;N/A&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.env.zookeeper-connect-string&lt;/td&gt;
+      &lt;td&gt;${HOSTNAME}&lt;/td&gt;
+      &lt;td&gt;N/A&lt;/td&gt;
+    &lt;/tr&gt;
+    &lt;tr&gt;
+      &lt;td&gt;kylin.engine.spark-conf.spark.driver.extraClassPath&lt;/td&gt;
+      
&lt;td&gt;/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar&lt;/td&gt;
+      &lt;td&gt;Copied from spark.driver.extraClassPath in 
/etc/spark/conf/spark-defaults.conf&lt;/td&gt;
+    &lt;/tr&gt;
+  &lt;/tbody&gt;
+&lt;/table&gt;
+
+&lt;h3 id=&quot;start-kylin-and-verify-the-building-job&quot;&gt;Start Kylin 
and verify the building job&lt;/h3&gt;
+
+&lt;h4 id=&quot;start-kylin&quot;&gt;Start Kylin&lt;/h4&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+ln -s spark-aws spark &lt;span class=&quot;c&quot;&gt;# skip this step if the 
soft link &#39;spark&#39; already exists&lt;/span&gt;
+bin/kylin.sh restart
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/12_start_kylin_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/13_start_kylin_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h4 id=&quot;optional-replace-kylin-spark-enginejar&quot;&gt;(Optional) 
Replace kylin-spark-engine.jar&lt;/h4&gt;
+
+&lt;blockquote&gt;
+  &lt;p&gt;This step is only required for Kylin 4.0.1 users.&lt;/p&gt;
+&lt;/blockquote&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;/tomcat/webapps/kylin/WEB-INF/lib/
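+mv kylin-spark-engine-4.0.1.jar kylin-spark-engine-4.0.1.jar.bak &lt;span 
class=&quot;c&quot;&gt;# back up the old jar&lt;/span&gt;
+cp /path/to/kylin-spark-engine-4.0.0-SNAPSHOT.jar . &lt;span 
class=&quot;c&quot;&gt;# the jar built in the earlier step&lt;/span&gt;
+
+&lt;span class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;/bin/kylin.sh restart 
&lt;span class=&quot;c&quot;&gt;# restart Kylin to load the new jar&lt;/span&gt;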
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;h4 id=&quot;load-aws-glue-table-and-build&quot;&gt;Load AWS Glue table and 
build&lt;/h4&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Load AWS Glue table metadata&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/14_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/15_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;Create Model and Cube, then trigger a building job.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/16_load_glue_meta_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h3 id=&quot;verify-the-query&quot;&gt;Verify the query&lt;/h3&gt;
+
+&lt;p&gt;Switch the Spark used by Kylin and restart Kylin.&lt;/p&gt;
+
+&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;pre 
class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span 
class=&quot;nb&quot;&gt;cd&lt;/span&gt; &lt;span 
class=&quot;nv&quot;&gt;$KYLIN_HOME&lt;/span&gt;
+rm spark &lt;span class=&quot;c&quot;&gt;# &#39;spark&#39; is a soft link that 
points to AWS Spark&lt;/span&gt;
+ln -s spark-apache spark &lt;span class=&quot;c&quot;&gt;# switch from AWS 
Spark to Apache Spark&lt;/span&gt;
+bin/kylin.sh restart
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
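+
+&lt;p&gt;Because this switch is repeated every time you alternate between build and query tasks, it may help to wrap it in a function. The sketch below is illustrative only: the name switch_spark is hypothetical, and it assumes the two distributions live in directories named spark-aws and spark-apache directly under the Kylin home directory.&lt;/p&gt;
+

```shell
# Hypothetical helper: point KYLIN_HOME/spark at one Spark distribution.
# Arguments: 1 = Kylin home directory, 2 = aws or apache.
# Assumes directories named spark-aws and spark-apache under Kylin home.
switch_spark() {
  local kylin_home="$1" flavor="$2"
  case "$flavor" in
    aws|apache) ;;
    *) echo "usage: switch_spark KYLIN_HOME aws|apache"; return 2 ;;
  esac
  if [ ! -d "$kylin_home/spark-$flavor" ]; then
    echo "missing directory: $kylin_home/spark-$flavor"
    return 1
  fi
  # -f replaces an existing link; -n treats the existing link as a file
  # instead of following it into the directory it points to.
  ln -sfn "spark-$flavor" "$kylin_home/spark"
}
```

+
+&lt;p&gt;After calling the function you still need to restart Kylin with bin/kylin.sh restart so that the new Spark is picked up.&lt;/p&gt;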
+
+&lt;p&gt;Run a test query and confirm that it succeeds.&lt;/p&gt;
+
+&lt;p&gt;&lt;img 
src=&quot;/images/blog/kylin4_support_aws_glue/17_verify_query_en.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;discussion-and-qa&quot;&gt;Discussion and 
Q&amp;amp;A&lt;/h2&gt;
+
+&lt;h3 id=&quot;why-we-must-use-both-aws-spark-and-apache-spark&quot;&gt;Why 
must we use both AWS Spark and Apache Spark?&lt;/h3&gt;
+
+&lt;p&gt;AWS Spark has built-in support for AWS Glue, so you use AWS Spark to 
load table metadata and run build jobs. Kylin 4.0.1 officially supports Apache 
Spark, and because AWS Spark is not fully compatible with Apache Spark, you 
use Apache Spark for cube queries. In short, you need to switch between AWS 
Spark and Apache Spark depending on the task (build task or query 
task).&lt;/p&gt;
+
+&lt;h3 id=&quot;why-do-users-need-to-modify-kylinsh&quot;&gt;Why do users need 
to modify kylin.sh?&lt;/h3&gt;
+
+&lt;p&gt;Kylin acts as the Spark driver and loads table metadata through &lt;code 
class=&quot;highlighter-rouge&quot;&gt;aws-glue-datacatalog-spark-client.jar&lt;/code&gt;,
 so you need to modify kylin.sh to put the relevant jars on the classpath of 
the Kylin process.&lt;/p&gt;
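+
+&lt;p&gt;A quick way to sanity-check a classpath string before putting it into kylin.sh is to confirm that it mentions the Glue client jar. The function name has_glue_client is hypothetical; the sketch only matches the jar file name in the string and does not verify that the file exists.&lt;/p&gt;
+

```shell
# Hypothetical helper: succeed if a classpath string mentions the
# AWS Glue Data Catalog client jar named in this article.
has_glue_client() {
  case "$1" in
    *aws-glue-datacatalog-spark-client.jar*) return 0 ;;
    *) return 1 ;;
  esac
}
```
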
+
+&lt;h3 id=&quot;if-i-faced-more-questions-where-should-i-asked&quot;&gt;If I 
have more questions, where should I ask?&lt;/h3&gt;
+
+&lt;p&gt;If you have any questions about using Kylin on AWS, please contact us 
via the mailing list (&lt;a 
href=&quot;&amp;#109;&amp;#097;&amp;#105;&amp;#108;&amp;#116;&amp;#111;:&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&quot;&gt;&amp;#117;&amp;#115;&amp;#101;&amp;#114;&amp;#064;&amp;#107;&amp;#121;&amp;#108;&amp;#105;&amp;#110;&amp;#046;&amp;#097;&amp;#112;&amp;#097;&amp;#099;&amp;#104;&amp;#101;&amp;#046;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;),
 or see &lt;a 
href=&quot;https://kylin.apache.org/community/&quot;&gt;https://kylin.apache.org/community/&lt;/a&gt;
 for details.&lt;/p&gt;
+</description>
+        <pubDate>Thu, 17 Mar 2022 04:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2022/03/17/kylin4-now-supporting-aws-glue-catalog/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
         <title>The future of Apache Kylin: More powerful and easy-to-use 
OLAP</title>
         <description>&lt;h2 id=&quot;apache-kylin-today&quot;&gt;01 Apache 
Kylin Today&lt;/h2&gt;
 
@@ -287,6 +1015,137 @@ If users use cloud object storage as Kyl
       </item>
     
       <item>
+        <title>How Meituan Dominates Online Shopping with Apache Kylin</title>
+        <description>&lt;p&gt;Let’s face it, online shopping now affects 
nearly every part of our shopping lives. From ordering groceries to &lt;a 
href=&quot;https://www.carvana.com/&quot;&gt;purchasing a car&lt;/a&gt;, 
we’re living in an age of limitless choices when it comes to online commerce. 
Nowhere is this more the case than with the world’s 2nd largest consumer 
market: China.&lt;/p&gt;
+
+&lt;p&gt;Leading the online shopping revolution in China is Meituan, which since 
2016 has grown to support nearly 460 million consumers from over 2,000 
industries, regularly processing hundreds of billions of dollars in 
transactions. To support these staggering operations, Meituan has invested 
heavily in its data analytics system and employs more than 10,000 engineers to 
ensure a stable and reliable experience for its customers.&lt;/p&gt;
+
+&lt;p&gt;But the driving force behind Meituan’s success is not simply a 
robust analytics system. While the organization’s executives might think so, 
its engineers understand that it is the OLAP engine that system is built upon 
that has empowered the company to move quickly and win in the market.&lt;/p&gt;
+
+&lt;h2 
id=&quot;meituans-secret-weapon-apache-kylin&quot;&gt;&lt;strong&gt;Meituan’s 
Secret Weapon: Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Since 2016, Meituan’s technical team has relied on &lt;a 
href=&quot;https://kyligence.io/apache-kylin-overview/&quot;&gt;Apache 
Kylin&lt;/a&gt; to power their &lt;a 
href=&quot;https://kyligence.io/resources/extreme-olap-with-apache-kylin/&quot;&gt;OLAP 
engine&lt;/a&gt;. Apache Kylin, an open source OLAP engine built on the 
Hadoop platform, resolves complex queries at sub-second speeds through 
multidimensional precomputation, allowing for blazing-fast analysis on even the 
largest datasets.&lt;/p&gt;
+
+&lt;p&gt;However, the limitations of this open source solution became apparent 
as the company’s business grew: the engine became less and less efficient as 
cubes and queries grew larger and more complex. To solve this problem, the 
engineering team leveraged Kylin’s open source foundations to dig into the 
engine, understand its underlying principles, and develop an implementation 
strategy that other organizations using Kylin can adopt to greatly improve 
their data output efficiency.&lt;/p&gt;
+
+&lt;p&gt;Meituan’s technical team has graciously shared their story of this 
process below so that you can apply it toward solving your own big data 
challenges.&lt;/p&gt;
+
+&lt;h2 
id=&quot;a-global-pandemic-and-a-new-normal-for-business&quot;&gt;&lt;strong&gt;A
 Global Pandemic and a New Normal for Business&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;For the last four years, Meituan’s Qingtian sales system has served 
as the company’s data processing workhorse, handling massive amounts of daily 
sales data involving a wide range of highly complex technical scenarios. The 
stability and efficiency of this system are paramount, which is why 
Meituan’s engineers have made significant investments in optimizing the OLAP 
engine Qingtian is built upon.&lt;/p&gt;
+
+&lt;p&gt;After a thorough investigation, the team identified Apache Kylin as 
the only OLAP engine that could meet their needs and scale with anticipated 
growth. The engine was rolled out in 2016 and, over the next few years, Kylin 
played an important role in the company’s evolving data analytics 
system.&lt;/p&gt;
+
+&lt;p&gt;Growth expectations, however, turned out to be severely 
underestimated, as a global pandemic quickly drove major changes in how 
consumers shopped and how businesses sold their goods. Such a massive shift in 
online shopping led to even faster growth for Meituan as well as a nearly 
untenable amount of new business data.&lt;/p&gt;
+
+&lt;p&gt;This caused efficiency bottlenecks that even their Kylin-based system 
started to struggle with. Cube building and query performance were unable to 
keep up with these changes in consumer behavior, slowing down data analysis 
and decision-making and creating a major obstacle to improving the user 
experience.&lt;/p&gt;
+
+&lt;p&gt;Meituan’s technical team would spend the next six months carrying 
out optimizations and iterations for Kylin, including dimension pruning, model 
design, resource adaptation, and improving SLA compliance.&lt;/p&gt;
+
+&lt;h2 
id=&quot;responding-to-new-consumer-behaviors-with-apache-kylin&quot;&gt;&lt;strong&gt;Responding
 to New Consumer Behaviors with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;In order to understand the approach taken when optimizing Meituan’s 
data architecture, it’s important to understand how the business is managed. 
The company’s sales force operates with two business models (in-store 
sales and phone sales) and is then further broken down by various 
territories and corporate departments. All analytics data must be communicated 
across both business models.&lt;/p&gt;
+
+&lt;p&gt;With this in mind, Meituan engineers incorporated Kylin into their 
design of the data architecture as follows:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-01.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 3. Apache Kylin’s layer-by-layer building data flow&lt;/p&gt;
+
+&lt;p&gt;While this design addressed many of Meituan’s initial concerns 
around scalability and efficiency, continued shifts in consumer behaviors and 
the organization’s response to dramatic changes in the market put enormous 
pressure on Kylin when it came to building cubes. This led to an unsustainable 
level of consumption of both resources and time.&lt;/p&gt;
+
+&lt;p&gt;It became clear that Kylin’s MOLAP model was presenting the 
following challenges:&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;The build process involved many steps that were highly correlated, 
making it difficult to find the root cause of problems.&lt;/li&gt;
+  &lt;li&gt;MapReduce - instead of the more efficient Spark - was still being 
used as the build engine for historical tasks.&lt;/li&gt;
+  &lt;li&gt;The platform’s default dynamic resource adaptation method demanded 
considerable resources for small tasks. Data was sharded unnecessarily and a 
large number of small files were generated, resulting in a waste of 
resources.&lt;/li&gt;
+  &lt;li&gt;The data volumes Meituan now had to work with were well beyond 
the original architectural plan, resulting in two hours of cube building every 
day.&lt;/li&gt;
+  &lt;li&gt;The overall SLA fulfillment rate remained lower than 
expected.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Recognizing these problems, the team set a goal of improving the 
platform’s efficiency (you can see the quantitative targets below). Finding a 
solution would involve classifying Kylin’s build process, digging into how 
Kylin worked under the hood, breaking down that process, and finally 
implementing a solution.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-02.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 4. Implementation path diagram&lt;/p&gt;
+
+&lt;h2 
id=&quot;optimization-understanding-how-apache-kylin-builds-cubes&quot;&gt;&lt;strong&gt;Optimization:
 Understanding How Apache Kylin Builds Cubes&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Understanding the cube building process is critical for pinpointing 
efficiency and performance issues. In the case of Kylin, a solid grasp of its 
precomputation approach and its “by layer” cubing algorithm is necessary 
when formulating a solution.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Precomputation with Apache 
Kylin&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;Apache Kylin generates all possible dimensional combinations and 
pre-calculates the metrics that may be used in future multidimensional 
analysis, saving the results as a cube. Metric aggregation results are saved on 
&lt;em&gt;cuboids&lt;/em&gt; (a logical branch of the cube), and during queries 
relevant cuboids are found through SQL statements, and then read and quickly 
returned as metric values.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-03.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 5. Precomputation across four dimensions example&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Apache Kylin’s By-Layer Cubing 
Algorithm&lt;/strong&gt;&lt;/p&gt;
+
+&lt;p&gt;An N-dimensional cube is composed of one N-dimensional sub-cube, N 
(N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, ..., N 
1-dimensional sub-cubes, and one 0-dimensional sub-cube, for a total 
of 2^N sub-cubes. In Kylin’s by-layer cubing algorithm, the number of 
dimensions decreases with the calculation of each layer, and each layer’s 
calculation is based on the result of its parent layer (except the 
first layer, which is computed from the source data).&lt;/p&gt;
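+
+&lt;p&gt;The total of 2^N follows from the binomial theorem: each subset of the N dimensions corresponds to one sub-cube, and there are C(N, k) subsets of size k. As a worked check, for N = 4 (the four-dimension case used as the precomputation example) the layers contain 1, 4, 6, 4, and 1 sub-cubes:&lt;/p&gt;
+

```latex
\[
\sum_{k=0}^{N} \binom{N}{k} = 2^{N},
\qquad \text{for } N = 4:\quad 1 + 4 + 6 + 4 + 1 = 16 = 2^{4}.
\]
```
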
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-04.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 6. Cuboid example&lt;/p&gt;
+
+&lt;h2 id=&quot;the-proof-is-in-the-process&quot;&gt;&lt;strong&gt;The Proof 
Is in the Process&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Understanding the principles outlined above, the Meituan team 
identified five key areas to focus on for optimization: engine selection, data 
reading, dictionary building, layer-by-layer build, and file conversion. 
Addressing these areas would lead to the greatest gains in reducing the 
required resources for calculation and shortening processing time.&lt;/p&gt;
+
+&lt;p&gt;The team outlined the challenges, their solutions, and key objectives 
in the following table:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-05.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 7. Breakdown of Apache Kylin’s process&lt;/p&gt;
+
+&lt;h2 
id=&quot;putting-apache-kylin-to-the-test&quot;&gt;&lt;strong&gt;Putting Apache 
Kylin to the Test&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;With their solutions in place, the next step was to test if Kylin’s 
build process had actually improved. To do this, the team selected a set of 
critical sales tasks and ran a pilot (outlined below):&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-06.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 8. Meituan’s pilot program for their Apache Kylin 
optimizations&lt;/p&gt;
+
+&lt;p&gt;The results of the pilot were astonishing. Ultimately, the team was 
able to realize a significant reduction in resource consumption as seen in the 
following chart:&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-07.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 9. Resource usage and performance of Apache Kylin before and 
after pilot&lt;/p&gt;
+
+&lt;h2 id=&quot;analytics-optimized&quot;&gt;&lt;strong&gt;Analytics 
Optimized&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Today, Meituan’s Qingtian system is processing over 20 different 
Kylin tasks, and after six months of constant optimization, the monthly CU 
usage for Kylin’s resource queue and the CU usage for pending tasks have seen 
significant reductions.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-08.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 10. Current performance of Apache Kylin after solution 
implementation&lt;/p&gt;
+
+&lt;p&gt;Resource usage isn’t the only area of impressive improvement. The 
Qingtian system’s SLA compliance also reached 100% as of June 
2020.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-09.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Figure 11. Meituan SLA compliance after Apache Kylin 
optimization&lt;/p&gt;
+
+&lt;h2 
id=&quot;taking-on-the-future-with-apache-kylin&quot;&gt;&lt;strong&gt;Taking 
on the Future with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
+
+&lt;p&gt;Over the past four years, Meituan’s technical team has accumulated 
a great deal of experience in optimizing query performance and build efficiency 
with Apache Kylin. But Meituan’s success is also the story of open source’s 
success.&lt;/p&gt;
+
+&lt;p&gt;The &lt;a href=&quot;http://kylin.apache.org/community/&quot;&gt;Apache 
Kylin community&lt;/a&gt; has many active and outstanding code 
contributors (&lt;a 
href=&quot;https://kyligence.io/comparing-kylin-vs-kyligence/&quot;&gt;including 
Kyligence&lt;/a&gt;), who are relentlessly working to expand the Kylin 
ecosystem and add more new features. It’s in sharing success stories like 
this that Apache Kylin is able to remain the leading open source solution for 
analytics on massive datasets.&lt;/p&gt;
+
+&lt;p&gt;Together with the entire Apache Kylin community, Meituan is making 
sure critical analytics work can remain unburdened by growing datasets, and 
that when the next major shift in business takes place, industry leaders like 
Meituan will be able to analyze what’s happening and quickly take 
action.&lt;/p&gt;
+</description>
+        <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate>
+        
<link>http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link>
+        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid>
+        
+        
+        <category>blog</category>
+        
+      </item>
+    
+      <item>
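         <title>Kylin Practice and Optimization at Meituan In-Store Dining</title>
         
<description>&lt;p&gt;Since 2016, Meituan’s in-store dining technical team has 
used Apache Kylin as its OLAP engine. As the business grew rapidly, however, 
efficiency problems appeared in both building and querying. The team therefore 
started from an analysis of the underlying principles, broke the process down 
layer by layer, and drew up an implementation roadmap that proceeds from 
individual points to the whole. This article summarizes some of that experience 
in the hope of helping more technical teams in the industry improve the 
efficiency of their data output.&lt;/p&gt;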
 
@@ -516,137 +1375,6 @@ If users use cloud object storage as Kyl
       </item>
     
       <item>
-        <title>How Meituan Dominates Online Shopping with Apache Kylin</title>
-        <description>&lt;p&gt;Let’s face it, online shopping now affects 
nearly every part of our shopping lives. From ordering groceries to &lt;a 
href=&quot;https://www.carvana.com/&quot;&gt;purchasing a car&lt;/a&gt;, 
we’re living in an age of limitless choices when it comes to online commerce. 
Nowhere is this more the case than with the world’s 2nd largest consumer 
market: China.&lt;/p&gt;
-
-&lt;p&gt;Leading the online shopping revolution in China is Meituan, which since 
2016 has grown to support nearly 460 million consumers from over 2,000 
industries, regularly processing hundreds of billions of dollars in transactions. To 
support these staggering operations, Meituan has invested heavily in its data 
analytics system and employs more than 10,000 engineers to ensure a stable and 
reliable experience for its customers.&lt;/p&gt;
-
-&lt;p&gt;But the driving force behind Meituan’s success is not simply a 
robust analytics system. While the organization’s executives might think so, 
its engineers understand that it is the OLAP engine that system is built upon 
that has empowered the company to move quickly and win in the market.&lt;/p&gt;
-
-&lt;h2 
id=&quot;meituans-secret-weapon-apache-kylin&quot;&gt;&lt;strong&gt;Meituan’s 
Secret Weapon: Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;Since 2016, Meituan’s technical team has relied on&lt;a 
href=&quot;https://kyligence.io/apache-kylin-overview/&quot;&gt; Apache 
Kylin&lt;/a&gt; to power their&lt;a 
href=&quot;https://kyligence.io/resources/extreme-olap-with-apache-kylin/&quot;&gt;
 OLAP engine&lt;/a&gt;. Apache Kylin, an open source OLAP engine built on the 
Hadoop platform, resolves complex queries at sub-second speeds through 
multidimensional precomputation, allowing for blazing-fast analysis on even the 
largest datasets.&lt;/p&gt;
-
-&lt;p&gt;However, the limitations of this open source solution became apparent 
as the company’s business grew: the engine became less and less efficient as 
cubes and queries grew larger and more complex. To solve this problem, the engineering 
team leveraged Kylin’s open source foundations to dig into the engine, 
understand its underlying principles, and develop an implementation strategy 
that other organizations using Kylin can adopt to greatly improve their data 
output efficiency.&lt;/p&gt;
-
-&lt;p&gt;Meituan’s technical team has graciously shared their story of this 
process below so that you can apply it toward solving your own big data 
challenges.&lt;/p&gt;
-
-&lt;h2 
id=&quot;a-global-pandemic-and-a-new-normal-for-business&quot;&gt;&lt;strong&gt;A
 Global Pandemic and a New Normal for Business&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;For the last four years, Meituan’s Qingtian sales system has served 
as the company’s data processing workhorse, handling massive amounts of daily 
sales data involving a wide range of highly complex technical scenarios. The 
stability and efficiency of this system are paramount, which is why 
Meituan’s engineers have made significant investments in optimizing the OLAP 
engine Qingtian is built upon.&lt;/p&gt;
-
-&lt;p&gt;After a thorough investigation, the team identified Apache Kylin as 
the only OLAP engine that could meet their needs and scale with anticipated 
growth. The engine was rolled out in 2016 and, over the next few years, Kylin 
played an important role in the company’s evolving data analytics 
system.&lt;/p&gt;
-
-&lt;p&gt;Growth expectations, however, turned out to be severely 
underestimated, as a global pandemic quickly drove major changes in how 
consumers shopped and how businesses sold their goods. Such a massive shift in 
online shopping led to even faster growth for Meituan as well as a nearly 
untenable amount of new business data.&lt;/p&gt;
-
-&lt;p&gt;This caused efficiency bottlenecks that even their Kylin-based system 
started to struggle with. Cube building and query performance were unable to 
keep up with these changes in consumer behavior, slowing down data analysis 
and decision-making and creating a major obstacle to improving the user 
experience.&lt;/p&gt;
-
-&lt;p&gt;Meituan’s technical team would spend the next six months carrying 
out optimizations and iterations for Kylin, including dimension pruning, model 
design, resource adaptation, and improving SLA compliance.&lt;/p&gt;
-
-&lt;h2 
id=&quot;responding-to-new-consumer-behaviors-with-apache-kylin&quot;&gt;&lt;strong&gt;Responding
 to New Consumer Behaviors with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;In order to understand the approach taken when optimizing Meituan’s 
data architecture, it’s important to understand how the business is managed. 
The company’s sales force operates with two business models (in-store 
sales and phone sales) and is then further broken down by various 
territories and corporate departments. All analytics data must be communicated 
across both business models.&lt;/p&gt;
-
-&lt;p&gt;With this in mind, Meituan engineers incorporated Kylin into their 
design of the data architecture as follows:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-01.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 3. Apache Kylin’s layer-by-layer building data flow&lt;/p&gt;
-
-&lt;p&gt;While this design addressed many of Meituan’s initial concerns 
around scalability and efficiency, continued shifts in consumer behaviors and 
the organization’s response to dramatic changes in the market put enormous 
pressure on Kylin when it came to building cubes. This led to an unsustainable 
level of consumption of both resources and time.&lt;/p&gt;
-
-&lt;p&gt;It became clear that Kylin’s MOLAP model was presenting the 
following challenges:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;The build process involved many steps that were highly correlated, 
making it difficult to identify the root cause of problems.&lt;/li&gt;
-  &lt;li&gt;MapReduce - instead of the more efficient Spark - was still being 
used as the build engine for historical tasks.&lt;/li&gt;
-  &lt;li&gt;The platform’s default dynamic resource adaptation method demanded 
considerable resources for small tasks. Data was sharded unnecessarily and a 
large number of small files were generated, resulting in a waste of 
resources.&lt;/li&gt;
-  &lt;li&gt;Data volumes Meituan was now having to work with were well beyond 
the original architectural plan, resulting in two hours of cube building every 
day.&lt;/li&gt;
-  &lt;li&gt;The overall SLA fulfillment rate remained lower than 
expected.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;Recognizing these problems, the team set a goal of improving the 
platform’s efficiency (you can see the quantitative targets below). Finding a 
solution would involve classifying Kylin’s build process, digging into how 
Kylin worked under the hood, breaking down that process, and finally 
implementing a solution.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-02.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 4. Implementation path diagram&lt;/p&gt;
-
-&lt;h2 
id=&quot;optimization-understanding-how-apache-kylin-builds-cubes&quot;&gt;&lt;strong&gt;Optimization:
 Understanding How Apache Kylin Builds Cubes&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;Understanding the cube building process is critical for pinpointing 
efficiency and performance issues. In the case of Kylin, a solid grasp of its 
precomputation approach and its “by layer” cubing algorithm is necessary 
when formulating a solution.&lt;/p&gt;
-
-&lt;p&gt;&lt;strong&gt;Precomputation with Apache 
Kylin&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;Apache Kylin generates all possible dimensional combinations and 
pre-calculates the metrics that may be used in future multidimensional 
analysis, saving the results as a cube. Metric aggregation results are saved on 
&lt;em&gt;cuboids&lt;/em&gt; (a logical branch of the cube), and during queries 
relevant cuboids are found through SQL statements, and then read and quickly 
returned as metric values.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-03.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 5. Precomputation across four dimensions example&lt;/p&gt;
-
-&lt;p&gt;&lt;strong&gt;Apache Kylin’s By-Layer Cubing 
Algorithm&lt;/strong&gt;&lt;/p&gt;
-
-&lt;p&gt;An N-dimensional cube is composed of one N-dimensional sub-cube, N 
(N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, …, N 
1-dimensional sub-cubes, and one 0-dimensional sub-cube, for a total 
of 2^N sub-cubes. In Kylin’s by-layer cubing algorithm, the number of 
dimensions decreases with the calculation of each layer, and each layer’s 
calculation is based on the calculation result of its parent layer (except the 
first layer, which is computed from the source data).&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-04.png&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 6. Cuboid example&lt;/p&gt;
-
-&lt;h2 id=&quot;the-proof-is-in-the-process&quot;&gt;&lt;strong&gt;The Proof 
Is in the Process&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;Understanding the principles outlined above, the Meituan team 
identified five key areas to focus on for optimization: engine selection, data 
reading, dictionary building, layer-by-layer build, and file conversion. 
Addressing these areas would lead to the greatest gains in reducing the 
required resources for calculation and shortening processing time.&lt;/p&gt;
-
-&lt;p&gt;The team outlined the challenges, their solutions, and key objectives 
in the following table:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-05.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 7. Breakdown of Apache Kylin’s process&lt;/p&gt;
-
-&lt;h2 
id=&quot;putting-apache-kylin-to-the-test&quot;&gt;&lt;strong&gt;Putting Apache 
Kylin to the Test&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;With their solutions in place, the next step was to test if Kylin’s 
build process had actually improved. To do this, the team selected a set of 
critical sales tasks and ran a pilot (outlined below):&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-06.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 8. Meituan’s pilot program for their Apache Kylin 
optimizations&lt;/p&gt;
-
-&lt;p&gt;The results of the pilot were astonishing. Ultimately, the team was 
able to realize a significant reduction in resource consumption as seen in the 
following chart:&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-07.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 9. Resource usage and performance of Apache Kylin before and 
after pilot&lt;/p&gt;
-
-&lt;h2 id=&quot;analytics-optimized&quot;&gt;&lt;strong&gt;Analytics 
Optimized&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;Today, Meituan’s Qingtian system is processing over 20 different 
Kylin tasks, and after six months of constant optimization, the monthly CU 
usage for Kylin’s resource queue and the CU usage for pending tasks have seen 
significant reductions.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-08.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 10. Current performance of Apache Kylin after solution 
implementation&lt;/p&gt;
-
-&lt;p&gt;Resource usage isn’t the only area of impressive improvement. The 
Qingtian system’s SLA compliance also reached 100% as of June 
2020.&lt;/p&gt;
-
-&lt;p&gt;&lt;img src=&quot;/images/blog/meituan/chart-09.jpeg&quot; 
alt=&quot;&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Figure 11. Meituan SLA compliance after Apache Kylin 
optimization&lt;/p&gt;
-
-&lt;h2 
id=&quot;taking-on-the-future-with-apache-kylin&quot;&gt;&lt;strong&gt;Taking 
on the Future with Apache Kylin&lt;/strong&gt;&lt;/h2&gt;
-
-&lt;p&gt;Over the past four years, Meituan’s technical team has accumulated 
a great deal of experience in optimizing query performance and build efficiency 
with Apache Kylin. But Meituan’s success is also the story of open source’s 
success.&lt;/p&gt;
-
-&lt;p&gt;The&lt;a href=&quot;http://kylin.apache.org/community/&quot;&gt; 
Apache Kylin community&lt;/a&gt; has many active and outstanding code 
contributors (&lt;a 
href=&quot;https://kyligence.io/comparing-kylin-vs-kyligence/&quot;&gt;including
 Kyligence&lt;/a&gt;), who are relentlessly working to expand the Kylin 
ecosystem and add more new features. It’s in sharing success stories like 
this that Apache Kylin is able to remain the leading open source solution for 
analytics on massive datasets.&lt;/p&gt;
-
-&lt;p&gt;Together, with the entire Apache Kylin community, Meituan is making 
sure critical analytics work can remain unburdened by growing datasets, and 
that when the next major shift in business takes place, industry leaders like 
Meituan will be able to analyze what’s happening and quickly take 
action.&lt;/p&gt;
-</description>
-        <pubDate>Tue, 03 Aug 2021 08:00:00 -0700</pubDate>
-        
<link>http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</link>
-        <guid 
isPermaLink="true">http://kylin.apache.org/blog/2021/08/03/How-Meituan-Dominates-Online-Shopping-with-Apache-Kylin/</guid>
-        
-        
-        <category>blog</category>
-        
-      </item>
-    
-      <item>
         <title>Apache kylin4 æ°æ¶æåäº«</title>
         <description>&lt;p&gt;è¿ç¯æç« ä¸»è¦åä¸ºä»¥ä¸å 
ä¸ªé¨åï¼&lt;br /&gt;
 - Apache Kylin ä½¿ç¨åºæ¯&lt;br /&gt;
@@ -836,314 +1564,6 @@ For example, a query joins two subquerie
         
         
         <category>blog</category>
-        
-      </item>
-    
-      <item>
-        <title>æèµä¸ºä»ä¹éæ© Kylin4</title>
-        <description>&lt;p&gt;å¨ 2021å¹´5æ29æ¥ä¸¾åç QCon å
¨çè½¯ä»¶å¼åèå¤§ä¼ä¸ï¼æ¥èªæèµçæ°æ®åºç¡å¹³å°è´è´£äºº 
éçä¿ å¨å¤§æ°æ®å¼æºæ¡æ¶ä¸åºç¨ä¸é¢ä¸åäº«äºæèµåé¨å¯¹ 
Kylin 4.0 çä½¿ç¨ç»ååä¼åå®è·µï¼å¯¹äºä¼å¤ Kylin 
èç¨æ·æ¥è¯´ï¼è¿ä¹æ¯åçº§ Kylin 4 çå®ç¨æ»ç¥ã&lt;/p&gt;
-
-&lt;p&gt;æ¬æ¬¡åäº«ä¸»è¦åä¸ºä»¥ä¸åä¸ªé¨åï¼&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;æèµéç¨ Kylin 4 çåå &lt;/li&gt;
-  &lt;li&gt;Kylin 4 åçä»ç»&lt;/li&gt;
-  &lt;li&gt;Kylin 4 æ§è½ä¼å&lt;/li&gt;
-  &lt;li&gt;Kylin 4 å¨æèµçå®è·µ&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;kylin-4-&quot;&gt;01 æèµéç¨ Kylin 4 çåå &lt;/h2&gt;
-&lt;p&gt;é¦ååäº«æèµä¸ºä»ä¹ä¼éæ©åçº§ä¸º Kylin 4ï¼è¿éå
ç®ååé¡¾ä¸ä¸æèµ OLAP 
çåå±åç¨ï¼æèµåæä¸ºäºå¿«éè¿ä»£ï¼éæ©äºé¢è®¡ç® + MySQL 
çæ¹å¼ï¼2018å¹´ï¼å ä¸ºæ¥è¯¢çµæ´»åå¼åæçå¼å¥äº 
Druidï¼ä½æ¯åå¨é¢èååº¦ä¸é«ãä¸æ¯æç²¾ç¡®å»éåæç» OLAP 
çé®é¢ï¼å¨è¿æ ·çèæ¯ä¸ï¼æèµå¼å
¥äºæ»¡è¶³èååº¦é«ãæ¯æç²¾ç¡®å»éå RT æä½ç Apache Kylin 
åæ¥è¯¢éå¸¸çµæ´»ç ROLAP ClickHouseã&lt;/p&gt;
-
-&lt;p&gt;ä»2018å¹´å¼å¥ Kylin å°ç°å¨ï¼æèµå·²ç»ä½¿ç¨ Kylin 
ä¸å¹´å¤äºãéçä¸å¡åºæ¯çä¸æä¸°å¯åæ°æ®éçä¸æç§¯ç´¯ï¼æèµç®åæ
 600 ä¸çåéåå®¶ï¼2020å¹´ GMV æ¯ 1073äº¿ï¼æ¥æå»ºéä¸º 100 
äº¿+ï¼ç®å Kylin 
å·²ç»åºæ¬è¦çäºæèµææçä¸å¡èå´ã&lt;/p&gt;
-
-&lt;p&gt;éçæèµèªèº«çè¿éåå±åä¸ææ·±å¥å°ä½¿ç¨ 
Kylinï¼æä»¬ä¹éå°ä¸äºææï¼&lt;br /&gt;
-- é¦å Kylin on HBase çæå»ºæ§è½æ 
æ³æ»¡è¶³æèµçé¢æï¼æå»ºæ§è½ä¼å½±åå°ç¨æ·çæ
éæ¢å¤æ¶é´åç¨³å®æ§çä½éªï¼&lt;br /&gt;
-- å
¶æ¬¡ï¼éçæ´å¤å¤§åå®¶ï¼ååºåä¸çº§å«ä¼åãæ°åä¸ååï¼çæ¥å
¥ï¼å¯¹æä»¬çæ¥è¯¢ä¹å¸¦æ¥äºå¾å¤§çææãKylin on HBase åéäº 
QueryServer åç¹æ¥è¯¢çå±éï¼æ 
æ³å¾å¥½å°æ¯æè¿äºå¤æçåºæ¯ï¼&lt;br /&gt;
-- æåï¼å ä¸º HBase 
ä¸æ¯ä¸ä¸ªäºåçç³»ç»ï¼å¾é¾åå°å¼¹æ§çèµæºä¼¸ç¼©ï¼éçæ°æ®éçä¸æå¢é¿ï¼è¿ä¸ªç³»ç»å¯¹äºåå®¶èè¨ï¼ä½¿ç¨æ¶é´æ¯åå¨é«å³°åä½è°·çï¼è¿å°±é
 æå¹³åçèµæºä½¿ç¨çä¸å¤é«ã&lt;/p&gt;
-
-&lt;p&gt;é¢å¯¹è¿äºææï¼æèµéæ©å»åæ´äºåçç Apache Kylin 4 
å»é æ¢ååçº§ã&lt;/p&gt;
-
-&lt;h2 id=&quot;kylin-4--1&quot;&gt;02 Kylin 4 åçä»ç»&lt;/h2&gt;
-&lt;p&gt;é¦åä»ç»ä¸ä¸ Kylin 4 çä¸»è¦ä¼å¿ãApache Kylin 4 æ¯å®å
¨åºäº Spark å»åæå»ºåæ¥è¯¢çï¼è½å¤ååå°å©ç¨ 
Sparkçå¹¶è¡åãåéååå¨å±å¨æä»£ç 
çæçææ¯ï¼å»æé«å¤§æ¥è¯¢çæçã&lt;br /&gt;


[... 282 lines stripped ...]
