Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1900099&r1=1900098&r2=1900099&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Thu Apr 21 08:37:12 2022
@@ -19,11 +19,648 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Thu, 31 Mar 2022 06:59:26 -0700</pubDate>
-    <lastBuildDate>Thu, 31 Mar 2022 06:59:26 -0700</lastBuildDate>
+    <pubDate>Thu, 21 Apr 2022 01:27:57 -0700</pubDate>
+    <lastBuildDate>Thu, 21 Apr 2022 01:27:57 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     <item>
+      <title>Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours (Part 2)</title>
+      <description><p>This is part 2 of <code class="highlighter-rouge">Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours</code>. For part 1, see: <a href="../kylin4-on-cloud-part1/">Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours (Part 1)</a></p>
+
+<h3 id="kylin-">Kylin query cluster</h3>
+
+<h4 id="kylin--1">Start the Kylin query cluster</h4>
+
+<p>1. Starting from the kylin_configs.yaml used to launch the build cluster, turn on the MDX switch:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ENABLE_MDX: &amp;ENABLE_MDX 'true'
+</code></pre>
+</div>
+
+<p>2. Then run the deploy command to start the cluster:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>python deploy.py --type deploy --mode query
+</code></pre>
+</div>
+
+<h4 id="kylin--2">Try out Kylin's query speed</h4>
+
+<p>1. Once the query cluster has started, run <code class="highlighter-rouge">python deploy.py --type list</code> to list the information of all nodes, then open http://${kylin_node_public_ip}:7070/kylin in a browser to check the Kylin UI:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/14_kylin_web_ui.png" alt="" /></p>
+
+<p>2. On the Insight page, run the same SQL that was run earlier in spark-sql:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>select TAXI_TRIP_RECORDS_VIEW.PICKUP_DATE,
NEWYORK_ZONE.BOROUGH, count(*), sum(TAXI_TRIP_RECORDS_VIEW.TRIP_TIME_HOUR), sum(TAXI_TRIP_RECORDS_VIEW.TOTAL_AMOUNT)
+from TAXI_TRIP_RECORDS_VIEW
+left join NEWYORK_ZONE
+on TAXI_TRIP_RECORDS_VIEW.PULOCATIONID = NEWYORK_ZONE.LOCATIONID
+group by TAXI_TRIP_RECORDS_VIEW.PICKUP_DATE, NEWYORK_ZONE.BOROUGH;
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_on_cloud/15_query_in_kylin.png" alt="" /></p>
+
+<p>As shown, when the query hits the cube, that is, when the result comes directly from precomputed data, the result is returned in about 4 seconds, which greatly reduces query time.</p>
+
+<h3 id="section">Precomputation lowers query cost</h3>
+
+<p>In the test comparing the query speed of native Spark SQL and Kylin, we used the New York City taxi order dataset, whose fact table has more than 200 million rows. The comparison shows that in big data analysis scenarios with hundreds of millions of rows, Kylin significantly improves query efficiency: a single build accelerates thousands of later business queries, which greatly lowers query cost.</p>
+
+<h3 id="section-1">Configure the semantic layer</h3>
+
+<h4 id="mdx-for-kylin--dataset">Import a Dataset into MDX for Kylin</h4>
+
+<p>In <code class="highlighter-rouge">MDX for Kylin</code> you can create a <code class="highlighter-rouge">Dataset</code> from the Cubes in the connected Kylin, define relationships between Cubes, and create business metrics. For convenience, you can download a Dataset file from S3 and import it into <code class="highlighter-rouge">MDX for Kylin</code> directly:</p>
+
+<p>1. Download the Dataset file from S3 to your local machine</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>wget https://s3.cn-north-1.amazonaws.com.cn/public.kyligence.io/kylin/kylin_demo/covid_trip_project_covid_trip_dataset.json
+</code></pre>
+</div>
+
+<p>2. Open the <code class="highlighter-rouge">MDX for Kylin</code> web UI</p>
+
+<p>Enter <code class="highlighter-rouge">http://${kylin_node_public_ip}:7080</code> in a browser to open the <code class="highlighter-rouge">MDX for Kylin</code> page, and log in with the username and password <code class="highlighter-rouge">ADMIN/KYLIN</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/16_mdx_web_ui.png" alt="" /></p>
+
+<p>3. Confirm the Kylin connection</p>
+
+<p><code class="highlighter-rouge">MDX for Kylin</code> is already configured with
the information of the Kylin node to connect to. On first login you need to enter the username and password of the Kylin node, that is, <code class="highlighter-rouge">ADMIN/KYLIN</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/17_connect_to_kylin.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_on_cloud/18_exit_management.png" alt="" /></p>
+
+<p>4. Import the Dataset</p>
+
+<p>After connecting to Kylin successfully, click the icon in the upper right corner to exit the management page:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/19_kylin_running.png" alt="" /></p>
+
+<p>Switch to the <code class="highlighter-rouge">covid_trip_project</code> project and click <code class="highlighter-rouge">Import Dataset</code> on the Dataset page:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/20_import_dataset.png" alt="" /></p>
+
+<p>Select the file <code class="highlighter-rouge">covid_trip_project_covid_trip_dataset.json</code> that was just downloaded from S3 and import it.</p>
+
+<p><code class="highlighter-rouge">covid_trip_dataset</code> defines the year-to-date, month-to-date, year-over-year growth, and month-over-month growth of each atomic metric; special dimensions and measures such as the time hierarchy and the region hierarchy; and business metrics such as the COVID-19 case fatality rate and the average taxi speed. To create a Dataset manually, see: <a href="https://cwiki.apache.org/confluence/display/KYLIN/Create+Dataset+in+MDX+for+Kylin">Create dataset in MDX for Kylin</a>. For how to use MDX for Kylin, see: <a href="https://kyligence.github.io/mdx-kylin/">MDX for Kylin User Manual</a>.</p>
+
+<h2 id="section-2">Data analysis</h2>
+
+<h3 id="tableau-">Data analysis with Tableau</h3>
+
+<p>As an example, we use Tableau on a local Windows machine to connect to MDX for Kylin for data analysis.</p>
+
+<p>1. Select Tableau's built-in <code class="highlighter-rouge">Microsoft Analysis Service</code> connector to connect to <code class="highlighter-rouge">MDX for Kylin</code> (the <code class="highlighter-rouge">Microsoft Analysis Services</code> driver must be installed in advance; it can be downloaded from the Tableau website: <a href="https://www.tableau.com/support/drivers?_ga=2.104833284.564621013.1647953885-1839825424.1608198275">Microsoft Analysis Services driver download</a>)</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/21_tableau_connect.png" alt="" /></p>
+
+<p>2. In the settings page that pops up, fill in the <code class="highlighter-rouge">MDX for Kylin</code>
connection address, username, and password. The connection address is <code class="highlighter-rouge">http://${kylin_node_public_ip}:7080/mdx/xmla/covid_trip_project</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/22_tableau_server.png" alt="" /></p>
+
+<p>3. Select <code class="highlighter-rouge">covid_trip_dataset</code> as the dataset:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/23_tableau_dataset.png" alt="" /></p>
+
+<p>4. You can now analyze data in a worksheet. Because the business metrics are already defined centrally in <code class="highlighter-rouge">MDX for Kylin</code>, when building analysis reports in Tableau you can drag the predefined metrics straight onto the worksheet.</p>
+
+<p>5. First analyze the pandemic data and draw a country-level pandemic map from two metrics, confirmed cases and case fatality rate. Put <code class="highlighter-rouge">COUNTRY_SHORT_NAME</code> from the region hierarchy on the worksheet columns, put the predefined sum of new confirmed cases <code class="highlighter-rouge">SUM_NEW_POSITIVE_CASES</code> and the case fatality rate metric <code class="highlighter-rouge">CFR_COVID19</code> on the worksheet rows, then choose to display the result as a map:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/24_tableau_covid19_map.png" alt="" /></p>
+
+<p>Here the symbol size represents the level of deaths, and the color depth represents the level of the case fatality rate. The map shows that the United States and India have relatively many confirmed cases, yet their case fatality rates do not differ noticeably from most other countries, whereas countries with very few confirmed cases, such as Peru, Vanuatu, and Mexico, have persistently high case fatality rates. This observation may be a starting point for digging into deeper causes.</p>
+
+<p>Because a region hierarchy is defined, the country-level map can be drilled down to the province level to see the situation in the regions within each country:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/25_tableau_province.png" alt="" /></p>
+
+<p>Zoom in on the United States in the province-level map:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/26_tableau_us_covid19.png" alt="" /></p>
+
+<p>The case fatality rate does not differ noticeably between US states; all are around 0.01. In terms of confirmed cases, California, Texas, Florida, and New York City are clearly higher. These areas are economically developed and densely populated, and their confirmed case counts rise accordingly. Next, using the New York City taxi dataset together with the course of the pandemic, we analyze how taxi travel changed during the pandemic.</p>
+
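<p>As a side note on the <code class="highlighter-rouge">CFR_COVID19</code> metric used in the maps above: part 1 of this post defines the case fatality rate as deaths divided by confirmed cases. The following is only a minimal sketch of that ratio, not the way MDX for Kylin evaluates it; the function name and the figures are made up for illustration.</p>

```python
from typing import Optional


def case_fatality_rate(deaths: int, confirmed: int) -> Optional[float]:
    """Deaths divided by confirmed cases; None when there are no confirmed cases."""
    if confirmed <= 0:
        return None
    return deaths / confirmed


if __name__ == "__main__":
    # Made-up figures for illustration only, not taken from the dataset.
    print(case_fatality_rate(1_000, 100_000))
```

<p>Returning None for an empty denominator is a design choice of this sketch, so that a region without confirmed cases is not shown with a rate of zero.</p>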
+<p>6. For the New York City taxi order dataset, we start from the following two business questions:</p>
+
+<ul>
+  <li>Analyze the travel characteristics of each New York City borough and compare travel metrics such as order count and travel speed</li>
+</ul>
+
+<p>Drag the field <code class="highlighter-rouge">BOROUGH</code> of the lookup table <code class="highlighter-rouge">PICKUP_NEWYORK_ZONE</code> onto the worksheet columns, and drag the metrics <code class="highlighter-rouge">ORDER_COUNT</code> and <code class="highlighter-rouge">trip_mean_speed</code> onto the worksheet rows. Display the result as a symbol map, where color depth represents average speed and symbol size represents order count. Taxi orders departing from Manhattan exceed those of all other boroughs combined but have the lowest average speed; Queens comes next, and Staten Island has the least taxi activity. Taxis departing from the Bronx average as much as 82 mph, several times higher than in the other boroughs. These travel characteristics reflect the population density and economic development of each borough.</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/27_tableau_taxi_1.png" alt="" /></p>
+
+<p>Then replace the field <code class="highlighter-rouge">BOROUGH</code> of the lookup table <code class="highlighter-rouge">PICKUP_NEWYORK_ZONE</code> with <code class="highlighter-rouge">BOROUGH</code> of <code class="highlighter-rouge">DROPOFF_NEWYORK_ZONE</code> to count taxi orders and average speed by drop-off borough:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/27_tableau_taxi_2.png" alt="" /></p>
+
+<p>Compared with the pickup data, the drop-off data for Brooklyn, Queens, and the Bronx differ noticeably. Proportionally, far more taxi orders arrive in Brooklyn and the Bronx than depart from them, while clearly fewer orders arrive in Queens than depart from it.</p>
+
+<ul>
+  <li>How the taxi travel habits of New York City residents changed before and after the outbreak: do they prefer short trips or long trips</li>
+</ul>
+
+<p>Analyze the change in travel habits through the average trip distance. Drag the dimension <code class="highlighter-rouge">MONTH_START</code> onto the worksheet rows and the metric <code class="highlighter-rouge">trip_mean_distance</code> onto the worksheet columns:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/28_tableau_taxi_3.png" alt="" /></p>
+
+<p>The bar chart shows that travel habits changed markedly before and after the outbreak. From March 2020 the average trip distance rose significantly, in some months by several times, and once the pandemic
began, the monthly average trip distance became very unstable. Based on this, we can run a joint analysis with the monthly pandemic data: drag <code class="highlighter-rouge">SUM_NEW_POSITIVE_CASES</code> and <code class="highlighter-rouge">MTD_ORDER_COUNT</code> onto the worksheet rows and add the filter condition <code class="highlighter-rouge">PROVINCE_STATE_NAME=New York</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/29_tableau_taxi_4.png" alt="" /></p>
+
+<p>An interesting pattern appears. When the outbreak first started, the number of taxi orders dropped sharply while the average trip distance increased, which suggests that people cut many unnecessary short trips or made them with means of transport safer than taxis. Comparing the three curves, the severity of the pandemic and people's travel behavior are highly correlated: when the pandemic was severe, taxi orders fell and the average trip distance rose; when the pandemic eased, taxi orders increased and the average trip distance fell back.</p>
+
+<h3 id="excel-">Data analysis with Excel</h3>
+
+<p>With the help of <code class="highlighter-rouge">MDX for Kylin</code>, Excel can also connect to Kylin for big data analysis. In this test we use Excel on a local Windows machine to connect to MDX for Kylin.</p>
+
+<p>1. Open Excel and choose Data -&gt; Get Data -&gt; From Database -&gt; From <code class="highlighter-rouge">Analysis Services</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/30_excel_connect.png" alt="" /></p>
+
+<p>2. In the Data Connection Wizard, fill in the MDX for Kylin connection information. The server name is <code class="highlighter-rouge">http://${kylin_node_public_ip}:7080/mdx/xmla/covid_trip_project</code>:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/31_excel_server.png" alt="" /></p>
+
+<p><img src="/images/blog/kylin4_on_cloud/32_tableau_dataset.png" alt="" /></p>
+
+<p>3. Then create a PivotTable for the current data connection. In the PivotTable fields we can see that the data obtained in Excel from the dataset in <code class="highlighter-rouge">MDX for Kylin</code> is exactly the same as in Tableau. Whether analysts work in Tableau or in Excel, they work on the same data model, dimensions, and business metrics, which gives unified semantics.</p>
+
+<p>4. In Tableau we used the <code class="highlighter-rouge">covid19</code> and <code class="highlighter-rouge">newyork_trip_data</code> datasets for pandemic
map drawing and trend analysis. In Excel, for the same datasets and scenarios, we can look at more detailed data.</p>
+
+<ul>
+  <li>For the pandemic data, select for the PivotTable the region hierarchy field <code class="highlighter-rouge">REGION_HIERARCHY</code>, the predefined sum of new cases <code class="highlighter-rouge">SUM_NEW_POSITIVE_CASES</code>, and the case fatality rate metric <code class="highlighter-rouge">CFR_COVID19</code>:</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_on_cloud/33_tableau_covid19_1.png" alt="" /></p>
+
+<p>Because the top level of the region hierarchy is <code class="highlighter-rouge">CONTINENT_NAME</code>, the confirmed cases and case fatality rate are shown at continent level by default. Europe has the most confirmed cases, and Africa has the highest case fatality rate. In this PivotTable it is easy to drill down to lower region levels for finer-grained detail, for example to view the data for Asian countries sorted by confirmed cases in descending order:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/34_excel_covid20_2.png" alt="" /></p>
+
+<p>The data shows that the three Asian countries with the most confirmed cases are India, Turkey, and Iran.</p>
+
+<ul>
+  <li>For the New York City taxi order data, to answer the question "does the pandemic have a clear impact on the number of taxi orders", first look at the year-to-date total and the growth rate of taxi orders by year. Create a new PivotTable and select the time hierarchy dimension <code class="highlighter-rouge">TIME_HIERARCHY</code>, <code class="highlighter-rouge">YOY_ORDER_COUNT</code>, and <code class="highlighter-rouge">YTD_ORDER_COUNT</code>:</li>
+</ul>
+
+<p><img src="/images/blog/kylin4_on_cloud/35_excel_taxi_1.png" alt="" /></p>
+
+<p>The outbreak in 2020 caused a sharp drop in taxi orders: the order growth rate for 2020 is -0.7079, a reduction of about 70% in orders. The growth rate for 2021 is still negative, but the decline is much slower than in the early phase of the pandemic in 2020.</p>
+
+<p>Expanding the time hierarchy shows cumulative order values at quarter level, month level, and down to day level. Adding <code class="highlighter-rouge">MOM_ORDER_COUNT</code> and <code class="highlighter-rouge">ORDER_COUNT</code> to the PivotTable also shows the month-over-month order growth and the order count at each level of the time hierarchy:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/36_excel_taxi_2.png" alt="" /></p>
+
+<p>In March 2020 the order growth rate was -0.52, so taxi orders had already dropped noticeably. In April it fell further to -0.92, a reduction of about 90% in orders. Orders then slowly grew again, but remained far below the level before the pandemic.</p>
+
+<h3
id="api--kylin-">Integrate Kylin into a data analysis platform through the API</h3>
+
+<p>Besides commercial BI tools such as Excel and Tableau, many companies develop their own data analysis platforms. In such in-house platforms, users can still call the API to use Kylin + MDX for Kylin as the foundation of the platform and keep data definitions unified. In this demo we show how to send a query to MDX for Kylin through Olap4j and get the analysis result. Olap4j is a Java library, similar to a JDBC driver, that can access any OLAP service.</p>
+
+<p>We provide a simple demo that users can run directly for testing. The source code is at <a href="https://github.com/apache/kylin/tree/mdx-query-demo">mdx query demo</a>:</p>
+
+<p>1. Download the jar packages for the demo:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>wget https://s3.cn-north-1.amazonaws.com.cn/public.kyligence.io/kylin/kylin_demo/mdx_query_demo.tgz
+tar -xvf mdx_query_demo.tgz
+cd mdx_query_demo
+</code></pre>
+</div>
+
+<p>2. Run the demo</p>
+
+<p>Before running the demo, make sure Java 8 is installed in the runtime environment:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/37_jdk_8.png" alt="" /></p>
+
+<p>The demo takes two parameters: the IP of the MDX node and the MDX query to run. The port defaults to 7080. The MDX node IP here is the public IP of the Kylin node:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>java -cp olap4j-xmla-1.2.0.jar:olap4j-1.2.0.jar:xercesImpl-2.9.1.jar:mdx-query-demo-0.0.1.jar io.kyligence.mdxquerydemo.MdxQueryDemoApplication "${kylin_node_public_ip}" "${mdx_query}"
+</code></pre>
+</div>
+
+<p>If no MDX statement is passed on the command line when running the demo, the following MDX statement is executed by default. It computes the order count and average trip distance of each borough along the pickup borough dimension:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>SELECT
+{[Measures].[ORDER_COUNT],
+[Measures].[trip_mean_distance]}
+DIMENSION PROPERTIES [MEMBER_UNIQUE_NAME],[MEMBER_ORDINAL],[MEMBER_CAPTION] ON COLUMNS,
+NON EMPTY [PICKUP_NEWYORK_ZONE].[BOROUGH].[BOROUGH].AllMembers
+DIMENSION PROPERTIES [MEMBER_UNIQUE_NAME],[MEMBER_ORDINAL],[MEMBER_CAPTION] ON ROWS
+FROM [covid_trip_dataset]
+</code></pre>
+</div>
+
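<p>For an in-house platform that is not written in Java, it may help to see roughly what an XMLA client such as the olap4j-xmla driver sends over HTTP. The sketch below only builds the endpoint URL used in this tutorial and a minimal XMLA <code class="highlighter-rouge">Execute</code> SOAP envelope around an MDX statement; it does not send anything. The helper names, the use of the project name as <code class="highlighter-rouge">Catalog</code>, and the assumption that MDX for Kylin accepts this minimal property list are illustrative assumptions, not taken from the demo source.</p>

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def xmla_url(host: str, project: str, port: int = 7080) -> str:
    """Endpoint in the form used throughout this tutorial."""
    return f"http://{host}:{port}/mdx/xmla/{project}"


def build_execute_envelope(mdx: str, catalog: str) -> str:
    """Wrap an MDX statement in a minimal XMLA Execute request.

    Assumption: the property list (Catalog, Format) is the smallest
    plausible one; a real server may require more properties.
    """
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>'
        f'<Execute xmlns="{XMLA_NS}">'
        f"<Command><Statement>{escape(mdx)}</Statement></Command>"
        "<Properties><PropertyList>"
        f"<Catalog>{escape(catalog)}</Catalog>"
        "<Format>Multidimensional</Format>"
        "</PropertyList></Properties>"
        "</Execute></soap:Body></soap:Envelope>"
    )


def statement_of(envelope: str) -> str:
    """Parse the envelope back and return the embedded MDX, as a sanity check."""
    node = ET.fromstring(envelope).find(f".//{{{XMLA_NS}}}Statement")
    return node.text if node is not None else ""


if __name__ == "__main__":
    mdx = "SELECT {[Measures].[ORDER_COUNT]} ON COLUMNS FROM [covid_trip_dataset]"
    # 203.0.113.10 is a documentation-only placeholder address.
    print(xmla_url("203.0.113.10", "covid_trip_project"))
    print(statement_of(build_execute_envelope(mdx, "covid_trip_project")) == mdx)
```

<p>The MDX text is XML-escaped before it is embedded, so statements containing characters such as <code class="highlighter-rouge">&lt;</code> or <code class="highlighter-rouge">&amp;</code> still produce a well-formed envelope.</p>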
+<p>In this demo we run the default query directly. After it succeeds, the lightly processed query result is printed to the command line:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/38_demo_result.png" alt="" /></p>
+
+<p>The demo successfully returns the requested data. The result shows that the most taxi orders depart from Manhattan, with an average trip distance of only about 2.4 miles, which matches Manhattan's small area and dense population. Orders from the Bronx have an average trip distance of 33 miles, clearly higher than any other borough, possibly because the Bronx is more remote.</p>
+
+<p>As with Tableau and Excel, the MDX written in the demo can directly use the metrics defined in Kylin and MDX for Kylin. In an in-house data analysis platform, users can analyze the returned data further and generate reports according to their presentation needs.</p>
+
+<h3 id="section-3">Unified data definitions</h3>
+
+<p>Having connected to Kylin + MDX for Kylin in the three different ways above, we can see that with the Kylin multidimensional database and the MDX for Kylin semantic layer, whichever way users analyze data in their business scenario, they use the same data model and business metrics, which achieves unified data definitions.</p>
+
+<h2 id="section-4">Destroy the clusters</h2>
+
+<h3 id="section-5">Destroy the query cluster</h3>
+
+<p>After the analysis above is complete, we can run the destroy command to tear down the query cluster. If you also want to destroy the RDS metadata database of Kylin and MDX for Kylin, the monitoring node, and the VPC, run the following destroy command:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>python deploy.py --type destroy-all
+</code></pre>
+</div>
+
+<h3 id="aws-">Check AWS resources</h3>
+
+<p>After all cluster resources are destroyed, <code class="highlighter-rouge">CloudFormation</code> keeps no Stack related to the deployment tool. If you want to delete the files and data related to the deployment tool in S3, you can manually delete the following folders under the S3 working directory:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/39_check_s3_demo.png" alt="" /></p>
+
+<h2 id="section-6">Summary</h2>
+
+<p>With this tutorial, a single AWS account is enough to use the cloud deployment tool and, with Kylin's precomputation and multidimensional model plus the basic metric management of MDX for Kylin, quickly and easily build a cloud big data analysis platform based on Kylin + MDX for Kylin, connect various BI tools for technical validation, and so reduce cost, improve efficiency, and unify data definitions.</p>
+
+</description>
+    <pubDate>Wed, 20 Apr 2022 04:00:00 -0700</pubDate>
+
<link>http://kylin.apache.org/cn_blog/2022/04/20/kylin4-on-cloud-part2/</link>
+    <guid isPermaLink="true">http://kylin.apache.org/cn_blog/2022/04/20/kylin4-on-cloud-part2/</guid>
+
+
+    <category>cn_blog</category>
+
+    </item>
+
+    <item>
+      <title>Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours (Part 1)</title>
+      <description><h2 id="section">Background</h2>
+
+<p>Apache Kylin is a multidimensional database based on precomputation and a multidimensional model, with a standard SQL query interface. In Kylin, users define table relationships by creating a Model, define dimensions and measures by creating a Cube, and then build the Cube to precompute the data that needs to be aggregated. The precomputed data is stored, so queries can either aggregate further on top of it or return results directly, which improves query efficiency.</p>
+
+<p>With the release and updates of the new Kylin 4.0 architecture, Kylin can be deployed as a cluster in cloud environments without Hadoop. To let users deploy Kylin on the cloud easily, the Kylin community recently developed a cloud deployment tool, with which a single command yields a complete Kylin cluster and a fast, efficient analysis experience. In January 2022 the Kylin community released MDX for Kylin to strengthen Kylin's business expressiveness as a multidimensional database. MDX for Kylin provides an MDX query interface. On top of the multidimensional model already defined in Kylin, it can create business metrics, translate Kylin's data model into business-friendly language, give data business value, and make it easy to connect BI tools such as Excel and Tableau for multidimensional analysis.</p>
+
+<p>With this set of technologies, users can not only deploy a Kylin cluster on the cloud quickly and easily, create multidimensional models, and enjoy fast, precomputed query responses, but also define and manage business metrics with MDX for Kylin, raising the data warehouse from the technical layer to the business semantic layer.</p>
+
+<p>Users can connect BI tools directly on top of Kylin + MDX for Kylin for multidimensional analysis, or use it as the foundation for complex applications such as a metrics platform. Compared with building a metrics platform directly on compute engines such as Spark and Hive, which perform Join and aggregation at query time, Kylin relies on the multidimensional model and precomputation, together with the semantic layer of MDX for Kylin, to provide the key capabilities a metrics platform needs: computation over massive data, very fast query responses, a unified multidimensional model, connectivity to multiple BI tools, and basic business metric management.</p>
+
+<p>The rest of this article walks readers, from a data engineer's perspective, through building a Kylin-based data analysis platform on the cloud (Kylin on Cloud), getting high-performance, low-cost queries over hundreds of millions of rows, managing business metrics through MDX for Kylin
and connecting BI tools directly to generate reports quickly.</p>
+
+<p>Every step of this tutorial is explained in detail, with screenshots and checkpoints to help newcomers get started. Readers only need an AWS account. The whole process is expected to take about 2 hours and cost about ￥100.</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/0_deploy_kylin.png" alt="" /></p>
+
+<h2 id="section-1">Business scenario</h2>
+
+<p>Since early 2020, COVID-19 has spread rapidly around the world and has greatly affected people's daily lives, especially their travel habits. This analysis combines COVID-19 pandemic data with New York taxi trip data since 2018. By analyzing pandemic metrics and travel metrics, such as confirmed cases, case fatality rate, taxi order count, and average trip distance, it looks at how the New York City taxi industry changed under the pandemic, in order to support decision making.</p>
+
+<h3 id="section-2">Business questions</h3>
+
+<ul>
+  <li>Jointly analyze multiple metrics to assess the severity of the pandemic in each country and region</li>
+  <li>Compare travel metrics across New York City boroughs, such as order count and trip distance</li>
+  <li>Does the pandemic have a clear impact on the number of taxi orders</li>
+  <li>How travel habits changed after the outbreak: do people prefer short trips or long trips</li>
+  <li>Is the severity of the pandemic strongly correlated with the number of taxi trips</li>
+</ul>
+
+<h3 id="section-3">Datasets</h3>
+
+<h4 id="covid-19-">COVID-19 dataset</h4>
+
+<p>The COVID-19 dataset consists of one fact table, <code class="highlighter-rouge">covid_19_activity</code>, and one dimension table, <code class="highlighter-rouge">lookup_calendar</code>.</p>
+
+<p><code class="highlighter-rouge">covid_19_activity</code> records the daily numbers of confirmed cases and deaths for regions around the world. <code class="highlighter-rouge">lookup_calendar</code> is a date dimension table that stores extended time information, such as the start of the year and the start of the month for each date. <code class="highlighter-rouge">covid_19_activity</code> and <code class="highlighter-rouge">lookup_calendar</code> are joined on the date.</p>
+
+<p>Information about the COVID-19 dataset:</p>
+
+<table>
+  <tbody>
+    <tr>
+      <td>Data size</td>
+      <td>235 MB</td>
+    </tr>
+    <tr>
+      <td>Fact table row count</td>
+      <td>2,753,688</td>
+    </tr>
+    <tr>
+      <td>Date range</td>
+      <td>2020-01-21~2022-03-07</td>
+    </tr>
+    <tr>
+      <td>Download URL from the dataset provider</td>
+      <td>https://data.world/covid-19-data-resource-hub/covid-19-case-counts/workspace/file?filename=COVID-19+Activity.csv</td>
+    </tr>
+    <tr>
+      <td>Dataset S3 path</td>
+
<td>s3://public.kyligence.io/kylin/kylin_demo/data/covid19_data/</td>
+    </tr>
+  </tbody>
+</table>
+
+<h4 id="section-4">New York City taxi order dataset</h4>
+
+<p>The New York City taxi order dataset consists of one fact table, <code class="highlighter-rouge">taxi_trip_records_view</code>, and two dimension tables, <code class="highlighter-rouge">newyork_zone</code> and <code class="highlighter-rouge">lookup_calendar</code>.</p>
+
+<p>Each record in <code class="highlighter-rouge">taxi_trip_records_view</code> corresponds to one taxi trip and records the pickup location ID, drop-off location ID, trip duration, order amount, trip distance, and so on. <code class="highlighter-rouge">newyork_zone</code> records the borough and other information for each location ID. <code class="highlighter-rouge">taxi_trip_records_view</code> is joined to <code class="highlighter-rouge">newyork_zone</code> through the two columns <code class="highlighter-rouge">PULocationID</code> and <code class="highlighter-rouge">DOLocationID</code> to obtain the pickup borough and drop-off borough. <code class="highlighter-rouge">lookup_calendar</code> is the same dimension table as in the <code class="highlighter-rouge">COVID-19</code> dataset; <code class="highlighter-rouge">taxi_trip_records_view</code> and <code class="highlighter-rouge">lookup_calendar</code> are joined on the date.</p>
+
+<p>Information about the New York City taxi order dataset:</p>
+
+<table>
+  <tbody>
+    <tr>
+      <td>Data size</td>
+      <td>19 G</td>
+    </tr>
+    <tr>
+      <td>Fact table row count</td>
+      <td>226,849,274</td>
+    </tr>
+    <tr>
+      <td>Date range</td>
+      <td>2018-01-01~2021-07-31</td>
+    </tr>
+    <tr>
+      <td>Download URL from the dataset provider</td>
+      <td>https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page</td>
+    </tr>
+    <tr>
+      <td>Dataset S3 path</td>
+      <td>s3://public.kyligence.io/kylin/kylin_demo/data/trip_data_2018-2021/</td>
+    </tr>
+  </tbody>
+</table>
+
+<h4 id="er-">ER diagram</h4>
+
+<p>The ER diagram of the COVID-19 dataset and the New York City taxi order dataset is shown below:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/1_table_ER.png" alt="" /></p>
+
+<h3 id="section-5">Metric design</h3>
+
+<p>For the business scenario and business questions to be analyzed, we designed the following atomic metrics and business metrics:</p>
+
+<h6
id="section-6">1. Atomic metrics</h6>
+
+<p>Atomic metrics are the measures created in a Kylin Cube. They are usually aggregations over a single column and are relatively simple.</p>
+
+<ul>
+  <li>Covid19 cases: sum(covid_19_activity.people_positive_cases_count)</li>
+  <li>Covid19 deaths: sum(covid_19_activity.people_death_count)</li>
+  <li>New Covid19 cases: sum(covid_19_activity.people_positive_new_cases_count)</li>
+  <li>New Covid19 deaths: sum(covid_19_activity.people_death_new_count)</li>
+  <li>Taxi trip distance: sum(taxi_trip_records_view.trip_distance)</li>
+  <li>Taxi order amount: sum(taxi_trip_records_view.total_amount)</li>
+  <li>Taxi trip count: count()</li>
+  <li>Taxi trip duration: sum(taxi_trip_records_view.trip_time_hour)</li>
+</ul>
+
+<h6 id="section-7">2. Business metrics</h6>
+
+<p>Business metrics are compound calculations defined on top of atomic metrics and carry a concrete business meaning.</p>
+
+<ul>
+  <li>Month-to-date (MTD) and year-to-date (YTD) of each atomic metric</li>
+  <li>Month-over-month (MOM) and year-over-year (YOY) growth of each atomic metric</li>
+  <li>Covid19 case fatality rate: deaths / confirmed cases</li>
+  <li>Average taxi speed: taxi trip distance / taxi trip duration</li>
+  <li>Average taxi trip distance: taxi trip distance / taxi trip count</li>
+</ul>
+
+<h2 id="section-8">Overview of the steps</h2>
+
+<p>The main steps for building a cloud data analysis platform based on Apache Kylin and analyzing data with it are shown below:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/2_step_overview.jpg" alt="" /></p>
+
+<h2 id="section-9">Cluster architecture</h2>
+
+<p>The architecture of the Kylin cluster deployed by the cloud deployment tool is shown below:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/3_kylin_cluster.jpg" alt="" /></p>
+
+<h2 id="kylin-on-cloud-">Kylin on Cloud deployment</h2>
+
+<h3 id="section-10">Requirements</h3>
+
+<ul>
+  <li>git must be installed on the local machine, to download the deployment tool code;</li>
+  <li>Python 3.6.6 or later must be installed on the local machine, to run the deployment tool.</li>
+</ul>
+
+<h3 id="aws-">AWS permission check and initialization</h3>
+
+<p>Log in to your AWS account and follow the <a href="https://github.com/apache/kylin/blob/kylin4_on_cloud/readme/prerequisites.md">prerequisites document</a> to check user permissions and create the Access Key, IAM Role, Key Pair, and S3 working directory required by the deployment tool. All subsequent AWS operations are performed as this account.</p>
+
+<h3 id="section-11">Configure the deployment tool</h3>
+
+<p>1. Run the following command to get the code of the Kylin on AWS deployment tool</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>git clone -b kylin4_on_cloud --single-branch https://github.com/apache/kylin.git <span class="o">&amp;&amp;</span> <span class="nb">cd </span>kylin
+</code></pre>
+</div>
+
+<p>2. Initialize the Python virtual environment on the local machine</p>
+
+<p>Check the Python environment; Python 3.6.6 or later is required:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>python --version
+</code></pre>
+</div>
+
+<p>Initialize the Python virtual environment and install the dependencies:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>bin/init.sh
+<span class="nb">source </span>venv/bin/activate
+</code></pre>
+</div>
+
+<p>3. Edit the configuration file <code class="highlighter-rouge">kylin_configs.yaml</code></p>
+
+<p>Open kylin_configs.yaml in the deployment tool code and replace the configuration items in the file with actual values:</p>
+
+<ul>
+  <li><code class="highlighter-rouge">AWS_REGION</code>: Region of the EC2 nodes; the default is cn-northwest-1</li>
+  <li><code class="highlighter-rouge">${IAM_ROLE_NAME}</code>: name of the IAM Role created in advance, for example kylin_deploy_role</li>
+  <li><code class="highlighter-rouge">${S3_URI}</code>: S3 working directory used to deploy Kylin, for example s3://kylindemo/kylin_demo_dir/</li>
+  <li><code class="highlighter-rouge">${KEY_PAIR}</code>: name of the Key pair created in advance, for example kylin_deploy_key</li>
+  <li><code class="highlighter-rouge">${Cidr Ip}</code>: IP address range allowed to access the EC2 instances, for example 10.1.0.0/32; usually set to your public IP address so that only you can access the created EC2 instances</li>
+</ul>
+
+<p>To separate reads from writes and isolate build resources from query resources, the following steps first start a build cluster, which connects to Glue to create tables, loads the data sources, and submits build jobs for precomputation. The build cluster is then destroyed while the metadata is kept, and a query cluster with MDX for Kylin is started to create business metrics and connect BI tools to run queries and analyze data. A Kylin on AWS cluster stores metadata in RDS, stores built data in S3, and supports loading data sources from AWS Glue. Apart from the EC2 nodes, all resources used are persistent and do not disappear when nodes are deleted, so when there are no query or build jobs you can destroy the build or query cluster at any time, as long as the metadata and the S3 working directory are kept.</p>
+
+<h3 id="kylin-">Kylin build cluster</h3>
+
+<h4 id="kylin--1">Start the Kylin build cluster</h4>
+
+<p>1. Start the build cluster with the following command. Depending on network conditions, deployment and startup may take 15 to 30 minutes.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>python deploy.py --type deploy --mode job
+</code></pre>
+</div>
+
+<p>2. After the build cluster is deployed successfully, the command window shows the following output:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/4_deploy_cluster_successfully.png" alt="" /></p>
+
+<h4 id="aws--1">Check AWS services</h4>
+
+<p>1. Open the CloudFormation page of the AWS console. The Kylin deployment tool has created 7 stacks in total:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/5_check_aws_stacks.png" alt="" /></p>
+
+<p>2. You can view the details of the EC2 nodes in the AWS console, or list the names, private IPs, and public IPs of all EC2 nodes on the command line with the following command:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>python deploy.py --type list
+</code></pre>
+</div>
+
+<p><img src="/images/blog/kylin4_on_cloud/6_list_cluster_node.png" alt="" /></p>
+
+<h4 id="spark-sql-">Try the native query speed of spark-sql</h4>
+
+<p>To get a direct feel for the query performance gain from precomputation, before building the cube we first try the native query speed in spark-sql:</p>
+
+<p>1. First, log in to the EC2 machine that hosts Kylin using the public IP of the Kylin node, switch to the root user, and source ~/.bash_profile so that the preset environment variables take effect:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>ssh -i <span class="s2">"</span><span class="k">${</span><span class="nv">KEY_PAIR</span><span class="k">}</span><span class="s2">"</span> ec2-user@<span class="k">${</span><span class="nv">kylin_node_public_ip</span><span class="k">}</span>
+sudo su
+<span class="nb">source</span> ~/.bash_profile
+</code></pre>
+</div>
+
+<p>2. Then go to <code class="highlighter-rouge">$SPARK_HOME</code> and edit the configuration file <code class="highlighter-rouge">conf/spark-defaults.conf</code>, changing <code class="highlighter-rouge">spark_master_node_private_ip</code> to the private IP of the Spark master node:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span
class="nv">$SPARK_HOME</span>
+vim conf/spark-defaults.conf
+
+<span class="c"># Replace spark_master_node_private_ip with the private IP of the actual Spark master node</span>
+spark.master spark://spark_master_node_private_ip:7077
+</code></pre>
+</div>
+
+<p>The driver and executor resource settings in <code class="highlighter-rouge">spark-defaults.conf</code> are the same as the resource settings of the Kylin query cluster.</p>
+
+<p>3. Create tables in spark-sql</p>
+
+<p>All data of the test datasets is stored in S3 buckets in the <code class="highlighter-rouge">cn-north-1</code> and <code class="highlighter-rouge">us-east-1</code> regions. If your S3 bucket is in <code class="highlighter-rouge">cn-north-1</code> or <code class="highlighter-rouge">us-east-1</code>, you can run the table creation SQL directly. Otherwise, run the following script to copy the data to the S3 working directory set in <code class="highlighter-rouge">kylin_configs.yaml</code>, and update the table creation SQL:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="c">## AWS CN users</span>
+aws s3 sync s3://public.kyligence.io/kylin/kylin_demo/data/ <span class="k">${</span><span class="nv">S3_DATA_DIR</span><span class="k">}</span> --region cn-north-1
+
+<span class="c">## AWS Global users</span>
+aws s3 sync s3://public.kyligence.io/kylin/kylin_demo/data/ <span class="k">${</span><span class="nv">S3_DATA_DIR</span><span class="k">}</span> --region us-east-1
+
+<span class="c"># Update the table creation SQL</span>
+sed -i <span class="s2">"s#s3://public.kyligence.io/kylin/kylin_demo/data/#</span><span class="k">${</span><span class="nv">S3_DATA_DIR</span><span class="k">}</span><span class="s2">#g"</span> /home/ec2-user/kylin_demo/create_kylin_demo_table.sql
+</code></pre>
+</div>
+
+<p>Run the table creation SQL:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>bin/spark-sql -f /home/ec2-user/kylin_demo/create_kylin_demo_table.sql
+</code></pre>
+</div>
+
+<p>4. Run the query in spark-sql</p>
+
+<p>Enter spark-sql:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>bin/spark-sql
+</code></pre>
+</div>
+
+<p>Run the query in spark-sql:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="n">use</span> <span class="n">kylin_demo</span><span class="p">;</span>
+<span class="k">select</span> <span class="n">TAXI_TRIP_RECORDS_VIEW</span><span class="p">.</span><span class="n">PICKUP_DATE</span><span class="p">,</span> <span class="n">NEWYORK_ZONE</span><span class="p">.</span><span class="n">BOROUGH</span><span class="p">,</span> <span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">),</span> <span class="k">sum</span><span class="p">(</span><span class="n">TAXI_TRIP_RECORDS_VIEW</span><span class="p">.</span><span class="n">TRIP_TIME_HOUR</span><span class="p">),</span> <span class="k">sum</span><span class="p">(</span><span class="n">TAXI_TRIP_RECORDS_VIEW</span><span class="p">.</span><span class="n">TOTAL_AMOUNT</span><span class="p">)</span>
+<span class="k">from</span> <span class="n">TAXI_TRIP_RECORDS_VIEW</span>
+<span class="k">left</span> <span class="k">join</span> <span class="n">NEWYORK_ZONE</span>
+<span class="k">on</span> <span class="n">TAXI_TRIP_RECORDS_VIEW</span><span class="p">.</span><span class="n">PULOCATIONID</span> <span class="o">=</span> <span class="n">NEWYORK_ZONE</span><span class="p">.</span><span class="n">LOCATIONID</span>
+<span class="k">group</span> <span class="k">by</span> <span class="n">TAXI_TRIP_RECORDS_VIEW</span><span class="p">.</span><span class="n">PICKUP_DATE</span><span class="p">,</span> <span class="n">NEWYORK_ZONE</span><span class="p">.</span><span class="n">BOROUGH</span><span class="p">;</span>
+</code></pre>
+</div>
+
+<p>With the same resources as the Kylin query cluster, the direct spark-sql query takes more than 100 s:</p>
+
+<p><img src="/images/blog/kylin4_on_cloud/7_query_in_spark_sql.png" alt="" /></p>
+
+<p>5. After the query succeeds, you must exit spark-sql before continuing with the following steps, so that it does not keep holding resources.</p>
+
+<h4 id="kylin--2">Import Kylin metadata</h4>
+
+<p>1. Go to <code
class="highlighter-rouge">$KYLIN_HOME</code></p> + +<div class="highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> <span class="nv">$KYLIN_HOME</span> +</code></pre> +</div> + +<p>2. Import the metadata</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>bin/metastore.sh restore /home/ec2-user/meta_backups/ +</code></pre> +</div> + +<p>3. Reload the metadata</p> + +<p>Using the public IP of the EC2 node, enter <code class="highlighter-rouge">http://${kylin_node_public_ip}:7070/kylin</code> in a browser to open the Kylin web page, and log in with the default username and password ADMIN/KYLIN:</p> + +<p><img src="/images/blog/kylin4_on_cloud/8_kylin_web_ui.png" alt="" /></p> + +<p>Reload the Kylin metadata through System -&gt; Configuration -&gt; Reload Metadata:</p> + +<p><img src="/images/blog/kylin4_on_cloud/9_reload_kylin_metadata.png" alt="" /></p> + +<p>To learn how to manually create the Model and Cube contained in the Kylin metadata, see <a href="https://cwiki.apache.org/confluence/display/KYLIN/Create+Model+and+Cube+in+Kylin">Create model and cube in kylin</a>.</p> + +<h4 id="section-12">Run the build</h4> + +<p>Submit the cube build jobs. Because no partition column is set in the model, both cubes are built in full here:</p> + +<p><img src="/images/blog/kylin4_on_cloud/10_full_build_cube.png.png" alt="" /></p> + +<p><img src="/images/blog/kylin4_on_cloud/11_kylin_job_complete.png" alt="" /></p> + +<h4 id="section-13">Destroy the build cluster</h4> + +<p>After the build completes, run the cluster destroy command to destroy the build cluster. By default, the RDS stack, monitor stack and VPC stack are retained:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code>python deploy.py --type destroy +</code></pre> +</div> + +<p>The cluster is destroyed successfully:</p> + +<p><img src="/images/blog/kylin4_on_cloud/12_destroy_job_cluster.png" alt="" /></p> + +<h4 id="aws--2">Check AWS resources</h4> + +<p>After the cluster is destroyed, you can check the <code class="highlighter-rouge">CloudFormation</code> service in the AWS console for leftover resources. Because the metadata RDS, the monitor node and the VPC node are retained by default, the following three stacks remain on the CloudFormation page after the cluster is destroyed:</p> + +<p><img 
src="/images/blog/kylin4_on_cloud/13_check_aws_stacks.png" alt="" /></p> + +<p>The resources in these three stacks are reused when the query cluster is started later, which ensures that the query cluster and the build cluster use the same set of metadata.</p> + +<p>This is Part 1 of <code class="highlighter-rouge">Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours</code>. For Part 2, see: <a href="../kylin4-on-cloud-part2/">Kylin on Cloud: Build a Data Analysis Platform on the Cloud in Two Hours (Part 2)</a></p> + +</description> + <pubDate>Wed, 20 Apr 2022 04:00:00 -0700</pubDate> + <link>http://kylin.apache.org/cn_blog/2022/04/20/kylin4-on-cloud-part1/</link> + <guid isPermaLink="true">http://kylin.apache.org/cn_blog/2022/04/20/kylin4-on-cloud-part1/</guid> + + + <category>cn_blog</category> + + </item> + + <item> <title>How to Query Kylin with Excel? MDX for Kylin!</title> <description><h2 id="kylin--mdx">Why does Kylin need MDX?</h2> @@ -864,46 +1501,46 @@ CELL PROPERTIES VALUE, FORMAT_STRING, LA </item> <item> - <title>Official announcement: Kylin 4 now supports AWS Glue Catalog</title> - <description><h2 id="emr--kylin--glue-">Why does deploying Kylin on EMR require Glue support?</h2> + <title>Kylin 4 now supports AWS Glue Catalog</title> + <description><h2 id="why-does-installing-kylin-on-emr-need-to-support-aws-glue">Why does deploying Kylin on EMR require AWS Glue support?</h2> -<h3 id="aws-glue">What is AWS Glue?</h3> +<h3 id="what-is-aws-glue">What is AWS Glue?</h3> -<p>AWS Glue is a fully managed ETL (extract, transform, and load) service that lets AWS users categorize, clean and enrich data easily and cost-effectively, and move it reliably between various data stores. AWS Glue consists of a central metadata repository known as the AWS Glue Data Catalog, an ETL engine that automatically generates code, and a flexible scheduler that handles dependency resolution, job monitoring and retries. AWS Glue is serverless, so there is no infrastructure to set up or manage.</p> +<p>AWS Glue is a fully managed ETL (Extract, Transform, and Load) service that enables AWS users to easily and cost-effectively classify, cleanse and enrich data, and to move it between various data stores. 
AWS Glue consists of a central metastore called AWS Glue Data Catalog, an ETL engine that automatically generates code, and a flexible scheduler that handles dependency resolution, job monitoring and retries. AWS Glue is a serverless service, so there is no infrastructure to set up or manage.</p> -<h3 id="kylin--aws-glue-catalog">Why does Kylin need to support AWS Glue Catalog?</h3> +<h3 id="why-does-kylin-need-aws-glue-catalog">Why does Kylin need AWS Glue Catalog?</h3> -<p>Many Kylin users in the community currently run AWS EMR, mainly with components such as Hadoop, Spark, Hive and Presto. If AWS Glue Data Catalog is not configured, tables created in one data warehouse component such as Hive, Spark or Presto cannot be found in the other components and therefore cannot be used there. A company's underlying data warehouse serves all of its business departments. To solve this problem, AWS Glue Data Catalog can be used to store metadata when the AWS EMR cluster is created, sharing data sources across components and business departments and building the data of each department into one large data cube that can respond quickly to the needs of a fast-growing business.<br /> -Modern companies build their data on cloud platforms, and big data teams use AWS EMR for data processing, data analysis and model training. As data volumes surge, extracting data becomes slow and difficult, and EMR/Spark/Hive can hardly meet the fast query needs of data analysts, operations staff and sales. Some users therefore chose Apache Kylin as their open-source OLAP solution.<br /> -Recently, however, community users told us that Kylin 4 could not yet read table metadata from Glue. We worked with them to investigate and eventually resolve the problem, so Kylin 4 now supports AWS Glue Catalog. The benefit is that tables and data can be shared among Hive, Presto, Spark and Kylin, linking every subject area into one large data analysis platform and breaking down the metadata barrier.</p> +<p>At present, many users in the Kylin community use AWS EMR to run large-scale distributed data processing jobs on Hadoop, Spark, Hive, Presto, etc. Without AWS Glue Data Catalog, tables created in one of these data warehouse components (such as Hive, Spark or Presto) cannot be used by the other components. 
As the data warehouse serves many business departments, these users store metadata in AWS Glue Data Catalog when creating their AWS EMR clusters, so that data sources are shared across components and business departments. The data from each department can then be built into one data cube that responds quickly to different business requirements.<br /> +In modern companies, data is kept in cloud object storage, and big data teams use AWS EMR for data processing, data analysis and model training. As data volumes surge, extracting data becomes slow and difficult, and EMR with Spark or Hive cannot meet the fast query needs of data analysts, O&amp;M personnel and sales staff. Some users therefore turn to Apache Kylin as their open-source OLAP solution.<br /> +Recently, community users asked whether Kylin 4 could read table metadata directly from AWS Glue. After we worked with them on the issue, Kylin 4 now supports AWS Glue Catalog, so tables and data can be shared among Hive, Presto, Spark and Kylin. 
This helps to break down the metadata barrier, so that different subject areas can be combined into one big data analysis platform.</p> -<h3 id="apache-kylin--aws-glue-">Does Apache Kylin support AWS Glue?</h3> +<h3 id="does-kylin-support-aws-glue">Does Kylin support AWS Glue?</h3> <table> <thead> <tr> <th> </th> - <th>Kylin versions that support Glue</th> + <th>Kylin versions that support Glue</th> <th>Issue Link</th> </tr> </thead> <tbody> <tr> <td>Kylin on HBase (Before Kylin 4)</td> - <td>2.6.6 or higher<br /> 3.1.0 or higher</td> + <td>2.6.6 or higher<br />3.1.0 or higher</td> <td>https://issues.apache.org/jira/browse/KYLIN-4206<br />https://zhuanlan.zhihu.com/p/99481373</td> </tr> <tr> <td>Kylin on Parquet</td> <td>4.0.1 or higher</td> - <td>This article.</td> + <td>This article.</td> </tr> </tbody> </table> -<h2 id="section">Preparation before deployment</h2> +<h2 id="prerequisites-for-deployment">Prerequisites for deployment</h2> -<h3 id="section-1">Software information at a glance</h3> +<h3 id="software-version">Software Version</h3> <table> <thead> @@ -917,27 +1554,27 @@ CELL PROPERTIES VALUE, FORMAT_STRING, LA <tr> <td>Apache Kylin</td> <td>4.0.1 or higher</td> - <td>Must be 4.0.1 or later; for details, see <a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td> + <td><a href="https://cwiki.apache.org/confluence/display/KYLIN/KIP+10+refactor+hive+and+hadoop+dependency">KIP 10 refactor hive and hadoop dependency</a>.</td> </tr> <tr> <td>AWS EMR</td> <td>6.5.0 or higher<br />5.33.1 or higher</td> - <td>Covers the newer EMR 6 / EMR 5 releases; <a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td> + <td><a href="https://docs.amazonaws.cn/en_us/emr/latest/ReleaseGuide/emr-650-release.html">Amazon EMR release 6.5.0 - Amazon EMR</a>.</td> </tr> </tbody> </table> -<h3 id="glue-">Prepare the Glue database and tables</h3> +<h3 id="prepare-aws-glue-database-and-tables">Prepare AWS Glue database and tables</h3> 
<p><img src="/images/blog/kylin4_support_aws_glue/1_prepare_aws_glue_table_en.png" alt="" /></p> <p><img src="/images/blog/kylin4_support_aws_glue/2_prepare_aws_glue_table_en.png" alt="" /></p> <ul> - <li>Create an AWS EMR cluster.</li> + <li>Create an EMR cluster.</li> </ul> -<p>Start an EMR cluster here. Note that Glue is enabled as the external metastore by configuring <code class="highlighter-rouge">hive.metastore.client.factory.class</code>. The following command can serve as a reference.</p> +<p>Note: The parameter hive.metastore.client.factory.class is configured to enable AWS Glue as the metastore. The command below can serve as a reference.</p> <div class="highlighter-rouge"><pre class="highlight"><code>aws emr create-cluster --applications <span class="nv">Name</span><span class="o">=</span>Hadoop <span class="nv">Name</span><span class="o">=</span>Hive <span class="nv">Name</span><span class="o">=</span>Spark <span class="nv">Name</span><span class="o">=</span>ZooKeeper <span class="nv">Name</span><span class="o">=</span>Tez <span class="nv">Name</span><span class="o">=</span>Ganglia <span class="se">\</span> --ec2-attributes <span class="k">${}</span> <span class="se">\</span> @@ -955,35 +1592,35 @@ CELL PROPERTIES VALUE, FORMAT_STRING, LA </div> <ul> - <li>Log in to the Master node and check the Hadoop version and whether the Hadoop cluster started successfully.</li> + <li>Log in to the Master node. 
Check the Hadoop version and whether the Hadoop cluster started successfully.</li> </ul> <p><img src="/images/blog/kylin4_support_aws_glue/3_prepare_hadoop_cluster_en.png" alt="" /></p> <p><img src="/images/blog/kylin4_support_aws_glue/4_prepare_hadoop_cluster_en.png" alt="" /></p> -<h3 id="optional">Get environment information (Optional)</h3> +<h3 id="optionalget-environmental-information">(Optional) Get environment information</h3> <blockquote> - <p>If you use RDS or another metadata store, skip this step as appropriate.</p> + <p>If you are using RDS or another metadata store, you may skip this step.</p> </blockquote> -<p>Kylin 4.X recommends an RDBMS as the metadata store. For testing purposes, the MariaDB bundled with the Master node is used as the metadata store here; its hostname, account, password and other details can be obtained from <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p> +<p>Kylin 4 recommends an RDBMS as the metastore. For testing purposes, this article uses the MariaDB that comes with the Master node as the metastore; for the hostname, account and password of MariaDB, see <code class="highlighter-rouge">/etc/hive/conf/hive-site.xml</code>.</p> <div class="highlighter-rouge"><pre class="highlight"><code>kylin.metadata.url<span class="o">=</span>kylin4_on_cloud@jdbc,url<span class="o">=</span>jdbc:mysql://<span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span>:3306/hue,username<span class="o">=</span>hive,password<span class="o">=</span><span class="k">${</span><span class="nv">PASSWORD</span><span class="k">}</span>,maxActive<span class="o">=</span>10,maxIdle<span class="o">=</span>10,driverClassName<span class="o">=</span>org.mariadb.jdbc.Driver kylin.env.zookeeper-connect-string<span class="o">=</span><span class="k">${</span><span class="nv">HOSTNAME</span><span class="k">}</span> </code></pre> </div> -<p>After obtaining this information, replace the variables in the Kylin configuration items above, such as <code class="highlighter-rouge">${PASSWORD}</code>, and save the result locally for starting the Kylin process in the next step.</p> +<p>Configure the variables as 
per the actual information; for example, replace ${PASSWORD} with the real password. Save the result locally; it is used to start Kylin in the next step.</p> -<h3 id="spark-sql--aws-glue-">Test the connectivity between Spark SQL and AWS Glue</h3> +<h3 id="test-the-connectivity-between-spark-sql-and-aws-glue">Test the connectivity between Spark SQL and AWS Glue</h3>
[... 987 lines stripped ...]