added Chinese version of howto_optimize_build

Signed-off-by: Billy Liu <billy...@apache.org>
Project: http://git-wip-us.apache.org/repos/asf/kylin/repo
Commit: http://git-wip-us.apache.org/repos/asf/kylin/commit/96854ef8
Tree: http://git-wip-us.apache.org/repos/asf/kylin/tree/96854ef8
Diff: http://git-wip-us.apache.org/repos/asf/kylin/diff/96854ef8

Branch: refs/heads/document
Commit: 96854ef87d2cbdae14aa1aba0e1b7c44020f2a42
Parents: 2a282ee
Author: link3280 <491325...@qq.com>
Authored: Sat Oct 21 12:53:35 2017 +0800
Committer: Billy Liu <billy...@apache.org>
Committed: Fri Jan 26 13:09:53 2018 +0800

----------------------------------------------------------------------
 .../_docs21/howto/howto_optimize_build.cn.md | 166 +++++++++++++++++++
 1 file changed, 166 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kylin/blob/96854ef8/website/_docs21/howto/howto_optimize_build.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs21/howto/howto_optimize_build.cn.md b/website/_docs21/howto/howto_optimize_build.cn.md
new file mode 100644
index 0000000..d454859
--- /dev/null
+++ b/website/_docs21/howto/howto_optimize_build.cn.md
@@ -0,0 +1,166 @@
+---
+layout: docs21
+title: Optimize Cube Build
+categories: Help
+permalink: /cn/docs21/howto/howto_optimize_build.html
+---
+
+Kylin decomposes a Cube build task into several steps that run in sequence. These steps include Hive operations, MapReduce jobs, and other types of jobs. If you have many Cube build tasks to run every day, you will certainly want to reduce the time they take. The tips below follow the order of the Cube build steps.
+
+## Create Intermediate Flat Hive Table
+
+This step extracts data from the source Hive tables (together with all joined tables) and inserts it into an intermediate flat table. If the Cube is partitioned, Kylin adds a time condition so that only the data within the time range is fetched. You can find the related Hive command in the log of this step, for example:
+
+```
+hive -e "USE default;
+DROP TABLE IF EXISTS kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34;
+
+CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34
+(AIRLINE_FLIGHTDATE date,AIRLINE_YEAR int,AIRLINE_QUARTER int,...,AIRLINE_ARRDELAYMINUTES int)
+STORED AS SEQUENCEFILE
+LOCATION 'hdfs:///kylin/kylin200instance/kylin-0a8d71e8-df77-495f-b501-03c06f785b6c/kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34';
+
+SET dfs.replication=2;
+SET hive.exec.compress.output=true;
+SET hive.auto.convert.join.noconditionaltask=true;
+SET hive.auto.convert.join.noconditionaltask.size=100000000;
+SET mapreduce.job.split.metainfo.maxsize=-1;
+
+INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34 SELECT
+AIRLINE.FLIGHTDATE
+,AIRLINE.YEAR
+,AIRLINE.QUARTER
+,...
+,AIRLINE.ARRDELAYMINUTES
+FROM AIRLINE.AIRLINE as AIRLINE
+WHERE (AIRLINE.FLIGHTDATE >= '1987-10-01' AND AIRLINE.FLIGHTDATE < '2017-01-01');
+"
+```
+
+While the Hive commands run, Kylin applies the configuration in `conf/kylin_hive_conf.xml`, for example keeping fewer replicas and enabling Hive's mapper-side join. If needed, you can add other settings that suit your cluster.
+
+If the Cube's partition column ("FLIGHTDATE" in this case) is the same as the Hive table's partition column, filtering on it lets Hive skip the partitions that do not match. It is therefore strongly recommended to use the Hive table's partition column (if it is a date column) as the Cube's partition column. For very large tables this is almost mandatory; otherwise Hive has to scan all files every time this step runs, which takes a very long time.
+
+If file merging is enabled in Hive, you can disable it in `conf/kylin_hive_conf.xml`, because Kylin has its own way of merging files (see the next section):
+
+    <property>
+        <name>hive.merge.mapfiles</name>
+        <value>false</value>
+        <description>Disable Hive's auto merge</description>
+    </property>
+
+## Redistribute Intermediate Table
+
+After the previous step, Hive has generated data files in an HDFS directory: some are large, some are small or even empty. This unbalanced file distribution causes data skew in the subsequent MR jobs: some mappers finish quickly while others are very slow. To address this, Kylin adds this step to "redistribute" the data. Here is a sample output:
+
+```
+total input rows = 159869711
+expected input rows per mapper = 1000000
+num reducers for RedistributeFlatHiveTableStep = 160
+
+```
+
+The command that redistributes the table:
+
+```
+hive -e "USE default;
+SET dfs.replication=2;
+SET hive.exec.compress.output=true;
+SET hive.auto.convert.join.noconditionaltask=true;
+SET hive.auto.convert.join.noconditionaltask.size=100000000;
+SET mapreduce.job.split.metainfo.maxsize=-1;
+set mapreduce.job.reduces=160;
+set hive.merge.mapredfiles=false;
+
+INSERT OVERWRITE TABLE kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34 SELECT * FROM kylin_intermediate_airline_cube_v3610f668a3cdb437e8373c034430f6c34 DISTRIBUTE BY RAND();
+"
+```
+
+First, Kylin gets the row count of the intermediate table and, based on that count, works out how many files are needed to redistribute the data. By default, Kylin allocates one file per one million rows. In this example there are 160 million rows and 160 reducers, and each reducer writes one file. In the MR steps that later run over this table, Hadoop starts as many mappers as there are files (one million rows of data is usually smaller than an HDFS block). If your daily data volume is not that large, or your Hadoop cluster has enough resources, you may want more concurrency. In that case, set `kylin.job.mapreduce.mapper.input.rows` in `conf/kylin.properties` to a smaller value, for example:
+
+`kylin.job.mapreduce.mapper.input.rows=500000`
+
+Second, Kylin runs a HiveQL statement of the form *"INSERT OVERWRITE TABLE ... DISTRIBUTE BY "* to distribute the rows among the specified number of reducers.
+
+In most cases, Kylin asks Hive to distribute the rows randomly among the reducers, which produces files of similar size. The distribute clause is "DISTRIBUTE BY RAND()".
+
+If your Cube specifies a high-cardinality column such as "USER_ID" as the "shard by" dimension (on the Cube's "Advanced Setting" page), Kylin asks Hive to redistribute the data by that column's value, so rows with the same value in that column go to the same file. This is much better than random distribution: the data is not only redistributed but also pre-categorized at no extra cost, which benefits the subsequent Cube build steps. In a typical scenario this optimization can cut build time by 40%. In this case the distribute clause is "DISTRIBUTE BY USER_ID":
+
+Please note: 1) The "shard by" column should be a high-cardinality dimension column that appears in many cuboids (not just a few). Distributing by such a column spreads the data evenly within every time range; otherwise it causes data skew and lowers build efficiency. Typical good examples are "USER_ID", "SELLER_ID", "PRODUCT", "CELL_NUMBER" and so on, whose cardinality should be greater than one thousand (far more than the number of reducers). 2) "Shard by" also benefits Cube storage, but that is beyond the scope of this document.
+
+## Extract Fact Table Distinct Columns
+
+In this step Kylin runs an MR job to extract the distinct values of the dimension columns that use dictionary encoding.
+
+This step actually does more: it collects Cube statistics with HyperLogLog counters, which are used to estimate the row count of each cuboid. If you find that the mappers run very slowly, this usually indicates that the Cube design is too complex; please refer to
+[Optimize Cube Design](howto_optimize_cubes.html) to simplify the Cube. If the reducers fail with out-of-memory errors, either there are too many cuboid combinations or the YARN memory allocation cannot meet the demand. If this step cannot finish in a reasonable time by any means, you can abandon the job and reconsider the Cube design, because carrying on with the build will take even longer.
+
+You can speed up this step by lowering the sampling percentage (kylin.job.cubing.inmem.sampling.percent), but this may not help much and it reduces the accuracy of the Cube statistics, so we do not recommend it.
+
+## Build Dimension Dictionary
+
+With the distinct values extracted in the previous step, Kylin builds the dictionaries in memory (the next version will move this to a MapReduce job). This step is usually fast, but if the set of distinct values is large, Kylin may report an error similar to "Too high cardinality is not suitable for dictionary". For ultra-high-cardinality (UHC) columns, use another encoding such as "fixed_length" or "integer".
+
+## Save Cuboid Statistics and Create HTable
+
+These two steps are lightweight and fast.
+
+## Build Base Cuboid
+
+This step builds the base cuboid from the intermediate Hive table. It is the first round of MR in the "by-layer" cubing algorithm. The number of mappers equals the number of reducers in step 2. The number of reducers is estimated from the Cube statistics: by default, one reducer is used per 500 MB of output. If you observe that the number of reducers is small, you can set "kylin.job.mapreduce.default.reduce.input.mb" in kylin.properties to a smaller value to obtain more resources, for example:
+
+`kylin.job.mapreduce.default.reduce.input.mb=200`
+
+## Build N-Dimension Cuboid
+
+
+These steps are the "by-layer" cubing process. Each step takes the output of the previous step as its input and removes one dimension to aggregate a child cuboid. For example, from cuboid ABCD, removing A gives BCD and removing B gives ACD.
+
+Some cuboids can be aggregated from more than one parent cuboid. In that case Kylin selects the smallest parent cuboid. For example, AB can be generated from ABC (id: 1110) or ABD (id: 1101); ABD is chosen because its id is smaller than that of ABC. Given this rule, if D has low cardinality, the aggregation is cheap. So when you design the rowkey sequence, remember to put low-cardinality dimensions at the tail. This helps not only the Cube build but also Cube queries, because post-aggregation follows the same rule.
+
+In general, the build from N dimensions down to (N/2) dimensions is slow, because this is the phase in which the number of cuboids explodes: the N-dimension layer has 1 cuboid, the (N-1)-dimension layer has N cuboids, the (N-2)-dimension layer has N*(N-1)/2 cuboids, and so on. After the (N/2)-dimension step, the build gradually gets faster.
+
+## Build Cube
+
+This step uses a newer algorithm to build the Cube: "by-split" cubing (also called "in-mem" cubing). It uses a single round of MR to compute all cuboids, but it consumes more memory than usual. The configuration file "conf/kylin_job_conf_inmem.xml" exists for this step. By default it requests 3 GB of memory for each mapper. If your cluster has plenty of memory, you can allocate more memory to the mappers in that file so that they cache as much data as possible in memory and perform better, for example:
+
+    <property>
+        <name>mapreduce.map.memory.mb</name>
+        <value>6144</value>
+        <description></description>
+    </property>
+
+    <property>
+        <name>mapreduce.map.java.opts</name>
+        <value>-Xmx5632m</value>
+        <description></description>
+    </property>
+
+
+Please note that Kylin automatically selects the best algorithm based on the data distribution (obtained from the Cube statistics); the steps of the algorithm that was not selected are skipped. You do not need to choose the build algorithm explicitly.
+
+## Convert Cuboid Data to HFile
+
+This step starts an MR job that converts the cuboid files (sequence file format) into HBase's HFile format. Kylin calculates the number of HBase regions from the Cube statistics; by default, every 5 GB of data maps to one region. The more regions there are, the more reducers the MR job uses. If you observe that the number of reducers is small and performance is poor, you can set the following parameters in "conf/kylin.properties" to smaller values, for example:
+
+```
+kylin.hbase.region.cut=2
+kylin.hbase.hfile.size.gb=1
+```
+
+If you are not sure how large a region should be, contact your HBase administrator.
+
+## Load HFile to HBase Table
+
+This step uses the HBase API to load the HFiles into the region servers. It is lightweight and fast.
+
+## Update Cube Info
+
+After the data is loaded into HBase, Kylin marks the corresponding Cube segment as ready in the metadata.
+
+## Cleanup
+
+Drop the intermediate flat table from Hive. This step does not block anything, because the segment was already marked as ready in the previous step. If this step fails, there is no need to worry: the garbage can be collected later by Kylin's [StorageCleanupJob](howto_cleanup_storage.html).
+
+## Summary
+There are many other ways to improve Kylin's performance. If you have experience to share, you are welcome to discuss it at [d...@kylin.apache.org](mailto:d...@kylin.apache.org).
\ No newline at end of file
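
As a rough illustration of the sizing arithmetic behind two of the steps in the document above, here is a minimal Python sketch. It is not Kylin code: the function names are hypothetical, the rounding rule for the reducer count is an assumption chosen to match the sample log (159869711 rows at 1000000 rows per mapper gives 160 reducers), and the cuboid counts assume a full Cube with no aggregation groups or other pruning.

```python
from math import comb
from typing import List


def redistribute_reducers(total_rows: int, rows_per_mapper: int = 1_000_000) -> int:
    """Estimate how many reducers (and output files) the redistribute step uses.

    Assumption: total rows divided by rows per mapper, rounded to the nearest
    integer, with at least one reducer. The exact rule inside Kylin may differ.
    """
    return max(1, round(total_rows / rows_per_mapper))


def cuboids_per_layer(n_dims: int) -> List[int]:
    """Cuboid count for each by-layer step, from N dimensions down to 1.

    A layer with k dimensions holds C(N, k) cuboids when nothing is pruned,
    so the counts peak around the N/2 layer.
    """
    return [comb(n_dims, k) for k in range(n_dims, 0, -1)]


if __name__ == "__main__":
    # Sample from the redistribute step log.
    print(redistribute_reducers(159_869_711))
    # Halving rows per mapper (kylin.job.mapreduce.mapper.input.rows=500000)
    # roughly doubles the number of files and later mappers.
    print(redistribute_reducers(159_869_711, 500_000))
    # Layer sizes for a hypothetical 10-dimension Cube.
    print(cuboids_per_layer(10))
```

Running the sketch for a few dimension counts shows why the layers between N and N/2 dominate build time: the number of cuboids per layer grows until the middle layer and shrinks afterwards.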