1. The difference between Hadoop and Spark
2. Which is better: Hadoop or Spark?
3. The difference between Spark and Hadoop
The difference between Hadoop and Spark
Spark and Hadoop differ in three respects: when they emerged, how they compute, and what kind of platform each one is.
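Order of emergence: Hadoop belongs to the first generation of open-source big data processing platforms, and Spark belongs to the second. As the later generation, Spark is generally rated above first-generation Hadoop in overall assessments.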
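Computation: at the level of underlying design, Spark and Hadoop approach distributed computing in a very similar way, namely through the MapReduce distributed computing model, which splits a computation into two phases. In phase 1, map, each task pulls data from upstream and processes it independently, then shuffles its results to the downstream reduce tasks. In phase 2, reduce, each task aggregates the data it read through the shuffle. Where Spark and Hadoop differ is in the concrete implementation: in Hadoop's MapReduce framework, one job performs a single map-reduce pass, whereas a single Spark job can cascade several map-reduce passes.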
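To make the two-phase model concrete, here is a minimal single-process Python sketch of map, shuffle and reduce. It is not Hadoop or Spark code, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`, `run_job`) are hypothetical. It contrasts running two map-reduce rounds as two separate jobs, roughly as Hadoop MapReduce would, with cascading both rounds inside one job, roughly as Spark does.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Phase 1: apply the mapper to every input record, emitting (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, as the shuffle delivers them to downstream reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Phase 2: aggregate the grouped values of each key."""
    return [reducer(key, values) for key, values in groups.items()]


def map_reduce(records, mapper, reducer):
    """One map -> shuffle -> reduce round (roughly one Hadoop MapReduce job)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def run_job(records, stages):
    """Cascade several map-reduce rounds inside a single job, in the spirit of Spark."""
    data = records
    for mapper, reducer in stages:
        data = map_reduce(data, mapper, reducer)
    return data


# Round 1: word count.
word_count_stage = (
    lambda line: [(word, 1) for word in line.split()],
    lambda word, ones: (word, sum(ones)),
)
# Round 2: group words by how often they occur.
group_by_count_stage = (
    lambda pair: [(pair[1], pair[0])],
    lambda count, words: (count, sorted(words)),
)

lines = ["spark hadoop spark", "hadoop hdfs yarn", "spark"]

# Hadoop style: two separate jobs, the first job's output fed to the second.
first = map_reduce(lines, *word_count_stage)
hadoop_style = map_reduce(first, *group_by_count_stage)

# Spark style: both rounds cascaded in one job.
spark_style = run_job(lines, [word_count_stage, group_by_count_stage])
```

Both routes produce the same answer here. On a real cluster the difference is that chained Hadoop jobs each write their output out before the next job starts, while Spark's scheduler treats the cascaded rounds as stages of one job.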
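Platform: Spark is a computing platform, whereas Hadoop is a composite platform that contains a computing engine (MapReduce), a distributed file storage system (HDFS) and a resource scheduling system for distributed computation (YARN). So when Spark is compared with Hadoop, the comparison is mainly about computation. At the current stage of big data technology, Hadoop's computing component has been steadily declining, while Spark is at its peak: demand for Spark skills is high and job offers are easy to come by.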
Which is better: Hadoop or Spark?
The difference between Spark and Hadoop
They solve problems at different levels
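First, Hadoop and Apache Spark are both big data frameworks, but they exist for different purposes. Hadoop is essentially a distributed data infrastructure: it distributes huge datasets across multiple nodes in a cluster of ordinary commodity computers for storage, which means you do not need to buy and maintain expensive server hardware.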
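Hadoop also indexes and tracks this data, raising the efficiency of big data processing and analytics to unprecedented levels. Spark, by contrast, is a tool built specifically to process big data that sits in distributed storage; it does not store distributed data itself.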
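The storage-and-tracking idea can be sketched with a toy model: a file is cut into blocks, each block is copied to several ordinary machines, and a metadata index records which machines hold which block, so the file can be reassembled even when some machines are down. Everything here (`ToyCluster`, `split_into_blocks`, `place_blocks`, the 4-byte blocks) is a hypothetical illustration, not HDFS code. Real HDFS uses far larger blocks (128 MB by default in current versions), a default replication factor of 3, and a rack-aware placement policy rather than the simple round-robin used here.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes using simple round-robin.

    Simplification: real HDFS uses a rack-aware placement policy instead.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


class ToyCluster:
    """Hypothetical toy model of distributed storage plus a metadata index."""

    def __init__(self, nodes, block_size=4, replication=3):
        self.nodes = list(nodes)
        self.block_size = block_size
        self.replication = replication
        # node name -> {(file name, block id): block bytes}: the disks of ordinary machines
        self.storage = {node: {} for node in self.nodes}
        # file name -> {block id: [nodes holding a replica]}: the index that tracks the data
        self.metadata = {}

    def put(self, name: str, data: bytes) -> None:
        blocks = split_into_blocks(data, self.block_size)
        placement = place_blocks(len(blocks), self.nodes, self.replication)
        for block_id, block in enumerate(blocks):
            for node in placement[block_id]:
                self.storage[node][(name, block_id)] = block
        self.metadata[name] = placement

    def get(self, name: str, failed=frozenset()) -> bytes:
        """Reassemble a file, reading each block from any replica on a node that is still up."""
        parts = []
        for block_id in sorted(self.metadata[name]):
            alive = [n for n in self.metadata[name][block_id] if n not in failed]
            if not alive:
                raise OSError(f"all replicas of block {block_id} of {name!r} are lost")
            parts.append(self.storage[alive[0]][(name, block_id)])
        return b"".join(parts)


cluster = ToyCluster(["node1", "node2", "node3", "node4"], block_size=4, replication=3)
cluster.put("log.txt", b"hello distributed world")
# Two of the four machines are down, yet every block still has a live replica.
restored = cluster.get("log.txt", failed={"node1", "node2"})
```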
The two can work together or separately
Besides its well-known HDFS distributed data storage, Hadoop also provides a data processing component called MapReduce. So we can set Spark aside entirely and use Hadoop's own MapReduce to process the data.
Conversely, Spark does not have to depend on Hadoop to exist. But as noted above, it does not provide a file management system, so it must be integrated with some distributed file system in order to operate. Here we can choose Hadoop's HDFS or another, cloud-based data platform. Still, Spark is most commonly run on top of Hadoop, since the combination is widely considered the best.
Spark processes data far faster than MapReduce
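Because Spark processes data differently, it can be much faster than MapReduce. MapReduce processes data in steps: "It reads data from the cluster, performs an operation, writes the results to the cluster, reads the updated data from the cluster, performs the next operation, writes the next results to the cluster, and so on," explains Kirk Borne, a data scientist at Booz Allen Hamilton.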
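Spark, by contrast, completes all of its data analytics in memory, in near "real time": "Read data from the cluster, perform all of the requisite analytic operations, write results to the cluster, done," Borne says. Spark's batch processing is many times faster than MapReduce, and its in-memory analytics are faster still.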
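Borne's two descriptions can be compared by counting storage round trips. In this hypothetical sketch a local temporary directory stands in for the cluster's storage, and the three steps in `STEPS` are arbitrary examples: the MapReduce-style runner reads and writes storage around every step, while the Spark-style runner reads once, keeps intermediate results in memory, and writes once. It counts reads and writes rather than measuring time, and ignores real-world details such as Spark's lazy evaluation or spilling to disk.

```python
import json
import os
import tempfile


def read_json(path):
    with open(path) as f:
        return json.load(f)


def write_json(path, value):
    with open(path, "w") as f:
        json.dump(value, f)


def mapreduce_style(input_path, steps, workdir):
    """Each step reads its input from storage and writes its output back before the next step."""
    io_ops = 0
    path = input_path
    for i, step in enumerate(steps):
        data = read_json(path)
        io_ops += 1
        path = os.path.join(workdir, f"step{i}.json")
        write_json(path, step(data))
        io_ops += 1
    return path, io_ops


def spark_style(input_path, steps, workdir):
    """Read once, keep intermediate results in memory, write only the final result."""
    data = read_json(input_path)
    io_ops = 1
    for step in steps:
        data = step(data)
    path = os.path.join(workdir, "result.json")
    write_json(path, data)
    io_ops += 1
    return path, io_ops


STEPS = [
    lambda xs: [x * 2 for x in xs],       # transform
    lambda xs: [x for x in xs if x > 4],  # filter
    lambda xs: [sum(xs)],                 # aggregate
]

with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "input.json")
    write_json(input_path, [1, 2, 3, 4, 5])
    mr_path, mr_io = mapreduce_style(input_path, STEPS, workdir)
    sp_path, sp_io = spark_style(input_path, STEPS, workdir)
    mr_result = read_json(mr_path)
    sp_result = read_json(sp_path)
```

With n steps the first runner performs 2n storage reads and writes and the second performs 2, which is the gist of Borne's comparison; actual speedups depend heavily on the workload.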
If your data and result requirements are mostly static, and you are patient enough to wait for batch jobs to finish, MapReduce's approach is perfectly acceptable.
But if you need to analyze streaming data, such as readings collected by sensors on a factory floor, or if your application requires multiple passes over the data, you should probably use Spark instead.
Most machine learning algorithms require multiple passes over the data. Other common Spark use cases include real-time marketing campaigns, online product recommendations, cybersecurity analytics and machine log monitoring.
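Why multiple passes favor keeping data in memory can be seen in a simple iterative algorithm. This sketch fits the slope of a line by gradient descent, touching the whole dataset on every iteration; a hypothetical `CountingSource` stands in for data on the cluster and counts how often it is loaded. With `cache=True` the data is loaded once and reused, loosely analogous to calling `cache()` or `persist()` on a Spark RDD; with `cache=False` it is reloaded on every pass, as a chain of MapReduce jobs would have to do.

```python
class CountingSource:
    """Stands in for data stored on the cluster; counts how many times it is loaded."""

    def __init__(self, points):
        self._points = list(points)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._points)


def fit_slope(source, iterations, lr, cache):
    """Fit y ~ w * x by gradient descent on mean squared error, one full pass per iteration."""
    points = source.load() if cache else None
    w = 0.0
    for _ in range(iterations):
        data = points if cache else source.load()
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # lies exactly on y = 2x
cached_src = CountingSource(points)
uncached_src = CountingSource(points)
w_cached = fit_slope(cached_src, iterations=50, lr=0.05, cache=True)
w_uncached = fit_slope(uncached_src, iterations=50, lr=0.05, cache=False)
```

Both runs converge to the same slope, but the uncached run loads the data 50 times instead of once.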
Disaster recovery
The two take very different approaches to disaster recovery, but both work well. Because Hadoop writes the data to disk after every processing step, it is inherently resilient to system failures.
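The benefit of writing every step's output to disk is that a failed job can resume from the last persisted result instead of starting over. In this hypothetical sketch a plain dict stands in for HDFS, `SimulatedCrash` imitates a worker dying, and `run_persisted` is an illustrative name rather than any Hadoop API.

```python
class SimulatedCrash(Exception):
    """Raised to imitate a worker dying partway through a job."""


def run_persisted(data, steps, disk, crash_at=None):
    """Run `steps` in order, persisting each step's output to `disk` before moving on.

    `disk` is a dict standing in for HDFS: step index -> that step's output.
    On a rerun, execution resumes right after the last step found on disk.
    """
    start, current = 0, data
    for i in reversed(range(len(steps))):
        if i in disk:
            start, current = i + 1, disk[i]
            break
    executed = []
    for i in range(start, len(steps)):
        if i == crash_at:
            raise SimulatedCrash(f"worker died during step {i}")
        current = steps[i](current)
        disk[i] = current  # written to "disk" before the next step starts
        executed.append(i)
    return current, executed


steps = [
    lambda xs: [x + 1 for x in xs],
    lambda xs: [x * x for x in xs],
    lambda xs: sum(xs),
]
disk = {}
try:
    run_persisted([1, 2, 3], steps, disk, crash_at=2)  # first attempt fails in the last step
except SimulatedCrash:
    pass
result, executed = run_persisted([1, 2, 3], steps, disk)  # retry reuses the persisted output
```

On the retry only the step that crashed is executed again; the outputs of the first two steps are read back from the simulated disk.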
Spark stores its data objects in what are called resilient distributed datasets (RDDs), which are distributed across the data cluster. "These data objects can be stored either in memory or on disk, so RDDs can also provide full disaster recovery," Borne points out.
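Borne's remark focuses on where RDD data can live. In Spark, an RDD also records the chain of transformations that produced it (its lineage), so a lost in-memory partition can be rebuilt by recomputing it from the source data rather than by restoring a copy. The following is a hypothetical `ToyRDD` class, not Spark's API; it only models per-element `map` transformations over partitions of a durable source.

```python
class ToyRDD:
    """Minimal sketch of partitions plus lineage (hypothetical, not Spark's API)."""

    def __init__(self, source_partitions, transforms=()):
        self.source = source_partitions     # durable input, e.g. blocks on HDFS
        self.transforms = list(transforms)  # lineage: functions applied to each element
        self.cache = {}                     # partition index -> computed data kept in memory
        self.computations = 0               # how many times a partition was (re)computed

    def map(self, fn):
        """Return a new ToyRDD whose lineage is this one's plus `fn`."""
        return ToyRDD(self.source, self.transforms + [fn])

    def _compute(self, idx):
        """Rebuild one partition by replaying the lineage over its source data."""
        data = list(self.source[idx])
        for fn in self.transforms:
            data = [fn(x) for x in data]
        return data

    def partition(self, idx):
        if idx not in self.cache:
            self.cache[idx] = self._compute(idx)
            self.computations += 1
        return self.cache[idx]

    def collect(self):
        return [x for i in range(len(self.source)) for x in self.partition(i)]

    def lose_partition(self, idx):
        """Simulate a failure that drops one cached partition from memory."""
        self.cache.pop(idx, None)


base = ToyRDD([[1, 2], [3, 4], [5, 6]])
squares_plus_one = base.map(lambda x: x * x).map(lambda x: x + 1)
before = squares_plus_one.collect()
squares_plus_one.lose_partition(1)      # partition 1 vanishes from memory
after = squares_plus_one.collect()      # only partition 1 is recomputed from lineage
```

After the simulated loss, only the missing partition is recomputed, so four partition computations happen in total instead of six.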