POA_Talk_HadoopScheduler.ppt
CONFIDENTIAL

Scheduling in Hadoop
Vivek Ratan

Slide 2: Me
- Amazon: 5 months (Bangalore office)
- Principal in GPS (Global Payments Services)

Slide 3: The IAAU (Incredible Amazon Acronym Universe)
[Acronym cloud: COWFPSHERDEPSSEALSOMAASINGPSCROWPINTCAMCBAEC2SDEFLSPCI/DSS]

Slide 4
- A system that breaks your work into parallel pieces and distributes them across many machines
- Map-Reduce implementation + file system
- Runs on thousands of machines, processes petabytes of data
- At Yahoo: 25,000+ machines, 80+ PB of data

Slide 5: My talk: Scheduling in Hadoop
- Three drivers: business requirements, user interaction, and technical challenges
- Sometimes, figuring out what to build is as hard as building it
- Evolution through repeated iteration

Slide 6: Hadoop clusters
[Diagram labels: Hadoop cluster, File, Blocks]

Slide 7: Interacting with Hadoop
[Diagram labels: submit, Job, broken into, Tasks, Hadoop cluster]

Slide 8: The scheduling problem
- What task do you run on what machine, and in what order?
[Diagram labels: Hadoop cluster, Job, Job, Job]

Slide 9: In the early days
- Business: as many users as possible
- User interaction: simple
- Technical: 100s of nodes

Slide 10: The Original Hadoop scheduler
[Diagram labels: Hadoop cluster, "I'm free", Data locality, Job, Scheduler]

Slide 11: Woo Hoo!
- Easy!

Slide 12: But
- Fairness
- Priorities
- Isolation
- Run-time determinism (SLA)

Slide 13: Time to iterate!

Slide 14: Enter, Hadoop On Demand (HOD)
[Diagram labels: Hadoop cluster, Job, HOD, #machines]

Slide 15: Woo Hoo!
- Isolation
- Some determinism
- Reuse

Slide 16: But
- Specifying # of nodes for a job
- No data locality
- Bad utilization
- Hard to understand

Slide 17: Time to iterate!

Slide 18: Hmmm

Slide 19: Lessons learned so far
- Task vs. job based scheduling
- Determinism
- Fairness

Slide 20: What do we build next?
- Business: SLAs; parts of clusters funded by different groups
- Users: easy-to-understand, fair; job priorities
- Technical: support 3-4K machines

Slide 21: What do we build next?
- What do we name it?
- Yahoo scheduler?
- Treebeard?

Slide 22: Enter, the Capacity Scheduler

Slide 23: Queues
- Retail queue
- AWS queue
- Misc queue
[Diagram labels: Job, Retail, Hadoop cluster, Capacity Scheduler]

Slide 24: Capacities
- Cluster capacity (in slots) = maximum number of tasks that can run in parallel
- N machines, 4 tasks/machine: capacity = 4*N slots

Slide 25: Capacities
Hadoop cluster (100 slots):

  Queue    Capacity (slots)
  Retail   50
  AWS      30
  Misc     20

Slide 26: Fairness: user limits
- Each queue has a user limit: maximum % of the queue's capacity available to a single user
- Retail: user limit = 33%

Slide 27: Putting it all together
[Diagram labels: Hadoop cluster (100 slots), "I'm free", Retail, AWS, Misc, Capacity Scheduler]

Slide 28: Step 1: Picking a queue

  Queue    Capacity   Running tasks   (#running)/capacity
  Retail   50         30              30/50 = 0.6
  AWS      30         28              28/30 = 0.93
  Misc     20         15              15/20 = 0.75

Slide 29: Putting it all together
[Diagram labels: Hadoop cluster (100 slots), "I'm free", Retail, AWS, Misc]

Slide 30: Step 2: Picking a job
Retail queue:

  Job   User   Priority
  1     A      High
  2     A      Med
  3     B      Med
  4     C      Low

Slide 31: Putting it all together
[Diagram labels: Hadoop cluster (100 slots), "I'm free", Retail, AWS, Misc]

Slide 32: Step 3: Picking a task
- Data locality

Why thi
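The three steps on the "Putting it all together" slides (picking a queue, picking a job, picking a task) can be sketched roughly as below. This is a minimal illustration, not the Capacity Scheduler's actual code. The slides give only the queue capacities, the running-task counts, the 33% user limit, and the job list for the Retail queue. Everything else here is an assumption made for the example: the class and function names, the rule that the queue with the lowest (#running)/capacity ratio wins, the rule that jobs are tried in priority order and skipped when their user is at the limit, the per-user running counts, and the machine names.

```python
from dataclasses import dataclass, field

# Assumed ordering of the priority labels shown on the slides.
PRIORITY_RANK = {"High": 0, "Med": 1, "Low": 2}


@dataclass
class Job:
    job_id: int
    user: str
    priority: str
    # Hypothetical: pending task name -> machines that hold its input data.
    pending: dict = field(default_factory=dict)


@dataclass
class Queue:
    name: str
    capacity: int            # slots assigned to the queue
    running: int             # tasks currently running
    user_limit: float = 1.0  # max fraction of capacity for a single user
    jobs: list = field(default_factory=list)
    running_by_user: dict = field(default_factory=dict)


def cluster_capacity(machines, tasks_per_machine=4):
    """Capacity in slots: N machines, 4 tasks/machine -> 4*N slots."""
    return machines * tasks_per_machine


def pick_queue(queues):
    """Step 1 (assumed rule): lowest (#running)/capacity ratio wins."""
    return min(queues, key=lambda q: q.running / q.capacity)


def pick_job(queue):
    """Step 2 (assumed rule): highest priority first, then lowest job id,
    skipping jobs whose user is already at the queue's user limit."""
    ordered = sorted(queue.jobs, key=lambda j: (PRIORITY_RANK[j.priority], j.job_id))
    for job in ordered:
        used = queue.running_by_user.get(job.user, 0)
        if job.pending and used < queue.user_limit * queue.capacity:
            return job
    return None


def pick_task(job, machine):
    """Step 3: prefer a task whose data is on the free machine."""
    for task, hosts in job.pending.items():
        if machine in hosts:
            return task
    return next(iter(job.pending))  # no local task: fall back to any task


# Numbers from the slides, plus hypothetical per-user counts and task placement.
retail = Queue(
    "Retail", capacity=50, running=30, user_limit=0.33,
    jobs=[
        Job(1, "A", "High", {"t1": {"m1"}}),
        Job(2, "A", "Med", {"t1": {"m3"}}),
        Job(3, "B", "Med", {"t1": {"m2"}, "t2": {"m7"}}),
        Job(4, "C", "Low", {"t1": {"m5"}}),
    ],
    running_by_user={"A": 17, "B": 8, "C": 5},  # hypothetical split of the 30
)
aws = Queue("AWS", capacity=30, running=28)
misc = Queue("Misc", capacity=20, running=15)
queues = [retail, aws, misc]

queue = pick_queue(queues)     # Retail: 0.6 is below 0.75 and 0.93
job = pick_job(queue)          # Job 3: user A already exceeds 33% of 50 slots
task = pick_task(job, "m7")    # "t2": its data is on the free machine m7
print(queue.name, job.job_id, task)
```

With the hypothetical per-user counts above, user A holds 17 of Retail's 50 slots, which is over the 33% limit (16.5 slots), so both of A's jobs are passed over even though Job 1 has the highest priority. This is one plausible reading of how the user limit and job priorities interact; the slides do not spell out the exact rule.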