An Introduction to Parallel Programming: Chapter 2 slides (PPT)
Chapter 2: Parallel Hardware and Parallel Software
An Introduction to Parallel Programming
Peter Pacheco
Copyright 2010, Elsevier Inc. All rights reserved.

Roadmap
- Some background
- Modifications to the von Neumann model
- Parallel hardware
- Parallel software
- Input and output
- Performance
- Parallel program design
- Writing and running parallel programs
- Assumptions

SOME BACKGROUND

Serial hardware and software
[Figure: input, output, programs]
- The computer runs one program at a time.

The von Neumann architecture
[Figure 2.1]

Main memory
- This is a collection of locations, each of which is capable of storing both instructions and data.
- Every location consists of an address, which is used to access the location, and the contents of the location.

Central processing unit (CPU)
- Divided into two parts.
- Control unit: responsible for deciding which instruction in a program should be executed (the boss).
- Arithmetic and logic unit (ALU): responsible for executing the actual instructions (the worker).
[Figure: add 2+2]

Key terms
- Register: very fast storage, part of the CPU.
- Program counter: stores the address of the next instruction to be executed.
- Bus: wires and hardware that connect the CPU and memory.
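The terms above can be tied together with a toy fetch-and-execute loop. This is only a sketch: the four-opcode instruction set, the two-location instruction layout, and the name `run_toy_machine` are hypothetical and do not come from the slides. It shows one memory array holding both instructions and data, a program counter selecting the next instruction, and a single register doing the arithmetic.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical instruction set, invented for this illustration. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

/* Runs the program stored in memory[], starting at address 0.
   Each instruction occupies two locations: an opcode and an address.
   Returns the number of instructions executed, or -1 on a bad opcode. */
int run_toy_machine(int memory[], int size) {
    int pc = 0;        /* program counter: address of the next instruction */
    int reg = 0;       /* the single register of this toy CPU */
    int executed = 0;

    while (pc + 1 < size) {
        /* Control unit: fetch the instruction over the "bus". */
        int opcode = memory[pc];
        int addr = memory[pc + 1];
        pc += 2;
        executed++;

        if (opcode == OP_HALT)
            break;
        assert(addr >= 0 && addr < size);

        /* ALU and register: carry out the instruction. */
        switch (opcode) {
        case OP_LOAD:  reg = memory[addr];  break;  /* fetch/read   */
        case OP_ADD:   reg += memory[addr]; break;
        case OP_STORE: memory[addr] = reg;  break;  /* write/store  */
        default:       return -1;
        }
    }
    return executed;
}
```

For example, with memory initialised to `{OP_LOAD, 8, OP_ADD, 9, OP_STORE, 10, OP_HALT, 0, 2, 2, 0}`, the call `run_toy_machine(memory, 11)` adds the two data words at addresses 8 and 9 and stores the sum at address 10. Every operand access goes through `memory[]`, which is the traffic that the von Neumann bottleneck refers to.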
[Figure: memory and CPU, fetch/read]
[Figure: memory and CPU, write/store]

von Neumann bottleneck

An operating system "process"
- An instance of a computer program that is being executed.
- Components of a process:
  - The executable machine language program.
  - A block of memory.
  - Descriptors of resources the OS has allocated to the process.
  - Security information.
  - Information about the state of the process.

Multitasking
- Gives the illusion that a single-processor system is running multiple programs simultaneously.
- Each process takes turns running (a time slice).
- After its time slice is up, a process waits until it gets a turn again (it blocks).

Threading
- Threads are contained within processes.
- They allow programmers to divide their programs into (more or less) independent tasks.
- The hope is that when one thread blocks because it is waiting on a resource, another will have work to do and can run.

A process and two threads
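A minimal sketch of one process running two threads, assuming POSIX threads (the slides do not name a threading API at this point). The names `task_t`, `sum_range`, and `two_thread_sum` are hypothetical. The calling thread starts two threads with `pthread_create`, each works on its own half of the range, and the caller waits for both with `pthread_join`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Work handed to one thread: sum the integers in [lo, hi). */
typedef struct {
    long lo, hi, sum;
} task_t;

/* Thread start routine: each thread writes only to its own task_t,
   so the two tasks are independent and need no locking. */
static void *sum_range(void *arg) {
    task_t *t = (task_t *)arg;
    t->sum = 0;
    for (long i = t->lo; i < t->hi; i++)
        t->sum += i;
    return NULL;
}

/* Sums 0..n-1 using two threads started by the calling thread. */
long two_thread_sum(long n) {
    task_t tasks[2] = { {0, n / 2, 0}, {n / 2, n, 0} };
    pthread_t threads[2];

    for (int i = 0; i < 2; i++) {           /* start both threads */
        int rc = pthread_create(&threads[i], NULL, sum_range, &tasks[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)             /* wait for both to finish */
        pthread_join(threads[i], NULL);

    return tasks[0].sum + tasks[1].sum;
}
```

On most systems this is built with the compiler's `-pthread` option. Calling `two_thread_sum(1000)` gives the same total as a serial loop over 0..999; whether the two threads actually run at the same time depends on the hardware and the scheduler.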
[Figure 2.2: the "master" thread; starting a thread is called forking; terminating a thread is called joining]

MODIFICATIONS TO THE VON NEUMANN MODEL

Basics of caching
- A cache is a collection of memory locations that can be accessed in less time than some other memory locations.
- A CPU cache is typically located on the same chip as the CPU, or on one that can be accessed much faster than ordinary memory.

Principle of locality
- Accessing one location is followed by an access of a nearby location.
- Spatial locality: accessing a nearby location.
- Temporal locality: accessing the same location again in the near future.

Principle of locality (example)

    float z[1000];
    sum = 0.0;
    for (i = 0; i < 1000; i++)
        sum += z[i];   /* z[i] and z[i+1] are neighbours in memory; sum and i are reused every iteration */

Levels of cache
- L1, L2, L3: L1 is the smallest and fastest, L3 is the largest and slowest.

Cache hit
[Figure: fetch x; caches L1, L2, L3 hold "x sum", "y z total", "A radius r1 center"]

Cache miss
[Figure: fetch x; caches L1, L2, L3 hold "y sum", "r1 z total", "A radius center"; x is in main memory]

Issues with cache
- When a CPU writes data to cache, the value in cache may be inconsistent with the value in main memory.
- Write-through caches handle this by updating the data in main memory at the time it is written to cache.
- Write-back caches mark data in the cache as dirty. When the cache line is replaced by a new cache line from memory, the dirty line is written to memory.

Cache mappings
- Fully associative: a new line can be placed at any location in the cache.
- Direct mapped: each cache line has a unique location in the cache to which it will be assigned.
- n-way set associative: each cache line can be placed in one of n different locations in the cache.

n-way set associative
- When more than one line in memory can be mapped to several different locations in cache, we also need to be able to decide which line should be replaced or evicted.
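The three mappings can be sketched as index arithmetic. The slides do not state a placement rule, so the modulo rule and the grouping of each set into consecutive slots below are assumptions (a common textbook convention), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Direct mapped: the one cache slot a memory line can occupy,
   assuming modulo placement. */
int direct_mapped_slot(int mem_line, int cache_lines) {
    assert(cache_lines > 0);
    return mem_line % cache_lines;
}

/* n-way set associative: first slot of the set a memory line maps to.
   The line may be placed in any of the `ways` consecutive slots that
   start there. With ways == cache_lines this is fully associative
   (one set covering the whole cache); with ways == 1 it is direct mapped. */
int set_first_slot(int mem_line, int cache_lines, int ways) {
    assert(ways > 0 && cache_lines % ways == 0);
    int num_sets = cache_lines / ways;
    return (mem_line % num_sets) * ways;
}

/* Prints the candidate slots for every memory line. */
void print_mapping(int mem_lines, int cache_lines, int ways) {
    for (int m = 0; m < mem_lines; m++) {
        int first = set_first_slot(m, cache_lines, ways);
        printf("memory line %2d -> cache slots %d..%d\n",
               m, first, first + ways - 1);
    }
}
```

For a 16-line memory and a 4-line cache, `print_mapping(16, 4, 2)` lists two candidate slots per memory line. Whenever a line has more than one candidate slot, a replacement policy still has to choose which resident line to evict; that choice is not modelled here.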
Example
[Table 2.1: Assignments of a 16-line main memory to a 4-line cache]

Caches and programs

Virtual memory (1)
- If we run a very large program or a program that accesses very large data sets, all of the instructions and data may not fit into main memory.
- Virtual memory functions as a cache for secondary storage.

Virtual memory (2)
- It exploits the principles of spatial and temporal locality.
- It keeps only the active parts of running programs in main memory.

Virtual memory (3)
- Swap space: those parts that are idle are kept in a block of secondary storage.
- Pages: blocks of data and instructions.
  - Usually these are relatively large.
  - Most systems have a fixed page size that currently ranges from 4 to 16 kilobytes.

Virtual memory (4)
[Figure: program A, program B, program C, main memory]

Virtual page numbers
- When a program is compiled, its pages are assigned virtual page numbers.
- When the program is run, a table is created that maps the virtual page numbers to physical addresses.
- A page table is used to translate the virtual address into a physical address.

Page table
[Table 2.2: Virtual Address Divided into Virtual Page Number and Byte Offset]

Translation-lookaside buffer (TLB)
- Using a page table has the potential to significantly increase each program's overall run-time.
- The TLB is a special address translation cache in the processor.

Translation-lookaside buffer (2)
- It caches a small number of entries (typically 16 to 512) from the page table in very fast memory.
- Page fault: attempting to access a valid physical address for a page in the page table when the page is only stored on disk.

Instruction Level Parallelism (ILP)
- Attempts to improve processor performance by having multiple processor components or functional units simultaneously executing instructions.

Instruction Level Parallelism (2)
- Pipelining: functional units are arranged in stages.
- Multiple issue: multiple instructions can be simultaneously initiated.

Pipelining

Pipelining example (1)
- Add the floating point numbers 9.87×10^4 and 6.54×10^3.

Pipelining example (2)
- Assume each op
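The addition in Pipelining example (1) can be broken into the kind of stages a pipelined floating-point adder might use. The stage list below (compare exponents, shift, add, normalize, round) is a common decomposition and is an assumption here, since the slide text does not list the stages. The `dec_float` layout and the name `dec_add` are hypothetical, and the sketch handles only positive, normalized operands with three significant digits.

```c
#include <assert.h>
#include <stdio.h>

/* Decimal floating-point value sig x 10^exp. The significand is kept
   as an integer count of ten-thousandths, so 9.87 is stored as 98700.
   The two extra digits keep shifted-out digits until rounding. */
typedef struct {
    long sig;
    int exp;
} dec_float;

dec_float dec_add(dec_float a, dec_float b) {
    /* Operands must be normalized: 1.0000 <= significand < 10.0000. */
    assert(a.sig >= 10000 && a.sig < 100000);
    assert(b.sig >= 10000 && b.sig < 100000);

    /* Stages 1-2: compare exponents, shift the operand with the
       smaller exponent (digits shifted past the guard digits are lost). */
    while (a.exp < b.exp) { a.sig /= 10; a.exp++; }
    while (b.exp < a.exp) { b.sig /= 10; b.exp++; }

    /* Stage 3: add the aligned significands. */
    dec_float r = { a.sig + b.sig, a.exp };

    /* Stage 4: normalize back to one digit before the decimal point. */
    if (r.sig >= 100000) { r.sig /= 10; r.exp++; }

    /* Stage 5: round to three significant digits. */
    r.sig = (r.sig + 50) / 100 * 100;
    if (r.sig >= 100000) { r.sig /= 10; r.exp++; }  /* rounding carried */

    return r;
}
```

Calling `dec_add` with `{98700, 4}` and `{65400, 3}` returns `{10500, 5}`, that is 1.05×10^5. Because each stage uses only the output of the previous one, separate hardware units could work on different additions at the same time, which is the idea behind pipelining.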