SPRAB27B
August 2012

Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas Instruments semiconductor products and disclaimers thereto appears at the end of this document.

Application Report

Multicore Programming Guide

Multicore Programming and Applications/DSP Systems

Abstract

As application complexity continues to grow, we have reached a limit on increasing performance by merely scaling clock speed. To meet the ever-increasing processing demand, modern System-on-Chip solutions contain multiple processing cores. The dilemma is how to map applications to multicore devices. In this paper, we present a programming methodology for converting applications to run on multicore devices. We also describe the features of Texas Instruments DSPs that enable
efficient implementation, execution, synchronization, and analysis of multicore applications.

Contents

1 Introduction
2 Mapping an Application to a Multicore Processor
  2.1 Parallel Processing Models
  2.2 Identifying a Parallel Task Implementation
3 Inter-Processor Communication
  3.1 Data Movement
  3.2 Multicore Navigator Data Movement
  3.3 Notification and Synchronization
  3.4 Multicore Navigator Notification Methods
4 Data Transfer Engines
  4.1 Packet DMA
  4.2 EDMA
  4.3 Ethernet
  4.4 RapidIO
  4.5 Antenna Interface
  4.6 PCI Express
  4.7 HyperLink
5 Shared Resource Management
  5.1 Global Flags
  5.2 OS Semaphores
  5.3 Hardware Semaphores
  5.4 Direct Signaling
6 Memory Management
  6.1 CPU View of the Device
  6.2 Cache and Prefetch Considerations
  6.3 Shared Code Program Memory Placement
  6.4 Peripheral Drivers
  6.5 Data Memory Placement and Access
7 DSP Code and Data Images
  7.1 Single Image
  7.2 Multiple Images
  7.3 Multiple Images with Shared Code and Data
  7.4 Device Boot
  7.5 Multicore Application Deployment (MAD) Utilities
8 System Debug
  8.1 Debug and Tooling Categories
  8.2 Trace Logs
  8.3 System Trace
9 Summary
10 References

1 Introduction

For the past 50 years, Moore's law accurately predicted that the number of transistors on an integrated circuit would double every two years. To translate these transistors into equivalent levels of system performance, chip designers increased clock frequencies (requiring deeper instruction pipelines), increased instruction-level parallelism (requiring concurrent threads and branch prediction), increased memory performance (requiring larger caches), and increased power consumption (requiring active power management). Each of these four areas is hitting a wall that impedes further growth:

- Increased processing frequency is slowing due to diminishing improvements in clock rates and poor wire scaling as semiconductor devices shrink.
- Instruction-level parallelism is limited by the inherent lack of parallelism in the applications.
- Memory performance is limited by the increasing gap between processor and memory speeds.
- Power consumption scales with clock frequency; so, at some point, extraordinary means are needed to cool the device.

Using multiple processor cores on a single chip allows designers to meet performance goals without using the maximum operating frequency. They can select a frequency in the sweet spot of a process technology that results in lower power consumption. Overall performance is achieved with cores having
simplified pipeline architectures relative to an equivalent single-core solution. Multiple instances of the core in the device result in dramatic increases in MIPS-per-watt performance.

2 Mapping an Application to a Multicore Processor

Until recently, advances in computing hardware provided significant increases in the execution speed of software with little effort from software developers. The introduction of multicore processors presents a new challenge for software developers, who must now master the programming techniques necessary to fully exploit multicore processing potential. Task parallelism is the concurrent execution of independent tasks in software. On a single-core processor, separate tasks must share the same processor. On a multicore processor, tasks run essentially independently of one another, resulting in more efficient execution.

2.1 Parallel Processing Models

One of the first steps in mapping an application to a multicore processor is to identify the task parallelism and select the processing model that fits best. The two dominant models are the Master/Slave model, in which one core controls the work assignments on all cores, and the Data Flow model, in which work flows through processing stages as in a pipeline.

2.1.1 Master/Slave Model

The Master/Slave model represents centralized control with distributed execution. A master core is responsible for scheduling various threads of execution
that can be allocated to any available core for processing. It must also deliver any data required by the thread to the slave core. Applications that fit this model inherently consist of many small independent threads that fit easily within the processing resources of a single core. This software often contains a significant amount of control code and often accesses memory in random order with multiple levels of indirection. There is relatively little computation per memory access, and the code base is usually very large. Applications that fit the Master/Slave model often run on a high-level OS such as Linux and potentially already have multiple threads of execution defined. In this scenario, the high-level OS is the master in charge of the scheduling.

The challenge for applications using this model is real-time load balancing, because thread activation can be random. Individual threads of execution can have very different throughput requirements. The master must maintain a list of cores with free resources and be able to optimize the balance of work across the cores so that optimal parallelism is achieved. An example of a Master/Slave task allocation model is shown in Figure 1.

Figure 1. Master/Slave Processing Model (a Task Master dispatches Task A; Task B; Tasks C, D, E; and Tasks F, G to slave cores)

One application that lends itself to the Master/Slave model is the multi-user data link layer of a communication protocol stack. It is responsible for media access control and logical link control of a physical layer, including complex, dynamic scheduling and data movement through transport channels. The software often accesses multi-dimensional arrays, resulting in very disjointed memory access.

One or more execution threads are mapped to each core.
Task assignment is achieved using message passing between cores. The messages provide the control triggers to begin execution and pointers to the required data. Each core has at least one task whose job is to receive messages containing job assignments. The task is suspended until a message arrives, triggering the thread of execution.

2.1.2 Data Flow Model

The Data Flow model represents distributed control and execution. Each core processes a block of data using various algorithms, and then the data is passed to another core for further processing. The initial core is often connected to an input interface.