东北大学实时系统.ppt
Real-Time SystemsNan Guan Nan Guan(关楠)(关楠)(关楠)(关楠)Computing EvolutionMainframe computing(60s-70s)Large computers to execute big data processing applicationsDesktop computing&Internet(80s-90s)One computer at every desk to do business/personal activitiesEmbedded Systems(since 00s)Numerous computing devices in every place/person“Invisible”part of the environmentMillions for desktops vs.billions for embedded processorsEmbedded Systems“Computer systems that are embedded into physical objectes ”98%computers in the world are embedded systemsMarket share3Embedded SystemsReal-Time SystemsReal-Time Systems:“The correctness of the computation does not only depend on the logical results,but also the time when the results are produced.”Real-Time Systems are common in Embedded SystemsInteraction between computers and the physical worldReal-Time SystemsReal-Time SystemsReal-Time SystemsReal-Time SystemsReal-Time SystemsWe need to guarantee that all timing constraints are satisfied under any circumstance,BEFORE run-timeHow to do this?How about using a very fast processor?How about executing it and measuring the time?(Testing)11ThestoryaboutasheepinScotland12ThestoryaboutthePathfinderonMarsWhat do we learn?Testing is inadequate13What do we learn?Testing is inadequate,especially for real-time embedded systemsNeed to simulate the stimulus and feedback from the physical world14EventheNASALabfails,thenhowaboutyourproject?What do we learn?Difficult to fix the error,even if it is exposed in testingDifficult to reproduceDifficult to explain15Timing AnalysisHow to rigorously verify the timing correctness?An extremely difficult problem 16Real-Time Computing TechnologySome leading countries USAGermanySwedenFranceItaly 17Research in Real-Time SystemsConferencesRTSS:IEEE Real-Time Systems SymposiumRTAS:IEEE Real-Time and Technology and Applications SymposiumECRTS:Euromicro Conference on Real-time SystemsJournal:Journal of Real-Time Systems18More details about Real-Time Systems in the following19Hard vs Soft Real-Time Systems Definition based on functional criticalityHARD:The failure to meet the timing constraints has disastrous consequencesBraking system,collision detection,etc.SOFT:A few misses of deadlines do no serious harm,and only the systems overall performance degradesVideo playback,etc.Problem:how serious is serious?-A design propertyHard vs Soft Real-Time Systems Definition based on validationValidation:A demonstration by a provably correct,efficient procedure or by exhaustive simulation and testingHARD:The user require the validation that the system always meets the timing constraintsSOFT:No validation is required,or only a demonstration that the job meet some statistical constraint sufficesDesirable Properties of RT Systems(1)Predictabilitythe system behavior is known before it is put into operation e.g.Response times,deadlock freedom,etc.Analyzabilityeasy to analyze if the system can meet all the deadlines Cost optimalitye.g.Energy consumption,memory blocks,etc.Desirable Properties of RT Systems(2)Maintainabilitymodular structure to ease system modification Robustnessmust not collapse when subject to peak load,exception,manage all possible scenarios Fault tolerancehardware and software failures should not cause the system to crash Why Achieving Predictability is Hard?Challenges from hardware featuresCache sharing,processor pipelines,multicores,DMA.Interrupt handling may introduce unbounded delays Resource sharing:priority inversionMemory management(static allocation may not be enough,dynamic data structures e.g.Queue),no virtual memory Communication delays in a distributed environment Why Achieving Predictability is Hard?Challenges from software featuresDifficult to calculate the worst case execution time for tasks(theoretically impossible,halting problem)Bounded loops e.g.For-loops only Complex synchronization patterns between tasks:potential deadlocks(formal verification)Multi-tasking,tasks that share resources Misconceptions about RT ComputingReal-time computing is fast computingReal-time system research is performance engineeringReal-time is time sharingOverall Structure of a RT SystemHardware(CPU,I/O device etc)a clock!A real time OS(function as standard OS,with predictable behavior and well-defined functionality)A collection of RT tasks/processes(share recourses,communicate/synchronizeA Real-Time Control System2023/2/2628A Real-Time Control SystemSelection of the sampling periodThe digital world is discrete!The responsiveness of the overall systemThe dynamic behavior of the plant:keep oscillation small and the system under controlFurther:Multirate SystemsThe example of fly-by-wire avionicsTiming constraintsFinishcontrollawcomputationforeachsmallestsamplingperiods!Further:Multirate SystemsFor exampleThe sampling rates of A,B,and C are 180Hz,90Hz,30Hz resp.The sampling periods are“harmonic”Dothefollowingineach1/180secondcycle:DoADoBonceevery2cyclesDoConceevery6cyclesOutputcommandsWaituntilthebeginningofthenextcycleFurthermore:A Car ControllerActivities of a car control system.Let C=worst case execution time T=(sampling)period D=deadlineControl requirementsConstruct a controller meeting all the deadlines!ObjectsCTDSpeed measurement4205ABS control104040Fuel injection408080Others,e.g.audio,air conditonningSoft deadlinesThe Overall SystemTiming AnalysisSoftware system is complexTiming AnalysisUsually performed bottom-upProgram Level Analysisto bound the worst-case execution time(WCET)of each individual task assuming fully dedicated hardwareSystem Level Analysisthe WCET information of individual tasks is gathered,to investigate the contention of processing capacity among different tasks in a multitasking environmentProgram Level36Execution Time of a ProgramExecution times of programs are intrinsically variable!Void signal_processing()read_signal(&curr_signal);if(curr_signal threshold)signal_transformation();/some+-*/ops.elseerror_handling_routine();/complex error handling operationsThe Distribution of Execution TimesDistributionBCETPossible Execution TimeWCETPredicted Upper BoundObservedBCETObserved WCETObserved Execution TimeExecution TimeWorst-Case GuaranteeThe Quality of Static AnalysisSafetyThe estimated upper bound should always enclose the WCETTightness(Precision)The estimated upper bound should be as close as possible to the actual WCETMain objective:to minimize system over-provisioningComplexityThere is always a trade-off between analysis efficiency and precisionSystem Level40Real-Time SchedulingA Simple Task ModelNon periodic/Aperiodic Tasks Appear once only 3 parameters:A:arriving time C:computing time(resource requirement)D:deadline(relative deadline)42Constraints on TasksTiming constraints:deadline for each task,Relative to arriving time Absolute deadline Precedence constraints Precedence graphs Input/output relation Resource constraints Mutual exclusion Resource access protocols 43Scheduling Problems Given a set of tasks(ready queue)Check if all deadlines can be met with a given schedule(schedulability check)Construct a”feasible”schedule to meet all deadlines Construct an optimal schedule e.g.minimizing response times 44Tasks with the Same Arrival Time Assume a list of tasks(A,C1,D1)(A,C2,D2).(A,Cn,Dn)that arrive at the same time i.e.A How to find a feasible schedule?(there may be many feasible schedules)EDFEDF(Earliest-Deadline-First)order tasks with non-decreasing deadlines.Jackson 1955 Example:(0,1,10)(0,2,3)(0,3,5)Schedule:(0,2,3)(0,3,5)(0,1,10)46EDF:Schedulability Test Order tasks in increasing order of deadlines If C1+.+Ck 6!48EDF:Optimality Assume that Ri is the finishing time of task i,i.e.response time.Let Li=Ri-Di(the lateness for task i)FACT1:EDF is optimal,minimizing the maximum lateness MAXi(Li)FACT2:EDF is optimal If EDF cannt find a feasible schedule for a task set,then no other algorithm can,which means that the task set is non schedulable.Note that even a task set is non schedulable,EDF may minimize the maximal lateness(minimizes e.g.the loss for soft tasks)49Tasks with Different Arrival TimesAssume a list of tasks S=(A1,C1,D1)(A2,C2,D2).(An,Cn,Dn)Preemptive EDF Horn 1974:Whenever new tasks arrive,sort the ready queue according to earlist deadlines Run the first task of the queue 50Preemptive EDF:Schedulability Test At any time,order the tasks according to EDF(A1,C1,D1).(Ai,Ci,Di)If C1+.+Ck=Dk for all k=1,2.i,then the task set is schedulable at the moment If S is schedulable at all time points at which tasks arrive,S is schedulable Why?51Preemptive EDF:ExampleConsider(1,5,11)(2,1,3)(3,4,8)Deadlines are relative to arrival times At 1,(5,11)At 2,(1,3)(4,10)At 3,(4,8)(4,9)52Preemptive EDF:Optimality Assume that Ri is the finishing time/response time of task i.Let Li=Ri-Di(the lateness for task i)FACT1:preemptive EDF is optimal in minimizing the maximum lateness Lmax=MAXi(Li)FACT2:Preemptive EDF is optimal Dertouzos 1974 in finding feasible schedules.53Non Preemptive EDF Run a task until its finished and then sort the queue according to EDF Easy to implement,less overhead(no more context switch than necessary)However it is not optimal,it may not find the feasible schedule even it exists.54Non-preemptive EDF:Example 55“Work-conserving”Non-preemptive EDF If we only consider non-idle algorithms(CPU waiting only if no tasks to run),is EDF is optimal?Unfortunately no!Example T1=(0,10,100)T2=(0,1,101)T3=(1,4,4)Run T1,T3,T2:the 3rd task will miss its deadline Run T2,T3,T1:it is a feasible schedule 56Optimal Non-preemptive Schedule?To construct“optimal”non-preemptive Schedule:“complete search,NP-hard”is neededThe decision should be made according to all the parameters in the whole list of tasks The worst case is to test all possible combinations of n tasks(NP-hard,difficult for large n)57Practical MethodsBacktrack whenever neededSearch until a non-schedulable situation occur,then backtrack Bratleys algorithm Simple and easy to implement but may not find a schedule if n is too big(worst case)58Example(Bratleys Algorithm)59Better Heuristic Methods?Discussion(assignment):Design a new heuristic method to construct good non-preemptive algorithms.60Outline of This CourseProgram Level Timing AnalysisWorst-case Execution Time(WCET)AnalysisHow to deal with the software complexity?How to deal with the hardware complexity?System Level Timing AnalysisReal-Time Scheduling TheoryOtherFormal verification,Real-Time operating systems61WhatHappenedonMars?62TheMarsPathfindermissionwaswidelyproclaimedasflawlessintheearlydaysafteritsJuly4th,1997landingontheMartiansurface.Successesincludeditsunconventionallanding-bouncingontotheMartiansurfacesurroundedbyairbags,deployingtheSojournerrover,andgatheringandtransmittingvoluminousdatabacktoEarth,includingthepanoramicpicturesthatweresuchahitontheWeb.Butafewdaysintothemission,notlongafterPathfinderstartedgatheringmeteorologicaldata,thespacecraftbeganexperiencingtotalsystemresets,eachresultinginlossesofdata.Thepressreportedthesefailuresintermssuchassoftwareglitchesandthecomputerwastryingtodotoomanythingsatonce.LastnightattheIEEEReal-TimeSystemsSymposiumIheardafascinatingkeynoteaddressbyDavidWilner,ChiefTechnicalOfficerofWindRiverSystems.WindRivermakesVxWorks,thereal-timeembeddedsystemskernelthatwasusedintheMarsPathfindermission.Inhistalk,heexplainedindetailtheactualsoftwareproblemsthatcausedthetotalsystemresetsofthePathfinderspacecraft,howtheywerediagnosed,andhowtheyweresolved.Iwantedtosharehisstorywitheachofyou.VxWorksprovidespreemptivepriorityschedulingofthreads.TasksonthePathfinderspacecraftwereexecutedasthreadswithprioritiesthatwereassignedintheusualmannerreflectingtherelativeurgencyofthesetasks.Pathfindercontainedaninformationbus,whichyoucanthinkofasasharedmemoryareausedforpassinginformationbetweendifferentcomponentsofthespacecraft.Abusmanagementtaskranfrequentlywithhighprioritytomovecertainkindsofdatainandoutoftheinformationbus.Accesstothebuswassynchronizedwithmutualexclusionlocks(mutexes).Themeteorologicaldatagatheringtaskranasaninfrequent,lowprioritythread,andusedtheinformationbustopublishitsdata.Whenpublishingitsdata,itwouldacquireamutex,dowritestothebus,andreleasethemutex.Ifaninterruptcausedtheinformationbusthreadtobescheduledwhilethismutexwasheld,andiftheinformationbusthreadthenattemptedtoacquirethissamemutexinordertoretrievepublisheddata,thiswouldcauseittoblockonthemutex,waitinguntilthemeteorologicalthreadreleasedthemutexbeforeitcouldcontinue.Thespacecraftalsocontainedacommunicationstaskthatranwithmediumpriority.Mostofthetimethiscombinationworkedfine.However,veryinfrequentlyitwaspossibleforaninterrupttooccurthatcausedthe(mediumpriority)communicationstasktobescheduledduringtheshortintervalwhilethe(highpriority)informationbusthreadwasblockedwaitingforthe(lowpriority)meteorologicaldatathread.Inthiscase,thelong-runningcommunicationstask,havinghigherprioritythanthemeteorologicaltask,wouldpreventitfromrunning,consequentlypreventingtheblockedinformationbustaskfromrunning.Aftersometimehadpassed,awatchdogtimerwouldgooff,noticethatthedatabustaskhadnotbeenexecutedforsometime,concludethatsomethinghadgonedrasticallywrong,andinitiateatotalsystemreset.Thisscenarioisaclassiccaseofpriorityinversion.How was this debugged?VxWorkscanberuninamodewhereitrecordsatotaltraceofallinterestingsystemevents,includingcontextswitches,usesofsynchronizationobjects,andinterrupts.Afterthefailure,JPLengineersspenthoursandhoursrunningthesystemontheexactspacecraftreplicaintheirlabwithtracingturnedon,attemptingtoreplicatethepreciseconditionsunderwhichtheybelievedthattheresetoccurred.Earlyinthemorning,afterallbutoneengineerhadgonehome,theengineerfinallyreproducedasystemresetonthereplica.Analysisofthetracerevealedthepriorityinversion.How was this problem corrected?Whencreated,aVxWorksmutexobjectacceptsabooleanparameterthatindicateswhetherpriorityinheritanceshouldbeperformedbythemutex.Themutexinquestionhadbeeninitializedwiththeparameteroff;haditbeenon,thelow-prioritymeteorologicalthreadwouldhaveinheritedthepriorityofthehigh-prioritydatabusthreadblockedonitwhileitheldthemutex,causingitbescheduledwithhigherprioritythanthemedium-prioritycommunicationstask,thuspreventingthepriorityinversion.Oncediagnosed,itwascleartotheJPLengineersthatusingpriorityinheritancewouldpreventtheresetstheywereseeing.VxWorkscontainsaClanguageinterpreterintendedtoallowdeveloperstotypeinCexpressionsandfunctionstobeexecutedontheflyduringsystemdebugging.TheJPLengineersfortuitouslydecidedtolaunchthespacecraftwiththisfeaturestillenabled.Bycodingconvention,theinitializationparameterforthemutexinquestion(andthosefortwootherswhichcouldhavecausedthesameproblem)werestoredinglobalvariables,whoseaddresseswereinsymboltablesalsoincludedinthelaunchsoftware,andavailabletotheCinterpreter.AshortCprogramwasuploadedtothespacecraft,whichwheninterpreted,changedthevaluesofthesevariablesfromFALSEtoTRUE.Nomoresystemresetsoccurred.Analysis and LessonsFirstandforemost,diagnosingthisproblemasablackboxwouldhavebeenimpossible.Onlydetailedtracesofactualsystembehaviorenabledthefaul