书签分享收藏举报版权申诉 / 68

立即下载

当前位置：首页 > 生活休闲 > 生活常识 > 查询优化.ppt

查询优化.ppt

上传人：hyn****60

文档编号：82468282

上传时间：2023-03-25

格式：PPT

页数：68

大小：634KB

( 4.5 )

《查询优化.ppt》由会员分享，可在线阅读，更多相关《查询优化.ppt（68页珍藏版）》请在得力文库 - 分享文档赚钱的网站上搜索。

1、Silberschatz,Korth and Sudarshan14.1Database System Concepts 3rd Edition第第14章章:查询优化查询优化n概述n用于代价估计的统计信息n关系代数表达式的转换n基于代价的优化算法n物化视图与视图维护Silberschatz,Korth and Sudarshan14.2Database System Concepts 3rd Edition概述概述n一个给定查询有多种可选择的求值方法等价表达式一个操作有若干不同算法(Chapter 13)n一个查询求值方法的好坏带来的代价差别可能是巨大的例如:执行r X s 后续选择r.A=s

2、.B 比执行一个同样条件的连接慢得多n需要估计操作的代价非常依赖于数据库必须维护的关系的统计信息例如元组数,连接属性的不同值的数目,等等.需要对中间结果估计统计信息以便对复杂表达式计算代价Silberschatz,Korth and Sudarshan14.3Database System Concepts 3rd Edition概述概述(续续)两个等价表达式生成的关系具有相同的属性集合并包含相同的元组集合,尽管属性次序可能不同.Silberschatz,Korth and Sudarshan14.4Database System Concepts 3rd Edition概述概述(续续)n对一

3、个表达式的查询计值方案的生成涉及几个步骤:1.生成逻辑上等价的表达式利用等价规则等价规则将一个表达式转换成另一个等价的表达式.2.注解(Annotate)结果表达式以得到其他查询计划3.基于估计代价估计代价选择最廉价的计划n整个过程称为基于代价的优化基于代价的优化.Silberschatz,Korth and Sudarshan14.5Database System Concepts 3rd Edition用于代价估计的统计信息用于代价估计的统计信息nnr:关系r 中的元组个数.nbr:包含r 中元组的块数.nsr:r 中元组的大小.nfr:r 的块因子即,能放入一块之中的r 的元组数.nV

4、(A,r):r 中出现的A属性上的不同值的个数;等于A(r)的大小.nSC(A,r):关系r 中属性A的选择基数;满足A上等值条件的平均记录数.n若r 中元组在文件中物理排序,则:Silberschatz,Korth and Sudarshan14.6Database System Concepts 3rd Edition关于索引的目录信息关于索引的目录信息nfi:索引i 的内节点的平均扇出,适用于树结构索引如B+树.nHTi:索引i 的层数即高度.对于关系r上A 属性的平衡树索引(如B+-树),HTi=logfi(V(A,r).对于散列索引,HTi 为1.LBi:索引i 的底层索引块数即

5、索引叶子层的块数.Silberschatz,Korth and Sudarshan14.7Database System Concepts 3rd Edition查询代价的度量查询代价的度量n回顾:典型地,磁盘存取是决定性的代价,也相对容易估计.来自磁盘的块传送次数被用作求值的实际代价的度量.假设所有块传送都具有相同代价.实际的优化器不作此假设,并区分顺序的与随机的磁盘存取n我们不包括写输出到磁盘的代价.n记算法A的代价估计为EASilberschatz,Korth and Sudarshan14.8Database System Concepts 3rd Edition选择的大小估计选择的大

6、小估计n等值选择等值选择 A=v(r)SC(A,r):满足选择的记录数SC(A,r)/fr 这些记录将占用的块数例如二分搜索的代价估计为键属性上的等值条件:SC(A,r)=1Silberschatz,Korth and Sudarshan14.9Database System Concepts 3rd Edition统计信息例统计信息例nfaccount=20 (一块可放入account 的20个元组)nV(branch-name,account)=50 (50个分行)nV(balance,account)=500 (500 个不同的balance 值)naccount =10000 (acc

7、ount 有10,000条元组)n假设account 上存在下列索引:属性branch-name上的主B+-树索引属性balance上的次级B+-树索引Silberschatz,Korth and Sudarshan14.10Database System Concepts 3rd Edition涉及比较的选择涉及比较的选择n形如AV(r)的选择(A V(r)的情形是对称的)n令c 表示满足条件的估计元组数.若 min(A,r)和 max(A,r)在目录中可得c=0 if v 1000 (branch (account depositor)n利用连接结合性(Rule 6a)的转换:custom

8、er-name(branch-city=“Brooklyn”balance 1000 (branch (account)depositor)n第二种形式提供了应用“尽早执行选择”规则的机会,导致子表达式 branch-city=“Brooklyn”(branch)balance 1000(account)n因此一系列的转换是有用的Silberschatz,Korth and Sudarshan14.30Database System Concepts 3rd Edition多步转换多步转换(续续)Silberschatz,Korth and Sudarshan14.31Database Sys

9、tem Concepts 3rd Edition投影操作例投影操作例n当我们计算(branch-city=“Brooklyn”(branch)account)我们得到一个关系,其模式为:(branch-name,branch-city,assets,account-number,balance)n利用等价规则8a和8b下推投影;从中间结果删除不必要的属性可得:customer-name(account-number(branch-city=“Brooklyn”(branch)account)depositor)customer-name(branch-city=“Brooklyn”(branc

10、h)account)depositor)Silberschatz,Korth and Sudarshan14.32Database System Concepts 3rd Edition连接次序例连接次序例n对所有关系 r1,r2,及r3,(r1 r2)r3 =r1 (r2 r3)n若r2 r3 很大且r1 r2 较小,我们选择(r1 r2)r3 从而我们计算并存储一个较小的临时关系.Silberschatz,Korth and Sudarshan14.33Database System Concepts 3rd Edition连接次序例连接次序例(续续)n考虑表达式customer-name

11、 (branch-city=“Brooklyn”(branch)account depositor)n可以先计算account depositor,再将结果与下面连接 branch-city=“Brooklyn”(branch)但account depositor 可能是个大关系.n由于更可能仅仅少部分银行客户在位于的分行开帐户,因此更好的做法是先计算 branch-city=“Brooklyn”(branch)accountSilberschatz,Korth and Sudarshan14.34Database System Concepts 3rd Edition等价表达式的枚举等价表达

12、式的枚举n查询优化器利用等价规则来系统地生成等价于给定表达式的表达式n从概念上说,通过重复执行下列步骤直至找不到更多表达式来生成所有等价表达式:对每个当前找到的表达式,利用所有可用的等价规则,并增加新生成的表达式到当前找到的表达式集合中n上述方法的空间时间代价都很大n通过共享公共子表达式可减少空间需求:当利用一等价规则可从E2生成E1,通常两者仅有顶层不同,下面的子树是相同的,故可以共享例如当应用连接结合律时n通过不生成所有表达式可减少时间需求稍后详述Silberschatz,Korth and Sudarshan14.35Database System Concepts 3rd Editio

13、n计值计划计值计划n计值计划确切定义了对每个操作采用什么算法,以及各操作的执行是如何协调的.Silberschatz,Korth and Sudarshan14.36Database System Concepts 3rd Edition计值计划的选择计值计划的选择n选择计值计划时必须考虑计值技术的相互影响:独立地为每个操作都选择最廉价算法未必得到总体最佳算法.例如归并连接可能比散列连接代价高,但是可以提供有序输出,从而减少外层聚合操作的代价.嵌套循环连接可能提供流水线化的机会n实际的查询优化器结合了下列两大类方法的要素:1.搜索所有计划并以基于代价的方式选择最佳计划.2.利用启发式选择计划.

14、Silberschatz,Korth and Sudarshan14.37Database System Concepts 3rd Edition基于代价的优化基于代价的优化n考虑为r1 r2 .rn找到最佳连接次序.n上面的表达式共有(2(n 1)!/(n 1)!种不同的连接次序.当n=7,即有665280,当n=10,结果大于1760亿!n没有必要生成所有连接次序.利用动态规划,r1,r2,.rn的任何子集的最小代价连接次序只计算一次并保存为将来所用.Silberschatz,Korth and Sudarshan14.38Database System Concepts 3rd Edit

15、ion优化中的动态规划优化中的动态规划n要找到 n个关系的最佳连接树:为找到n个关系的集合S 的最佳方案,考虑所有可能的形如S1 (S S1)的方案,其中S1 是S 的任意非空子集.递归计算连接S 的子集的代价,以找出每个方案的代价.选择2n 1 种可能方案中代价最小者.当计算出任意子集的方案,保存该方案并在需要的时候重用之,而不是重新计算动态规划Silberschatz,Korth and Sudarshan14.39Database System Concepts 3rd Edition连接次序优化算法连接次序优化算法procedure findbestplan(S)if(bestplan

16、S.cost )return bestplanS/else bestplanS 早先尚未算出,现在计算之for each S的非空真子集的非空真子集S1P1=findbestplan(S1)P2=findbestplan(S-S1)A=连接P1和P2的结果的最佳算法cost=P1.cost+P2.cost+cost of Aif cost bestplanS.cost bestplanS.cost=costbestplanS.plan=“执行P1.plan;执行P2.plan;利用A连接P1和P2的结果”return bestplanSSilberschatz,Korth and Sudars

17、han14.40Database System Concepts 3rd Edition左深连接树左深连接树n左深连接树左深连接树中,每个连接的右手边输入是个关系,而不是中间连接的结果.Silberschatz,Korth and Sudarshan14.41Database System Concepts 3rd Edition优化的代价优化的代价n利用动态规划,优化的时间复杂度为O(3n).当n=10,此数为59000而不是1760亿!n空间复杂度为O(2n)n为找到n个关系的集合的最佳左深连接树:Consider n alternatives with one relation as r

18、ight-hand side input and the other relations as left-hand side input.Using(recursively computed and stored)least-cost join order for each alternative on left-hand-side,choose the cheapest of the n alternatives.nIf only left-deep trees are considered,time complexity of finding best join order is O(n

19、2n)Space complexity remains at O(2n)nCost-based optimization is expensive,but worthwhile for queries on large datasets(typical queries have small n,generally 10)Silberschatz,Korth and Sudarshan14.42Database System Concepts 3rd Edition基于代价优化中感兴趣的排序次序基于代价优化中感兴趣的排序次序n考虑表达式(r1 r2 r3)r4 r5n感兴趣的排序次序感兴趣的排序

20、次序是指元组的一个特定排序次序,对以后的操作可能有用.生成 r1 r2 r3 结果时在与r4 或r5的公共属性上排序,也许有用,但在仅与r1和r2公共的属性上排序就没用.利用归并连接计算 r1 r2 r3 可能代价更高,但可能提供按一个感兴趣的次序排序的输出.n对n个给定关系集合的每个子集找出最佳连接次序是不够的;必须对每个子集每个感兴趣排序次序找出最佳连接次序前面的动态规划算法做简单扩展通常,感兴趣次序的数目很小且不显著影响时间/空间复杂度Silberschatz,Korth and Sudarshan14.43Database System Concepts 3rd Edition启发式优

21、化启发式优化n即使用了动态规划,基于代价的优化还是很昂贵.n系统可以使用启发式来减少在基于代价方式中必须考虑的可选方案的数目.n启发式优化利用一套规则来转换查询树,这些规则通常(但不是所有情况)都能改善执行性能:尽早执行选择(减少元组数目)尽早执行投影(减少属性数目)在其他类似操作之前执行最具限制性的选择和连接操作.某些系统只使用启发式,其他的结合了启发式与部分基于代价的优化.Silberschatz,Korth and Sudarshan14.44Database System Concepts 3rd Edition典型的启发式优化步骤典型的启发式优化步骤1.分解合取选择成为一个单选择操作

22、序列(Equiv.rule 1.).2.将选择操作移到查询树下方以便尽早执行(Equiv.rules 2,7a,7b,11).3.首先执行能产生最小关系的选择和连接操作(Equiv.rule 6).4.笛卡儿积操作后接选择条件用连接操作替换(Equiv.rule 4a).5.将投影属性列表分解并尽可能移到查询树下方,必要时创建新投影(Equiv.rules 3,8a,8b,12).6.确认其操作可以流水线化的子树,并利用流水线执行之.Silberschatz,Korth and Sudarshan14.45Database System Concepts 3rd Edition查询优化器的结构

23、查询优化器的结构nSystem R/Starburst 优化器只考虑左深连接次序.这减少了优化复杂性并生成适应流水线计值的方案.System R/Starburst 还利用启发式来在查询树中下推选择和投影.nOracle的某些版本中使用了启发式优化:反复选出“最佳”下一步连接的关系从 n 个开始点的每一个开始.从中挑选最佳.n对于使用辅助索引的扫描,某些优化器考虑了包含元组的页在缓冲区中的概率.nSQL的复杂导致了查询优化的复杂如嵌套子查询Silberschatz,Korth and Sudarshan14.46Database System Concepts 3rd Edition查询优化器

24、的结构查询优化器的结构(续续)n某些查询优化器集成了启发式选择和替换存取方案的生成.System R 和Starburst 基于SQL嵌套块概念使用了层次式过程:启发式重写后接基于代价的连接次序优化.n即使使用了启发式,基于代价的查询优化仍然带来大量开销.n这个开销通常被查询执行时刻的节省更多地补偿,尤其是较慢的磁盘存取数目的减少.Silberschatz,Korth and Sudarshan14.47Database System Concepts 3rd EditionOptimizing Nested Subqueries*nSQL conceptually treats nested

25、 subqueries in the where clause as functions that take parameters and return a single value or set of valuesParameters are variables from outer level query that are used in the nested subquery;such variables are called correlation variablesnE.g.select customer-namefrom borrowerwhere exists(select*fr

26、om depositor where depositor.customer-name=borrower.customer-name)nConceptually,nested subquery is executed once for each tuple in the cross-product generated by the outer level from clauseSuch evaluation is called correlated evaluation Note:other conditions in where clause may be used to compute a

27、join(instead of a cross-product)before executing the nested subquerySilberschatz,Korth and Sudarshan14.48Database System Concepts 3rd EditionOptimizing Nested Subqueries(Cont.)nCorrelated evaluation may be quite inefficient since a large number of calls may be made to the nested query there may be u

28、nnecessary random I/O as a resultnSQL optimizers attempt to transform nested subqueries to joins where possible,enabling use of efficient join techniquesnE.g.:earlier nested query can be rewritten as select customer-namefrom borrower,depositorwhere depositor.customer-name=borrower.customer-nameNote:

29、above query doesnt correctly deal with duplicates,can be modified to do so as we will seenIn general,it is not possible/straightforward to move the entire nested subquery from clause into the outer level query from clauseA temporary relation is created instead,and used in body of outer level querySi

30、lberschatz,Korth and Sudarshan14.49Database System Concepts 3rd EditionOptimizing Nested Subqueries(Cont.)In general,SQL queries of the form below can be rewritten as shownnRewrite:select from L1 where P1 and exists(select*from L2 where P2)nTo:create table t1 as select distinct V from L2 where P21 s

31、elect from L1,t1 where P1 and P22P21 contains predicates in P2 that do not involve any correlation variablesP22 reintroduces predicates involving correlation variables,with relations renamed appropriatelyV contains all attributes used in predicates with correlation variablesSilberschatz,Korth and Su

32、darshan14.50Database System Concepts 3rd EditionOptimizing Nested Subqueries(Cont.)nIn our example,the original nested query would be transformed to create table t1 as select distinct customer-name from depositor select customer-name from borrower,t1 where t1.customer-name=borrower.customer-namenThe

33、 process of replacing a nested query by a query with a join(possibly with a temporary relation)is called decorrelation.nDecorrelation is more complicated when the nested subquery uses aggregation,or when the result of the nested subquery is used to test for equality,or when the condition linking the

34、 nested subquery to the other query is not exists,and so on.Silberschatz,Korth and Sudarshan14.51Database System Concepts 3rd Edition物化视图物化视图*n物化视图物化视图是计算并存储内容的视图.n考虑视图create view branch-total-loan(branch-name,total-loan)asselect branch-name,sum(amount)from loangroupby branch-namen如果频繁需要总贷款额,那么物化上述视

35、图将很有用节省了查找许多元组并累加其数额的工作Silberschatz,Korth and Sudarshan14.52Database System Concepts 3rd Edition物化视图维护物化视图维护n保持物化视图相对基础数据为最新的任务称为物化视图维护物化视图维护n物化视图可通过每次更新时重新计算得到维护n更好的选择时利用增量视图维护增量视图维护利用对数据库关系的改变来计算物化视图的改变,从而得到更新n视图维护可通过下列做到手工定义视图定义中每个关系上的插入,删除,修改触发器每当数据库关系被更新时利用手写代码更新视图数据库直接支持Silberschatz,Korth and

36、Sudarshan14.53Database System Concepts 3rd Edition增量视图维护增量视图维护n对一个关系或表达式的改变(插入和删除)称为其差异差异从r 中插入和删除的元组集合记为 ir 及 drn为简化我们的描述,我们只考虑插入和删除更新元组替换为先删除元组再插入更新后元组n我们描述当给定输入的改变时,如何计算每个关系操作结果上的改变n我们然后概述如何处理关系代数表达式Silberschatz,Korth and Sudarshan14.54Database System Concepts 3rd Edition连接操作连接操作n考虑物化视图v=r s 以及对r

37、的更新n令 rold 和 rnew 表示关系r 的新旧状态n考虑对r 的插入情况:我们可以将 rnew s 写为(rold ir)s并将上式重写为(rold s)(ir s)但(rold s)正是物化视图的旧值,故视图的增量改变正是 ir sn因此,对插入 vnew=vold(ir s)n类似地对删除 vnew=vold (dr s)Silberschatz,Korth and Sudarshan14.55Database System Concepts 3rd Edition选择和投影操作选择和投影操作n选择:考虑视图v=(r).vnew=vold(ir)vnew=vold-(dr)n投影

38、是更复杂的操作R=(A,B),且r(R)=(a,2),(a,3)A(r)具有单个元组(a).如果我们从r 中删除元组(a,2),我们不能从A(r)中删除元组(a),但是若我们又删除了(a,3),我们就应该删除该元组n对投影A(r)中的每个元组,我们将保存一个计数,即它被导出了多少次当对r 插入元组时,如果结果元组已经在A(r)中,我们就递增它的计数,否则加入一条新元组且计数值为1当对r 删除元组时,我们递减A(r)中对应元组的计数值若计数值变成0,我们从A(r)中删除该元组Silberschatz,Korth and Sudarshan14.56Database System Concepts

39、 3rd Edition聚合操作聚合操作ncount:v=Ag gcount(B)(r).当插入元组集合ir 时对ir 中的每条元组r,若对应分组已经存在于v中,我们递增其计数,否则增加一条新元组,其count=1当删除元组集合dr 时对ir 中的每条元组,我们查找t.A在v中的分组,并从该组计数中减去1.若计数变成0,从 v 中删除分组t.A的元组nsum:v=Ag gsum(B)(r)我们以类似于计数的方式来维护总和,只是要加/减B值而非对计数的加/减1另外我们维护计数以便发现没有元组的分组.这样的分组要从v中删除不能简单的测试 sum=0(why?)n为处理avg,我们维护sum 和co

40、unt 分别聚合值,最后再相除Silberschatz,Korth and Sudarshan14.57Database System Concepts 3rd Edition聚合操作聚合操作(续续)nmin,max:v=Ag gmin(B)(r).处理r 上插入是直截了当的.删除时维护聚合值 min 及 max 可能更昂贵.We我们必须检查同组中的 r 的其他元组以找出新的最大最小值Silberschatz,Korth and Sudarshan14.58Database System Concepts 3rd Edition其他操作其他操作n集合交:v=r s当插入一条元组到r 时我们检查

41、它是否存在于s中,如果是则加入到v 中.当从r 删除一条元组时,若它存在于交集中则删除之.更新s 是对称的其他集合操作(并与差)以类似方式处理.n外连接的处理方式基本与连接相同,但需要一些额外工作Silberschatz,Korth and Sudarshan14.59Database System Concepts 3rd Edition处理表达式处理表达式n为处理整个表达式,我们导出计算对每个子表达式结果的增量改变的表达式,从最小表达式开始.n例如考虑 E1 E2,其中E1和E2 都可能是复杂表达式假设要插入E1的元组集合由D1给出早先已算出,因为较小子表达式先处理则要插入E1 E2的元组

42、集合为 D1 E2这正是维护连接的通常的方法 Silberschatz,Korth and Sudarshan14.60Database System Concepts 3rd Edition查询优化与物化视图查询优化与物化视图n重写查询以利用物化视图:有物化视图v=r s 可用用户提交了查询 r s t我们可以重写查询为v t 是否这么做取决于两种选择的代价估计n用视图定义代替物化视图的使用:有物化视图v=r s 可用,但是上面无任何索引用户提交查询 A=10(v).再假设 s 在公共属性B上有索引,r 在属性A上有索引.此查询的最佳方案可能是用r s 替换 v,这导致查询方案A=10(r)

43、sn应该扩展查询优化器以考虑所有上述替代方案并选择最佳总体方案 Silberschatz,Korth and Sudarshan14.61Database System Concepts 3rd Edition物化视图选择物化视图选择n物化视图选择物化视图选择:“什么是最好的该物化的视图集合?”.此决定必须在系统负载负载的基础上做出n索引与物化视图一样,索引选择索引选择问题密切相关,尽管更简单些.n某些数据库系统提供工具来帮助数据库管理员做出索引和物化视图的选择.End of Chapter(Extra slides with details of selection cost estimat

44、ion follow)Silberschatz,Korth and Sudarshan14.63Database System Concepts 3rd EditionSelection Cost Estimate ExamplenNumber of blocks is baccount=500:10,000 tuples in the relation;each block holds 20 tuples.nAssume account is sorted on branch-name.V(branch-name,account)is 5010000/50=200 tuples of the

45、 account relation pertain to Perryridge branch200/20=10 blocks for these tuplesA binary search to find the first record would take log2(500)=9 block accessesnTotal cost of binary search is 9+10-1=18 block accesses(versus 500 for linear scan)branch-name=“Perryridge”(account)Silberschatz,Korth and Sud

46、arshan14.64Database System Concepts 3rd EditionSelections Using IndicesnIndex scan search algorithms that use an index;condition is on search-key of index.nA3(primary index on candidate key,equality).Retrieve a single record that satisfies the corresponding equality condition EA3=HTi+1nA4(primary in

47、dex on nonkey,equality)Retrieve multiple records.Let the search-key attribute be A.nA5(equality on search-key of secondary index).Retrieve a single record if the search-key is a candidate key EA5=HTi+1Retrieve multiple records(each may be on a different block)if the search-key is not a candidate key

48、.EA3=HTi+SC(A,r)Silberschatz,Korth and Sudarshan14.65Database System Concepts 3rd EditionCost Estimate Example(Indices)nSince V(branch-name,account)=50,we expect that 10000/50=200 tuples of the account relation pertain to the Perryridge branch.nSince the index is a clustering index,200/20=10 block r

49、eads are required to read the account tuples.nSeveral index blocks must also be read.If B+-tree index stores 20 pointers per node,then the B+-tree index must have between 3 and 5 leaf nodes and the entire tree has a depth of 2.Therefore,2 index blocks must be read.nThis strategy requires 12 total bl

50、ock reads.Consider the query is branch-name=“Perryridge”(account),with the primary index on branch-name.Silberschatz,Korth and Sudarshan14.66Database System Concepts 3rd EditionSelections Involving ComparisonsnA6(primary index,comparison).The cost estimate is:where c is the estimated number of tuple

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

16 金币

版权申诉 word格式文档无特别注明外均可编辑修改；预览文档经过压缩，下载后原文更清晰！ 立即下载

配套讲稿：: 如PPT文件的首页显示word图标，表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
特殊限制：: 部分文档作品中含有的国旗、国徽等图片，仅作为作品整体效果示例展示，禁止商用。设计者仅对作品中独创性部分享有著作权。
关键词：: 查询优化

得力文库 - 分享文档赚钱的网站所有资源均是用户自行上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作他用。

限制150内

关于本文

本文标题：查询优化.ppt
链接地址：https://www.deliwenku.com/p-82468282.html