Chapter 4: Radial-Basis Function Networks

1. Overview

BP multilayer feedforward networks are among the most widely used models, but their learning algorithm is computationally expensive and slow to converge. Radial basis function (Radial Basis Function, RBF) theory offers a novel and effective alternative for training multilayer feedforward networks: RBF networks generalize well while requiring little computation and training quickly. Like wavelet-basis, spline, and orthogonal-function neural networks, RBF networks belong to the class of kernel-function models.

Structure: like the MLP/BP network, an RBF network is a feedforward model, consisting of
- an input layer (x1, ..., xd),
- a nonlinear transformation (hidden) layer that generates local receptive fields, and
- a linear output layer (outputs z1, ..., zc, hidden-to-output weights W_kj, linear activation function).

From a function-approximation perspective, this is equivalent to implementing a complex function (corresponding to a nonlinearly separable decision boundary) using simple functions (each corresponding to a linearly separable decision boundary). Implementing this procedure with a network architecture yields the RBF network, provided the nonlinear mapping functions are radial basis functions.

Interpolation view. Suppose f(x1) and f(x2) are known and f(x) is to be approximated by linear interpolation. Let d1 and d2 be the distances of x from x1 and x2. Then

    f(x) ≈ (d2 f(x1) + d1 f(x2)) / (d1 + d2),

i.e. f(x) can be expressed as a weighted sum of the known function values (with normalized weights). Generalizing to interpolation based on several known function values,

    f(x) ≈ Σ_{i=1}^{P0} w_i f(x_i),

where among the P0 samples x_i, only those at a small distance from x contribute significantly.

For example, with 8 known samples, the interpolation at a point x may need only the four nearest samples. How do we select the effective neighboring nodes (neighboring samples), and how do we determine the weighting coefficients? An RBF neural network can answer both questions.

The interpolation problem: given a set of points x_i in n-dimensional space with corresponding real values d_i, i = 1, 2, ..., m, design a function f(x) satisfying the interpolation conditions f(x_i) = d_i. The RBF approach weights norm-based basis functions:

    f(x) = Σ_{j=1}^{m} w_j φ(||x − x_j||).
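The two-point rule and its normalized-weight generalization can be sketched as follows (a minimal illustration; the sample points and the quadratic test function are invented for the example):

```python
import numpy as np

def lerp(x, x1, x2, f1, f2):
    """Two-point case: f(x) ~ (d2*f1 + d1*f2) / (d1 + d2),
    where d1, d2 are the distances of x from x1 and x2."""
    d1, d2 = abs(x - x1), abs(x - x2)
    return (d2 * f1 + d1 * f2) / (d1 + d2)

def weighted_interp(x, xs, fs, eps=1e-12):
    """General case: weighted sum of known values with normalized
    inverse-distance weights, so nearby samples dominate."""
    w = 1.0 / (np.abs(np.asarray(xs, float) - x) + eps)
    w /= w.sum()                      # normalized weights, sum to 1
    return float(w @ np.asarray(fs, float))

# For a linear function the two-point rule is exact:
# f(x) = 2x + 1, interpolated at x = 0.3 between x1 = 0 and x2 = 1
print(lerp(0.3, 0.0, 1.0, 1.0, 3.0))   # 1.6
```

Note that `weighted_interp` reproduces a known value exactly when x coincides with a sample, which matches the interpolation conditions f(x_i) = d_i.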
Substituting the interpolation conditions into this expression yields m equations in the m unknown weights w. The traditional method solves these equations directly; the network approach instead obtains the corresponding parameters by learning.

Terminology:
- Radial basis functions: introduced in the solution of the real multivariate interpolation problem.
- Basis functions: a set of functions whose linear combinations can generate an arbitrary function in a given function space.
- Radial: symmetric around its center.

From a classification perspective: a problem that is nonlinearly separable in a low-dimensional space can always be mapped into a higher-dimensional space in which it becomes linearly separable. The output units of an RBF network form a single-layer perceptron, so with a suitable choice of the number of hidden units (the dimension of the high-dimensional space) and of the activation functions, the original problem is turned into a linearly separable one. In an RBF network the input-to-hidden mapping is nonlinear, while the hidden-to-output mapping is linear.

Example 1 (nonlinear classification). Samples inside circle 1 and inside circle 2 belong to one class; samples outside the circles belong to the other. How does an RBF network separate the two classes? Let c1, c2 and r1, r2 be the centers and radii of circles 1 and 2, and let x = (x1, x2) be a sample. Define hyperspheric radial basis functions:

    φ(c1, x) = 1 if the distance of x from c1 is less than r1, and 0 otherwise;
    φ(c2, x) = 1 if the distance of x from c2 is less than r2, and 0 otherwise.

Through the hidden feature space (φ(c1, x), φ(c2, x)), samples in circle 2 are mapped to (0, 1), samples in circle 1 to (1, 0), and samples outside both circles to (0, 0). The two-class problem becomes linearly separable in the hidden feature space.

2. RBF Network Properties

Architecture: input nodes x1, ..., xd; a hidden layer of RBFs (receptive fields) with input-to-hidden weights u_ji and spread constant σ; output nodes z1, ..., zc with hidden-to-output weights W_kj and a linear activation function.

Physical meanings:
- φ: the radial basis function of the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns into a (typically higher) H-dimensional space. The complex decision boundary is then constructed from linear combinations (weighted sums) of these simple building blocks.
- u_ji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions.
- σ: the spread constant(s). These values determine the spread (extent) of each radial basis function.
- W_kj: the weights joining the hidden and output layers. These weights are used to form the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined into the complex function.

An RBF network is thus a two-layer feedforward network. The hidden layer corresponds to a set of radial basis functions and implements the nonlinear mapping; the output of each hidden unit k is

    o_k = exp(−||x − μ_k||² / (2 σ_k²)),

where μ_k is the mean (center) of the Gaussian and σ_k is its width, which controls the spread of the distribution around the center.
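The Gaussian hidden layer and the linear output stage can be sketched together as a single forward pass (a minimal vectorized version; the shapes and names are my own):

```python
import numpy as np

def rbf_forward(X, centers, sigmas, W):
    """RBF network forward pass.
    X: (n, d) inputs, centers: (H, d), sigmas: (H,), W: (J, H).
    Hidden unit k: o_k = exp(-||x - mu_k||^2 / (2 sigma_k^2)).
    Output unit j: y_j = sum_k W[j, k] * o_k (linear activation)."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, H)
    O = np.exp(-sq / (2.0 * sigmas ** 2))                           # hidden outputs
    return O @ W.T                                                  # (n, J)

# An input located exactly at a center drives that hidden unit to 1:
y = rbf_forward(np.array([[0.0, 0.0]]),
                centers=np.array([[0.0, 0.0], [1.0, 1.0]]),
                sigmas=np.array([1.0, 1.0]),
                W=np.array([[1.0, 0.0]]))
print(y)   # [[1.]]
```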
The center of each hidden unit's basis function can be viewed as storing a known input; when the input x approaches the center, the hidden unit's output grows. This closeness can be measured by the Euclidean distance ||x − μ||.

The output units form weighted linear combinations; the output of unit j is

    y_j = Σ_k w_jk o_k.

The number of hidden nodes depends on the problem at hand; for exact interpolation it generally equals the number of training samples.

1-D Gaussian illustration: three hidden units with different center values. For a given input (indicated by the arrow), RBF3 produces the largest output, because the input is closest to μ3. Each RBF thus has a receptive field, i.e. a region or subspace of the input space (a notion with physiological grounding).

Nonlinear receptive fields. The hallmark of RBF networks is their use of nonlinear receptive fields. The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden-unit space, where the mapped inputs may be linearly separable. The hidden-unit space often needs to be of a higher dimensionality. Cover's Theorem (1965): a complex pattern-classification problem that is nonlinearly separable in a low-dimensional space is more likely to be linearly separable in a high-dimensional space.

Center and spread of the Gaussian basis function:
- Once the center is fixed, the spread determines how the basis function responds to inputs.
- The larger the spread of the Gaussian, the smoother the approximation; but if the spread is too large, many hidden nodes are needed to approximate a rapidly varying function, and generality suffers.
- If the spread is too small, many hidden nodes are needed to approximate even a smooth function, and the network again generalizes poorly: each hidden unit then responds to essentially one sample in the training set, i.e. overfitting of training data = poor generalization on test data.
- Gaussian functions are radially symmetric (RBF).
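Cover's theorem can be illustrated with the classic XOR problem: the four corner points are not linearly separable in the input plane, but after passing through two Gaussian receptive fields they are. The centers (0, 0) and (1, 1) with unit spread are the standard textbook choice, used here purely for illustration:

```python
import numpy as np

# XOR corners and labels (not linearly separable in input space)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([0, 1, 1, 0])

# Two Gaussian receptive fields with unit spread
t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
phi1 = np.exp(-((X - t1) ** 2).sum(axis=1))
phi2 = np.exp(-((X - t2) ** 2).sum(axis=1))

# In the hidden space (phi1, phi2) the two classes fall on opposite
# sides of a line; e.g. phi1 + phi2 > 0.9 holds exactly for XOR = 0.
for x, p1, p2, y in zip(X, phi1, phi2, labels):
    print(x, (round(p1, 4), round(p2, 4)), "label", y)
```

Both XOR = 1 points land on the same hidden-space point (e^-1, e^-1), while the XOR = 0 points land near the axes, so one straight line now separates the classes.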
The closer an input is to the Gaussian center, the larger the hidden node's response. The Gaussian basis function is radially symmetric: inputs at the same radial distance from the center produce the same hidden output. In general, the particular nonlinear form of the basis function has little effect on network performance; what matters is the choice of the function centers.

Advantages of the Gaussian function:
- Simple representational form; even for multivariate inputs it adds little complexity.
- Good smoothness: derivatives of all orders exist.
- Simple representation with good analytic properties, convenient for theoretical analysis.

(Figure: two architectures compared — one with adjustable output weights, one with the weights fixed at 1.)

The hidden nodes' activation functions are radially symmetric, typically decaying, nonnegative nonlinear functions; common choices are

    multiquadrics:          φ(r) = (r² + c²)^{1/2}   for some c > 0,
    inverse multiquadrics:  φ(r) = (r² + c²)^{−1/2}  for some c > 0,
    Gaussian functions:     φ(r) = exp(−r² / (2σ²))  for some σ > 0.

3. Learning

What do we have to learn for an RBF NN with a given architecture?
- the centers of the RBF activation functions,
- the spreads of the Gaussian RBF activation functions,
- the weights from the hidden to the output layer.

Different learning algorithms may be used for learning the RBF network parameters.

Setup. Training sample set: {(x_k, d_k)}, k = 1, 2, ..., N; actual outputs y_j and desired outputs d_j, j = 1, 2, ..., J. When the basis function is Gaussian,

    φ_i(x) = exp(−||x − c_i||² / (2σ_i²)),

where σ_i is the variance (width) of the Gaussian and c_i its center.

Learning Algorithm 1.
- Centers are selected at random: the centers are chosen randomly from the training set. Note that H = N in this case.
- Spreads are chosen by normalization:

      σ = d_max / √(2H),

  where d_max is the maximum distance between the chosen centers. The activation function of hidden neuron i then becomes

      φ_i(x) = exp(−H ||x − c_i||² / d_max²).

Width determination:
- Equal widths: σ = d_max / √(2h), where h is the number of hidden nodes and d_max is the maximum distance between cluster centers.
- Unequal widths: for hidden unit i with center c_i, take the mean of the distances from the other centers to c_i as its width σ_i.

- Weights are computed by solving a set of linear equations (by means of the pseudo-inverse method). For an example (x_k, d_k), consider the output of the network

      y(x_k) = Σ_j w_j φ_j(x_k).

  We would like, for each example, that y(x_k) = d_k, that is

      Σ_j w_j φ_j(x_k) = d_k.

  This can be rewritten in matrix form, for one example and for all the examples at the same time: letting Φ = [φ_j(x_k)], w = (w_1, ..., w_H)ᵀ and d = (d_1, ..., d_N)ᵀ, we can write

      Φ w = d.

  If Φ⁺ is the pseudo-inverse of the matrix Φ, we obtain the weights using

      w = Φ⁺ d.

Learning Algorithm 1, summary: choose the centers randomly from the training set, compute the spread by the normalization rule, and solve for the output weights with the pseudo-inverse.

Too many receptive fields? In order to reduce the artificial complexity of the RBF network, we need to use a smaller number of receptive fields. How about using a subset of the training data, say M < N samples, as centers?
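Learning Algorithm 1 can be sketched end to end on a toy problem (the 1-D target function, the sample count, and the choice H = 15 are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: samples of a 1-D target function
X = np.linspace(0.0, 2 * np.pi, 40)[:, None]   # (N, 1) inputs
d = np.sin(X[:, 0])                            # desired outputs

# Step 1: pick H centers at random from the training set
H = 15
centers = X[rng.choice(len(X), size=H, replace=False)]

# Step 2: common spread by the normalization rule sigma = d_max / sqrt(2H)
d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
sigma = d_max / np.sqrt(2 * H)

# Step 3: hidden activations Phi and output weights w = Phi^+ d
Phi = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
             / (2 * sigma ** 2))               # (N, H)
w = np.linalg.pinv(Phi) @ d                    # pseudo-inverse solution

rmse = np.sqrt(np.mean((Phi @ w - d) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

With H < N the pseudo-inverse gives the least-squares rather than the exact-interpolation solution, which is exactly the "fewer receptive fields" variant mentioned above.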
An exact-interpolation design with very many centers also runs into the curse of dimensionality, which further motivates using fewer receptive fields.

4. RBF vs. MLP

Architecture:
- An RBF network has a single hidden layer; FFNN networks may have one or more hidden layers.

Neuron model:
- In an RBF network, the neuron model of the hidden neurons differs from that of the output nodes: the hidden layer of the RBF network is non-linear, while the output layer is linear. Typically in an FFNN, hidden and output neurons share a common neuron model (and are usually non-linear).
- The hidden neurons of an MLP compute the inner product between an input vector and their weight vector; RBF units compute the Euclidean distance between an input vector and the radial basis centers.

Approximation:
- An RBF NN using Gaussian functions constructs local approximations to a non-linear I/O mapping: typically only a few hidden units are active for a given input.
- An FF NN constructs global approximations to the non-linear I/O mapping: typically many hidden units contribute to the output for a given input.

Decision boundaries:
- An MLP partitions the feature space with hyper-planes; RBF decision boundaries are hyper-ellipsoids.

Training:
- All the parameters in an MLP are trained simultaneously; the parameters in the hidden and output layers of an RBF network are typically trained separately using an efficient, faster hybrid algorithm.

Classification and learning:
- MLPs separate classes via hyperplanes; RBFs separate classes via hyperspheres.
- MLPs use distributed learning; RBFs use localized learning. RBFs train faster.
  (Figure: MLP hyperplane boundary vs. RBF hypersphere boundary in the (x1, x2) plane.)

Training times:
- The distributed representation of MLPs causes the error surface to have multiple local minima and nearly flat regions with very slow convergence. As a result, training times for MLPs are usually larger than those for RBFs.

Generalization:
- MLPs exhibit better generalization properties than RBFs in regions of the feature space outside the local neighborhoods defined by the training set. On the other hand, extrapolation far from the training data is oftentimes unjustified and dangerous.
- MLPs typically require fewer parameters than RBFs to approximate a non-linear function with the same accuracy.
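The local-vs-global contrast can be made concrete with a small seeded experiment: count how many hidden units respond appreciably to one input in a Gaussian RBF layer versus a randomly initialized tanh (MLP-style) layer. The layer sizes, the spread, and the 0.1 activity threshold are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
H, dim = 50, 2
x = np.zeros(dim)                     # one probe input at the origin

# RBF hidden layer: Gaussian units with centers spread over [-3, 3]^2
centers = rng.uniform(-3, 3, size=(H, dim))
sigma = 0.5
rbf_act = np.exp(-((centers - x) ** 2).sum(axis=1) / (2 * sigma ** 2))

# MLP-style hidden layer: tanh of a random affine map
Wh = rng.normal(size=(H, dim))
b = rng.normal(size=H)
mlp_act = np.tanh(Wh @ x + b)

thr = 0.1                             # "active" = |activation| above threshold
rbf_active = int((np.abs(rbf_act) > thr).sum())
mlp_active = int((np.abs(mlp_act) > thr).sum())
print(f"active units -- RBF: {rbf_active}/{H}, MLP: {mlp_active}/{H}")
```

Only the few Gaussian units whose receptive fields cover the probe respond, whereas most tanh units produce non-negligible output, matching the "few active vs. many contributing" claim above.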