NLP自然语言处理—N-gram language model.ppt

上传人：豆****

文档编号：87440752

上传时间：2023-04-16

格式：PPT

页数：23

大小：361.50KB

( 4.5 )

《NLP自然语言处理—N-gram language model.ppt》由会员分享，可在线阅读，更多相关《NLP自然语言处理—N-gram language model.ppt（23页珍藏版）》请在得力文库 - 分享文档赚钱的网站上搜索。

1、NLP自然语言处理N-gramlanguagemodelLanguage ModelsFormal grammars(e.g.regular,context free)give a hard“binary”model of the legal sentences in a language.For NLP,a probabilistic model of a language that gives a probability that a string is a member of a language is more useful.To specify a correct probabili

2、ty distribution,the probability of all sentences in a language must sum to 1.N-Gram Model FormulasWord sequencesChain rule of probabilityBigram approximationN-gram approximationEstimating ProbabilitiesN-gram conditional probabilities can be estimated from raw text based on the relative frequency of

3、word sequences.To have a consistent probabilistic model,append a unique start()and end()symbol to every sentence and treat these as additional words.Bigram:N-gram:Generative Model&MLEAn N-gram model can be seen as a probabilistic automata for generating sentences.Relative frequency estimates can be

4、proven to be maximum likelihood estimates(MLE)since they maximize the probability that the model M will generate the training corpus T.Initialize sentence with N1 symbolsUntil is generated do:Stochastically pick the next word based on the conditional probability of each word given the previous N 1 w

6、a large corpus of text to estimate good parameter values.Model can be evaluated based on its ability to predict a high probability for a disjoint(held-out)test corpus(testing on the training corpus would give an optimistically biased estimate).Ideally,the training(and test)corpus should be represent

7、ative of the actual application data.May need to adapt a general model to a small amount of new(in-domain)data by adding highly weighted small corpus to original training data.Unknown WordsHow to handle words in the test corpus that did not occur in the training data,i.e.out of vocabulary(OOV)words?

8、Train a model that includes an explicit symbol for an unknown word().Choose a vocabulary in advance and replace other words in the training corpus with.Replace the first occurrence of each word in the training data with.Evaluation of Language ModelsIdeally,evaluate use of model in end application(ex

9、trinsic,in vivo)RealisticExpensiveEvaluate on ability to model test corpus(intrinsic).Less realisticCheaperVerify at least once that intrinsic evaluation correlates with an extrinsic one.PerplexityMeasure of how well a model“fits”the test data.Uses the probability that the model assigns to the test

10、corpus.Normalizes for the number of words in the test corpus and takes the inverse.Measures the weighted average branching factor in predicting the next word(lower is better).Sample Perplexity EvaluationModels trained on 38 million words from the Wall Street Journal(WSJ)using a 19,979 word vocabular

11、y.Evaluate on a disjoint set of 1.5 million WSJ words.UnigramBigramTrigramPerplexity962170109SmoothingSince there are a combinatorial number of possible word sequences,many rare(but not impossible)combinations never occur in training,so MLE incorrectly assigns zero to many parameters(a.k.a.sparse da

12、ta).If a new combination occurs during testing,it is given a probability of zero and the entire sequence gets a probability of zero(i.e.infinite perplexity).In practice,parameters are smoothed(a.k.a.regularized)to reassign some probability mass to unseen events.Adding probability mass to unseen even

13、ts requires removing it from seen ones(discounting)in order to maintain a joint distribution that sums to 1.Laplace(Add-One)Smoothing“Hallucinate”additional training data in which each possible N-gram occurs exactly once and adjust estimates accordingly.where V is the total number of possible(N1)-gr

14、ams(i.e.the vocabulary size for a bigram model).Bigram:N-gram:Tends to reassign too much mass to unseen events,so can be adjusted to add 01(normalized by V instead of V).Advanced SmoothingMany advanced techniques have been developed to improve smoothing for language models.Good-TuringInterpolationBa

15、ckoffKneser-NeyClass-based(cluster)N-gramsModel CombinationAs N increases,the power(expressiveness)of an N-gram model increases,but the ability to estimate accurate parameters from sparse data decreases(i.e.the smoothing problem gets worse).A general approach is to combine the results of multiple N-

16、gram models of increasing complexity(i.e.increasing N).InterpolationLinearly combine estimates of N-gram models of increasing order.Learn proper values for i by training to(approximately)maximize the likelihood of an independent development(a.k.a.tuning)corpus.Interpolated Trigram Model:Where:Backof

17、fOnly use lower-order model when data for higher-order model is unavailable(i.e.count is zero).Recursively back-off to weaker models until data is available.Where P*is a discounted probability estimate to reserve mass for unseen events and s are back-off weights(see text for details).A Problem for N

18、-Grams:Long Distance DependenciesMany times local context does not provide the most useful predictive clues,which instead are provided by long-distance dependencies.Syntactic dependencies“The man next to the large oak tree near the grocery store on the corner is tall.”“The men next to the large oak

19、tree near the grocery store on the corner are tall.”Semantic dependencies“The bird next to the large oak tree near the grocery store on the corner flies rapidly.”“The man next to the large oak tree near the grocery store on the corner talks rapidly.”More complex models of language are needed to hand

20、le such dependencies.SummaryLanguage models assign a probability that a sentence is a legal string in a language.They are useful as a component of many NLP systems,such as ASR,OCR,and MT.Simple N-gram models are easy to train on unsupervised corpora and can provide useful estimates of sentence likelihood.MLE gives inaccurate parameters for models trained on sparse data.Smoothing techniques adjust parameter estimates to account for unseen(but not impossible)events.此此课件下件下载可自行可自行编辑修改，修改，仅供参考！供参考！感感谢您的支持，我您的支持，我们努力做得更好！努力做得更好！谢谢!

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

15 金币

版权申诉 word格式文档无特别注明外均可编辑修改；预览文档经过压缩，下载后原文更清晰！ 立即下载

配套讲稿：: 如PPT文件的首页显示word图标，表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
特殊限制：: 部分文档作品中含有的国旗、国徽等图片，仅作为作品整体效果示例展示，禁止商用。设计者仅对作品中独创性部分享有著作权。
关键词：: NLP自然语言处理N-gram language model NLP 自然语言处理 gram

得力文库 - 分享文档赚钱的网站所有资源均是用户自行上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作他用。

限制150内

关于本文

本文标题：NLP自然语言处理—N-gram language model.ppt
链接地址：https://www.deliwenku.com/p-87440752.html