Machine Learning Approaches to Bioinformatics
SCIENCE, ENGINEERING, AND BIOLOGY INFORMATICS
Series Editor: Jason T. L. Wang (New Jersey Institute of Technology, USA)

Published:
Vol. 1: Advanced Analysis of Gene Expression Microarray Data (Aidong Zhang)
Vol. 2: Life Science Data Mining (Stephen T. C. Wong & Chung-Sheng Li)
Vol. 3: Analysis of Biological Data: A Soft Computing Approach (Sanghamitra Bandyopadhyay, Ujjwal Maulik & Jason T. L. Wang)
Vol. 4: Machine Learning Approaches to Bioinformatics (Zheng Rong Yang)

Forthcoming:
Vol. 5: Biodata Mining and Visualization: Novel Approaches (Ilkka Havukkala)

Machine Learning Approaches to Bioinformatics
Zheng Rong Yang
University of Exeter, UK

World Scientific
NEW JERSEY · LONDON · SINGAPORE · BEIJING · SHANGHAI · HONG KONG · TAIPEI · CHENNAI

Published by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Science, Engineering, and Biology Informatics, Vol. 4
MACHINE LEARNING APPROACHES TO BIOINFORMATICS
Copyright © 2010 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN-13 978-981-4287-30-2
ISBN-10 981-4287-30-X

Printed in Singapore.

PREFACE

Bioinformatics has been one of the most important multidisciplinary subjects of the last century. Initially, the major task of bioinformatics research was to handle large volumes of genomic data for knowledge extraction and prediction. More recently, bioinformatics practice has extended from genomics to proteomics, metabolomics and, most importantly, systems biology. In addition to traditional bioinformatics tasks, which focus on large-database management and sequence homology alignment for molecular structure prediction and function annotation, modelling biological data with statistical and machine learning methods has become an important trend. This area has attracted great attention because it supports efficient, effective and accurate knowledge extraction and prediction-model construction.

However, compared with other application domains, applying machine learning approaches in bioinformatics research and practice faces a series of challenges. These include data size, data quality and the imbalance between different data sources, and they are particularly evident in systems biology research. For instance, genomics data have a scale of around 25K, whereas proteomics data can reach a scale of millions. Handling data on such a scale within a single machine learning model is currently difficult even on modern computers. Furthermore, owing to experimental variation, tissue corruption and limited equipment resolution, most metabolite data suffer from data-quality problems. This poses challenges for machine learning model construction in the form of noisy and missing data. With next-generation sequencing platforms such as Illumina, we are faced with terabytes of sequence fragments, and the challenge is how to assemble these fragments accurately without any reference sequence. Finally, systems biology urgently requires the use of data from different sources to analyse system behaviour. This raises the challenge of how to efficiently incorporate data of differing resolutions, formats, quality and dimensionality into one machine learning model. This book therefore discusses some of these challenges.

This book is based on my teaching and research notes in bioinformatics over the past ten years. I thank Prof. Jason Wang and the publisher for inviting me to write it. The book is written mainly for postgraduates and researchers at the start of their bioinformatics research and practice. The prerequisite for using it is some basic knowledge of linear algebra and statistics. The book can serve as a teaching reference for both advanced undergraduate and postgraduate courses. Readers are encouraged to become familiar with basic R programming before using this book, as most case studies presented in it are implemented in R.

The book is composed of three parts. The first part covers several unsupervised learning approaches that can be used in bioinformatics. For instance, multidimensional scaling is commonly used for visualising biological data, and various cluster analysis approaches, as well as the self-organising map, have been used for biological pattern recognition. Partitioning the data groups molecules into clusters, which can lead to prototype pattern discovery and new hypothesis generation.

The second part mainly discusses supervised learning approaches. In many bioinformatics projects, a typical question is how to accurately predict unknowns from experimental data. For instance, how can we identify the genes that are most important for efficient and accurate disease diagnosis? Additionally, given a huge amount of molecular sequence data, most of whose functions are still unknown, how can we build prediction models from the limited information on sequences with known functions? This part therefore introduces several commonly used supervised learning algorithms and their applications to bioinformatics.

The third part of this book introduces concepts relevant to computational systems biology, which is now the most important research target in bioinformatics. Computational systems biology research mainly focuses on large biological systems, aiming to reveal the complex interplay between molecules and molecular entities. Gene network, sy…