Text Mining (English Edition)

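Text Mining (English Edition) is a book by Feldman, published in August 2009 by Posts & Telecom Press (人民郵電出版社). It covers core text mining operations, text mining preprocessing techniques, categorization, clustering, information extraction, probabilistic models for information extraction, preprocessing applications, visualization approaches, link analysis, and text mining applications, combining the theory and practice of text mining.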

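Synopsis

Text Mining (English Edition) is a well-known work in the field of text mining, written by world-renowned authorities. It is well suited to researchers and practitioners in text mining and information retrieval, and it can also serve as a textbook for graduate courses such as data mining and knowledge discovery in computer science and related programs.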

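About the Authors

Ronen Feldman is a pioneer in machine learning, data mining, and unstructured data management. He is a senior lecturer in the Department of Mathematics and Computer Science at Bar-Ilan University in Israel, where he directs the Data Mining Laboratory. He is a cofounder and chairman of ClearForest, a company that develops next-generation text mining applications, mainly for businesses and government agencies, and he is currently also an associate professor at the Stern School of Business at New York University.

James Sanger is a venture capitalist and a recognized industry expert in commercial data solutions, Internet applications, and IT security products. He cofounded ABS Ventures in 1982. Before that, he was a managing director in the New York office of DB Capital. He received his undergraduate degree from the University of Pennsylvania and pursued graduate studies at the University of Oxford and the University of Liverpool. He is a member of the IEEE and the American Association for Artificial Intelligence (AAAI).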

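Reviews

"... I purchased this book. It is definitely a reference well worth owning."

- L. Venkata Subramaniam, IBM India Research Laboratory

"An introduction to text mining written by the leading experts in the field. The book is very well written and combines the theory and practice of text mining perfectly, making it suitable for both researchers and practitioners ... Highly recommended for anyone without a background in computational linguistics who wants to delve into text mining."

- Rada Mihalcea, University of North Texas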

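Text mining has become an exciting emerging research area. Written by world-renowned authorities, this book explains core text mining and link detection algorithms and techniques, introduces advanced preprocessing techniques, and considers knowledge representation issues and visualization approaches. It also explores how these techniques are applied in practice, balancing the theory and practice of text mining.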

Table of Contents

I. Introduction to Text Mining 1

I.1 Defining Text Mining 1

I.2 General Architecture of Text Mining Systems 13

II. Core Text Mining Operations 19

II.1 Core Text Mining Operations 19

II.2 Using Background Knowledge for Text Mining 41

II.3 Text Mining Query Languages 51

III. Text Mining Preprocessing Techniques 57

III.1 Task-Oriented Approaches 58

III.2 Further Reading 62

IV. Categorization 64

IV.1 Applications of Text Categorization 65

IV.2 Definition of the Problem 66

IV.3 Document Representation 68

IV.4 Knowledge Engineering Approach to TC 70

IV.5 Machine Learning Approach to TC 70

IV.6 Using Unlabeled Data to Improve Classification 78

IV.7 Evaluation of Text Classifiers 79

IV.8 Citations and Notes 80

V. Clustering 82

V.1 Clustering Tasks in Text Analysis 82

V.2 The General Clustering Problem 84

V.3 Clustering Algorithms 85

V.4 Clustering of Textual Data 88

V.5 Citations and Notes 92

VI. Information Extraction 94

VI.1 Introduction to Information Extraction 94

VI.2 Historical Evolution of IE: The Message Understanding Conferences and Tipster 96

VI.3 IE Examples 101

VI.4 Architecture of IE Systems 104

VI.5 Anaphora Resolution 109

VI.6 Inductive Algorithms for IE 119

VI.7 Structural IE 122

VI.8 Further Reading 129

VII. Probabilistic Models for Information Extraction 131

VII.1 Hidden Markov Models 131

VII.2 Stochastic Context-Free Grammars 137

VII.3 Maximal Entropy Modeling 138

VII.4 Maximal Entropy Markov Models 140

VII.5 Conditional Random Fields 142

VII.6 Further Reading 145

VIII. Preprocessing Applications Using Probabilistic and Hybrid Approaches 146

VIII.1 Applications of HMM to Textual Analysis 146

VIII.2 Using MEMM for Information Extraction 152

VIII.3 Applications of CRFs to Textual Analysis 153

VIII.4 TEG: Using SCFG Rules for Hybrid Statistical–Knowledge-Based IE 155

VIII.5 Bootstrapping 166

VIII.6 Further Reading 175

IX. Presentation-Layer Considerations for Browsing and Query Refinement 177

IX.1 Browsing 177

IX.2 Accessing Constraints and Simple Specification Filters at the Presentation Layer 185

IX.3 Accessing the Underlying Query Language 186

IX.4 Citations and Notes 187

X. Visualization Approaches 189

X.1 Introduction 189

X.2 Architectural Considerations 192

X.3 Common Visualization Approaches for Text Mining 194

X.4 Visualization Techniques in Link Analysis 225

X.5 Real-World Example: The Document Explorer System 235

XI. Link Analysis 244

XI.1 Preliminaries 244

XI.2 Automatic Layout of Networks 246

XI.3 Paths and Cycles in Graphs 250

XI.4 Centrality 251

XI.5 Partitioning of Networks 259

XI.6 Pattern Matching in Networks 272

XI.7 Software Packages for Link Analysis 273

XI.8 Citations and Notes 274

XII. Text Mining Applications 275

XII.1 General Considerations 276

XII.2 Corporate Finance: Mining Industry Literature for Business Intelligence 281

XII.3 A “Horizontal” Text Mining Application: Patent Analysis Solution Leveraging a Commercial Text Analytics Platform 297

XII.4 Life Sciences Research: Mining Biological Pathway Information with GeneWays 309

Appendix A: DIAL: A Dedicated Information Extraction Language for Text Mining 317

A.1 What Is the DIAL Language? 317

A.2 Information Extraction in the DIAL Environment 318

A.3 Text Tokenization 320

A.4 Concept and Rule Structure 320

A.5 Pattern Matching 322

A.6 Pattern Elements 323

A.7 Rule Constraints 327

A.8 Concept Guards 328

A.9 Complete DIAL Examples 329

Bibliography 337

Index 391
