Semi-Supervised and Unsupervised Machine Learning
Publication Date: December 2010. Hardback, 256 pp.
The book addresses the main topics and techniques used for the rapid design, adaptation, and improvement of high-performance statistical spoken language dialog systems. Over the past few years, statistical methods (or pattern recognition techniques) have been applied to many areas involving the processing of text data, and spoken language dialog systems are no exception. For example, automated troubleshooting agents (third-generation dialog systems that perform problem-solving tasks over the phone) attempt to manage the high complexity of the dialog by routing the user’s spoken utterance to a smaller subsystem that carries out the specific steps relevant to that input. This strategy is commonly referred to as “automatic call routing”. It requires an appropriate analysis and categorization of the user’s utterance, which has first been transcribed into text by an automatic speech recognition module. The performance of the call routing module has a significant and direct influence on the overall success or failure of the dialog flow.
Focusing on third-generation dialog systems, this book begins with a survey of techniques used for text mining, supervised text categorization, and information retrieval, as well as typical data preparation and feature reduction models. It also illustrates the main commonalities and differences between these fields. A main challenge and driver of current and future research, however, concerns adaptability and portability: third-generation dialog systems, especially in the problem-solving area, are subject to continuous domain fluctuations. Moreover, in a broad application domain, the systems will need to operate with different contextual data and still provide adequate performance with minimal effort or time cost. The dialog system therefore needs to be capable of adapting to new or different data. This calls for unsupervised analysis and knowledge discovery tools that assist the dialog system in detecting potential new topics (e.g. an emerging problem type) or in discovering the structure of a new data collection about which nothing else is known or assumed. For this field of application, the book also provides a survey of unsupervised methods, including cluster analysis, cluster content representation and synthesis, cluster evaluation, unsupervised detection of the true number of clusters or classes in a data set, and ensembles (agreement) of clustering approaches.
Part 1. State of the Art
2. State of the Art in Clustering and Semi-Supervised Techniques.
Part 2. Approaches to Semi-Supervised Classification
3. Semi-Supervised Classification Using Prior Word Clustering.
4. Semi-Supervised Classification Using Pattern Clustering.
Part 3. Contributions to Unsupervised Classification – Algorithms to Detect the Optimal Number of Clusters
5. Detection of the Number of Clusters through Non-Parametric Clustering Algorithms.
6. Detecting the Number of Clusters through Cluster Validation.
About the Authors
Amparo Albalate is a research assistant at the University of Ulm, Institute of Information Technology, Germany, pursuing her PhD in statistical language understanding for spoken language dialog systems.
Wolfgang Minker is Professor at the University of Ulm, Institute of Information Technology, Germany.