THESIS
2002
x, 154 leaves : ill. ; 30 cm
Abstract
Association and classification mining have long been considered separate research and application areas. The starting point of this thesis is the observation that these two apparently different areas share some key underlying similarities. This observation makes it possible to study well-known classification techniques from an association mining perspective. Such a perspective may enable a better understanding of classification algorithms and help in devising improved or hybrid versions by combining elements from areas that would otherwise be considered incompatible. Our work on building local Bayesian classifiers from itemsets discovered with association mining methods is a result of this perspective. We propose a new classifier, LB (Local Bayes), and show that it is very competitive with established and state-of-the-art classification methods. In addition, LB shows how looking at classification methods from this perspective can lead to concrete algorithms with superior performance.
Text classification is one of the traditional application areas for Machine Learning and Data Mining methods. While traditional classification algorithms have been used successfully for text classification, text collections are particularly suitable for association mining. This is because they usually contain hundreds or thousands of features (for example, words), and these features tend to appear in text documents with certain dependencies. Moreover, these dependencies tend to appear locally and within a specific context, and the meaning of words tends to change according to this context. The words “association” and “mining”, for example, express different notions when used separately than they do in the expression “association mining”. Based on this observation, we investigate the applicability of our context-specific classifier, LB, in the domain of text classification. Our results are competitive with the most established text classification methods, and our approach offers certain advantages, such as a good trade-off between accuracy and scalability.
To address the particular challenges of the text classification problem, we also propose a new text preprocessing method that performs feature selection and instance selection simultaneously. This intuitive method is computationally efficient and, most importantly, our experimental results show that it produces outstanding results.
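The observation that words carry context-specific dependencies can be illustrated with a minimal sketch. It is not the LB classifier or the preprocessing method of the thesis. It only counts word pairs (2-itemsets) that co-occur in at least a given number of documents, which is the basic support-counting step in association mining. The function name `frequent_word_pairs`, the `min_support` parameter, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(documents, min_support):
    """Return word pairs that co-occur in at least `min_support` documents.

    Hypothetical helper for illustration only; each document is treated
    as a set of lowercase, whitespace-separated words.
    """
    pair_counts = Counter()
    for text in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always appears in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        pair_counts.update(combinations(words, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Toy corpus (invented for this example).
docs = [
    "association mining finds frequent itemsets",
    "association mining for text classification",
    "the mining industry reported record output",
    "text classification with naive bayes",
]

pairs = frequent_word_pairs(docs, min_support=2)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

In this toy corpus, "mining" appears in three documents but forms a frequent pair only with "association", which is the kind of local dependency that an itemset-based classifier could exploit. Enumerating all pairs per document is quadratic in the document's vocabulary, so a real system would prune candidates rather than count every pair.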