University of Sussex

Dataless text classification with descriptive LDA

conference contribution
Posted on 2023-06-08, 20:34. Authored by Xingyuan Chen, Yunqing Xia, Peng Jin, John Carroll.
Manually labeling documents for training a text classifier is expensive and time-consuming. Moreover, a classifier trained on labeled documents may suffer from overfitting and adaptability problems. Dataless text classification (DLTC) has been proposed as a solution to these problems, since it does not require labeled documents. Previous research in DLTC has used explicit semantic analysis of Wikipedia content to measure semantic distance between documents, which is in turn used to classify test documents based on nearest neighbours. The semantic-based DLTC method has a major drawback in that it relies on a large-scale, finely compiled semantic knowledge base, which is difficult to obtain in many scenarios. In this paper we propose a novel kind of model, descriptive LDA (DescLDA), which performs DLTC with only category description words and unlabeled documents. In DescLDA, the LDA model is assembled with a describing device to infer Dirichlet priors from prior descriptive documents created with category description words. The Dirichlet priors are then used by LDA to induce category-aware latent topics from unlabeled documents. Experimental results with the 20Newsgroups and RCV1 datasets show that: (1) our DLTC method is more effective than the semantic-based DLTC baseline method; and (2) the accuracy of our DLTC method is very close to that of state-of-the-art supervised text classification methods. As neither external knowledge resources nor labeled documents are required, our DLTC method is applicable to a wider range of scenarios.
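
The general idea of steering LDA with category description words can be illustrated with a small sketch. This is not the authors' DescLDA model or its describing device; it is a simplified stand-in that assumes the description words are turned directly into an asymmetric Dirichlet prior over each topic's word distribution, followed by ordinary collapsed Gibbs sampling. The function names, the `base` and `boost` pseudo-count values, and the toy documents are all hypothetical.

```python
import random

def build_topic_word_prior(descriptions, vocab, base=0.01, boost=5.0):
    """Return one prior row per category: `base` for every word,
    plus `boost` for each occurrence of a word in that category's description."""
    index = {w: i for i, w in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in descriptions]
    for k, words in enumerate(descriptions):
        for w in words:
            if w in index:
                prior[k][index[w]] += boost
    return prior

def classify_with_seeded_lda(docs, descriptions, alpha=0.1, iters=50, seed=0):
    """Run collapsed Gibbs sampling with one topic per category and
    label each document with its most frequently assigned topic."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d} | {w for d in descriptions for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    K, V = len(descriptions), len(vocab)
    beta = build_topic_word_prior(descriptions, vocab)
    beta_sum = [sum(row) for row in beta]

    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta[t][v]) / (n_k[t] + beta_sum[t])
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    return [max(range(K), key=lambda t: n_dk[d][t]) for d in range(len(docs))]

# Toy usage: two categories described by a few words each (hypothetical data).
descriptions = [["hockey", "game", "team"], ["space", "orbit", "nasa"]]
docs = [
    ["hockey", "team", "season", "game"],
    ["nasa", "orbit", "launch", "space"],
]
labels = classify_with_seeded_lda(docs, descriptions)
print(labels)  # one category index per document
```

Because the prior rows are indexed by category, each latent topic stays tied to one category, so the dominant topic of a document can be read off as its label without any labeled training documents.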

History

Publication status

  • Published

File Version

  • Published version

Journal

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence

Publisher

Association for the Advancement of Artificial Intelligence Press

Volume

3

Page range

2224-2231

Event name

29th AAAI Conference on Artificial Intelligence (AAAI-15)

Event location

Austin, Texas, USA

Event type

conference

Event date

January 25–30, 2015

ISBN

9781577357018

Department affiliated with

  • Informatics Publications

Full text available

  • Yes

Peer reviewed?

  • Yes

Legacy Posted Date

2015-04-23

First Open Access (FOA) Date

2015-04-23

First Compliant Deposit (FCD) Date

2015-04-23
