Abstract
Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is used for training, while finer-grained sentence-level information is ignored, and (2) external semantic knowledge about documents, sentences and words is not exploited for training. To address these issues, we propose a variational autoencoder (VAE) NTM that jointly reconstructs the sentence and document word counts using combinations of bag-of-words (BoW) topical embeddings and pre-trained semantic embeddings. The pre-trained embeddings are first transformed into a common latent topical space to align their semantics with the BoW embeddings. Our model also features a hierarchical KL divergence that uses the embeddings of each document to regularize those of its sentences, thereby paying more attention to semantically relevant sentences. Both quantitative and qualitative experiments show the efficacy of our model in (1) lowering the reconstruction errors at both the sentence and document levels, and (2) discovering more coherent topics from real-world datasets.
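The hierarchical KL term mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: diagonal Gaussian posteriors, a standard normal prior for the document, the document posterior acting as the prior for each of its sentences, and the hypothetical function names `kl_diag_gaussians` and `hierarchical_kl`. It does not model the attention to semantically relevant sentences that the paper describes.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (mean, log-variance)


def kl_diag_gaussians(
    mu_q: Sequence[float],
    logvar_q: Sequence[float],
    mu_p: Sequence[float],
    logvar_p: Sequence[float],
) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Per-dimension closed form:
        # 0.5 * (log(var_p / var_q) + (var_q + (mq - mp)^2) / var_p - 1)
        total += 0.5 * (
            lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        )
    return total


def hierarchical_kl(doc: Gaussian, sentences: List[Gaussian]) -> float:
    """Assumed hierarchy: document vs. N(0, I), each sentence vs. the document."""
    mu_d, logvar_d = doc
    zeros = [0.0] * len(mu_d)
    # Document-level term against a standard normal prior (an assumption).
    doc_term = kl_diag_gaussians(mu_d, logvar_d, zeros, zeros)
    # Sentence-level terms pull each sentence posterior toward its document.
    sent_term = sum(
        kl_diag_gaussians(mu_s, logvar_s, mu_d, logvar_d)
        for mu_s, logvar_s in sentences
    )
    return doc_term + sent_term
```

Under these assumptions, a sentence whose posterior sits far from its document's posterior incurs a larger penalty, which is one way a document embedding could regularize the embeddings of its sentences.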
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |
| Editors | Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih |
| Place of Publication | Stroudsburg PA USA |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 1042-1052 |
| Number of pages | 11 |
| ISBN (Electronic) | 9781955917094 |
| Publication status | Published - 2021 |
| Event | Empirical Methods in Natural Language Processing 2021, Online and Punta Cana, Dominican Republic. Duration: 7 Nov 2021 → 11 Nov 2021. Website: https://2021.emnlp.org/. Proceedings: https://aclanthology.org/2021.emnlp-main.0/. Proceedings (Findings): https://aclanthology.org/2021.findings-emnlp.0/ |
Conference
| Conference | Empirical Methods in Natural Language Processing 2021 |
|---|---|
| Abbreviated title | EMNLP 2021 |
| Country/Territory | Dominican Republic |
| City | Punta Cana |
| Period | 7 Nov 2021 → 11 Nov 2021 |
| Internet address | https://2021.emnlp.org/ |
Projects
- Time series classification for new-generation Earth observation satellites
  Petitjean, F. (Primary Chief Investigator (PCI))
  1 Jun 2017 → 31 Dec 2020
  Project: Research
  Status: Finished