Prediction of age, sentiment, and connectivity from social media text

Thin Nguyen, Dinh Phung, Brett Adams, Svetha Venkatesh

Research output: Chapter in Book/Report/Conference proceeding; Conference Paper; Research; peer-review

10 Citations (Scopus)

Abstract

Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: material diverse in topic and style, and at Web scale. We investigate manifest properties of textual messages, including latent topics, psycholinguistic features, and author mood, of a large corpus of blog posts, to analyze the impact of age, emotion, and social connectivity. These properties are found to be significantly different across the examined cohorts, which suggests discriminative features for a number of useful classification tasks. We build binary classifiers for old versus young bloggers, social versus solo bloggers, and happy versus sad posts with high performance. Analysis of discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Good prediction is achieved for social connectivity using topic and linguistic features, leaving tagged mood a modest role in all classifications.

Original language: English
Title of host publication: Web Information System Engineering, WISE 2011 - 12th International Conference, Proceedings
Pages: 227-240
Number of pages: 14
Publication status: Published - 19 Oct 2011
Externally published: Yes
Event: 12th International Conference on Web Information System Engineering, WISE 2011 - Sydney, Australia
Duration: 13 Oct 2011 – 14 Oct 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6997 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th International Conference on Web Information System Engineering, WISE 2011
Country: Australia
City: Sydney
Period: 13/10/11 – 14/10/11
