Using linguistic and topic analysis to classify sub-groups of online depression communities

Thin Nguyen, Bridianne O’Dea, Mark Larsen, Dinh Phung, Svetha Venkatesh, Helen Christensen

Research output: Contribution to journal › Article › Research › peer-review

28 Citations (Scopus)


Depression is a highly prevalent mental health problem and is a co-morbidity of other mental, physical, and behavioural disorders. The internet allows individuals who are depressed, or who are caring for those who are depressed, to connect with others via online communities; however, the characteristics of these discussions have not yet been fully explored. This work aims to explore the textual cues of online communities interested in depression. A total of 5,000 posts were randomly selected from 24 online communities. Five subgroups of online communities were identified: Depression, Bipolar Disorder, Self-Harm, Grief/Bereavement, and Suicide. Psycholinguistic features and content topics were extracted from the posts and analysed. Machine learning techniques were used to discriminate the online conversations in the depression communities from those in the other subgroups. Topics and psycholinguistic features were found to be highly valid predictors of community subgroup. Clear discrimination between linguistic features and topics, alongside good predictive power, is an important step in understanding social media and its use in mental health.
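The pipeline summarised in the abstract (extract features from posts, then train a classifier to separate the depression communities from another subgroup) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the feature extractor or the classifier, so a tiny hand-made word-category lexicon stands in for the psycholinguistic features, a multinomial naive Bayes model stands in for the machine learning step, and the four training posts are invented toy examples rather than data from the study.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon; a stand-in for a real psycholinguistic dictionary.
LEXICON = {
    "negemo": {"sad", "hopeless", "tired", "empty"},
    "loss": {"funeral", "miss", "passed", "grief"},
    "self": {"i", "me", "my"},
}


def features(post):
    """Count how many tokens of a post fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", post.lower())
    counts = Counter()
    for token in tokens:
        for category, words in LEXICON.items():
            if token in words:
                counts[category] += 1
    return counts


class NaiveBayes:
    """Multinomial naive Bayes over category counts, with add-one smoothing."""

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        categories = sorted(LEXICON)
        # Log prior: share of training posts in each class.
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        # Sum the category counts per class.
        totals = {c: Counter() for c in self.classes}
        for sample, label in zip(samples, labels):
            totals[label].update(sample)
        # Smoothed log likelihood of each category given the class.
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + len(categories)
            self.log_lik[c] = {
                cat: math.log((totals[c][cat] + 1) / denom) for cat in categories
            }
        return self

    def predict(self, sample):
        scores = {
            c: self.log_prior[c]
            + sum(n * self.log_lik[c][cat] for cat, n in sample.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Invented toy posts, NOT taken from the study's 5,000-post sample.
TRAIN = [
    ("I feel sad and empty and tired", "depression"),
    ("I am hopeless and so tired of my days", "depression"),
    ("We miss her since the funeral", "grief"),
    ("My father passed and the grief is heavy", "grief"),
]

model = NaiveBayes().fit(
    [features(text) for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

print(model.predict(features("I am sad and tired")))
print(model.predict(features("I miss him, the funeral was hard")))
```

In this sketch, the two classes mirror two of the five subgroups named in the abstract; extending it to all five would only require more labelled posts, since the model handles any number of classes.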

Original language: English
Pages (from-to): 10653-10676
Number of pages: 24
Journal: Multimedia Tools and Applications
Issue number: 8
Publication status: Published - Apr 2017
Externally published: Yes


  • Depression
  • Feature extraction
  • Language styles
  • Mental health
  • Social media
  • Textual cues
  • Topics
  • Web community
  • Web-logs
