Multimodal depression detection: leveraging textual and image-based social media data

Abstract

Depression is a prevalent and serious mental health condition characterized by persistent feelings of sadness, loss of interest, and significant impairment in daily functioning. The increasing global burden of depression highlights the urgency of developing more effective and timely detection methods. Traditional diagnostic methods face limitations such as subjective assessments, limited access to professional care, societal stigma, and delays in early detection, underscoring the need for innovative and accessible approaches. To address these challenges, our goal was to develop a robust multimodal machine learning framework that detects depression early, in support of timely intervention, using textual and visual data obtained from social media platforms. By integrating user-generated textual posts with visual information from shared images, this study leverages advanced computational models, particularly CLIP, Time2Vec, BLIP-2, and a cross-modal model, to capture intricate emotional and behavioral patterns associated with depressive symptoms. CLIP’s capability to align textual and visual representations, combined with Time2Vec’s efficiency in capturing temporal dynamics, significantly enhances the accuracy and depth of depression detection. In addition, a frozen BLIP-2 model applied in a time-sensitive setting and a cross-modal integration of fundamental embedding techniques allowed us to compare and evaluate the model’s effectiveness. This multimodal approach combines insights from both data modalities, overcoming the limitations of single-modality analyses and providing a better comparative understanding of users’ mental states. This research also discusses and addresses challenges related to data quality, ethical considerations, and the complexities of real-world multimodal datasets.
The ethical handling of sensitive personal data, the protection of user privacy, and transparency in algorithmic predictions and decisions are critical aspects that have been carefully considered throughout this study. The findings underscore the potential of social media as an effective platform for proactive mental health monitoring, early intervention, and the promotion of mental well-being. Further research is recommended to incorporate additional modalities such as video, speech, or audio, to enhance real-time decision-making capabilities, and to ensure linguistic and cultural inclusivity. Addressing these future directions will significantly increase the practical applicability and generalizability of depression detection systems, paving the way for broader societal benefits and improved mental health outcomes.

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 87-91).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.

Type

Thesis