Analyzing Schizophrenic-prone text from social media content: a novel approach through ML and NLP

Abstract

Schizophrenia is a serious mental disorder in which people interpret reality abnormally and may engage in harmful behavior if the condition is not diagnosed promptly. This study identifies language patterns indicative of schizophrenic-prone text in online communication, using ML and NLP methods, and aims to contribute to the development of early intervention techniques in mental health. Two datasets of social media posts were used. The first, the Pre existing dataset, was obtained from a repository focused on identifying schizophrenia-related postings and serves as the baseline for comparison and evaluation. The second, the New scrapped dataset, was collected by extracting posts from subreddits associated with schizophrenia and covers a wider range of language patterns. The dual-phase technique consists of training the models on the Pre existing dataset and then evaluating their performance on the New scrapped dataset. The models used to predict whether a text is suggestive of schizophrenia include the transformer model BERT, the recurrent neural network models Bi-LSTM and GRU, and the machine learning models Support Vector Classifier, Logistic Regression, Multinomial Naive Bayes, Random Forest, and Decision Tree. Schizophrenic-prone texts differ from texts written by mentally healthy individuals in their phonological, morphological, and syntactic aspects, and these models can learn such linguistic patterns. On the Pre existing and New scrapped datasets respectively, the DistilBERT transformer model achieves accuracies of 97% and 84%, GRU achieves 91% and 79%, and the logistic regression model achieves 93% and 83%.
To check that the models can handle new data effectively, we conducted a contemporary comparison. This analysis showed that consistent data collection is necessary for accurate predictive results.
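
The dual-phase technique described in the abstract (train on one dataset, evaluate on a separately collected one) can be illustrated with a minimal sketch. This is not the thesis code. It assumes a hand-written bag-of-words Multinomial Naive Bayes classifier with add-one smoothing, and the example texts, labels, and all function and variable names are invented placeholders standing in for the Pre existing and New scrapped datasets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Bag-of-words Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Phase 1: train on the baseline data (placeholder for "Pre existing").
# Label 1 = schizophrenia-related post, 0 = control post (invented examples).
train_texts = [
    "the voices keep telling me things nobody else hears",
    "i hear voices and feel watched all the time",
    "great hike today the weather was perfect",
    "cooked pasta for dinner and watched a movie",
]
train_labels = [1, 1, 0, 0]
model = MultinomialNB().fit(train_texts, train_labels)

# Phase 2: evaluate on separately collected data (placeholder for "New scrapped").
new_texts = [
    "sometimes i hear voices when nobody is around",
    "the weather was great for a hike",
]
new_labels = [1, 0]
new_pred = model.predict(new_texts)
print("accuracy on new data:", accuracy(new_labels, new_pred))
```

Keeping the evaluation data entirely separate from the training data is what lets the second accuracy figure reflect how a model behaves on newly collected posts, which is the gap the abstract reports between the two datasets.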

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 80-81).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2024.

Type

Thesis