Background: The speech of patients with schizophrenia is often described as monotonous, flat and without emotion. Distinctive speech patterns are qualitatively assessed in the diagnostic process and deeply affect the quality of everyday social interactions. In this project, we investigate and model the speech patterns of people with schizophrenia, contrasting them with those of matched controls and relating them to positive and negative symptoms. We employ both (1) traditional measures (pitch mean and range, pause number and duration, speech rate, etc.) and (2) non-linear techniques measuring the temporal structure (regularity and complexity) of speech. Our aims are (1) to achieve a more fine-grained understanding of the speech patterns in schizophrenia than has previously been achieved using traditional, linear measures of prosody and fluency, and (2) to employ the results in a supervised machine-learning process (discriminant function) to classify speech production as belonging to either the control or the schizophrenia group, based solely on acoustic features.

Methods: We analyzed the speech production of 57 Danish adults with first-episode schizophrenia (23 F, 34 M; mean age = 22.93, SD = 3.46) and 57 matched controls. All participants underwent extensive diagnostic interviews and symptom-related questionnaires: the Schedules for Clinical Assessment in Neuropsychiatry and the Scales for the Assessment of Negative and Positive Symptoms (SANS and SAPS). Our analysis is based on previously acquired narratives of the Frith-Happé triangles, with 8 narratives per subject. We extracted basic measures of pause behavior (number of pauses, average length), fundamental frequency (mean, range) and speech rate, as well as measures of the regularity and complexity of the temporal dynamics (Detrended Fluctuation Analysis and Recurrence Quantification Analysis) of these three aspects of speech. The most relevant features were selected via ElasticNet (10-fold cross-validation, alpha = 0.5).
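As an illustration of the kind of regularity measure named above, Detrended Fluctuation Analysis can be sketched in a few lines. This is a minimal, generic first-order DFA under stated assumptions, not the pipeline used in the study: the function names, the window sizes and the synthetic input series are all hypothetical, and real input would be a pitch, pause or speech-rate time series.

```python
import math
import random


def _ols(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def _detrended_ss(window):
    """Sum of squared residuals after removing a linear trend from one window."""
    x = list(range(len(window)))
    slope, intercept = _ols(x, window)
    return sum((v - (intercept + slope * i)) ** 2 for i, v in zip(x, window))


def dfa_exponent(series, window_sizes=(8, 16, 32, 64, 128)):
    """Estimate the DFA scaling exponent (window sizes are an assumption).

    Assumes a non-constant series long enough to hold at least two
    windows for at least two of the window sizes.
    """
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-centred series into a cumulative profile.
    profile, total = [], 0.0
    for v in series:
        total += v - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n_windows < 2:
            continue
        # Step 2: detrend each non-overlapping window, pool the residuals,
        # and take the root-mean-square fluctuation F(n).
        ss = sum(_detrended_ss(profile[k * n:(k + 1) * n]) for k in range(n_windows))
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(ss / (n_windows * n))))

    # Step 3: the exponent is the slope of log F(n) against log n.
    slope, _ = _ols(log_n, log_f)
    return slope


# Synthetic demonstration only: uncorrelated noise, not speech data.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2048)]
print("DFA exponent of synthetic noise:", round(dfa_exponent(noise), 2))
```

An exponent near 0.5 indicates an uncorrelated series, while larger values indicate increasingly persistent temporal structure.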
Diagnosis was predicted using a 10-fold cross-validated discriminant function (Mahalanobis rule). Balanced accuracy was estimated using variational Bayesian mixed-effects inference. SANS and SAPS scores were predicted using a 10-fold cross-validated multiple linear regression. Both analyses were iterated 1000 times to test the stability of the results.

Results: Voice dynamics allowed discrimination of patients with schizophrenia from healthy controls with a balanced accuracy of 85.68% (p<0.000001, confidence interval: 82.50–86.97%), a sensitivity of 81.27% and a specificity of 86.97%. Voice dynamics explained 26.76% (adjusted R², p<0.000001, CI: 26.07–27.45%) of the variance in SANS scores and 20.33% (p<0.00000001, CI: 19.76–20.90%) of the variance in SAPS scores. In comparison to healthy controls, the model characterizes the speech of patients with schizophrenia as: (i) slower and with longer pauses; (ii) less structured, that is, with fewer repetitions of fundamental frequency sequences; (iii) more "stable", that is, the same low level of regularity is kept constant over time, whereas controls tend to vary the amount of repetition over time.

Discussion: The study points toward the usefulness of non-linear time-series analysis techniques in picking out the subtle differences that characterize the unusual voice of people with schizophrenia and in relating them to symptoms. Automated analysis of voice dynamics shows potential for the assessment and monitoring of the disorder. Future work includes further validation of the approach, as well as more detailed investigation of the relation between speech patterns and other symptoms.
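The classification step described in the Methods can be illustrated with a small sketch: a k-fold cross-validated discriminant rule that assigns each held-out sample to the class whose mean is nearest in Mahalanobis distance. Several simplifications are assumed here and are not taken from the study: only two features are used (so the covariance matrix can be inverted in closed form), the covariance is pooled across classes, folds are not stratified, the data are synthetic, and balanced accuracy is computed as the plain mean of sensitivity and specificity rather than through variational Bayesian inference. All function names are hypothetical.

```python
import random


def mean_vector(rows):
    """Per-feature mean of a list of 2-feature samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_covariance(groups):
    """Pooled within-class 2x2 covariance matrix (an assumption of this sketch)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]


def mahalanobis_sq(x, m, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = x[0] - m[0], x[1] - m[1]
    return (cov[1][1] * d0 * d0
            - (cov[0][1] + cov[1][0]) * d0 * d1
            + cov[0][0] * d1 * d1) / det


def cross_validated_balanced_accuracy(X, y, k=10, seed=0):
    """Return (balanced accuracy, sensitivity, specificity) over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # simple, non-stratified folds
    tp = tn = fp = fn = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        groups = {c: [X[i] for i in train if y[i] == c] for c in (0, 1)}
        cov = pooled_covariance(list(groups.values()))
        means = {c: mean_vector(rows) for c, rows in groups.items()}
        for i in fold:
            # Mahalanobis rule: pick the class with the nearest mean.
            pred = min((0, 1), key=lambda c: mahalanobis_sq(X[i], means[c], cov))
            if y[i] == 1:
                tp += pred == 1
                fn += pred == 0
            else:
                tn += pred == 0
                fp += pred == 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2, sensitivity, specificity


# Synthetic demonstration only: two Gaussian clouds, label 1 = "patient".
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(57)]
X += [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(57)]
y = [0] * 57 + [1] * 57
print(cross_validated_balanced_accuracy(X, y))
```

Balanced accuracy is reported instead of raw accuracy because it remains informative when the classifier favors one group; repeating the procedure with different fold assignments, as the study does over 1000 iterations, indicates how stable the estimate is.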
Schizophrenia International Research Conference, 2014