

Psychometric properties and validation of a four-item version of the Strauss–Carpenter scale in bipolar disorder



Bipolar disorder is a chronic illness that impairs functioning and affects the quality of life of patients. The onset of this illness usually occurs at an early age, and the risk of relapse remains high for decades. Thus, due to the great clinical relevance of identifying long-term predictors of functioning in bipolar disorder, Strauss and Carpenter developed a scale composed of items known to have prognostic value.


To determine the clinical usefulness of the four-item Strauss–Carpenter scale in bipolar disorder, a 1-year prospective follow-up study was carried out. The internal consistency, convergent and discriminant validity, and test–retest reliability of the scale were assessed. We also compared the Strauss–Carpenter scale with three reference scales: the Global Assessment Functioning (GAF), the Clinical Global Impression for Bipolar Disorder, Modified Version (CGI-BIP-M) and the Sheehan Disability Scale (Sheehan). Additionally, a cut-off point for remission was established.


The total sample was composed of 98 patients with a diagnosis of bipolar disorder. The four-item version of the Strauss–Carpenter scale showed appropriate psychometric properties, comparable to those of the reference scales. The best cut-off point for remission was 14.


The four-item version of the Strauss–Carpenter scale has suitable validity and reliability for the assessment of functioning in patients with bipolar disorder.


Bipolar disorder is a chronic illness that affects patients' functioning and quality of life both during manic and depressive episodes and during remission (Rosa et al. 2008; Burdick et al. 2010; Jiménez et al. 2012; Cotrena et al. 2015). Patients show different clinical characteristics, severity, comorbidities and treatment response. The prevalence of bipolar disorder I and II ranges from 2 to 4 % (Kessler et al. 1994, 2005, 2012).

Since the onset of bipolar disorder usually occurs at an early age and the risk of relapse persists for decades until the age of 70 years (Angst et al. 2002), the identification of long-term predictors of bipolar disorder risk is highly relevant to the clinical management and treatment of patients, as well as to the development of preventive strategies (Pinna and Manchia 2014).

In a longitudinal study, Strauss and Carpenter (1972) designed a four-item scale based on four areas of outcome dysfunction which had been used as criteria of outcome in other studies. The authors demonstrated that this scale was effective in predicting outcomes in schizophrenia with a follow-up of 2 years. Some decades later, Poirier et al. (2004) validated the French translation of a revised version of this scale in patients with schizophrenia (SCOCS-R). The scale showed high reliability and validity. Nieman et al. (2013) analyzed an extended version of the Strauss–Carpenter scale (Strauss and Carpenter 1974) and found that it was effective in predicting transition to a first psychotic episode in patients at high risk of psychosis. This scale has been used in many studies on first psychotic episodes that included patients diagnosed with either schizoaffective disorder or bipolar disorder (Melle et al. 2000; Castro-Fornieles et al. 2011; Evensen et al. 2012; Barbeito et al. 2014; Jordan et al. 2014; Parellada et al. 2015). However, although the scale has been translated into Spanish for use in schizophrenia (Ahuir et al. 2009), it has not yet been translated for use in bipolar disorder, despite the fact that Spanish is the second-most spoken language in the world. The objective of this study was to assess the reliability and validity of the four-item Strauss–Carpenter prognostic scale in measuring functioning in patients with bipolar disorder in the Spanish population.



A 1-year longitudinal follow-up study was performed to validate a brief (four-item) version of the Strauss–Carpenter scale for patients with bipolar disorder. The reliability, internal consistency, convergent and predictive validity, and prognostic capacity of the scale were assessed. To this end, the level of correlation between the score obtained on the four-item scale and two reference variables of functioning was evaluated.


A total of 98 patients formed the final sample of the study. All subjects were aged between 18 and 65 years and met the diagnostic criteria for bipolar disorder type I as assessed by the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), Axis I Disorders (SCID-I) (American Psychiatric Association 1994). The study was carried out at Araba University Hospital, Vitoria, Spain. All patients were included after informed consent for participation was obtained. The exclusion criteria were: organic brain disorder, comorbidity with organic mental retardation or clinical decompensation requiring hospitalization in an acute inpatient unit.


The original scale was translated from English to Spanish by two independent bilingual translators who were familiar with the content and purposes of the scale. Each of them made a forward translation of the scale, and the two translations were then merged into the final version. Subjects were evaluated using a protocol that included the following scales. The Strauss–Carpenter prognostic scale (Strauss and Carpenter 1972) was employed to assess the psychosocial functioning of patients; it consists of four items rated from 0 to 4 on a Likert-type scale and yields a total score calculated as the sum of all item scores, with higher scores indicating a better prognosis. The Global Assessment Functioning (GAF) (Endicott et al. 1976) also evaluates the general functioning of patients, and a higher score indicates better functioning. The Clinical Global Impression for Bipolar Disorder, Modified Version (CGI-BIP-M) (Vieta et al. 2002) assesses the severity of the disease, and the Sheehan Disability Scale (Sheehan et al. 1996) assesses functional impairment. The two latter scales are interpreted inversely: higher scores indicate greater severity. All subjects were evaluated using this protocol at baseline and at 1-year follow-up.
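As an illustration only, the scoring rule described above (four items, each rated 0 to 4, summed into a total) can be sketched as follows. The item labels are taken from the Results section; the function name, the dictionary layout and the example ratings are hypothetical and are not part of the study protocol.

```python
# Item labels as they appear in the Results; the identifiers are illustrative.
ITEM_NAMES = ("hospitalization", "work", "social_activity", "symptoms")

def total_score(ratings):
    """Sum the four item ratings after checking each is an integer from 0 to 4."""
    if set(ratings) != set(ITEM_NAMES):
        raise ValueError("expected exactly the four items: " + ", ".join(ITEM_NAMES))
    for name, value in ratings.items():
        if not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")
    return sum(ratings.values())

# Hypothetical ratings for one patient (not study data).
example = {"hospitalization": 4, "work": 3, "social_activity": 3, "symptoms": 4}
example_total = total_score(example)
print(example_total)
```

The total therefore ranges from 0 to 16, with higher values indicating a better prognosis.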

Other relevant clinical and socio-demographic variables were also collected, such as age, gender, civil status, educational level, suicide attempts, substance use or number of hospitalizations and episodes in the last year.

The study was approved by the Ethics and Research Committee of the Araba University Hospital.

Statistical analysis

The internal consistency of the Strauss–Carpenter scale was examined by assessing the homogeneity of items using Cronbach’s alpha.
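For readers unfamiliar with the coefficient, the following minimal sketch computes Cronbach's alpha from a respondents-by-items table using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). It is not the SPSS or R code used in the study, and the ratings shown are hypothetical.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])                        # number of items
    items = list(zip(*rows))                # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)   # sample variances
    total_var = variance([sum(r) for r in rows])          # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical ratings (0-4) for five patients on four items; not study data.
ratings = [
    [4, 3, 3, 4],
    [2, 1, 2, 2],
    [3, 3, 2, 3],
    [1, 0, 1, 2],
    [4, 4, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```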

Convergent validity was calculated using Pearson’s correlation coefficient between the total score on the Strauss–Carpenter scale and scores on the reference scales at baseline (GAF and Sheehan), as we considered these to be continuous variables. The Spearman correlation was used to assess correlations with the CGI-BIP-M scale, as it is an ordinal measure. The predictive validity of the scale was assessed by calculating Spearman correlations between the items of the Strauss–Carpenter scale (since they are ordinal measures) and the reference scales at 1 year. In the case of the total score on the Strauss–Carpenter scale, its relation with the GAF and Sheehan scales was assessed by Pearson correlations. Consistency between values and test–retest reliability of the Strauss–Carpenter scale was evaluated by comparing baseline and 1-year values using intra-class correlation coefficients (ICC).
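The distinction between the two coefficients can be made concrete with a short sketch: Pearson's r is computed on the raw values, whereas Spearman's rho is Pearson's r computed on ranks (with tied values sharing their average rank). The functions and the sample values below are illustrative assumptions, not the study's analysis code or data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for six patients; not study data.
sc_total = [14, 7, 11, 4, 15, 9]     # Strauss-Carpenter total (higher = better)
sheehan = [5, 20, 12, 27, 3, 18]     # treated as continuous (higher = worse)
cgi = [2, 5, 3, 6, 1, 4]             # ordinal severity rating (higher = worse)

r_sheehan = pearson(sc_total, sheehan)
rho_cgi = spearman(sc_total, cgi)
print(round(r_sheehan, 3), round(rho_cgi, 3))
```

Both coefficients are negative in this toy example, which mirrors the inverse interpretation of the Sheehan and CGI-BIP-M scales described above.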

Finally, ROC curves were used to evaluate the discriminant capacity of the scale. The area under the curve (AUC) and cutoff point for remission were also determined.
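The ROC analysis can be sketched as follows. The AUC is computed as the probability that a remitted patient scores higher than a non-remitted one (the Mann–Whitney interpretation), and the cutoff is chosen by maximising Youden's J (sensitivity + specificity − 1). Both the choice of Youden's J as the optimality criterion and the rule "score at or above the cutoff predicts remission" are assumptions made for illustration, since the text does not state the criterion used; the scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sens_spec(scores_pos, scores_neg, cutoff):
    """Classify score >= cutoff as positive; return (sensitivity, specificity)."""
    sens = sum(s >= cutoff for s in scores_pos) / len(scores_pos)
    spec = sum(s < cutoff for s in scores_neg) / len(scores_neg)
    return sens, spec

def best_cutoff(scores_pos, scores_neg):
    """Observed score that maximises Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(scores_pos) | set(scores_neg))
    return max(candidates, key=lambda c: sum(sens_spec(scores_pos, scores_neg, c)) - 1)

# Hypothetical total scores; not study data.
remitted = [16, 15, 14, 14, 13, 12]
not_remitted = [13, 11, 10, 9, 14, 8]

area = auc(remitted, not_remitted)
cutoff = best_cutoff(remitted, not_remitted)
print(area, cutoff, sens_spec(remitted, not_remitted, cutoff))
```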

All statistical analyses were performed using the IBM SPSS statistical software package, version 23, and R 3.1.2 (R Core Team 2014).


Socio-demographic data

Of the 98 patients included in the sample, 66.3 % were men, and the mean age was 29.38 (8.11) years. Most subjects were single (85.4 %) and had primary education (41.2 %); seven (7.5 %) had attempted suicide. Regarding substance use, 27.7 % used alcohol, 48 % smoked cannabis and 28.9 % took other drugs (Table 1).

Table 1 Socio-demographic data at baseline (n = 98)

Psychometric characteristics

Internal consistency

Cronbach’s alpha for the items of the Strauss–Carpenter scale was 0.677.

Convergent validity

The correlation between the total score on the Strauss–Carpenter scale and the reference scales was significant and in the expected direction for both the CGI-BIP-M and the Sheehan scale (p < 0.001) (Table 2).

Table 2 Convergent validity of the four-item Strauss–Carpenter scale

Test–retest reliability

Test–retest reliability was calculated using ICC. The total score on Strauss–Carpenter was found to have a good intra-class correlation (Table 3).
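The text does not state which form of the ICC was used. As one possible reading, the sketch below computes ICC(3,1) (two-way mixed effects, consistency, single measurement) from baseline and follow-up totals via the usual ANOVA mean squares, ICC = (MS_rows − MS_error) / (MS_rows + (k − 1) × MS_error). The choice of this form and the paired scores are assumptions for illustration only.

```python
def icc_consistency(pairs):
    """ICC(3,1): two-way mixed effects, consistency, single measurement."""
    n = len(pairs)               # number of subjects
    k = len(pairs[0])            # number of measurement occasions
    grand = sum(sum(r) for r in pairs) / (n * k)
    row_means = [sum(r) / k for r in pairs]
    col_means = [sum(r[j] for r in pairs) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between occasions
    ss_total = sum((v - grand) ** 2 for r in pairs for v in r)
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

# Hypothetical (baseline, 1-year) total scores for five patients; not study data.
scores = [(14, 13), (7, 9), (11, 11), (4, 6), (15, 14)]
icc = icc_consistency(scores)
print(round(icc, 3))
```

Under the consistency definition, a constant shift between baseline and follow-up (every patient improving by the same amount) does not lower the coefficient; an absolute-agreement ICC would penalise such a shift.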

Table 3 Reliability analysis

Predictive validity

Table 4 shows the correlations between the Strauss–Carpenter scale (each item separately and the total score) and the 1-year values on the reference scales. Both the Social Activity item (Item 3) and the total score on the Strauss–Carpenter scale were significantly correlated with the three reference scales. The strongest correlation was observed with the Sheehan scale (rho = −0.50 and r = −0.57, respectively). The Hospitalization item (Item 1) and the Symptoms item (Item 4) were significantly correlated with both the CGI-BIP-M and the Sheehan scale. Of note, the Hospitalization item was more strongly correlated with the CGI-BIP-M scale (rho = −0.37), whereas the Symptoms item was more strongly correlated with the Sheehan scale (rho = −0.48). Finally, a significant relationship was observed between the Work item (Item 2) and the GAF and Sheehan scales, but not with the CGI-BIP-M. Again, the strongest correlation was observed with the Sheehan scale (rho = −0.38) (Table 4).

Table 4 Correlations between Strauss and Carpenter and the reference scales

Discriminant capacity

The discriminant capacity of the four items of the Strauss–Carpenter scale was assessed using ROC curves. The area under the curve (AUC) was 0.784 (95 % CI 0.695–0.874), which indicates a good discriminant capacity, as it is close to 1, the maximum value (Fig. 1). Moreover, the best balance between sensitivity and specificity (70 and 64.2 %, respectively) was obtained using a cutoff point of 14 in the total score on the four-item Strauss–Carpenter scale.

Fig. 1

ROC curve of the four-item Strauss–Carpenter scale


Currently, most instruments for assessing functional impairment in patients with bipolar disorder are based on global measures. The Global Assessment Functioning (GAF) (Endicott et al. 1976) is the most commonly used tool for the evaluation of functioning. Nevertheless, several studies suggest that this scale might be mediated by symptoms (Samara et al. 2014; Suzuki et al. 2015).

There are few instruments that assess different areas of impairment and also have a prognostic value. The Functioning Assessment Short Test (FAST) validated by Rosa et al. (2007) is divided into six specific areas of functioning: Autonomy, Occupational Functioning, Cognitive Functioning, Finances, Personal Relationships and Leisure. This scale showed strong psychometric properties in the assessment of cognitive impairment in patients with bipolar disorder. Poirier et al. (2004) validated the translation of a revised version of the Strauss–Carpenter scale in schizophrenia. This version consisted of nine items and showed high reliability and validity. Ahuir et al. (2009) analyzed an extended version of the Strauss–Carpenter scale (Strauss and Carpenter 1974) in schizophrenia and obtained high values of validity and reliability. This confirmed its good predictive properties. However, no reliable and valid instrument has been designed yet that assesses functioning and also has a prognostic value in bipolar disorder.

The results obtained in this study showed that the four-item Strauss–Carpenter scale has adequate psychometric characteristics for patients with bipolar disorder, both in terms of reliability and validity. Therefore, this is an adequate instrument with discriminant and prognostic capacity.

Regarding the internal consistency of the scale, a psychometric instrument is generally considered reliable if Cronbach’s α > 0.70 (Bland and Altman 1997). In this study we obtained a value of 0.677, which approaches this threshold for acceptability. Moreover, it must be considered that this coefficient is affected by the length of the scale (Streiner 2003): with only four items, a lower alpha is to be expected than for a longer scale of comparable items.
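The dependence of alpha on scale length can be illustrated with the Spearman–Brown prophecy formula, which projects the reliability of a scale lengthened by a factor m: r_new = m × r / (1 + (m − 1) × r). This calculation was not part of the study; it assumes that any added items would be parallel to the existing ones, and the doubling to eight items is purely hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when the scale length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

alpha_four_items = 0.677   # value reported in the text
# Hypothetical scenario: doubling the scale to eight comparable items.
projected = spearman_brown(alpha_four_items, 2)
print(round(projected, 3))
```

Under these assumptions the projected coefficient exceeds the conventional 0.70 threshold, which is consistent with the argument that the observed value partly reflects the brevity of the scale.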

With regard to convergent validity, we did not observe a significant correlation between the Strauss–Carpenter scale and the GAF scale, whereas the Strauss–Carpenter scale was strongly correlated with both the CGI-BIP-M and the Sheehan scale. Further, as expected, these correlations were inverse.

Ahuir et al. (2009) also demonstrated that the 17-item version of the Strauss–Carpenter scale has a high convergent validity. In this case, the scale correlated significantly with the CGI, the World Health Organization Disability Assessment Schedule (WHO-DAS), the Positive And Negative Syndrome Scale (PANSS) and the Satisfaction With Life Domains Scale (SLDS), and also with the GAF.

Poirier et al. (2004) also obtained a high convergent validity although, in this case, correlation was with the Social and Occupational Functioning Assessment Scale (SOFAS).

In the analysis of the test–retest reliability, the items and total score on the Strauss–Carpenter scale showed adequate to high intra-class correlation coefficients (except for the Hospitalization item). This confirms the stability and consistency of the first assessment of the test. The result for the Hospitalization item could be explained by the fact that patients hospitalized at recruitment were excluded, whereas patients could have been hospitalized during follow-up. Thus, this item could show more variability between test and retest.

Regarding the predictive validity of the four-item scale, the items with the best prognostic value were those that assess social activity and symptoms (Items 3 and 4), which showed a high correlation with the reference scales. The total score on Strauss–Carpenter was also significantly related to the three reference scales, showing a remarkably close relationship with the Sheehan scale (r = −0.573). As expected, this correlation supports the hypothesis on the predictive value of the scale. This agrees with the results published by Ahuir et al. (2009), who observed a significant correlation (p < 0.01) between Strauss and Carpenter and GAF, CGI and WHO-DAS.

Finally, the ROC curves confirmed that the four-item Strauss–Carpenter scale has a high discriminant capacity, with an area under the curve of 0.784 (95 % CI 0.695–0.874). Moreover, we found that a cutoff point of 14 optimizes sensitivity (the probability that the test correctly detects subjects with a poor prognosis of functioning) and specificity (the probability that the test correctly detects subjects with a good prognosis of functioning).

This study has some limitations such as the small sample size and the short duration of follow-up (a year). Future studies should analyze the psychometric properties of the four-item Strauss–Carpenter scale in larger epidemiological samples and with a longer follow-up. Another limitation is that there was no control group in this study. Future studies should include a control group to confirm the results of this study.


The adaptation of the four-item Strauss–Carpenter scale to patients with bipolar disorder has adequate psychometric properties and is an acceptable and very useful instrument, not only for its brevity, but also for its prognostic capacity. This could have great relevance for the clinical and pharmacological management of patients. In addition, the use of this tool facilitates early intervention to prevent an unfavorable course in patients with a worse prognosis.



SCOCS-R: Strauss and Carpenter revised outcome criteria scale

GAF: Global Assessment Functioning

CGI-BIP-M: Clinical Global Impression for Bipolar Disorder Modified Version

ICC: intra-class correlation coefficient

AUC: area under the curve

FAST: Functioning Assessment Short Test

WHO-DAS: World Health Organization Disability Assessment Schedule

PANSS: Positive and Negative Syndrome Scale

SLDS: Satisfaction with Life Domains Scale

SOFAS: Social and Occupational Functioning Assessment Scale


  1. Angst F, Stassen HH, Clayton PJ, Angst J. Mortality of patients with mood disorders: follow-up over 34–38 years. J Affect Disord. 2002;68:167–81.

  2. American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-IV). 4th ed. Washington, DC: APA; 1994.

  3. Ahuir M, Bernardo M, de la Serna E, Ochoa S, Carlson J, Escartín G, et al. Adaptación y validación española de la Escala Pronóstica para la Esquizofrenia de Strauss y Carpenter. Revista de Psiquiatría y Salud Mental. 2009;2(4):150–9.

  4. Barbeito S, Vega P, Ruiz de Azua S, Balanza-Martinez V, Colom F, Lorente E, et al. Integrated treatment of first episode psychosis with online training (e-learning): study protocol for a randomised controlled trial. Trials. 2014;15:416.

  5. Bland JM, Altman DG. Cronbach’s alpha. BMJ. 1997;314:572.

  6. Burdick KE, Goldberg JF, Harrow M. Neurocognitive dysfunction and psychosocial outcome in patients with bipolar I disorder at 15-year follow-up. Acta Psychiatr Scand. 2010;122(Suppl 6):499–506.

  7. Castro-Fornieles J, Baeza I, de la Serna E, Gonzalez-Pinto A, Parellada M, Graell M, et al. Two-year diagnostic stability in early-onset first-episode psychosis. J Child Psychol Psychiatry. 2011;52(Suppl 10):1089–98.

  8. Cotrena C, Branco LD, Shansis FM, Fonseca RP. Executive function impairments in depression and bipolar disorder: association with functional impairment and quality of life. J Affect Disord. 2015;190:744–53.

  9. Endicott J, Spitzer RL, Fleiss JL, Cohen J. The global assessment scale. A procedure for measuring overall severity of psychiatric disturbance. Arch Gen Psychiatry. 1976;33:766–71.

  10. Evensen J, Røssberg JI, Barder H, Haahr U, Hegelstad WT, Joa I, et al. Apathy in first episode psychosis patients: a 10 year longitudinal follow-up study. Schizophr Res. 2012;136(Suppl 1–3):19–24.

  11. Jiménez E, Arias B, Castellví P, Goikolea JM, Rosa AR, Fañanás L, et al. Impulsivity and functional impairment in bipolar disorder. J Affect Disord. 2012;136(Suppl 3):491–7.

  12. Jordan G, Lutgens D, Joober R, Lepage M, Iyer SN, Malla A. The relative contribution of cognition and symptomatic remission to functional outcome following treatment of a first episode of psychosis. J Clin Psychiatry. 2014;75(Suppl 6):e566–72.

  13. Kessler RC, McGonagle KA, Zhao S, Nelson CB, Hughes M, Eshleman S, et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States. Results from the National Comorbidity Survey. Arch Gen Psychiatry. 1994;51(1):8–19.

  14. Kessler RC, Chiu WT, Demler O, Merikangas KR, Walters EE. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Arch Gen Psychiatry. 2005;62(6):617–27.

  15. Kessler RC, Petukhova M, Sampson NA, Zaslavsky AM, Wittchen H-U. Twelve-month and lifetime prevalence and lifetime morbid risk of anxiety and mood disorders in the United States. Int J Methods Psychiatr Res. 2012;21(3):169–84.

  16. Melle I, Friis S, Hauff E, Vaglum P. Social functioning of patients with schizophrenia in high-income welfare societies. Psychiatr Serv. 2000;51(Suppl 2):223–8.

  17. Nieman DH, Velthorst E, Becker HE, de Haan L, Dingemans PM, Linszen DH, et al. The Strauss and Carpenter prognostic scale in subjects clinically at high risk of psychosis. Acta Psychiatr Scand. 2013;127(Suppl 1):53–61.

  18. Parellada M, Castro-Fornieles J, Gonzalez-Pinto A, Pina-Camacho L, Moreno D, Rapado-Castro M, et al. Predictors of functional and clinical outcome in early-onset first-episode psychosis: the child and adolescent first episode of psychosis (CAFEPS) study. J Clin Psychiatry. 2015;76(Suppl 11):e1441–8.

  19. Pinna M, Manchia M. Prognostic models in bipolar disorder: can the prediction of the long-term clinical course rely on the integration of clinical and molecular data? Biomark Med. 2014;8(Suppl 3):371–4.

  20. Poirier S, Bureau V, Lehoux C, Bouchard RH, Maziade M, Pelletier S, et al. A factor analysis of the Strauss and Carpenter revised outcome criteria scale: a validation of the French translation. J Nerv Ment Dis. 2004;192(Suppl 12):864–7.

  21. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2014.

  22. Rosa AR, Sánchez-Moreno J, Martínez-Aran A, Salamero M, Torrent C, Reinares M, et al. Validity and reliability of the functioning assessment short test (FAST) in bipolar disorder. Clin Pract Epidemiol Ment Health. 2007;3:5.

  23. Rosa AR, Franco C, Martinez-Aran A, Sanchez-Moreno J, Reinares M, Salamero M, et al. Functional impairment in patients with remitted bipolar disorder. Psychother Psychosom. 2008;77:390–2.

  24. Samara MT, Engel RR, Millier A, Kandenwein J, Toumi M, Leucht S. Equipercentile linking of scales measuring functioning and symptoms: examining the GAF, SOFAS, CGI-S, and PANSS. Eur Neuropsychopharmacol. 2014;24(Suppl 11):1767–72.

  25. Sheehan DV, Harnett-Sheehan K, Raj BA. The measurement of disability. Int Clin Psychopharmacol. 1996;11(Suppl 3):89–95.

  26. Strauss JS, Carpenter WT Jr. The prediction of outcome in schizophrenia. I. Characteristics of outcome. Arch Gen Psychiatry. 1972;27(Suppl 6):739–46.

  27. Strauss JS, Carpenter WT Jr. The prediction of outcome in schizophrenia. II. Relationships between predictor and outcome variables: a report from the WHO international pilot study of schizophrenia. Arch Gen Psychiatry. 1974;31(Suppl 1):37–42.

  28. Suzuki T, Uchida H, Sakurai H, Ishizuki T, Tsunoda K, Takeuchi H, et al. Relationships between global assessment of functioning and other rating scales in clinical trials for schizophrenia. Psychiatry Res. 2015;227(Suppl 2–3):265–9.

  29. Streiner DL. Starting at the beginning: an introduction to coefficient alpha and internal consistency. J Pers Assess. 2003;80(1):99–103.

  30. Vieta E, Torrent C, Martínez-Arán A, Colom F, Reinares M, Benabarre A, et al. A user-friendly scale for the short and long term outcome of bipolar disorder: the CGI-BP-M. Actas Esp Psiquiatr. 2002;30(Suppl 5):301–4.


Authors’ contributions

All authors collaborated in the recruitment of patients. GP developed the hypothesis and design of the study and managed literature searches. SA undertook statistical analysis and wrote the draft manuscript, and all authors supervised and made contributions to it. GP reviewed the final draft thoroughly and was responsible for the last version. All authors read and approved the final manuscript.


We would like to thank the following institutions: the Spanish Government, cofinancing FEDER, Carlos III Health Institute (PI13/00451, PI12/02077, PS09/02002); the Basque Foundation for Health Innovation and Research (BIOEF); Networking Center for Biomedical Research in Mental Health (CIBERSAM) (13BICIB04); the University of the Basque Country (GIC12/84); local grants from the Department of Education, Linguistic Policy and Culture of the Basque Country Government (2013111162) and “FI-STAR” (FI-STAR project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement No 604691) (UE/2012/FI-STAR). The psychiatric research department in University Hospital Araba is supported by the Stanley Research Foundation (03-RC-003). We want to thank all patients with bipolar disorder who agreed to collaborate in this study.

Competing interests

Dr. González-Pinto has received grants and served as consultant, advisor or CME speaker for the following entities: Almirall, AstraZeneca, Bristol-Myers Squibb, Cephalon, Eli Lilly, Glaxo-Smith-Kline, Janssen-Cilag, Jazz, Johnson & Johnson, Lundbeck, Merck, Otsuka, Pfizer, Sanofi-Aventis, Servier, Schering-Plough, Solvay, Rovi, Roche, Ferrer, the Spanish Ministry of Science and Innovation (CIBERSAM), the Ministry of Science (Carlos III Health Institute), the Basque Government, the Stanley Medical Research Institute, and Wyeth.

The rest of the authors have no competing interests to declare.

Author information

Correspondence to Ana González-Pinto.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Alberich, S., Barbeito, S., González-Ortega, I. et al. Psychometric properties and validation of a four-item version of the Strauss–Carpenter scale in bipolar disorder. Int J Bipolar Disord 4, 22 (2016).



  • Strauss–Carpenter
  • Functioning
  • Bipolar
  • Prognosis
  • Outcome