The course of bipolar disorder in pregnant versus non-pregnant women

Background and rationale Although it has been suggested that pregnancy may influence the course of bipolar disorder (BD), studies show contradictory results. Until now, no studies included a finegrained validated method to report mood symptoms on a daily basis, such as the lifechart method (LCM). The aim of the present study is to investigate the course of BD during pregnancy by comparing LCM scores of pregnant and non-pregnant women. Methods Study design: Comparison of LCM scores of two prospective observational BD cohort studies, a cohort of pregnant women (n = 34) and a cohort of non-pregnant women of childbearing age (n = 52). Main study parameters are: (1) proportions of symptomatic and non-symptomatic days; (2) symptom severity, frequency, and duration of episodes; (3) state sequences, longitudinal variation of symptom severity scores. Results No differences in clinical course variables (symptomatic days, average severity scores, frequency, and duration of episodes in BD were found between pregnant and non-pregnant women. With a combination of State Sequence Analysis (SSA) and cluster analysis on the sequences of daily mood scores three comparable clusters were found in both samples: euthymic, moderately ill and severely ill. The distribution differences between pregnant and non-pregnant women were significant, with a majority of the pregnant women (68%) belonging to the moderately ill cluster and a majority of the non-pregnant women (46%) to the euthymic cluster. In pregnant women the average daily variation in mood symptoms as assessed with Shannon’s entropy was less than in non-pregnant women (respectively 0.43 versus 0.56). Conclusions Although the use of daily mood scores revealed no difference in overall course of BD in pregnant versus non-pregnant women, more pregnant than non-pregnant women belonged to the moderately ill cluster, and during pregnancy the variation in mood state was less than in non-pregnant women. Further research is necessary to clarify these findings.


Introduction and rationale
Bipolar disorder (BD) is a recurrent mental illness characterized by depressive, hypomanic, and/or manic episodes separated by euthymic intervals and usually manifests in young adulthood (Goodwin and Jamison 2007). The lifetime prevalence ranges from 1.3% to 2.4% (Merikangas et al. 2011;Graaf et al. 1996).
Patients with BD show considerable illness-related morbidity (Post et al. 2003;Ferrari et al. 2016) and the disorder significantly influences their wellbeing and social, occupational, and general functioning (Altshuler et al. 2006;Bonnin et al. 2012;Parker et al. 2018).
In clinical practice, women with BD often ask their physician about the impact of pregnancy on the course of their illness. However, research into the relation between pregnancy and course of BD is scarce. Unlike repeated findings that the postpartum period has a negative influence on the course of BD, with an overall risk of relapse of 35% (66% or medication-free women and 23% for women who used prophylactic medication) (Wesseloo et al. 2016), the impact of pregnancy itself on the course BD is still uncertain, and various studies have reported conflicting results (McNeil et al. 1984;Sharma and Persad 1995).
While some population-based studies suggest that pregnancy could be protective with low rates of new onset and relapse during this period (Munk-Olsen et al. 2006, clinical studies provide conflicting findings. Most of the older studies are retrospective, and most of the prospective studies report high recurrence rates in women who discontinue mood stabilizers (Sharma and Pope 2012).
In a retrospective study (Viguera et al. 2011), clinical data were pooled of 2252 pregnancies of women with BD and unipolar depression. Rates of affective episodes and risk factors were identified during pregnancy and the postpartum period. Among women with BD, 23% had illness episodes during pregnancy, compared to 4.6% of women with unipolar depression. Risk factors were younger age at illness onset, previous postpartum episodes, shorter duration of illness, having fewer children, and not being married. Freeman et al. (2002) interviewed 30 women with BD after pregnancy with a structured clinical interview, and found that 15 (50%) reported no change or fewer mood symptoms during pregnancy, while the other half reported more symptoms. The experience of worsening of mood symptoms during pregnancy also predicted postpartum recurrence (Freeman et al. 2002). A limitation of this study is that the assessment was retrospective and thus prone to recall bias.
Grof et al. found a protective effect of pregnancy on the frequency and duration of mood episodes in a sample of 28 women with BD (with 56 pregnancies), who had become pregnant prior to receiving successful lithium prophylactic treatment (Grof et al. 2000). Retrospectively, they compared illness severity during the 9 months of pregnancy with the 9 months before pregnancy intraindividually. The recurrence risk during pregnancy was markedly lower and recurrences of mood episodes were significantly shorter during pregnancy in comparison to pre-pregnancy.
In a recent review of the influence of pregnancy on the course of BD we concluded that despite the importance of the topic there is a paucity of evidence on recurrence rates of mood episodes during pregnancy among women with BD (Stevens et al. 2019). Another review also stated that the literature cannot answer the question of how pregnancy affects the course of BD, but merely informs us about the effect of discontinuation of medication in pregnancy (Salim et al. 2018). Retrospective studies are more sensitive to recall bias and results are therefore less reliable, while prospective studies focused mainly on the effect of discontinuation of medication and hardly on the relation between pregnancy and course of illness.
Moreover, none of these studies used a detailed mood monitoring method to assess course of illness during pregnancy. The LifeChart Method (LCM) Denicoff et al. 2000) is a prospective assessment of fluctuations and severity in mood on a daily basis, resulting in more precision, with less risk for recall bias as compared with retrospective self reported data that were collected in most studies (Draisma et al. 2015).
The aim of the present study is to prospectively investigate the relationship between pregnancy and the course of BD.

Methods
LCM-data of two observational, prospective cohort studies were used to compare the course of BD in pregnant versus non-pregnant women.

Study samples
The research samples were (1) pregnant Dutch women with an established DSM-IV diagnosis of BD, who participated in the Sleepreg-BD study between 2012 and 2018 (n = 34), and (2) non-pregnant Dutch women of childbearing age with an established DSM-IV diagnosis of BD, who participated in The Stanley Foundation Bipolar Treatment Outcome Network study between 1995 and 2000 (n = 52).
The Sleepreg-BD study was a multi-site study in the Netherlands investigating the effect of sleep disturbance in pregnancy and the perinatal period on postpartum psychopathology in Dutch women with BD (Stevens et al. 2014). Pregnant women, aged 18-45 year, with a diagnosis of BD were asked to fill in the LCM from week 13 of pregnancy till 12 weeks postpartum. To avoid including women whose pregnancy ended prematurely, we recruted pregnant women at the end of the first trimester of pregnancy. To investigate the illness course during pregnancy, for this study LCM postpartum scores were not included in the analysis. The LCM was completed by 46 women, of these 34 met the inclusion criterion of at least 60 days of LCM reports.
The Stanley Foundation Bipolar Network (SFBN) was a multi-site research program coordinated at the National Institute of Mental Health (NIMH), with four clinical centers in the USA, two in Germany and one in the Netherlands, as described in detail elsewhere Post et al. 2001). Its main aim was to evaluate the long-term illness course and define longitudinal illness patterns, and to investigate the effectiveness of conventional and novel pharmacological treatments in a large group of patients with BD. The SFBN database consisted of over 900 patients, including n = 174 from the Netherlands. For this current study we included data of the 52 non-pregnant women, age 18-45 year, and included in the Netherlands, who had completed at least one full year of LCM. Since the pregnant women delivered a maximum of 272 daily severity scores, we selected the 272 completed last lifechart ratings from the first year of prospective follow-up of the SFBN women for comparison with the Dutch women of the Sleepreg-BD study.

Assessment instruments and outcome variables Patient and clinician questionnaires:
The Stanley Foundation Bipolar Treatment Outcome Network designed two questionnaires to collect basic demographic and clinical data from both patients and clinicians. These questionnaires (Questionnaire for Bipolar disorder, QBP) generate a comprehensive overview of the characteristics of participants as well as illness and treatment history Suppes et al. 2001), and were translated into Dutch.
QBP reports of medication use were transformed into categories (0: no medication; 1: one mood stabilizer; 2: more than one mood stabilizer; 3: mood stabilizer and other psychotropic medications; 4: only other psychotropic medications).

LCM
The LCM provides a graphic representation of minor mood swings and major mood episodes, and can be used both retrospectively and prospectively (Denicoff et al. 2000(Denicoff et al. ,1997. The impact of daily mood symptoms on functioning is rated on a 5-point scale (0 = no dysfunction or euthymia, 1 = mild, 2 = low moderate, 3 = high moderate, 4 = severe dysfunction). Since severity is rated for two poles: mania and depression, this results in nine different ratings. Patients of both cohort studies were asked to complete a prospective LCM on a daily basis during the entire study period. Illness course variables could be calculated, thus allowing longitudinal assessment of illness patterns (Born et al. 2009;Kupka et al. 2007;Nolen et al. 2004). Relevant variables were: number, duration, and severity of mood symptoms and mood episodes, and proportion of time ill during the observation period. A validation study reported high correlations between LCM ratings and ratings on the Young Mania Rating Scale (YMRS) (r = 0.656, p < 0.001) and the Inventory of Depressive Symptomatology-Clinician (IDS-C) (r = 0.875, p < 0.001) (Denicoff et al. 2000). Draisma et al. found a Spearman's Rho of 0.61 between LCM depression scores and Clinical Global Impression (CGI-BD) rated depression, and a Rho of 0.63 between LCM mania scores and CGI-C rated mania in a sample of Dutch patients with BD (n = 137), denoting a strong association between LCM ratings and CGI scores (Draisma et al. 2015).
Thus, main study parameters where threefold: (1) demographical variables at baseline: age, marital status, educational level and work status at baseline; (2) clinical variables at baseline consisting of diagnosis, illness duration, age of onset, number of lifetime manic and depressive episodes, number of hospitalizations, use of medication, lifetime alcohol/drugs abuse, number of serious suicide attempts; and (3) LCM derived clinical variables such as proportion of time ill or impaired, number of days with scores not equal to five (i.e. the euthymic state), average illness severity scores, average duration of episodes, and frequency of episodes.

Statistical analysis
Analyses were done in four steps: (1) demographic and clinical characteristics of the pregnant and non-pregnant samples were compared with descriptive statistics; (2) complete series of illness states -the LCM mood scores -were analyzed and typified for the two samples with the use of state sequence analysis (SSA); (3) cluster analysis was performed on the state sequences; and (4) two regression analyses were done, one on clusters with a set of predictors and another on variation in daily states with the same set of predictors.
The set of predictors were illness duration, marital status, work status, educational level, use of medication (Viguera et al. 2011;Akdeniz et al. 2003).
Differences in demographical and clinical variables were analyzed with descriptive statistics, using t-tests for continuous variables with normal distribution, Mann-Whitey tests for non-normal continuous variables and chi square tests for categorical variables. Descriptive analyses were performed in SPSS version 27. Missing lifechart scores of the 34 pregnant women within the observation period (less than 0.3% of the data) were imputed through intrapolation of the surrounding daily scores.
Complete series of illness states as expressed in fluctuations of daily severity scores of the pregnant and non-pregnant women were analysed with SSA (Roux et al. 2019;Gabadinho et al. 2011). The time series of a maximum of 272 daily severity scores resulted in socalled sequences of successive states. The goal of SSA is to describe complete sequences of events as trajectories of subjects though possible states. It is a non-parametric approach with no assumptions regarding underlying processes and aimed at description of sequences as a whole (Courgeau 2018).
Dissimilarities between state sequences were calculated with a dissimilarity measure based on the number of operations necessary to translate a specific sequence into another. The similarities and differences were based on optimal matching techniques. Subsequently, clustering methods were applied to build similar types of sequences (Abbott 1990). By clustering sequences, groups were formed that were as homogenous as possible within the group and as different as possible from other groups. In the analysis, hierarchical clustering was applied using Ward's linkage on the distance matrix. The clustering procedures resulted in a typology of sequence courses. Sequences within a cluster had the lowest dissimilarity scores with each other, and between clusters dissimilarity was optimal. Distributions of women over the clusters found in the sample of pregnant women (n = 34) were subsequently compared to those found in the non-pregnant women (n = 52).
Shannon entropy, an indicator of diversity of states (~ severity scores) within sequences was calculated for all day. Entropy can vary between 0 (all sequences in the same state at time T), to 1 (maximum diversity at time T).
Multinomial regression of clusters on the predictors was performed. In the same vain regression of entropy on this set of covariates was applied.
The package TraMineR (version 2.2-0.1) within R-software was used for sequence analysis.

Results
First, descriptive statistics with respect to the comparisons of sample characteristics are presented in Tables 1, 2, 3. Next the distribution of states over days is presented in an index plot for all women in Fig. 1. Cluster solutions for both samples are also presented graphically as index plots in Fig. 2. Mean time spent in a cluster is given in Fig. 3. Results of multinomial regression of clusters in the combined set of samples on predictors are provided in Table 4. Finally, results of regression of entropy on the same set of predictors are presented.
No age differences were present between the samples. Pregnant women were significantly more often married, attained higher educational levels and were more often unable to work (Table 1). Pregnant women had less previous hospitalisations and used less medication than the non-pregnant group (Table 2). Of the pregnant women three women used antidepressant medication without a mood stabilizer, while in the non-pregnant group none of the women used an antidepressant without a mood stabiliser. No significant differences were found regarding lifetime illness duration, age of onset, lifetime number of experienced manic or depressive episodes, lifetime substance abuse or dependence, and number of attempted suicides.
Comparison of the course of BD in both samples showed no significant differences in LCM variables (Table 3) regarding episodes or number of ill days during the observation episode.

Sequence analysis results
Distributions of proportions of severity scores per day were plotted for the whole observation periods in an index plot (Fig. 1). Figure 1 shows that most time of women with BD was spent in the euthymic state. This state covered the largest surface per day during the whole observation period. The percentage of women in this state was the highest, followed by 'mildly depressed' and 'mildly manic' . Figure 2 gives the graphic results of a cluster analysis with three clusters for the separate samples.
Women in the clusters 'severely ill' of both samples showed sequences with mainly severe and moderate depressive or manic days and few euthymic states. The second set of clusters, 'moderately ill' contained state sequences with some illness, days with mild to moderately ill days with respect to both mania as well as depression. Finally, clusters 'euthymic' were typified by sequences with mainly euthymic days, the main area of these clusters were taken up by euthymic states. This three cluster solution shows that the severely ill cluster in the non-pregnant sample consisted of 21% (11/52) of the participants and in the pregnant sample 18% (6/34) of the participants. The euthymic cluster in the pregnant sample consisted of 16% (5/34) the women, whereas this euthymic cluster in the non-pregnant sample was larger, consisting of 46% (24/52) of the women. The majority of the pregnant women were clustered as 'moderately ill': the percentage in this cluster is twice as high as the percentage of non-pregnant women in this cluster (68% and 32%, respectively). These distribution differences between samples were significant: χ 2 = 48.02, df = 2, p-value < 0.001.  Number of days spent in each of the nine states obviously differed over the three clusters. Evidently, more days were spent in the euthymic state in the euthymic cluster as compared to the other clusters. Interestingly, daily scores were not normally distributed over mania and depression. There was a tendency that more days were rated as depressed (left side of the cluster figures) than manic (right side of the figures) (Fig. 3).
The level of entropy of scores, i.e., the diversity of states on each day for both samples, for the non-pregnant women were higher (average 0.56) than for the pregnant women (average 0.43).
In multinomial regression the dependent variable (cluster) has three different categorical values. Regression of the three clusters on three demographic variables (educational level, marital status and work) and two clinical variables (medication, illness duration) showed only one significant relation, namely between educational level: the odds of going from cluster 1 (severely ill) to cluster 2 (moderately ill). Thus, the higher the level of education, the higher the odds of going from severely ill to moderately ill (Table 4).
Subsequently, the results of regression of entropy (n = 86) on the same set of predictors is shown (Table 5). The only covariate that produced a significant effect in regression of entropy on predictors was educational level. This suggests that the lower the educational level, the more instability occurred in the course of BD.

Discussion
While previous prospective studies have investigated risk of, and/or time to, recurrence of mood episodes during pregnancy in BD (Viguera et al. 2007;Bergink et al. 2012), this is the first study comparing fine-graded prospective illness course with the lifechart method (LCM) in pregnant and non-pregnant women with BD. No differences were found in illness severity variables, such as number of days ill, including days depressed, hypomanic, or manic, or average severity scores. However, with a cluster analysis of LCM data within both samples to reveal longitudinal illness patterns, within the study period, more pregnant women were moderately ill, whereas more nonpregnant women were euthymic.
In addition, pregnant women showed less variation of mood states than non-pregnant women. Women with a higher educational level in both samples were more likely to belong to a cluster moderately ill than to a cluster severely ill. An explanation for the relation between educational level and less mood instability could be that more adequate psycho-education is obtained by higher educated women. The higher educational level in the cluster moderately ill than in the cluster severely ill could be due to the fact that those women had a milder course of BD and therefore were able to get a higher education level.
Constructing typologies/groups, according to temporal data, of patients with BD has been done by other investigators. Post et al. determined the severity of illness in the first 258 outpatients in the SFBN who had 1 year of prospective LCM ratings (Post et al. 2003). Their typologies contain groups of patients who remained severely and almost continuously ill (26.4%), intermittently ill (40.7%), or minimally ill (32.9%) over the course of that year. The patterns of the ratings were visually assigned to the three groups by two independent investigators. Nowadays, with the application of state sequence analysis, clustering methods can be used to build similar types of illness course and so defining groups that are as homogenous as possible on the one hand and as different as possible from other groups on the other. This clustering method does not depend on subjective clinician ratings, but can be applied automatically using algorithms, enhancing reliability.
A possible explanation for the difference between our two samples, especially the higher proportion of euthymic days in non-pregnant women versus pregnant women, could be that more non-pregnant women used psychotropic medication during the observation period. The use of medication prevents recurrences in pregnant and non-pregnant women (Stevens et al. 2019;Larsen and Saric 2017;Kishi et al. 2020;Geddes and Miklowitz 2013). In more detail, in our study 23.5% of pregnant women used no medication at all, compared to none (0%) of the non-pregnant women, whereas 38.5% of the nonpregnant women used more than one mood stabilizer compared to none of the pregnant women. Although the use of antidepressant medication without a mood stabilizer is not recommended in bipolar disorder because of the risk of mood instability and switch to mania (Pacchiarotti et al. 2013), three pregnant women but none of the non-pregnant women used an antidepressant without a mood stabilizer.
All women in both samples received psychiatric treatment and most of the pregnant women had pre-pregnancy consultations. It is possible that women with a more severe BD, who would not stop medication because of fear for recurrence, decided not to attempt to conceive, which could explain why the number of medications used in the pregnant women in our study was relatively low.
In general, women with a childwish are recommended to be stabilized for a period of at least six months before becoming pregnant (Thomson and Sharma 2018). In a study of 70 women with BD, 45% who sought consultation for treatment options and risks during pregnancy  had been advised to avoid pregnancy by a health care professional before consultation (Viguera et al. 2002).
After the consultation 37% chose to avoid pregnancy with the most commonly reported reason being fears of adverse effect of medication on the development of the fetus (56%) or concerns of genetic transmission of BD to offspring (22%). There still is a paucity of systematic data on the effects of pregnancy on the course of BD. In our study, with use of daily mood monitoring, no differences were found in the course of BD in pregnant versus non-pregnant women. Comparing trajectories may be a better way to study the effect of pregnancy on the course of BD. A major challenge would be to investigate the role of pregnancy on the 'naturalistic' (i.e. untreated) course of BD, since it is recommended to prescribe some form of preventive medication in patients with BD also during pregnancy (Yatham et al. 2018).
Our study has several strengths. It is the first study comparing prospective daily mood ratings in a sample of pregnant and non-pregnant women with BD. The samples did not differ in demographic characteristics such as nationality and age, nor in clinical variables such as type of diagnosis (BD I or II), illness duration, age of onset, lifetime number of manic and depressive episodes, lifetime substance abuse, and serious suicide attempts. Also, this is the first study using SSA and clustering methods on LCM data to reveal different clusters of illness course.
However, there are also several limitations. Since the design consists of two cohorts from different studies, and cases were not assigned at random or via case control procedures, the comparability of both samples remains uncertain. The pregnant women had a higher educational level and were more often married than the non-pregnant women. The use of medication differed between the two cohorts, with a potential impact on the course of BD. Also, LCM data from pregnant women were reported from week 12 of pregnancy untill giving birth, hence data on the first trimester are not included. An obvious limitation of this study is the relatively small sample sizes of pregnant and non-pregnant women. Selection bias cannot be ruled out completely: the fact that pregnant women more often belonged to the moderately ill cluster and non-pregnant women to the euthymic cluster may be due to the possibility that pregnant moderately ill women were more likely to participate, or that euthymic non-pregnant women were more inclined to enter such a study than moderately ill non-pregnant women. Randomization in a study with regard to a comparison of pregnant and non-pregnant women with BD however is not feasible. Obviously more research with larger sample sizes about the effect of pregnancy on the course of BD is required. However, the fact that these small samples did not differ with respect to descriptive clinical course variables, such as number of manic and number of ill days, yet did show clear differences in illness pattern during the observation period adds further strength to the approach of sequential analysis of daily mood monitoring in BD, even with small sample sizes.

Conclusion
No differences in average values of clinical course variables in BD were found among pregnant women compared to non-pregnant women. The application of SSA to reveal patterns in the overall course in the observational period did show differences in proportions of pregnant versus non-pregnant women distributed over three clusters of sequences. More pregnant women showed a moderately ill pattern of daily mood scores whereas more non-pregnant women showed a euthymic pattern. One explanation is that more non-pregnant women used (more) psychotropic medications than pregnant women, which may have a protective effect. Pregnant women showed less variation in mood than non-pregnant women.
To answer the question whether pregnancy influences the course of BD, ideally, we need large, prospective case-control studies comparing pregnant and non-pregnant women. Another option is to compare within individuals the course of BD during pregnancy and during the year before.