Smartphone-based activity measurements in patients with newly diagnosed bipolar disorder, unaffected relatives and control individuals

Background: In DSM-5, activity is a core criterion for diagnosing hypomania and mania. However, there are no guidelines for quantifying changes in activity. The objectives of the study were (1) to validate daily smartphone-based self-reported and automatically generated activity, respectively, against validated measurements of activity; (2) to validate daily smartphone-based self-reported activity and automatically generated activity against each other; and (3) to investigate differences in daily self-reported and automatically generated smartphone-based activity between patients with bipolar disorder (BD), unaffected relatives (UR) and healthy control individuals (HC). Methods: A total of 203 patients with BD, 54 UR, and 109 HC were included. Using a smartphone-based app, the participants reported their activity level daily on a scale from −3 to +3. Additionally, participants owning an Android smartphone provided automatically generated data, including step counts, screen on/off logs, and call and text logs. Smartphone-based activity was validated against an activity questionnaire, the International Physical Activity Questionnaire (IPAQ), and against activity items on observer-based rating scales of depression (Hamilton Depression Rating Scale, HAMD), mania (Young Mania Rating Scale, YMRS) and functioning (Functional Assessment Short Test, FAST). In these analyses, we calculated averages of smartphone-based activity measurements reported in the period corresponding to the days assessed by the questionnaires and rating scales. Results: (1) Smartphone-based self-reported activity was a valid measure according to scores on the IPAQ and activity items on the HAMD and YMRS, and was associated with FAST scores, whereas the majority of automatically generated smartphone-based activity measurements were not. (2) Daily smartphone-based self-reported activity correlated with nearly all automatically generated activity measurements.
(3) Patients with BD had decreased daily self-reported activity compared with HC. Patients with BD also had decreased physical activity (number of steps) and social activity (more missed calls) but longer call duration compared with HC. UR also had decreased physical activity compared with HC but did not differ on daily self-reported activity or social activity. Conclusion: Daily self-reported activity measured via smartphone represents overall activity and correlates with measurements of automatically generated smartphone-based activity. Detecting activity levels using smartphones may be clinically helpful in diagnosis and illness monitoring in patients with bipolar disorder. Trial registration: clinicaltrials.gov NCT02888262

Background
Activity and energy level are core symptoms of bipolar disorder (BD) (Kupfer et al. 1974). Hypomanic and manic episodes are characterized by increased energy, enhanced engagement in social activities and increased psychomotor activity (Carlson and Goodwin 1973; Faurholt-Jepsen et al. 2015; Frye et al. 2009), whereas depressive episodes are often associated with loss of energy, withdrawal from social activities and psychomotor retardation or agitation (Lewinsohn and Graf 1973; Sobin and Sackeim 1997). Several studies have suggested that increased activity is a persistent and prominent symptom of (hypo)mania (Bauer et al. 1991; Benazzi 2007; Cheniaux et al. 2014), whereas decreased activity has been reported in patients with BD during depressive episodes and remission compared with healthy control individuals (HC) (Crescenzo et al. 2017; Faurholt-Jepsen et al. 2012; Scott et al. 2017). The relevance of increased activity in hypomania was recently stressed in the DSM-5, in which elevated activity or energy is a mandatory core criterion of (hypo)mania in addition to elevated or irritable mood (DSM-5 2013); this change has resulted in a substantial reduction in the prevalence of (hypo)manic episodes diagnosed with DSM-5 compared with DSM-IV (Fredskild et al. 2019).
Previous studies investigating activity and energy levels in patients with BD have primarily relied on observer-based ratings, self-reported questionnaires, and wrist- and thorax-worn accelerometers/heart rate sensors (Faurholt-Jepsen et al. 2012; Krane-Gartiser et al. 2014). Retrospective questionnaires are prone to recall bias (Stone et al. 2003), and self-reported physical activity is often overestimated compared with objective measurements (Vancampfort et al. 2016). Further, wrist-worn accelerometers rely on a single parameter. Smartphones can collect subjective as well as objective measurements of activity relatively unobtrusively in naturalistic settings, and they provide a platform on which several parameters reflecting activity can be combined. Several studies have found it feasible to collect daily smartphone-based self-reports of activity in real time from patients with BD (Faurholt-Jepsen et al. 2015; Matthews et al. 2016; Tsanas et al. 2017). Similarly, automatically generated smartphone-based data might capture changes in activity reflected in speech, mobility and social interaction in patients with BD (Faurholt-Jepsen et al. 2015; Beiwinkel et al. 2016; Grünerbl et al. 2012; Palmius et al. 2017; Rohani et al. 2018). Our group recently found that, by applying advanced machine learning algorithms to automatically generated smartphone-based data, including screen features and call and text logs, it was possible to discriminate between patients with BD and HC. These findings suggest that automatically generated smartphone-based data may represent a potential diagnostic marker for bipolar disorder that may become clinically useful in the future (Faurholt-Jepsen et al. 2019). Nevertheless, the validity of daily self-reported and automatically generated smartphone-based activity has not been systematically investigated.
Further, it has not been investigated whether daily self-reported and automatically generated smartphone-based activity differs between patients with newly diagnosed BD, unaffected first-degree relatives (UR), and HC. In this study, all patients with BD were included irrespective of their mood state.

Aims of the study
The present study had three aims. Firstly, to validate daily smartphone-based self-reported and automatically generated activity, respectively, against validated measurements of activity, including (1) a validated questionnaire for physical activity, (2) activity assessed by trained clinicians according to activity items on validated rating scales of the severity of depression and mania, respectively, and (3) functioning according to clinical assessment with a validated rating system.
Secondly, to validate daily smartphone-based self-reported activity and automatically generated activity against each other.
Thirdly, to investigate differences in daily self-reported and automatically generated smartphone-based activity in patients with newly diagnosed BD, UR, and HC.
We hypothesized that (1) daily smartphone-based self-reported and automatically generated activity represent valid measurements of activity according to questionnaires and clinical ratings of activity; (2) daily smartphone-based self-reported and automatically generated activity are associated; and (3) activity level is decreased in patients with newly diagnosed BD compared with HC individuals, and intermediate in UR individuals.

Study design
The present study is part of the larger ongoing Bipolar Illness Onset study (BIO study) (Kessing et al. 2017). Three groups of participants were included: patients with BD, UR, and HC. All participants underwent the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) interview (Wing et al. 1990), and a diagnosis of BD (or its absence) was established according to the International Classification of Diseases, 10th revision (ICD-10) (WHO 1992).
All participants were assessed at baseline and every year for up to three years. Patients with BD were contacted every third month to identify new ongoing affective episodes. If the patients were in a new ongoing affective episode at the time of contact, they were scheduled for a new appointment with researchers on the BIO-team.

Study participants
Patients with BD: Patients with newly diagnosed BD living in the Capital Region of Denmark are offered a two-year program at the Copenhagen Affective Disorder Clinic, Copenhagen, Denmark. Inclusion criteria were a new diagnosis of BD or of a single manic episode according to ICD-10 and an age of 15-70 years.
Unaffected relatives: Unaffected first-degree relatives (siblings or children) of the patients included in the BIO study were recruited with permission from the patients with BD. Exclusion criteria were any previous or current psychiatric diagnosis with an ICD-10 code below F34.0 (i.e., organic mental disorders, mental and behavioral disorders due to psychoactive substance use including alcohol, schizophrenia or other psychotic disorders, and affective disorders).
Healthy control individuals: Healthy control persons aged 15-70 years were recruited among blood donors from the Blood Bank at Rigshospitalet, Copenhagen. Exclusion criteria were a psychiatric disorder requiring treatment in the individual or in one of their first-degree relatives.
At all visits, three observer-based rating scales and one self-reported questionnaire were administered in addition to daily smartphone-based self-reported and automatically generated activity measures.

Observer-based ratings of activity
In all three groups, the severity of depressive and manic symptoms during the past three days was clinically evaluated using the 17-item Hamilton Depression Rating Scale (HAMD) (Hamilton 1967) and the Young Mania Rating Scale (YMRS) (Young et al. 1978), respectively. On the HAMD-17, we used sub-item 8 addressing psychomotor retardation and sub-item 9 addressing psychomotor agitation. We assumed that an observer-based rating of psychomotor retardation and agitation reflects activity and energy level to some degree: if psychomotor retardation/agitation is rated high, we assume that patients have more difficulty being both physically and socially active. Therefore, we used these sub-items. On the YMRS, we used sub-item 2 evaluating the level of motor activity and sub-item 6 addressing pressure of speech. These items were specifically chosen to investigate whether smartphone-based self-reported and automatically generated activity measurements reflect these clinically assessed activity measurements. The Functional Assessment Short Test (FAST) was included to investigate whether changes in daily smartphone-based self-reported or automatically generated activity are reflected in changes in functioning, as assessed by clinical researchers. The FAST is specifically developed for bipolar disorder and addresses six areas of functioning during the past 14 days: autonomy, occupational functioning, cognitive functioning, financial issues, interpersonal relationships and leisure time. All items are rated from 0 (no difficulties) to 5 (severe difficulties). The test has high test-retest reliability and has been validated against the Global Assessment of Functioning scale (GAF) (Rosa et al. 2007).

Self-reported physical activity questionnaire
Self-reported physical activity level was assessed using the International Physical Activity Questionnaire short form (IPAQ). The IPAQ is a widely used questionnaire addressing the level of physical activity and sedentary behavior (Lee et al. 2011).
The IPAQ provides information regarding time spent at four intensity levels during the past seven days: (1) vigorous-intensity activity, (2) moderate-intensity activity, (3) walking and (4) sedentary behavior (Craig et al. 2003). Summary measurements of overall self-reported physical activity are reported as a continuous variable in metabolic equivalent of task minutes per week (MET-minutes/week), representing the energy expended during physical activity. Higher scores correspond to higher activity. The questionnaire was included to investigate whether smartphone-based self-reported and automatically generated activity is associated with patient-rated physical activity.
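To make the MET-minute summary concrete, the sketch below computes weekly MET-minutes from days per week and minutes per day in each intensity category. The MET weights (3.3 for walking, 4.0 for moderate and 8.0 for vigorous activity) follow the commonly used IPAQ short-form scoring protocol; this is an assumption, since the article does not state which scoring rules or data-cleaning steps were applied. The class and function names are illustrative only.

```python
from dataclasses import dataclass

# ASSUMPTION: MET weights from the commonly used IPAQ short-form scoring
# protocol; the article does not state which scoring rules were applied.
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


@dataclass(frozen=True)
class IpaqCategory:
    """Self-reported activity in one IPAQ intensity category (past 7 days)."""

    days: int               # days in the past week with this activity
    minutes_per_day: float  # usual minutes per day on those days

    def __post_init__(self) -> None:
        if not 0 <= self.days <= 7:
            raise ValueError("days must be between 0 and 7")
        if self.minutes_per_day < 0:
            raise ValueError("minutes_per_day must be non-negative")


def met_minutes_per_week(responses: dict[str, IpaqCategory]) -> float:
    """Total MET-minutes/week = sum over categories of weight * minutes * days.

    Categories absent from `responses` contribute zero; sedentary time is
    reported separately by the IPAQ and is not part of this total.
    """
    total = 0.0
    for category, weight in MET_WEIGHTS.items():
        response = responses.get(category)
        if response is not None:
            total += weight * response.minutes_per_day * response.days
    return total


if __name__ == "__main__":
    example = {
        "vigorous": IpaqCategory(days=2, minutes_per_day=30),
        "moderate": IpaqCategory(days=3, minutes_per_day=20),
        "walking": IpaqCategory(days=5, minutes_per_day=30),
    }
    print(round(met_minutes_per_week(example), 1))  # 480 + 240 + 495
```

The example prints 1215.0: 8.0 × 30 × 2 = 480 vigorous, 4.0 × 20 × 3 = 240 moderate and 3.3 × 30 × 5 = 495 walking MET-minutes.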

Smartphone-based monitoring
All participants downloaded a smartphone-based app, Monsenso, on their smartphones. The Monsenso system consists of an app, in which participants can self-monitor symptoms, and a web-based interface allowing clinicians and researchers to access participants' self-reported data (Bardram et al. 2013). The Monsenso app can be downloaded on both iPhone and Android smartphones, and daily self-monitoring of symptoms was available on both platforms; however, in the present study, automatically generated smartphone-based data were only available from participants using Android smartphones. Participants used their own phones. Participants with no smartphone or with an iPhone were offered the opportunity to borrow an Android smartphone (LG Nexus 5) and use it as their primary phone during the study. The Monsenso system has a daily reminder function, and self-reported data can be entered retrospectively for up to two days. Unaffected relatives and HC were asked to report their activity level daily for at least one month and preferably three months. Patients with BD were asked to report their daily activity level for a minimum of three months. The BIO study is comprehensive, and smartphone-based monitoring is only one part of it.

Smartphone-based activity measurements
In DSM-5, a core symptom of bipolar disorder is change in activity/energy level. Currently, there is no consensus regarding the definition and measurement of activity. In this study, we investigated whether smartphone-based self-reported and automatically generated activity measures can be used to monitor activity levels in patients with BD. We did not differentiate between physical and non-physical activity.
In the Monsenso app, all three groups (patients with BD, UR, and HC) scored their daily activity level on a 7-point scale (−3, −2, −1, 0, 1, 2, 3). The daily activity level refers to the participant's overall activity level for the day; it could refer to social as well as physical activity, goal-directed activity, hyperactivity or another aspect of activity defined by the participant. For patients with BD, daily mood was also collected on a 9-point scale from depressed to manic (−3, −2, −1, −0.5, 0, 0.5, 1, 2, 3). Self-reported neutral mood was defined as a self-reported mood score of −0.5, 0, or 0.5. In the present study, a limited number of automatically generated attributes were available, including (1) number of steps; (2) number of incoming and outgoing text messages; (3) call duration and number of incoming, outgoing and missed calls; and (4) seconds the screen was on (referred to as screen time) and number of times the screen was turned on. Of these attributes, we hypothesized that step counts would most likely reflect physical activity and that the other attributes would most likely reflect social activity.
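The two rating scales and the neutral-mood definition above determine which days enter the neutral-mood sub-analysis. The sketch below encodes them, assuming (hypothetically) that each day of data for a patient is available as an `(activity, mood)` pair; the article does not describe how Monsenso stores the ratings, and the function names are illustrative.

```python
# Scale values as described in the text; the (activity, mood) record
# format is a hypothetical assumption, not the Monsenso data model.
ACTIVITY_SCALE = frozenset({-3, -2, -1, 0, 1, 2, 3})
MOOD_SCALE = frozenset({-3, -2, -1, -0.5, 0, 0.5, 1, 2, 3})
NEUTRAL_MOOD = frozenset({-0.5, 0, 0.5})


def validate_activity(score: float) -> float:
    """Reject activity scores that are not on the 7-point -3..+3 scale."""
    if score not in ACTIVITY_SCALE:
        raise ValueError(f"activity score {score!r} is not on the -3..+3 scale")
    return score


def is_neutral_mood(mood: float) -> bool:
    """True for the neutral-mood scores -0.5, 0 and 0.5 on the 9-point scale."""
    if mood not in MOOD_SCALE:
        raise ValueError(f"mood score {mood!r} is not on the 9-point scale")
    return mood in NEUTRAL_MOOD


def neutral_mood_activity(days: list[tuple[float, float]]) -> list[float]:
    """Keep activity scores from (activity, mood) day records with neutral mood.

    Every record is validated, so malformed days are reported even when
    their mood is not neutral.
    """
    kept = []
    for activity, mood in days:
        validate_activity(activity)
        if is_neutral_mood(mood):
            kept.append(activity)
    return kept
```

For example, `neutral_mood_activity([(2, 0), (-1, 2), (1, 0.5)])` returns `[2, 1]`: the day rated 2 on the mood scale is excluded.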

Statistical methods
All hypotheses and statistical analyses were planned a priori. We validated smartphone-based activity measurements against: (1) the IPAQ, addressing physical activity during the past week; (2) sub-items 8 and 9 on the HAMD-17 and sub-items 2 and 6 on the YMRS, addressing activity and energy level during the past three days; and (3) the FAST, addressing functioning during the past two weeks. In these analyses, we calculated averages of smartphone-based activity measurements reported in the period corresponding to the days assessed by the questionnaires and rating scales. Therefore, only visits for which participants had provided self-reported activity data covering the days assessed by the questionnaire were included. Participants from all three groups were included in these analyses.
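The window-matching step can be sketched as follows. The look-back lengths are taken from the table footnote (the current day and 3 days before HAMD-17/YMRS ratings, 7 days for the IPAQ, 14 days for the FAST); including the visit day itself for the IPAQ and FAST windows is our assumption. Daily values are assumed, hypothetically, to be stored as a mapping from calendar date to score.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Optional

# Look-back lengths from the table footnote. ASSUMPTION: the visit day
# itself is included for every instrument, not only HAMD-17/YMRS.
LOOKBACK_DAYS = {"HAMD-17": 3, "YMRS": 3, "IPAQ": 7, "FAST": 14}


def window_average(
    daily: dict[date, float], visit: date, instrument: str
) -> Optional[float]:
    """Average daily smartphone values from `visit - N days` to `visit` inclusive.

    Returns None when no value falls inside the window; such visits would be
    excluded from the validity analyses.
    """
    start = visit - timedelta(days=LOOKBACK_DAYS[instrument])
    values = [value for day, value in daily.items() if start <= day <= visit]
    return mean(values) if values else None
```

Returning `None` rather than 0 keeps "no data in the window" distinct from "average activity of 0", which matters because 0 is a valid midpoint on the −3 to +3 scale.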
Secondly, we validated daily smartphone-based self-reported activity against the automatically generated activity measurements. In these analyses, we only used days on which participants provided both self-reported and automatically generated data. Thirdly, differences in activity measurements between the three groups were investigated. The following activity measurements were included: self-reported smartphone-based activity, automatically generated smartphone-based activity, physical activity (IPAQ), activity assessed according to activity sub-items on the HAMD and YMRS, respectively, and FAST. Smartphone-based self-reported and automatically generated activity collected during the whole study period was used regardless of affective episodes. The participants were assessed annually; in addition, patients with BD were booked for a new appointment with a researcher from the BIO team if they were experiencing a new affective episode. Therefore, some participants provided repeated measurements of clinically assessed activity.
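Restricting the second analysis to days with both data sources amounts to an inner join on calendar date. A minimal sketch, assuming (hypothetically) that self-reported scores are stored per date and automatically generated features per date as a dictionary keyed by feature name such as `"steps"`:

```python
from datetime import date


def paired_days(
    self_reported: dict[date, float],
    automatic: dict[date, dict[str, float]],
    feature: str,
) -> list[tuple[date, float, float]]:
    """Inner join on calendar date, sorted by date.

    Keeps only days that have both a self-reported activity score and a value
    for `feature` (a hypothetical key such as "steps") in the automatically
    generated data.
    """
    pairs = []
    for day in sorted(self_reported.keys() & automatic.keys()):
        value = automatic[day].get(feature)
        if value is not None:
            pairs.append((day, self_reported[day], value))
    return pairs
```

Days where the phone logged data but no self-report was entered, or vice versa, drop out, as do days on which the requested feature is missing.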
Linear mixed-effects models were used in all analyses. These models can account for within-participant and within-family correlations by including family relationship and participant ID as random effects. In analyses comparing activity between patients with BD, UR, and HC, group was entered as a fixed effect. For each comparison, we considered an unadjusted model and a model adjusted for age and sex. The model accommodates unbalanced data and allows all data points from each participant during follow-up to be used, not only complete datasets. Thus, one advantage of linear mixed model (LMM) analysis is that it uses all available observations and gives valid estimates under the assumption that missing data, including data missing because of dropout, are missing at random, so no separate imputation step is required.
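One way to write the adjusted group-comparison model described above is given below. This is our reading of the text rather than the authors' stated specification: we assume random intercepts only, family and participant as nested random intercepts (participants without relatives in the study forming single-member families), and HC as the reference group.

```latex
\begin{aligned}
y_{fit} &= \beta_0 + \beta_1\,\mathrm{BD}_{fi} + \beta_2\,\mathrm{UR}_{fi}
          + \beta_3\,\mathrm{age}_{fi} + \beta_4\,\mathrm{sex}_{fi}
          + u_f + v_{fi} + \varepsilon_{fit},\\
u_f &\sim \mathcal{N}(0,\sigma_u^2), \qquad
v_{fi} \sim \mathcal{N}(0,\sigma_v^2), \qquad
\varepsilon_{fit} \sim \mathcal{N}(0,\sigma_\varepsilon^2)
\end{aligned}
```

Here y_{fit} is the t-th activity measurement of participant i in family f, and BD and UR are 0/1 group indicators, so β1 and β2 are the estimated mean differences from HC. The unadjusted model omits the age and sex terms (β3 and β4).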
Model control was performed for each analysis. Prior studies investigating smartphone-based self-reported and automatically generated activity are scarce, and no standard measurements were available. Thus, owing to the exploratory nature of the study, no adjustment for multiple testing was made, and p-values < 0.05 (two-tailed) were considered statistically significant. All analyses were conducted using the Statistical Package for the Social Sciences (SPSS) version 22.

Ethical considerations
The Bipolar Illness Onset (BIO) study has been approved by the ethics committee in the Capital Region, Copenhagen, Denmark (ref. nr. H-7-2014-007) and the Danish Data Protection Agency, Capital Region of Copenhagen (protocol no.: RHP-2015-023). The study was conducted in accordance with the Declaration of Helsinki and all participants provided written informed consent.
First degree relatives (UR) were compensated with a gift card equivalent of 40 USD, whereas patients with BD and HC participants were not compensated. The UR were compensated to enhance recruitment of participants to the study.

Socio-demographic and clinical characteristics
In the study period from September 2016 to February 2019, 240 patients with BD, 66 UR and 118 HC were included in the BIO-study cohort. Self-reported data on activity measured via smartphones were collected from 203 patients with BD, 54 UR, and 109 HC. Primary reasons for non-participation were time, surveillance concerns or that the participant had no smartphone and did not want to borrow a smartphone. Participants possessing an Android smartphone additionally provided automatically generated smartphone-based data (75 patients with BD, 15 UR, and 32 HC).
Socio-demographic and clinical characteristics are presented in Table 1. Among the patients with BD there was no statistically significant difference between patients who provided smartphone-based recordings (203 patients) and those who did not (37 patients with BD) and between patients who had an iPhone vs. Android smartphone with regard to age, sex, educational level, or illness duration (p's > 0.5).
During the study period, participants provided a total of 48,747 daily self-reported smartphone-based activity ratings. The self-reported activity ratings measured via smartphones were provided for a median of 106 […] 92-393] for HC. All UR and HC and 93% of patients with BD provided more than one month of automatically generated smartphone-based data. Self-reported data were provided for more than one month by 80% of participants. The patients with BD were seen annually and upon development of a new mood episode. In this study, the patients with BD contributed 337 visits. At the majority of visits (65%), patients were in full or partial remission (HAMD and YMRS < 14); at 27% of visits the HAMD score was ≥ 14, and at 7% the YMRS score was ≥ 14.
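The visit categories above follow from the two cut-offs. The sketch below classifies a visit from HAMD-17 and YMRS totals and tabulates proportions. Labelling visits with both scores ≥ 14 as "mixed" is our assumption, since the text does not say how such visits were counted; the function names are illustrative.

```python
from collections import Counter


def classify_visit(hamd17: int, ymrs: int, cutoff: int = 14) -> str:
    """Classify a visit by the HAMD-17/YMRS cut-off reported in the text.

    Both scores < cutoff -> remission (full or partial); HAMD-17 >= cutoff ->
    depression; YMRS >= cutoff -> hypomania/mania. ASSUMPTION: both scores
    >= cutoff is labelled "mixed", which the text does not define.
    """
    if hamd17 < 0 or ymrs < 0:
        raise ValueError("rating-scale scores cannot be negative")
    depressed = hamd17 >= cutoff
    manic = ymrs >= cutoff
    if depressed and manic:
        return "mixed"
    if depressed:
        return "depression"
    if manic:
        return "hypomania/mania"
    return "remission"


def visit_distribution(visits: list[tuple[int, int]]) -> dict[str, float]:
    """Proportion of (HAMD-17, YMRS) visits falling in each category."""
    if not visits:
        return {}
    counts = Counter(classify_visit(h, y) for h, y in visits)
    return {label: n / len(visits) for label, n in counts.items()}
```

For example, four visits scored (5, 3), (20, 2), (3, 18) and (4, 4) give 50% remission, 25% depression and 25% hypomania/mania.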

Daily self-reported activity via smartphones
The upper part of Table 2 presents associations between daily self-reported smartphone-based activity and self-reported physical activity according to scores on the IPAQ, functioning according to scores on the FAST, and activity items on the HAMD and YMRS rating scales. Daily self-reported smartphone-based activity was statistically significantly associated with all validity measurements except item 6 on the YMRS (speech).

Daily automatically generated smartphone-based activity measurements
The remaining part of Table 2 presents similar associations for automatically generated smartphone-based activity. None of the automatically generated smartphone-based activity features were associated with scores on IPAQ.
Step count and screen time were associated with FAST. The number of outgoing calls was associated with item 2 on the YMRS rating scale, the number of incoming calls was positively associated with item 9 on HAMD rating scale, number of times the screen was turned on was negatively associated with item 8 on HAMD and screen time was positively associated with both item 8 and 9 on HAMD. The rest of the automatically generated smartphone-based activity measurements were not associated with clinically validated activity measurements.

Daily self-reported smartphone-based activity versus automatically generated smartphone-based activity measurements
As can be seen from Table 3, daily self-reported activity via smartphone was statistically significantly associated with all measurements of smartphone-based activity, except missed calls and incoming calls, which were borderline statistically significant.
Daily smartphone-based self-reported and automatically generated activity measurements in patients with newly diagnosed BD, UR, and HC.
As can be seen from Fig. 1 and the upper part of Table 4, patients with BD had a statistically significantly lower mean daily self-reported activity level, fewer days with high activity and more days with low activity compared with HC. In a sub-analysis in which only days with self-reported neutral mood (−0.5 to 0.5 on the mood scale) were included for patients with BD, patients with BD also had statistically significantly lower mean activity levels compared with UR and HC. Unaffected relatives did not differ from HC individuals on any measure of self-reported activity.
The middle part of Table 4 shows that patients with BD had a statistically significantly lower number of steps, more missed calls and longer call duration per day compared with HC individuals. Unaffected relatives had a lower number of steps per day compared with HC individuals. There were no statistically significant differences in any other automatically generated smartphone-based measure of activity.

Discussion
This study is the first to systematically validate smartphone-based activity measurements in bipolar disorder and to compare both daily self-reported and automatically generated smartphone-based activity among patients with BD, UR, and HC individuals. Overall, we confirmed our three hypotheses. Firstly, smartphone-based self-reported activity was a valid measure according to scores on the IPAQ and activity items on the HAMD and YMRS, and was associated with FAST scores, whereas automatically generated smartphone-based activity measurements were only weakly correlated with these measurements. Secondly, daily self-reported smartphone-based activity measurements correlated with all automatically generated smartphone-based activity measurements (except missed calls and incoming calls, which were borderline statistically significant). Thirdly, patients with BD had decreased daily self-reported activity compared with HC, and UR did not differ from HC individuals. According to automatically generated smartphone-based data, patients with BD had decreased physical activity (number of steps) and social activity (more missed calls) but longer call duration compared with HC, whereas UR had decreased physical activity only.

Table 1 Socio-demographic and clinical characteristics of patients with bipolar disorder (BD), unaffected first-degree relatives (UR) and healthy control individuals (HC) at baseline
Continuous variables are presented as median [interquartile range], and p-values are calculated based on differences in means between the three groups using mixed models. Categorical data are presented as % (n), and p-values are calculated using the chi-square test. Episode at baseline (patients with BD): full or partial remission, 60.6% (123); hypomania/mania, 5.9% (12); depression, 33.0% (67); mixed, 0.5% (1).

Significant p values are given in italics. IPAQ: International Physical Activity Questionnaire short form, a measure of physical activity during the 7 days prior to assessment; HAMD-17: 17-item Hamilton Depression Rating Scale, with sub-items 8 and 9 addressing psychomotor retardation and agitation, respectively; YMRS: Young Mania Rating Scale, with sub-items 2 and 6 addressing motor activity and speech, respectively; FAST: Functional Assessment Short Test, a measure of global functioning during the 14 days prior to assessment. a Smartphone-based activity measurements: averages of smartphone-based activity ratings were calculated for the current day and the 3 days before ratings with the HAMD-17 and YMRS, the 7 days prior for the IPAQ, and the 14 days prior for the FAST.

Table 3 Associations between self-reported (a) and automatically generated (b) smartphone-based data for all participants in the study owning an Android smartphone
Significant p values are given in italics. Model 1: unadjusted; Model 2: adjusted for age and sex. a Self-reported smartphone-based activity rated on a scale from −3 to +3.

Self-reported and automatically generated smartphone-based activity
Changes in activity level are a central feature of BD. However, no clear consensus concerning the definition or assessment of the term activity exists (Scott et al. 2017). Several terms have been used to describe different aspects of activity (e.g. hyperactivity, goal-directed activity, behavioral activation) (Scott et al. 2017). These terms reflect changes in psychomotor activity, body movement, and behavior (Lewinsohn and Graf 1973). Also, there is no consensus regarding how "activity" can or should be measured. Notably, we found that daily self-reported smartphone-based activity was associated with clinically assessed measurements of energy/activity, psychomotor retardation and agitation, and functioning, in addition to all measurements of automatically generated smartphone-based activity (except missed calls and incoming calls, which were borderline statistically significant). This result suggests advantages of remote monitoring. Remotely reported activity levels are a new area of research in bipolar disorder. We are aware of only one other study on remotely reported activity, which found associations, although weak, between remotely self-reported energy level and activity items on validated questionnaires (Tsanas et al. 2016). Other studies that have investigated self-reported activity have used the term "energy" to evaluate activity/energy levels (Tsanas et al. 2016; Abdullah et al. 2016). Although in this study we investigated self-reported daily activity reflecting overall activity as defined by the participant, it is likely that self-reported energy and self-reported activity reflect two different aspects of activity/energy, and it would be interesting to investigate these aspects further. Unexpectedly, in the present study, the associations between automatically generated smartphone-based data and the clinical measurements of activity (sub-items on the YMRS and HAMD-17) were not as compelling as hypothesized.
An explanation for this discrepancy may be that each of our automatically generated smartphone-based activity attributes was investigated separately. Future studies should consider integrating several smartphone-based features, including both self-reported and automatically generated data, and applying machine learning methods to develop a composite marker to estimate overall activity level. A composite marker reflecting overall activity may have clinical utility in both diagnosis and treatment monitoring of bipolar disorder.

Table 4 Estimated differences in activity in patients with bipolar disorder (BD), unaffected first-degree relatives (UR) and healthy control individuals (HC)
In contrast, the participants' daily self-reported activity via smartphone was associated with all measurements of automatically generated smartphone-based activity (except missed calls and incoming calls, which were borderline statistically significant). This finding is in accordance with results from a few other studies reporting an association between remotely collected self-reported energy in patients with BD and automatically generated smartphone-based data  and between remotely collected self-reported energy and motor activity measured by actigraphy (Merikangas et al. 2018).

Differences in activity level between patients with bipolar disorder, unaffected relatives and healthy control individuals
A lower mean level of activity has been reported during remission in patients with BD compared with HC (Crescenzo et al. 2017;Scott et al. 2017) and first-degree relatives, respectively (Pagani et al. 2016). In line with this, we found a lower mean level of self-reported smartphone-based activity in patients with BD compared with HC. Other studies investigating remotely reported activity level either have a small sample size (Schwartz et al. 2016) or have not reported findings regarding differences in self-reported activity between groups (Tsanas et al. 2016). Decreased activity level has previously been associated with mood level (Merikangas et al. 2018) and related to affective episodes (Rosa et al. 2010). Moreover, reduced engagement in activities may be predictive of forthcoming depressions (Weinstock and Miller 2008).
Recently, our group published a study presenting automatically generated smartphone-based data as a potential diagnostic behavioral marker for BD, which also discriminated patients with BD during euthymia from healthy control individuals (Faurholt-Jepsen et al. 2019). In the present study, step counts, missed calls and call duration differed between patients with BD and HC, substantiating the validity of these physical and social activity measurements. Remarkably, the number of incoming calls and text messages did not differ between the three groups, which might indicate that the newly diagnosed patients with BD in this study had a normal social network. A recent review found inconsistency in the association between text messages and affective states and concluded that text logs should be interpreted with caution because of competing communication platforms (Rohani et al. 2018). Moreover, the designs and measurements of studies investigating automatically generated smartphone data are highly heterogeneous and their sample sizes are small, which might explain inconsistencies across studies.

Limitations
First, automatically generated smartphone data allow us to collect data on behavioral activities unobtrusively and could have potential as a valid state marker and a possible trait marker for bipolar disorder. However, in this study, only a few of the automatically generated smartphone-based measures differed between the three groups. One explanation could be that automatically generated data could only be collected from participants with Android smartphones, which resulted in a low number of participants contributing these data, especially among UR. Therefore, results should be interpreted with caution, and negative findings could be due to type II errors.
Second, we included all data in the analysis regardless of the mood state of the patients with BD and therefore cannot capture mood state-dependent variations in activity level. Assessment of such changes will be possible during the longitudinal part of this study, where the sample size is larger and patients have provided smartphone-based recordings for a longer period. Third, all participants included in this study were also part of the larger BIO study. In a study solely investigating smartphone-based monitoring, the level of adherence may have been higher. Fourth, participants used their own phones. Consequently, data were gathered from multiple different platforms, which may have introduced heterogeneity in the data that was not accounted for in this study. Also, automatically generated smartphone-based data may be strongly influenced by the period in which they were collected. During the past ten years, the use of text messages has changed with the emergence of alternative communication platforms such as social media (e.g., Snapchat), and smartphone software has advanced considerably (Alhabash and Ma 2017). To obtain a comprehensive understanding of the putative relationship between smartphone-based activity and a participant's general activity, future studies may need to address person-specific variations in phone usage and the transient popularity of commercial communication applications. In this study, we accounted for some of this variation by adjusting for age and sex. Fifth, there are several limitations in using step counts to assess physical activity. Participants may carry their phone in different places (pocket, handbag, jacket, etc.) and are likely not to use the phone during some physical activities, such as swimming or cycling.
Therefore, a better estimate of physical activity would combine step counts with other sources, such as self-reported activity, location data and/or data from a wearable device. Sixth, other automatically generated features, such as accelerometer, ambient light and microphone data, could have provided useful information. However, collecting these data is battery-consuming, and to enhance long-term adherence to the smartphone application, they were not included. Seventh, participants who agreed to take part in this study might represent a subpopulation that is more comfortable with technology, which increases the risk of selection bias. Further, our healthy control group was recruited among blood donors and might represent a "super-healthy" control group. Eighth, two-thirds of the patients included in the study had BD type II and two-thirds were female; therefore, the findings may not generalize to all patients with BD.

Strengths
First, the study comprised 366 systematically recruited participants, including patients with newly diagnosed BD (median age 28 years) and their URs. Additionally, patients with BD were diagnosed at a specialized mood disorder clinic, and the presence or absence of a diagnosis was verified for all participants with a SCAN interview conducted by trained assessors. Second, we used clinically validated activity measurements from the HAMD-17, YMRS and FAST rating scales. Third, the Monsenso system used in the study is well validated and, importantly, fulfills requirements for secure data storage and privacy.

Conclusion
Daily self-reported smartphone-based activity measurements represent a valid marker of overall activity and correlate with automatically generated smartphone-based activity measurements. Daily smartphone-based activity data differ between individuals with BD, UR and HC. The study suggests that daily self-reported smartphone-based activity measurements and some automatically generated smartphone-based activity measurements represent clinically meaningful markers that may be useful in the diagnosis and treatment monitoring of bipolar disorder.