An evidence map of actigraphy studies exploring longitudinal associations between rest-activity rhythms and course and outcome of bipolar disorders

Background Evidence mapping is a structured approach used to synthesize the state-of-the-art in an emerging field of research when systematic reviews or meta-analyses are deemed inappropriate. We employed this strategy to summarise knowledge regarding longitudinal ecological monitoring of rest-activity rhythms (RAR) and disease modifiers, course of illness, treatment response or outcome in bipolar disorders (BD).
Structure We had two key aims: (1) to determine the number and type of actigraphy studies in BD that explored data regarding outcome over time (e.g. relapse/recurrence according to polarity, or recovery/remission), treatment response or illness trajectories, and (2) to examine the range of actigraphy metrics that can be used to estimate disruptions of RAR and describe which individual circadian rhythm or sleep–wake cycle parameters are most consistently associated with outcome over time in BD. The mapping process incorporated four steps: clarifying the project focus, describing boundaries and ‘coordinates’ for mapping, searching the literature and producing a brief synopsis with summary charts of the key outputs. Twenty-seven independent studies (reported in 29 publications) were eligible for inclusion in the map. Most were small-scale, with a median sample size of 15 per study and a median duration of actigraphy of about 7 days (range 1–210). Interestingly, 17 studies (63%) had samples comprised wholly or partly of inpatients. The available evidence indicated that a discrete number of RAR metrics are more consistently associated with transition between different phases of BD and/or may be predictive of longitudinal course of illness or treatment response. The metrics that show the most frequent associations represent markers of the amount, timing, or variability of RAR rather than the sleep quality metrics that are frequently targeted in contemporary studies of BD.
Conclusions Despite 50 years of research, the use of actigraphy to assess RAR in longitudinal studies, and the examination of associations between these metrics and treatment response, course and outcome of BD, remain under-investigated. This is in marked contrast to the extensive literature on case–control or cross-sectional studies of actigraphy, especially typical sleep analysis metrics in BD. However, given the encouraging findings on putative RAR markers, we recommend increased study of putative circadian phenotypes of BD.


Introduction
*Correspondence: bruno.etain@inserm.fr; Université Paris Diderot, Sorbonne Paris Cité, UMR-S 1144, 75013 Paris, France
The 2017 update on the global burden of disease demonstrates that the four most burdensome non-communicable conditions (in terms of DALYs: disability adjusted life years) are: cardiovascular disease (CVD), cancers, musculoskeletal conditions, and mental disorders (GBD 2017; Whiteford et al. 2013). In individuals aged 15-49, mental disorders in general, and mood disorders specifically, are regarded as the most burdensome conditions worldwide. In the last 3 to 4 decades, considerable progress has been made in the management of CVD and cancer through precision medicine. The application of precision medicine is in its infancy in psychiatry, but funding initiatives (such as the European H2020 platform) are supporting studies of personalized diagnostics and therapeutics across a range of mental conditions including bipolar disorders (BD) (Schumann et al. 2014). Like precision approaches to other chronic disorders, these studies have largely focused on more quantifiable metrics (such as 'omics' and brain scanning) in an attempt to increase the biological validity of observed symptoms (Frey et al. 2013; Hellwig and Domschke 2019; MacDonald et al. 2019). This strategy also reflects the proposals described in the RDOC (research domain criteria) paradigm, which identifies key domains and analytic units for translational research (Cuthbert and Insel 2013; Insel 2014). Although personalized diagnostics for BD are likely to remain a long-term aspiration, there is emerging evidence that multi-platform, integrated science approaches could offer a pathway for stratifying individuals with BD to improve outcome prediction, namely treatment response, relapse and recurrence, and individual illness trajectories (Scott et al. 2018).
However, as these efforts progress, there is a need to extend the search for phenotypes beyond laboratory settings and to test potential biomarkers in real-world settings, such as field and cohort studies, comparative effectiveness research, and large-scale pragmatic trials (Brietzke et al. 2019; Scott et al. 2019).
One of the domains identified by RDOC is the arousal and regulatory system, and RDOC suggests that sleep-wake cycle and circadian rhythms (which we will refer to as rest-activity rhythms or RAR) represent a core construct for investigation (Insel 2014; Smagula 2016). This is of relevance to BD, as converging evidence indicates that RAR disruptions can differentiate BD cases from healthy controls (HC) and/or other comparator groups (Ritter et al. 2011; Geoffroy et al. 2015; Ng et al. 2015; De Crescenzo et al. 2017; Scott et al. 2017). Further, real-time monitoring of most of these phenomena may be possible with wrist-worn devices. Of course, there has been an exponential increase in the development and use of consumer grade wearables (and smart phone apps) in clinical populations (Faurolt-Jepson et al. 2012; Krane-Gartiser et al. 2019). However, many demonstrate low inter-device reliability, none of the apps or programmes is currently validated against 'gold standard' measures and few can accurately estimate key markers of circadian timing (Zee et al. 2014; FDA 2019; Depner et al. 2020). The lack of regulatory approval and the absence of basic research on, or rigorous testing of, their performance mean that consumer devices are not currently recommended for use in this research field (Baron 2018; FDA 2019; Moar 2018). Currently, RDOC indicates that actigraphy could be applied as an 'analytic unit' for longitudinal studies of circadian rhythms and sleep-wake cycles (Cuthbert 2014). Such studies are now underway, but there is no consensus regarding the most useful actigraphy-derived RAR parameters to estimate and/or report (Smagula 2016). Also, a preliminary scoping exercise of this topic suggested that relatively few actigraphy studies have employed a longitudinal or prospective design in BD. Therefore, we decided to collate any existing publications on these issues and synthesise the current state of knowledge.
This paper has three aims: (a) To determine the number and type of actigraphy studies of samples comprised wholly or partly of individuals with BD that explore data regarding: outcome over time (e.g. relapse/recurrence according to polarity, or recovery/remission), treatment response or illness trajectories. (b) To examine the range of actigraphy metrics that can be used to estimate disruptions of RAR and describe which individual circadian rhythm or sleep-wake cycle parameters are most consistently associated with outcome over time in BD. (c) To demonstrate the use of evidence mapping as a process for synthesising the state-of-the-art in the field of ecological monitoring of RAR and disease modifiers, course of illness, treatment response or outcome in BD.

Methods
As the use of evidence-mapping dictates the structure of the paper, we provide a brief overview of this approach in Table 1. Below, we outline the application to the project described in this article.

Rationale for use of mapping
As noted in the introduction, the goal of this evidence map is to present an overview of the extent, range, nature, and findings of clinical research on RAR metrics measured by actigraphy and a range of potential outcomes of BD. Like many evidence maps, the rationale for using this process was determined by prior knowledge of the field. Many research groups, including our own, have synthesized research findings or undertaken meta-analyses of pooled data regarding BD cases compared with HC or other (comparator) groups (Ritter et al. 2011; Geoffroy et al. 2015). Some of those projects identified small-scale studies that used real-time monitoring of BD cases and/or that examined associations between baseline actigraphy recordings and longitudinal course and outcome of BD (Scott et al. 2017). Nearly all of those projects were undertaken several decades ago and differences in sampling, diagnosis or research methods meant the publications failed to meet eligibility criteria for systematic reviews focused on a specific research question (De Crescenzo et al. 2017). Further, it was known that most of the reported data were insufficient (in terms of amount or quality) to merit inclusion in meta-analyses (Ng et al. 2015). As such, an evidence map offered a realistic option to gain an understanding of existing research in this area, especially by capturing insights from multiple small-scale studies. Mapping also provides a means of summarizing the range of publications and findings in a transparent way (Vallarino et al. 2015). The potential advantage over a basic narrative or selective review is that the search process is reproducible, and the map specifically addresses gaps in the knowledge base (Hetrick et al. 2010). However, findings are described in brief, so a map does not aim to offer the detailed lists of citations and extensive tables included in systematic reviews or meta-analyses.
Scott et al. Int J Bipolar Disord (2020) 8:37

Mapping process
The process usually incorporates four steps (clarifying focus; describing boundaries and 'coordinates' for mapping; literature search; charting the output maps). After consultation among the co-authors, the questions and scope of the mapping project were defined as follows.

Step 1: Clarification of the focus of the map
It was agreed that the focus would be: (a) What evidence exists on this topic and, based on a review of study characteristics and methodologies, what conclusions can be drawn about the quality of evidence? (b) What areas are, or are not, well-researched? (c) Based on the available evidence, what advice can be offered to clinicians and investigators in the field?

Step 2: Description of the eligibility criteria (boundaries for inclusion and exclusion), specification of co-ordinates to be mapped (study characteristics associated with outcome/response, etc.) and definition of key variables (RAR metrics):
(a) Studies were eligible if:
Inclusion criteria-
i) The sample or cohort was wholly or partly comprised of BD cases that met clinical or research diagnostic criteria.
ii) It reported associations between any RAR actigraphy metric recorded at baseline and a clinical or research-defined outcome assessed at follow-up in BD cases (or vice versa). Note: For the purposes of the map, the focus was on BD cases, so we extracted information on inter-individual or intra-individual changes in actigraphy or outcomes in the clinical cases (i.e. we aimed to avoid focusing on findings of case-control studies or cross-sectional comparisons of RAR metrics between BD and HC, which have been reviewed many times before).
iii) The publication reported any association between RAR metrics recorded by actigraphy and the outcome or response of BD in the short or long term (this could include e.g. naturalistic observational studies of treatment interventions and/or randomized controlled trials).
iv) Presence of comorbidities was not an exclusion criterion, but details were noted.
v) Findings from self-report questionnaires, consumer grade wearables or smartphone apps were eligible if the study also reported data for actigraphy recordings and/or the RAR findings extracted from questionnaires or wearables were reported alongside the equivalent metric derived from actigraphy.

Table 1 Brief overview of evidence mapping
Evidence Mapping
Evidence mapping is defined as "a form of knowledge synthesis that addresses an exploratory research question aimed at mapping key concepts, types of evidence, and gaps in research related to a defined area or field by systematically searching, selecting, and synthesizing existing understanding".
The gold standard methods used in evidence-based research are systematic review and meta-analysis. These classic approaches are both rigorous and provide readers with detailed information about narrow questions (e.g. the efficacy of drug X for disorder Y). However, variations on this archetypal model are now being developed to meet more diverse needs regarding evidence synthesis (Arksey 2003; Miake-Lye et al. 2016). The obvious examples are rapid reviews (which address urgent topics and do not always fully adhere to systematic methodologies) and scoping reviews (which identify extensive bodies of literature but do not usually provide detailed synthesis) (Colquhoun et al. 2014). In this context, evidence mapping has emerged as another process that allows researchers and their audience to develop an understanding of the extent and distribution of evidence on a broad topic, highlighting what is known and also where gaps exist (Katz et al. 2003; Hetrick et al. 2010). Although it applies systematic and replicable methodology, the process is iterative, as expert consensus and preliminary literature searches inform the next steps in the review, so the final product is broader and less detailed than the traditional approaches (Vallarino et al. 2015). Furthermore, the goal is not to provide detailed statistical comparisons, but rather to offer an essential snapshot of what is or is not known at this moment in time.
It is presumed that an evidence map for any given field will be followed up in the future when research develops to a point where it becomes justifiable to apply more formal reviews and pooled analyses.
The depth of evidence synthesized will differ depending on the purpose of the evidence map, e.g. an overview of the extent, range and nature of research activity might entirely exclude any details of study findings (Miake-Lye et al. 2016).

Exclusion criteria-
The map aimed to capture the extent of the existing literature, so there were minimal exclusion criteria. The most significant criterion was that studies that failed to report any measures (quantity and/or timing) of daytime activity were ineligible. Also excluded were single case reports and studies where the recruitment of participants was not based on a current diagnosis of BD or depression (UP and BD), e.g. studies of individuals at risk of BD or of individuals attending medical clinics (actigraphy has been used to assess treatment outcome in CVD and cancer).
(b) Study Coordinates. It was agreed that the following would be mapped: year and geographic location of study (we also noted if the study specified seasonality, etc.); sample characteristics (demographics; diagnoses e.g. BD subtypes, inclusion of UP and BD; phase of illness at recruitment); community or clinical setting (e.g. in- or outpatient); duration of study; duration of actigraphy recording; and outcomes reported. The latter could include: outcome prediction; response to introduction of treatment or treatment withdrawal; acute or long-term treatment effects; longitudinal course of BD (continuous or repeated cross-sectional assessment) in terms of illness progression (naturalistic follow-up or associated with treatment introduction or withdrawal), symptom exacerbation, relapse/recurrence/recovery/remission and/or putative disease course modifiers (as defined in a study). Also eligible were: naturalistic observational studies in clinical settings (unspecified/uncontrolled treatment), naturalistic monitoring of illness with unspecified/uncontrolled treatment and/or randomized controlled trials that incorporated actigraphy (if not identified via previous criteria).
(c) Definitions of Rest-Activity Rhythms. It is clear from actigraphy studies in BD that the metrics reported have differed by location (i.e. geography) and decade of study. However, researchers have repeatedly acknowledged the close connection between sleep and circadian structure and that actigraphy data can be used for 'rhythmometric analysis' of RAR (Calogiuri et al. 2013; Smagula 2016; Wirz-Justice 2007). As such, all the metrics listed in Table 2 are considered as potentially relevant RAR markers in this project. Table 2 gives an overview of the key RAR metrics that have been derived from actigraphy recordings in psychiatry (definitions and descriptions of each metric are provided in Table 3 in Appendix) (Calogiuri et al. 2013; Ancoli-Israel et al. 2003). As shown, investigators have focused on different rhythmometric procedures, which need to be considered when interpreting the map. For instance, early studies were likely to report parametric statistics (especially in the USA) (Nelson et al. 1979). Three variables (mesor, acrophase and amplitude) were estimated using cosinor methods (with the p value indicating whether the fitted rhythm differs significantly from a non-rhythmic, zero-amplitude model), whilst more recently these metrics have been derived using regression techniques (which report similar variables but assume more complex patterns and rhythms and are more robust for larger study populations) (Fernandez and Hermida 1998). Non-parametric methods report a wider range of variables to describe the quantity and timing of activity and rest, and especially provide insights into variability/stability of rhythms and any RAR disruptions (Calogiuri et al. 2013; Natale et al. 2009). They are often preferred to parametric models and it is argued that non-parametric models better represent the complexity of RAR than cosinor models (van Someren et al. 1997).
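As a minimal sketch of the two families of rhythmometric analysis described above, the following Python code estimates the three cosinor parameters and two non-parametric indices from an hourly activity series. It is an illustration under stated assumptions and not the procedure used in any of the mapped studies: the function names are invented for this example, the cosinor shortcut is only valid for evenly spaced samples covering whole 24-h days, interdaily stability and intradaily variability are chosen here simply as common examples of non-parametric metrics, and the input series is synthetic.

```python
import math


def cosinor_fit(activity, samples_per_day=24):
    """Single-component 24-h cosinor: returns (mesor, amplitude, acrophase in hours).

    Assumption: evenly spaced samples covering whole days, so the least-squares
    solution reduces to simple sums (the cosine and sine terms are orthogonal).
    """
    n = len(activity)
    if samples_per_day < 3 or n == 0 or n % samples_per_day != 0:
        raise ValueError("need evenly spaced samples covering whole days")
    step = 2 * math.pi / samples_per_day  # phase advance per sample (radians)
    mesor = sum(activity) / n
    beta = 2 / n * sum(y * math.cos(step * i) for i, y in enumerate(activity))
    gamma = 2 / n * sum(y * math.sin(step * i) for i, y in enumerate(activity))
    amplitude = math.hypot(beta, gamma)
    acrophase_h = (math.atan2(gamma, beta) / (2 * math.pi) * 24) % 24
    return mesor, amplitude, acrophase_h


def _mean_and_total_ss(activity):
    """Return the grand mean and the total sum of squared deviations."""
    if len(activity) < 2:
        raise ValueError("need at least two samples")
    grand_mean = sum(activity) / len(activity)
    total_ss = sum((x - grand_mean) ** 2 for x in activity)
    if total_ss == 0:
        raise ValueError("activity series has no variance")
    return grand_mean, total_ss


def interdaily_stability(activity, samples_per_day=24):
    """Variance of the average 24-h profile relative to the total variance."""
    n = len(activity)
    if n % samples_per_day != 0:
        raise ValueError("series must cover whole days")
    days = n // samples_per_day
    grand_mean, total_ss = _mean_and_total_ss(activity)
    profile = [
        sum(activity[d * samples_per_day + h] for d in range(days)) / days
        for h in range(samples_per_day)
    ]
    between_ss = sum((m - grand_mean) ** 2 for m in profile)
    return (n * between_ss) / (samples_per_day * total_ss)


def intradaily_variability(activity):
    """Mean squared successive difference relative to the total variance."""
    n = len(activity)
    _, total_ss = _mean_and_total_ss(activity)
    successive_ss = sum((activity[i] - activity[i - 1]) ** 2 for i in range(1, n))
    return (n * successive_ss) / ((n - 1) * total_ss)


# Synthetic example (not study data): 7 days of hourly counts peaking at 15:00.
series = [100 + 40 * math.cos(2 * math.pi * (h - 15) / 24) for h in range(24 * 7)]
mesor, amplitude, acrophase_h = cosinor_fit(series)
print(round(mesor, 2), round(amplitude, 2), round(acrophase_h, 2))
print(round(interdaily_stability(series), 3), round(intradaily_variability(series), 3))
```

For irregularly sampled recordings or recordings with missing epochs, the closed-form sums no longer apply and a general least-squares or regression fit would be needed instead.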
Variables derived from basic sleep analysis (such as total sleep time: TST) are probably the most widely reported measures in research in BD. One reason for this is that these sleep quantity measures are much easier to estimate from raw data (and do not rely on more complex algorithms). Sleep variables are useful for estimating duration and fragmentation of sleep patterns but are less reflective of circadian rhythmicity. Estimation of variability in values for each sleep parameter is encouraged in contemporary literature on RAR, and greater reporting of sleep onset/offset/midpoint or the sleep regularity index has been employed to give a greater insight into rhythmometrics (Bei et al. 2016).
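To illustrate the distinction drawn above between mean sleep quantity and night-to-night variability in sleep timing, the sketch below summarises sleep duration and sleep midpoint across several nights. It is a simplified example, not a validated scoring algorithm: the function name, the input format (onset and offset as decimal clock hours) and the example nights are hypothetical, and the midpoint is expressed relative to midnight on the assumption that sleep episodes cluster around the night. It does not compute the sleep regularity index.

```python
from statistics import mean, stdev


def sleep_timing_summary(nights):
    """Summarise sleep duration and midpoint across nights.

    `nights` is a list of (onset_h, offset_h) clock times in decimal hours;
    an offset at or before the onset is taken to fall on the next day.
    """
    if len(nights) < 2:
        raise ValueError("at least two nights are needed to estimate variability")
    durations, midpoints = [], []
    for onset_h, offset_h in nights:
        duration = (offset_h - onset_h) % 24
        # Express the midpoint as hours relative to midnight, in [-12, 12),
        # so that 23:30 and 00:30 are treated as one hour apart.
        midpoint = ((onset_h + duration / 2 + 12) % 24) - 12
        durations.append(duration)
        midpoints.append(midpoint)
    return {
        "mean_duration_h": mean(durations),
        "sd_duration_h": stdev(durations),
        "mean_midpoint_h": mean(midpoints),
        "sd_midpoint_h": stdev(midpoints),
    }


# Hypothetical nights: identical sleep duration but shifting sleep timing.
example_nights = [(23.0, 7.0), (1.0, 9.0), (22.0, 6.0)]
summary = sleep_timing_summary(example_nights)
print(summary)
```

In this example the duration is the same every night, so a report limited to mean TST would show no irregularity, whereas the standard deviation of the midpoint captures the shifting timing.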

Step 3: Literature Search
The search approach was the same as utilized in systematic reviews. Search terms were applied to PubMed, MEDLINE, PsycINFO, EMBASE, CINAHL and Web of Science databases. No limits were set for language, type of study or date of publication (the original telemetric motion sensors and actometers were used in BD in the early 1970s). The search terms included the full range of RAR metrics and potential outcomes listed in Step 2, supplemented by searches using names of medications (e.g. lithium, carbamazepine) or for research groups known to have contributed publications when this topic initially emerged in the literature. This approach was used to ensure the broadest possible range of publications was identified. Furthermore, hand-searches were undertaken of all reference lists of studies noted in narrative reviews, systematic reviews, and meta-analyses on this or similar topics. The final list of eligible studies was generated based on broad relevance to the scope of this map. Key drivers of selection of publications were that: a) The study reported the use of an actigraphy device to undertake daytime or 24-h monitoring of RAR in BD. b) The study endpoint was described in terms of a recognizable clinical outcome or measure of response or change in RAR (if combined with baseline or follow-up clinical measures).

Step 4. Charting: screening and positioning the relevant evidence within the map
Based on the specified aims of the study, we agreed three core outputs: a) Nature & quality of research: a Table of study coordinates was planned. Co-authors would review the Table independently and produce a brief written synopsis of the key characteristics of the available research. This information was used to formulate a consensus on the nature and quality of the extant literature. b) Extent of research: two maps would be generated.
The first would describe year and country of study to provide a snapshot of research in this field by geography and over time. The second map would give an overview of the most common methodologies and research themes. These would be identified by extracting information regarding study coordinates and coding each of these separately, e.g. setting (e.g. inpatient); type (e.g. naturalistic observational; treatment outcome; response criteria; etc.); reported interventions (e.g. proportion of individuals taking lithium; numbers allocated to psychological intervention); whether interventions were compared (e.g. lithium versus quetiapine). These data could be combined to show the overall pattern of research. c) A map would be generated to identify the range of RAR parameters reported across all studies and which individual metrics appeared to be consistently found to be associated with course and outcome of BD. Such a map can offer a heuristic framework and be the starting point for a systematic review or meta-analysis at a future date.
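The coding of study coordinates described above can be pictured as a small data structure plus two tallies, one by decade and country and one by setting and study type. The sketch below is only one possible way to organise such a table: the class name, field names and category labels are assumptions made for illustration, and the three records are placeholders that do not correspond to any study in the map.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """One row of a table of study coordinates (field names are illustrative)."""
    year: int
    country: str
    setting: str            # e.g. "inpatient", "outpatient", "mixed"
    study_type: str         # e.g. "naturalistic observational", "treatment outcome"
    follow_up_months: float


def decade(year):
    """Return the first year of the decade containing `year`."""
    return year // 10 * 10


def map_by_decade_and_country(studies):
    """Count studies per (decade, country) pair: geography over time."""
    return Counter((decade(s.year), s.country) for s in studies)


def map_by_theme(studies):
    """Count studies per (setting, study type) pair: methods and themes."""
    return Counter((s.setting, s.study_type) for s in studies)


# Placeholder records for illustration only; they are not mapped studies.
example = [
    StudyRecord(1979, "Country A", "inpatient", "treatment outcome", 1),
    StudyRecord(1985, "Country A", "inpatient", "naturalistic observational", 3),
    StudyRecord(2017, "Country B", "outpatient", "treatment outcome", 12),
]

print(map_by_decade_and_country(example))
print(map_by_theme(example))
```

Coding each coordinate as a separate field keeps the tallies independent, so the same records can be recombined to chart other patterns (for example, follow-up duration by setting) without re-extracting data.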

Results
Twenty-seven independent studies (reported in 29 publications) were included in the map. Table 4 in Appendix gives an overview of the studies and the core characteristics. To maintain the simplicity expounded in evidence mapping we have not included all the citations after every statement (the citations are in the reference list and the key studies can be easily identified from Table 3).
In the studies in the evidence map, the median age of all participants was 45 years and 60% were female. The median sample size was 15 per study (range: 2-75); 484 individuals with BD (out of a total of 560) participated in actigraphy and the median duration of recordings was 7 days (range 1-210). Studies before 1990 tended to be smaller, but sample size did not appear to be determined by decade of publication, with several small-scale projects or BD subsamples (included in larger samples or cohorts) being followed up longitudinally in recent studies. Overall, 17 studies (63%) had samples comprised wholly or partly of inpatients. Two studies reported daytime activity monitoring only, one study used actigraphy primarily to confirm reported patterns of hypersomnia, whilst an RCT examined changes in objective levels of morning activity following a psychological intervention targeted at sleep inertia.
There was evidence that statistical analyses of RAR have become more sophisticated over time, but earlier studies were more likely to address directly the association between RAR and phase of BD (e.g. transitions between mania and depression), and change in RAR following initiation or withdrawal of treatment (especially lithium) (e.g. Kripke et al. 1979; Wolff et al. 1985). The sampling, design, and analytic strategies of most studies demonstrate several methodological flaws (e.g. repeated testing of small samples without correction, absence of reporting of non-significant outcomes). Overall, the consensus judgement was that the quality of the studies was modest. Figure 1 shows that > 50% of studies (14 of 27) were undertaken in the USA and it is only in the last decade that the number of studies undertaken across Europe has matched the USA. The earliest studies reported the seminal work undertaken by Kupfer and associates and pursued by researchers at NIMH; more recent studies represent projects exploring RAR as disease course modifiers or examining RAR as potential predictors of treatment response and/or to monitor treatment outcomes. Table 3 also demonstrates that there were no studies (meeting our eligibility criteria) that reported RAR in BD for more than a decade (1994–2006). This is most likely a consequence of the fact that psychiatry research shifted towards the reporting of metrics favoured by experts in sleep/sleep disorders, rather than focusing on metrics most relevant to specific psychiatric disorders. Reviewing publications after 2006 suggests that this trend started to reverse when BD researchers increased their focus on sleep-wake cycles and also with the rise in interest in precision diagnostics and therapeutics.
As shown in Fig. 2, only 22% of studies reported follow-up assessments at > 6 months (N = 6). Naturalistic observational studies and treatment outcome studies were equally common (N = 10 each; 37%). In nine studies, > 50% of the sample were taking lithium or the study specifically explored RAR and lithium treatment.
Although our preliminary work for the evidence-map identified > 30 possible RAR parameters (parametric and non-parametric measures; mean or variable measures of sleep analysis; putative circadian phase markers extracted from sleep quantity analysis), the eligible studies only consistently reported estimates for seven metrics (see Fig. 3). Early studies mainly reported cosinor metrics (amplitude, acrophase and mesor), whilst the more recent studies increasingly report all raw data (either in the main text of the study or Appendices). However, in the middle decades, there was a tendency to only report statistically significant findings, without any indication as to whether other parameters were measured and/or which were found to be non-significant. Interestingly, when the number of studies measuring a parameter is compared with the number in which a statistically significant association was found, five variables were associated with outcome in > 50% of the studies, namely: variability/rigidity of RAR (4 out of 4), 24 h rest-activity cycle (22 out of 25), amplitude (5 out of 6), acrophase/phase advance (5 out of 7), and sleep efficiency (SE: 5 out of 8). In contrast, most measures of sleep quantity show weaker links: wake after
It was not possible to provide specific evidence of the nature of the metrics that were most useful in different types of studies, although there were some trends of interest. Using information summarised in Table 3, available evidence largely supports the view that mania is associated with circadian phase advance and BD-depression with phase delay. The latter was not obvious in studies of mixed samples of UP and BD depression. Interestingly, treatment may be associated with phase change, treatments may have different effects on phase in BD-II, and phase shifts may be associated with outcome at follow-up.
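The comparison of how often a metric was measured with how often it showed a statistically significant association is simple arithmetic, and the short sketch below reproduces it from the counts quoted in the text for the five metrics. The dictionary keys and the function name are labels chosen for this example; the code only restates the reported counts as proportions and says nothing about effect sizes or study quality.

```python
# Counts quoted in the text:
# (studies reporting a significant association, studies measuring the metric).
reported_counts = {
    "variability/rigidity of RAR": (4, 4),
    "24 h rest-activity cycle": (22, 25),
    "amplitude": (5, 6),
    "acrophase/phase advance": (5, 7),
    "sleep efficiency": (5, 8),
}


def association_rates(counts):
    """Return the proportion of studies with a significant association per metric."""
    rates = {}
    for metric, (significant, measured) in counts.items():
        if measured <= 0 or not 0 <= significant <= measured:
            raise ValueError(f"inconsistent counts for {metric}")
        rates[metric] = significant / measured
    return rates


rates = association_rates(reported_counts)
for metric, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    significant, measured = reported_counts[metric]
    print(f"{metric}: {significant}/{measured} = {rate:.0%}")
```

Because several denominators are very small (for example, four studies for variability/rigidity), these proportions are best read as a descriptive summary of the map and not as estimates of predictive value.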
The variable 24 h rest-activity cycle represents a global estimate of RAR and its primary value to the map is that it confirms that RAR can be used to assess a range of clinical outcomes in BD. Amplitude appeared to be a useful marker of illness course and treatment response or outcome. It was noted that variability/rigidity of RAR was only estimated in four small samples (Ns = 3, 8, 10, 12). These studies reported extended continuous durations of actigraphy and found the metrics to be significant markers of longitudinal course (including admissions). However, the studies did not report sufficient raw data or basic analyses to allow further investigation or interpretation of the findings.

Discussion
This article explores associations between RAR metrics measured via actigraphy and course, outcome and/or treatment response in BD. The project arose from three converging strands of work: first, a background scoping exercise undertaken for the H2020-funded consortium exploring a range of putative markers of lithium response phenotypes (including actigraphy); second, scrutiny of RDOC publications that suggest undertaking longitudinal actigraphy to explore RAR phenotypes in BD (Insel 2014); and third, a study of circadian and sleep-wake cycles in lithium responders and non-responders (Scott et al. 2020). These projects provided us with sufficient insights into this research field and, alongside our prior knowledge, led us to conclude that the existing literature on RAR and course/outcome/response was unlikely to be suitable for systematic review or meta-analysis. Furthermore, it appeared it would be counter-productive to use these classic evidence-based techniques as they would prematurely restrict the focus of our investigation to a specific or narrow question within what presently seems to be an imprecise research area. As such, it was agreed that utilizing evidence mapping might enable some synthesis of the diverse range of (predominantly small-scale) studies currently available. Regarding our first question (what evidence exists and what conclusions can be drawn about the quality of evidence?), we conclude that limited evidence exists, but what is available indicates that a discrete number of RAR metrics are more consistently associated with transition between different phases of BD and/or may be predictive of longitudinal course of illness or treatment response. The metrics that show the most frequent associations represent markers of the amount, timing, or variability of RAR rather than the sleep quality metrics that are frequently targeted in contemporary studies of BD.
However, these putative 'circadian' signals should be viewed in context, as there is evidence of selective reporting of both metrics and outcomes, thus increasing the likelihood of publication bias. Notably, the statistical associations reported derive from multiple small-scale studies that together included < 500 individuals with BD. In sum, the findings are of interest and encourage pursuit of research in this field, but they clearly show that the quality of the evidence in the available published studies is modest.
The second question addressed which areas are, or are not, well-researched. Our findings about the extent, nature and quality of the research indicate that the available publications demonstrate the diverse ways in which RAR can be studied in the context of BD, but that no single area has been targeted for a prolonged period of research. Notably, the early studies showed a keen interest in treatment response (or change in illness with treatment withdrawal), but these themes were not pursued to a definitive conclusion. However, this theme is now drawing interest again and two recent studies used actigraphy to explore RAR and acute treatment outcome, namely ketamine infusions for depression (albeit in a sample with many more UP than BD cases) and inpatient treatment of mania (such as Blue Blocking Glasses) (Duncan et al. 2017; Henriksen et al. 2020). The map also highlights evidence that clinical depressive symptoms are associated with robustness of circadian rhythm (e.g. Hwang et al. 2017), whilst current RAR patterns may predict future mental state (Salvatore et al. 2008). Also, in a small inpatient sample, it was shown that RAR recorded by actigraphy were associated with clinical progress, whilst estimates made with a consumer wearable were not (Averill et al. 2019). This is a useful reminder that research grade devices are still more accurate than consumer grade devices and apps (and so actigraphy is likely to continue to have a role in the immediate future). Importantly, individuals with BD are prepared to wear actiwatches and/or consumer devices for extended periods (> 6 months) and this, allied with new conceptualizations of and analytic approaches to RAR (such as fractal activity as an indicator of system adaptability, lagged data analyses, etc.), may lead to a greater understanding of the chronological sequence of change of RAR, mood, cognition and other cardinal features of BD (Knapen et al. 2020; Walker et al. 2020).
The final question asked, based on the available evidence, what are the potential ways forward for the field? We suggest that one important option would be to develop a consensus within research consortia or wider groups of clinicians and investigators regarding the most appropriate RAR metrics to include in future actigraphy studies of BD (Scott et al. 2017). It is particularly important that this dialogue focuses on metrics that offer insights into RAR in BD that will have the optimal clinical as well as research utility for this field. This should be prioritized over simply generating a list of RAR measures recommended for estimation or reporting of sleep or sleep-wake research in general (Sack et al. 2007). These discussions will necessarily consider any RAR metrics that may represent trait or state variables (e.g. there is emerging evidence that relative amplitude, amplitude, interdaily stability and intradaily variability may be markers of disease onset as well as disease progression) (Merikangas et al. 2019; Scott et al. 2020). Developing a consensus is perhaps the most important recommendation that arises from this mapping exercise, as without such a course of action there is a clear possibility that we will continue to lack the means to make detailed cross-study comparisons of the metrics that would offer most promise as diagnostic or response biomarkers.
An obvious gap in the available literature is the lack of actigraphy studies in youth that take a longitudinal perspective. Given the interest in staging models and temperament, research on the transition between subthreshold conditions or at-risk mental states and the onset of full-threshold illness episodes would be welcome. In individuals with established BD, consideration should be given to RAR metrics that might be combined in multi-platform research of precision diagnostics and therapeutics. For instance, precision studies that focus on prediction of response to mood stabilizers such as lithium, and the identification of biosignatures (rather than single biomarkers), might consider giving priority to reporting RAR markers such as amplitude and variability (given our mapping findings regarding emerging evidence for their importance). The mapping exercise demonstrated that actigraphy studies are viable in inpatient as well as outpatient settings. As such, wider consideration should be given to monitoring inpatient progress using objective measures such as actigraphy, with the option of continuation to post-discharge settings. Of course, longitudinal monitoring that allows detection of early warning signs of relapse or recurrence is also feasible with this ecological measure (Ritter et al. 2011). The above steps would address some of the explicit aims outlined by RDoC publications but could also be valuable for clinicians, e.g. they might consider using electronic monitoring or including sleep diaries as part of the clinical assessment of BD. The latter could be scrutinized not only to identify mean sleep duration and timing but also variability between weekdays and weekends or large differences in sleep onset, offset or duration over 2-3 weeks.
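As an illustration of how a sleep diary might be scrutinized in this way, the following minimal Python sketch summarizes mean sleep duration, night-to-night variability and the weekday versus weekend difference over a two-week diary. The function name, the input format (one pair of sleep onset and offset times per night) and the definition of weekend nights as Friday and Saturday are assumptions made for illustration; they are not taken from the studies in this map.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

WEEKEND_NIGHTS = {4, 5}  # Friday and Saturday nights (an assumption)


def summarize_sleep_diary(entries):
    """Summarize a diary given as (sleep_onset, sleep_offset) datetime pairs."""
    if len(entries) < 2:
        raise ValueError("at least two nights are needed to estimate variability")
    durations, onsets, is_weekend = [], [], []
    for onset, offset in entries:
        durations.append((offset - onset).total_seconds() / 3600)
        # Shift by 12 h so that an onset just after midnight is assigned to the
        # preceding evening and onset times stay on one continuous scale.
        shifted = onset - timedelta(hours=12)
        onsets.append(shifted.hour + shifted.minute / 60 + 12)
        is_weekend.append(shifted.weekday() in WEEKEND_NIGHTS)
    weekday = [d for d, w in zip(durations, is_weekend) if not w]
    weekend = [d for d, w in zip(durations, is_weekend) if w]
    difference = mean(weekend) - mean(weekday) if weekday and weekend else None
    return {
        "mean_duration_h": mean(durations),
        "sd_duration_h": stdev(durations),
        "range_duration_h": max(durations) - min(durations),
        "weekday_mean_h": mean(weekday) if weekday else None,
        "weekend_mean_h": mean(weekend) if weekend else None,
        "weekend_minus_weekday_h": difference,
        "mean_onset_clock_h": mean(onsets) % 24,
        "sd_onset_h": stdev(onsets),
    }


# Hypothetical 14-night diary starting on Monday 1 January 2024:
# 23:00 onset and 7 h of sleep on weekday nights,
# 01:00 onset and 9 h of sleep on Friday and Saturday nights.
first_night = datetime(2024, 1, 1)
example_entries = []
for i in range(14):
    night = first_night + timedelta(days=i)
    if night.weekday() in WEEKEND_NIGHTS:
        onset, hours = night + timedelta(hours=25), 9
    else:
        onset, hours = night + timedelta(hours=23), 7
    example_entries.append((onset, onset + timedelta(hours=hours)))

example_summary = summarize_sleep_diary(example_entries)
```

A summary of this kind separates the average amount of sleep from its regularity, which matters because the mapping findings point to variability and timing, not only mean duration, as potentially informative.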
Whilst consumer grade wearables and smart phone apps are attractive to many, it is important to remember that they all await regulatory approval for use as clinical devices (and commercial developments of longer battery life to avoid repeated need for charging). However, their use alongside actigraphy could be helpful, both to validate different measures of RAR that might be derived from consumer devices and to facilitate ecological momentary assessment (EMA) research (Merikangas et al. 2019; Depner et al. 2020). Research in this area is ongoing and has the advantage of combining self-report ratings of events, cognition, energy, and mood states with objective RAR data (de Wild-Hartmann et al. 2013; Merikangas et al. 2019). However, these advances necessarily require researchers to work more closely alongside statisticians and computational data scientists, as the optimal methods for analyzing and understanding links between RAR, symptoms and behaviours are more sophisticated (including non-linear dynamics, entropy) than the basic approaches that are most commonly used and/or the algorithms that are currently available. Assuming this dialogue can occur, combining these approaches would allow comprehensive exploration of the course of illness, prediction of treatment response and a range of patient reported outcomes (Moskowitz and Young 2006; Marino et al. 2013). For clinicians, actigraphy offers access to objective ecological monitoring that provides a far more reliable and valid measure of RAR than can be obtained by self-rating scales (Depner et al. 2020; Mulligan 2016). However, not all clinicians feel confident in extracting data and metrics from the devices and this is likely to militate against wider dissemination beyond specialist settings.
Another advantage of actigraphy is that it is a more feasible option for routine clinical practice than putative 'omic' and brain scanning markers of treatment response or illness outcome as these are less available, more expensive and often concentrated in research settings (Gooley and Chua 2014). Also, if the focus is on RAR, then actigraphic monitoring would be easier to integrate into a clinical management and monitoring package than evaluation of other objective circadian markers e.g. measurement of dim light melatonin onset (DLMO) (Depner et al. 2020).

Conclusions
Despite 50 years of research, and several papers identified by this evidence map being highly cited, the use of actigraphy to assess RAR in longitudinal research, and to examine associations between these metrics and treatment response, course and outcome of BD, remains under-investigated. This is in marked contrast to the extensive literature on case-control or cross-sectional studies of actigraphy, especially typical sleep analysis metrics in BD, which are reported in independent studies, systematic reviews, meta-analyses and meta-regressions. The most obvious reason for this marked disparity in the amount of research undertaken using these different methodologies is that case-control and cross-sectional studies are easier to undertake, usually involve much briefer time frames, and have lower resource requirements than longitudinal research. Also, it was notable that prospective studies identified in this evidence map were still predominantly shorter-term projects, often reporting outcomes over weeks rather than months.
This evidence map suggests that the existence of several often-quoted publications has probably led to an overestimation of the extent and robustness of existing evidence of longitudinal associations between RAR and treatment response, course, and outcome in BD. However, we are aware of promising emerging findings in this area (McCarthy et al. 2019;Scott et al. 2020) and there is tremendous potential for expanding this field of research to inform precision medicine projects in psychiatry. Further, there is a strong argument that, rather than actigraphy being replaced by consumer grade devices and apps, it may be more fruitful to establish frameworks for clinical and research projects that combine these approaches.

Data availability statement
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study. All the data referred to in this manuscript are publicly available in the manuscripts describing the original studies.

Ethics approval and consent to participate
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Consent for publication
Not applicable.

Competing interests
The authors report no financial affiliation or other relationship relevant to the subject matter of this article. JS is a Visiting Professor at Diderot University, Paris and an Honorary Professor at King's College London. AY declares paid lectures and advisory boards for the following companies with drugs used in affective and related disorders: AstraZeneca, Eli Lilly, Lundbeck, Sunovion, Servier, Livanova, and Janssen. He is a paid consultant to Johnson & Johnson, is lead Investigator for Embolden Study (AstraZeneca), BCI Neuroplasticity study and a pharma sponsored Aripiprazole in Mania study. His institution has received funding for investigator-initiated studies from AstraZeneca, Eli Lilly, Linova, Lundbeck, Wyeth, and Janssen. See Tables 3 and 4.

Table 3 Parameters reported in studies of Rest-Activity Rhythms: additional details
Interdaily stability: Quantifies the degree of resemblance between the activity patterns on individual days; ranges from 0 to 1 and may typically be about 0.6 for healthy adults
Intradaily variability: Quantifies the fragmentation of periods of rest and activity; ranges from 0 to 2 and typically is < 1 for healthy adults, with higher values indicating a more fragmented rhythm
Cosinor model descriptions from Nelson et al (1979); nonparametric approach descriptions from Van Someren et al (1997). Sleep quantity analysis variables and descriptions as from Natale et al (2009)

All participants were inpatients who initiated lithium treatment. Duration of sleep increased by > 60 min after starting lithium (~ 6.3 h to 7.5 h). When data were reviewed for mania and depression separately, individuals with mania showed reduction in motor activity in parallel with symptom reduction, whilst individuals with depression showed increased motor activity with reduction in symptoms
Kripke, USA (1978); 7 BD: across episodes. Adults assessed through complete cycles of mania and depression. In 5 individuals there was evidence that a circadian rhythm 'free-ran' fast & in 5 individuals, evidence suggested that lithium slowed a circadian rhythm. The authors concluded that the prophylactic benefit of lithium may derive from slowing or delaying an overfast circadian clock to prevent desynchronization
Scott et al. Int J Bipolar Disord (2020) 8:37
BD. Long-term lithium was discontinued, and outpatients were followed for 3 months of placebo treatment and up to 12 months. Two individuals experienced manic relapse with symptoms commencing within 2 wks of discontinuation, 2 remained well. Relapse was characterized by fragmentation of sleep-wake patterns, disorganization of 24 h rest-activity cycles & increased motor activity, especially at night. The latter was accompanied by a marked reduction in SE
Klein, USA (1992); 10 BD, in remission; 49 ± 12 b; 3d with lithium; 3d at ~ 3 wks post-lithium withdrawal. As with the previous publication, long-term lithium was discontinued, and outpatients were followed for 3 months of placebo treatment. Seven experienced manic relapse: 4 within 3 weeks and all 7 within 3 months of lithium discontinuation. These individuals had higher baseline levels of motor activity (daytime) even though asymptomatic and showed higher daytime & night-time activity (during follow-up) than individuals who remained well (N = 3)

30-200d. Naturalistic longitudinal follow-up of individuals attending a lithium clinic who were invited to wear actiwatch for > 1 month (many continued for > 6 months). Lower mean daytime activity (counts/minute) and variability in sleep pattern/duration (alongside psychological measures such as self-esteem) was associated with shorter time to hospital admission during ~ 15-month follow-up (but admissions according to polarity were not reported)
The best signal for identifying relapse was for IS (interdaily stability), a marker of regularity of rest-activity patterns (overall classification ~ 66%)
Phase advance or delay observed in biochemical rhythms (such as cortisol) were much greater than sleep-wake cycle changes identified by actigraphy

BD Depression
= 2; 49 ± 9; 66%; 7d. 12 individuals (10 inpatients) completed the study. Actigraphy was not undertaken during sleep. Individuals with higher activity levels at baseline showed a larger reduction in symptoms of depression after one week. Change in severity of depression correlated with an increase in the amount of daytime activity (with significance for some but not all ratings). Activity also showed a lagged correlation with mood rating in the following hours (but not vice versa)
Duncan, USA
This either indicates that data were reported as medians and/or data were missing for some variables
# The same sample of BD cases was studied in different mental states
$ Analysis reported by Scott (2011) is described in Scott et al. (2017)