Defitech Chair of Clinical Neuroengineering, Center for Neuroprosthetics (CNP) and Brain Mind Institute (BMI), Swiss Federal Institute of Technology Lausanne (EPFL), Geneva, Switzerland
Defitech Chair of Clinical Neuroengineering, Clinique Romande de Réadaptation, Center for Neuroprosthetics (CNP) and Brain Mind Institute (BMI), Swiss Federal Institute of Technology (EPFL Valais), Sion, Switzerland
Clinical Neuroscience, University of Geneva Medical School, Geneva, Switzerland
Corresponding author. Defitech Chair of Clinical Neuroengineering, Center for Neuroprosthetics (CNP) and Brain Mind Institute, Swiss Federal Institute of Technology (EPFL), Campus Biotech, Chemin des Mines 9, 1202 Geneva, Switzerland.
Manuscript: “Predictive models for response to non-invasive brain stimulation in stroke: a critical review of opportunities and pitfalls”
• Personalized NIBS application is needed to improve response rates and effect sizes in individual patients
• Allocation of personalized NIBS relies on robust predictive models
• Most work on NIBS response-determining factors is based on in-sample associations
• Associations within a sample are not valid for making generalizable predictions; internal and external validation are needed for generalizability
• Large and unbiased data sets are critical to establishing good predictive models for personalized NIBS
Abstract
Background
Noninvasive brain stimulation has been successfully applied to improve stroke-related impairments in different behavioral domains. Yet, clinical translation is limited by heterogeneous outcomes within and across studies. It has been proposed to develop and apply noninvasive brain stimulation in a patient-tailored, precision medicine-guided fashion to maximize response rates and effect magnitude. An important prerequisite for this task is the ability to accurately predict the expected response of the individual patient.
Objective
This review aims to discuss current approaches studying noninvasive brain stimulation in stroke and challenges associated with the development of predictive models of responsiveness to noninvasive brain stimulation.
Methods
Narrative review.
Results
Currently, the field largely relies on in-sample associational studies to assess the impact of different influencing factors. However, the associational approach is not valid for making claims of prediction that generalize out of sample. We will discuss crucial requirements for valid predictive modeling, in particular sufficiently large sample sizes.
Conclusion
Modern predictive models are powerful tools that must be wielded with great care. Open science, including data sharing across research units to obtain sufficiently large and unbiased samples, could provide a solid framework for addressing the task of building robust predictive models for noninvasive brain stimulation responsiveness.
]. Patients are often significantly limited by the functional consequences of their stroke. For instance, roughly 65% of stroke survivors cannot incorporate their affected hand into their activities of daily living six months post-stroke, roughly 50% have cognitive impairments in their chronic phase, and only around 25% return to their full pre-stroke level of everyday participation and physical functioning [
]. This highlights the great necessity of developing novel therapeutic strategies to promote stroke rehabilitation. Several interventions aiming at promoting underlying brain plasticity are currently under investigation [
]. In this review, we will discuss non-invasive brain stimulation (NIBS) techniques, provide an overview of the current applications, and discuss possible future steps towards precision medicine and the prerequisites for accomplishing this task.
Currently, the most widely used NIBS techniques are transcranial direct current stimulation (tDCS) and repetitive transcranial magnetic stimulation (rTMS) (for details, see Panel 1). In stroke rehabilitation research, NIBS mostly aims at (i) rectifying unhealthy brain states, such as maladaptive imbalanced interhemispheric inhibition, or (ii) reinforcing the effects of training-based interventions [
]. For example, initial research studying anodal (excitatory) tDCS applied to the ipsilesional primary motor cortex (M1) in chronic motor stroke patients documented positive effects on paretic hand function [
]. This initial work was conducted about 15 years ago and has since been partially replicated and extended, for instance by combining NIBS with behavioral training paradigms [
]. However, clinical translation of the approach has so far been limited. According to current guidelines, the strongest, level A evidence (definite efficacy) has been reported only for contralesional low-frequency M1 rTMS targeting motor impairments in the subacute phase [
]. The application of ipsilesional high-frequency M1 rTMS for enhancing motor function in the post-acute phase and of low-frequency rTMS of the right inferior frontal gyrus targeting non-fluent aphasia in the chronic phase have reached level B evidence (probable efficacy) [
]. For tDCS, insufficient data are currently available to support a solid recommendation on its possible therapeutic efficacy for motor stroke or aphasia [
One possible cause for the limited clinical translation of the discussed NIBS approaches to date is the emergence of distinct responder and non-responder patterns. Examples of this bifurcation include lower responsiveness to rTMS protocols in patients with cortical lesions [
]. It has been proposed that the heterogeneity in response rates might be reduced by tailoring the NIBS protocols to the phenomenological subgroup (precision medicine) or individual patients (personalized medicine) [
]. Yet, these approaches have not been convincingly tested so far.
We argue that there are two critical prerequisites for testing a precision medicine approach. First, a sufficient number of effective stimulation protocols must be available. In this regard, research has extended the spatial parameter space by investigating alternative stimulation targets, such as the cerebellum [
]. Furthermore, recent technical developments such as state-dependent, electroencephalography-triggered transcranial magnetic stimulation (TMS) have enlarged the temporal parameter space [
In this review, we focus on the second prerequisite – the development of robust models for predicting response rates for the respective NIBS interventions. We will summarize and discuss current approaches, which largely rely on in-sample statistical associations (data mining). These approaches are valid for gaining insights about potential underlying mechanisms or for identifying possible candidate predictor variables for further testing. However, it is critical to note that in-sample statistical associations alone are insufficient evidence for claims of out-of-sample predictive power [
]. If the field continues to rely on data mining/associational approaches, it will be unable to identify robust biomarkers for predicting NIBS responsiveness. However, a shift towards the appropriate use of predictive modeling techniques may overcome this limitation. In the second part of the review, we discuss the basics of predictive modeling, providing a brief overview specifically targeted to an audience of applied translational neuroscientists and clinicians. Furthermore, we highlight critical prerequisites and limitations of predictive modeling techniques. In doing so, we aim to inform future study designs assessing predictors/biomarkers of NIBS responsiveness. If successful, this could be an important step towards developing NIBS protocols suitable for precision medicine.
2. Part 1: current status of associational approaches
Initial research has established in-sample associations between NIBS responsiveness and parameters derived from different domains, ranging from simple behavioral to more complex electrophysiological or imaging-based metrics. Some important exemplary studies are discussed below; for a detailed overview, please see Table 1. It is tempting to overgeneralize these findings and make inferences beyond the studied sample. However, this approach has several pitfalls, discussed in detail in the section on predictive modeling, and should be avoided. Yet, in-sample associations can be useful for generating insights into potential underlying mechanisms or for identifying candidate predictors to be tested in future studies.
Table 1. Overview of identified associational studies relating parameters derived from different assessment domains with response to NIBS. ∗: 10–20 EEG positions F3-anode, F3-cathode, F4-anode, or F4-cathode to return electrode at contralateral mastoid; ∗∗: with non-parametric permutation testing; ∗∗∗: stepwise backward selection; AI: activity index scale (disability); ANOVA: analysis of variance; ARAT: Action Research Arm Test; BCI: brain-computer interface; cM1: contralesional M1; CBS: Catherine Bergego Scale; COR: correlation analysis (unspecified); CORpa: partial correlations; CORpe: Pearson product-moment correlation; CORsp: Spearman's rank-order correlation; CORspm: correlation analysis, behavioral covariate was added to SPM design matrix; cPPC: contralesional, left posterior parietal cortex; CST: corticospinal tract; cTBS: continuous theta-burst stimulation; DCM: dynamic causal modeling; DD: directional diffusivity; DTI: diffusion tensor imaging; ERSα: event-related synchronization in α-band; FA: fractional anisotropy; FMA-UE: Fugl-Meyer Assessment for upper extremity; fMRI: functional magnetic resonance imaging; fNIRS: functional near-infrared spectroscopy; FT: maximum index finger tapping frequency; GABA: γ-aminobutyric acid; GF: maximum grip force; iM: ipsilesional M1; iSP ratio: ipsilateral silent period (contralesional/ipsilesional); iTBS: intermittent theta-burst stimulation; JTT: Jebsen Taylor hand function test; LF: low frequency; M1-M1: bihemispheric montage; MEG: magnetoencephalography; MEP: motor evoked potential; MFT: manual function test; MLR: multiple linear regression; MULR: multinomial logistic regression; MRI: magnetic resonance imaging; MRS: magnetic resonance spectroscopy; MVRA: multivariate regression analysis; MWUT: Mann-Whitney U test; Oz: occipital zero position of EEG 10–20 system; PRI: power ratio index (δ+θ)/(α+β); RT: reaction time; rTMS: repetitive transcranial magnetic stimulation; S1: primary somatosensory cortex; SICI: short intracortical inhibition; SLR: simple linear regression; struct: structural MRI; TCT: thalamocortical tract; tDCS: transcranial direct current stimulation; TMS: transcranial magnetic stimulation; VAS: visual analog scale for self-assessment of the level of pain; VLSM: voxel-based lesion-symptom mapping.
Effects of low-frequency repetitive transcranial magnetic stimulation of the contralesional primary motor cortex on movement kinematics and neural activity in subcortical stroke.
Percentage of improvement in finger tapping frequency (in relation to control stimulation)
MRI - fMRI
CORspm
no
Overactivity (higher) of the contralesional dorsal premotor cortex, contralesional parietal operculum, and ipsilesional mesial frontal cortex at baseline was associated with the level of improvement (more) in hand motor function after the intervention
Differential effects of high-frequency repetitive transcranial magnetic stimulation over ipsilesional primary motor cortex in cortical and subcortical middle cerebral artery stroke.
Differential effects of high-frequency repetitive transcranial magnetic stimulation over ipsilesional primary motor cortex in cortical and subcortical middle cerebral artery stroke.
Modulation of training by single-session transcranial direct current stimulation to the intact motor cortex enhances motor skill acquisition of the paretic hand.
Effect of baseline brain activity on response to low-frequency rTMS/intensive occupational therapy in poststroke patients with upper limb hemiparesis: a near-infrared spectroscopy study.
Baseline motor impairment predicts transcranial direct current stimulation combined with physical therapy-induced improvement in individuals with chronic stroke.
Prediction of motor recovery in the upper extremity for repetitive transcranial magnetic stimulation and occupational therapy goal setting in patients with chronic stroke: a retrospective analysis of prospectively collected data.
It is of note that the procedures for determining a responsiveness metric differ across studies, and there is no single, universally accepted approach. Examples include computing absolute (delta) or relative (percentage, ratio) changes with respect to control stimulation (e.g. Ref. [
]). This heterogeneity in the applied procedures has to be considered when comparing NIBS response rates across studies.
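To make the distinction between these metrics concrete, the short Python sketch below computes three of them for a single patient, assuming the same behavioral score (for example, finger tapping frequency) was recorded before and after an active and a control (sham) session. The function name and the exact formulas are illustrative assumptions, not definitions taken from the studies in Table 1.

```python
def responsiveness_metrics(pre_active, post_active, pre_sham, post_sham):
    """Illustrative responsiveness metrics relative to control stimulation.

    Inputs are hypothetical scores of one behavioral measure recorded before
    and after an active and a control (sham) session; higher is better.
    """
    change_active = post_active - pre_active
    change_sham = post_sham - pre_sham
    # Absolute (delta) metric: raw gain under active minus raw gain under sham
    delta = change_active - change_sham
    # Relative metric: percentage gain under active minus percentage gain under sham
    percentage = 100.0 * change_active / pre_active - 100.0 * change_sham / pre_sham
    # Ratio metric: post/pre ratio under active divided by post/pre ratio under sham
    ratio = (post_active / pre_active) / (post_sham / pre_sham)
    return {"delta": delta, "percentage": percentage, "ratio": ratio}


# Two hypothetical patients with the same absolute gain but different baselines
print(responsiveness_metrics(2.0, 3.0, 2.0, 2.0))  # {'delta': 1.0, 'percentage': 50.0, 'ratio': 1.5}
print(responsiveness_metrics(8.0, 9.0, 8.0, 8.0))  # {'delta': 1.0, 'percentage': 12.5, 'ratio': 1.125}
```

In this toy case, the same absolute gain yields a fourfold difference in the percentage metric, which illustrates why response rates computed with different metrics are hard to compare directly.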
2.1 Clinical and behavioral characteristics
Several studies from different behavioral domains have associated standardized clinical scales with the magnitude of the NIBS response. For instance, in their cohort of chronic stroke patients (N = 13), O'Shea and colleagues found that a higher (better) Fugl-Meyer Assessment score for the upper extremity (FMA-UE), in combination with a longer time since stroke, was associated with larger responsiveness to a cathodal (inhibitory) contralesional M1 tDCS protocol [
]. The detected two-factor association is in line with emerging evidence from imaging studies suggesting a persisting supportive role of the contralesional M1 for severely impaired patients well into the chronic stage [
], which was likely disrupted by the cathodal tDCS application.
Another example comes from the language domain. Norise and colleagues studied patients with a first-time, single, left-hemispheric stroke in the chronic phase (N = 9), applying ten sessions of an individualized tDCS protocol [
]. The applied protocol was chosen from four possible active conditions (anodal or cathodal tDCS applied to the left or right frontal lobe) based on which elicited the best response in a montage-finding phase. Responsiveness to the best individualized, active stimulation condition was associated with a fluency item of the Boston Diagnostic Aphasia Examination at baseline, indicating larger responsiveness in patients with stronger baseline impairment. The authors speculate that the larger responsiveness of severely impaired patients might be due to an extended “recovery window” well into the chronic phase and a resulting higher susceptibility to the NIBS intervention.
A further important feature determining responsiveness might be the location of the lesion. This association was established, for instance, in the study by Ameli and colleagues [
]. The authors applied a high-frequency rTMS protocol to ipsilesional M1 and studied its effects on finger and hand tapping movements in stroke patients (N = 29) at different recovery stages (acute to chronic). For this purpose, the patients were classified, using high-resolution structural magnetic resonance imaging (MRI), into a subcortical lesion group and a group with additional cortical lesions. The analysis indicated a significant interaction between lesion location and intervention, suggesting a higher susceptibility of patients with a subcortical-only lesion pattern. This suggests that the functional integrity of the stimulation site might be critical. This notion is supported by computational modeling work documenting that the altered electrical properties of stroke lesions affect the magnitude, location, and orientation of the stimulation currents, and that these alterations may result in variable response rates across patients [
Different TMS-based parameters have been associated with NIBS susceptibility. A simple and straightforward approach is to assess the presence versus absence of motor evoked potentials (MEPs) in the affected limb. This assessment is integrated into workflows aiming to predict recovery of motor function after stroke, such as the PREP2 algorithm [
]. In addition, the presence of MEPs has been related to NIBS response rates. For example, Lee and colleagues associated the presence of MEPs with good responsiveness to a high-frequency rTMS protocol applied for two weeks to the affected hemisphere of subacute stroke patients (N = 29) [
]. In this regard, assessments of GABAA-ergic neurotransmission, for instance via the paired-pulse short intracortical inhibition protocol, likely contain valuable information, as GABA plays a critical role in mediating motor learning processes, stroke recovery, and tDCS effects [
]. An example of this approach is the study by Zimerman and colleagues, who showed a relationship between the modulation of short intracortical inhibition induced by cathodal tDCS over contralesional M1 and tDCS responsiveness, quantified as the online improvement in a hand skill learning task, in chronic stroke patients (N = 12) [
]. Structural and functional alterations to the network can be well captured with MRI-based imaging techniques. A key structure for successful recovery of motor function is the cortico-spinal tract [
]. In a seminal study of chronic stroke patients (N = 15), the authors described an association between better structural integrity measures of the cortico-spinal tract and larger gains in motor function following a combined protocol of bihemispheric (M1-M1) tDCS and physical/occupational therapy. An alternative approach is the evaluation of effective connectivity measures, for example using dynamic causal modeling. Utilizing this approach, Diekhoff-Krebs and colleagues found that in chronic stroke patients (N = 14), both stronger positive coupling between ipsilesional supplementary motor area and stronger negative coupling between ipsilesional M1 and contralesional M1 were associated with better motor response to intermittent theta-burst stimulation of ipsilesional M1 [
Finally, different assessment domains may be combined in one statistical model. A recent example originates from the study by Kuo and colleagues of subacute stroke patients (N = 18) [
]. The authors assessed a set of clinical, TMS-based and magnetoencephalography-based metrics. Using a stepwise multiple regression approach with backward elimination, the authors identified the baseline ipsilateral silent period ratio as more influential than five other independent variables (age, gender, baseline FMA-UE score, baseline ipsilesional-to-contralesional MEP ratio, baseline ipsilesional-to-contralesional alpha event-related synchronization ratio) in determining responsiveness towards a combined intervention of M1-M1 tDCS and paretic hand exercise. Specifically, larger ipsilateral silent period ratios were associated with a lower responsiveness to tDCS. An advantage of such approaches is that they potentially account for several mechanistically independent factors. However, a clear downside is that the winning model depends heavily on the feature selection procedure.
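The dependence of the winning model on the selection procedure can be illustrated with a small simulation. The Python sketch below implements greedy backward elimination using the Akaike information criterion (AIC) as the stopping rule, which is an assumption (the criterion used by Kuo and colleagues is not specified here), and reruns it on bootstrap resamples of pure-noise data with N = 18 patients and six candidate predictors. Because no predictor is truly related to the response, any retained variable is spurious, and the retained set can change from resample to resample.

```python
import numpy as np


def ols_aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with intercept and Gaussian errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_elimination(X, y):
    """Greedily drop the predictor whose removal lowers AIC most; stop when none does."""
    selected = list(range(X.shape[1]))
    best_aic = ols_aic(X[:, selected], y)
    while selected:
        # AIC of every model with exactly one of the current predictors removed
        candidates = [
            (ols_aic(X[:, [f for f in selected if f != j]], y), j) for j in selected
        ]
        aic, drop = min(candidates)
        if aic >= best_aic:
            break
        best_aic = aic
        selected.remove(drop)
    return sorted(selected)


rng = np.random.default_rng(0)
n_patients, n_candidates = 18, 6
X = rng.normal(size=(n_patients, n_candidates))
y = rng.normal(size=n_patients)  # response unrelated to every candidate predictor
for _ in range(5):
    idx = rng.integers(0, n_patients, size=n_patients)  # bootstrap resample of patients
    print(backward_elimination(X[idx], y[idx]))
```

Comparing the printed index lists across resamples gives a rough sense of how stable, or unstable, a stepwise "winner" is in a sample of this size.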
2.5 Perturbation probe-based approaches
A complementary approach to the described associational studies is the use of TMS-induced “virtual lesion” (VL) experiments, which allow causal brain-behavior relationships to be studied [
]. The VL approach takes advantage of the capability of single TMS pulses or short trains of repetitive pulses to temporarily disrupt the function of a given cortical target. By doing so, it is possible to quantify the impact of “shutting off” the target on behavior or imaging metrics. The approach has provided valuable insights into brain network alterations post-stroke. For instance, Lotze and colleagues showed in a VL experiment that contralesional motor areas played a supportive role in organizing complex finger movements in their sample (N = 7) of narrowly selected (only internal capsule lesions), well-recovered chronic stroke patients. This demonstrates that contralesional motor areas do not always have a maladaptive role, as suggested by interhemispheric competition models [
]. In a more recent example, Hensel and colleagues showed that “virtually lesioning” the contralesional anterior intraparietal sulcus improved performance on a tapping task, suggesting a maladaptive role of this region in the studied sample of acute first-time stroke patients (N = 14) [
]. Furthermore, the VL approach has shown promise in tracking the longitudinal change of the recovery-facilitating role of particular brain regions. A study by Tscherpel and colleagues [
] of patients with first-ever ischemic stroke in the left hemisphere and mild to moderate motor deficit (N = 14) found that the time-sensitivity of interference to contralesional frontoparietal areas is region-specific. Another study by the same group [
] found that slow and simple electroencephalogram responses to TMS were associated with both severe motor impairment and poor motor recovery of stroke patients (N = 25).
The described VL approach could also allow for assigning individual patients to a tailored NIBS intervention when applied as a baseline probe preceding the actual interventional phase. As described in a detailed review by Morishita and colleagues, a VL probe could be used to determine the functional role of the contralesional M1 for an individual patient in a recovery-phase-specific manner [
]. In the described study concept, patients would receive excitatory NIBS to the contralesional M1 following a detrimental response to the VL probe. Conversely, a beneficial response would result in the application of an inhibitory NIBS protocol to the contralesional M1. As illustrated, the VL approach could be one strategy to circumvent the dilemma of translating findings from associational studies to individual out-of-sample patients when applied before the allocation of an individual patient to a tailored intervention.
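The allocation logic of this study concept can be written as an explicit decision rule. The Python sketch below assumes that the VL probe yields a single signed change in task performance (positive = improvement, i.e. a beneficial response; negative = deterioration, i.e. a detrimental response). The `margin` dead zone and the inconclusive outcome are hypothetical additions for illustration, not part of the described concept.

```python
from enum import Enum


class Protocol(Enum):
    EXCITATORY_CONTRALESIONAL_M1 = "excitatory NIBS to contralesional M1"
    INHIBITORY_CONTRALESIONAL_M1 = "inhibitory NIBS to contralesional M1"
    INCONCLUSIVE = "no allocation (probe effect within margin)"


def allocate_protocol(vl_effect: float, margin: float = 0.0) -> Protocol:
    """Map the behavioral effect of a contralesional-M1 VL probe to a protocol.

    vl_effect: signed change in performance during the probe (hypothetical units).
    margin: hypothetical dead zone; effects with |vl_effect| <= margin are inconclusive.
    """
    if vl_effect < -margin:
        # Detrimental response: contralesional M1 seems to support performance -> excite it
        return Protocol.EXCITATORY_CONTRALESIONAL_M1
    if vl_effect > margin:
        # Beneficial response: contralesional M1 seems maladaptive -> inhibit it
        return Protocol.INHIBITORY_CONTRALESIONAL_M1
    return Protocol.INCONCLUSIVE


print(allocate_protocol(-0.8).value)  # excitatory NIBS to contralesional M1
print(allocate_protocol(0.5).value)   # inhibitory NIBS to contralesional M1
```

In practice, any such threshold would itself need to be validated on out-of-sample patients before being used for allocation.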
3. Part 2: predictive modeling
As we saw in Part 1, many studies have examined responsiveness of stroke patients to NIBS interventions. Since research on NIBS for treatment of stroke is still in a preclinical stage, these studies are primarily focused on finding associations and correlations between responsiveness to NIBS, broadly defined, and various individual factors.
3.1 Associational studies
Data mining studies, which seek to find associations and correlations between variables of interest, are a necessary prerequisite for precision medicine, because they generate the insights and hypotheses that may eventually lead to more robust claims. A fundamental rule of analytics is that any insights derived from a sample can only be considered valid for that sample, and attempts to generalize them to a larger population must be made with caution. This is particularly true when samples are small and/or biased, as is the case in most studies cited in Part 1.
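How easily small samples produce impressive-looking associations can be checked with a quick simulation. The Python sketch below draws a noise "response" and 20 noise candidate predictors (a hypothetical number) for samples of 13, 50 and 500 patients, and reports the median of the largest absolute Pearson correlation found. Because all variables are independent, every correlation is spurious; the expected pattern is that the largest one shrinks as N grows.

```python
import numpy as np


def max_abs_spurious_correlation(n_patients, n_candidates, rng):
    """Largest |Pearson r| between a noise response and independent noise predictors."""
    y = rng.normal(size=n_patients)
    X = rng.normal(size=(n_patients, n_candidates))
    r = [np.corrcoef(X[:, j], y)[0, 1] for j in range(n_candidates)]
    return float(max(abs(v) for v in r))


rng = np.random.default_rng(42)
for n in (13, 50, 500):
    values = [max_abs_spurious_correlation(n, 20, rng) for _ in range(200)]
    print(f"N = {n:3d}: median largest |r| = {np.median(values):.2f}")
```

Screening many candidate variables in a sample of around a dozen patients can therefore surface sizeable correlations even when no true relationship exists.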
3.2 Precision medicine
The ultimate goal of much healthcare research today is the establishment of precision medicine, defined by the United States National Institutes of Health and Precision Medicine Initiative as “an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment and lifestyle for each person.” Precision medicine requires mathematical models of disease to provide actionable information to patients and caregivers (see Fig. 1A), which in turn requires the ability of such models to predict with reasonable accuracy the course of recovery of a patient (or group) based on demographic or clinical characteristics of the patient (or group). In short, precision medicine requires the ability to make predictive models. Importantly, such models should be robust, meaning that they should perform well on all subsets of the target population. Mathematical models in healthcare can largely be divided into rules-based and machine learning models.
Fig. 1. Rules-based and machine learning models in healthcare. The problem to solve is the following (A): given a patient with certain characteristics, the decision-making machine should output a recommended course of treatment or a prediction of responsiveness to a particular course of treatment. The decision-making machine may be produced in a rules-based manner (B), in which human experts explicitly hard-code the rules. Alternatively, it may be produced in an ML-based manner (C), in which humans provide data and an algorithm infers the rules. Please note the greater number of knobs in (C) compared to (B); the number of knobs, which represent model parameters, is a reasonable proxy for model complexity. Supervised ML models (D, left column) are trained on data affixed with labels; if the labels are continuous numbers, the task is a regression problem, while if they are discrete categories, it is a classification problem. Unsupervised ML models (D, right column) are trained on data without labels; common problems include finding clusters in the data and dimensionality reduction, i.e. reducing the number of features in the data while retaining most of the information. A key question for ML models is whether they perform equally well on the data used for training as on other data; failure to do so is known as overfitting (E). To illustrate common causes of overfitting, we plotted deliberately uncorrelated random x and y data (E, left column) and fit polynomial equations to the data (E, right column). In each subplot, the in-sample data are denoted by black dots and the out-of-sample data by blue crosses, while the polynomials increase in complexity from linear to quadratic to cubic. Overfitting, therefore, is the extent to which the fitted models, denoted by red lines, model the black dots far better than the blue crosses. Please note that overfitting is least prominent with large sample sizes and simpler models.
A rules-based model (see Fig. 1B) is one in which decision rules are explicitly hard-coded by experts relying on their knowledge and experience. Examples of rules-based models include the various triage protocols adopted by hospital emergency departments, many of which are described in a review by Bazyar and colleagues [
]. These models are typically expressed as logical flow charts.
Such models are typically transparent and can thus easily be checked against common sense. They are easy to use, providing an unequivocal answer with great speed and minimal requirements for data infrastructure.
On the downside, their data-agnostic and non-evidence-based nature makes them potentially biased by the opinions of the experts who created them. For instance, as Jenkins and colleagues point out, it is possible that triage protocols initially developed by the military may incorporate decisions geared towards advancing mission objectives in addition to purely medical considerations [
A machine learning (ML) model is one in which decision rules are implicitly inferred by computer algorithms from data provided to the computer (see Fig. 1C) [
]. In the same way that human learning is the process of gradually improving performance at a task through sensory input and practice, machine learning is the process of computers improving performance through data and algorithms. ML models are characterized by an overall architecture and hyperparameters, both chosen by humans, and by parameters, which are automatically fit to, or “learned” from, human-provided data.
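The split between human-chosen hyperparameters and learned parameters is visible even in very simple ML models. The Python sketch below uses ridge regression, chosen here only as an example: the penalty strength `alpha` is a hyperparameter set by the researcher, whereas the coefficients and intercept are parameters learned from (here simulated) data. Larger `alpha` values shrink the coefficients, which is one form of regularization.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Ridge regression via the closed-form solution on centered data.

    alpha (hyperparameter): penalty strength chosen by the researcher.
    Returns the learned parameters: coefficient vector and intercept.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))  # simulated predictors for 30 hypothetical patients
y = X @ np.array([1.0, 0.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=30)
for alpha in (0.0, 10.0, 100.0):
    coef, intercept = fit_ridge(X, y, alpha)
    print(alpha, np.round(coef, 2), round(float(intercept), 2))
```

With `alpha = 0.0`, the fit reduces to ordinary least squares; choosing `alpha` well is typically done on data not used to fit the coefficients.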
Most ML models can be categorized as either supervised or unsupervised. Supervised ML seeks to create models of the relationship between predictor variables X and response variables Y. The term “supervised” refers to the fact that the responses can serve as a ground truth to which predictions can be compared; the Y is “supervising” the X. Common problems for supervised ML are regression (response variables are continuous numbers) and classification (response variables are discrete categories). Unsupervised ML seeks to find patterns within the data set, i.e. we have only the X, not the Y. Common problems for unsupervised ML are clustering (grouping data points according to a notion of similarity) and dimensionality reduction (eliminating or combining features in the data while preserving most of the information within). Illustrations of these types of ML problems are given in Fig. 1D.
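The distinction can be sketched on a single simulated feature matrix. In the Python example below, which uses only NumPy and hypothetical data, least-squares regression and a nearest-centroid classifier (chosen here for brevity) use labels, whereas principal component analysis (PCA), a common dimensionality-reduction method, uses the features alone. Clustering is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))  # 40 hypothetical patients, 3 features each

# Supervised regression: continuous labels guide the fit
y_continuous = X @ np.array([1.5, -0.5, 0.0]) + 0.1 * rng.normal(size=40)
design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(design, y_continuous, rcond=None)

# Supervised classification: discrete labels (e.g. 1 = responder, 0 = non-responder)
y_class = (y_continuous > 0).astype(int)
centroids = np.array([X[y_class == c].mean(axis=0) for c in (0, 1)])


def predict_class(x_new):
    """Nearest-centroid rule: return the class whose mean feature vector is closest."""
    return int(np.argmin(np.linalg.norm(centroids - x_new, axis=1)))


# Unsupervised dimensionality reduction: PCA via SVD sees X only, never a label
Xc = X - X.mean(axis=0)
_, singular_values, components = np.linalg.svd(Xc, full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)
X_reduced = Xc @ components[:2].T  # project onto the first two principal components

print("regression coefficients:", np.round(beta, 2))
print("predicted class of a new patient:", predict_class(np.array([1.0, 0.0, 0.0])))
print("variance explained by 2 components:", round(float(explained_variance_ratio[:2].sum()), 2))
```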
On the downside, the well-known computer science saying “garbage in, garbage out” distills a fundamental truth about all ML models: one cannot outsource thoughtfulness and common sense to algorithms. The quality of the final model depends heavily on the choice of model architecture and hyperparameters that make up the initial model, and on the quality of the data used to train the model. In particular, attempting to create a quality ML model without datasets that are sufficiently large and unbiased is a fool's errand.
3.5 ML in stroke treatment
ML has great potential to assist in stroke treatment. Tozlu and colleagues found that ML algorithms such as elastic net or random forest, applied to demographic, clinical and imaging variables, show some promise in predicting the motor function of stroke patients (N = 102) after an rTMS intervention, which is a regression problem [
]. Furthermore, support vector machines have shown promise in the classification problem of predicting whether stroke patients will have a good or poor motor outcome, using functional MRI data (Rehme and colleagues, N = 21 [
As mentioned earlier, when doing predictive modeling as opposed to data mining, the robustness of a model, i.e. its ability to perform well on out-of-sample data as well as on in-sample data, is the true test of its validity (see Fig. 1E). Indeed, a model that performed well on the sample data but poorly out-of-sample would be misleading, and therefore probably worse than no model at all when applied to the general population. Three phenomena can cause this to occur. The first is overfitting, in which a model fits the noise in the sample rather than the structure of the underlying population, as illustrated in the right half of Fig. 1E. The second is sampling bias, the phenomenon whereby a model trained on biased data will produce similarly biased outcomes. For instance, in the left half of Fig. 1E, the x and y sample data are uncorrelated, which can occur even if x and y are correlated in the broader population. The third is data snooping, in which a subset of the data loses its ability to assess model quality because it has influenced some step of the learning process.
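Overfitting can be reproduced numerically in the spirit of Fig. 1E. The sketch below assumes NumPy and a hypothetical population (a noisy sine curve); it fits polynomials of increasing degree to a small training sample and compares in-sample with out-of-sample error. Because the models are nested least-squares fits, in-sample error can only go down as the degree increases, while out-of-sample error typically rises sharply for the most complex model.

```python
import numpy as np

rng = np.random.default_rng(42)


def sample(n):
    """Hypothetical population: y = sin(2*pi*x) plus Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x, y


def mse(coefs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return np.mean((np.polyval(coefs, x) - y) ** 2)


x_train, y_train = sample(10)     # small in-sample (training) dataset
x_test, y_test = sample(1000)     # large out-of-sample dataset from the same population

results = {}
for degree in (1, 3, 9):          # polynomial degree controls model complexity
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coefs, x_train, y_train), mse(coefs, x_test, y_test))
    print(f"degree {degree}: in-sample MSE = {results[degree][0]:.3f}, "
          f"out-of-sample MSE = {results[degree][1]:.3f}")
```

With 10 training points, the degree-9 polynomial can pass through every point, so its in-sample error is essentially zero; this is exactly the situation in which in-sample performance says little about out-of-sample performance.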
3.7 Avoiding overfitting by regularization and choosing simple models
A model's complexity is largely a function of how much flexibility it has to contort itself to fit the data, and the more flexible a model is, the more readily it can fit noise rather than signal. Researchers should therefore adopt the principle of Occam's razor, which in the context of ML states that, all else being equal, simpler models are to be preferred. This is well illustrated in Fig. 1E, which shows the more complex models overfitting the training data at the expense of wildly missing the non-training data.
Models are trained by fitting the parameters to minimize some quantity considered a proxy for “model badness.” If Occam's razor suggests an association between “complex” and “bad,” then it makes sense to incorporate complexity into our definition of “badness,” thereby causing the training process to favor simpler versions of a given model. This penalization of complexity is known as regularization. Well-known examples of regularization in linear regression include lasso regression, which sets many coefficients exactly to zero; ridge regression, which shrinks all coefficients toward zero but typically sets none of them exactly to zero; and elastic net, which is a hybrid of lasso and ridge.
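The three penalties can be compared with a single small solver. The sketch below, assuming NumPy, minimizes the elastic-net objective by cyclic coordinate descent, using the same parameterization that scikit-learn documents for its `ElasticNet` estimator (its `alpha` corresponds to `lam` here and its `l1_ratio` to `alpha`); it is an illustration, not the implementation used in the studies cited above. Setting `alpha = 1` gives the lasso and `alpha = 0` gives ridge. The soft-thresholding step is what produces exact zeros under the lasso penalty.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values in [-threshold, threshold] become exactly 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net(X, y, lam, alpha, n_iter=1000):
    """Minimize (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2)
    by cyclic coordinate descent. alpha = 1 gives the lasso, alpha = 0 gives ridge.
    Assumes X (no constant column) and y are centered, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n        # x_j^T x_j / n for each feature j
    residual = y.astype(float)               # y - X @ b with b = 0 (astype returns a copy)
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * b[j]       # remove feature j's current contribution
            rho = X[:, j] @ residual / n     # correlation of feature j with partial residual
            b[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            residual -= X[:, j] * b[j]       # add the updated contribution back
    return b


# Hypothetical toy data: only the first two of five predictors matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X -= X.mean(axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)
y -= y.mean()

for name, alpha in [("lasso", 1.0), ("ridge", 0.0), ("elastic net", 0.5)]:
    print(f"{name:>11}:", elastic_net(X, y, lam=0.5, alpha=alpha).round(3))
```

With `lam = 0` the solver reduces to ordinary least squares; as `lam` grows, the ridge coefficients shrink smoothly, while the lasso coefficients hit exactly zero one after another.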
Regularization and a preference for simpler model architectures from the outset are powerful tools against overfitting, which researchers should keep in mind.
3.8 Combating sampling bias by data sharing
Combating sampling bias is more straightforward, albeit likely a harder pill for the research community to swallow: it requires that data, gathered at considerable expense, be freely shared across labs. While this may seem counter to the competitive aspect of modern research, encouraging trends in this direction are clearly visible: the number of PubMed articles matching the term “open data” is growing steadily, from 6,122 in 2010 to 19,300 in 2020. Other forces moving the community towards more open data are outlined by McKiernan and colleagues [
3.9 Avoiding data snooping by train-validate-test separation
While it is common to see the terms “validation” and “testing” used interchangeably, as both refer to data not used to fit the model, non-training data is in fact required for two mutually exclusive purposes. As Kuhn and Johnson warn, while the parameters are fit using training data, the tuning of hyperparameters, which is carried out on non-training data before the final model is fit, is also a part of the learning process [
]. Thus, the learning process really consists of two nested loops: the outer loop explores the space of possible hyperparameters, and for each set of hyperparameters, the inner loop fits the model parameters on the training data and evaluates model quality on some non-training data. To avoid data snooping, this non-training data cannot also be used to evaluate the quality of the final model once the hyperparameters have been chosen and the parameters fit. The non-training data used inside the loop is therefore typically called the validation set, and a second subset of non-training data, typically called the test set, must be kept in reserve, not to be touched until we need to evaluate the quality of the final model fit.
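The nested loops described above can be sketched as follows, assuming NumPy, a hypothetical dataset and ridge regression (solved in closed form) as the model; the 60/20/20 split and the grid of penalty values are illustrative choices, not recommendations. The outer loop scores each hyperparameter on the validation set only, and the test set is used exactly once at the end.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Inner loop: fit the parameters (coefficients) for one fixed hyperparameter lam."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def mse(X, y, coef):
    return np.mean((X @ coef - y) ** 2)


# Hypothetical dataset: 200 "patients", 10 predictors, no intercept for simplicity.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=1.0, size=200)

# Split once, at random, into disjoint training (60%), validation (20%) and test (20%) sets.
idx = rng.permutation(len(y))
train, val, test = idx[:120], idx[120:160], idx[160:]

# Outer loop: explore candidate hyperparameters, scoring each on the validation set only.
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
val_mse = {lam: mse(X[val], y[val], ridge_fit(X[train], y[train], lam)) for lam in lambdas}
best_lam = min(val_mse, key=val_mse.get)

# The test set is used exactly once, after the hyperparameter has been fixed.
final_coef = ridge_fit(X[train], y[train], best_lam)
test_mse = mse(X[test], y[test], final_coef)
print(f"chosen lambda = {best_lam}, test MSE = {test_mse:.3f}")
```

A common variant refits the chosen model on the training and validation sets combined before the single test evaluation; either way, the test set must not influence the choice of `best_lam`.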
Avoiding data snooping therefore requires us to either forgo hyperparameter tuning altogether, which is a myopic way to do ML, or keep strict separation of training, test and validation datasets. For more on hyperparameters, we refer the reader to the Supplementary Online Material.
3.10 Limitations
The challenge of data aggregation is compounded by the fact that combining data from multiple studies may introduce nuisance covariates that (at least partially) offset the benefits of more representative samples. As an example, consider two studies of rTMS in stroke patients, one conducted at the subacute stage in country A and the other at the chronic stage in country B. Any difference in responsiveness between the two cohorts may then reflect genuine differences between stages of recovery or simply differences between the healthcare systems of countries A and B, and the pooled data alone cannot tell these explanations apart. Other nuisance covariates may include whether one is considering immediate or long-term effects of NIBS, or which traditional therapies NIBS served as an adjunct to. Such covariates must be taken seriously in any attempt to aggregate multiple data sets.
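A deliberately simple, noise-free sketch of why such covariates matter, assuming NumPy and entirely hypothetical pooled data: the true slope of the response on a predictor is 1.0 in both studies, but one study's responses are shifted upward for reasons unrelated to the predictor. Ignoring the study of origin inflates the estimated slope (to about 2.5 here), whereas adding a study indicator as a covariate recovers it.

```python
import numpy as np

# Hypothetical pooled data from two studies. In both, the true effect of the
# predictor x on the response y is a slope of 1.0, but every response from
# study B is shifted by +10 for reasons unrelated to x (e.g. stage, health system).
x = np.arange(10.0)                       # study A: x = 0..4, study B: x = 5..9
study_b = (x >= 5).astype(float)          # nuisance covariate: 1 if from study B
y = 1.0 * x + 10.0 * study_b

intercept = np.ones_like(x)
# Naive pooling ignores the study of origin, so the offset leaks into the slope.
naive = np.linalg.lstsq(np.column_stack([intercept, x]), y, rcond=None)[0]
# Adjusting for the nuisance covariate by adding it as a predictor.
adjusted = np.linalg.lstsq(np.column_stack([intercept, x, study_b]), y, rcond=None)[0]
print(f"naive slope = {naive[1]:.3f}, adjusted slope = {adjusted[1]:.3f}")
```

The adjustment works here only because the predictor varies within each study. A variable that is constant within each study, such as recovery stage in the two-country example, is perfectly collinear with the study indicator and cannot be separated from it by any model.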
3.11 Summary
In summary, ML has great potential to leverage large amounts of data to support medical practitioners by making predictions of patients’ responses to treatments. However, it is important not to get swept up in the hype surrounding ML, and to realize that if used without sufficient care, it is likely to lead researchers into serious error by proposing models whose apparent predictive power is little more than a mirage. As argued in the landmark work of Ioannidis [
], a significant part of all published research is false or not reproduced, and we believe that as ML makes its way into ever more fields of inquiry, the proportion of published results that are not reproducible or false is likely to increase unless great care is taken. It is our hope that neuroscience researchers intending to use ML for prediction will keep in mind both its pitfalls and some ways to avoid them.
To avoid overfitting, one should choose simple models to begin with and further simplify them via regularization. To avoid data snooping, one should be meticulous about differentiating validation and test datasets. Finally, to avoid sampling bias, one should make efforts to gather sufficiently large and unbiased datasets; if this is not possible, one should pool the data with data gathered by others; and if even that is not possible, one should avoid ML altogether.
4. Conclusions
The development of predictive models for the responsiveness of stroke survivors to NIBS (or any other interventional strategy) has the potential to guide personalized application protocols in the future. This could be key to reducing the heterogeneity of outcomes associated with conventional NIBS protocols and to maximizing individual treatment responses. However, further testing is needed to demonstrate the benefits of this novel approach.
It is important to be cautious not to make premature and unsubstantiated claims of prediction. As discussed above, it is common for associations detected in-sample to fail to generalize, for instance due to model overfitting. Sampling bias, arguably the most important obstacle to generalization, is unavoidable when one is restricted to a limited number of patients at one study site. Unfortunately, due to the high logistical and financial costs associated with most clinical trials of interventional neurotechnology, it is challenging to aggregate a sufficiently large data set at a single research unit. Data sharing across units could overcome this hurdle [
]. Contrary to common worries that data sharing entails risks to proper attribution of credit and funding, researchers who practice open science benefit from clear documentation, enhanced preservation, data curation, reproducibility, transparency and more citations of their research [
]. If successful, we believe open science initiatives would provide a solid framework for accomplishing the task of creating robust predictive models for NIBS responsiveness.
Declarations of interest
None.
CRediT author statement
Maximilian J. Wessel: Conceptualization, Investigation, Writing - Original Draft, Writing - Review & Editing, Project administration, Funding acquisition. Philip Egger: Conceptualization, Investigation, Writing - Original Draft, Writing - Review & Editing, Visualization. Friedhelm C. Hummel: Conceptualization, Writing - Review & Editing, Supervision, Funding acquisition.
Acknowledgements
The research was partially funded by the Strategic Focus Area “Personalized Health and Related Technologies (PHRT, #2017–205)” of the ETH Domain (CH) to F.C.H., the Novartis Research Foundation—FreeNovation (Basel, CH) to M.J.W., the Wyss Center for Bio and Neuroengineering (ecoss WCP024; Genève, Switzerland) to F.C.H., the Defitech Foundation (Morges, CH) to F.C.H. and the Bertarelli Foundation—Catalyst program (Gstaad, CH) to F.C.H.
Other conventional low-intensity transcranial electric stimulation techniques: transcranial alternating current stimulation (tACS), transcranial random noise stimulation (tRNS), transcranial pulsed current stimulation (tPCS)
Stimulation parameters: intensity [% of MSO], pulse-shape, current-direction, coil-type, frequency [Hz], number of pulses, train-duration [s], inter-train-interval [s], burst-configuration
Modulation via mechanical interaction of ultrasound waves with neuronal membranes through mechanosensitive voltage-gated ion channels or neurotransmitter receptors
Panel 2: predictive modeling concepts and terminology
Machine learning (ML): the process of teaching computers to perform tasks by creating useful models of the world.
Parameter: Component of an ML model which is learned from data.
Hyperparameter: Component of an ML model which is not learned from data.
Supervised ML: The task of finding a relationship between one or more predictor variables X and one or more response variables Y.
Unsupervised ML: The task of finding patterns within a dataset X, without making predictions that can be checked against (or “supervised by”) response variables.
Data mining: The process of combing through datasets to extract insights. Synonymous with analytics, exploratory data analysis or associational study. Strong claims should be avoided, and an open mind kept, as insights may not generalize out of sample.
Predictive modeling: The process of creating models for the purpose of making individualized predictions. Claims of predictive power should be avoided when sample sizes are small and/or derived from a single site.
Overfitting: The phenomenon whereby a model learns more from noise in the data than from the population distribution. A particular threat when 1) datasets are small and/or biased, 2) datasets contain many predictor variables, or 3) excessively complex ML models are used.
Cross-validation: The process of partitioning the dataset into several slices, or folds, and repeatedly holding one fold out from training in order to test on the held-out fold.
Leave-one-out cross-validation: The form of cross-validation in which each fold consists of a single data point. Not recommended unless dealing with small datasets.
Data snooping: Testing a model on data that influenced its training in any way, or analogously, using the same data both to develop and to test a hypothesis. A particular threat when creating data preprocessing pipelines; it makes ML models appear better than they are.
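The cross-validation and data-snooping entries above can be illustrated together. The sketch below assumes NumPy, hypothetical data and an ordinary least-squares model; the helper names are illustrative. It implements k-fold cross-validation (with k equal to the number of samples giving leave-one-out cross-validation) and, to avoid snooping in the preprocessing pipeline, computes the standardization statistics from the training folds only.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Split shuffled sample indices into k roughly equal folds.
    k = n_samples gives leave-one-out cross-validation."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validate(X, y, k, fit, predict):
    """Return the mean squared error on each held-out fold."""
    folds = k_fold_indices(len(y), k)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Standardization statistics come from the training folds only; computing
        # them on all of X would let the held-out fold leak into training.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0)
        X_train = (X[train_idx] - mu) / sd
        X_test = (X[test_idx] - mu) / sd
        model = fit(X_train, y[train_idx])
        errors.append(np.mean((predict(model, X_test) - y[test_idx]) ** 2))
    return errors


# Hypothetical model: ordinary least squares with an intercept.
def ols_fit(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]


def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(60, 3))
y = X @ np.array([0.5, -1.0, 0.0]) + rng.normal(scale=0.5, size=60)

print("5-fold CV MSE:", np.mean(cross_validate(X, y, 5, ols_fit, ols_predict)))
print("LOOCV MSE:", np.mean(cross_validate(X, y, len(y), ols_fit, ols_predict)))
```

Any other data-dependent preprocessing step (feature selection, imputation, dimensionality reduction) would need the same treatment: fit it inside the loop on the training folds, then apply it unchanged to the held-out fold.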