A common gene drive language eases regulatory process and eco-evolutionary extensions

Verma, Prateek; Reeves, R. Guy; Gokhale, Chaitanya S.

doi:10.1186/s12862-021-01881-y

Research article
Open access
Published: 09 August 2021

A common gene drive language eases regulatory process and eco-evolutionary extensions

BMC Ecology and Evolution volume 21, Article number: 156 (2021) Cite this article

2279 Accesses
6 Citations
Metrics details

A Correction to this article was published on 25 October 2021

This article has been updated

Abstract

Background

Synthetic gene drive technologies aim to spread transgenic constructs into wild populations even when they impose organismal fitness disadvantages. The extraordinary diversity of plausible drive mechanisms and the range of selective parameters they may encounter makes it very difficult to convey their relative predicted properties, particularly where multiple approaches are combined. The sheer number of published manuscripts in this field, experimental and theoretical, the numerous techniques resulting in an explosion in the gene drive vocabulary hinder the regulators’ point of view. We address this concern by defining a simplified parameter based language of synthetic drives.

Results

Employing the classical population dynamics approach, we show that different drive construct (replacement) mechanisms can be condensed and evaluated on an equal footing even where they incorporate multiple replacement drives approaches. Using a common language, it is then possible to compare various model properties, a task desired by regulators and policymakers. The generalization allows us to extend the study of the invasion dynamics of replacement drives analytically and, in a spatial setting, the resilience of the released drive constructs. The derived framework is available as a standalone tool.

Conclusion

Besides comparing available drive constructs, our tool is also useful for educational purpose. Users can also explore the evolutionary dynamics of future hypothetical combination drive scenarios. Thus, our results appraise the properties and robustness of drives and provide an intuitive and objective way for risk assessment, informing policies, and enhancing public engagement with proposed and future gene drive approaches.

Background

Gene drive techniques increase the frequency of a synthetic genetic element in populations in a manner only partially determined by its impact on organismal fitness (and stochastic events). Swift progress in molecular biology allows us to design complicated drive systems which may be substantially more efficient in the properties of interest than their natural counterparts. The need for theoretical sandboxing of such technology with planetary consequences is imperative before field deployment. It is also critically important to provide the stakeholders of such a technology sufficient understanding to evaluate the basis of crucial projected outcomes. However, the number of publications on theoretical and experimental synthetic gene drive systems is overwhelming and ever-increasing. Generally, the description of properties of each of the sequentially proposed synthetic drive approaches uses bespoke modelling frameworks [1,2,3]. The ability to quickly compare the relative sensitivity of the fundamental properties of different drive scenarios to parameter changes is currently limited. Regulators, policymakers, and non-experts alike desire such a tool to discuss the applicability of synthetic drive constructs. Consequently, we suggest a common language and demonstrate its applicability in a simplified framework.

The natural Segregation Distorter (SD) locus in Drosophila melanogaster imposes an enormous organismal fitness cost, in that it is homozygous lethal (and only viable as heterozygotes) [4,5,6]. Natural selection, therefore, at the organismal level, would act to eliminate the SD allele. However, because of its capacity to bias the production of SD functional sperm in +/SD heterozygotes, the allele has rapidly increased to an equilibrium frequency of 1–5% in most natural populations worldwide [7,8,9]. This natural system illustrates how gene drive elements can increase in frequency despite a substantial cost to (overall) organismal fitness. Developing analogous synthetic drive elements to push linked genes into wild populations in a self-perpetuating manner is an old aspiration [10]. Based on the intended use, the synthetic gene drive system can be categorized into two types: replacement drives (also modification drives) and suppression drives. Suppression drives aim to reduce or completely eradicate the target populations upon release and remain the focus of many regulatory considerations [11, 12]. A replacement drive works by incorporating or substituting a target gene with the desired gene into the population. Replacement drives have broad applicability ranging from the control of disease vectors by rendering them harmless [13,14,15] to making resistant pests sensitive to insecticides [16] and control of invasive species in agriculture [17, 18]. For instance, Gantz et al. 2015 have provided evidence of a CRISPR based gene-drive system that can spread antimalarial genes into a target vector population of Anopheles stephensi and renders them resistant to the human malaria parasite Plasmodium falciparum [15]. In agriculture, Buchman et al. 2018 have reported the construction of a synthetic Medea gene drive for a significant crop pest Drosophila suzukii rendering them harmless to the target crops [18].

Like SD, it does not necessarily follow that any synthetic drive element will increase in frequency to the extent that it displaces all wildtype alleles initially present in the wild population. This fixation property is dependent on various drive parameters of the developed system. Other such properties of interest are the speed of action, reversibility, and potential to be spatially confined to only target populations. The sensitivity of such fundamental properties of drive systems to drive parameters has been a topic of interest of numerous recent theoretical studies [19,20,21,22,23,24,25,26,27,28]. We have collated this material in the provided database.

We constructed a representative literature database on synthetic gene drive system to be cognizant of the current trends in this rapidly growing field of research GitHub. The database consists of 75 publications from the year 1995 to 2021. The literature is sorted based on gene drive type (replacement or suppression), the model system under study, theoretical methodology, consideration of breakdown of drive, the possibility of gene drive reversibility and public accessibility of the literature. From the analysis of the literature database, we found that the number of studies on replacement drives is 35 [15, 29] and suppression drives are 37 [30,31,32], with twelve publications considering both approaches. The majority of research studies (41 out of 62 total) have considered resistance evolution in synthetic gene drive systems [3, 33]. Analytical methodologies mainly employed deterministic and stochastic models with a few including spatial features [19, 22, 25, 34,35,36,37]. The model organisms in gene drive studies have been chiefly mosquitoes (total 25) [15, 30, 38], fruit flies (total 13) [18, 39, 40], rodents (total 3) [41, 42] and 20 generic studies with no particular species in mind.

Most of the studies, we observe, use new terminologies for the bespoke drive mechanisms developed therein. It is partially because the molecular mechanisms of different gene drives can be very different and complex. While excellent molecular biology tools are used in ingenious ways to develop new synthetic drives, they still act on population genetics’s fundamental forces. The new jargon sometimes is unnecessarily confusing for policymakers and regulators in charge of deciding about such techniques’ future applicability. Since an exact comparison between the techniques is not possible, a new and useful technique might get lost in the plethora of synthetic drive projects.

The linguistic challenges in discussing the unavoidably complex gene drive topic lead to at least two types of problems. First is the inconsistent application of terminology; this can add to the confusion, and in some instance, can actively contribute to misunderstandings. For example, the use of different terms to describe the same thing, e.g., the term “quorum” in daisy quorum drive [43] appears to be substantially or entirely synonymous with the much earlier and commonly used term underdominace [1]. Other examples include the equivalence between modification drive, replacement drive and population replacement or between driving-Y and non-autosomal X-shredder. Furthermore, some terms are used inaccurately as being synonymous when they are not; an example is that gene drive is frequently equated with “non-Mendelian” or “super-Mendelian” inheritance. However, this is not necessarily the case, e.g., for gene drives like Medea or underdominace where zygotes’ genotypes can be entirely Mendelian. In this light, it is notable in the recent attempt to standardise the definition of gene drive [44], where gene drive is stated to work by “reducing the fitness of alternative genotypes without directly distorting Mendelian inheritance”.

The second source of linguistic challenges is inconsistent parametrisation and their description. This disparity almost inevitably results from the bespoke modelling approaches currently employed to describe each drive approach. To reduce, but certainly not eliminate, the above challenges, we propose terminology that permits a common parameterised synthetic drives language (using a relatively small number of standard parameters rooted in standard population genetics). This standardisation enables non-experts to precisely and quantitatively discuss a wide range of replacement drive scenarios in a mutually comprehensible manner. Furthermore, it also greatly facilitates the intuitive description of drive systems that combine multiple distinct drive approaches. This feature could be of increasing value in the future as drives get complicated. As one of the most advanced assembled drive systems, with the acronym SDGD, is a combination of two distinct drive approaches (SDGD is a suppression drive and as such is not considered in this study, which focuses exclusively on replacement drives) [45]. Furthermore, a modelling study focused on SDGD speculates on adding a third component to this drive system to carry an anti-parasite gene [46]. Other modelling studies have focused on combined drives to enhance desirable gene drive properties [43, 47,48,49,50,51,52,53,54,55,56], a trend likely to accelerate in the future. Such developments considerably increase the complexity of discussing gene drives compared to single drive systems’ already existing substantial complexity.

Analyzing the select literature, we have distilled the primary components of synthetic gene drive models. From the principles of standard population genetics, we incorporate the processes that subvert the generally dominant role that organismal fitness plays in how natural selection can impact the frequencies of alleles within natural populations. Precise accounting of a generic diploid organism’s lifecycle through the various stages of development, from an adult, forming gametes to zygote and then back to an adult, is done. We discuss how the drive can act at any one or all of these stages. We then proceed to combine the knowledge into a single population dynamic model. Our model is backwards compatible, as demonstrated by the recovery of specific gene drives discussed in previous theoretical and experimental studies. Furthermore, the explicit use of standard terminology also allows us to extend the same basic model to complicated scenarios such as multilocus and multiallelic drive mechanisms (in section Backward compatibility). We deploy the developed succinct theoretical model for a single locus system in a user-friendly tool DrMxR—Drive Mixer (ShinyApps).

While our basic model is available as a standalone tool, we provide results also extending to an ecological and spatial dimension. A mechanism for localizing the gene drive to a target population is the imposition of a suitably high invasion threshold. We determine the extended conditions required for the invasiveness of drive. We also evaluate the impact of spatial structure on the condition of invasion (from rare) and fixation of the drive for a single population.

When considering multiple proposed drive systems for release, regulators find it essential that a comparison between the systems is possible. The current state of the art does not allow this easily. We thus show that a single theoretical approach, when minimally extended, provides specific cases of different drive systems. This exercise provides us with a common vocabulary across different drive systems. Furthermore, we provide DrMxR to test specific cases of proposed drive mechanisms that will be useful in risk assessment and regulation of gene drives. With applicability in mind, DrMxR is specially targeted towards policymakers, the general public, and even experts for quick hypothesis testing. To this end, we begin by detailing the process of theory development in the following section.

Results

For developing this model, we have assumed an obligate sexually reproducing organism, a likely necessity for successful gene drive where organismal fitness is negatively impacted. We split the life cycle of an organism into three tractable stages; the minimal abstraction required to recover the established results in the field of engineered gene drive systems. Further complications can indeed be added depending on the details of the case study in focus.

Figure 1 shows the lifecycle of an individual in our model. We focus on two allelic types—wildtype (W) and the driven gene (D). Thus we have adults of three genotypes, wildtype homozygotes WW, heterozygotes WD and drive homozygotes DD. Adults are chosen from the population pool for reproduction. Adults produce gametes that combine to form zygotes. The zygotes grow up to become adults, and the cycle continues. We allow for overlapping generations, a realistic assumption for numerous target species such as mosquitoes, drosophila or rodents [18, 38, 57]. We assume that the alleles during gamete formation are segregated independently according to Mendel’s inheritance laws. Hence the total number of alleles in the absence of any evolutionary processes remain conserved over successive generations. Therefore, frequencies of genotypes reach Hardy-Weinberg equilibrium in the limit of an infinite population, random mating, and no selection.

The essential feature of a gene drive is biasing the chance of inheritance of the desired gene in the population [58]. The expected outcome, however, is that the population composition is modifiable in a controlled fashion. Interventions along the lifecycles can accomplish the change via distortion, viability and fertility selection. These processes act at different stages of an individual’s life cycle. Distortion acts at the gamete level and biases the transmission of the drive allele in the heterozygote. Gametes combine to form zygotes, but some are non-viable and die. Fertility selection acts at the adult stage when individuals are chosen to reproduce with probability proportional to their fitness. Distortion, viability selection and fertility selection, thus, together or even independently, can drive the population away from the Hardy-Weinberg equilibrium. Synthetic gene drive techniques allow us to engineer such selection pressures.

Viability selection

Viability selection acts during the zygote phase of an individual’s lifecycle. The viability fitnesses represent the inherent variation in the fitnesses of the three genotype, WW, WD and DD. The fitness can also capture the payload costs of the drive allele. Viability fitness is defined here as the probability of survival of the zygotes up-to-the adult stage. $\omega$ and $\nu$ denotes the genotypic viabilities of WD and DD, respectively. The above parameters have been normalized to the viability of WW fixed at 1.

Well described synthetic drive systems that work principally by manipulating viability selection parameters include those using zygotic toxin-antidotes. In these systems, a proportion of zygotes of specific genotypes may become non-viable. Medea (Maternal effect dominant embryonic arrest) is an example of a naturally occurring toxin-antidote gene drive found in flour beetles [59, 60]. In Medea drive, wildtype homozygous offspring of heterozygous mothers are non-viable. Population dynamics of Medea drives have been studied in [2, 47]. A synthetically engineered Medea drive first demonstrated in Drosophila [61] has been extensively studied [18, 62]. Similarly, a synthetic viability selection based underdominant population transformation system was developed for Drosophila melanogaster in [63]. Figure 2A shows the population dynamics of Medea drive and deviation from Hardy-Weinberg equilibrium parabola.

The result can be recapitulated by readers using DrMxR (ShinyApps) where Medea and other related synthetic drive systems can be seamlessly modelled including inverse-Medea [64], or Semele [65]. The drive efficiencies of Medea, Inverse Medea and Semele drive is represented by parameters $d_{m}$, $d_{im}$ and $d_{s}$ respectively. We recover a subset of key results of the population dynamics from earlier publications in the backward compatibility section. The framework used by DrMxR is general and applicable to other single construct gene drive systems either entirely or partially based on viability selection.

Fertility selection

Specific genotypes may experience fitness advantages because of preference for traits during mating or because some genotypic pairings are more fertile than others. Both of these fitness components are modelled using the fertility selection parameters. The fact that both mating success and fecundity are considered jointly dictates that the fertility selection arrow on Fig. 1 traverses three life stages, rather than the two indicated for the other types of selection. The fertility fitness component arising from mating success is included in the parameter $f_{WW}$, $f_{WD}$ and $f_{DD}$ for the three genotypes. Fertility selection is an evolutionary phenomenon that drives the population away from the Hardy-Weinberg equilibrium. Our model did not differentiate between sexes, but it is possible to include this complexity [66].

Previous work [67, 68] captures the rich dynamics that ensue when fertility selection is considered. The population dynamics of a two allele system for different fertilities and sex-dependent viabilities have been extensively studied in [66]. The authors have also accounted for non-random mating between the mating pairs by introducing additional parameters [66]. We have accounted for variable fertility rates by introducing suitable parameters in the context of the gene drive system (as shown in Fig. 2B).

Distortion

Gametic distortion alters the transmission of drive alleles in heterozygotes, so they substantially exceed the Mendelian expectation of 50% and is controlled by the single parameter p in our model. Biologically such distortion happens in natural meiotic drives where meiosis is subverted due to intra-genomic conflict [69,70,71]. Examples of naturally occurring gene drive elements based on distortion are segregation distorter and t-haplotype in heterozygous fruit flies and mice, respectively [39, 72]. These drive elements bias their transmission during spermatogenesis by killing sperm carrying non-driving alleles (W). Though the killing of non-carrier sperm also has the potential to reduce fertility [71, 73], ‘distortion’ can be conceived as an independent evolutionary force responsible for biased transmission of drive allele. The synthetic homing drive also distorts the transmission of alleles in heterozygotes. To keep the model tractable, both analytically and in terms of user comprehension, DrMxR does not currently consider sex-ratio gene drives (Y-driving, X-Shredder) [74, 75]. Figure 2C shows the effect of distortion on the population dynamics of the three genotypes: WW, WD, DD. Previously published evolutionary dynamics of a homing drive using CRISPR are recovered in the Backward compatibility section.

All the above methods of biasing the inheritance pattern of a gene are recovered employing our generic model. In Methods, we first derive the mathematical formulations of the processes independently and then combine them in a single dynamical model system. To demonstrate the generality of our approach, we recover the results of [33, 47, 64, 65] as special cases of our model formulation. Ecologically it is vital to characterize the spread of a genetic construct. We do this in panmictic as well as spatially constrained populations. We provide an analytical form for calculating the refraction zone (the safe amount of drive heterozygotes and homozygotes from which the wildtypes can recover). For spatially constrained systems, we show the exact form in which the probability of invasion and fixation of a drive element depends on the network’s connectivity.

Combined dynamics

The three factors viz. distortion, viability and fertility selection can act during the three stages of an organism’s lifecycle. Figure 2 illustrates the specific impacts of these forces on the population dynamics by varying parameters using our application DrMxR. The equilibrium dynamic changes in different ways relative to the Hardy-Weinberg equilibrium line in Fig. 2. Besides individual impact, our application allows intuitive exploration of scenarios when more than one of these three evolutionary forces acts in combination. Realistically, such scenarios arise when a drive element impacts simultaneously both distortion and fertility selection [71, 73]. In the Drosophila segregation distorter, selective killing of sperm carrying a wildtype allele in heterozygous males biases the transmission of drive allele and potentially reduces the males’ fertility. Homing endonuclease gene drives based on CRISPR/Cas9 have been mathematically modelled to bias transmission and also to reduce the fertility of the genotype carrying payload gene [33].

Our approach recovers the result of [33] showing the combined effects of distortion and fertility selection on population dynamics. Additionally, our application allows us to study various drive combinations as well. In the Methods, we recover the result of [47] and show the combined effect of fertility selection (underdominance) and viability selection (Medea gene drive). Similar explorations of the population dynamics of other drive combinations across their entire parameter range are possible in DrMxR, for example, Medea (viability selection) together with homing endonuclease (distortion) can be studied.

Ecological factors

In the context of field deployment, understanding only the population genetics of the system is not enough. The properties of gene drive constructs are diverse, depending on their molecular construction and the differential selection pressure they impose in the varied ecological situations. Conversely, the ecology of the target species itself can disrupt the intended dynamics of the driven gene. Taking the demographic parameters into account is imperative when assessing the impact of gene drive deployment. Below we derive the invasion threshold of a drive system and evaluate the impact of spatial structure on the invasion (from rare) and fixation of the drive for a single population.

Invasion threshold

The unintended spread of certain types of drive to non-target populations has been a significant concern ever since the conception of synthetic gene drives. This interest is particularly the case for replacement drives (not intended to alter the size of populations) since the negative selection costs (fertility and viability) imposed by replacement-drive constructs are generally much smaller than for suppression drives [2, 24, 25, 76]. In this context, the option of localizing the replacement gene drive to target populations has been the focus of scientists for both developing and regulating gene drive [77]. A mechanism for localizing the driven construct is the imposition of a suitably high invasion threshold. The invasion threshold is the minimum frequency of drive carrying organisms required to be released to replace the wild target population. If the invasion threshold is high, the drive is likely more spatially restricted because the invasion of the non-target populations will require a large number of introduced individuals. As high threshold drives theoretically limit their spatial spread, they also may mitigate the spread of drives into partially interfertile species (or subspecies). Accidental release of a few drive organisms may completely transform wild populations for gene drives with low or no threshold [23, 78]. A recent review of different types of gene drives based on a quantitative analysis of their invasiveness can be found in [79].

A relevant quantity of interest is the number of drive individuals required to invade a wild population successfully. Here we consider both the drive heterozygotes and homozygotes together as drive individuals. In our model, the invasion threshold can be quantified based on the direction of the flow lines in the de Finetti diagram. We define the refractory zone as the area of the flow lines towards the population consisting of all wildtypes in the de Finetti diagrams. Thus, we quantify the amount of release that a population may sustain and still revert to the wildtype by measuring the wild-type vertex’s basin of attraction.

We calculated the refractory zone by analytically computing the equation of the invariant manifold separating the flow lines through approximations. Details of the calculation are in the Methods section. The refractory zone quantifies the minimal number of drive heterozygotes and homozygotes (released or migrants), capable of transforming the wildtype population.

In the model, variation in the drive efficiency and fitness of different genotypes affects the refractory zone of a gene drive system. Using the insight provided from Fig. 1, we consider the case of distortion based gene drive along with fertility selection. Figure 3 shows the heat-map of the refractory zone with variation in distortion probability p and fertility fitness of heterozygotes $f_{WD}$. When both the drive efficiency and fitness of heterozygous are high, the distortion drive’s refractory zone is zero. Hence an accidental release of only a small frequency of drive organism would lead to complete replacement of the wild population. The gene drive system is, therefore, absolutely non-localized. Low distortion drive efficiency and fitness of heterozygotes make the drive system localized, so a significant release of drive organism is required to successfully transform the wild population [34]. For intermediate values of p and $f_{WD}$, the drive system is localized and does not require a massive release [34, 80].

Spatial organization within a population

Recent works have highlighted the need for realistic spatial modelling for more accurately predicting the outcome of gene drive release more so for suppression [19, 22, 34, 37, 46, 81, 82] than the replacement drives [35]. Most of the analytical models, including DrMxR, assume random mating between individuals of different genotypes. Nevertheless, assuming random mating may give an incorrect prediction about the invasion condition of the gene drive. In reality, individuals in the population are spatially constrained and more likely to interact with individuals living in proximity. This factor will interfere with the evolutionary dynamics of the spread of gene drives. Consequently, we have developed a framework to explore the consequences of relaxing the assumption of a well-mixed population. Here we derive the condition for a distortion based gene drive to invade a single wild population if the assumption of random mating is violated and the population is spatially structured. The details of this derivation are given in the Methods section.

The analysis (See “Method”) uses the framework of evolutionary game theory and tracks the frequencies of alleles instead of genotypes. Previous work has shown that interpreting the association of alleles in a diploid genome as a two-player game leads to some intriguing new insights into genetic evolution [83,84,85]. Also, different ways of updating a population can lead to different allele dynamics in a panmictic population [86, 87]. Population update rule defines the elementary process that changes the frequency of each type in the population; for example, in the birth-death update rule, an individual is selected first for birth proportional to its payoff from the evolutionary game. It replaces another randomly chosen individual from the population selected for death. In our case, population update occurs in allele space, so an individual unit is an allele that can be wildtype (W) of drive type (D).

Ohtsuki and Nowak [88] found that if the interaction between the players (alleles in our case) take place on a regular graph of degree k (see Fig. 4), the payoff entries of the game are transformed according to equation (13). So, as k tends to infinity, the additional transformational entries of payoff matrix will become increasingly small, and invasion of gene drive will essentially depend upon whether the drive allele is more fit than wildtype when the drive is rare, and fixation will depend on whether the same is true when wildtype is rare. Since the interaction unit in this formulation is at the allele level, the biological interpretation of k is not straightforward. Intuitively, parameter k measure the level of mixing between individuals within a population (where k tending to infinity corresponds to complete mixing, a simplifying assumption common to many models including DrMxR). Since a different mathematical formulation has been employed, results obtained in this section cannot be added to the DrMxR where dynamics can be visualised through de Finetti plots.

Figure 4 shows that the invasion and fixation outcomes within a single population vary depending on the degree of spatial mixing and distortion efficiency. Increasing network degree can move a population where the drive cannot invade or fix to a situation where the drive can fix but cannot invade from rare for lower to moderate values of p ($p = 0.65$–0.80). The fixation but no-invasion case corresponds to the introduction of the invasion threshold that can help local confinement of the gene drive. Interestingly, one can move to this regime by regulating the degree of the network. For higher values of $p>0.80$ when the drive can both invade and fix in the population, increasing the network degree can introduce an invasion threshold. A similar trend ensues in Fig. 4B, but increasing network degree may allow the drive to invade the wild population but does not allow it to get fixed in the population. This scenario corresponds to the over-dominance case, and mathematically, the dynamics correspond to a stable fixed point in the interior of the simplex. The condition for the fixation and the invasion tends towards a well-mixed population regime for higher k.

Discussion

We have developed a minimalist modelling framework and identified three forces/factors responsible for propagating gene drive in the presence of an organismal fitness cost. These forces act during different stages of the target organism’s lifecycle and relate the gene driving mechanism to the organism’s biology. Such a type of approach is arguably missing in earlier works on gene drive. For example, [33] studied the population dynamics of CRISPR gene drive without explicitly stating that the fitness they incorporated belongs to fertility selection parameters. In other models fitness costs have been introduced through viability fitness parameter [47, 64, 65]. With our approach, we can highlight that the evolutionary outcome for the two cases (drive acting through viability or fertility but leading to similar costs) differs substantially. Our work stresses the importance of both the target organism’s biology and knowing the exact phases of the lifecycle where the synthetic construct will act. The current modelling approach also provides a classification of a simple gene drive system based on the biology of how the drive is designed (out of the three primary life stages) and avoids unnecessarily new and confusing terminology.

As with different translational evolutionary biology applications, the eventual aim of several synthetic gene drive constructs is field deployment. Thus, any drive technology needs to be compared with other available techniques, not by experts of the particular system but regulators who need a broader perspective. Our work employs standard population genetics methods while keeping our model as generic and minimal as possible. The resulting model allows us to provide a birds-eye view of the dynamics over the space of different drive mechanisms. Educators and regulators would benefit from using our DrMxR for studying the population dynamics of the gene drive. Unlike SLiM, a scriptable evolutionary simulation framework not limited to drive systems [89], DrMxR is specific to drive systems and only valid for a generic species with gamete, zygote and adult life stages. On the other hand, MGDrive, an R-package focusing on testing gene drives in species with Egg–larva–pupa–adult life stages or chiefly mosquitoes [90]. In species with density-dependent larvae competition, the timing of the expression of driving endonuclease becomes very significant, i.e. before or after the density-dependent larvae competition [91]. DrMxR is currently not capable of modelling such scenario. DrMxR is also no substitute for species and geography specific gene drive models [22, 46, 82, 92]. The utility of our framework lies in easing the understand of the gene drive mechanism and how it can arise or be a by-product of distortion, viability selection and fertility selection. Though not unique, our model also distinguishes the origin of the fitness cost of the drive allele. The fitness cost can affect the fertility of the organism where the transgenic grows up to reach the adult stage, or it could also affect its viability, in which case the organism dies at the zygote stage. This distinction is crucial as it leads to different population dynamics for the same amount of fitness cost.

In our model, users can choose the driving factor and its corresponding effect on the target organism’s biology by tweaking the various parameters explored in this manuscript. Deviations from the null Hardy-Weinberg equilibrium may be studied via the effect of the three driving factors, individually or combined. It is possible to investigate conditions for invasion and fixation of the drive and its tolerance to fitness cost that is highly relevant for drive deployment (relevant code provided on ShinyApps. As case studies of our approach, we have recovered the results of various drives such as CRISPR homing endonuclease drive, Medea, single-locus engineered underdominance, Inverse Medea, and Semele in the Backward compatibility section [33, 47, 64, 65].

Empirical studies have shown that the selfish genetic elements based on transmission distortion can reduce both fertility (offspring production) [93, 94] and viability (egg to adult ratio) [95] of the target species. To estimate the evolutionary outcome, we have allowed to jointly vary the factors influencing the propagation of such gene drives. Flexibility to see the combined effect for various evolutionary factors influencing the spread of gene drive on the population dynamics is an essential feature of the DrMxR. We believe that analytical results for evaluating the refractory zone would help regulators estimate the drive’s invasiveness. Methodologically, the refractory zone calculation is a development deriving from a dialogue between evolutionary games and population genetics [85, 96].

Our results show how gene drive invasion and fixation conditions differ relative to the mixed population model. We found that for lower values of network degree, the region of phase space in Fig. 4 for invasion & fixation and no invasion or fixation increases. Hence, introducing spatial features during interaction makes the drive either highly invasive or redundant. These results might be informative for the decision-maker in developing an intuitive understanding of how gene drive dynamics differ for structured population instead of the common assumption of well-mixed. Also, our spatial model does not help to directly compare the potential of different drives to invade a new population through migration [23, 25, 47, 51, 52].

In this study, we develop a common vocabulary to model various synthetic (and natural) gene drive systems, but the mathematical model we used cannot be regarded as general. Our current model cannot address the reduction in population size and its effects on the spread of a gene drive. Therefore, DrMxR is currently only appropriate for studying gene drives that can only bring about population replacement without affecting population density. Suppression drives—intended to eradicate or reduce the target population or ‘reversal drives’—intended to reverse the genetic alteration introduced by the first gene drive [21, 26, 97, 98] are not included in the app. Some newly proposed gene drive systems that are mainly intended for suppression but can also be used for replacement, such as CleavR, TARE, TADE, double-drives and Y-linked editors, cannot be currently modelled in this study [49, 50, 54, 56, 99, 100]. Classification of such complex drive system based on our mathematical model would also be problematic since these might have very complex selection mechanisms or have simple mechanisms but whose dynamics critically depend on the genetic makeup of different populations.

Self-exhausting drives that first rapidly spread in the population and then self-exhaust after limited generations are also not included in the current version of DrMxR app. However, a simple form of the self-exhausting drive called daisy-chain drive has been shown in the backward compatibility section as an example of how the current mathematical model could be extended [52]. Numerous drive studies have now extended to multi-locus systems, further expanding the vocabulary of the dynamics. Currently, our application (DrMxR) focuses on a single locus and highlights the complexities that single-locus drives can generate. Since we root our vocabulary in processes underlying multi-locus and multi-allelic drives, our concept can be extended for multi-locus and multi-allelic drive systems such as one locus two toxin (1L2T), two locus two toxin (2L2T) and reciprocal chromosomal translocation (RCT), Killer & rescue drive and tethered homing gene drive [1, 25, 25, 28, 28, 28, 51, 52, 101, 102]. We have heuristically demonstrated the extension of mathematical modelling of such systems together with resistance evolution for CRISPR homing drive in the Backward compatibility section. However, these gene drive systems are not implemented in the DrMxR app. These extensions will also allow for the inclusion of multiple drive systems in an ecological context in the future [103].

An important aspect of risk assessment for regulators is the ability of a gene drive to invade non-target populations through migration [23, 25, 47, 51, 52, 104]. We have extended our analysis to spatial systems using game theoretical methods as per [88]. Studying density-dependent migrations between patches [47, 104] could be included to understand the spread of different drive systems. Currently, DrMxR does not model such a scenario, and it can be the probable direction of future work. Inclusion of ecological parameters such as seasonality and environmental disturbances would also be necessary when utilizing the theory to model a specific target species [22]. Inclusion of ecological factors such as density dependence, spatial organization, non-random mating and target specific mating systems is in progress. It will be a necessary litmus test in assessing any drive deployment strategies [103]. For specific species, considering detailed life history and influences in the organism’s lifecycle would be a valid extension. For example, a mosquito lifecycle consists of egg, larva, pupae and adult stages. It becomes essential to distinguish when driving endonuclease is expressed, before or after the density-dependent larvae competition [91]. Hence, adding an appropriate life cycle depending on the model organism is necessary for a more reliable prediction of gene drive spread. However, we emphasize the disparity between the theoretical developments in simple synthetic drive scenarios and the urge towards a unified understanding at the elemental level. Using a common language will allow for a comparison between different drive techniques and adaptable to complex drive systems.

Conclusion

The vast, diverse and growing literature in the field of gene drive is often challenging to follow for non-experts because of the varying terminology. This linguistic heterogeneity obscures actual novel results and prevents a clear view of the field. The diverse vocabulary also does not facilitate easy comparisons between different drive techniques. We develop a common vocabulary describing gene drive systems based on pre-existing standard population-genetic terminology (distortion, fertility selection and viability selection). Based on this common vocabulary, we present DrMxR, a tool to grasp different gene drives while considering ecological and evolutionary aspects. We demonstrate that our model can be used to recover work already presented in several studies. Besides comparing available drive constructs, our tool is also helpful to explore the evolutionary dynamics of future hypothetical combination drive scenarios. The results obtained for drives in spatially structured organisms could be informative in developing an intuitive understanding of how gene drive dynamics differ for structured population instead of the common assumption of panmictic population. We believe that our work will be useful for regulators, educators, the general public, and even experts in developing insights about the population dynamics of the proposed and future gene drive system.

Table 1 Offspring proportions when alleles are segregated randomly during meiosis

Full size table

Table 2 Effect of fertility selection, distortion and viability selection on mating rates and offspring proportions. Fertility selection changes the mating rate of genotypes at adult stage. Distortion biases the transmission of drive allele from heterozygous individual by probability p > 0.5. Each entry in offspring’s column gives the proportion of genotype produced from the mating pair in the corresponding row. Viability selection effects offspring proportions as some may become non-viable. To illustrate an example, we consider Medea gene drive where wild-type homozygous offspring of heterozygous mother are non-viable.

Full size table

Methods

We consider diploid individuals of single locus with two alleles: wildtype (W) and drive allele (D). The possible genotypes are WW, WD and DD. We start with the simplest case assuming an infinitely large population, random mating, random segregation of alleles during meiosis and no distinction between male and female genotype in terms of not tracking their distinct genotype frequency. We will relax some of these assumptions as we proceed. Considering all mating pairs in Table 1, the rate of production of each genotype can be written as:

$$\begin{aligned} F_{{WW}} & = x_{{WW}}^{2} + x_{{WW}} x_{{WD}} + \frac{{x_{{WD}}^{2} }}{4} \\ F_{{WD}} & = x_{{WW}} x_{{WD}} + x_{{WD}} x_{{DD}} + 2x_{{WW}} x_{{DD}} + \frac{{x_{{WD}}^{2} }}{2} \\ F_{{DD}} & = x_{{DD}}^{2} + x_{{WD}} x_{{DD}} + \frac{{x_{{WD}}^{2} }}{4}, \\ \end{aligned}$$

(1)

where $x_{\alpha }$ and $F_{\alpha }$ are the frequency and rate of genotype production respectively and $\alpha \in$ (WW, WD, DD). The population dynamics of the genotypes in continuous time is governed by the following set of differential equation:

$$\begin{aligned} {\dot{x}}_{\alpha } = F_{\alpha } - x_{\alpha } {\bar{F}}. \end{aligned}$$

(2)

Here, ${\bar{F}}$ is the average fitness of the three genotype:

$$\begin{aligned} {\bar{F}} = \sum _{\alpha }F_{\alpha }. \end{aligned}$$

(3)

The total population remains constant hence the frequencies of all genotypes sum to unity.

$$\begin{aligned} x_{WW} + x_{WD} + x_{DD} = 1. \end{aligned}$$

(4)

Constraints on frequencies allows us to represent the dynamics of (2) in a de Finetti diagram. We will now derive the population dynamics equations when the three factors, namely viability selection, fertility selection and distortion, are added to the system one by one.

Table 3 Offspring proportions for Inverse Medea and Semele gene drive

Full size table

Table 4 Offspring proportions for CRISPR based homing gene drive with resistance

Full size table