Metabolic engineering of Pichia pastoris

1. Introduction

The yeast commonly known as Pichia pastoris was first described as Zygosaccharomyces pastori by the French mycologist and cytologist Alexandre Guilliermond (Guilliermond, 1920). In the 1950s, Herman Phaff isolated several related strains from oak trees in California and renamed the species Pichia pastoris (Phaff et al., 1956). Like only a few other yeasts, it shows a distinct ability to grow on methanol as the sole carbon and energy source, which is based on several highly expressed genes encoding the enzymes of the methanol assimilation and dissimilation pathways (Wegner and Harder, 1987). In the 1980s, the ability to express genes such as AOX1 (encoding methanol oxidase, which catalyzes the first step of methanol utilization) at high levels only when methanol is present was utilized to develop a strong, methanol-inducible expression system (Cregg et al., 1985). In 1995 P. pastoris was reclassified into the newly established genus Komagataella (Yamada et al., 1995), which was later split into several species (Kurtzman, 2005), so that former P. pastoris strains are now divided between the two species K. pastoris and K. phaffii. Today six species of the genus Komagataella are described (Naumov et al., 2013). Thus, the widely accepted name Pichia pastoris stands for at least two different species. To avoid confusion and to include all strains used in biotechnology, we use the established name P. pastoris here as a synonym for all Komagataella strains employed in biotechnological applications.

At the time when the genome of Saccharomyces cerevisiae was sequenced (Goffeau et al., 1996), laying the foundations for metabolic engineering in this yeast (Nielsen, 1998), almost nothing was known about the genomics, metabolism (except for methanol metabolism) and cell biology of P. pastoris. It was not until 2009 that two genome sequences were published (De Schutter et al., 2009, Mattanovich et al., 2009), followed by further (re-)sequencing and re-annotation of several strains: the K. pastoris type strain CBS704/DSMZ70382 as well as K. phaffii CBS7435 and its commercial variant GS115 (Kuberl et al., 2011, Love et al., 2016, Sturmberger et al., 2016, Valli et al., 2016). Thoroughly annotated genome sequences are available at www.pichiagenome.org and have since formed the basis for detailed systems biology analyses of P. pastoris (Zahrl et al., 2017).

P. pastoris is a methylotrophic yeast, capable of oxidizing methanol for energy production and of assimilating it as the sole carbon source for growth and product formation. After oxidation of methanol to formaldehyde, assimilation begins with the reaction of dihydroxyacetone synthase, a specialized transketolase, which transfers formaldehyde to xylulose-5-phosphate (Xu5P), resulting in glyceraldehyde-3-phosphate and dihydroxyacetone. Xu5P is then regenerated via a cyclic pathway (called the Xu5P cycle) involving reactions of the pentose phosphate pathway. Russmayer et al. (2015a) recently showed that the entire Xu5P cycle is localized in peroxisomes, utilizing a set of specialized enzymes encoded by duplicated genes that acquired transcriptional regulation by methanol. This finding sheds light on the potential benefit of compartmentalizing an entire specialized pathway and may be utilized for metabolic pathway engineering in the future.

Besides methanol, glucose and glycerol are frequently used as carbon sources. Glycerol is utilized quite fast (qSmax ca. 0.37 h−1 in mineral media) with a high biomass yield making it a useful substrate to accumulate biomass, recombinant proteins and potentially metabolites. The presence of four putative H+/glycerol symporters encoded in the genome is an explanation for the efficient utilization of glycerol (Mattanovich et al., 2009).

Glucose uptake is limited in P. pastoris compared to S. cerevisiae, with an approximately eightfold difference in maximum specific glucose uptake rates (qSmax ≈ 0.35 h−1 and 2.88 h−1, respectively). Under fully aerobic conditions the glycolytic flux does not exceed the respiratory capacity, so that hardly any fermentative by-products are formed (Hagman et al., 2014). P. pastoris is therefore classified as a canonical Crabtree-negative yeast. This limitation of glucose uptake in comparison to S. cerevisiae has been ascribed to the low number of hexose transporter genes and a potentially more rigid regulatory network of carbon metabolism. The lower glycolytic flux comes with a higher fraction of pentose phosphate pathway (PPP) flux, with a split ratio of 40–50% PPP flux (Baumann et al., 2010, Nocon et al., 2016). This provides much more NADPH regeneration per carbon than in S. cerevisiae, which has a split ratio in the range of 4–17% under glucose surplus conditions (Gombert et al., 2001, Maaheimo et al., 2001, Velagapudi et al., 2007). Apart from these fundamental differences in glucose utilization, it was recently shown that P. pastoris reduces its maintenance demand threefold under extreme calorie restriction (Rebnegger et al., 2016), in contrast to S. cerevisiae, where no change was observed (Vos et al., 2016).
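The consequence of the different split ratios for NADPH supply can be illustrated with a short back-of-the-envelope calculation. The sketch below assumes that each glucose-6-phosphate entering the oxidative PPP yields two NADPH (one each from glucose-6-phosphate dehydrogenase and 6-phosphogluconate dehydrogenase) and ignores all other NADPH sources; the function name is illustrative only.

```python
# Rough estimate of NADPH regenerated in the oxidative PPP per mol of glucose.
# Assumption: 2 NADPH per glucose-6-phosphate entering the oxidative branch;
# other NADPH-forming reactions are ignored.
NADPH_PER_G6P = 2.0

def nadph_per_glucose(ppp_split_ratio):
    """Return mol NADPH per mol glucose for a PPP split ratio between 0 and 1."""
    if not 0.0 <= ppp_split_ratio <= 1.0:
        raise ValueError("split ratio must be a fraction between 0 and 1")
    return NADPH_PER_G6P * ppp_split_ratio

# Split-ratio ranges quoted in the text.
split_ranges = {"P. pastoris": (0.40, 0.50), "S. cerevisiae": (0.04, 0.17)}
for species, (low, high) in split_ranges.items():
    print(f"{species}: {nadph_per_glucose(low):.2f}-{nadph_per_glucose(high):.2f} "
          "mol NADPH per mol glucose")

# Ratio of the maximum specific glucose uptake rates quoted in the text.
print(f"qSmax ratio: {2.88 / 0.35:.1f}-fold")
```

Under these assumptions, the quoted split ratios correspond to roughly 0.8–1.0 mol NADPH per mol glucose in P. pastoris versus below 0.35 mol in S. cerevisiae.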

This review is intended as a comprehensive summary of the development and current state of the art of metabolic engineering in P. pastoris and the underlying specific techniques to analyze and engineer its metabolism. For comprehensive reviews on cell engineering methodology and process design, the reader is referred to reviews by Schwarzhans et al. (2017a) and Yang and Zhang (2018). The first one focusses on metabolic and cell engineering, systems biology and an extensive discussion of clonal variability. The latter addresses most specifically bioreactor cultivation design as a tool for optimization of protein production. More specific reviews on P. pastoris engineering focused on glycoengineering (Laukens et al., 2015), mathematical modeling (Theron et al., 2018), systems biotechnology (Zahrl et al., 2017) and CRISPR-Cas technologies in different yeasts (Raschmanová et al., 2018).

Based on conclusions from past achievements, novel developments will be discussed here in the light of recent synthetic biology progress, drawing conclusions for the upcoming potential of this yeast for modern metabolic engineering.

2. Methods of metabolic engineering of P. pastoris

Until recently, specific methods to engineer and analyze the metabolism of P. pastoris have been quite underdeveloped, allowing only single solutions to specific questions rather than targeted construction of chassis platforms for advanced metabolic engineering. Since the publication of the first genome sequences in 2009 this development has gained great momentum. This section summarizes the current state of the art of methods to quantify intracellular metabolites and metabolic fluxes, to model and predict metabolic states, and to engineer cellular metabolism of P. pastoris for the production of new biomolecules or the enhanced productivity and quality of heterologous proteins. Based on the impressive methodological developments, enabling advanced Design-Build-Test cycles for metabolic engineering, P. pastoris holds a position at the forefront of synthetic biology.

2.1. Quantification of cellular metabolism

2.1.1. Metabolomics

The set of metabolites synthesized and taken up by an organism constitutes its metabolome (Fiehn, 2002). “The metabolome can be defined as the complete complement of all small molecule (< 1500 Da) metabolites found in a specific cell, organ or organism” (Wishart, 2007). This definition by Wishart includes all small metabolites but excludes polymers like DNA, RNA, proteins, polymeric carbohydrates and others. Importantly, metabolites that are detected in, but not necessarily synthesized by, the organism are also included in this definition. Metabolomics is the technology used to measure and describe the metabolic state of a cell qualitatively and quantitatively. It directly provides information about alterations in cellular metabolism and can reveal potential targets for metabolic engineering. However, compared to other omics technologies like transcriptomics or genomics, metabolomics faces the challenge that the chemical diversity of metabolites is substantially larger. For genomics, transcriptomics and, to a lesser extent, proteomics, the analytes (DNA, RNA, proteins) are chemically homogeneous, which facilitates the establishment of a general methodology to quantify these entities.

For P. pastoris, a first benchmark study providing quantitative metabolite data was published in 2012 by Carnicer et al. (2012a). In this work more than 30 metabolites, including amino acids, organic acids and some sugar phosphates, were quantified after controlled cultivation in a chemostat, using an optimized quenching protocol. Importantly, an interspecies comparison between S. cerevisiae and P. pastoris was performed, highlighting some differences between the two species. While the metabolite concentrations of upper glycolysis and the tricarboxylic acid (TCA) cycle were comparable, the concentrations of metabolites of the lower part of glycolysis (2-phosphoglycerate/3-phosphoglycerate and phosphoenolpyruvate pools) were lower in P. pastoris. Interestingly, the pool of trehalose-6-phosphate was also significantly larger in S. cerevisiae than in P. pastoris (Carnicer et al., 2012a).

An important part of metabolomics is sample preparation, for two main reasons: first, as already mentioned, chemically heterogeneous compounds need to be analyzed, and second, cellular metabolism needs to be arrested rapidly in order to measure metabolites with a high turnover rate. To stop cellular metabolism, cells are quenched and subsequently washed to remove extracellular metabolites. During this procedure it is important that cells are not kept in the pure quenching solution for a long period of time, as this leads to substantial leakage of metabolites into the quenching solution. For P. pastoris cells, a rapid filtration and washing step on a filter is best suited to remove extracellular metabolites and reduce unwanted leakage effects (Russmayer et al., 2015b). The initial quenching procedure established for P. pastoris by Carnicer et al. (2012a), using 60% (v/v) methanol, was further optimized by buffering the quenching solution to reduce the leakage of metabolites during the quenching process. In order to provide a framework for further buffer optimization, the parameters influencing the leakage of a given metabolite were investigated. It was found that size- and charge-related properties of a metabolite play a major role in controlling metabolite loss, namely molecular weight, the van der Waals volume, the charge-weighted positive surface area, the charge-weighted negative surface area and the total polar surface area. Depending on the main research question and the metabolites of interest, these factors can help to decide which buffer system is best suited. The buffer system that has so far shown the best properties for reducing leakage losses is Tris (0.125 M) buffered to a pH of 8.2, with a NaCl concentration of 0.055 M and a methanol concentration of 60% (v/v) (Mattanovich et al., 2017).

To extract metabolites from the quenched biomass for analysis, the method of choice employs boiling ethanol (Carnicer et al., 2012a). However, in another early study a direct comparison of boiling ethanol and repeated freeze-thaw cycles with methanol plus sonication showed that the freeze-thaw extraction method can be as efficient as the boiling ethanol procedure (Tredwell et al., 2011).
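For orientation, the composition of the buffered quenching solution quoted above (0.125 M Tris, 0.055 M NaCl, 60% (v/v) methanol) can be converted into amounts per batch. The following is a minimal sketch that assumes the molar concentrations refer to the final volume of the methanol-containing solution and uses the molar masses of Tris base (121.14 g/mol) and NaCl (58.44 g/mol); pH adjustment to 8.2 is not modeled, and the function name is illustrative.

```python
# Amounts needed to prepare the buffered quenching solution described in the text.
# Assumption: concentrations refer to the final volume of the complete solution.
TRIS_MW = 121.14  # g/mol, Tris base
NACL_MW = 58.44   # g/mol

def quenching_buffer_recipe(volume_l, tris_molar=0.125, nacl_molar=0.055,
                            methanol_vv=0.60):
    """Return grams of Tris and NaCl and mL of methanol for the given volume."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {
        "tris_g": tris_molar * TRIS_MW * volume_l,
        "nacl_g": nacl_molar * NACL_MW * volume_l,
        "methanol_ml": methanol_vv * volume_l * 1000.0,
    }

recipe = quenching_buffer_recipe(1.0)
for component, amount in recipe.items():
    print(f"{component}: {amount:.2f}")
```

The pH still has to be adjusted to 8.2 after dissolving the salts; the calculation only covers the weighed and measured components.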

In order to accurately quantify the absolute concentrations of metabolites, internal standards labeled with either 13C or 15N are used. 13C-labeled standards are preferred for the obvious reason that carbon is present in most cellular metabolites. As such labeled standards are not commercially available for many metabolites, it is common practice to utilize a uniformly 13C-labeled cell extract (Lu et al., 2017). The preparation of such a standard for P. pastoris and its application during the quenching and extraction procedure was described by Neubauer et al. (2012).
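The principle of quantification with a labeled internal standard can be sketched as follows. This is a generic illustration, not the procedure of the cited studies: it assumes that calibration standards of known concentration and the samples are spiked with the same amount of U-13C extract, that the ratio of the 12C to the 13C peak area is linear in the analyte concentration, and that all numbers are invented for demonstration.

```python
# Generic internal-standard quantification via a peak area ratio calibration.
# All concentrations and peak areas below are hypothetical.

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantify(area_12c, area_13c, slope, intercept):
    """Convert the 12C/13C peak area ratio of a sample into a concentration."""
    ratio = area_12c / area_13c
    return (ratio - intercept) / slope

# Calibration: known 12C concentrations (µM) and measured 12C/13C area ratios.
standard_conc = [1.0, 2.0, 5.0, 10.0]
standard_ratio = [0.1, 0.2, 0.5, 1.0]
slope, intercept = fit_line(standard_conc, standard_ratio)

sample_conc = quantify(area_12c=3000.0, area_13c=10000.0,
                       slope=slope, intercept=intercept)
print(f"sample concentration: {sample_conc:.2f} µM")
```

Because analyte and standard pass through quenching, extraction and measurement together, losses during sample preparation affect both peak areas alike and largely cancel in the ratio.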

Metabolomic datasets can be used for different research applications. One aspect is to analyze the metabolite concentrations in strains producing recombinant proteins compared to control strains (Jorda et al., 2014). For example, analysis on protein producing strains supplemented with all amino acids showed that costly amino acids were preferentially taken up (Heyland et al., 2011), suggesting a regulatory logic applied under the limited energy and redox availability caused by recombinant protein production. Another research question was to characterize the metabolite levels on different carbon sources, especially on methanol for P. pastoris, compared to glucose or glycerol conditions. Connected to this aspect, a quantitative analysis of intracellular free metabolites has been used to study the metabolic impact of assimilation of methanol as a carbon source (Russmayer et al., 2015a). Under methanol assimilation conditions, also the special metabolite sedoheptulose-1,7-bisphosphate was detected, which is required for the xylulose-monophosphate cycle enabling methanol assimilation.

In many metabolomic studies, the main focus is on metabolites of the central carbon and energy metabolism, which typically comprise the metabolite classes of amino acids, organic acids, sugar phosphates and alcohols. However, a study investigating the physiological impact of hypoxia also focused on lipid metabolism and the so-called lipidome. Changes in lipid metabolites including fatty acids, phospholipids and sphingolipids were quantified under normoxic and hypoxic conditions, showing significant differences (Adelantado et al., 2017).

Most studies measuring intracellular metabolite levels in P. pastoris used targeted approaches. In contrast, Tredwell et al. (2017) followed an untargeted strategy using NMR to evaluate the impact of recombinant protein production on metabolomic profiles. The untargeted NMR metabolic profiling strategy was combined with a transcript analysis of genes relevant to the unfolded protein response (UPR) (HAC1, KAR2, PDI1). Correlations between these UPR markers and certain metabolite signals (isoleucine, aspartate, arabitol) were detected, suggesting that such data might be used as indicators of UPR stress. The benefit of replacing transcript analysis of one or a few genes with a metabolomics method remains, however, to be proven.

2.1.2. Metabolic flux analysis

13C metabolic flux analysis is a powerful tool to experimentally determine the flux distribution in a cell (Sauer, 2006, Zamboni et al., 2009). The methodology is frequently used to measure metabolic flux distributions in P. pastoris, and Table 1 gives an overview of the 13C metabolic flux experiments carried out with this organism.

Table 1. Main studies applying 13C metabolic flux analysis to P. pastoris.

Research question related to 13C labeling experiment	Labeled substrate	Measured metabolites	Analytical procedure	Ref.
Verification of amino acid biosynthesis pathways and flux ratio analysis after growth on either glucose or glycerol	10% (w/w) of [U-13C6] glucose or [U-13C3] glycerol	Proteinogenic amino acids	NMR (2D [13C,1H]-COSY)	(Solà et al., 2004)
Analysis and regulation of the methanol metabolism in the presence of a second carbon source	different mixtures of glycerol and methanol, always 10% (w/w) of [U-13C3] glycerol and [U-13C1] methanol	Proteinogenic amino acids	NMR (2D [13C,1H]-COSY)	(Solà et al., 2007)
Flux changes between a protein producing strain and a control strain in fed-batch experiments	10% (n/n) [U-13C6] glucose	Proteinogenic amino acids	GC-MS	(Heyland et al., 2010)
Amino acid supplementation to unburden cellular metabolism during protein production	10% (n/n) [U-13C6] glucose	Proteinogenic amino acids	GC-MS	(Heyland et al., 2011)
The metabolic flux distribution during growth on a mixed feed of glucose and methanol while producing recombinant protein	12% (w/w) [U-13C6] glucose and [U-13C1] methanol	Proteinogenic amino acids	NMR (1H-13C-HSQC)	(Jordà et al., 2012)
Mapping of metabolic fluxes of glycolysis, pentose phosphate and methanol assimilation pathways	80% [1–13C1] glucose and 20% [U-13C6] glucose; 100% [U-13C1] methanol	Intracellular metabolites	GC-MS, LC-MS	(Jorda et al., 2013)
Comparison of fluxes between recombinant protein producing strain (Rol) and control strain in chemostat	80% [1–13C1] glucose and 20% [U-13C6] glucose; 100% [U-13C1] methanol	Intracellular metabolites	GC-MS, LC-MS	(Jorda et al., 2014)
Comparison of fluxes between high expression recombinant protein producing strain (beta-galactosidase) and low expression control strain	80% [U-13C] glucose, 20% [1–13C] glucose	Intracellular free amino acids	GC-MS	(Nie et al., 2014)
Comparison of fluxes between recombinant protein producing strain (hSOD) and control strain	17% [U-13C6] glucose	Proteinogenic amino acids	GC-MS	(Nocon et al., 2014)
Flux distribution in chemostat cultivations of P. pastoris comparing glucose to a mixed substrate of methanol/glycerol	20% [U-13C6] glucose; 20% [U-13C1] methanol and 20% [U-13C3] glycerol	Proteinogenic amino acids	GC-MS	(Russmayer et al., 2015a)
Measurement of split ratios between glycolysis and pentose phosphate pathway (PPP) analyzing strains engineered in the PPP	100% [1,6–13C; 2,3,4,5–12C] glucose	Intracellular metabolites	GC-MS	(Nocon et al., 2016)
Differences in fluxes comparing medium with and without glutamate	80% [U-13C] glucose, 20% [1–13C] glucose	Intracellular free amino acids	GC-MS	(P. Liu et al., 2016)
High anabolic use of the TCA cycle on glucose and comparison of flux profile to S. cerevisiae and P. stipitis	10% (w/w) U-13C labeled	Proteinogenic amino acids	NMR	(Zhang et al., 2017)

In order to calculate fluxes, it was first necessary to verify which reactions occur in the central carbon metabolism of P. pastoris. As mentioned earlier, genome data for P. pastoris were not available until 2009. Therefore, the first labeling experiments were also used to identify the biochemical pathways and to investigate whether they differ from those of S. cerevisiae. It was found that the proteinogenic amino acids in P. pastoris are in principle synthesized as in S. cerevisiae and that the resolved flux ratios of P. pastoris are more similar to those of S. cerevisiae than to those of Scheffersomyces (Pichia) stipitis (Solà et al., 2004). However, it was later found that the leucine pathway has a different compartmentalization compared to S. cerevisiae. This information is especially important for flux analysis experiments, as the leucine precursor alpha-isopropylmalate is synthesized from alpha-ketoisovalerate consuming cytosolic acetyl-CoA instead of mitochondrial acetyl-CoA (Förster et al., 2014). Current experiments show that other amino acid biosynthesis pathway enzymes also have a different compartmentalization in P. pastoris compared to S. cerevisiae. In particular, biosynthetic enzymes of the arginine and lysine pathways were found in the peroxisome or cytosol rather than in mitochondria (own unpublished data). A recent publication comparing S. cerevisiae and the methylotrophic yeast Hansenula (Ogataea) polymorpha highlighted that it is important to carefully evaluate compartmentalization constraints in modeling approaches and not to extrapolate assumptions regarding compartmentalization from other organisms without further evidence (Lehnen et al., 2017).

In order to obtain absolute flux values, it is necessary to determine the biomass composition of the cell, besides other important parameters such as metabolite uptake and secretion rates. In 2009, the macromolecular composition of P. pastoris cells under different oxygenation conditions was investigated and compared to a strain producing a recombinant protein. It was found that the amino acid composition of P. pastoris is significantly different from the amino acid composition determined for the yeast S. cerevisiae. Furthermore, it was found that different oxygenation levels have an impact on the general biomass composition and need to be taken into account (Carnicer et al., 2009). The determined biomass datasets were integrated into a stoichiometric model together with flux ratios of a METAFoR analysis yielding absolute fluxes (Baumann et al., 2010). Biomass composition differs markedly in cells grown on different carbon sources as well, which should be taken into consideration for metabolic flux balancing (Tomàs-Gamisans et al., 2018).

Different analytical techniques can be used to analyze the labeling patterns obtained in 13C labeling experiments. Due to the high abundance of proteinogenic amino acids, both NMR (Solà et al., 2004) and GC-MS based techniques (Heyland et al., 2011) have been used to determine the labeling profile in P. pastoris cultures. Later, techniques were also established to directly measure the labeling pattern of free intracellular metabolites based on GC-MS (Mairinger et al., 2015; Mairinger et al., 2018) and LC-MS (Jorda et al., 2013).
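A quantity frequently derived from MS-based labeling data is the fractional labeling of a fragment, calculated from its mass isotopomer distribution (MID). The sketch below is a generic illustration with invented numbers; it assumes an MID that has already been corrected for naturally occurring isotopes, a natural 13C abundance of about 1.07%, and an isotopically pure labeled substrate.

```python
# Fractional 13C labeling from a mass isotopomer distribution (MID).
# mid[i] is the abundance of the isotopologue carrying i 13C atoms (M+i).
NATURAL_13C = 0.0107  # approximate natural abundance of 13C

def fractional_labeling(mid):
    """Return the mean 13C fraction per carbon atom of a fragment."""
    n_carbons = len(mid) - 1
    total = sum(mid)
    if n_carbons < 1 or total <= 0:
        raise ValueError("MID needs at least two entries and a positive sum")
    return sum(i * m for i, m in enumerate(mid)) / (n_carbons * total)

def expected_enrichment(labeled_fraction, natural=NATURAL_13C):
    """Mean 13C fraction of a substrate mix of uniformly labeled and unlabeled
    molecules, assuming the labeled part is 100% 13C."""
    return labeled_fraction + (1.0 - labeled_fraction) * natural

# Hypothetical three-carbon fragment: M+0 .. M+3.
print(f"{fractional_labeling([0.7, 0.1, 0.1, 0.1]):.3f}")
# A feed containing 20% uniformly labeled substrate.
print(f"{expected_enrichment(0.20):.4f}")
```

At isotopic steady state with a single labeled carbon source, the fractional labeling of proteinogenic amino acids should approach the enrichment of the feed, which provides a simple plausibility check on the measured data.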

13C flux measurements have often been used to determine the flux changes between a protein-producing strain and a control strain (Table 1). In several studies, the flux profile in protein-producing P. pastoris was shown to shift towards an up-regulation of the PPP. Consistently, engineering of the PPP by overexpression of the first two enzymatic steps (encoded by ZWF1 and SOL3) further increased recombinant protein production, as discussed in Section 3.2 (Nocon et al., 2016).

2.2. Metabolic modeling

Various (ad hoc) modeling efforts, very recently reviewed by Theron et al. (2018), have been undertaken to understand and optimize especially protein production in P. pastoris. Here we outline recent developments in the modeling and genome-wide analysis of P. pastoris metabolism. Constraint-based analysis is a key approach in the systems analysis of metabolism and has proven extremely useful in providing mechanistic insights into metabolism. Together with (genome-scale) metabolic models, constraint-based analysis enables the prediction of intracellular steady-state flux distributions and the study of the genotype-phenotype relations in silico (Lewis et al., 2012).
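The core of constraint-based analysis, flux balance analysis, is a linear program: maximize an objective flux subject to the steady-state condition S·v = 0 and flux bounds. The following minimal sketch uses a hypothetical five-reaction toy network (not a P. pastoris model) and SciPy's linprog; reaction names, stoichiometry and bounds are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Metabolites (rows): G (sugar), P (precursor), NADPH.
reactions = ["uptake", "glycolysis", "ppp", "biomass", "p_sink"]
S = np.array([
    # uptake glyc  ppp  biomass p_sink
    [1.0, -1.0, -1.0,  0.0,  0.0],   # G:     uptake -> G; G consumed by glyc/ppp
    [0.0,  2.0,  1.0, -1.0, -1.0],   # P:     glyc makes 2 P, ppp makes 1 P
    [0.0,  0.0,  2.0, -1.0,  0.0],   # NADPH: ppp makes 2, biomass consumes 1
])

def run_fba(max_uptake=1.0):
    """Maximize the biomass flux subject to S v = 0 and 0 <= v (uptake capped)."""
    objective = np.zeros(len(reactions))
    objective[reactions.index("biomass")] = -1.0   # linprog minimizes
    bounds = [(0.0, max_uptake)] + [(0.0, None)] * (len(reactions) - 1)
    result = linprog(objective, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return dict(zip(reactions, result.x))

fluxes = run_fba()
for name, value in fluxes.items():
    print(f"{name:10s} {value:.3f}")
```

In this toy example biomass formation is limited by both precursor and NADPH supply, so the optimum routes two thirds of the uptake through the NADPH-generating branch. Genome-scale models follow the same principle with thousands of reactions and are usually handled with dedicated toolboxes rather than a hand-written matrix.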

For P. pastoris several genome-scale metabolic models are currently available (Caspeta et al., 2012, Chung et al., 2010, Saitua et al., 2017, Sohn et al., 2010, Tomas-Gamisans et al., 2016, Tomàs-Gamisans et al., 2018, Ye et al., 2017). The latest two genome-scale metabolic models represent two independent consensus reconstructions that integrated and updated the knowledge of the previous reconstructions. While Tomàs-Gamisans et al. (2018) focused on extending the phenotypic capabilities by providing accurately measured biomass compositions during growth on glucose, glycerol and methanol as sole carbon sources, Ye et al. (2017) focused on increasing the coverage of their model, which currently contains 1243 annotated genes, 2407 reactions, and 1740 metabolites. In comparison, the latest version of the consensus genome-scale reconstruction of S. cerevisiae, YEAST v7.0 (Aung et al., 2013), contains 910 genes, 3498 reactions and 2384 metabolites. These recent model expansions led to an improved predictability of P. pastoris growth capabilities over a wide range of different carbon and nitrogen sources in industrially relevant conditions. Overall, current genome-scale metabolic models of P. pastoris allow one to robustly predict cellular growth rates. In fact, these models have already been used to guide rational metabolic engineering projects, e.g., to improve recombinant protein production (Nocon et al., 2014; see Section 3.2 for details), or to support culture media optimization (Matthews et al., 2018). To improve the predictive quality of these models not only with respect to growth but also with respect to recombinant protein yield, Irani et al. (2016) attempted to reconstruct the native as well as the humanized N-glycosylation pathways in P. pastoris. Although including N-glycosylation in the analysis reduced the predicted protein yield (compared to a purely metabolic analysis), the model still typically overestimated achievable protein yields by orders of magnitude. Thus, future development will need to take other processes (e.g., protein folding and secretion efficiencies) into account in order to realistically model protein production.

Constraint-based modeling provides a means to analyze the metabolic steady-state behavior determined by the stoichiometric constraints of the biochemical reaction network, while regulatory constraints are often not considered. In the case of P. pastoris the situation is aggravated by the fact that almost nothing is known about metabolic regulation specific to this yeast. Computational methods that aim to compensate for these deficiencies by integrating multi-omics data sets have so far proven unsuccessful (Machado and Herrgård, 2014). Moreover, recent findings by Hackett et al. (2016) questioned the importance of transcriptional regulation in the response to changes in nutrient conditions, at least in S. cerevisiae. They report that many changes in flux levels are due to changes in the metabolome (by Michaelis-Menten-like kinetics) rather than changes in enzyme levels. These observations motivate the increased interest in the development of large-scale kinetic models of metabolism (Vasilakou et al., 2016).

A (quasi-)dynamic analysis of P. pastoris was recently presented by Saitua et al. (2017). It coupled the uptake and secretion dynamics of key metabolites with the steady-state behavior of metabolism in a well-established framework known as dynamic flux-balance analysis (Höffner et al., 2013, Sánchez et al., 2014). By doing so, the authors could analyze the dynamic rearrangements in the intracellular flux distributions. Moreover, they derived optimized feeding strategies and beneficial single gene deletion strategies that led to a predicted increase in production of recombinant human serum albumin. However, the experimental viability of the predictions and the computational scalability of the method to predict more complex intervention strategies remain to be further investigated.
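The general idea of dynamic flux-balance analysis can be sketched with a deliberately trivial example: at each time step a kinetic expression sets the upper bound of the uptake flux, a steady-state linear program is solved, and the resulting rates are used to update biomass and substrate concentrations. The sketch is not the model of Saitua et al. (2017). It assumes Monod-type uptake with the qSmax for glucose quoted earlier (0.35 h−1), while the saturation constant, the biomass yield, the initial conditions and the one-metabolite network are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

Q_S_MAX = 0.35   # g g^-1 h^-1, maximum specific glucose uptake rate (from the text)
K_S = 0.1        # g L^-1, hypothetical saturation constant
YIELD = 0.5      # g biomass per g substrate, hypothetical

# One internal metabolite; columns: uptake flux, growth-associated flux.
S_MATRIX = np.array([[1.0, -1.0]])

def solve_fba(uptake_limit):
    """Maximize the growth-associated flux for a given uptake bound."""
    result = linprog(c=[0.0, -1.0], A_eq=S_MATRIX, b_eq=[0.0],
                     bounds=[(0.0, uptake_limit), (0.0, None)], method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[0], result.x[1]

def simulate_batch(s0=10.0, x0=0.1, t_end=30.0, dt=0.1):
    """Explicit Euler integration of substrate s and biomass x (both in g/L)."""
    s, x = s0, x0
    trajectory = [(0.0, s, x)]
    for step in range(1, int(round(t_end / dt)) + 1):
        kinetic_limit = Q_S_MAX * s / (K_S + s)   # Monod-type uptake kinetics
        supply_limit = s / (x * dt)               # cannot consume more than is left
        limit = min(kinetic_limit, supply_limit)
        if limit < 1e-12:
            limit = 0.0
        v_uptake, v_growth = solve_fba(limit)
        mu = YIELD * v_growth                     # specific growth rate, h^-1
        s = max(s - v_uptake * x * dt, 0.0)
        x = x + mu * x * dt
        trajectory.append((step * dt, s, x))
    return trajectory

final_t, final_s, final_x = simulate_batch()[-1]
print(f"t = {final_t:.1f} h, substrate = {final_s:.3f} g/L, biomass = {final_x:.3f} g/L")
```

With a genome-scale stoichiometric matrix in place of the one-metabolite network, the same loop yields time-resolved intracellular flux distributions, which is what makes the approach attractive for the design of feeding strategies.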

Next to dynamic constraints, (global) resource allocation constraints have received much attention in recent years (Berkhout et al., 2013, Goelzer and Fromion, 2011, Molenaar et al., 2009, O'Brien and Palsson, 2015). In essence, these methods try to take enzyme levels into account and assign a “price” to the expression of each protein/enzyme. As total resources are limited, cells have to carefully adjust their resource allocation, which in turn allows one to calculate optimal enzyme distributions and corresponding flux distributions. Although impressively accurate results were achieved in Bacillus subtilis (Goelzer et al., 2015), Escherichia coli (O'Brien et al., 2013) and S. cerevisiae (Sánchez et al., 2017), resource allocation has not yet been studied in P. pastoris, as these methods tend to be quite data intensive. They require, among other things, estimates of catalytic rates, detailed knowledge of protein synthesis steps, and/or absolute protein abundances, data that are not readily available for non-classical model organisms.
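The simplest form of such a constraint states that the enzyme mass needed to carry a flux distribution must not exceed a fixed protein budget. The sketch below illustrates this bookkeeping under strong simplifications: every enzyme is assumed to operate at its maximal turnover (v = kcat·e), and all fluxes, kcat values, molecular weights and the budget are invented numbers rather than P. pastoris data.

```python
# Enzyme mass required to sustain a flux distribution, assuming v = kcat * e.
# Units: fluxes in mmol gDW^-1 h^-1, kcat in s^-1, molecular weight in g mol^-1,
# enzyme demand and budget in g enzyme per gDW. All values are hypothetical.
SECONDS_PER_HOUR = 3600.0

def enzyme_demand(fluxes, kcat_per_s, mw_g_per_mol):
    """Sum of v_i * MW_i / kcat_i over all reactions."""
    demand = 0.0
    for reaction, flux in fluxes.items():
        kcat_per_h = kcat_per_s[reaction] * SECONDS_PER_HOUR
        mw_g_per_mmol = mw_g_per_mol[reaction] / 1000.0
        demand += abs(flux) * mw_g_per_mmol / kcat_per_h
    return demand

def max_flux_scaling(fluxes, kcat_per_s, mw_g_per_mol, enzyme_budget):
    """Factor by which all fluxes could be scaled before the budget is exhausted."""
    return enzyme_budget / enzyme_demand(fluxes, kcat_per_s, mw_g_per_mol)

fluxes = {"uptake": 2.0, "glycolysis": 1.5, "ppp": 0.5}
kcat = {"uptake": 50.0, "glycolysis": 200.0, "ppp": 20.0}
mw = {"uptake": 60000.0, "glycolysis": 40000.0, "ppp": 55000.0}

demand = enzyme_demand(fluxes, kcat, mw)
print(f"enzyme demand: {demand:.5f} g/gDW")
print(f"scaling headroom for a 0.01 g/gDW budget: "
      f"{max_flux_scaling(fluxes, kcat, mw, 0.01):.1f}x")
```

In resource allocation models this inequality is added to the linear program as an additional constraint, which is why reliable kcat values and protein abundances become the limiting data.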

2.3. Tools for metabolic engineering and synthetic biology of P. pastoris

2.3.1. Genomic integration: prerequisite for genetic engineering

Applying metabolic and cell engineering strategies requires advanced tools to perform genetic modifications and to introduce synthetic pathways into the host (Wagner and Alper, 2016). Since the early days of P. pastoris research, gene overexpression has mostly been performed by stably introducing the gene expression cassette into the genome, ideally into a targeted locus via a single crossover knock-in strategy based on homologous recombination (HR). Although HR efficiencies are lower than in S. cerevisiae, the number of transformants obtained is usually sufficient to select strains for protein production. Using the standard electroporation-based transformation protocol described in Gasser et al. (2013), adapted from Cregg and Russell (1998), or the high-efficiency protocol applying pretreatment with lithium acetate prior to electroporation (Wu and Letchworth, 2004), transformation efficiencies of 10^3–10^4 colonies per µg linearized DNA are typically obtained, with around 85% of the obtained transformants containing the expression cassette in single or multiple copies. Methods to increase the share of multi-copy transformants, which are often associated with higher gene expression and product levels, have recently been reviewed by Piva et al. (2017).

The generation of gene knockouts was also described early on, e.g. for protease-deficient mutants (Gleeson et al., 1998) or AOX1/2-deficient strains (Cregg et al., 1989); however, targeting efficiencies have often been a bottleneck for this purpose. While in principle the knock-in approach can also be used to disrupt gene functions (Vervecken et al., 2004), in most cases the generation of knockouts relies on gene replacement by double crossover (Da Silva and Srikrishnan, 2012). Depending on the genomic locus to be disrupted and the length of the homologous flanking regions, gene replacement efficiencies can range from < 0.1% to 80% (Nett and Gerngross, 2003; Schwarzhans et al., 2016; Vogl et al., 2018a), and a high number of false-positive transformants has often been observed. Genes are easily disrupted when their deletion has no negative impact on strain fitness, whereas the disruption of genes with a negative impact on growth leads to much lower frequencies of positive clones. In the latter case positive clones face a negative selection pressure during cultivation, so that clones with incorrect integration into random loci may dominate. As long as sufficiently long homologous arms (approximately 1000 bp on each side) are provided, the probability of obtaining correctly integrated clones is still rather high; however, if short or no flanking regions are provided, non-homologous end joining (NHEJ) prevails (Näätsaari et al., 2012; own unpublished data). Deletion of the NHEJ machinery allowed the use of significantly shorter homologous flanking regions (down to 250 bp on each side). However, the downside of this approach is that a specific genetic background bearing the Δku70 mutation needs to be used, which shows 10–30% reduced growth rates, reduced transformation efficiencies and decreased stress tolerance due to impaired DNA damage repair (Näätsaari et al., 2012, Weninger et al., 2018).
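Designing a gene replacement cassette with the flanking lengths mentioned above amounts to extracting the regions upstream and downstream of the target and joining them to a marker. The following sketch only illustrates this sequence bookkeeping; the function names, the 0-based end-exclusive coordinate convention and the toy sequences are assumptions for demonstration and do not correspond to a published design tool.

```python
# Extract homologous flanking regions for a double crossover gene replacement.
# Coordinates are 0-based and end-exclusive (an assumption of this sketch).

def homology_arms(chromosome, target_start, target_end, arm_len=1000):
    """Return (upstream_arm, downstream_arm) flanking the target region."""
    if not 0 <= target_start < target_end <= len(chromosome):
        raise ValueError("target coordinates are outside the sequence")
    if target_start < arm_len or len(chromosome) - target_end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arm length")
    upstream = chromosome[target_start - arm_len:target_start]
    downstream = chromosome[target_end:target_end + arm_len]
    return upstream, downstream

def replacement_cassette(upstream, marker, downstream):
    """Assemble upstream arm, marker and downstream arm into one linear fragment."""
    return upstream + marker + downstream

# Toy example with 5 bp arms instead of approximately 1000 bp.
toy_chromosome = "AAAAA" + "CCC" + "GGGGG"
up, down = homology_arms(toy_chromosome, 5, 8, arm_len=5)
print(replacement_cassette(up, "[marker]", down))
```

For a real design the default arm length of 1000 bp reflects the value quoted in the text for wild type strains, whereas around 250 bp may suffice in an NHEJ-deficient background.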

In contrast, the split-marker approach (Heiss et al., 2013) and genome editing by CRISPR/Cas9-mediated homology-directed repair (HDR) (Gassler et al., 2018, Weninger et al., 2018, Weninger et al., 2016) improve or trigger HR independent of the strain background. The latter also has the advantage of allowing marker-free genetic manipulations. Alternatively, recombinase-based marker recycling strategies such as Cre/loxP (Marx et al., 2008, Pan et al., 2011) and FLP/FRT (Cregg and Madden, 1989, Näätsaari et al., 2012, Perez-Pinera et al., 2016) are available for P. pastoris. While episomal plasmids are usually not used for the expression of recombinant genes due to their segregational instability in non-selective conditions (Hong et al., 2006, Liachko and Dunham, 2014, Sreekrishna et al., 1987), they have proved to be an excellent choice for genome engineering approaches such as the expression of Cas9 or Cre recombinase. Efficient curing of the episomal plasmid can be achieved during one round of plating, meaning that the final production strain is free of the DNA modifying enzyme.

2.3.2. Genome editing with CRISPR/Cas9

In addition to gene replacements and targeted integration, CRISPR/Cas9 can be used to create indel mutations at targeted genomic loci via the NHEJ DNA repair mechanism if no homologous repair template (also called donor DNA) is provided. Frequencies are highly dependent on the genomic locus and the guide RNAs used, and range from 90% to 100% for easily disruptable genomic loci such as AOX1, DAS1/2 or the glycerol kinase encoding gene GUT1, but can be as low as < 5% for difficult loci and/or specific sgRNAs. It has been shown that in P. pastoris CRISPR/Cas9 preferentially creates deletions of a single bp (and to a lesser extent 2 or 3 bp) upstream of the Cas9 cleavage position, indicating a difference to S. cerevisiae, where insertions and deletions of different lengths were observed (DiCarlo et al., 2013).
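Candidate target sites for the commonly used Streptococcus pyogenes Cas9 are 20-nt protospacers immediately followed by an NGG protospacer adjacent motif (PAM). A minimal search for such sites on both strands can be sketched as follows; the function names are illustrative, and the sketch performs no scoring of on-target efficiency or off-target risk, which matters in practice given the locus- and guide-dependent efficiencies described above.

```python
# Enumerate 20-nt protospacers followed by an NGG PAM on both strands.
# Assumption: S. pyogenes Cas9; no efficiency or off-target scoring is done.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_guides(seq, guide_len=20):
    """Return (strand, start, protospacer, pam) tuples.

    start is the 0-based position on the strand that was searched, i.e. for
    '-' hits it refers to the reverse-complemented sequence."""
    seq = seq.upper()
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(strand_seq) - guide_len - 2):
            pam = strand_seq[i + guide_len:i + guide_len + 3]
            if pam[1:] == "GG":
                hits.append((strand, i, strand_seq[i:i + guide_len], pam))
    return hits

# Invented example sequence containing exactly one target site.
example = "ACGT" * 5 + "AGG" + "TT"
for hit in find_guides(example):
    print(hit)
```

Running the search over a locus of interest and selecting two or three well-separated hits is in line with the recommendation to test several guide RNAs per locus.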

Independently, both Weninger et al. (2016) and Gassler et al. (In press) found that Cas9 exerts some toxicity on the host cells. Correspondingly, the use of weaker promoters for Cas9 expression (such as PPFK300 or PLAT1) was shown to enhance transformation efficiency and growth rate (Prielhofer et al., 2017). Furthermore, both groups strongly recommend testing at least 2 or 3 different guide RNAs per locus. The capacity for multiplexing (targeting two or more loci simultaneously) has also been demonstrated, with slightly lower targeting efficiencies than for single sgRNA expression (70% versus approximately 90%) (Weninger et al., 2016).

CRISPR/Cas9-mediated HDR allows the efficient and selection marker-free integration of DNA fragments into the genome with the possibility to replace, disrupt or tag a specific genomic locus (Singh et al., 2017). Researchers thereby take advantage of the fact that the double-strand break generated by Cas9 recruits the HR machinery and thus increases HR efficiency at the targeted locus. Weninger et al. (2018) suggested the use of a Δku70 background strain to achieve high efficiency of marker-free HDR. However, this reduces transformation efficiencies drastically and is not suited to generating indel mutations. The Cas9/sgRNA episomal plasmid described by Gassler et al. (In press) (part of the CRISPi kit available at Addgene #1000000136) and linear donor DNA fragments with flanking regions of approximately 1000 bp each enabled targeting efficiencies above 50% on a routine basis, similar to indel generation, in a wild type strain. Please note that genes essential for growth still cannot be deleted by this method. In such a case, Cas9-mediated indels (as described above) or dCas9-mediated CRISPR interference (Lian et al., 2018, Qi et al., 2013) might be suitable to diminish gene function.
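The layout of such a linear donor fragment can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the genome is available as a plain string, the region to be replaced is given by 0-based half-open coordinates, and the function name build_donor and its parameters are hypothetical rather than part of any published toolkit. The default arm length of 1000 bp follows the value given above.

```python
def build_donor(genome, target_start, target_end, insert="", arm_len=1000):
    """Assemble left arm + insert + right arm around genome[target_start:target_end].

    An empty insert corresponds to a clean deletion of the target region.
    """
    if not 0 <= target_start <= target_end <= len(genome):
        raise ValueError("target coordinates are outside the sequence")
    if target_start < arm_len or target_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[target_start - arm_len:target_start]
    right_arm = genome[target_end:target_end + arm_len]
    return left_arm + insert + right_arm


# Toy example (hypothetical sequence) with short arms for readability.
toy_genome = "A" * 10 + "CCCC" + "G" * 10
donor = build_donor(toy_genome, 10, 14, insert="TT", arm_len=5)
```

In the toy example the four C bases are replaced by the insert, and the donor consists of five flanking bases on each side. With the default arm length the same call would return a fragment of 2000 bp plus the insert.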

For genes that are especially hard to knock out, such as och1 (Krainer et al., 2013, Nett and Gerngross, 2003, Vervecken et al., 2004), where even CRISPR/Cas9 yielded only 50% mutation efficiency (Weninger et al., 2016), more time- and work-intensive indirect methods might be considered. Examples are the simultaneous expression of the gene to be knocked out from an episomal plasmid (Chen et al., 2013) or as part of a loxP-flanked disruption cassette (Shibui and Hara, 2017), followed by removal of this additional copy. Another promising approach to increase HR efficiency is the addition of hydroxyurea during cell transformation, which arrests the cells in the S/G2 phase of the cell cycle, where HR prevails over NHEJ (Tsakraklides et al., 2015).

2.3.3. Novel modular cloning strategies and libraries of synthetic parts for P. pastoris

For rapid optimization of single expression cassettes and whole synthetic pathways, efficient and high throughput modular cloning strategies are essential. To this end, seamless modular cloning strategies such as Gibson assembly, Golden Gate assembly (GGA), or Restriction site free cloning (RSFC) have been adapted and established for use in P. pastoris. Initially GGA and RSFC were used to design or optimize single expression units by assembly of standardized and commonly used genetic parts such as promoters, secretion signals, tags and transcription terminators (Obst et al., 2017 – MoClo Pichia Toolkit, available at Addgene #1000000108; Schreiber et al., 2017, Vogl et al., 2015). Additionally, a library of nuclear localization signals was reported (Weninger et al., 2015).

The GGA toolbox was then extended to allow simultaneous expression of up to eight genes from one vector, and the repertoire of synthetic parts encompasses 20 promoters, 10 transcription terminators, 5 integration loci and 4 resistance markers (Prielhofer et al., 2017; GoldenPiCS kit, available at Addgene #1000000133). The GoldenPiCS kit is fully compatible with the GoldenMOCS system, which was designed to perform metabolic engineering tasks in different cellular hosts with the same cloning platform (Sarkari et al., 2017). Gibson assembly has also been used for the same purpose, with a set of 49 promoters and 20 terminators characterized (Vogl et al., 2016). Both studies found that it is crucial to avoid repetitive sequences (such as the same promoter or terminator) in vector or pathway design in order to prevent loop-out during the transformation step, which resulted in only partial integration of the desired pathway.
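The recommendation to avoid repeated promoters or terminators can be checked automatically before a construct is assembled. The following Python sketch is a minimal illustration: the data layout (a list of dictionaries with the keys "gene", "promoter" and "terminator"), the function name repeated_parts and the gene names are assumptions made here for illustration and do not correspond to the file formats of the GoldenPiCS or MoClo toolkits.

```python
from collections import Counter


def repeated_parts(cassettes):
    """Return the (part_type, name) pairs that occur in more than one cassette."""
    counts = Counter()
    for cassette in cassettes:
        counts[("promoter", cassette["promoter"])] += 1
        counts[("terminator", cassette["terminator"])] += 1
    return sorted(part for part, n in counts.items() if n > 1)


# Hypothetical three-gene design; PGAP is used twice and should be flagged.
design = [
    {"gene": "geneA", "promoter": "PGAP", "terminator": "AOX1-TT"},
    {"gene": "geneB", "promoter": "PTEF1", "terminator": "CYC1-TT"},
    {"gene": "geneC", "promoter": "PGAP", "terminator": "RPS3-TT"},
]
conflicts = repeated_parts(design)
```

For this design the check reports the promoter PGAP as a repeated part. A stricter version could compare the underlying sequences rather than the part names, since differently named parts may still share long identical stretches.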

Multiple enzyme pathways can be expressed using combinatorial assembly of gene cassettes (by Gibson assembly or the GoldenPiCS toolkit) or using single cassettes with polycistronic expression constructs based on 2A peptides (Geier et al., 2015, Siripong et al., 2018), thus enabling efficient testing of complex heterologous pathways and fine-tuning of enzyme expression. Due to the combinatorial nature of multigene pathway assembly, a high number of variants is possible, so that screening for the best combination is likely to become the limiting factor. To explore the possible design space, advanced high throughput screening methods are needed.
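How quickly the design space grows can be estimated with a short calculation. The Python sketch below counts only the assignments of one promoter and one terminator to each gene, using the part numbers of the GGA toolbox (20 promoters, 10 terminators) as an example. Gene order, integration loci and resistance markers are ignored, so the result is a lower bound under these simplifying assumptions; the function name design_space is hypothetical.

```python
import math


def design_space(n_genes, n_promoters, n_terminators, allow_reuse=False):
    """Number of promoter/terminator assignments for a pathway of n_genes genes."""
    if allow_reuse:
        # Every gene may use any promoter and any terminator.
        return (n_promoters * n_terminators) ** n_genes
    # Without reuse, parts are drawn without replacement (ordered selections).
    return math.perm(n_promoters, n_genes) * math.perm(n_terminators, n_genes)


variants_no_reuse = design_space(3, 20, 10)
variants_with_reuse = design_space(3, 20, 10, allow_reuse=True)
```

For a three-gene pathway this gives 4,924,800 assignments when no part is used twice and 8,000,000 when reuse is allowed, which illustrates why exhaustive testing is not feasible and screening capacity becomes limiting.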

For efficient genetic engineering and pathway expression, tunable promoters of different strengths are a prerequisite to drive gene expression. Apart from overexpression of heterologous genes, exchange of native promoters with other promoters is a way to overcome feedback inhibition, optimize enzyme abundance and increase the metabolic flux towards a product of interest (Liu et al., 2015, Marx et al., 2008). Efficient and regulatable P. pastoris promoters are often derived from carbon-source utilization pathways such as the methanol utilization pathway (MUT) or the rhamnose-utilization pathway (reviewed by Vogl and Glieder, 2013). Natively, these promoters are induced by the availability of the respective carbon source and shut off on glucose or glycerol surplus. Although the AOX1 promoter is still the most commonly used control element for protein production applications, a wide variety of other MUT promoters have been described and tested recently (Gasser et al., 2015, Prielhofer et al., 2017, Vogl et al., 2016). Together with systematically engineered promoter variants (Hartner et al., 2008, Yang et al., 2018) and synthetic promoters (Vogl et al., 2014), a large toolbox of synthetic parts is available, which allows methanol-induced or in some cases also de-repressed gene expression. If methanol-free expression systems are intended, strong constitutive promoters such as PGAP or PTEF1 are frequently used (Vogl and Glieder, 2013). Alternatively, the PG promoter series, which is induced in glucose-limiting conditions as often encountered during production processes (Prielhofer et al., 2013), or the thiamine-regulatable PTHI11 promoter (Landes et al., 2016) provide methanol-independent regulated expression. Vitamin or amino acid regulated promoters such as PSER1, PMET3, and PTHR1 also allow for repression and induction of pathways independent of the carbon source and can be used to specifically down-regulate otherwise essential gene functions (Delic et al., 2013). Additionally, weaker constitutive or regulatable promoters, as provided e.g. in the GoldenPiCS toolbox (Prielhofer et al., 2017), can be of interest for metabolic engineering applications and to balance multigene expression in synthetic pathways. Altogether, more than 100 promoters have been described for use in P. pastoris.

Recently, a non-orthogonal combination of promoter engineering (by duplication or deletion of transcription factor binding sites) and overexpression of the respective transcription factors (TF) has been reported for PGAP (Ata et al., 2017). A similar approach was followed up for PAOX1, where single overexpression of two out of three different TFs involved in regulation of the MUT pathway (Mxr1, Mit1) converted P. pastoris to methanol-free PAOX1 expression (Vogl et al., 2018b). Concurrently, it was reported that knockout of three repressing transcription factors (Δmig1Δmig2Δnrg1) in combination with Mit1-overexpression renders PAOX1 methanol-independent (Wang et al., 2017). As an example of a truly orthogonal system, a β-estradiol-inducible circuit consisting of an estradiol-inducible zinc-finger transcription factor and corresponding artificial binding sites was shown to be operational in P. pastoris (Perez-Pinera et al., 2016).

Less effort has so far been put into transcription terminators (TT), which have also been implicated in mRNA stability. Recent high-throughput screening in S. cerevisiae emphasized that TTs are important regulatory elements which should be considered for the fine-tuning of gene expression and metabolic pathways, as terminator selection can influence the flux through a metabolic pathway (Curran et al., 2013, Wei et al., 2017, Yamanishi et al., 2013). Apart from the commonly used AOX1-TT and the S. cerevisiae derived CYC1-TT, two recent studies provided more insight into this topic in P. pastoris: while Vogl et al. (2016) focused on TTs of MUT genes, Prielhofer et al. (2017) assessed the efficiency of TTs of strongly expressed P. pastoris genes including many ribosomal protein genes. All tested P. pastoris TTs showed rather similar reporter levels, exceeding ScCYC1-TT by at most 50%. The P. pastoris terminators tested so far had a size of 174–507 bp (Prielhofer et al., 2017, Vogl et al., 2016), whereas significantly shorter (35–75 bp long) and even synthetic terminator sequences have been identified for S. cerevisiae (Curran et al., 2015). As TTs seem to be functional across yeast species (Wagner and Alper, 2016), it will be of interest to test such short TTs and to implement them in the synthetic P. pastoris toolbox.

As stated above, the traditional method is to integrate the expression cassette(s) into the genome of P. pastoris. Only a few integration loci have been comparatively analyzed so far (3´-AOX1, 5´-AOX1, AOX1, 5´-ENO1, 5´-RGI2, 5´-GAP/TDH3, GUT1, rDNA locus), which all seemed to allow gene expression from a single expression cassette at similar strength (Prielhofer et al., 2017, Vogl et al., 2018a).

A recent approach involving in vivo self-ligation cloning of short overlapping DNA sequences and episomal vectors (Camattari et al., 2016) is a promising step towards in vivo single step pathway construction similar to that described for S. cerevisiae (Shao and Zhao, 2009). However, efficient assembly of larger multi-gene pathways in P. pastoris will need further optimization. Episomal vectors (using e.g. the native P. pastoris ARS, the K. lactis Pan-ARS or the P. pastoris mitochondrial ARS sequences) are also a useful option for screening experiments, e.g. for the assessment of the activity of enzyme variants. So far, however, they are not suitable for protein production processes, as they require a constant positive selection pressure due to their segregational instability (Camattari et al., 2016, Liachko and Dunham, 2014, Schwarzhans et al., 2017b).

Apart from precise genetic engineering of industrial strains, adaptive laboratory evolution (ALE) is a valuable tool to engineer their genomes, especially when the desired traits are more complex and genetically not well defined. Evolution and subsequent reverse engineering proved to be efficient for increasing stress tolerance and product levels in many host organisms (Dragosits and Mattanovich, 2013). So far, the application of laboratory evolution has been rather limited in P. pastoris, which might be due to the lack of easily selectable traits connected to protein production. In this respect, ALE has been used to select for enhanced growth on methanol yielding some clones that also produced more heterologous protein (Moser et al., 2017).

Mating, as another approach to engineer genomes without exact knowledge of the involved genetic traits, was not fully accessible for P. pastoris until recently. Mating type switching and the rapid haploidization after mating prevented the efficient application of methods like quantitative trait loci mapping (Swinnen et al., 2012). The recent development of mutant P. pastoris strains with defined stable mating types (Heistinger et al., 2018) further enhances the accessibility of the genetic repertoire of this yeast.