Expression Technologies Inc.


Protein yield
 

Protein Yield or protein expression level

In molecular biology, protein yield means the expression level of a recombinant protein, that is, the quantity of protein produced in a defined volume of culture. The quantity is measured in grams, milligrams, or micrograms, and the defined volume is usually one liter. A yield in grams per liter is high and is in the range of pharmaceutical production. A yield in milligrams per liter is intermediate and is sufficient for most biochemical analyses. A yield in micrograms per liter is low and is enough only for limited biochemical studies. Various expression technologies may be used to increase protein yield. The main control of protein yield appears to be at the level of transcription, though other processes such as DNA replication and protein translation are also important. We examine the factors related to protein yield individually below.
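The three yield ranges described above can be written as a small helper. This is only a sketch: the function name `classify_yield` and the exact cut-offs (1 mg/L and 1000 mg/L) are assumptions chosen to match the gram, milligram, and microgram ranges in the text, not values stated by the source.

```python
def classify_yield(mg_per_liter: float) -> str:
    """Classify a protein yield given in milligrams per liter of culture.

    The cut-offs are assumptions: >= 1000 mg/L is treated as "grams per
    liter" and < 1 mg/L as "micrograms per liter".
    """
    if mg_per_liter < 0:
        raise ValueError("yield cannot be negative")
    if mg_per_liter >= 1000.0:   # grams per liter: pharmaceutical range
        return "high"
    if mg_per_liter >= 1.0:      # milligrams per liter: most biochemical analyses
        return "intermediate"
    return "low"                 # micrograms per liter: limited studies only


if __name__ == "__main__":
    for value in (2500.0, 40.0, 0.05):
        print(f"{value} mg/L -> {classify_yield(value)}")
```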

Contents of protein yield

• Host and protein yield
• Expression vector and protein yield
• Targeting protein cDNA and protein yield
• Other factors and protein yield
• Technologies to improve protein yield

Host and protein yield

• Host systems and protein yield
• Host cell strains and protein yield

Host systems and protein yield

Most recombinant proteins are expressed in heterologous hosts; that is, the proteins are expressed in cell lines or cell strains other than those in which they are produced in their native environments. The recombinant proteins often originate from mammals such as human or mouse. The common heterologous hosts are E. coli, yeast, insect cells, and mammalian cells. At production scale, protein yields in these hosts are similar, in the range of grams per liter. At laboratory scale, yields from E. coli or insect cells are often higher than those from yeast or mammalian cells. Some recombinant proteins can be expressed in all available hosts, while others can be expressed in only one host.

Host cell strains and protein yield

Eukaryotic cells with different genetic backgrounds are called cell lines. Prokaryotic cells with different genetic backgrounds are usually termed cell strains. We use the term cell strains here, since most of the protein expression discussed is in E. coli. For a chosen host, many cell strains are available for protein expression, and protein yield can differ significantly among them. Some cell strains supply rare tRNAs, others promote disulfide bond formation, and still others reduce protein toxicity. The needs of a particular protein may be assessed from existing knowledge and experiments, and cell strains may be chosen accordingly.


Expression vector and protein yield

• Promoter and protein yield
• Ribosome binding site (rbs) and protein yield
• Protein yield and spacing between rbs and start codon AUG
• Stop codon, transcription terminator and protein yield
• Replication origin and protein yield
• Selection marker of the expression vector and protein yield
• Regulatory gene and protein yield

An expression vector must contain the structural units that allow protein expression. These units include at least a promoter, a ribosome binding site (rbs), a start codon, a stop codon, and a terminator, all of which are required for recombinant protein expression in a host cell. In addition, the expression vector must contain a selection marker and a replication origin for the selection and replication of the vector in a host cell. All these structural units directly or indirectly determine the expression level of the recombinant protein, that is, the protein yield. It should be stressed that the target recombinant protein itself also determines its expression level.

Promoter and protein yield

Promoter strength determines the mRNA level of the recombinant protein. Under normal conditions, the stronger the promoter, the higher the protein yield that may be obtained. For most toxic proteins, however, a weaker promoter gives a higher protein yield; in these cases, less is more. Un-induced (leaky) expression is presumably responsible for this observation. Promoters commonly used for expression in E. coli are derived either from native E. coli genes or from bacteriophages.

Promoter origin    Promoter names
E. coli            Ptrc, Plac, Ptac, PBAD
Phage              PT7, PT3, PT5, PSP6, PL

Phage promoters usually allow transcription with high specificity and at a high rate. Some E. coli promoters such as Ptrc and Ptac also permit a high transcription rate.

Ribosome binding site (rbs) and protein yield

The ribosome, the machinery of protein synthesis (translation), binds the mRNA at the ribosome binding site (rbs), which is also termed the Shine-Dalgarno sequence. The consensus rbs sequence is UAAGGAGG. Some E. coli genes do not have the consensus rbs sequence but still allow efficient translation. It has been reported that the secondary structure around the rbs is important for ribosome binding and translation initiation. The 5' end capping and secondary structure may also enhance mRNA stability and therefore increase protein yield. Optimal translation initiation may be obtained with the consensus rbs sequence, and protein yield may increase accordingly. The rbs is located upstream of the start codon AUG, which distinguishes it from the eukaryotic Kozak sequence: the Kozak sequence flanks the start codon and is recognized by the ribosome as the translation initiation site.

It has been reported that different rbs sequences give different expression levels, and therefore different yields, for the same recombinant protein.

Protein yield and spacing between rbs and start codon AUG

The spacing between the rbs and the start codon AUG is important for efficient translation initiation and protein yield. The optimal spacing appears to be 7 ± 2 nucleotides. However, spacings as short as 4 nucleotides or as long as 14 nucleotides have been reported to work, with lower efficiency.
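The spacing rule above can be checked on a candidate sequence with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the rbs is matched as the exact consensus string quoted in the text, and the spacing is counted as the number of nucleotides between the last base of the rbs and the A of the first downstream AUG, which is an assumption since counting conventions differ.

```python
from typing import Optional

CONSENSUS_RBS = "UAAGGAGG"  # consensus rbs sequence quoted in the text


def rbs_spacing(mrna: str, rbs: str = CONSENSUS_RBS) -> Optional[int]:
    """Return the number of nucleotides between the rbs and the first AUG after it.

    DNA input is accepted and converted to RNA. Returns None when the rbs
    or a downstream AUG cannot be found.
    """
    mrna = mrna.upper().replace("T", "U")
    rbs_start = mrna.find(rbs)
    if rbs_start == -1:
        return None
    rbs_end = rbs_start + len(rbs)      # index of the first base after the rbs
    aug = mrna.find("AUG", rbs_end)     # first start codon downstream of the rbs
    if aug == -1:
        return None
    return aug - rbs_end


def describe_spacing(spacing: int) -> str:
    """Compare a spacing with the ranges given in the text."""
    if 5 <= spacing <= 9:
        return "optimal (7 +/- 2 nt)"
    if 4 <= spacing <= 14:
        return "workable, lower efficiency"
    return "outside the reported range"


if __name__ == "__main__":
    example = "GGUAAGGAGGAUAUACCAUGAAA"  # made-up sequence for illustration
    spacing = rbs_spacing(example)
    if spacing is not None:
        print(spacing, describe_spacing(spacing))
```

Real genes often carry an imperfect rbs, so an exact string match is a simplification; a scoring approach would be needed for general use.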

About 80% of E. coli genes use AUG as the start codon and about 15% use GUG. Other codons may also be used as start codons, but at lower frequency. Most genes from bacterial viruses (phages) use AUG as the start codon. AUG may initiate translation more efficiently than the other codons.

Stop codon, transcription terminator and protein yield

The standard genetic code has three stop codons: UAA, UAG, and UGA. E. coli uses UAA at a much higher frequency than the other two codons. Eukaryotes do not exhibit this preference in stop codon usage. Multiple stop codons may increase the efficiency of translation termination. Downstream of the stop codons, the transcription terminator sequence is responsible for transcription termination. Efficient transcription termination minimizes the drain on cellular energy and reduces the metabolic burden on the host. More importantly, the transcription terminator forms secondary structure at the 3' end of the mRNA, which improves the stability of the mRNA and therefore increases protein yield.

Most expression vectors contain multiple stop codons in all three reading frames and an efficient transcription terminator. When a eukaryotic gene is expressed in E. coli, changing its stop codon to TAA during cloning may increase the efficiency and accuracy of translation termination.
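As a minimal sketch of the stop codon change mentioned above, the following helper replaces the final stop codon of a coding sequence with TAA. The function name `use_taa_stop` is hypothetical, and the code assumes the input is an in-frame DNA coding sequence that already ends with a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def use_taa_stop(cds: str) -> str:
    """Return the coding sequence with its final stop codon changed to TAA."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end with a stop codon")
    return cds[:-3] + "TAA"


if __name__ == "__main__":
    print(use_taa_stop("ATGAAATGA"))  # made-up sequence for illustration
```

In practice the change is usually built into the reverse cloning primer rather than applied to the whole sequence.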

Replication origin and protein yield

The replication origin determines the copy number of the expression vector in a host cell. Many highly expressed genes are present in multiple copies in their native cells. Plasmid copy number ranges from a few copies to hundreds of copies per cell. High copy number expression vectors normally give high protein yield for non-toxic proteins. However, a high copy number also drains cellular energy and is a major metabolic burden for the host cells. In addition, it increases the toxicity of a toxic recombinant protein. Therefore many expression vectors use low or intermediate copy number replication origins derived from the pBR322 or pACYC plasmids. High copy number origins such as those from the pUC plasmids are mostly used for cloning purposes.

Selection marker of the expression vector and protein yield

The most common selection markers on expression vectors are the ampicillin, chloramphenicol, kanamycin, and tetracycline resistance genes, listed roughly in order of popularity. The toxicity of these gene products to the host cells, combined with that of their respective antibiotics, may contribute to this order of popularity. Expressing a recombinant protein from expression vectors with different selection markers clearly results in different protein yields, even when all other conditions are the same. The mechanism behind these differences is not known.

Regulatory gene and protein yield

The most commonly used regulatory gene in protein expression is the lacI repressor gene. The following over-simplified chemical equilibrium represents the binding between the lacI repressor (R) and its DNA binding site, the lac operator (O).

R + O ⇌ RO

All wild-type E. coli cells contain an endogenous lacI gene. However, the lacI repressor expressed from the E. coli chromosome is often insufficient to repress leaky expression from the expression vector. Leaky expression of a toxic protein harms the host cells and therefore decreases protein yield. The lacIq gene expresses a high level of lacI repressor. When the lacIq gene is incorporated on the vector, leaky expression of the recombinant protein is greatly reduced and protein yield is increased. However, problems still exist in the cloning and expression of highly toxic proteins. See Toxic protein cloning and expression for more information.
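The effect of repressor level on leaky expression can be illustrated with the simple binding equilibrium between repressor (R) and operator (O) described above. The sketch below assumes a single binding site, repressor in large excess over operator (so free repressor is about equal to total repressor), and a dissociation constant Kd = [R][O]/[RO]. The function name, the Kd value, and the repressor levels are hypothetical and in arbitrary units; they are not measurements.

```python
def fraction_operator_bound(repressor: float, kd: float) -> float:
    """Fraction of operator sites occupied by repressor for R + O <-> RO.

    Assumes free repressor is approximately equal to total repressor,
    so that bound fraction = [R] / ([R] + Kd).
    """
    if repressor < 0 or kd <= 0:
        raise ValueError("repressor must be >= 0 and kd must be > 0")
    return repressor / (repressor + kd)


if __name__ == "__main__":
    kd = 1.0  # hypothetical dissociation constant, arbitrary units
    # Hypothetical comparison: baseline repressor level versus a ten-fold higher level.
    for label, level in (("baseline", 10.0), ("ten-fold higher", 100.0)):
        unbound = 1.0 - fraction_operator_bound(level, kd)
        print(f"{label}: fraction of operators free = {unbound:.4f}")
```

Under these assumptions, raising the repressor level lowers the fraction of free operator, which is the qualitative reason a higher repressor level reduces leaky expression.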

In addition to lacI repressor gene, other regulatory genes may also be incorporated on expression vectors. All regulatory genes are important for DNA cloning and protein expression.


Targeting protein cDNA and protein yield

• 5' end GC contents and protein yield
• 5' end codon usage and protein yield
• N-terminal amino acids and protein yield
• Coding sequence and protein yield

5' end GC contents and protein yield

When a eukaryotic gene is cloned for expression in E. coli, only the DNA sequence from the start codon to the stop codon is needed. Sequences flanking the start and stop codons are usually provided by the chosen expression vector. In most cases, the cDNA sequence is not modified before cloning into an expression vector, yet the cDNA sequence may affect protein expression. E. coli most often uses AUG and UAA as start and stop codons, and using them as the start and stop codons respectively is general practice at the cloning step. High GC content at the 5' end of the transcribed mRNA may lead to secondary structures that reduce or stop translation. Minimizing the 5' end GC content reduces secondary structure formation and may increase protein yield. This may be achieved by using AT-rich synonymous codons. GC content in other parts of the cDNA seems to have less effect on protein yield.
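The GC content of the 5' end of a coding sequence is easy to compute before cloning. In the sketch below the function names and the default window of 30 nucleotides are assumptions made for illustration; the text does not define how long the "5' end" is.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)


def five_prime_gc(cds: str, window: int = 30) -> float:
    """GC fraction of the first `window` nucleotides (window size is an assumption)."""
    return gc_fraction(cds[:window])


if __name__ == "__main__":
    cds = "ATGGCCGGCGCGCCGGGCAAAGAAACCTTT"  # made-up sequence for illustration
    print(f"5' end GC fraction: {five_prime_gc(cds):.2f}")
```

Comparing this value before and after substituting AT-rich synonymous codons gives a quick check that a redesign moved the 5' end in the intended direction.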

5' end codon usage and protein yield

All organisms use 20 amino acids, but the genetic code has 64 codons: 61 encode the 20 amino acids and three are stop codons. Only two amino acids, Met and Trp, are encoded by a single codon. All other amino acids are encoded by multiple codons. This raises the possibility that different codons for the same amino acid differ in translation efficiency, and this is indeed the case. Different organisms have different codon preferences. Consistent with these preferences, different amounts of tRNA are available for recognizing different codons. Some codons are heavily used in mammalian cells but rarely used in bacteria. Bacterial cells may not have sufficient tRNA to express a protein with multiple rare codons, especially at the N-terminus of the protein. As a result, the protein yield will be low. Please see Technologies to improve protein yield caused by rare codons for more information.

Rare codons within the first 20 amino acids, and sometimes within the first 50 amino acids, appear to affect protein yield significantly. Rare codons after the first 50 amino acids generally do not have significant effects on protein yield. However, clusters of the same rare codon can pause or stop translation even when they are located after the first 50 amino acids. Such clusters can cause premature translation termination. The resulting truncated protein may not be correctly folded, and incorrectly folded soluble protein is unstable and susceptible to degradation. Careful examination of the sequence for clusters of the same rare codon is therefore important for protein yield.
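The two checks described above, rare codons near the N-terminus and clusters of the same rare codon anywhere in the sequence, can be sketched as follows. The set `RARE_CODONS` is an illustrative assumption based on codons commonly cited as rare in E. coli; it should be replaced with a codon usage table for the actual host. A "cluster" is taken here to mean two or more consecutive copies of the same rare codon, which is also an assumption.

```python
from typing import List, Tuple

# Illustrative set only; use a real codon usage table for the chosen host.
RARE_CODONS = {"AGG", "AGA", "ATA", "CTA", "CCC", "CGA", "GGA"}


def split_codons(cds: str) -> List[str]:
    """Split an in-frame coding sequence into codons (trailing bases are dropped)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_report(
    cds: str, n_terminal: int = 50
) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str, int]]]:
    """Return (rare codons within the first n_terminal codons, rare codon clusters).

    Positions are 1-based codon numbers. Each cluster is reported as
    (start position, codon, number of consecutive copies).
    """
    codons = split_codons(cds)
    n_term_hits = [
        (i + 1, codon)
        for i, codon in enumerate(codons[:n_terminal])
        if codon in RARE_CODONS
    ]
    clusters = []
    i = 0
    while i < len(codons):
        j = i
        # Extend the run while the next codon is identical to the current one.
        while j + 1 < len(codons) and codons[j + 1] == codons[i]:
            j += 1
        if codons[i] in RARE_CODONS and j > i:
            clusters.append((i + 1, codons[i], j - i + 1))
        i = j + 1
    return n_term_hits, clusters


if __name__ == "__main__":
    hits, clusters = rare_codon_report("ATGAGGAGGAGGAAA")  # made-up sequence
    print("N-terminal rare codons:", hits)
    print("Clusters:", clusters)
```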

N-terminal amino acids and protein yield

The structural proteins of bacteriophages are highly expressed in E. coli hosts. Many recombinant proteins are also highly expressed when the first 5 to 10 amino acids of a bacteriophage structural protein are added to their N-termini. The N-termini of some fusion partners are engineered to contain these amino acids, which is why the expression levels of these recombinant fusion proteins are high.

Coding sequence and protein yield

Recent studies indicate that the protein coding sequence itself is important for protein yield. Eighteen amino acids and the translation stop are each encoded by multiple codons, so a single protein can be represented by a very large number of coding sequences that use different codons for the same amino acids. Protein expression levels from these different coding sequences can differ by as much as 250-fold. Coding sequences were also observed to affect mRNA level, mRNA degradation, and host cell growth rate. It was concluded that codon bias was not responsible for the variation in expression. Instead, the stability of mRNA folding near the ribosome binding site, and the associated rate of translation initiation, plays a dominant role in determining the protein expression level. At present, the expression levels of different coding sequences have to be determined empirically; there are no general rules for designing a coding sequence that will lead to a high expression level. Pharmaceutically important proteins may justify the resources needed to test a large number of coding sequences.
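To give a sense of how large the number of alternative coding sequences is, the sketch below multiplies the number of synonymous codons for each residue of a protein, using the degeneracy of the standard genetic code. The function name and the example peptide are hypothetical.

```python
# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODON_CHOICES = 3  # UAA, UAG, UGA


def count_coding_sequences(protein: str, include_stop: bool = True) -> int:
    """Count the distinct coding sequences that encode the same protein."""
    total = 1
    for residue in protein.upper():
        total *= CODONS_PER_AMINO_ACID[residue]  # KeyError for unknown residues
    if include_stop:
        total *= STOP_CODON_CHOICES
    return total


if __name__ == "__main__":
    peptide = "MKLSRG"  # made-up peptide for illustration
    print(count_coding_sequences(peptide))
```

Because the count grows multiplicatively with length, even a short protein has far more possible coding sequences than can be tested exhaustively, which is consistent with the empirical approach described above.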


Other factors and protein yield

• Cell density and protein yield
• Growth medium and protein yield
• Antibiotic selection and protein yield
• Proteolysis and protein yield
• Fusion partners and protein yield
• Protein toxicity and protein yield
• Protein solubility and protein yield

Cell density and protein yield

Under most conditions, protein yield is proportional to the cell mass of the host cells. The cell mass is equal to the cell density times the culture volume. Using a larger volume and increasing the cell density are therefore the most common ways to increase protein yield. The culture volume is limited by the laboratory or production setting. The cell density depends on the culture conditions and the growth medium. Changing the culture condition from shake flask to fermentation increases cell density 10-fold or more. A fed-batch fermentor can reach a cell density of OD600 = 30 to 50 for most E. coli strains, and E. coli cells can reach OD600 = 200 to 250 under optimized fermentation conditions. Protein yield may thereby be increased from milligrams to grams per liter.
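The proportionality between yield, cell density, and culture volume can be written as a one-line estimate. This is a sketch under a strong assumption: that the amount of protein per unit of cell mass stays constant as density increases. The function name and the specific yield of 5 mg per liter per OD600 unit are hypothetical.

```python
def estimated_yield_mg(mg_per_liter_per_od: float, od600: float, culture_liters: float) -> float:
    """Estimate total protein (mg) as specific yield x cell density x culture volume.

    Assumes protein per unit of cell mass is constant, which may not hold
    for every protein or culture condition.
    """
    if min(mg_per_liter_per_od, od600, culture_liters) < 0:
        raise ValueError("inputs must be non-negative")
    return mg_per_liter_per_od * od600 * culture_liters


if __name__ == "__main__":
    specific_yield = 5.0  # hypothetical: mg of protein per liter per OD600 unit
    flask = estimated_yield_mg(specific_yield, od600=3.0, culture_liters=1.0)
    fermentor = estimated_yield_mg(specific_yield, od600=40.0, culture_liters=1.0)
    print(f"shake flask: {flask} mg, fed-batch fermentor: {fermentor} mg")
```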

Growth medium and protein yield

In addition to the culture conditions, the growth medium can also increase cell density and therefore protein yield. High density growth media support high density E. coli growth. In a shake flask under normal aeration conditions, common media such as LB support E. coli growth up to a cell density of OD600 = 2 to 3. Richer media such as TB can grow E. coli to OD600 = 5 to 8. By contrast, all of our proprietary high density bacterial growth media can grow E. coli to a cell density of OD600 = 30 to 50. This is over ten times higher than LB and over five times higher than TB. As a result, 5 to 10 times more plasmid DNA or protein can be produced in our high density growth media.

Antibiotic selection and protein yield

Ampicillin is the most widely used and therefore the best studied antibiotic. Expression of the ampicillin resistance gene product, β-lactamase, does not by itself appear to have a significant impact on protein yield. However, loss of selection is one of the major causes of low protein yield with the ampicillin selection marker. Ampicillin may be degraded chemically under acidic conditions in the medium, or enzymatically by β-lactamase. At high cell density with insufficient aeration, the culture medium may reach pH 4 or lower, and ampicillin is chemically degraded at such pH. Loss of the antibiotic from the medium allows cells without the expression vector to grow, which lowers protein yield. The ampicillin analog carbenicillin is more stable at acidic pH. Using carbenicillin in place of ampicillin, adding more ampicillin at induction, or raising the pH will improve selection and protein yield. E. coli cells appear to tolerate ampicillin over a wide range of concentrations. Commonly used ampicillin concentrations are 50 to 200 µg/ml of medium under shake flask conditions.

Chloramphenicol is the second most widely used antibiotic. Its selection marker is often used on a plasmid that is co-expressed with a plasmid carrying the ampicillin selection marker. E. coli cells also tolerate chloramphenicol over a wide range of concentrations. Chloramphenicol concentrations of 30 to 150 µg/ml may be used in a shake flask. Chloramphenicol does not appear to be degraded as easily as ampicillin.

The kanamycin selection marker is also often used on plasmids that are co-expressed with ampicillin plasmids. E. coli cells do not have a high tolerance for kanamycin or tetracycline. Kanamycin and tetracycline are used at 30 to 50 µg/ml and 10 to 20 µg/ml respectively.

The percentage of cells containing the plasmid may be tested by plating cells on plates with and without antibiotic. The number of colonies growing on the antibiotic plate divided by the number on the plate without antibiotic, multiplied by 100, is the percentage of cells containing the plasmid. Increasing this percentage will increase protein yield. Aeration, medium pH, growth temperature, and the addition of extra antibiotic all affect this percentage.
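The plating test described above reduces to a simple ratio. The sketch below assumes that the same dilution and volume of culture were spread on both plates; the function name and the colony counts are hypothetical.

```python
def plasmid_retention_percent(colonies_with_antibiotic: int, colonies_without_antibiotic: int) -> float:
    """Percentage of cells carrying the plasmid, from paired plate counts.

    Assumes both plates received the same dilution and volume of culture.
    """
    if colonies_without_antibiotic <= 0:
        raise ValueError("the plate without antibiotic must have at least one colony")
    if colonies_with_antibiotic < 0:
        raise ValueError("colony counts cannot be negative")
    return 100.0 * colonies_with_antibiotic / colonies_without_antibiotic


if __name__ == "__main__":
    # Made-up counts for illustration.
    print(f"{plasmid_retention_percent(45, 50):.1f}% of cells contain the plasmid")
```

Because colony counts vary from plate to plate, replicate plates give a more reliable estimate than a single pair.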

Proteolysis and protein yield

The degradation of protein is termed proteolysis. Proteolysis can be reduced by using protease-deficient host cell strains. Expressing the protein in a different cellular compartment may also reduce proteolysis. Some amino acid sequences are associated with proteolysis. For example, proteins in which the start Met is followed by Arg, Lys, Phe, Leu, Tyr, or Trp are more susceptible to degradation than those with other amino acids at this position. In eukaryotes, PEST sequences (rich in Pro, Glu, Ser, and Thr) are involved in proteolysis, but they are not important in prokaryotes.

The most important factor for protein stability during expression is protein folding. Incorrectly folded protein is subject to degradation, which results in low yield. Most protein degradation observed during and after expression results from expressing truncated protein domains. In many cases, the amino acid sequences flanking an intact domain are also required for correct folding. Ten to 20 amino acids flanking the intact domain are generally sufficient. Expressing an intact domain of the protein together with the necessary flanking amino acid sequences is critical for avoiding protein degradation.
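When designing a domain construct, the flanking residues mentioned above can be added to the domain boundaries programmatically. The sketch below is illustrative: the function name is hypothetical, coordinates are 1-based and inclusive, and the default flank of 15 residues is simply a value within the 10 to 20 range given in the text.

```python
from typing import Tuple


def construct_boundaries(
    domain_start: int, domain_end: int, protein_length: int, flank: int = 15
) -> Tuple[int, int]:
    """Extend a domain by `flank` residues on each side, clipped to the protein.

    Coordinates are 1-based and inclusive.
    """
    if not 1 <= domain_start <= domain_end <= protein_length:
        raise ValueError("domain coordinates must lie within the protein")
    if flank < 0:
        raise ValueError("flank cannot be negative")
    start = max(1, domain_start - flank)
    end = min(protein_length, domain_end + flank)
    return start, end


if __name__ == "__main__":
    # Made-up domain at residues 100-200 of a 500-residue protein.
    print(construct_boundaries(100, 200, 500))
```

The true domain boundaries still have to come from structural or sequence analysis; this helper only applies the flanking margin.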

Fusion partners and protein yield

A recombinant protein may be fused with an amino acid tag or with a fusion protein. A tag is usually less than 50 amino acids in length. A fusion tag may facilitate protein detection and purification. In addition, the amino acid sequences of most fusion tags are optimized for protein yield. A fusion protein is usually a highly expressed, soluble protein. Many fusion proteins also facilitate protein detection and purification. Expressing a recombinant protein with a fusion tag may increase its yield. Expressing it with a fusion protein may increase its yield, solubility, and stability.

Protein toxicity and protein yield

Protein toxicity is a commonly observed phenomenon. All active proteins perform certain functions. With few exceptions, these functions are not needed by the host cells and therefore interfere with cellular proliferation and differentiation. The resulting phenotype of the host cells is what we call the "toxicity" of these proteins. We estimate that about 80% of all soluble proteins have some degree of toxicity to their hosts and that about 10% of all proteins are highly toxic to host cells. Toxic proteins tend to slow cell growth, reduce cell density, and in some cases kill the host cells. Protein yields of toxic proteins are lower than those of non-toxic proteins.

Protein solubility and protein yield

Protein solubility is mainly determined by protein folding. At the time a protein is synthesized, appropriate amounts of its prosthetic group, co-factor, ligand, other protein subunits, natural partners, and molecular chaperones, as well as its natural environment such as a cell membrane (or substitutes for these), must be available for the protein to fold correctly. One or more of these required materials may be depleted at a high production level, resulting in misfolded, insoluble protein. In addition, the cellular protein synthesis machinery seems to have difficulty handling protein folding at a high rate of protein synthesis. This is especially true for bacterial and insect expression systems. The bacterial or insect cells may simply pack the highly synthesized protein into inclusion bodies. If the protein expression level is already at tens of milligrams per liter in commonly used media such as LB or TB, or at hundreds of milligrams per liter in high density growth media or in a fermentor, increasing the yield further may make some proteins insoluble, although other proteins remain soluble and functional at yields of grams per liter. This creates a dilemma for proteins that become insoluble at high expression levels. For applications in which solubility is not important, the highest expression level may be attempted with the available technologies. For proteins whose solubility is critical, the highest yield may only be achieved by using a high density growth medium or a fermentor. Both high density growth media and fermentation increase protein production by increasing cell density. They do not affect the rate of protein synthesis per cell and therefore generally do not affect protein solubility.


Technologies to improve protein yield

• Technologies that almost always improve protein yield
• Technologies may be optimized in a standard laboratory
• Technologies to increase protein yield caused by protein toxicity
• Technologies to increase yield caused by post-induction toxicity
• Technologies to improve protein yield caused by rare codons
• Examples to increase protein yield

Developments in DNA synthesis, cloning, and protein expression enable today's scientists to optimize protein yield. For an important protein with large resources, almost all of the technologies mentioned above may be tested. For example, a protein may be expressed in all of the different expression systems, from bacteria and yeast to insect and mammalian cells. Many different cell strains or cell lines can be tested. All sequence elements of the expression vector, from promoter and terminator to regulatory genes, can be optimized. Tens or hundreds of different complete coding sequences of a protein may be synthesized and tested. Some of these optimizations may be contracted out to CRO companies like ours.

Technologies that almost always improve protein yield

• Culture volume
• Fed batch fermentation
• High density growth media
• Combinations of above

Technologies may be optimized in a standard laboratory

• Induction time and temperature
• Inducer concentration
• Different cell strains
• Antibiotic selection
• Amino acid sequences and codons of N-terminus of the targeting protein
• 5' end GC contents
• Intact individual domains
• Fusion partners
• Sub-cellular compartments or locations
• Different expression systems (bacteria, yeast, insect, or mammal)

Some of the above factors, such as induction time, temperature, and inducer concentration, can be easily optimized. Others may require more molecular biology manipulation. These are all standard techniques and can be performed in most molecular biology labs.

Technologies to increase protein yield caused by protein toxicity

One of the major factors affecting protein yield is protein toxicity. We estimate that less than 20% of low yield cases are caused by codon usage and over 80% are caused by protein toxicity. We define proteins that interfere with cell proliferation and differentiation as toxic proteins. These proteins normally slow cell growth, and sometimes they cause cell death.

Before induction, protein toxicity is the result of leaky expression of the protein. Transcription read-through from upstream real or cryptic promoters and insufficient transcription repression both result in leaky, or pre-induction, expression.

• Use expression vectors containing a strong transcription terminator upstream of the promoter to stop possible transcription read-through. In addition, the vectors should contain multiple transcription repressor binding sites to minimize transcription before induction.
• Use cell strains that over-express the transcription repressor lacI. The higher the level of lacI in the cell, the less leaky expression will be observed. However, a high level of lacI may also result in a lower expression level.
• Use media containing a transcription inhibitor such as glucose. In addition to transcription inhibitors such as glucose, high density growth media contain trace metals, minerals, and vitamins. They also contain organic buffers that keep the pH balanced so that cells can reach high density (OD600 > 10). In many cases, high density growth media themselves increase protein yield 10-fold or more.
• Combine any two or all three of the above strategies.
• Express the protein as separate domains. Protein toxicity can be significantly reduced, or sometimes eliminated, when the protein is expressed as separate domains.
• Express the protein as a fusion protein. Large fusion partners such as GST or thioredoxin can sometimes reduce protein toxicity. Small tags of less than 30 amino acids do not affect protein toxicity significantly.

Some of the above strategies, such as changing the medium or using lacI-expressing strains, can be easily achieved. Others, such as using a different vector or expressing the protein as separate domains or with a fusion partner, require more molecular biology manipulation. These are all standard techniques and can be performed in all molecular biology labs.

Technologies to increase yield caused by post-induction toxicity

Many strategies for improving protein yield focus on reducing pre-induction toxicity. The following strategies may be used to decrease post-induction toxicity.

• Use high density growth media and induce at a lower temperature, with less inducer and a shorter induction time.
• Express the protein in the periplasm or in an insoluble state.
• Express the protein as separate domains.
• Fuse the protein with a large fusion partner such as GST, thioredoxin, MBP, or NusA. Small tags will not significantly reduce protein toxicity.

Pre-induction toxicity can be completely eliminated by a combination of expression vectors, cell strains, and growth media. Post-induction toxicity cannot be completely overcome for certain proteins. With the strategies to reduce post-induction toxicity, however, sufficient amounts of protein may still be expressed and purified.

Technologies to improve protein yield caused by rare codons

A recombinant protein may contain codons that are rarely used in the expression system or host cells. An insufficient amount of a particular tRNA in the expression system may result in so-called codon starvation. The cellular translation machinery may pause or halt at repetitive rare codons because few tRNAs are available. Multiple repetitive rare codons, especially within the first 50 amino acids from the amino terminus, may significantly reduce protein expression and sometimes shut it down completely.

• Study the codon usage of the protein, especially the first 50 amino acids, with DNA sequence analysis software and determine which rare codons may affect protein yield.
• Optimize the codon usage of the cDNA encoding the protein by synthesizing the entire cDNA. The advantage of this strategy is that it eliminates the codon usage problem completely. However, it is costly to synthesize a relatively large gene, and it also takes time to clone the synthesized cDNA into an expression vector.
• Express the protein in cells supplemented with tRNAs that recognize rare codons. This may be the most commonly used strategy, since it costs less and the manipulation is relatively easy. One important consideration is to choose the cell strain carefully. Any large cDNA will contain a few rare codons, but not all of these rare codons affect protein expression. Only multiple repetitive rare codons near the amino terminus of the protein are important, and cell strains supplemented with tRNAs for these codons should be sufficient. Some tRNA genes are toxic to the cells when over-expressed and can themselves result in low protein yield. Therefore unnecessary tRNA genes should be avoided whenever possible.
• Use a high density growth medium instead of a regular growth medium. High density growth media allow E. coli cells to reach a cell density 5 to 10 times higher than that in a regular medium such as LB, so 5 to 10 times more protein may be obtained. However, if there is no expression at all, a high density growth medium cannot help.
• Combine codon optimized cDNA with high density growth media.
• Combine tRNA supplemented cells with high density growth media.

Examples to increase protein yield 

• Increase protein yield caused by protein toxicity.
• Increase protein yield, solubility, and activity in high density growth media.
• Increase protein yield and stability in high density growth media.

We appreciate your feedback and comments at info@exptec.com.


Copyright 2003 Expression Technologies Inc.