Genome sequencing is the process of determining the order of the nucleotide bases (adenine, guanine, cytosine and thymine) in a DNA molecule or genome of an organism. Sequencing methods have been game changers in modern biology and medicine. DNA sequencing not only accelerates biological research and discovery, but also improves medical diagnosis and treatment of disease. Information about the precise sequence of nucleotides in DNA is useful in various applied areas of biology, such as molecular and forensic biology, virology, medicine, recombinant DNA technology, biosystematics and bioinformatics.
The first successful attempts to sequence DNA date back to the early 1970s. In 1973, Gilbert and Maxam reported a sequence of 24 base pairs using a method called wandering-spot analysis. The first complete genome to be sequenced was that of bacteriophage ΦX174, by Sanger et al. in 1977. The first known DNA sequence was obtained by a two-dimensional chromatography-based method. Over the next few decades, DNA sequencing became easier, faster, and cheaper with the development of dye-based sequencing methods and automated sequencing and analysis instruments.
DNA sequencing methods
The main methods of DNA sequencing are described below.
First-generation sequencing
First-generation sequencing methods are the earliest sequencing technologies and are also called basic sequencing methods. Two first-generation technologies are particularly important.
1. Maxam-Gilbert sequencing
The method, developed by Allan Maxam and Walter Gilbert in 1977, is based on chemical modification of DNA followed by cleavage at specific bases. For this reason it is also called the chemical cleavage method. One end of the DNA fragment must first be radioactively labeled. In each of four reactions, chemical treatment then creates breaks at a small proportion of the positions occupied by one or two of the four nucleotide bases. This produces a series of fragments, each radioactively labeled at one end. The fragments are then separated by size using gel electrophoresis, with the four reactions run side by side. The fragments are visualized by autoradiography, and the sequence is deduced from the band pattern. Because more advanced methods have since been developed, this method is no longer widely used.
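The band-reading logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four reactions are taken to be the standard G, A+G, C and C+T set (the text above does not name them), each band is labeled by the position of the cleaved base rather than by the true fragment length, and the function names are hypothetical.

```python
# Toy model of reading a Maxam-Gilbert gel.
# Assumption: the four reactions cleave at G, A+G, C and C+T.
CLEAVES = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}


def maxam_gilbert_lanes(seq):
    """Return, for each reaction lane, the positions at which cleavage occurs."""
    lanes = {name: set() for name in CLEAVES}
    for pos, base in enumerate(seq, start=1):
        for name, bases in CLEAVES.items():
            if base in bases:
                lanes[name].add(pos)
    return lanes


def read_gel(lanes, length):
    """Deduce the sequence from which lanes show a band at each position."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")      # band in both G and A+G lanes
        elif pos in lanes["A+G"]:
            calls.append("A")      # band in A+G only
        elif pos in lanes["C"]:
            calls.append("C")      # band in both C and C+T lanes
        elif pos in lanes["C+T"]:
            calls.append("T")      # band in C+T only
        else:
            calls.append("N")      # no band: base unknown
    return "".join(calls)


lanes = maxam_gilbert_lanes("GATTACA")
print(read_gel(lanes, 7))
```

A base that appears in only the two-base lane (A+G or C+T) is called as the base not covered by the single-base lane, which is how the four lanes resolve four bases.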
2. Sanger sequencing
The method requires an ssDNA template, DNA primers, DNA polymerase, dNTPs and ddNTPs. The ddNTPs can be radioactively or fluorescently labeled for detection in automated sequencing methods. The DNA sample is divided into four independent sequencing reactions, each containing all four standard deoxynucleotides (dATP, dGTP, dCTP, dTTP) and DNA polymerase. Each reaction mixture contains all the required chemicals but only one ddNTP. Once primers, polymerase and dNTPs are available, the polymerase begins to extend the DNA. However, once a ddNTP is incorporated, DNA polymerase can no longer extend the chain and the chain is terminated. This is because ddNTPs lack the 3' OH group to which the next nucleotide would be joined by a phosphodiester bond. For this reason the process is also known as the chain termination method. Its main advantage is that it is a simpler DNA sequencing method than Maxam-Gilbert and avoids the use of toxic chemicals.
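A minimal sketch of how the four chain-termination reactions yield a readable sequence is given below. It assumes an idealized experiment in which every possible terminated chain is produced and perfectly resolved by size; the function names are hypothetical.

```python
def sanger_fragments(new_strand):
    """For each ddNTP reaction, list the lengths of the terminated chains.

    A chain of length n ends in a given ddNTP exactly when base n of the
    newly synthesized strand is that base.
    """
    reactions = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        reactions[base].append(length)
    return reactions


def read_ladder(reactions):
    """Read the sequence by sorting all terminated chains from shortest to longest."""
    bands = sorted(
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


reactions = sanger_fragments("ATGCCGTA")
print(reactions["C"])          # chain lengths that end in ddCTP
print(read_ladder(reactions))
```

Sorting by length mirrors reading a gel from the bottom up: the lane in which each successive band appears gives the next base.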
Next-generation sequencing
Over the past few years, first-generation sequencing methods have been complemented by next-generation sequencing technologies, which are suited to large-scale, automated genome analysis and produce large amounts of data at low cost. For this reason they are also called high-throughput sequencing.
All of these techniques involve steps that fall broadly into template preparation, sequencing and imaging, and data analysis. The specific protocols used at each stage distinguish the techniques and the data they produce.
1. Illumina/Solexa sequencing
Illumina sequencing technology uses the principle of sequencing by synthesis. Sequencing templates are immobilized on a proprietary flow cell surface. Unlabeled nucleotides and enzymes are added to initiate solid-phase bridge amplification. The enzyme incorporates nucleotides to build double-stranded bridges on the solid substrate. Denaturation leaves single-stranded templates anchored to the substrate. In this way, millions of dense DNA clusters are generated in each lane of the flow cell. The first sequencing cycle begins with the addition of four labeled reversible terminators, primers, and DNA polymerase. After laser excitation, the fluorescence emitted by each cluster is captured and the first base is identified. The next cycle repeats the incorporation of the four labeled reversible terminators with DNA polymerase. After laser excitation, the image is captured as before and the identity of the second base is recorded. The sequencing cycle is repeated to determine the sequence of bases in the fragment, one base at a time. The data are then aligned and compared with a reference, and sequence differences are identified.
In each sequencing cycle, a single labeled dNTP is added to the nucleic acid strand. The nucleotide label serves as a terminator for the polymerization reaction, so after each dNTP incorporation the fluorescent dye is imaged to identify the base and then cleaved off so that the next nucleotide can be incorporated.
Its sequencing output is up to 1000 Mb per run.
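The cycle-by-cycle logic can be illustrated with a toy simulation. It is a sketch only: each cluster is reduced to a single template string, strand orientation is ignored, and the dye colours in `DYE` are hypothetical placeholders rather than the chemistry of any real instrument.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye assignment, one colour per reversible terminator.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOUR = {colour: base for base, colour in DYE.items()}


def sequence_by_synthesis(templates, cycles):
    """Read one base per cluster per cycle, for all clusters in parallel."""
    reads = ["" for _ in templates]
    for cycle in range(cycles):
        for i, template in enumerate(templates):
            if cycle >= len(template):
                continue  # this template has been fully read
            # Incorporation: exactly one terminator is added, then synthesis stops.
            incorporated = COMPLEMENT[template[cycle]]
            # Imaging: the cluster emits the colour of the incorporated base.
            colour = DYE[incorporated]
            # Base calling: the colour is translated back into a base.
            reads[i] += BASE_FOR_COLOUR[colour]
            # Cleavage of dye and terminator would happen here, enabling the next cycle.
    return reads


print(sequence_by_synthesis(["ACGT", "TTGA"], cycles=3))
```

The outer loop over cycles and the inner loop over clusters reflect why throughput scales with the number of clusters while read length scales with the number of cycles.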
2. 454 pyrosequencing
The basic principle of pyrosequencing is single-nucleotide addition, a form of sequencing by synthesis. The process uses a bioluminescent method to measure the release of inorganic pyrophosphate by converting it proportionally to visible light through a series of enzymatic reactions. It controls DNA polymerase by supplying a limited amount of one dNTP at a time. After incorporation of the complementary dNTP, DNA polymerase extends the primer and pauses. DNA synthesis restarts when the next complementary dNTP is added in the dispensing cycle.
In 454 pyrosequencing, template beads are surrounded by sulfurylase and luciferase in picotiter wells. Each picotiter well contains a single clonally amplified template bead. Individual dNTPs are then flowed through the wells, dispensed in a predetermined order. Bioluminescence is imaged with a charge-coupled device (CCD) camera. The order and intensity of the light peaks are recorded as a flowgram, which reveals the underlying DNA sequence. Its sequencing output is 400 Mb per run.
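How a series of light peaks maps back to a sequence can be sketched as follows. The dispensing order `TACG`, the noise-free signal (one unit of light per incorporated base) and the function names are assumptions made for illustration.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=100):
    """Simulate light signals as nucleotides are dispensed one at a time.

    Each flow yields a signal equal to the number of bases incorporated,
    so a run of identical bases gives a proportionally stronger peak.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # The polymerase extends while the dispensed nucleotide is the next base.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            run += 1
            pos += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def call_bases(signals):
    """Turn (nucleotide, intensity) pairs back into a base sequence."""
    return "".join(nucleotide * intensity for nucleotide, intensity in signals)


signals = flowgram("TTAGC")
print(signals)
print(call_bases(signals))
```

Flows with zero signal are as informative as peaks: they show that the dispensed nucleotide was not the next base.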
3. SOLiD sequencing
SOLiD stands for Sequencing by Oligonucleotide Ligation and Detection. It is based on the principle of sequencing by ligation. As with the 454 technology, DNA template fragments are clonally amplified on beads. However, the beads are deposited on the solid surface of the flow cell, which allows higher densities than other methods. In sequencing by ligation, a mixture of different fluorescently labeled dinucleotide probes is pumped into the flow cell. When the correct dinucleotide probe hybridizes to the template DNA, it is ligated to the pre-annealed primer on the solid phase. After unincorporated probe is washed away, fluorescence is captured and recorded. Each fluorescence wavelength corresponds to a specific set of dinucleotide combinations. The fluorescent dye is then removed and washed away to begin the next sequencing cycle. Its sequencing output is 2000 Mb per run.
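The dinucleotide readout can be illustrated with the commonly described two-base colour encoding, in which four dyes each cover four of the sixteen possible dinucleotides. The sketch below assumes that scheme, numbers the colours 0 to 3 instead of naming dyes, and assumes the first base is known (for example from the adapter); none of these details come from the text above.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_colours(seq):
    """Colour of each adjacent base pair: XOR of the two 2-bit base codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colours(first_base, colours):
    """Recover the bases from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        # XOR is its own inverse, so previous base + colour gives the next base.
        bases.append(BASE[CODE[bases[-1]] ^ colour])
    return "".join(bases)


colours = encode_colours("ATGGC")
print(colours)
print(decode_colours("A", colours))
```

Because every base contributes to two consecutive colours, a single miscalled colour changes all downstream bases when decoded naively, which is why colour-space reads are usually compared with a reference before conversion.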
Applications of genome sequencing
Below are some applications of genome sequencing.
1. Diagnosis and medication
DNA sequencing has important applications in screening for risk of genetic diseases, treatment based on gene therapy, genetic engineering and genetic manipulation.
2. Evolutionary biology
The ability to sequence the entire genomes of many related organisms enables large-scale comparative genomics, phylogenetic and evolutionary studies.
3. Forensic science
DNA sequencing has a wide range of applications in DNA analysis, forensic sampling and identification, and paternity testing.
4. Metagenomics
Applications include shotgun sequencing of complex microbial communities, metagenomic sequencing of environmental or human microbiomes, and environmental profiling.
5. Agriculture
Microbes are sequenced to engineer resistance genes in crops. Food plants are mapped and their whole genomes sequenced to improve productivity, nutrient composition and environmental tolerance.
6. Molecular biology
Research into genotypes, genes and proteins; gene-based cancer research; construction of endonuclease maps; detection of mutations; construction of molecular evolution maps and transcriptome maps.
In addition to the sequencing methods mentioned above, many other methods are available that differ in protocol but share the same basic principles. With potential applications in many fields of the life sciences, genome sequencing has had a significant impact on modern biotechnology. High-throughput sequencing technology in particular has become very valuable in molecular biology research. With the advent of next-generation methods, sequencing has become simpler, faster and cheaper, and its data easier to handle.
What are the methods of genome sequencing?
Major genome sequencing methods are the clone-by-clone method and whole genome shotgun sequencing. The clone-by-clone method works well for larger genomes, such as eukaryotic genomes, but it requires a high-density genome map. Whole genome shotgun (WGS) sequencing does not require a genome map.
What are the applications of genome sequencing?
Genome sequencing is used in analysing the factors involved in the conservation of species. For example, the genetic diversity of a population can be used to predict the health and conservation prospects of a species.
What is the best genome sequencing method?
NGS is a good choice for whole genome sequencing, whole exome sequencing, analyzing large panels of genes, detecting rare variants, and discovery and diagnostics.
What are the 2 main methods of whole genome sequencing?
There are two main methods: hierarchical shotgun sequencing (the clone-by-clone method) and whole genome shotgun sequencing. The hierarchical method was adopted by the HGP consortium. It works from high-density maps, which makes genome assembly easier.
What sequencing methods are mostly used now?
Due to its sensitivity and relative simplicity in terms of both workflow and technique, Sanger sequencing remains the gold standard in sequencing technology today and is used in a variety of applications, from targeted sequencing to confirming variants identified using orthogonal methods.
How many sequencing methods are there?
There are two main types of DNA sequencing. The older, classical chain termination method is also called the Sanger method. Newer methods that can process a large number of DNA molecules quickly are collectively called High-Throughput Sequencing (HTS) techniques or Next-Generation Sequencing (NGS) methods.
What is the most commonly known application of genomics?
Genomics is now being used in a wide variety of fields, such as metagenomics, pharmacogenomics, and mitochondrial genomics. The most commonly known application of genomics is to understand and find cures for diseases.
What are 3 applications of the human genome project?
Some of the fields in which the Human Genome Project is applied are: (a) Molecular Medicine (b) Waste Control and Environmental Cleanup (c) Biotechnology (d) Energy Sources (e) Risk Assessment (f) DNA Forensics (Identification).
What is the benefit of genome sequencing?
Whole-genome sequencing, pioneered by the Human Genome Project, enables us to read a person's individual genome and, among other things, identify differences from the average human genome.
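Identifying differences from a reference can be sketched, in its simplest form, as a position-by-position comparison. The example assumes the sample has already been assembled and aligned to the reference without insertions or deletions; real variant calling works from many overlapping reads. The function name is hypothetical.

```python
def find_substitutions(reference, sample):
    """List (position, reference base, sample base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned
    and of equal length (an assumption of this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


print(find_substitutions("ACGTACGT", "ACGAACGC"))
```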
A genome's sequence cannot be read out end-to-end. Rather, researchers must first determine the sequence of random pieces of DNA and then use those smaller sequences to put the whole genome sequence back together like a massive puzzle.
What is genome sequencing in simple words?
A laboratory method that is used to determine the entire genetic makeup of a specific organism or cell type. This method can be used to find changes in areas of the genome. These changes may help scientists understand how specific diseases, such as cancer, form.
What are the three types of DNA sequencing?
- Sanger sequencing. Researchers choose Sanger sequencing when performing low-throughput, targeted, or short-read sequencing. ...
- Capillary electrophoresis and fragment analysis. Capillary electrophoresis (CE) instruments are capable of performing both Sanger sequencing and fragment analysis. ...
- Next-generation sequencing (NGS)
First-generation methods enabled sequencing of clonal DNA populations. The second generation massively increased throughput by parallelizing many reactions. Third-generation methods allow direct sequencing of single DNA molecules.
What is the most advanced DNA sequencing method?
Next-generation sequencing (NGS) is a massively parallel sequencing technology that offers ultra-high throughput, scalability, and speed. The technology is used to determine the order of nucleotides in entire genomes or targeted regions of DNA or RNA.
What are the 4 principles of sequencing?
There are four common sequencing approaches in curriculum design: simple-to-complex, prerequisite learning, whole-to-part learning, and chronological learning. Simple-to-complex learning is self-explanatory.
Which is most widely used and accepted sequencing technology for genomics?
DNA sequencing may be used to determine the sequence of individual genes, larger genetic regions (i.e. clusters of genes or operons), full chromosomes, or entire genomes of any organism. DNA sequencing is also the most efficient way to indirectly sequence RNA or proteins (via their open reading frames).
What are three practical applications of genetics and genomics?
People's genetic information is increasingly being used for a wide range of non-clinical purposes, such as solving crimes, determining paternity, and exploring one's ancestry.
What are two applications for genome mapping?
Human genome maps help researchers in their efforts to identify human disease-causing genes related to illnesses like cancer, heart disease, and cystic fibrosis. Genome mapping can be used in a variety of other applications, such as using live microbes to clean up pollutants or even prevent pollution.
How did scientists use genome sequencing techniques?
Sequencing a genome helps scientists better understand genomes and their role in creating organisms. When scientists sequence a genome, they take an organism's DNA and determine the order of its base pairs, which are coded by the letters A, C, T and G.
The Human Genome Project and beyond
Many genes related to hereditary diseases have been mapped, paving the way for the development of new diagnostic methods and treatments, as well as new research to establish the genetic mechanisms involved in certain diseases.
Read accuracy reflects the inherent error rate of individual measurements (reads) from a DNA sequencing technology. Typical read accuracy ranges from ~90% for traditional long reads to >99% for short reads and HiFi reads.
What are the downsides of genome sequencing?
The biggest disadvantage of whole genome sequencing (WGS) is that the process generates data on a large scale. The vast volumes of data generated require additional storage capacity and more time to analyze. This increases the cost as well as the time required for analysis.
What does genome sequencing allow us to study?
Genome sequencing allows scientists to see a patient's complete DNA makeup, which contains information about everything from eye color to inherited diseases.
Why is genomics not enough?
Because the number of genes in any particular genome can run to the tens of thousands, it is impractical for human scientists to comb through the data and assign a function to every single one of them.
What is the most challenging issue facing genome sequence?
The inability to develop fast and accurate sequencing techniques.
What can go wrong with sequencing?
- The template concentration is too low. This is the number one reason as to why a sequence reaction fails. ...
- Poor quality DNA. ...
- Too much DNA. ...
- Bad primer or incorrect primer added to template. ...
- Blocked capillary on the sequencer.
Sequencing is a technique that is used to 'read' DNA. It finds the order of the letters of DNA (A, T, C and G), one by one. Sequencing a human genome means finding the sequence of someone's unique 3 billion letters of DNA. There are different methods and machines that can sequence genomes.
Is DNA and genome sequencing the same thing?
Genome sequencing is figuring out the order of DNA nucleotides, or bases, in a genome: the order of As, Cs, Gs, and Ts that make up an organism's DNA. The human genome is made up of over 3 billion of these genetic letters.
What is the difference between genome and DNA?
A genome is all of the genetic material in an organism. It is made of DNA (or RNA in some viruses) and includes genes and other elements that control the activity of those genes.
Whole genome sequencing has become a powerful tool in modern microbiology. Bacterial genomes in particular are sequenced in high numbers. Whole genome sequencing is used not only in research projects, but also in surveillance projects and outbreak investigations.
What are some fun facts about DNA sequencing?
Although DNA sequencing used to take years, it can now be done in hours. The first full sequence of human DNA cost around 3 billion dollars. Now, certain companies will sequence your entire genome for less than $1,000.
What is the difference between DNA sequencing and RNA sequencing?
RNA-seq is similar to DNA sequencing but with an added step. Instead of isolating DNA, RNA is extracted from a sample and then reverse transcribed to produce cDNA. From there, the cDNA is fragmented and run through a high-throughput next generation sequencing system.
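The extra step can be sketched in a few lines. The example below models first-strand cDNA synthesis as taking the reverse complement of the RNA (written 5' to 3') and then cuts the cDNA into fixed-size pieces; real protocols fragment randomly and add adapters, and the function names and fragment size are hypothetical.

```python
# RNA base -> complementary DNA base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def fragment(seq, size):
    """Cut a sequence into consecutive pieces of at most `size` bases."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


cdna = reverse_transcribe("AUGGCC")
print(cdna)
print(fragment(cdna, 4))
```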
Researchers have created a new DNA sequencing technique called Chem-map, which enables in situ mapping of small molecule-genome interactions with unparalleled precision.
What is the difference between whole genome sequencing and NGS?
The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This process translates into sequencing hundreds to thousands of genes at one time.
What is the principle for DNA sequencing?
This method is based on the principle that single-stranded DNA molecules that differ in length by just a single nucleotide can be separated from one another using polyacrylamide gel electrophoresis. Each of the four reactions includes one dideoxynucleotide: ddG, ddA, ddC, or ddT.
Why is Illumina better than Sanger sequencing?
The critical difference between Sanger sequencing and NGS is sequencing volume. While the Sanger method only sequences a single DNA fragment at a time, NGS is massively parallel, sequencing millions of fragments simultaneously per run. This process translates into sequencing hundreds to thousands of genes at one time.
Which is better Illumina sequencing or Oxford Nanopore?
Illumina sequencers tend to be high accuracy, with a read accuracy of >99.9%, while Oxford Nanopore provides sequencers with a read accuracy of between 87% and 98%. Of course, cost usually plays an important role in any purchasing decision.
Is NGS more accurate than Sanger?
NGS is significantly cheaper, quicker, needs significantly less DNA and is more accurate and reliable than Sanger sequencing. Let us look at this more closely. For Sanger sequencing, a large amount of template DNA is needed for each read.
Overall, the USA, Iceland, the Netherlands, the UK, and Australia showed great performance in the three indicators for sequencing efforts. The number of SARS-CoV-2 genomic sequences deposited in the GISAID database has been increasing substantially day by day.
What are the drawbacks of Illumina sequencing?
Disadvantages of Illumina sequencing
One of the main drawbacks of the Illumina/Solexa platform is the need for tight control of sample loading, because overloading can result in overlapping clusters and poor sequencing quality. As a result, the overall error rate of this sequencing technology is about 1% [22,23].
Limitations of Sanger Sequencing
Sanger methods can only sequence short pieces of DNA, about 300 to 1000 base pairs. The quality of a Sanger sequence is often not very good in the first 15 to 40 bases because that is where the primer binds. Sequence quality degrades after 700 to 900 bases.
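These position-based limits suggest a simple trimming rule, sketched below. The defaults (skip the first 40 bases, keep nothing beyond base 700) are conservative picks from the ranges quoted above, and the function name is hypothetical; in practice trimming is usually driven by per-base quality scores rather than fixed positions.

```python
def trim_sanger_read(read, skip_start=40, max_length=700):
    """Keep only the part of a read between the noisy start and the degraded tail.

    skip_start: number of leading bases to discard (primer-binding region).
    max_length: position after which quality is assumed to degrade.
    """
    return read[skip_start:max_length]


raw_read = "N" * 40 + "ACGT" * 200   # 840 bases: 40 unreliable, then 800 of signal
trimmed = trim_sanger_read(raw_read)
print(len(raw_read), len(trimmed))
```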
Why is PacBio better than Illumina?
Long-Read vs Short-Read Technology
Though the sequencing principles vary slightly between these two sequencing approaches, the main difference between Illumina and PacBio is that Illumina specializes in short-read raw sequence data while PacBio focuses on long-read raw sequence data.
Between 80% and 90% of the sequencing market is held by one company, Illumina. The short-read sequencing specialist has had an annual turnover of $2.5-4.5 billion and produces a wide array of machines that are used by thousands of labs around the world.
Who are the competitors of Illumina sequencing?
Illumina competitors include Agilent Technologies, QIAGEN, Sanofi Pasteur, Danaher and ARCA biopharma. Illumina ranks 1st in Overall Culture Score on Comparably vs its competitors.
What are the limitations of next-generation sequencing?
For many of the identified abnormalities, the clinical significance is currently unknown. Next-generation sequencing also requires sophisticated bioinformatics systems, fast data processing and large data storage capabilities, which can be costly.
What is the difference between whole exome sequencing and whole genome sequencing?
The complete genomic information within a sample or individual is known as the whole genome. Exons are the genome's protein-coding regions and are collectively known as the exome.
How accurate is whole genome sequencing?
Typical read accuracy ranges from ~90% for traditional long reads to >99% for short reads and HiFi reads. Consensus accuracy, on the other hand, is determined by combining information from multiple reads in a data set, which eliminates any random errors in individual reads.
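The difference between read accuracy and consensus accuracy can be illustrated with a per-position majority vote. This is a minimal sketch that assumes the reads are already aligned and of equal length; the function names are hypothetical and real consensus callers also weigh base qualities.

```python
from collections import Counter


def read_accuracy(read, truth):
    """Fraction of positions at which a single read matches the true sequence."""
    matches = sum(1 for r, t in zip(read, truth) if r == t)
    return matches / len(truth)


def consensus(reads):
    """Majority vote at each position across aligned, equal-length reads."""
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )


truth = "ACGTAC"
reads = ["ACGTAC", "ACGAAC", "TCGTAC"]   # two reads carry one random error each
print([round(read_accuracy(r, truth), 2) for r in reads])
print(consensus(reads))
```

Because the two errors fall at different positions, each is outvoted by the other reads, so the consensus matches the true sequence even though individual reads do not.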