The race to improve DNA sequencing


Published on February 28, 2023

Cost, availability and sample preservation are the main concerns but progress is rapid.

Key advantages of long-read sequencing make it the technology of choice for many new scientific projects. Image: ‘DNA sequence’ by ssalonso, available at https://bit.ly/3YaOmDL, CC BY-NC-SA 2.0

Authors Giulio Formenti
Rockefeller University, New York

Editors S. Vicknesan
Senior Commissioning Editor, 360info Southeast Asia

DOI 10.54377/faba-ccff

As we celebrate the 70th anniversary of the discovery of the DNA structure, it is astounding to look back at the progress made in our ability to read the ‘book of life’, the DNA sequence that provides the instructions to make all living beings.

The most recent and promising advance is long-read DNA sequencing. Long-read technologies started to be developed around 2010, as it was progressively acknowledged that short read length limits our ability to reconstruct and study genomes.

Using long reads, we can now reconstruct entire human genomes almost without errors and without prior information, as demonstrated by the completion of the first ‘telomere-to-telomere’ human genome last year.

We can confidently make inferences in complex regions of the genome that were previously inaccessible to investigation and were therefore often described as the genome's ‘dark matter’.

These regions may harbour genes associated with human diseases that were thought to have a genetic basis, but for which that basis could not be found.

Owing to the specific characteristics of how long-read sequencing technologies read DNA, they also provide additional information that was not available with the previous short-read methods.

For instance, they allow us to directly describe the ‘methylome’: how the DNA is modified in ways that are cell-specific. Methylation patterns are often influenced by our lifestyle, and can also be associated with specific diseases.

How long-read sequencing works

Even after the DNA structure was elucidated, it took several years before methods to actually read nucleic acid sequences became available.

The first such attempt was published in 1965: sequencing 76 nucleotides (DNA building blocks) required five people working for three years with one gram of pure material isolated from 140 kg of yeast.

More efficient approaches were clearly needed, and much of the progress in the field is owed to Frederick Sanger, a pioneer in reading the sequences of complex biological molecules.

Sanger invented the first method to read protein sequences in 1953, the same year the DNA structure was discovered. This was a landmark for DNA sequencing as well, as it established the general principle of ‘shotgun sequencing’, which is still what we use today to reconstruct the genome of any living being.

Since no technology can read an entire chromosome end-to-end in a single pass, in shotgun sequencing multiple copies of the same chromosome are first fragmented into smaller pieces at random positions, then — like in a puzzle — overlaps between fragments are used to reconstruct the original sequence.
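The puzzle analogy can be made concrete with a toy example. The following is a minimal sketch, not how production assemblers work: it assumes error-free reads, a repeat-free toy sequence, and a simple greedy strategy that keeps merging the pair of reads with the longest overlap. The function names (`fragment`, `overlap`, `greedy_assemble`) and the 16-nucleotide sequence are invented for illustration.

```python
import random

def fragment(genome, read_len, n_reads, seed=0):
    """Cut n_reads pieces of length read_len at random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a, b, min_len):
    """Longest suffix of a that is also a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads that share the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining pieces stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads = [r for r in reads if r not in merged] + [merged]
    return reads

# Three hand-picked, overlapping fragments of the toy sequence ATGCGTACCTGAAGTC
reads = ["CTGAAGTC", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGTC']
```

The `fragment` helper mimics the random cutting step; whether a random set of reads can be stitched back into one piece depends on whether the reads happen to cover the whole sequence with enough overlap.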

This approach is much less laborious than earlier methods and therefore can scale to thousands, and nowadays to billions of sequences at once.

It does not come without disadvantages. In particular, when scaling to millions or billions of reads, the sequences that can be generated are often only about 150 nucleotides long.

This is a significant limitation because, for instance, the human genome is over three billion nucleotides and approximately 50 percent of its sequence is repeated more than once.
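To get a feel for the scale, here is a back-of-the-envelope calculation of how many reads are needed to cover a genome. It is only a sketch: the genome size is rounded to three billion nucleotides, reads are assumed to be spread evenly, and the 30-fold coverage figure is an illustrative assumption rather than a number from this article. The helper name `reads_needed` is made up for the example.

```python
def reads_needed(genome_size, read_length, coverage=1):
    """Reads required so that each position is covered `coverage` times on average."""
    total_bases = genome_size * coverage
    return (total_bases + read_length - 1) // read_length  # round up

HUMAN_GENOME = 3_000_000_000  # rounded; the real figure is slightly higher

print(f"{reads_needed(HUMAN_GENOME, 150):,}")         # 20,000,000
print(f"{reads_needed(HUMAN_GENOME, 150, 30):,}")     # 600,000,000
print(f"{reads_needed(HUMAN_GENOME, 20_000, 30):,}")  # 4,500,000
```

Even a single pass over the genome takes tens of millions of 150-nucleotide pieces, each of which has to be placed correctly.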

When a sequence is repeated, the overlaps between the fragments generated by sequencing are not unique, limiting our ability to use this information to reconstruct the original sequence of chromosomes.

This results in many gaps and errors in the sequences generated, ultimately confounding all downstream analyses. This is where long-read sequencing technologies come into play.

In long-read sequencing, large DNA fragments are read by sequencing machines in one pass. These fragments are usually in the order of 10,000 or 20,000 nucleotides, often longer than a hundred thousand nucleotides, and sometimes over a million nucleotides.

Increasing read length by several orders of magnitude essentially resolves the repeat problem associated with ‘short-read sequencing’, allowing entire chromosomes to be reconstructed with minimal computational effort.
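A small experiment illustrates why read length matters for repeats. The sketch below builds two different toy sequences that contain the same repeat three times but with the two unique segments between the copies swapped. The sequences, the repeat `GATTACA` and the helper `all_reads` are all hypothetical, and the reads are idealised: error-free and taken at every position. Reads that cannot span the repeat plus a neighbouring nucleotide on each side are identical for both sequences, so no method could tell the two apart; reads just one nucleotide longer can.

```python
from collections import Counter

def all_reads(genome, read_len):
    """Count every substring of length read_len (an idealised read set)."""
    return Counter(genome[i:i + read_len]
                   for i in range(len(genome) - read_len + 1))

REPEAT = "GATTACA"                # hypothetical repeat, 7 nucleotides
UNIQUE_1, UNIQUE_2 = "TG", "CA"   # unique segments between the repeat copies

# Same building blocks, but the two unique segments are in a different order.
genome_a = "CC" + REPEAT + UNIQUE_1 + REPEAT + UNIQUE_2 + REPEAT + "GG"
genome_b = "CC" + REPEAT + UNIQUE_2 + REPEAT + UNIQUE_1 + REPEAT + "GG"

short_len = len(REPEAT) + 1  # too short to see both sides of a repeat copy
long_len = len(REPEAT) + 2   # spans a repeat copy plus one nucleotide each side

print(genome_a == genome_b)                                            # False
print(all_reads(genome_a, short_len) == all_reads(genome_b, short_len))  # True
print(all_reads(genome_a, long_len) == all_reads(genome_b, long_len))    # False
```

Real repeats are thousands of nucleotides long rather than seven, which is why reads of 10,000 nucleotides or more make such a difference.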

Advantages, challenges of long-read sequencing

While these advantages should make long-read sequencing the technology of choice for most genome projects, its adoption is hampered by the higher cost and the relatively limited availability of long-read sequencing machines around the world.

However, sequencing machine companies are releasing new instruments with increased throughput and reduced cost every year, making this less of an issue as we transition to long-read sequencing.

One outstanding challenge is the sequencing errors that are still present in the reads, although the technologies are improving.

A challenge that will be harder to overcome is the quality of the starting material. To generate long reads, the DNA material needs to be relatively intact to begin with.

DNA, while being one of the longest-lived biological molecules, can still degrade if preservation conditions are not ideal. We need to rethink how we preserve and store biological samples.

DNA sequencing is probably the domain of the biological sciences that has evolved the most in the last century. This is clearly due to our interest, as human beings, in understanding the process of our making, our origins and our destiny.

It will only progress further, making this century of discoveries at least as exciting as the last.

Giulio Formenti is Research Assistant Professor at the Rockefeller University in New York. He declares no conflict of interest.

Originally published under Creative Commons by 360info™.
