RathBiotaClan

Gaps in Sequence Alignment and Their Role in cDNA Matching

by Shibasis Rath
July 4, 2026
in BIOINFORMATICS, STUDENT PORTAL

In the study of string alignment and dynamic programming, the basic elements used to evaluate an alignment are matches, mismatches, and spaces. However, these alone are not sufficient to produce alignments that are biologically meaningful. To overcome this limitation, the concept of a gap is introduced. Gaps allow an alignment to better reflect real biological mutational events and help produce alignments whose structure matches patterns actually observed in living organisms, particularly in DNA and protein sequences.

Definition of a Gap

A gap is defined as any maximal, consecutive run of spaces occurring in a single string of a given alignment. In simple words, when several spaces appear one after another (without interruption) in one string of the alignment, that entire stretch is treated as one single gap, rather than being counted as several individual spaces.

Certain rules apply while identifying a gap:

  • If a gap begins before the start of a string, it is bordered on its right side by the first character of that string.
  • If a gap begins after the end of a string, it is bordered on its left side by the last character of that string.
  • In all other cases, a gap must be bordered on both sides by actual characters of the string.
  • A gap can be as small as a single space, or it can extend to cover many consecutive spaces.

It is important to note that a gap in one string may be immediately followed by a gap in the other string, so that the last space of the first gap sits in the column just before the first space of the second. These remain two separate gaps (one in each string) and are not merged into a single gap, since a gap is defined within a single string.
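As a minimal sketch of this definition, the Python snippet below counts gaps as maximal runs of a space symbol in each row of an alignment. The underscore used as the space symbol, the function names, and the two example rows are illustrative assumptions rather than part of the original notes.

```python
from itertools import groupby

SPACE = "_"  # assumed symbol for a space in an aligned row


def gap_lengths(row: str, space: str = SPACE) -> list[int]:
    """Return the length of every maximal run of spaces in one aligned row."""
    return [
        len(list(run))
        for is_space, run in groupby(row, key=lambda ch: ch == space)
        if is_space
    ]


def count_gaps(row: str, space: str = SPACE) -> int:
    """A gap is one maximal run of spaces, however long it is."""
    return len(gap_lengths(row, space))


# Hypothetical alignment: both rows have the same length (13 columns).
row1 = "ctttaac__a_ac"
row2 = "c___cacccat_c"

print(gap_lengths(row1), gap_lengths(row2))  # [2, 1] [3, 1]
print(count_gaps(row1) + count_gaps(row2))   # 4 gaps holding 7 spaces
```

In this example the one-space gap in the first row (column 11) is immediately followed by a one-space gap in the second row (column 12), and the two are counted separately because each row is scanned on its own.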

Gaps in the Objective Function

In the simplest scoring scheme that accounts for gaps, each gap is given a constant weight, denoted Wg, regardless of how many spaces it contains. Within a gap, each individual space is treated as “free”: the score contribution of aligning a character with a space, or a space with a character, is taken as zero. If the alignment contains k gaps, its total value is the sum of the usual match and mismatch scores minus k × Wg, so Wg acts as a fixed penalty paid once per gap.

The value of Wg has a direct influence on the shape of the optimal alignment:

  • A large value of Wg discourages the formation of many gaps, so the alignment tends to have long, unbroken aligned regions.
  • A small value of Wg makes gaps cheap, so the optimal alignment may be broken into many shorter aligned fragments separated by gaps.

Thus, by tuning Wg, one can control how spaces are distributed across the alignment, which in turn determines whether the alignment appears as a few long blocks or many small scattered pieces.
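The effect of tuning Wg can be illustrated with a small Python sketch that scores two candidate alignments of the same pair of strings under a constant gap weight. The match score of +1, the mismatch score of -1, the underscore as the space symbol, and the toy strings are all assumptions chosen for illustration. Spaces contribute zero and each gap costs Wg once.

```python
import re


def constant_gap_score(row1, row2, wg, match=1.0, mismatch=-1.0, space="_"):
    """Score an alignment: column scores minus wg for every gap (spaces are free)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    columns = 0.0
    for a, b in zip(row1, row2):
        if space in (a, b):
            continue  # a character opposite a space scores zero
        columns += match if a == b else mismatch
    run = re.escape(space) + "+"  # one regex match per maximal run of spaces
    k = len(re.findall(run, row1)) + len(re.findall(run, row2))
    return columns - wg * k


# Two hypothetical alignments of "abcde" with "ace".
fragmented = ("abcde", "a_c_e")  # 3 matches, 2 gaps
compact = ("abcde", "ac__e")     # 2 matches, 1 mismatch, 1 gap

for wg in (1.0, 5.0):
    print(wg, constant_gap_score(*fragmented, wg), constant_gap_score(*compact, wg))
# 1.0 1.0 0.0
# 5.0 -7.0 -4.0
```

With the small weight the fragmented alignment scores higher, while with the large weight the alignment that keeps its spaces in a single gap wins, even though it accepts a mismatch.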

Why Gaps Are Biologically Important

The biological reasoning behind introducing gaps is closely connected to the justification for local alignment. A gap of one string opposite a substring of the other string corresponds to either a deletion of that substring from the first string, or an insertion of that substring into the second string. Such insertion or deletion of an entire substring, especially in DNA, typically happens as the result of a single mutational event, rather than through many small individual mutations.

Several biological mechanisms are responsible for creating gaps of varying sizes:

  1. Unequal crossing-over during meiosis – causes an insertion in one of the two DNA molecules involved and a reciprocal deletion in the other.
  2. DNA slippage during replication – occurs when the replication machinery loses its position on the template, slips backward, and repeats a section, producing a duplicated stretch.
  3. Insertion of transposable elements (“jumping genes”) – these mobile genetic elements insert themselves into DNA strings.
  4. Insertion of DNA by retroviruses – viral genetic material becomes inserted into the host DNA.
  5. Translocation of DNA between chromosomes – DNA segments move from one chromosome to another.

These mechanisms explain why long insertions and deletions, appearing as gaps, are common biological phenomena and must be modeled explicitly rather than treated as a series of unrelated single-space mutations.

Gaps and Evolutionary History

When determining the evolutionary relationship between species over long time periods, gaps often carry more informative value than simple substitutions. This is because point mutations (single character substitutions) occur very frequently and rapidly, whereas mutational events that create gaps occur much less frequently. As a result, corresponding genes in two different species may show large differences at the level of individual character substitutions, making it difficult to establish evolutionary relationships using substitution-based similarity alone. However, since large insertions and deletions are rarer events, shared gaps in aligned strings can serve as reliable markers for reconstructing evolutionary history and are sometimes used as “evolutionary characters” while building evolutionary trees.

Gaps and Protein Domains

At the protein level, many proteins are built from combinations of protein domains selected from a limited repertoire. Consequently, two protein sequences may be highly similar over certain stretches but differ where one protein contains a domain absent in the other. Such a region, where one protein has an extra domain, naturally appears as a gap when the two proteins are aligned. In fact, many biologists regard the correct identification of these long (“major”) gaps as the central challenge of protein alignment — once major gaps are correctly placed, the remaining alignment (reflecting ordinary point mutations) becomes relatively straightforward.

cDNA Matching: An Illustration of Gap Importance

Biological Background

In eukaryotic organisms, a gene consists of alternating exons (expressed, protein-coding sequences) and introns (intervening, non-coding sequences). While the number of exons and introns per gene is usually modest, introns can be far longer than exons.

The process of protein synthesis from a eukaryotic gene proceeds as follows:

  • An RNA transcript is first produced from the gene’s DNA, covering both introns and exons, with base complementation (A→U, T→A, C→G, G→C).
  • The intron-exon boundaries are identified, and introns are spliced out of the transcript.
  • The remaining exon sequences are joined together to form messenger RNA (mRNA), which leaves the nucleus and directs protein synthesis.
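The three steps above can be sketched in Python as follows. This is a simplified illustration: it applies the base complementation character by character and ignores strand orientation, and the template sequence and exon coordinates are made-up values, not real gene data.

```python
# Complementation used when the RNA transcript is produced from the DNA template.
DNA_TO_RNA = str.maketrans("ATCG", "UAGC")


def transcribe(template: str) -> str:
    """Produce the RNA transcript of a DNA template (introns and exons alike)."""
    return template.upper().translate(DNA_TO_RNA)


def splice(transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, given as half-open (start, end) index pairs."""
    return "".join(transcript[start:end] for start, end in exons)


# Hypothetical template with one intron occupying positions 5..8.
template = "TACGGATTTTCCA"
exons = [(0, 5), (9, 13)]

transcript = transcribe(template)
mrna = splice(transcript, exons)
print(transcript)  # AUGCCUAAAAGGU
print(mrna)        # AUGCCAGGU
```

In a real cell the intron-exon boundaries are recognized by the splicing machinery. Here they are simply supplied as input.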

Since only a fraction of genes are expressed in any specific cell type, capturing the mRNA of a particular cell allows researchers to determine which genes are active. This captured mRNA is converted into a complementary DNA strand known as cDNA, which unlike the original gene contains only the exon sequences (introns removed). Building comprehensive cDNA libraries for various cell types formed a major component of the Human Genome Project and also led to disputes regarding patenting of cDNA sequences.

The Matching Problem

Once cDNA is obtained, the challenge is to locate the corresponding gene within a long, sequenced stretch of genomic DNA. This becomes a string alignment problem: aligning the shorter cDNA string against the longer genomic DNA string in a manner that correctly reveals the locations of the exons. Since the cDNA lacks introns, the expected alignment consists of a few short regions of very high similarity (matching exons) separated by long gaps (corresponding to introns). Some mismatches and spaces may occur within the matching regions due to sequencing errors, but these should form only a small percentage of each region.

Why an Objective Function Without Gaps Fails

If the scoring scheme includes only matches, mismatches, and spaces (with no explicit gap term), the resulting alignment does not correctly capture the exon structure. The reasoning is as follows:

  • A low space penalty is required, since the long runs of spaces opposite the introns must be permitted; a high space penalty would force the cDNA to stay bunched together in one region of the genomic string.
  • A high mismatch penalty is required, since only a small number of mismatches (due to sequencing errors) are expected.
  • Given these settings, but with no gap term, the optimal alignment tends to become simply the longest common subsequence (LCS) between the cDNA and the long genomic string.
  • Since DNA has only four possible characters and introns are long, this LCS is likely to match nearly all characters of the cDNA, producing a higher score than the true, biologically correct exon-matching alignment (which would leave a few mismatches due to sequencing errors).
  • However, this LCS-based alignment scatters the cDNA thinly across the entire genomic sequence instead of correctly picking out the compact exon regions, thereby failing to reflect true biology.
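The argument above can be checked on a toy example. The Python sketch below computes the length of the longest common subsequence with the standard dynamic program. The exon and intron strings and the single simulated sequencing error are invented for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical gene: two short exons separated by a longer intron.
exon1, intron, exon2 = "ACGTAC", "GGTTAACCGGTTAACC", "TTGCAA"
genomic = exon1 + intron + exon2

# cDNA = exon1 + exon2, with one simulated sequencing error (T read as G).
cdna = "ACGGAC" + exon2

print(len(cdna), lcs_length(cdna, genomic))  # 12 12
```

Every one of the 12 cDNA characters can be matched somewhere in the genomic string, but only by borrowing characters from the intron. The biologically correct placement matches 11 characters and leaves one mismatch, so a scheme with free spaces and no gap term prefers the scattered alignment.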

The Solution: Adding a Gap Term

By introducing a constant gap weight Wg into the objective function for each gap present, and tuning Wg appropriately, the optimal alignment can be made to correctly cut the cDNA into segments matching its exons in the longer DNA sequence. This demonstrates the practical necessity of the gap concept for solving real biological alignment problems.
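One way to compute the optimal value under a constant gap weight is a dynamic program that tracks whether the alignment currently ends inside a gap, so that opening a gap costs Wg and extending it costs nothing. The Python sketch below is a minimal version that returns only the optimal score of a global alignment. The function name, the default scores, and the toy cDNA and genomic strings are assumptions for illustration, not the author's implementation.

```python
NEG_INF = float("-inf")


def constant_gap_alignment_score(s1, s2, wg, match=1.0, mismatch=-1.0):
    """Best global alignment value: column scores minus wg per gap, spaces free."""
    n, m = len(s1), len(s2)
    # V[i][j]: best value for s1[:i] versus s2[:j]
    # E[i][j]: best value when the last column puts s2[j-1] opposite a space
    # F[i][j]: best value when the last column puts s1[i-1] opposite a space
    V = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    V[0][0] = 0.0
    for j in range(1, m + 1):
        E[0][j] = V[0][j] = -wg  # one gap covering s2[:j]
    for i in range(1, n + 1):
        F[i][0] = V[i][0] = -wg  # one gap covering s1[:i]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s1[i - 1] == s2[j - 1] else mismatch
            G = V[i - 1][j - 1] + pair
            # Extend an existing gap for free, or open a new one for wg.
            E[i][j] = max(E[i][j - 1], V[i][j - 1] - wg)
            F[i][j] = max(F[i - 1][j], V[i - 1][j] - wg)
            V[i][j] = max(G, E[i][j], F[i][j])
    return V[n][m]


# Hypothetical gene and cDNA (one simulated sequencing error in the first exon).
genomic = "ACGTAC" + "GGTTAACCGGTTAACC" + "TTGCAA"
cdna = "ACGGAC" + "TTGCAA"

for wg in (0.0, 4.0):
    print(wg, constant_gap_alignment_score(cdna, genomic, wg, mismatch=-3.0))
```

With these toy values the exon-respecting alignment (11 matches, 1 mismatch, 1 gap opposite the intron) has value 11 - 3 - Wg, so the printed optimum for each Wg can be compared against it. A practical cDNA matcher would also need a traceback to recover the alignment itself, and would probably not charge for the genomic sequence flanking the gene. Both are omitted here.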

Processed Pseudogenes

A more complex variant of this problem involves pseudogenes — near-copies of functional genes that have mutated enough to lose their function, and which are common in eukaryotes. Pseudogenes are believed to arise through gene duplication followed by mutation, representing either failed trial genes or potential future genes. A pseudogene typically retains both introns and exons of its original gene and may be located far from, or even on a different chromosome than, its parent gene.

A more specific and interesting case is the processed pseudogene, which contains only exon sequences (introns already removed). It is formed when an mRNA is reverse-transcribed back into DNA by the enzyme reverse transcriptase and inserted at a random location in the genome. Locating processed pseudogenes is similar to, but more difficult than, cDNA matching, because the cDNA sequence is not available to search with; the task requires repeat-finding methods, local alignment techniques, and careful gap-weight selection.

Caveat: Alternative Practical Approach

Although the gap-weighted alignment model is theoretically important, in practice, cDNA and pseudogene matching problems are often approached using local alignment without explicit gap weighting. Local alignment algorithms can identify multiple highly similar substring pairs (not just the single best pair). In the cDNA/pseudogene context, these individual highly similar pairs typically correspond to the exons themselves, and the complete match between cDNA and the gene can be reconstructed by piecing together several non-overlapping local alignments. This method is, in fact, the more commonly used approach in practical applications.
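The piecing-together step can be sketched as a simple chaining problem. The Python snippet below assumes the local alignments have already been found by some other tool and are given as hits with cDNA coordinates, genomic coordinates, and a score. The `Hit` record, its field names, and the example values are hypothetical. It selects the highest-scoring set of hits that do not overlap and appear in the same order in both sequences.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One local alignment, with half-open intervals on both sequences."""
    c_start: int  # start in the cDNA
    c_end: int
    g_start: int  # start in the genomic DNA
    g_end: int
    score: float


def best_chain(hits: list[Hit]) -> list[Hit]:
    """Highest-scoring chain of non-overlapping hits, in order on both sequences."""
    if not hits:
        return []
    hits = sorted(hits, key=lambda h: (h.c_start, h.g_start))
    best = [h.score for h in hits]  # best chain score ending at each hit
    prev = [-1] * len(hits)         # predecessor index in that chain
    for i, hi in enumerate(hits):
        for j in range(i):
            hj = hits[j]
            if hj.c_end <= hi.c_start and hj.g_end <= hi.g_start:
                if best[j] + hi.score > best[i]:
                    best[i] = best[j] + hi.score
                    prev[i] = j
    i = max(range(len(hits)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]


hits = [
    Hit(0, 6, 0, 6, 5.0),     # plausible first exon
    Hit(2, 5, 9, 12, 2.0),    # spurious hit inside the intron
    Hit(6, 12, 22, 28, 6.0),  # plausible second exon
]
for hit in best_chain(hits):
    print(hit)
```

Here the spurious hit is dropped because it overlaps the first exon on the cDNA, and the two remaining hits together give the higher total. This quadratic-time version is meant only to show the idea.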

Conclusion

The introduction of gaps as a distinct construct in sequence alignment allows the alignment model to reflect real biological mutational events such as insertions, deletions, transposon activity, and DNA slippage, which typically occur as single events affecting long stretches of sequence rather than as many independent single-character changes. By assigning an appropriate gap weight (Wg) in the objective function, alignments can be guided to correctly represent structures such as exon-intron boundaries in cDNA matching, evolutionary relationships through shared gaps, and protein domain differences. This makes the concept of gaps essential for producing alignments that are both computationally optimal and biologically realistic.

© 2026 RathBiotaClan. All rights reserved.
