This page is part of the GeneWarrior Documentation.

What is an Alignment?

See also: how to create an alignment.

Homologous (related) sequences are aligned so that letters in the same column (nucleotides for DNA, amino acids for proteins) are assumed to descend from the same ancestral letter.
If all letters in a column are identical, the position is considered conserved. If the letters differ, a change (mutation) is assumed to have occurred at some point during evolution.
Because homologous sequences rarely have the same length, gaps (written as "-") must be introduced into the alignment so that the following columns line up correctly.
Example:
Sequence1 MCQFHRYM
Sequence2 M-QFHRYM
Sequence3 M-QFHRDM

The alignment consists of 8 columns; all of them are conserved except columns #2 and #7.
Column #2 contains gaps because only Sequence1 has a cysteine (C) at this position. From an evolutionary perspective, either the ancestral sequence had a cysteine here and Sequence2 and Sequence3 lost it, or the ancestral sequence lacked it and Sequence1 gained it during evolution.
Column #7, on the other hand, contains no gaps, but the position is not conserved either: Sequence1 and Sequence2 have a tyrosine (Y), whereas Sequence3 has an aspartic acid (D). Either the ancestral sequence contained a tyrosine and Sequence3 underwent a mutation, or it contained an aspartic acid and Sequence1 and Sequence2 are the mutated sequences.
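The column-by-column reasoning above can be sketched in a few lines of Python. This is an illustrative helper, not GeneWarrior code: the function name `classify_columns` is hypothetical, and it assumes that all aligned sequences have the same length and use "-" for gaps.

```python
def classify_columns(rows):
    """Label each alignment column as 'conserved', 'gap' or 'variable'.

    rows: equal-length aligned sequences, with '-' marking a gap.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*rows):
        if "-" in column:
            labels.append("gap")        # a letter was gained or lost (insertion/deletion)
        elif len(set(column)) == 1:
            labels.append("conserved")  # the same letter in every sequence
        else:
            labels.append("variable")   # different letters: a substitution (mutation)
    return labels


alignment = ["MCQFHRYM", "M-QFHRYM", "M-QFHRDM"]
for number, label in enumerate(classify_columns(alignment), start=1):
    if label != "conserved":
        print(f"column #{number}: {label}")
# column #2: gap
# column #7: variable
```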

Substitution Matrix

Consider the following two sequences, which differ only in that Sequence1 has LE where Sequence2 has a single D:
Sequence1 TLEVEPS
Sequence2 TDVEPS

If we align them by hand, two candidate alignments come to mind:
Alignment 1:
Sequence1 TLEVEPS
Sequence2 TD-VEPS

Alignment 2:
Sequence1 TLEVEPS
Sequence2 T-DVEPS


How do we decide which of the two to use?
The answer lies in substitution matrices. A substitution matrix is a table that assigns a score to every possible substitution (mutation) of one letter by another. For example, in the BLOSUM62 matrix a substitution of L by D scores -4, whereas a substitution of E by D scores +2.
The score reflects how likely such a substitution is: the higher the score, the more likely the mutation. Strictly speaking, it is a log-odds score that compares how often the substitution is observed between related sequences with how often it would be expected by chance. In the first column of our example, T remains T, which scores +5; as one might expect, keeping a letter unchanged is more likely (higher score) than a mutation (lower score).
Gaps are scored as well, usually with a negative penalty that reflects how likely it is that an amino acid was inserted or lost.
The scores are derived from large collections of alignments of related sequences by counting how often each substitution occurs. The BLOSUM62 matrix can be viewed on the NCBI website.
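A simplified Python sketch of the log-odds idea: the observed frequency q_ab of a letter pair in related sequences is compared with the frequency p_a * p_b expected if the letters were paired by chance, and BLOSUM62 expresses this ratio in half-bit units (twice the base-2 logarithm), rounded to an integer. The function name and the frequencies below are hypothetical, and the real BLOSUM derivation (clustering of similar sequences, counting of unordered pairs) involves details omitted here.

```python
import math


def log_odds_score(q_ab, p_a, p_b, scale=2.0):
    """Return a rounded log-odds substitution score (simplified sketch).

    q_ab: observed frequency with which letters a and b are aligned
    p_a, p_b: background frequencies of a and b
    scale: 2.0 gives half-bit units, the unit used by BLOSUM62
    """
    ratio = q_ab / (p_a * p_b)  # > 1: pair seen more often than chance predicts
    return round(scale * math.log2(ratio))


print(log_odds_score(0.02, 0.05, 0.05))   # 6: pair is 8 times more frequent than chance
print(log_odds_score(0.001, 0.05, 0.05))  # -3: pair is rarer than chance
```

Positive scores therefore mean "observed more often than chance", negative scores "observed less often than chance".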
In the example above we would therefore choose Alignment 2. Both alignments contain one gap and otherwise identical columns, so the choice comes down to pairing D with E (+2) or with L (-4), and E to D scores higher.
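The comparison can be checked by summing the column scores. The sketch below uses only the handful of BLOSUM62 values needed for this example, plus a linear gap penalty of -4 that is an assumption chosen for illustration. Because both alignments contain exactly one gap, the gap value does not change which alignment wins.

```python
# Excerpt of BLOSUM62: only the letter pairs needed for this example.
BLOSUM62_EXCERPT = {
    ("T", "T"): 5, ("L", "D"): -4, ("E", "D"): 2, ("V", "V"): 4,
    ("E", "E"): 5, ("P", "P"): 7, ("S", "S"): 4,
}
GAP_PENALTY = -4  # hypothetical linear gap penalty, chosen for illustration


def substitution(a, b):
    """Look up a pair in either order (the matrix is symmetric)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]  # raises KeyError if the pair is not in the excerpt


def alignment_score(top, bottom):
    """Sum the score of every column of a pairwise alignment."""
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += GAP_PENALTY
        else:
            score += substitution(a, b)
    return score


print(alignment_score("TLEVEPS", "TD-VEPS"))  # Alignment 1: 17
print(alignment_score("TLEVEPS", "T-DVEPS"))  # Alignment 2: 23
```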

Types of alignments

There are two main types of alignment: global and local.
Example:
Sequence1 GVTQLNRLAA
Sequence2 DTQLRRLCDA

Global Alignment
GVTQLNRLA-A
-DTQLRRLCDA

Local Alignment
TQLNRL
TQLRRL


Global alignments try to align the entire sequences, whereas local alignments only report the part of the sequences that gives the best (highest-scoring) alignment, i.e. the region of highest similarity.
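The difference between the two can be seen in the classic dynamic-programming recurrences: Needleman-Wunsch for global and Smith-Waterman for local alignment. The sketch below computes only the optimal scores (no traceback) and uses a hypothetical scoring scheme of +1 per match, -1 per mismatch and -2 per gap instead of a substitution matrix; it is not taken from GeneWarrior.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical simple scoring scheme


def pair_score(a, b):
    return MATCH if a == b else MISMATCH


def global_score(s, t):
    """Needleman-Wunsch: best score over alignments of the entire sequences."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP  # leading gaps are penalised
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    return dp[n][m]  # both sequences must be used up completely


def local_score(s, t):
    """Smith-Waterman: best score over alignments of any two substrings."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(0,  # 0 means: start a fresh alignment here
                           dp[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
            best = max(best, dp[i][j])
    return best


print(local_score("GVTQLNRLAA", "DTQLRRLCDA"))  # 4
print(global_score("GVTQLNRLAA", "DTQLRRLCDA"))
```

With these parameters the local score of 4 corresponds to the TQLNRL/TQLRRL region (five matches, one mismatch). The local score is never lower than the global score, because the full-length alignment is one of the candidates a local alignment may choose.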
A special case is the cost-free end (CFE) alignment. It is essentially a global alignment, except that gaps at the start and at the end of the alignment are free. This makes it possible to align sequences that only partially overlap, such as the end of one sequence with the start of another.

Cost-free End (CFE) Alignment
LRMETRELNYGRL-------
--------NYGRLQNQLAKK
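A cost-free end alignment needs only two changes to the global recurrence: the first row and column of the score table stay at zero (leading gaps are free), and the best score is taken from anywhere in the last row or column (trailing gaps are free). This score-only Python sketch uses a hypothetical scoring scheme of +1 per match, -1 per mismatch and -2 per internal gap; `cfe_score` is an illustrative name, not a GeneWarrior function.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical simple scoring scheme


def cfe_score(s, t):
    """Cost-free end (semi-global) alignment score.

    Like a global alignment, but gaps before the first and after the last
    aligned letter cost nothing, so overlapping sequence ends can be joined.
    """
    n, m = len(s), len(t)
    # Row 0 and column 0 stay 0: leading gaps in either sequence are free.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            dp[i][j] = max(dp[i - 1][j - 1] + diag,
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    # Trailing gaps are free: the alignment may end anywhere in the last row or column.
    last_row = max(dp[n])
    last_col = max(dp[i][m] for i in range(n + 1))
    return max(last_row, last_col)


print(cfe_score("LRMETRELNYGRL", "NYGRLQNQLAKK"))  # 5
```

The result, 5, corresponds to the five matching letters NYGRL in the overlap shown above; the unaligned ends LRMETREL and QNQLAKK cost nothing.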

GeneWarrior lets you choose between local alignments (called "Highest similarity region" in the toolbox) and CFE alignments (called "Full sequences") for pairwise alignments (alignments of two sequences).
For multiple sequence alignments (MSA), GeneWarrior uses the third-party program MUSCLE, which performs a mix of global and local alignment.

See the Tutorial on how to create an alignment.
