BLAST and FASTA searching are not covered in this tutorial because these programs are not currently part of the main EMBOSS package (although interfaces are available); such searches are offered at many web sites worldwide. Database searches are an important part of the bioinformatician's arsenal. When you screen a new sequence against a database of known sequences, you are trying to answer the following questions:
Is there any protein of known structure that has sufficient similarity to the sequence of the unknown protein to suggest a familial relationship?
If not, which of the known protein sequences is most similar to the sequence of the unknown protein?
If you can identify a relationship to a protein of known structure, it is possible to infer that the new protein shares a common structure with its relative and to assign its general fold. However, what if the homologue has no known structure? If its function has been identified then you might expect the unknown protein to have a similar or related function. However, many exceptions do exist. A classic example is lysozyme, which shares around 50% sequence identity and 70% sequence similarity with alpha-lactalbumin. The two proteins also share similar folds, but their functions are entirely different: the two key catalytic residues of lysozyme are not conserved in alpha-lactalbumin, and the acidic calcium-binding motif important to the function of alpha-lactalbumin is not present in most lysozymes. It is essential that, where possible, you confirm any computer-based predictions with benchwork.
What can you do if sequence similarity alone does not satisfactorily identify a relative? The following sections show a few more applications that can help you predict the function of your sequence.
In a number of cases, the active site of a protein can be recognized by a specific 'fingerprint' or 'template', a fairly small set of residues that are unique to a family of proteins. An example is the sequence GXGXXG (where G = glycine and X = any amino acid), which defines a GTP binding site. Searching for a (rather loose) predefined string of characters in a sequence is called 'pattern matching'.
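To make the idea of pattern matching concrete, here is a minimal Python sketch that scans a sequence for the GXGXXG template. The function names and the test sequence are made up for illustration, and the one-letter template syntax is the informal one used above, not the PROSITE pattern syntax that patmatmotifs reads.

```python
import re


def pattern_to_regex(pattern):
    """Translate a simple template such as 'GXGXXG' into a regular
    expression, treating X as 'any amino acid'."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in pattern.upper())


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every occurrence of the template.

    Start positions are 1-based, as in EMBOSS reports. A lookahead is
    used so that overlapping occurrences are also reported.
    """
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# Hypothetical ten-residue peptide, invented for this example.
hits = find_motif("MAGSGKSGLV", "GXGXXG")
print(hits)
```

Because the template is so loose, a match on its own says little; it is only a hint that needs to be weighed against other evidence.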
The EMBOSS program patmatmotifs looks for sequence motifs by searching with a pattern search algorithm through the given protein sequence for the patterns defined in the PROSITE database. PROSITE is a database of protein families and domains, based on the observation that, while there are a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor.
% patmatmotifs
Search a motif database with a protein sequence
Input protein sequence: L07770.pep
Output report [l07770_1.patmatmotifs]: L07770.patmatmotifs

% more L07770.patmatmotifs
########################################
# Program: patmatmotifs
# Rundate: Wed 18 Feb 2009 14:58:32
# Commandline: patmatmotifs
#    -sequence L07770.pep
# Report_format: dbmotif
# Report_file: l07770_1.patmatmotifs
########################################

#=======================================
#
# Sequence: L07770_1     from: 1   to: 354
# HitCount: 2
#
# Full: No
# Prune: Yes
# Data_file: /m2/emboss/emboss/emboss/data/PROSITE/prosite.lines
#
#=======================================

Length = 17
Start = position 123 of sequence
End = position 139 of sequence

Motif = G_PROTEIN_RECEP_F1_1

TLGGEVALWSLVVLAVERYMVVCKPMA
     |               |
   123               139

Length = 17
Start = position 290 of sequence
End = position 306 of sequence

Motif = OPSIN

PVFMTVPAFFAKSSAIYNPVIYIVLNK
     |               |
   290               306

#---------------------------------------
#---------------------------------------
In this case you already know that the sequence is a rhodopsin. However, if you had an unknown sequence, identifying motifs might provide you with information to help you plan further experiments.
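If you want to use the hits in a script, the dbmotif report is easy to pick apart. The following is a rough sketch, assuming the report has one field per line as in the run above; the regular expression and function name are illustrative and not part of EMBOSS.

```python
import re

# The two hits from the report above, one field per line.
REPORT = """\
Length = 17
Start = position 123 of sequence
End = position 139 of sequence

Motif = G_PROTEIN_RECEP_F1_1

Length = 17
Start = position 290 of sequence
End = position 306 of sequence

Motif = OPSIN
"""

HIT = re.compile(
    r"Length = (\d+)\s+"
    r"Start = position (\d+) of sequence\s+"
    r"End = position (\d+) of sequence\s+"
    r"Motif = (\S+)"
)


def parse_hits(report_text):
    """Return a list of (motif, start, end, length) tuples."""
    return [
        (motif, int(start), int(end), int(length))
        for length, start, end, motif in HIT.findall(report_text)
    ]


for motif, start, end, length in parse_hits(REPORT):
    # Sanity check: the reported length should equal end - start + 1.
    print(motif, start, end, length == end - start + 1)
```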
PRINTS is a database that defines functional protein families, identifying each domain by a number of short, particularly well conserved sequences. A full match to one of these "fingerprints" will match all the relevant short sequences in the correct order. A partial match is recorded if some are missing or if they occur in an incorrect order. The PRINTS database can be searched using the pscan program which is available within EMBOSS. However, PRINTS is now part of InterPro and so it is advisable to install and use the EMBASSY wrapper to the IPRSCAN package assuming it's available for your platform.
% pscan
Scans proteins using PRINTS
Input protein sequence(s): L07770.pep
Minimum number of elements per fingerprint [2]:
Maximum number of elements per fingerprint [20]:
Output file [L07770_1.pscan]: L07770.pscan
Scanning L07770+1...

% more L07770.pscan
CLASS 1
Fingerprints with all elements in order

Fingerprint GPCRRHODOPSN Elements 7
    Accession number PR00237
    Rhodopsin-like GPCR superfamily signature
  Element 1 Threshold 54% Score 61%
            Start position 39 Length 25
  Element 2 Threshold 49% Score 49%
            Start position 72 Length 22
  Element 3 Threshold 48% Score 55%
            Start position 117 Length 23
  Element 4 Threshold 50% Score 69%
            Start position 152 Length 22
  Element 5 Threshold 51% Score 82%
            Start position 204 Length 24
  Element 6 Threshold 42% Score 72%
            Start position 250 Length 25
  Element 7 Threshold 46% Score 68%
            Start position 288 Length 27

CLASS 2
All elements match but not all in the correct order

Fingerprint RHODOPSIN Elements 6
    Accession number PR00579
    Rhodopsin signature
  Element 1 Threshold 80% Score 100%
            Start position 3 Length 19
  Element 2 Threshold 76% Score 94%
            Start position 22 Length 17
  Element 3 Threshold 53% Score 90%
            Start position 85 Length 17
  Element 4 Threshold 71% Score 100%
            Start position 191 Length 17
  Element 5 Threshold 56% Score 97%
            Start position 271 Length 19
  Element 6 Threshold 81% Score 95%
            Start position 319 Length 14

CLASS 3
Not all elements match but those that do are in order

CLASS 4
Remaining partial matches
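The four class headings in the pscan report follow from two simple questions: did every element of the fingerprint match, and are the matched elements in the expected order? The sketch below shows that logic in Python. It is only an illustration of the headings, assuming one start position per element; it is not pscan's actual implementation, which may apply further rules (for example about overlapping elements) that are not modelled here.

```python
def classify_fingerprint(expected_count, hit_positions):
    """Assign a fingerprint match to one of the four report classes.

    hit_positions holds one entry per element, in fingerprint order:
    the start position of the match, or None if the element was missed.
    Returns 1-4, or None when no element matched at all.
    """
    found = [pos for pos in hit_positions if pos is not None]
    if not found:
        return None
    all_found = len(found) == expected_count
    in_order = all(a < b for a, b in zip(found, found[1:]))
    if all_found and in_order:
        return 1  # all elements in order
    if all_found:
        return 2  # all elements match but not all in the correct order
    if in_order:
        return 3  # not all elements match, but those that do are in order
    return 4      # remaining partial matches


# Start positions of the seven GPCRRHODOPSN elements from the report above.
print(classify_fingerprint(7, [39, 72, 117, 152, 204, 250, 288]))
```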
The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families, to detect or demonstrate homology between new sequences and existing families of sequences, to help predict the secondary and tertiary structures of the new sequences, to suggest oligonucleotide primers for PCR and as an essential prelude to molecular evolutionary analysis.
One of the most popular programs for performing multiple sequence alignments is clustalw. EMBOSS has an interface to clustalw called emma. clustalw (and thus emma) creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. It can also produce a dendrogram showing the clustering relationships used to create the alignment. The dendrogram shows the order of the pairwise alignments of sequences and clusters of sequences that together generate the final alignment, but it is not an evolutionary tree (although the length of the branches is related to the relative distance of the sequences).
clustalw produces global alignments, that is, alignments that span the full length of every sequence. Because the procedure is progressive, the final multiple alignment is a heuristic result and is not guaranteed to be the mathematically optimal one. The alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment. When gaps are inserted into a sequence to produce an alignment, they are inserted at the same position in all the sequences of the cluster. Each pairwise alignment uses the method of Needleman and Wunsch extended for use with clusters of aligned sequences.
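The pairwise step can be illustrated with a small Needleman-Wunsch implementation for two plain sequences. This is a teaching sketch only: the match, mismatch and gap values are arbitrary assumptions, whereas clustalw scores residues with a substitution matrix, uses separate gap opening and extension penalties, and extends the method to clusters of sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return (score, aligned_a, aligned_b)."""

    def pair_score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(total)
print(top)
print(bottom)
```

Note that both output strings always have the same length, which is the defining property of a global alignment.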
pscan reported that the sequence belongs to the rhodopsin family. This is a very large family of sequences - for example, you can see the Pfam entry for rhodopsin by doing a keyword search at http://www.sanger.ac.uk/Software/Pfam
We will now retrieve some further members of the family from SwissProt and produce a multiple alignment; you'll then use this multiple alignment to produce a profile of this group of sequences and use that to align them all to the original sequence.
First, let's retrieve the sequences using seqret:
% seqret
Reads and writes (returns) a set of sequences
Input (gapped) sequence(s): sw:ops*2_*
output sequence(s) [ops2_drome.fasta]: ops2.fasta
Note the use of the wildcard character * to retrieve all SwissProt sequences whose identifiers match the pattern ops*2_* (that is, identifiers that begin with ops, followed by any characters, then 2_).
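If you are unsure which identifiers a wildcard will pick up, you can try the pattern on a few names first. The sketch below uses Python's shell-style matching as a stand-in; it is an assumption that this behaves like the EMBOSS wildcard for simple patterns of this kind, and the list of identifiers is just a hand-picked sample that includes two names that should not match.

```python
from fnmatch import fnmatchcase


def select_ids(identifiers, pattern):
    """Return the identifiers matching a shell-style wildcard pattern,
    ignoring case."""
    return [
        name for name in identifiers
        if fnmatchcase(name.lower(), pattern.lower())
    ]


sample_ids = ["OPS2_DROME", "OPSD2_ANGAN", "OPSG2_DANRE", "OPS1_DROME", "OPSD_HUMAN"]
print(select_ids(sample_ids, "ops*2_*"))
```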
% emma
Multiple alignment program - interface to ClustalW program
Input sequence: ops2.fasta
Output sequence set [ops2_drome.aln]: ops2.aln
Dendrogram (tree file) from clustalw output file [ops2_drome.dnd]: ops2.dnd
 CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence type explicitly set to Protein
Sequence format is Pearson
Sequence 1: OPS2_DROME 381 aa
Sequence 2: OPS2_DROPS 381 aa
Sequence 3: OPS2_SCHGR 380 aa
Sequence 4: OPSC2_HEMSA 377 aa
Sequence 5: OPSD2_ANGAN 352 aa
Sequence 6: OPSD2_PATYE 399 aa
Sequence 7: OPSG2_ASTFA 353 aa
Sequence 8: OPSG2_CARAU 349 aa
Sequence 9: OPSG2_DANRE 349 aa
Sequence 10: OPSR2_DANRE 356 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 92
Sequences (1:3) Aligned. Score: 33
Sequences (1:4) Aligned. Score: 38
Sequences (1:5) Aligned. Score: 22
Sequences (1:6) Aligned. Score: 21
Sequences (1:7) Aligned. Score: 26
Sequences (1:8) Aligned. Score: 23
Sequences (1:9) Aligned. Score: 23
Sequences (1:10) Aligned. Score: 25
Sequences (2:3) Aligned. Score: 32
Sequences (2:4) Aligned. Score: 37
Sequences (2:5) Aligned. Score: 20
Sequences (2:6) Aligned. Score: 22
Sequences (2:7) Aligned. Score: 23
Sequences (2:8) Aligned. Score: 24
Sequences (2:9) Aligned. Score: 23
Sequences (2:10) Aligned. Score: 21
Sequences (3:4) Aligned. Score: 33
Sequences (3:5) Aligned. Score: 22
Sequences (3:6) Aligned. Score: 20
Sequences (3:7) Aligned. Score: 21
Sequences (3:8) Aligned. Score: 20
Sequences (3:9) Aligned. Score: 19
Sequences (3:10) Aligned. Score: 22
Sequences (4:5) Aligned. Score: 24
Sequences (4:6) Aligned. Score: 24
Sequences (4:7) Aligned. Score: 24
Sequences (4:8) Aligned. Score: 25
Sequences (4:9) Aligned. Score: 22
Sequences (4:10) Aligned. Score: 23
Sequences (5:6) Aligned. Score: 21
Sequences (5:7) Aligned. Score: 35
Sequences (5:8) Aligned. Score: 67
Sequences (5:9) Aligned. Score: 66
Sequences (5:10) Aligned. Score: 37
Sequences (6:7) Aligned. Score: 20
Sequences (6:8) Aligned. Score: 23
Sequences (6:9) Aligned. Score: 22
Sequences (6:10) Aligned. Score: 20
Sequences (7:8) Aligned. Score: 39
Sequences (7:9) Aligned. Score: 38
Sequences (7:10) Aligned. Score: 82
Sequences (8:9) Aligned. Score: 85
Sequences (8:10) Aligned. Score: 42
Sequences (9:10) Aligned. Score: 41
Guide tree file created: [00004255C]
Start of Multiple Alignment
There are 9 groups
Aligning...
Group 1: Sequences: 2 Score:5375
Group 2: Sequences: 3 Score:5387
Group 3: Sequences: 2 Score:5429
Group 4: Sequences: 5 Score:2568
Group 5: Sequences: 2 Score:6128
Group 6: Sequences: 3 Score:2747
Group 7: Sequences: 4 Score:2500
Group 8: Sequences: 9 Score:1827
Group 9: Delayed
Sequence:6 Score:1930
Alignment Score 27426
GCG-Alignment file created [00004255B]
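The pairwise scores in this log tell you which sequences clustalw regards as closest, and therefore which pairs are likely to be joined first. As a rough sketch, the following Python ranks the pairs from a captured log; only five of the lines from the run above are included to keep it short, and the regular expression and function name are illustrative.

```python
import re

# A few lines copied from the clustalw log above.
LOG = """\
Sequences (1:2) Aligned. Score: 92
Sequences (5:8) Aligned. Score: 67
Sequences (7:10) Aligned. Score: 82
Sequences (8:9) Aligned. Score: 85
Sequences (3:9) Aligned. Score: 19
"""

# \s* after "Score:" tolerates a line break between label and value.
PAIR = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\.\s*Score:\s*(\d+)")


def ranked_pairs(log_text):
    """Return (score, first, second) tuples, most similar pair first."""
    pairs = [(int(s), int(i), int(j)) for i, j, s in PAIR.findall(log_text)]
    return sorted(pairs, reverse=True)


for score, first, second in ranked_pairs(LOG):
    print(f"sequences {first} and {second}: {score}")
```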
We have aligned ten opsin sequences: two from fruit fly species, one each from a locust, a crab and a scallop, and five from fish. Let's see what emma made of them:
% more ops2.aln
>OPSG2_CARAU
--------------------MNGTEGNNFYVPLSNRTGLVRSPFEYPQYYLAEPWQFKLL
AVYMFFLICLGLPINGLTLICTAQHKKLRQPLNFILVNLAVAGAIMVCFGFTVTFYT-AI
NGYFALGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSTHASAGIAF
TWVMAMACAAPPLVG-WSRYIPEGIQCSCGPDYYTLNPEYNNESYVLYMFICHFILPVTI
IFFTYGRLVCTVKAAAAQQQD------------SASTQKAEREVTKMVILMVLGFLVAWT
PYATVAAWIFFNKGAAFSAQFMAIPAFFSKTSALYNPVIYVLLNKQFRSCMLTTLFCGKN
PLGDEESSTVSTSKTEVSSVSPA-------------------------------------
---------------------------
>OPSG2_DANRE
--------------------MNGTEGNNFYIPMSNRTGLVRSPYEYTQYYLADPWQFKAL
AFYMFFLICFGLPINVLTLLVTAQHKKLRQPLNYILVNLAFAGTIMAFFGFTVTFYC-SI
NGYMALGPTGCAIEGFFATLGGQVALWSLVVLAIERYIVVCKPMGSFKFSSNHAMAGIAF
TWVMASSCAVPPLFG-WSRYIPEGMQTSCGPDYYTLNPEFNNESYVLYMFSCHFCVPVTT
IFFTYGSLVCTVKAAAAQQQE------------SESTQKAEREVTRMVILMVLGFLVAWV
PYASFAAWIFFNRGAAFSAQAMAIPAFFSKASALFNPIIYVLLNKQFRSCMLNTLFCGKS
PLGDDESSSVSTSKTEVSSVSPA-------------------------------------
---------------------------
>OPSD2_ANGAN
--------------------MNGTEGPNFYVPMSNVTGVVRSPFEYPQYYLAEPWAYSAL
AAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYT-SM
HGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAVERWMVVCKPMSNFRFGENHAIMGVAF
TWVMALACAAPPLFG-WSRYIPEGMQCSCGMDHYAPNPETYNESFVIYMFICHFTIPLTV
ISFCYGRLVCTVKEATAQQQE------------SETTQRAEREVTRMVIIMVISFLVCWV
PYASVAWYIFTHQGSSFGPIFMTIPAFFAKSSSLYNPLIYICMNKQSRNCMITTLCCGKN
PFEEEEGASTTASKTEASSVSSVSPA----------------------------------
---------------------------
>OPSG2_ASTFA
---------MAAHEPVFAARRHNEDTTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNVS
SLWMIFVVIASVFTNGLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVIN-QI
FGYFILGHPMCVFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIF
SWVWAIIWCTPPIFG-WSRYWPHGLKTSCGPDVFSGSEDPGVASYMITLMLTCCILPLSI
IIICYIFVWSAIHQVAQQQKD------------SESTQKAEKEVSRMVVVMILAFIVCWG
PYASFATFSAVNPGYAWHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVE
-----DASEVSGSTTEVSTAS---------------------------------------
---------------------------
>OPSR2_DANRE
--------MAEWANAAFAARRRGDETTRDNAFSYTNSNNTRDPFEGPNYHIAPRWVYNVA
TVWMFFVVVASTFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETLFASTISVIN-QV
FGYFILGHPMCIFEGYTVSVCGIAGLWSLTVISWERWVVVCKPFGNVKFDGKWASAGIIF
SWVWAAVWCAPPIFG-WSRYWPHGLKTSCGPDVFGGNEDPGVQSYMLVLMITCCILPLAI
IILCYIAVFLAIHAVAQQQKD------------SESTQKAEKEVSRMVVVMILAFCLCWG
PYTAFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRVCIMQLFGKKVD
-----DGSEVSTSKTEVSSVAPA-------------------------------------
---------------------------
>OPS2_DROME
MERSHLPETPFDLAHSGPRFQAQSSGNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKIL
GLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIIN-FY
YETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILF
IWMMAVFWTVMPLIG-WSAYVPEGNLTACSIDYMTRMWNPRSYLITYSLFVYYTPLFLIC
YSYWFIIAAVAAHEKAMREQAKKMNVKSLRS-SEDCDKSAEGKLAKVALTTISLWFMAWT
PYLVICYFGLF-KIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVF
GNTDEPKPDAPASDTETTSEADSKA-----------------------------------
---------------------------
>OPS2_DROPS
MERSLLPEPPLAMALLGPRFEAQTGGNRSVLDNVLPDMAPLVNPYWSRFAPMDPTMSKIL
GLFTLVILIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIIN-FY
YETWVLGPLWCDIYAACGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIAF
IWMMAVFWTIMPLIG-WSSYVPEGNLTACSIDYMTRQWNPRSYLITYSLFVYYTPLFMIC
YSYWFIIATVAAHEKAMRDQAKKMNVKSLRS-SEDCDKSAENKLAKVALTTISLWFMAWT
PYLIICYFGLF-KIDGLTPLTTIWGATFAKTSAVYNPIVYGISHPKYRLVLKEKCPMCVC
GSTDEPKPDAPPSDTETTSEAESKA-----------------------------------
---------------------------
>OPSC2_HEMSA
---MTNATGPQMAYYGAASMDFGYPEGVSIVDFVRPEIKPYVHQHWYNYPPVNPMWHYLL
GVIYLFLGTVSIFGNGLVIYLFNKSAALRTPANILVVNLALSDLIMLTTNVPFFTYNCFS
GGVWMFSPQYCEIYACLGAITGVCSIWLLCMISFDRYNIICNGFNGPKLTTGKAVVFALI
SWVIAIGCALPPFFG-WGNYILEGILDSCSYDYLTQDFNTFSYNIFIFVFDYFLPAAIIV
FSYVFIVKAIFAHEAAMRAQAKKMNVSTLRS-NEADAQRAEIRIAKTALVNVSLWFICWT
PYALISLKGVMGDTSGITPLVSTLPALLAKSCSCYNPFVYAISHPKYRLAITQHLPWFCV
HETETKSNDDSQSNSTVAQDKA--------------------------------------
---------------------------
>OPS2_SCHGR
-----MVNTTDFYPVPAAMAYESSVGLPLLGWNVPTEHLDLVHPHWRSFQVPNKYWHFGL
AFVYFMLMCMSSLGNGIVLWIYATTKSIRTPSNMFIVNLALFDVLMLL-EMPMLVVSSLF
YQRPVGWELGCDIYAALGSVAGIGSAINNAAIAFDRYRTISCPIDG-RLTQGQVLALIAG
TWVWTLPFTLMPLLRIWSRFTAEGFLTTCSFDYLTDDEDTKVFVGCIFAWSYAFPLCLIC
CFYYRLIGAVREHEKMLRDQAKKMNVKSLQSNADTEAQSAEIRIAKVALTIFFLFLCSWT
PYAVVAMIGAFGNRAALTPLSTMIPAVTAKIVSCIDPWVYAINHPRFRAEVQKRMKWLHL
GEDARSSKSDTSSTATDRTVGNVSASA---------------------------------
---------------------------
>OPSD2_PATYE
--------------------------------------MPFPLNRTDTALVISPSEFRII
GIFISICCIIGVLGNLLIIIVFAKRRSVRRPINFFVLNLAVSDLIVALLGYPMTAAS-AF
SNRWIFDNIGCKIYAFLCFNSGVISIMTHAALSFCRYIIICQYGYRKKITQTTVLRTLFS
IWSFAMFWTLSPLFG-WSSYVIEVVPVSCSVNWYGHGLGDVSYTISVIVAVYVFPLSIIV
FSYGMILQEKVCKDSRKNGIR------AQQRYTPRFIQDIEQRVTFISFLMMAAFMVAWT
PYAIMSALAIG--SFNVENSFAALPTLFAKASCAYNPFIYAFTNANFRDTVVEIMAPWTT
RRVGVSTLPWPQVTYYPRRRTSAVNTTDIEFPDDNIFIVNSSVNGPTVKREKIVQRNPIN
VRLGIKIEPRDSRAATENTFTADFSVI
The sequences are very similar, but there are some differences - note the gaps that have been inserted. Also note that since this is a global alignment algorithm, gaps have been inserted to make all the sequences the same length.
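You can check the "same length" property yourself. The sketch below reads a gapped FASTA alignment, confirms that every row has the same length, and counts the gap characters in each row. The two short records are invented for illustration; in practice you would pass in the contents of a file such as ops2.aln.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (line breaks joined)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_summary(records):
    """Return (alignment_length, gaps_per_sequence).

    Raises ValueError if the rows differ in length, which would mean the
    input is not a finished global alignment.
    """
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences have different lengths: %s" % sorted(lengths))
    return lengths.pop(), {n: s.count("-") for n, s in records.items()}


# Hypothetical two-sequence alignment, wrapped over two lines per record.
TOY = """\
>seqA
--MNGTEG
NNFY
>seqB
MAMNGT-G
NN--
"""

print(alignment_summary(read_fasta(TOY)))
```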
Differences in alignment can be very difficult to see in this format. The program prettyplot can enhance visualisation of your results, by aligning the sequences on top of one another.
% prettyplot
Displays aligned sequences, with colouring and boxing
Input sequence set: ops2.aln
Graph type [x11]:
A graphic display will appear on your screen detailing your alignment. Identical residues are shown in red, and similar residues in green. This type of display can give you a first impression of regions of conservation.
As with all EMBOSS graphical programs, you can capture the output in a file rather than just viewing it on screen. The output is controlled by the -graph family of associated qualifiers (type prettyplot -help -verbose to get a complete listing of options).
We will save the pretty plot to a file rhodopsin.ps in colour PostScript format. To do this, use -graph cps and -goutfile rhodopsin:
% prettyplot ops2.aln -goutfile rhodopsin -graph cps
Displays aligned sequences, with colouring and boxing
Created rhodopsin.ps
This has created a file rhodopsin.ps that can be printed on a PostScript printer or turned into a PDF document with ps2pdf (not an EMBOSS program but commonly found on many UNIX/Linux systems). PDF documents can then be viewed with a PDF viewer such as Acrobat Reader.
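If you produce plots regularly, the two steps can be scripted. The following Python is a minimal sketch that assumes prettyplot and ps2pdf are both installed and on your PATH; the function names are made up, and only the qualifiers already used above (-graph and -goutfile) appear in the command.

```python
import shutil
import subprocess


def prettyplot_command(alignment, outfile_stem, device="cps"):
    """Build the prettyplot command line used in this tutorial."""
    return ["prettyplot", alignment, "-goutfile", outfile_stem, "-graph", device]


def ps2pdf_command(outfile_stem):
    """Build the ps2pdf command that converts <stem>.ps to <stem>.pdf."""
    return ["ps2pdf", outfile_stem + ".ps", outfile_stem + ".pdf"]


def plot_to_pdf(alignment, outfile_stem):
    """Run prettyplot and then ps2pdf, stopping at the first failure."""
    for command in (prettyplot_command(alignment, outfile_stem),
                    ps2pdf_command(outfile_stem)):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(command[0] + " was not found on the PATH")
        subprocess.run(command, check=True)


# Show the commands without running them.
print(" ".join(prettyplot_command("ops2.aln", "rhodopsin")))
print(" ".join(ps2pdf_command("rhodopsin")))
```

Calling plot_to_pdf("ops2.aln", "rhodopsin") would then run both programs in turn.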
To adjust the output of prettyplot (e.g. to increase the number of residues per line) there are a number of options that can be set. Read the help file and try to plot with/without a consensus, different numbers of residues per line and so on. (Hint: prettyplot -help)
A very powerful technique for characterizing the putative structure and function of a sequence is profile analysis. This is a sequence comparison method for finding and aligning distantly related sequences. The comparison allows a new sequence to be aligned optimally to a family of similar sequences. The comparison uses a scoring matrix and an existing optimal alignment of two or more similar protein sequences. The group or 'family' of similar sequences is first aligned together to create a multiple sequence alignment. The information in the multiple sequence alignment is then represented as a table of position-specific symbol comparison values and gap penalties. This table is called a profile. The similarity of new sequences to an existing profile can be tested by comparing each new sequence to the profile using a modification of the Smith-Waterman algorithm.
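To give a feel for what a profile is, here is a deliberately simplified Python sketch. It builds a table of per-column residue frequencies from a toy alignment and slides it along a new sequence without gaps. This is an assumption-laden illustration: a real Gribskov-style profile also folds in a scoring matrix and gap penalties, and the comparison uses a gapped Smith-Waterman-style alignment rather than the ungapped window used here. The three aligned fragments are invented.

```python
def frequency_profile(aligned):
    """Return one dict per alignment column: residue -> relative frequency.

    Gap characters are not counted as residues, so a gapped column has
    frequencies that sum to less than 1.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in range(length):
        counts = {}
        for seq in aligned:
            residue = seq[column]
            if residue != "-":
                counts[residue] = counts.get(residue, 0) + 1
        profile.append({res: n / len(aligned) for res, n in counts.items()})
    return profile


def window_score(profile, window):
    """Sum the profile frequency of each residue in an ungapped window."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, window))


def best_window(profile, sequence):
    """Return (offset, score) of the best ungapped placement (0-based).

    The sequence must be at least as long as the profile.
    """
    width = len(profile)
    scores = [window_score(profile, sequence[i:i + width])
              for i in range(len(sequence) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical family of three aligned fragments.
FAMILY = ["GAGSG", "GTGKG", "GAG-G"]
profile = frequency_profile(FAMILY)
print(best_window(profile, "MMGAGSGMM"))
```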
prophecy is an EMBOSS program for creating a profile from a set of multiple-aligned sequences. We will use the ops2 alignment to demonstrate prophecy:
% prophecy
Creates matrices/profiles from multiple alignments
Input sequence: ops2.aln
Profile type
         F : Frequency
         G : Gribskov
         H : Henikoff
Select type [F]: g
Scoring matrix [Epprofile]:
Enter a name for the profile [mymatrix]: ops2 sequences
Gap opening penalty [3.0]:
Gap extension penalty [0.3]:
Output file [ops2.prophecy]: ops2.prophecy
Now let's use the profile you just created to align L07770.pep to the opsin2 sequences:
% prophet
Gapped alignment for profiles
Input sequence(s): L07770.pep
Profile or weight matrix file: ops2.prophecy
Gap opening coefficient [1.0]:
Gap extension coefficient [0.1]:
Output alignment [l07770_1.prophet]: ops2.prophet

% more ops2.prophet
########################################
# Program: prophet
# Rundate: Wed 18 Feb 2009 15:58:33
# Commandline: prophet
#    -sequence L07770.pep
#    -infile ops2.prophecy
#    -outfile ops2.prophet
# Align_format: simple
# Report_file: ops2.prophet
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: ops2
# 2: L07770_1
# Matrix: EBLOSUM62
#
# Length: 368
# Identity:     193/368 (52.4%)
# Similarity:   250/368 (67.9%)
# Gaps:          16/368 ( 4.3%)
# Score: 2671.089
#
#
#=======================================

ops2             21 MNGTEGNNFYVDNVNPTGLPRVPFEWPNYYLADPWMFKILALFMFFLIIA     70
                    ||||||.||||...|.||:.|.||::|.||||:||.:..||.:||.||:.
L07770_1          1 MNGTEGPNFYVPMSNKTGVVRSPFDYPQYYLAEPWQYSALAAYMFLLILL     50

ops2             71 SCFGNGLVLYITAKHKKLRTPLNFILVNLAFADLIMALFGSPVTVINCFI    120
                    ....|.:.|::|.:|||||||||:||:||.||:..|.|.|..||:... :
L07770_1         51 GLPINFMTLFVTIQHKKLRTPLNYILLNLVFANHFMVLCGFTVTMYTS-M     99

ops2            121 YGYFVLGPLGCDIEAFLGSLGGIVSLWSLCVIAFERYIVICKPFGGFKFT    170
                    :|||:.|..||.||.|..:|||.|:||||.|:|.|||:|:|||...|:|.
L07770_1        100 HGYFIFGQTGCYIEGFFATLGGEVALWSLVVLAVERYMVVCKPMANFRFG    149

ops2            171 GKHAIAGIAFTWVMAIFWAAPPLFGIWSRYIPEGILTSCGPDYYTGNEDP    220
                    ..|||.|:||||:||:..||||||| ||||||||:..|||.||||...:.
L07770_1        150 ENHAIMGVAFTWIMALSCAAPPLFG-WSRYIPEGMQCSCGVDYYTLKPEV    198

ops2            221 GSYSIVIYLFIYHFPLPLICISYCYIILACAAHEAAAQQQAKKMNVKSLR    270
                    .:.|.|||:||.||.:|||.|.:||..|.|...|||||||.
L07770_1        199 NNESFVIYMFIVHFTIPLIVIFFCYGRLLCTVKEAAAQQQE---------    239

ops2            271 SNSSESTQKAEREVAKMVLLMILLFLVAWTPYASFAAFGAFNKGAAFTPL    320
                       |.:|||||:||.:||::|::.||:.|.|||..|.:...::|:.|.|:
L07770_1        240 ---SATTQKAEKEVTRMVVIMVVFFLICWVPYAYVAFYIFTHQGSNFGPV    286

ops2            321 AAAIPAFFAKSSALYNPIIYVIMNPQFRSCIKETLPCGVNGETDEESSDV    370
                    ...:|||||||||:|||:||:::|.|||:|:..||.||.|...||:.|..
L07770_1        287 FMTVPAFFAKSSAIYNPVIYIVLNKQFRNCLITTLCCGKNPFGDEDGSSA    336

ops2            371 STSKTEVSSVSPA--KAA    386
                    :|||||.||||.:  ..|
L07770_1        337 ATSKTEASSVSSSQVSPA    354

#---------------------------------------
#---------------------------------------
The vertical bars (|) represent residues that are identical between the ops2 consensus and the rhodopsin, while the colons (:) represent conservative substitutions. Aligning members of a family can reveal conserved regions that may be important for structure and/or function.
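If you want to tally these symbols for a region of interest, a few lines of Python will do. The sketch below takes the final block of the alignment above (ops2 positions 371 to 386), with the markup line written out at full width so that the two blank columns under the gap are preserved. The function name is illustrative, and only the two symbols explained above are interpreted.

```python
def summarise_markup(seq_a, markup, seq_b):
    """Count identities, conservative substitutions and gap columns in
    one block of a pairwise alignment in EMBOSS 'simple' format."""
    if not (len(seq_a) == len(markup) == len(seq_b)):
        raise ValueError("the three lines must have the same length")
    # Every '|' should sit between two identical residues.
    for x, m, y in zip(seq_a, markup, seq_b):
        if m == "|" and x != y:
            raise ValueError("markup and sequences disagree")
    return {
        "identical": markup.count("|"),
        "conservative": markup.count(":"),
        "gaps": sum(1 for x, y in zip(seq_a, seq_b) if x == "-" or y == "-"),
    }


# Last block of the prophet alignment shown above.
block = summarise_markup(
    "STSKTEVSSVSPA--KAA",
    ":|||||.||||.:  ..|",
    "ATSKTEASSVSSSQVSPA",
)
print(block)
```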