Protein loops on structurally similar scaffolds
Definitions

Loop: an irregular region, assigned by DSSP as neither alpha helix nor beta sheet.

Stem: a regular secondary structure fragment (helix or strand) connected to one end of a loop.

Motif: a loop together with its stems.
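As an illustration of these definitions, the sketch below scans a DSSP secondary-structure string for loops that have a stem on each side. It is not the database's extraction code. It assumes that only DSSP code H counts as a helix stem and only E as a strand stem, treats every other code as loop, and labels stems with the one-letter codes a and b used in family names such as 4ab1.

```python
import re
from typing import Iterator, Tuple

# Assumption: DSSP 'H' = helix stem ('a'), DSSP 'E' = strand stem ('b');
# all other DSSP codes (G, I, B, T, S, '-', ' ') count as loop residues.
STEM_CODE = {"H": "a", "E": "b"}

def find_motifs(ss: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (loop_start, loop_length, upstream_stem, downstream_stem)
    for every loop that has a regular stem on both sides."""
    for run in re.finditer(r"[^HE]+", ss):
        start, end = run.start(), run.end()
        if start == 0 or end == len(ss):
            continue  # a loop at a chain terminus lacks one of its stems
        yield start, end - start, STEM_CODE[ss[start - 1]], STEM_CODE[ss[end]]

# Usage: a helix, a two-residue turn, then a strand.
for start, length, up, down in find_motifs("--HHHHHTTEEEE--"):
    print(start, length, f"{length}{up}{down}")  # prints: 7 2 2ab
```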


Composition of the database

This database is composed of a series of motif families, each stored as one entry. A family is named by its loop length, the types of its two stems, and a serial number starting from 0. For example, in family 4ab1 the loop is 4 residues long, the stem upstream of the loop is a helix (a), the stem downstream of the loop is a strand (b), and 1 is the serial number.
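The naming scheme can be decoded mechanically. This is a minimal sketch assuming a name is always digits, two stem letters, then digits, and that a (helix) and b (strand) are the only stem letters; the text does not say whether other stem codes exist, so unknown letters are rejected.

```python
import re
from dataclasses import dataclass

# Stem letters as described for family 4ab1; other letters are rejected.
STEM_TYPES = {"a": "helix", "b": "strand"}
FAMILY_RE = re.compile(r"^(\d+)([a-z])([a-z])(\d+)$")

@dataclass(frozen=True)
class FamilyName:
    loop_length: int
    upstream: str
    downstream: str
    serial: int

def parse_family_name(name: str) -> FamilyName:
    m = FAMILY_RE.match(name)
    if m is None or m.group(2) not in STEM_TYPES or m.group(3) not in STEM_TYPES:
        raise ValueError(f"unrecognized family name: {name!r}")
    length, up, down, serial = m.groups()
    return FamilyName(int(length), STEM_TYPES[up], STEM_TYPES[down], int(serial))

print(parse_family_name("4ab1"))
# FamilyName(loop_length=4, upstream='helix', downstream='strand', serial=1)
```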

The sequence and structure parameters of each family are stored in a file named after the family. The coordinates of each motif are stored in a separate file in PDB format (for example 6bb1_0.pdb). These files are extracted from the original PDB entries and translated and rotated so that all the motifs in a family are superimposed.
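Because the motif files of a family share one coordinate frame, their CA atoms can be compared directly once they are read. A minimal reader using the fixed-column PDB layout (atom name in columns 13 to 16, x, y, z in columns 31 to 54) might look like this; it ignores alternate locations and multiple models, and the file name in the usage comment is only an example.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]

def read_ca_coords(pdb_text: str) -> List[Coord]:
    """Return the CA coordinates of ATOM records, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        # PDB fixed columns: record name 1-6, atom name 13-16, x/y/z 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords

# Usage with a local copy of one motif file:
# with open("6bb1_0.pdb") as fh:
#     ca = read_ca_coords(fh.read())
```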

The following is a sample family file, annotated with remarks that explain each field.


Sample family: 6bb1

#Head information The first part of the file gives summary information about the family
NAME 6bb1 Name of the family
TYPE beta-beta Types of the two stems
REPRESENTATIVE 1mlb-A_33
The representative motif of the family, named by PDB code, chain ID, and the residue number where the motif begins.
MEMBERS 12 Number of motifs in the family
CLUSTER 5 Number of conformational sub-clusters
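The head section can be read as keyword/value pairs. This sketch assumes each field is a keyword followed by one whitespace-separated value, as in the sample (the remarks above are annotations, not file content), and that the five keywords shown here are the whole set; other family files may contain fields not covered.

```python
from typing import Dict, Union

HEADER_KEYS = {"NAME", "TYPE", "REPRESENTATIVE", "MEMBERS", "CLUSTER"}
INT_KEYS = {"MEMBERS", "CLUSTER"}

def parse_header(text: str) -> Dict[str, Union[str, int]]:
    header: Dict[str, Union[str, int]] = {}
    for line in text.splitlines():
        parts = line.split(None, 2)  # keyword, value, optional trailing remark
        if len(parts) >= 2 and parts[0] in HEADER_KEYS:
            key, value = parts[0], parts[1]
            header[key] = int(value) if key in INT_KEYS else value
    return header

sample = "#Head information\nNAME 6bb1\nTYPE beta-beta\nMEMBERS 12\nCLUSTER 5\n"
print(parse_header(sample))
# {'NAME': '6bb1', 'TYPE': 'beta-beta', 'MEMBERS': 12, 'CLUSTER': 5}
```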

#Loop element
Motif name, motif sequence, coordinate file, sub-cluster
>1mlb-A_33 ..lhwyqqKSHESPrllik. 6bb1_0.pdb 0
>1hil-A_39 ..ltwyqqKPGQPPkvliy. 6bb1_1.pdb 0
>1fdl-L_33 ..lawyqqKQGKSPqllvy. 6bb1_2.pdb 0
>1vge-L_33 ..lawyqqKPGKAPrlliy. 6bb1_3.pdb 0
>1eap-A_33 ..igwyqhKPGKGPrllih. 6bb1_4.pdb 1
>1wtl-B_33 ..vnwfqqRPGQAPkvliy. 6bb1_5.pdb 0
>2fb4-L_35 ...nwyqqLPGMAPklliy. 6bb1_6.pdb 2
>1bre-B_34 ...iwyqqKLGKAPnlliy. 6bb1_7.pdb 0
>1jhl-L_33 ..lawyqeKPGKTNnlliy. 6bb1_8.pdb 3
>4bjl-B_34 ..vtwyqhLSGTAPklliy. 6bb1_9.pdb 4
>1rei-A_33 ..lnwyqqTPGKAPklliy. 6bb1_10.pdb 0
>7fab-L_36 ...kwyqqLPGTAPkl.... 6bb1_11.pdb 0
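Each loop-element line can be split into its four columns. In the sample, the upper-case letters of every motif sequence are exactly the loop sequence listed under #Loop sequence, and upper-casing the whole string gives the #Motif sequence line, so this sketch derives the loop from the upper-case letters. The name pattern (four-character PDB code, one-character chain ID, integer start position) is an assumption based on the sample.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumed name layout: >PDBCODE-CHAIN_START, e.g. >1mlb-A_33
NAME_RE = re.compile(r"^>([0-9A-Za-z]{4})-([0-9A-Za-z]?)_(-?\d+)$")

@dataclass
class Member:
    pdb_code: str
    chain: str
    start: int
    sequence: str    # motif sequence exactly as written in the file
    coord_file: str
    subcluster: int

    @property
    def loop(self) -> str:
        # Upper-case letters mark the loop residues in the sample.
        return "".join(c for c in self.sequence if c.isupper())

def parse_members(lines: Iterable[str]) -> List[Member]:
    members = []
    for line in lines:
        if not line.startswith(">"):
            continue  # skip section titles and remarks
        name, seq, coord_file, sub = line.split()
        m = NAME_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected motif name: {name!r}")
        members.append(Member(m.group(1), m.group(2), int(m.group(3)),
                              seq, coord_file, int(sub)))
    return members

sample = [
    ">1mlb-A_33 ..lhwyqqKSHESPrllik. 6bb1_0.pdb 0",
    ">1eap-A_33 ..igwyqhKPGKGPrllih. 6bb1_4.pdb 1",
]
for member in parse_members(sample):
    print(member.pdb_code, member.chain, member.start, member.loop, member.subcluster)
# 1mlb A 33 KSHESP 0
# 1eap A 33 KPGKGP 1
```

With all twelve lines parsed, the number of members and the number of distinct sub-cluster labels can be checked against MEMBERS and CLUSTER in the head section.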

#Motif sequence
Aligned motif sequences
..LHWYQQKSHESPRLLIK.
..LTWYQQKPGQPPKVLIY.
..LAWYQQKQGKSPQLLVY.
..LAWYQQKPGKAPRLLIY.
..IGWYQHKPGKGPRLLIH.
..VNWFQQRPGQAPKVLIY.
...NWYQQLPGMAPKLLIY.
...IWYQQKLGKAPNLLIY.
..LAWYQEKPGKTNNLLIY.
..VTWYQHLSGTAPKLLIY.
..LNWYQQTPGKAPKLLIY.
...KWYQQLPGTAPKL....

#Loop sequence
Aligned loop sequences
KSHESP
KPGQPP
KQGKSP
KPGKAP
KPGKGP
RPGQAP
LPGMAP
KLGKAP
KPGKTN
LSGTAP
TPGKAP
LPGTAP

#Fingerprint
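Sequence pattern: each column is one loop position, and its residues are listed from most to least frequent, so the most frequently used amino acid is in the first row. A ":" means the same residue as in the row above.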
K P G K A P
: : : : : :
: : : : : :
: : : : : :
: : : : : :
: : : : : :
: : : Q : :
L : : : S :
: S : T : :
: : : : G :
R L : E P :
T Q H M T N
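The fingerprint can be computed column by column from the aligned loop sequences. In this sketch, ties in frequency are broken alphabetically; that rule is an assumption, but with the twelve loop sequences of family 6bb1 it reproduces the fingerprint above row for row.

```python
from collections import Counter
from typing import List

def fingerprint(loops: List[str]) -> List[str]:
    """Column-wise residue ranking of equal-length aligned loop sequences."""
    columns = []
    for position in zip(*loops):
        counts = Counter(position)
        # Most frequent first; ties broken alphabetically (assumption).
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        column = []
        for residue, n in ranked:
            column.append(residue)
            column.extend([":"] * (n - 1))  # ":" repeats the residue above
        columns.append(column)
    return [" ".join(row) for row in zip(*columns)]

loops = ["KSHESP", "KPGQPP", "KQGKSP", "KPGKAP", "KPGKGP", "RPGQAP",
         "LPGMAP", "KLGKAP", "KPGKTN", "LSGTAP", "TPGKAP", "LPGTAP"]
print("\n".join(fingerprint(loops)))
```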

#Geometry parameters
DISTANCE 6.62 ( 0.18) Average distance between the two ends of the loop
Vector1 ( 0.22 -0.80 -0.56) Direction of the starting stem
Vector2 ( -0.45 0.78 0.43) Direction of the ending stem
Vector3 ( 0.12 0.99 -0.06) Direction of the loop
Struc diversity(A) ( 2.0 4.1 6.8 3.9 2.8 1.9)
Maximum variation of each loop CA atom within the family
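The geometry fields can be approximated from superimposed CA coordinates. The exact definitions are not given, so this sketch rests on assumptions: DISTANCE is read as the first-to-last loop CA distance (with a standard deviation as the spread), and Struc diversity as the largest CA-CA distance between any two members at each loop position. The angle helper only uses the unit vectors already listed; for Vector1 and Vector2 of 6bb1 it gives about 165 degrees, i.e. roughly antiparallel stems.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def end_to_end(loops: List[List[Vec]]) -> Tuple[float, float]:
    """Mean and population standard deviation of the distance between the
    first and last CA of each loop (assumed reading of DISTANCE)."""
    d = [math.dist(ca[0], ca[-1]) for ca in loops]
    mean = sum(d) / len(d)
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return mean, sd

def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two direction vectors, in degrees."""
    cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def max_spread(loops: List[List[Vec]]) -> List[float]:
    """Per loop position, the largest CA-CA distance between any two members
    (one assumed reading of 'Struc diversity')."""
    return [max(math.dist(p, q) for p in pos for q in pos) for pos in zip(*loops)]

# Stem directions Vector1 and Vector2 of family 6bb1:
print(round(angle_deg((0.22, -0.80, -0.56), (-0.45, 0.78, 0.43))))  # prints 165
```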

#RMSD matrix of conformation of loops (A)
0.00 0.99 0.61 0.49 1.57 0.47 1.99 0.32 2.19 1.26 0.60 0.73
0.99 0.00 0.80 1.04 0.94 1.09 2.86 0.87 1.58 1.27 0.83 0.36
0.61 0.80 0.00 0.62 1.57 0.81 2.36 0.46 2.14 1.30 0.67 0.46
0.49 1.04 0.62 0.00 1.72 0.65 1.92 0.48 2.30 1.05 0.65 0.79
1.57 0.94 1.57 1.72 0.00 1.68 3.29 1.53 0.89 1.61 1.31 1.19
0.47 1.09 0.81 0.65 1.68 0.00 2.08 0.51 2.33 1.42 0.94 0.88
1.99 2.86 2.36 1.92 3.29 2.08 0.00 2.13 3.74 2.22 2.19 2.61
0.32 0.87 0.46 0.48 1.53 0.51 2.13 0.00 2.12 1.19 0.57 0.59
2.19 1.58 2.14 2.30 0.89 2.33 3.74 2.12 0.00 1.87 1.79 1.79
1.26 1.27 1.30 1.05 1.61 1.42 2.22 1.19 1.87 0.00 0.89 1.22
0.60 0.83 0.67 0.65 1.31 0.94 2.19 0.57 1.79 0.89 0.00 0.64
0.73 0.36 0.46 0.79 1.19 0.88 2.61 0.59 1.79 1.22 0.64 0.00
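Since the motif files of a family are already superimposed, one simple way to build a matrix of this shape is a direct coordinate RMSD between loop CA atoms, with no further fitting. Whether the database computed its matrix this way, or after a separate superposition of each loop pair, is not stated, so treat this as a sketch of the matrix layout (symmetric, zero diagonal, two decimals) rather than a reproduction of the values.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def rmsd(a: List[Vec], b: List[Vec]) -> float:
    """Coordinate RMSD of two equal-length CA lists, without refitting."""
    if len(a) != len(b):
        raise ValueError("loops must have the same number of CA atoms")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def rmsd_matrix(loops: List[List[Vec]]) -> List[List[float]]:
    n = len(loops)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = rmsd(loops[i], loops[j])  # symmetric
    return m

def format_matrix(m: List[List[float]]) -> str:
    return "\n".join(" ".join(f"{x:.2f}" for x in row) for row in m)

toy = [[(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 0, 2)]]
print(format_matrix(rmsd_matrix(toy)))
```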

#Sequence identities difference matrix of loop (%)
0.0 67.0 50.0 67.0 67.0 83.0 83.0 67.0 83.0 67.0 83.0 83.0
67.0 0.0 50.0 33.0 33.0 33.0 50.0 50.0 50.0 67.0 50.0 50.0
50.0 50.0 0.0 33.0 33.0 67.0 67.0 33.0 50.0 67.0 50.0 67.0
67.0 33.0 33.0 0.0 17.0 33.0 33.0 17.0 33.0 50.0 17.0 33.0
67.0 33.0 33.0 17.0 0.0 50.0 50.0 33.0 33.0 67.0 33.0 50.0
83.0 33.0 67.0 33.0 50.0 0.0 33.0 50.0 67.0 50.0 33.0 33.0
83.0 50.0 67.0 33.0 50.0 33.0 0.0 50.0 67.0 33.0 33.0 17.0
67.0 50.0 33.0 17.0 33.0 50.0 50.0 0.0 50.0 50.0 33.0 50.0
83.0 50.0 50.0 33.0 33.0 67.0 67.0 50.0 0.0 83.0 50.0 67.0
67.0 67.0 67.0 50.0 67.0 50.0 33.0 50.0 83.0 0.0 50.0 17.0
83.0 50.0 50.0 17.0 33.0 33.0 33.0 33.0 50.0 50.0 0.0 33.0
83.0 50.0 67.0 33.0 50.0 33.0 17.0 50.0 67.0 17.0 33.0 0.0

#Sequence identities difference matrix of motif (%)
0.0 40.0 35.0 30.0 40.0 55.0 45.0 40.0 45.0 45.0 40.0 55.0
40.0 0.0 35.0 25.0 40.0 25.0 30.0 35.0 35.0 35.0 25.0 45.0
35.0 35.0 0.0 20.0 40.0 50.0 40.0 30.0 30.0 45.0 30.0 50.0
30.0 25.0 20.0 0.0 25.0 35.0 25.0 20.0 20.0 35.0 15.0 40.0
40.0 40.0 40.0 25.0 0.0 50.0 40.0 35.0 35.0 40.0 35.0 50.0
55.0 25.0 50.0 35.0 50.0 0.0 25.0 40.0 50.0 35.0 25.0 45.0
45.0 30.0 40.0 25.0 40.0 25.0 0.0 25.0 40.0 25.0 15.0 25.0
40.0 35.0 30.0 20.0 35.0 40.0 25.0 0.0 30.0 35.0 25.0 40.0
45.0 35.0 30.0 20.0 35.0 50.0 40.0 30.0 0.0 45.0 30.0 55.0
45.0 35.0 45.0 35.0 40.0 35.0 25.0 35.0 45.0 0.0 30.0 35.0
40.0 25.0 30.0 15.0 35.0 25.0 15.0 25.0 30.0 30.0 0.0 35.0
55.0 45.0 50.0 40.0 50.0 45.0 25.0 40.0 55.0 35.0 35.0 0.0
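Spot checks against the two difference matrices above agree with a plain position-by-position comparison of the aligned strings, with "." compared like any other symbol and the result rounded to a whole percent. For example, loops 1 and 2 (KSHESP, KPGQPP) differ at 4 of 6 positions (66.7, listed as 67.0), and motifs 1 and 2 differ at 8 of 20 positions (40.0). This is an inference from the sample, not a documented definition.

```python
def percent_difference(s: str, t: str) -> float:
    """Percentage of aligned positions at which two sequences differ.
    Gap characters ('.') are compared like any other symbol."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(s, t))
    return 100.0 * mismatches / len(s)

print(f"{round(percent_difference('KSHESP', 'KPGQPP')):.1f}")  # prints 67.0
```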

#Sequence identities difference matrix of global PDB chains (%)
0.0 25.0 20.0 38.0 23.0 72.0 61.0 72.0 67.0 61.0 73.0 60.0
25.0 0.0 25.0 40.0 27.0 68.0 58.0 70.0 71.0 58.0 69.0 57.0
20.0 25.0 0.0 36.0 21.0 68.0 59.0 65.0 68.0 58.0 67.0 59.0
38.0 40.0 36.0 0.0 40.0 62.0 58.0 61.0 64.0 60.0 63.0 58.0
23.0 27.0 21.0 40.0 0.0 68.0 60.0 66.0 69.0 62.0 67.0 60.0
72.0 68.0 68.0 62.0 68.0 0.0 76.0 18.0 39.0 76.0 19.0 77.0
61.0 58.0 59.0 58.0 60.0 76.0 0.0 76.0 79.0 10.0 75.0 20.0
72.0 70.0 65.0 61.0 66.0 18.0 76.0 0.0 35.0 76.0 16.0 78.0
67.0 71.0 68.0 64.0 69.0 39.0 79.0 35.0 0.0 78.0 37.0 79.0
61.0 58.0 58.0 60.0 62.0 76.0 10.0 76.0 78.0 0.0 76.0 19.0
73.0 69.0 67.0 63.0 67.0 19.0 75.0 16.0 37.0 76.0 0.0 76.0
60.0 57.0 59.0 58.0 60.0 77.0 20.0 78.0 79.0 19.0 76.0 0.0

The similarity between two sequences is calculated as the global alignment score reported by the align program of the FASTA package.

#FASTA SCORE matrix of loop sequence
    -   14   20   15   14   13    3   13    4   10    8    4
    -    -   24   35   34   37   24   21   23   12   28   23
    -    -    -   30   29   23   13   29   19   15   23   14
    -    -    -    -   40   38   28   31   28   18   38   29
    -    -    -    -    -   33   23   26   26   13   33   24
    -    -    -    -    -    -   30   24   21   18   34   29
    -    -    -    -    -    -    -   14   11   26   30   37
    -    -    -    -    -    -    -    -   14   16   24   15
    -    -    -    -    -    -    -    -    -    1   21   12
    -    -    -    -    -    -    -    -    -    -   20   32
    -    -    -    -    -    -    -    -    -    -    -   31
    -    -    -    -    -    -    -    -    -    -    -    -
#FASTA SCORE matrix of motif sequence
    -   66   73   75   67   60   57   58   51   56   67   49
    -    -   86   99   81  100   86   76   79   74   95   66
    -    -    -  100   79   76   73   87   83   71   88   57
    -    -    -    -   97   93   90   89   92   76  105   73
    -    -    -    -    -   80   74   69   74   75   86   61
    -    -    -    -    -    -   95   73   68   75  100   69
    -    -    -    -    -    -    -   71   65   86  103   85
    -    -    -    -    -    -    -    -   75   69   81   54
    -    -    -    -    -    -    -    -    -   55   80   48
    -    -    -    -    -    -    -    -    -    -   81   73
    -    -    -    -    -    -    -    -    -    -    -   79
    -    -    -    -    -    -    -    -    -    -    -    -
#FASTA SCORE matrix of global PDB chain sequence
    - 1093 1177  921 1090  218  501  199  250  488  188  502
    -    - 1100  885 1046  262  557  219  204  541  221  527
    -    -    -  940 1143  272  521  284  234  529  261  527
    -    -    -    -  876  342  525  327  297  506  311  524
    -    -    -    -    -  251  500  243  185  483  250  493
    -    -    -    -    -    -   73  598  475   50  575   49
    -    -    -    -    -    -    -   54   22 1276   81 1110
    -    -    -    -    -    -    -    -  480   58  588   22
    -    -    -    -    -    -    -    -    -   15  452   17
    -    -    -    -    -    -    -    -    -    -   62 1122
    -    -    -    -    -    -    -    -    -    -    -   47
    -    -    -    -    -    -    -    -    -    -    -    -
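The three FASTA score matrices store only the upper triangle, with "-" in the other cells. A small reader can mirror the triangle into a full matrix; mirroring assumes the pairwise score does not depend on the order of the two sequences, which the file layout implies but does not state. Diagonal cells stay empty (None).

```python
from typing import List, Optional

def parse_upper_triangle(text: str) -> List[List[Optional[int]]]:
    """Read an upper-triangular score matrix with '-' placeholders and
    mirror it into a full symmetric matrix (diagonal left as None)."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = len(rows)
    full: List[List[Optional[int]]] = [[None] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(f"row {i} has {len(row)} fields, expected {n}")
        for j, field in enumerate(row):
            if j > i and field != "-":
                full[i][j] = full[j][i] = int(field)
    return full

toy = """
    -   14   20
    -    -   24
    -    -    -
"""
print(parse_upper_triangle(toy))
# [[None, 14, 20], [14, None, 24], [20, 24, None]]
```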