Modules nécessaires
import numpy as np
import pandas as pd
def pprint(m):
return pd.DataFrame(m, columns=aa)
Rappel : Quelques diapositives sur l'alignement de 2 séquences et les matrices de substitution
Nous allons construire une matrice de type Dayhoff (la PAM250) pour l'alignement de séquences protéiques, ceci à partir d'alignements de séquences. La méthode utilise
A partir de ces informations, la matrice de probabilité de mutations d'un aa en un autre peut-être calculée pour une distance évolutive correspondant au temps qu'il faut pour qu'une mutation sur 100 aa soit fixée dans la population.
A partir de cette matrice, il est ensuite possible de calculer les matrices correspondant à des distances évolutives plus grandes. Et à partir de ces dernières matrices, on pourra calculer les matrices de scrores de substitutions utilisables pour faire l'alignement de séquences.
Nous allons donc suivre le cheminement décrit dans A Model of Evolutionary Change in Proteins, M.O Dayhoff, R.M. Schwartz and B.C. Orcutt, Atlas of Proteins Sequence and Structure, 1978
La première étape (déjà réalisée pour vous) consiste à réaliser les alignements globaux par familles de séquences, inférer les arbres de parenté correspondant afin de dénombrer le nombre de passage d'un acide aminé à un autre.
cm
¶Cette matrice de comptage correspond à la figure 80 :
Cette matrice est symétrique.
Le fichier correspondant à la figure a été reconstitué : dayhoff.fig80.counts.csv.
counts = pd.read_csv('dayhoff.fig80.counts.csv')
counts
aa | A | R | N | D | C | Q | E | G | H | ... | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | R | 30.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | N | 109.0 | 17.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | D | 154.0 | 0.0 | 532.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | C | 33.0 | 10.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 | Q | 93.0 | 120.0 | 50.0 | 76.0 | 0.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
6 | E | 266.0 | 0.0 | 94.0 | 831.0 | 0.0 | 422.0 | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
7 | G | 579.0 | 10.0 | 156.0 | 162.0 | 10.0 | 30.0 | 112.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
8 | H | 21.0 | 103.0 | 226.0 | 43.0 | 10.0 | 243.0 | 23.0 | 10.0 | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
9 | I | 66.0 | 30.0 | 36.0 | 13.0 | 17.0 | 8.0 | 35.0 | 0.0 | 3.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
10 | L | 95.0 | 17.0 | 37.0 | 0.0 | 0.0 | 75.0 | 15.0 | 17.0 | 40.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
11 | K | 57.0 | 477.0 | 322.0 | 85.0 | 0.0 | 147.0 | 104.0 | 60.0 | 23.0 | ... | 39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
12 | M | 29.0 | 17.0 | 0.0 | 0.0 | 0.0 | 20.0 | 7.0 | 7.0 | 0.0 | ... | 207.0 | 90.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
13 | F | 20.0 | 7.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.0 | 20.0 | ... | 167.0 | 0.0 | 17.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
14 | P | 345.0 | 67.0 | 27.0 | 10.0 | 10.0 | 93.0 | 40.0 | 49.0 | 50.0 | ... | 43.0 | 43.0 | 4.0 | 7.0 | NaN | NaN | NaN | NaN | NaN | NaN |
15 | S | 772.0 | 137.0 | 432.0 | 98.0 | 117.0 | 47.0 | 86.0 | 450.0 | 26.0 | ... | 32.0 | 168.0 | 20.0 | 40.0 | 269.0 | NaN | NaN | NaN | NaN | NaN |
16 | T | 590.0 | 20.0 | 169.0 | 57.0 | 10.0 | 37.0 | 31.0 | 50.0 | 14.0 | ... | 52.0 | 200.0 | 28.0 | 10.0 | 73.0 | 696.0 | NaN | NaN | NaN | NaN |
17 | W | 0.0 | 27.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | ... | 13.0 | 0.0 | 0.0 | 10.0 | 0.0 | 17.0 | 0.0 | NaN | NaN | NaN |
18 | Y | 20.0 | 3.0 | 36.0 | 0.0 | 30.0 | 0.0 | 10.0 | 0.0 | 40.0 | ... | 23.0 | 10.0 | 0.0 | 260.0 | 0.0 | 22.0 | 23.0 | 6.0 | NaN | NaN |
19 | V | 365.0 | 20.0 | 13.0 | 17.0 | 33.0 | 27.0 | 37.0 | 97.0 | 30.0 | ... | 303.0 | 17.0 | 77.0 | 10.0 | 50.0 | 43.0 | 186.0 | 0.0 | 17.0 | NaN |
20 rows × 21 columns
On remplace les NaN pas des 0
counts.fillna(0,inplace=True)
counts
aa | A | R | N | D | C | Q | E | G | H | ... | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | R | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | N | 109.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | D | 154.0 | 0.0 | 532.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | C | 33.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 | Q | 93.0 | 120.0 | 50.0 | 76.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6 | E | 266.0 | 0.0 | 94.0 | 831.0 | 0.0 | 422.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | G | 579.0 | 10.0 | 156.0 | 162.0 | 10.0 | 30.0 | 112.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 | H | 21.0 | 103.0 | 226.0 | 43.0 | 10.0 | 243.0 | 23.0 | 10.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | I | 66.0 | 30.0 | 36.0 | 13.0 | 17.0 | 8.0 | 35.0 | 0.0 | 3.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10 | L | 95.0 | 17.0 | 37.0 | 0.0 | 0.0 | 75.0 | 15.0 | 17.0 | 40.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
11 | K | 57.0 | 477.0 | 322.0 | 85.0 | 0.0 | 147.0 | 104.0 | 60.0 | 23.0 | ... | 39.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
12 | M | 29.0 | 17.0 | 0.0 | 0.0 | 0.0 | 20.0 | 7.0 | 7.0 | 0.0 | ... | 207.0 | 90.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
13 | F | 20.0 | 7.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.0 | 20.0 | ... | 167.0 | 0.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
14 | P | 345.0 | 67.0 | 27.0 | 10.0 | 10.0 | 93.0 | 40.0 | 49.0 | 50.0 | ... | 43.0 | 43.0 | 4.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15 | S | 772.0 | 137.0 | 432.0 | 98.0 | 117.0 | 47.0 | 86.0 | 450.0 | 26.0 | ... | 32.0 | 168.0 | 20.0 | 40.0 | 269.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
16 | T | 590.0 | 20.0 | 169.0 | 57.0 | 10.0 | 37.0 | 31.0 | 50.0 | 14.0 | ... | 52.0 | 200.0 | 28.0 | 10.0 | 73.0 | 696.0 | 0.0 | 0.0 | 0.0 | 0.0 |
17 | W | 0.0 | 27.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | ... | 13.0 | 0.0 | 0.0 | 10.0 | 0.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 |
18 | Y | 20.0 | 3.0 | 36.0 | 0.0 | 30.0 | 0.0 | 10.0 | 0.0 | 40.0 | ... | 23.0 | 10.0 | 0.0 | 260.0 | 0.0 | 22.0 | 23.0 | 6.0 | 0.0 | 0.0 |
19 | V | 365.0 | 20.0 | 13.0 | 17.0 | 33.0 | 27.0 | 37.0 | 97.0 | 30.0 | ... | 303.0 | 17.0 | 77.0 | 10.0 | 50.0 | 43.0 | 186.0 | 0.0 | 17.0 | 0.0 |
20 rows × 21 columns
On transforme en matrice et on rend la matrice symétrique en lui ajoutant sa transposée
cm = counts.drop('aa', axis=1).to_numpy()
aa = list(counts['aa'])
print(aa)
['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
pprint(cm)
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 30.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 109.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 154.0 | 0.0 | 532.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 33.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 | 93.0 | 120.0 | 50.0 | 76.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
6 | 266.0 | 0.0 | 94.0 | 831.0 | 0.0 | 422.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
7 | 579.0 | 10.0 | 156.0 | 162.0 | 10.0 | 30.0 | 112.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8 | 21.0 | 103.0 | 226.0 | 43.0 | 10.0 | 243.0 | 23.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
9 | 66.0 | 30.0 | 36.0 | 13.0 | 17.0 | 8.0 | 35.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
10 | 95.0 | 17.0 | 37.0 | 0.0 | 0.0 | 75.0 | 15.0 | 17.0 | 40.0 | 253.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
11 | 57.0 | 477.0 | 322.0 | 85.0 | 0.0 | 147.0 | 104.0 | 60.0 | 23.0 | 43.0 | 39.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
12 | 29.0 | 17.0 | 0.0 | 0.0 | 0.0 | 20.0 | 7.0 | 7.0 | 0.0 | 57.0 | 207.0 | 90.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
13 | 20.0 | 7.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.0 | 20.0 | 90.0 | 167.0 | 0.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
14 | 345.0 | 67.0 | 27.0 | 10.0 | 10.0 | 93.0 | 40.0 | 49.0 | 50.0 | 7.0 | 43.0 | 43.0 | 4.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
15 | 772.0 | 137.0 | 432.0 | 98.0 | 117.0 | 47.0 | 86.0 | 450.0 | 26.0 | 20.0 | 32.0 | 168.0 | 20.0 | 40.0 | 269.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
16 | 590.0 | 20.0 | 169.0 | 57.0 | 10.0 | 37.0 | 31.0 | 50.0 | 14.0 | 129.0 | 52.0 | 200.0 | 28.0 | 10.0 | 73.0 | 696.0 | 0.0 | 0.0 | 0.0 | 0.0 |
17 | 0.0 | 27.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 13.0 | 0.0 | 0.0 | 10.0 | 0.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 |
18 | 20.0 | 3.0 | 36.0 | 0.0 | 30.0 | 0.0 | 10.0 | 0.0 | 40.0 | 13.0 | 23.0 | 10.0 | 0.0 | 260.0 | 0.0 | 22.0 | 23.0 | 6.0 | 0.0 | 0.0 |
19 | 365.0 | 20.0 | 13.0 | 17.0 | 33.0 | 27.0 | 37.0 | 97.0 | 30.0 | 661.0 | 303.0 | 17.0 | 77.0 | 10.0 | 50.0 | 43.0 | 186.0 | 0.0 | 17.0 | 0.0 |
cm = cm + cm.transpose()
pprint(cm)
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | 30.0 | 109.0 | 154.0 | 33.0 | 93.0 | 266.0 | 579.0 | 21.0 | 66.0 | 95.0 | 57.0 | 29.0 | 20.0 | 345.0 | 772.0 | 590.0 | 0.0 | 20.0 | 365.0 |
1 | 30.0 | 0.0 | 17.0 | 0.0 | 10.0 | 120.0 | 0.0 | 10.0 | 103.0 | 30.0 | 17.0 | 477.0 | 17.0 | 7.0 | 67.0 | 137.0 | 20.0 | 27.0 | 3.0 | 20.0 |
2 | 109.0 | 17.0 | 0.0 | 532.0 | 0.0 | 50.0 | 94.0 | 156.0 | 226.0 | 36.0 | 37.0 | 322.0 | 0.0 | 7.0 | 27.0 | 432.0 | 169.0 | 3.0 | 36.0 | 13.0 |
3 | 154.0 | 0.0 | 532.0 | 0.0 | 0.0 | 76.0 | 831.0 | 162.0 | 43.0 | 13.0 | 0.0 | 85.0 | 0.0 | 0.0 | 10.0 | 98.0 | 57.0 | 0.0 | 0.0 | 17.0 |
4 | 33.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 10.0 | 17.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 | 117.0 | 10.0 | 0.0 | 30.0 | 33.0 |
5 | 93.0 | 120.0 | 50.0 | 76.0 | 0.0 | 0.0 | 422.0 | 30.0 | 243.0 | 8.0 | 75.0 | 147.0 | 20.0 | 0.0 | 93.0 | 47.0 | 37.0 | 0.0 | 0.0 | 27.0 |
6 | 266.0 | 0.0 | 94.0 | 831.0 | 0.0 | 422.0 | 0.0 | 112.0 | 23.0 | 35.0 | 15.0 | 104.0 | 7.0 | 0.0 | 40.0 | 86.0 | 31.0 | 0.0 | 10.0 | 37.0 |
7 | 579.0 | 10.0 | 156.0 | 162.0 | 10.0 | 30.0 | 112.0 | 0.0 | 10.0 | 0.0 | 17.0 | 60.0 | 7.0 | 17.0 | 49.0 | 450.0 | 50.0 | 0.0 | 0.0 | 97.0 |
8 | 21.0 | 103.0 | 226.0 | 43.0 | 10.0 | 243.0 | 23.0 | 10.0 | 0.0 | 3.0 | 40.0 | 23.0 | 0.0 | 20.0 | 50.0 | 26.0 | 14.0 | 3.0 | 40.0 | 30.0 |
9 | 66.0 | 30.0 | 36.0 | 13.0 | 17.0 | 8.0 | 35.0 | 0.0 | 3.0 | 0.0 | 253.0 | 43.0 | 57.0 | 90.0 | 7.0 | 20.0 | 129.0 | 0.0 | 13.0 | 661.0 |
10 | 95.0 | 17.0 | 37.0 | 0.0 | 0.0 | 75.0 | 15.0 | 17.0 | 40.0 | 253.0 | 0.0 | 39.0 | 207.0 | 167.0 | 43.0 | 32.0 | 52.0 | 13.0 | 23.0 | 303.0 |
11 | 57.0 | 477.0 | 322.0 | 85.0 | 0.0 | 147.0 | 104.0 | 60.0 | 23.0 | 43.0 | 39.0 | 0.0 | 90.0 | 0.0 | 43.0 | 168.0 | 200.0 | 0.0 | 10.0 | 17.0 |
12 | 29.0 | 17.0 | 0.0 | 0.0 | 0.0 | 20.0 | 7.0 | 7.0 | 0.0 | 57.0 | 207.0 | 90.0 | 0.0 | 17.0 | 4.0 | 20.0 | 28.0 | 0.0 | 0.0 | 77.0 |
13 | 20.0 | 7.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 17.0 | 20.0 | 90.0 | 167.0 | 0.0 | 17.0 | 0.0 | 7.0 | 40.0 | 10.0 | 10.0 | 260.0 | 10.0 |
14 | 345.0 | 67.0 | 27.0 | 10.0 | 10.0 | 93.0 | 40.0 | 49.0 | 50.0 | 7.0 | 43.0 | 43.0 | 4.0 | 7.0 | 0.0 | 269.0 | 73.0 | 0.0 | 0.0 | 50.0 |
15 | 772.0 | 137.0 | 432.0 | 98.0 | 117.0 | 47.0 | 86.0 | 450.0 | 26.0 | 20.0 | 32.0 | 168.0 | 20.0 | 40.0 | 269.0 | 0.0 | 696.0 | 17.0 | 22.0 | 43.0 |
16 | 590.0 | 20.0 | 169.0 | 57.0 | 10.0 | 37.0 | 31.0 | 50.0 | 14.0 | 129.0 | 52.0 | 200.0 | 28.0 | 10.0 | 73.0 | 696.0 | 0.0 | 0.0 | 23.0 | 186.0 |
17 | 0.0 | 27.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 13.0 | 0.0 | 0.0 | 10.0 | 0.0 | 17.0 | 0.0 | 0.0 | 6.0 | 0.0 |
18 | 20.0 | 3.0 | 36.0 | 0.0 | 30.0 | 0.0 | 10.0 | 0.0 | 40.0 | 13.0 | 23.0 | 10.0 | 0.0 | 260.0 | 0.0 | 22.0 | 23.0 | 6.0 | 0.0 | 17.0 |
19 | 365.0 | 20.0 | 13.0 | 17.0 | 33.0 | 27.0 | 37.0 | 97.0 | 30.0 | 661.0 | 303.0 | 17.0 | 77.0 | 10.0 | 50.0 | 43.0 | 186.0 | 0.0 | 17.0 | 0.0 |
On peut vérifier que le nombre mutations correspond à celui présenté dans l'article :
np.sum(cm) / 10 / 2
1571.5
Ce qui correspond à 1572 dans l'article (ici, à diviser par 10 car la table a été multipliée par 10 dans la figure ; et aussi par 2 pour ne pas compter 2 fois les mutations).
m
¶La mutabilité est la probabilité qu'a un aa de muter. Son calcul est illustré dans la figure 81 :
Et, elle nous est donnée pour chaque aa dans la table 21 :
Le fichier correspondant a été reconstitué : dayhoff.mutabilities.csv
mutations = pd.read_csv('dayhoff.mutabilities.csv')
mutations
aa | mut | |
---|---|---|
0 | N | 134 |
1 | S | 120 |
2 | D | 106 |
3 | E | 102 |
4 | A | 100 |
5 | T | 97 |
6 | I | 96 |
7 | M | 94 |
8 | Q | 93 |
9 | V | 74 |
10 | H | 66 |
11 | R | 65 |
12 | K | 56 |
13 | P | 56 |
14 | G | 49 |
15 | Y | 41 |
16 | F | 41 |
17 | L | 40 |
18 | C | 20 |
19 | W | 18 |
Ordre des aa
aam = list(mutations['aa'])
print(aam)
['N', 'S', 'D', 'E', 'A', 'T', 'I', 'M', 'Q', 'V', 'H', 'R', 'K', 'P', 'G', 'Y', 'F', 'L', 'C', 'W']
On trie dans le même ordre que dans la matrice cm : pour chaque aan ib cherche son index dans aam pour obtenir sa mutabilité.
m = np.array( [ mutations.mut.iloc[ aam.index( aa[i] ) ] for i in range(len(aa)) ] )
m
array([100, 65, 134, 106, 20, 93, 102, 49, 66, 96, 40, 56, 94, 41, 56, 120, 97, 18, 41, 74])
f
¶Les fréquences d'apparition des aa dans les séquences sont données dans la table 22 :
frequencies = pd.read_csv('dayhoff.frequencies.csv')
frequencies
aa | freq | |
---|---|---|
0 | A | 0.087 |
1 | R | 0.041 |
2 | N | 0.040 |
3 | D | 0.047 |
4 | C | 0.033 |
5 | Q | 0.038 |
6 | E | 0.050 |
7 | G | 0.089 |
8 | H | 0.034 |
9 | I | 0.037 |
10 | L | 0.085 |
11 | K | 0.081 |
12 | M | 0.015 |
13 | F | 0.040 |
14 | P | 0.051 |
15 | S | 0.070 |
16 | T | 0.058 |
17 | W | 0.010 |
18 | Y | 0.030 |
19 | V | 0.065 |
f = np.array(frequencies.freq)
f
array([0.087, 0.041, 0.04 , 0.047, 0.033, 0.038, 0.05 , 0.089, 0.034, 0.037, 0.085, 0.081, 0.015, 0.04 , 0.051, 0.07 , 0.058, 0.01 , 0.03 , 0.065])
On met la somme des fréquences à 1
np.sum(f)
1.0010000000000001
f = f * 1/np.sum(f)
np.sum(f)
1.0
mm
¶Il s'agit maintenant de calculer la matrice de probabilité de mutation correspondant à une distance évolutive de 1-PAM (temps qu'il faut pour qu'une mutation sur 100aa soit fixée dans la population). Elle nous est donnée en figure 82 :
Ainsi que les formules pour son calcul :
Les $A_{ij}$ correspondent à notre matrice cm
. Les $m_j$ correspondent à m
.
Il nous faut donc déterminer le facteur $\lambda$ permettant de calculer mm
correspondant à 1-PAM.
Nous allons utiliser les fréquences f
pour considérer une séquence de 100aa. Elle sera donc composée de (arrondi à 1 chiffre après la virgule) :
np.round(100 * f, 1)
array([8.7, 4.1, 4. , 4.7, 3.3, 3.8, 5. , 8.9, 3.4, 3.7, 8.5, 8.1, 1.5, 4. , 5.1, 7. , 5.8, 1. , 3. , 6.5])
Chacun de ces aa sera donc muté en fonction de sa mutabilité relative m
et on souhaite mettre le total de mutations à l'échelle (avec $\lambda$) pour avoir 1 seule mutation sur 100aa.
Le taux de mutations se calcule avec la formule suivante :
$\mathrm{rate} = \lambda \sum f_i m_i$
avec $f_i$ la fréquence d'apparition des aa (aa.freq
) et $m_i$ la mutabilité de chaque aa (m
).
On peut donc en déduire $\lambda$ pour un taux de 1% : $\lambda = \frac{0.01}{\sum f_i m_i}$
lmbd = 0.01 / np.sum(f * m)
lmbd
0.00013303032719347212
A présent, il est possible de remplir la matrice de probabilité de mutation avec les formules
en dehors de la diagonale : $M_{ij} = \frac{\lambda m_j A_{ij}}{\sum_{i \ne j} A_{ij}}$ (il y a vraisemblablement une erreur dans l'article : il ne faut pas utiliser $m_j$ ici. On pourra utiliser une constante à la place → 1000)
et
pour la diagonale : $M_{ii} = 1 - \lambda m_i$
mm = lmbd * 1000 * cm / np.sum(cm)
np.fill_diagonal(mm, 1 - lmbd * m)
pprint(np.round( mm * 10000))
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9867.0 | 1.0 | 5.0 | 7.0 | 1.0 | 4.0 | 11.0 | 25.0 | 1.0 | 3.0 | 4.0 | 2.0 | 1.0 | 1.0 | 15.0 | 33.0 | 25.0 | 0.0 | 1.0 | 15.0 |
1 | 1.0 | 9914.0 | 1.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 4.0 | 1.0 | 1.0 | 20.0 | 1.0 | 0.0 | 3.0 | 6.0 | 1.0 | 1.0 | 0.0 | 1.0 |
2 | 5.0 | 1.0 | 9822.0 | 23.0 | 0.0 | 2.0 | 4.0 | 7.0 | 10.0 | 2.0 | 2.0 | 14.0 | 0.0 | 0.0 | 1.0 | 18.0 | 7.0 | 0.0 | 2.0 | 1.0 |
3 | 7.0 | 0.0 | 23.0 | 9859.0 | 0.0 | 3.0 | 35.0 | 7.0 | 2.0 | 1.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 2.0 | 0.0 | 0.0 | 1.0 |
4 | 1.0 | 0.0 | 0.0 | 0.0 | 9973.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 1.0 | 1.0 |
5 | 4.0 | 5.0 | 2.0 | 3.0 | 0.0 | 9876.0 | 18.0 | 1.0 | 10.0 | 0.0 | 3.0 | 6.0 | 1.0 | 0.0 | 4.0 | 2.0 | 2.0 | 0.0 | 0.0 | 1.0 |
6 | 11.0 | 0.0 | 4.0 | 35.0 | 0.0 | 18.0 | 9864.0 | 5.0 | 1.0 | 1.0 | 1.0 | 4.0 | 0.0 | 0.0 | 2.0 | 4.0 | 1.0 | 0.0 | 0.0 | 2.0 |
7 | 25.0 | 0.0 | 7.0 | 7.0 | 0.0 | 1.0 | 5.0 | 9935.0 | 0.0 | 0.0 | 1.0 | 3.0 | 0.0 | 1.0 | 2.0 | 19.0 | 2.0 | 0.0 | 0.0 | 4.0 |
8 | 1.0 | 4.0 | 10.0 | 2.0 | 0.0 | 10.0 | 1.0 | 0.0 | 9912.0 | 0.0 | 2.0 | 1.0 | 0.0 | 1.0 | 2.0 | 1.0 | 1.0 | 0.0 | 2.0 | 1.0 |
9 | 3.0 | 1.0 | 2.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 9872.0 | 11.0 | 2.0 | 2.0 | 4.0 | 0.0 | 1.0 | 5.0 | 0.0 | 1.0 | 28.0 |
10 | 4.0 | 1.0 | 2.0 | 0.0 | 0.0 | 3.0 | 1.0 | 1.0 | 2.0 | 11.0 | 9947.0 | 2.0 | 9.0 | 7.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | 13.0 |
11 | 2.0 | 20.0 | 14.0 | 4.0 | 0.0 | 6.0 | 4.0 | 3.0 | 1.0 | 2.0 | 2.0 | 9926.0 | 4.0 | 0.0 | 2.0 | 7.0 | 8.0 | 0.0 | 0.0 | 1.0 |
12 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 9.0 | 4.0 | 9875.0 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 3.0 |
13 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 4.0 | 7.0 | 0.0 | 1.0 | 9945.0 | 0.0 | 2.0 | 0.0 | 0.0 | 11.0 | 0.0 |
14 | 15.0 | 3.0 | 1.0 | 0.0 | 0.0 | 4.0 | 2.0 | 2.0 | 2.0 | 0.0 | 2.0 | 2.0 | 0.0 | 0.0 | 9926.0 | 11.0 | 3.0 | 0.0 | 0.0 | 2.0 |
15 | 33.0 | 6.0 | 18.0 | 4.0 | 5.0 | 2.0 | 4.0 | 19.0 | 1.0 | 1.0 | 1.0 | 7.0 | 1.0 | 2.0 | 11.0 | 9840.0 | 29.0 | 1.0 | 1.0 | 2.0 |
16 | 25.0 | 1.0 | 7.0 | 2.0 | 0.0 | 2.0 | 1.0 | 2.0 | 1.0 | 5.0 | 2.0 | 8.0 | 1.0 | 0.0 | 3.0 | 29.0 | 9871.0 | 0.0 | 1.0 | 8.0 |
17 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 9976.0 | 0.0 | 0.0 |
18 | 1.0 | 0.0 | 2.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 11.0 | 0.0 | 1.0 | 1.0 | 0.0 | 9945.0 | 1.0 |
19 | 15.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 4.0 | 1.0 | 28.0 | 13.0 | 1.0 | 3.0 | 0.0 | 2.0 | 2.0 | 8.0 | 0.0 | 1.0 | 9902.0 |
Sommes des colonnes (qui devraient être de 1)
np.sum(mm, 1)
array([1.00212053, 0.99605967, 0.99176499, 0.99469411, 0.99852452, 0.99392627, 0.99537437, 1.0011679 , 0.99514784, 0.99349756, 1.00072293, 1.00052874, 0.98995005, 0.99743238, 0.99757439, 0.99881657, 0.99714846, 0.99793983, 0.99671708, 0.99863364])
Correction pour avoir les sommes à 1
sum_to_reach = 1 - np.diag(mm)
sum_to_reach
array([0.01330303, 0.00864697, 0.01782606, 0.01410121, 0.00266061, 0.01237182, 0.01356909, 0.00651849, 0.00878 , 0.01277091, 0.00532121, 0.0074497 , 0.01250485, 0.00545424, 0.0074497 , 0.01596364, 0.01290394, 0.00239455, 0.00545424, 0.00984424])
sum_actual = np.sum(mm,1) - np.diag(mm)
sum_actual
array([0.01542356, 0.00470664, 0.00959105, 0.00879532, 0.00118513, 0.0062981 , 0.00894346, 0.00768638, 0.00392784, 0.00626847, 0.00604414, 0.00797843, 0.0024549 , 0.00288663, 0.00502409, 0.01478021, 0.0100524 , 0.00033437, 0.00217132, 0.00847788])
scales = sum_to_reach / sum_actual
np.round(scales , 2)
array([0.86, 1.84, 1.86, 1.6 , 2.25, 1.96, 1.52, 0.85, 2.24, 2.04, 0.88, 0.93, 5.09, 1.89, 1.48, 1.08, 1.28, 7.16, 2.51, 1.16])
Mise à l'échelle
foo = mm * scales
np.fill_diagonal(foo, np.diag(mm))
mm=foo
np.sum(mm, 1)
array([1.0067449 , 0.99843316, 0.99540873, 0.99895481, 0.99894783, 0.9974195 , 0.99992932, 1.00285829, 0.99806445, 0.99584044, 1.00756897, 1.00667145, 0.9903712 , 1.00014113, 0.99878996, 1.00403659, 0.99961424, 0.99813732, 0.99846064, 1.00360706])
Fonction pour calculer le taux de mutations
def mutation_rate(m, f):
return np.sum(f * (1 - np.diag(m)))
mutation_rate(mm, f)
0.010000000000000009
pprint(np.round(mm*10000))
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9867.0 | 2.0 | 9.0 | 10.0 | 3.0 | 8.0 | 17.0 | 21.0 | 2.0 | 6.0 | 4.0 | 2.0 | 6.0 | 2.0 | 22.0 | 35.0 | 32.0 | 0.0 | 2.0 | 18.0 |
1 | 1.0 | 9914.0 | 1.0 | 0.0 | 1.0 | 10.0 | 0.0 | 0.0 | 10.0 | 3.0 | 1.0 | 19.0 | 4.0 | 1.0 | 4.0 | 6.0 | 1.0 | 8.0 | 0.0 | 1.0 |
2 | 4.0 | 1.0 | 9822.0 | 36.0 | 0.0 | 4.0 | 6.0 | 6.0 | 21.0 | 3.0 | 1.0 | 13.0 | 0.0 | 1.0 | 2.0 | 20.0 | 9.0 | 1.0 | 4.0 | 1.0 |
3 | 6.0 | 0.0 | 42.0 | 9859.0 | 0.0 | 6.0 | 53.0 | 6.0 | 4.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 4.0 | 3.0 | 0.0 | 0.0 | 1.0 |
4 | 1.0 | 1.0 | 0.0 | 0.0 | 9973.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 5.0 | 1.0 | 0.0 | 3.0 | 2.0 |
5 | 3.0 | 9.0 | 4.0 | 5.0 | 0.0 | 9876.0 | 27.0 | 1.0 | 23.0 | 1.0 | 3.0 | 6.0 | 4.0 | 0.0 | 6.0 | 2.0 | 2.0 | 0.0 | 0.0 | 1.0 |
6 | 10.0 | 0.0 | 7.0 | 56.0 | 0.0 | 35.0 | 9864.0 | 4.0 | 2.0 | 3.0 | 1.0 | 4.0 | 2.0 | 0.0 | 3.0 | 4.0 | 2.0 | 0.0 | 1.0 | 2.0 |
7 | 21.0 | 1.0 | 12.0 | 11.0 | 1.0 | 2.0 | 7.0 | 9935.0 | 1.0 | 0.0 | 1.0 | 2.0 | 2.0 | 1.0 | 3.0 | 21.0 | 3.0 | 0.0 | 0.0 | 5.0 |
8 | 1.0 | 8.0 | 18.0 | 3.0 | 1.0 | 20.0 | 1.0 | 0.0 | 9912.0 | 0.0 | 1.0 | 1.0 | 0.0 | 2.0 | 3.0 | 1.0 | 1.0 | 1.0 | 4.0 | 1.0 |
9 | 2.0 | 2.0 | 3.0 | 1.0 | 2.0 | 1.0 | 2.0 | 0.0 | 0.0 | 9872.0 | 9.0 | 2.0 | 12.0 | 7.0 | 0.0 | 1.0 | 7.0 | 0.0 | 1.0 | 32.0 |
10 | 3.0 | 1.0 | 3.0 | 0.0 | 0.0 | 6.0 | 1.0 | 1.0 | 4.0 | 22.0 | 9947.0 | 2.0 | 45.0 | 13.0 | 3.0 | 1.0 | 3.0 | 4.0 | 2.0 | 15.0 |
11 | 2.0 | 37.0 | 25.0 | 6.0 | 0.0 | 12.0 | 7.0 | 2.0 | 2.0 | 4.0 | 1.0 | 9926.0 | 19.0 | 0.0 | 3.0 | 8.0 | 11.0 | 0.0 | 1.0 | 1.0 |
12 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 5.0 | 8.0 | 4.0 | 9875.0 | 1.0 | 0.0 | 1.0 | 2.0 | 0.0 | 0.0 | 4.0 |
13 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 | 6.0 | 0.0 | 4.0 | 9945.0 | 0.0 | 2.0 | 1.0 | 3.0 | 28.0 | 0.0 |
14 | 13.0 | 5.0 | 2.0 | 1.0 | 1.0 | 8.0 | 3.0 | 2.0 | 5.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 9926.0 | 12.0 | 4.0 | 0.0 | 0.0 | 2.0 |
15 | 28.0 | 11.0 | 34.0 | 7.0 | 11.0 | 4.0 | 6.0 | 16.0 | 2.0 | 2.0 | 1.0 | 7.0 | 4.0 | 3.0 | 17.0 | 9840.0 | 38.0 | 5.0 | 2.0 | 2.0 |
16 | 22.0 | 2.0 | 13.0 | 4.0 | 1.0 | 3.0 | 2.0 | 2.0 | 1.0 | 11.0 | 2.0 | 8.0 | 6.0 | 1.0 | 5.0 | 32.0 | 9871.0 | 0.0 | 2.0 | 9.0 |
17 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 9976.0 | 1.0 | 0.0 |
18 | 1.0 | 0.0 | 3.0 | 0.0 | 3.0 | 0.0 | 1.0 | 0.0 | 4.0 | 1.0 | 1.0 | 0.0 | 0.0 | 21.0 | 0.0 | 1.0 | 1.0 | 2.0 | 9945.0 | 1.0 |
19 | 13.0 | 2.0 | 1.0 | 1.0 | 3.0 | 2.0 | 2.0 | 3.0 | 3.0 | 57.0 | 11.0 | 1.0 | 17.0 | 1.0 | 3.0 | 2.0 | 10.0 | 0.0 | 2.0 | 9902.0 |
A partir de la 1ère matrice mm
calculée précédemment, on peut maintenant dériver d'autres matrices pour des distances évolutives plus élevées. Pour 2-PAM, il faut multiplier (produit matriciel) la matrice par elle-même : np.matmul(mm, mm)
. Pour PAM250, il faut donc le faire 250 fois.
Si $M$ est diagonalisable, il existe une matrice inversible $P$ et une matrice diagonale $D$ vérifiant : $M = P \times D \times P^{−1}$
$P$ est formée des vecteurs propres de $M$ et $D$ a les valeurs propres associées sur la diagonale.
e_val, e_vec = np.linalg.eig(mm)
P = e_vec
D = np.diag(e_val)
mm250 = np.matmul(np.matmul(P, D**250) , np.linalg.inv(P))
mm250 = mm250.transpose()
np.sum(mm250,1)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
pprint(np.round(mm250 * 100))
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 13.0 | 3.0 | 4.0 | 5.0 | 2.0 | 3.0 | 5.0 | 12.0 | 2.0 | 3.0 | 6.0 | 6.0 | 1.0 | 2.0 | 7.0 | 9.0 | 8.0 | 0.0 | 1.0 | 7.0 |
1 | 6.0 | 17.0 | 4.0 | 3.0 | 1.0 | 5.0 | 4.0 | 5.0 | 5.0 | 2.0 | 4.0 | 18.0 | 1.0 | 1.0 | 5.0 | 6.0 | 5.0 | 2.0 | 1.0 | 4.0 |
2 | 9.0 | 4.0 | 6.0 | 8.0 | 1.0 | 5.0 | 7.0 | 10.0 | 5.0 | 2.0 | 4.0 | 10.0 | 1.0 | 2.0 | 5.0 | 8.0 | 6.0 | 0.0 | 2.0 | 4.0 |
3 | 9.0 | 3.0 | 7.0 | 11.0 | 1.0 | 6.0 | 11.0 | 10.0 | 4.0 | 2.0 | 3.0 | 8.0 | 1.0 | 1.0 | 4.0 | 7.0 | 6.0 | 0.0 | 1.0 | 4.0 |
4 | 5.0 | 2.0 | 2.0 | 1.0 | 52.0 | 1.0 | 1.0 | 4.0 | 2.0 | 2.0 | 2.0 | 2.0 | 0.0 | 1.0 | 3.0 | 7.0 | 4.0 | 0.0 | 3.0 | 4.0 |
5 | 8.0 | 5.0 | 5.0 | 7.0 | 1.0 | 10.0 | 9.0 | 7.0 | 7.0 | 2.0 | 6.0 | 10.0 | 1.0 | 1.0 | 5.0 | 6.0 | 5.0 | 0.0 | 1.0 | 4.0 |
6 | 9.0 | 3.0 | 6.0 | 10.0 | 1.0 | 7.0 | 12.0 | 9.0 | 4.0 | 2.0 | 4.0 | 8.0 | 1.0 | 1.0 | 4.0 | 7.0 | 5.0 | 0.0 | 1.0 | 4.0 |
7 | 12.0 | 2.0 | 4.0 | 5.0 | 2.0 | 3.0 | 5.0 | 27.0 | 2.0 | 2.0 | 3.0 | 5.0 | 1.0 | 1.0 | 5.0 | 9.0 | 6.0 | 0.0 | 1.0 | 5.0 |
8 | 6.0 | 6.0 | 6.0 | 5.0 | 2.0 | 8.0 | 6.0 | 5.0 | 15.0 | 2.0 | 5.0 | 8.0 | 1.0 | 3.0 | 5.0 | 6.0 | 4.0 | 1.0 | 3.0 | 4.0 |
9 | 8.0 | 3.0 | 3.0 | 3.0 | 2.0 | 2.0 | 3.0 | 5.0 | 2.0 | 10.0 | 15.0 | 5.0 | 2.0 | 5.0 | 3.0 | 5.0 | 6.0 | 0.0 | 2.0 | 15.0 |
10 | 6.0 | 2.0 | 2.0 | 2.0 | 1.0 | 3.0 | 2.0 | 3.0 | 2.0 | 6.0 | 34.0 | 4.0 | 3.0 | 6.0 | 3.0 | 4.0 | 4.0 | 1.0 | 2.0 | 10.0 |
11 | 7.0 | 9.0 | 5.0 | 5.0 | 1.0 | 5.0 | 5.0 | 6.0 | 3.0 | 2.0 | 4.0 | 24.0 | 2.0 | 1.0 | 4.0 | 7.0 | 6.0 | 0.0 | 1.0 | 4.0 |
12 | 7.0 | 4.0 | 3.0 | 3.0 | 1.0 | 3.0 | 3.0 | 5.0 | 2.0 | 6.0 | 20.0 | 9.0 | 7.0 | 4.0 | 3.0 | 5.0 | 5.0 | 0.0 | 2.0 | 10.0 |
13 | 4.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 5.0 | 13.0 | 2.0 | 2.0 | 32.0 | 2.0 | 3.0 | 3.0 | 1.0 | 15.0 | 5.0 |
14 | 11.0 | 4.0 | 4.0 | 4.0 | 2.0 | 4.0 | 4.0 | 8.0 | 3.0 | 2.0 | 5.0 | 6.0 | 1.0 | 1.0 | 19.0 | 9.0 | 6.0 | 0.0 | 1.0 | 5.0 |
15 | 11.0 | 4.0 | 5.0 | 5.0 | 3.0 | 3.0 | 5.0 | 11.0 | 3.0 | 3.0 | 4.0 | 8.0 | 1.0 | 2.0 | 6.0 | 10.0 | 8.0 | 1.0 | 2.0 | 5.0 |
16 | 11.0 | 3.0 | 4.0 | 5.0 | 2.0 | 3.0 | 5.0 | 9.0 | 2.0 | 4.0 | 6.0 | 8.0 | 1.0 | 2.0 | 5.0 | 9.0 | 11.0 | 0.0 | 2.0 | 7.0 |
17 | 2.0 | 7.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 6.0 | 4.0 | 1.0 | 4.0 | 1.0 | 4.0 | 2.0 | 55.0 | 3.0 | 2.0 |
18 | 4.0 | 2.0 | 3.0 | 2.0 | 4.0 | 2.0 | 2.0 | 3.0 | 3.0 | 3.0 | 7.0 | 3.0 | 1.0 | 20.0 | 2.0 | 4.0 | 3.0 | 1.0 | 31.0 | 4.0 |
19 | 9.0 | 2.0 | 3.0 | 3.0 | 2.0 | 2.0 | 3.0 | 7.0 | 2.0 | 9.0 | 13.0 | 5.0 | 2.0 | 3.0 | 4.0 | 6.0 | 6.0 | 0.0 | 2.0 | 17.0 |
On pourra chercher les différences avec les valeurs de la figure 83 :
Il est maintenant possible de calculer la PAM250.
Pour un alignement de l'acide aminé $AA_i$ avec l'acide aminé $AA_j$, on utilise la formule suivante :
$\frac{P(\mathrm{alignement. AA_i. avec. AA_j. est. du. a. l'evolution})}{P(\mathrm{alignement. AA_i. avec. AA_j. est. du. au. hasard})} = \frac{f_i.M_{ji}}{f_i.f_j} = \frac{f_j.M_{ij}}{f_i.f_j}$
avec
mutation.mat
précédemment calculée (mutation matrix) etaa.freq
.On en déduit la vraisemblance d'un alignement entre $AA_i$ et $AA_j$ : $L_{i,j} = \frac{f_j.M_{i,j}}{f_i.f_j} = \frac{M_{i,j}}{f_i}$
En pratique, on utilise les logarithmes car $\log( S_1 \times S_2 \times \dots \times S_n ) = \log S_1 + \log S_2 + \dots + \log S_n$.
La matrice de substitution se calcule donc à l'aide de la formule suivante : $S_{i,j} = \log_{10}\frac{M_{i,j}}{f_i}$
Il n'y a plus qu'à appliquer la formule $S_{i,j} = \log_{10}\frac{M_{i,j}}{F_i}$ avec $F_i$ la fréquence d'apparition des a.a. et $M_{i,j}$ la probabilité que l'a.a. $j$ soit remplacé par l'a.a. $i$.
PAM250= np.log10 ( mm250 / f)
pprint(np.round(PAM250*10))
A | R | N | D | C | Q | E | G | H | I | L | K | M | F | P | S | T | W | Y | V | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2.0 | -2.0 | 0.0 | 0.0 | -2.0 | -0.0 | 0.0 | 1.0 | -1.0 | -1.0 | -2.0 | -1.0 | -1.0 | -4.0 | 1.0 | 1.0 | 1.0 | -6.0 | -3.0 | 0.0 |
1 | -2.0 | 6.0 | 0.0 | -1.0 | -4.0 | 1.0 | -1.0 | -3.0 | 2.0 | -2.0 | -3.0 | 3.0 | -1.0 | -4.0 | -0.0 | -0.0 | -1.0 | 2.0 | -4.0 | -3.0 |
2 | 0.0 | -0.0 | 2.0 | 2.0 | -4.0 | 1.0 | 1.0 | 0.0 | 2.0 | -2.0 | -3.0 | 1.0 | -2.0 | -4.0 | -1.0 | 1.0 | 0.0 | -4.0 | -2.0 | -2.0 |
3 | 0.0 | -1.0 | 2.0 | 4.0 | -5.0 | 2.0 | 3.0 | 1.0 | 1.0 | -2.0 | -4.0 | 0.0 | -3.0 | -6.0 | -1.0 | 0.0 | -0.0 | -7.0 | -4.0 | -2.0 |
4 | -2.0 | -4.0 | -4.0 | -5.0 | 12.0 | -5.0 | -5.0 | -3.0 | -3.0 | -2.0 | -6.0 | -5.0 | -5.0 | -4.0 | -3.0 | -0.0 | -2.0 | -8.0 | 0.0 | -2.0 |
5 | -0.0 | 1.0 | 1.0 | 2.0 | -5.0 | 4.0 | 2.0 | -1.0 | 3.0 | -2.0 | -2.0 | 1.0 | -1.0 | -5.0 | 0.0 | -1.0 | -1.0 | -5.0 | -4.0 | -2.0 |
6 | 0.0 | -1.0 | 1.0 | 3.0 | -5.0 | 3.0 | 4.0 | 0.0 | 1.0 | -2.0 | -3.0 | -0.0 | -2.0 | -5.0 | -1.0 | -0.0 | -0.0 | -7.0 | -4.0 | -2.0 |
7 | 1.0 | -3.0 | 0.0 | 1.0 | -3.0 | -1.0 | 0.0 | 5.0 | -2.0 | -3.0 | -4.0 | -2.0 | -3.0 | -5.0 | -1.0 | 1.0 | 0.0 | -7.0 | -5.0 | -1.0 |
8 | -1.0 | 2.0 | 2.0 | 1.0 | -3.0 | 3.0 | 1.0 | -2.0 | 6.0 | -2.0 | -2.0 | -0.0 | -2.0 | -2.0 | -0.0 | -1.0 | -1.0 | -2.0 | -0.0 | -2.0 |
9 | -1.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -3.0 | -2.0 | 5.0 | 2.0 | -2.0 | 2.0 | 1.0 | -2.0 | -1.0 | 0.0 | -5.0 | -1.0 | 4.0 |
10 | -2.0 | -3.0 | -3.0 | -4.0 | -6.0 | -2.0 | -3.0 | -4.0 | -2.0 | 2.0 | 6.0 | -3.0 | 4.0 | 2.0 | -3.0 | -3.0 | -2.0 | -2.0 | -1.0 | 2.0 |
11 | -1.0 | 3.0 | 1.0 | 0.0 | -5.0 | 1.0 | -0.0 | -2.0 | -0.0 | -2.0 | -3.0 | 5.0 | 0.0 | -5.0 | -1.0 | -0.0 | 0.0 | -3.0 | -4.0 | -2.0 |
12 | -1.0 | -0.0 | -2.0 | -3.0 | -5.0 | -1.0 | -2.0 | -3.0 | -2.0 | 2.0 | 4.0 | 0.0 | 6.0 | 0.0 | -2.0 | -2.0 | -1.0 | -4.0 | -2.0 | 2.0 |
13 | -3.0 | -4.0 | -3.0 | -6.0 | -4.0 | -5.0 | -5.0 | -5.0 | -2.0 | 1.0 | 2.0 | -5.0 | 0.0 | 9.0 | -5.0 | -3.0 | -3.0 | 1.0 | 7.0 | -1.0 |
14 | 1.0 | -0.0 | -0.0 | -1.0 | -3.0 | 0.0 | -1.0 | -1.0 | -0.0 | -2.0 | -3.0 | -1.0 | -2.0 | -5.0 | 6.0 | 1.0 | 0.0 | -5.0 | -5.0 | -1.0 |
15 | 1.0 | -0.0 | 1.0 | 0.0 | 0.0 | -0.0 | -0.0 | 1.0 | -1.0 | -1.0 | -3.0 | -0.0 | -2.0 | -3.0 | 1.0 | 2.0 | 1.0 | -2.0 | -3.0 | -1.0 |
16 | 1.0 | -1.0 | 0.0 | -0.0 | -2.0 | -1.0 | -0.0 | -0.0 | -1.0 | 0.0 | -2.0 | -0.0 | -1.0 | -3.0 | 0.0 | 1.0 | 3.0 | -5.0 | -3.0 | 0.0 |
17 | -6.0 | 2.0 | -4.0 | -7.0 | -8.0 | -5.0 | -7.0 | -7.0 | -3.0 | -5.0 | -2.0 | -3.0 | -4.0 | 0.0 | -6.0 | -2.0 | -5.0 | 17.0 | -0.0 | -6.0 |
18 | -3.0 | -4.0 | -2.0 | -4.0 | 0.0 | -4.0 | -4.0 | -5.0 | -0.0 | -1.0 | -1.0 | -4.0 | -3.0 | 7.0 | -5.0 | -3.0 | -3.0 | 0.0 | 10.0 | -2.0 |
19 | 0.0 | -3.0 | -2.0 | -2.0 | -2.0 | -2.0 | -2.0 | -1.0 | -2.0 | 4.0 | 2.0 | -2.0 | 2.0 | -1.0 | -1.0 | -1.0 | 0.0 | -6.0 | -2.0 | 4.0 |
A comparer à la figure 84 :