blue = '<font style="color: #000099;">'
black = '<font style="color: #000000;">'
red = '<font style="color: #990000;">'

Prise en main de la librairie R - `igraph`

La librairie igraph met à disposition tout un ensemble de fonctions pour le traitement et la visualisation de graphes. Elle est codée en C. Nous allons utiliser aujourd’hui son interfaçage avec R. Pour la charger :

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

Différentes manières de l’installer :

install.packages ou avec rstudio
conda : voir http://silico.biotoul.fr/p/Conda

Pour charger un graphe (différents format possibles : pajek, newick, …) :

g = read.graph("http://silico.biotoul.fr/site/images/9/9f/Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.tgr", directed=FALSE)
g

## IGRAPH e6b0444 U--- 42 408 -- 
## + edges from e6b0444:
##   [1] 1-- 2 1-- 3 1-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--10 1--11 1--12 1--13
##  [13] 1--14 1--15 1--16 1--17 1--18 1--19 1--20 1--21 1--22 1--23 1--24 1--25
##  [25] 1--26 1--27 2-- 6 2-- 7 2--10 2--16 2--17 2--18 2--21 2--22 2--23 2--26
##  [37] 3-- 4 3-- 8 3-- 9 3--11 3--12 3--14 3--15 3--16 3--19 3--20 3--24 3--25
##  [49] 3--27 3--28 3--29 3--30 3--31 3--32 3--33 3--34 3--35 3--36 3--37 3--38
##  [61] 3--39 4-- 8 4-- 9 4--11 4--12 4--14 4--15 4--16 4--19 4--20 4--24 4--25
##  [73] 4--27 4--28 4--29 4--30 4--31 4--32 4--33 4--34 4--35 4--36 4--37 4--38
##  [85] 4--39 5-- 6 5-- 7 5--13 5--16 5--17 5--18 5--21 5--22 5--26 6-- 7 6--10
##  [97] 6--13 6--14 6--16 6--17 6--18 6--21 6--22 6--23 6--26 7--10 7--13 7--14
## + ... omitted several edges

Consulter l’aide de la fonction (?read_graph) pour voir les autres formats supportés.

Pour l’afficher, plot(g) suffit, les paramètres par défauts seront utilisés et un algorithme de dessin sera sélectionné :

plot(g)

Pour l’afficher, il faut au préalable en effectuer le dessin (layout) : Soit en une ligne en passant la fonction de dessin :

plot(g, layout=layout.fruchterman.reingold)

soit en sauvegardant ce layout dans une variable :

lfr = layout.fruchterman.reingold(g)
plot(g, layout=lfr, vertex.size=3, vertex.label=NA)

Remarque : le fait de sauvegarder la disposition des sommets, (i) évite de la recalculer pour chaque plot, (ii) permet d’avoir la même disposition à chaque plot (certains algorithmes, tel que FR, ne donnent pas la même disposition à chaque fois).

Consulter l’aide des fonction plot.igraph et layout.fructhterman.reingold pour voir les options ainsi que les autres algorithmes de dessin disponibles.

Vous trouverez la documentation de la librairie sur le site dédié. Pour celle de l’interface R en ligne : https://igraph.org/r/

Pour obtenir la liste des sommets : V(g)
la liste des arêtes : E(g)

V(g)

## + 42/42 vertices, from e6b0444:
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

E(g)

## + 408/408 edges from e6b0444:
##   [1] 1-- 2 1-- 3 1-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--10 1--11 1--12 1--13
##  [13] 1--14 1--15 1--16 1--17 1--18 1--19 1--20 1--21 1--22 1--23 1--24 1--25
##  [25] 1--26 1--27 2-- 6 2-- 7 2--10 2--16 2--17 2--18 2--21 2--22 2--23 2--26
##  [37] 3-- 4 3-- 8 3-- 9 3--11 3--12 3--14 3--15 3--16 3--19 3--20 3--24 3--25
##  [49] 3--27 3--28 3--29 3--30 3--31 3--32 3--33 3--34 3--35 3--36 3--37 3--38
##  [61] 3--39 4-- 8 4-- 9 4--11 4--12 4--14 4--15 4--16 4--19 4--20 4--24 4--25
##  [73] 4--27 4--28 4--29 4--30 4--31 4--32 4--33 4--34 4--35 4--36 4--37 4--38
##  [85] 4--39 5-- 6 5-- 7 5--13 5--16 5--17 5--18 5--21 5--22 5--26 6-- 7 6--10
##  [97] 6--13 6--14 6--16 6--17 6--18 6--21 6--22 6--23 6--26 7--10 7--13 7--14
## [109] 7--16 7--17 7--18 7--21 7--22 7--23 7--26 8-- 9 8--11 8--12 8--14 8--15
## + ... omitted several edges

Quel est l’ordre du graphe ? Combien a-t-il d’arêtes ?

ordre

length(V(g))

## [1] 42

nb d’arêtes

length(E(g))

## [1] 408

On peut assigner des étiquettes aux sommets : V(g)$name = vector_of_labels

Charger les étiquettes des sommets :

Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.cod

En décomposant, appeçu du contenu du fichier :

head(read.table("http://silico.biotoul.fr/site/images/6/61/Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.cod"))

La 2ème colonne contient le nom des sommets. Il s’agit donc de l’assigner aux sommets du graphe que l’on a déjà chargé :

V(g)$name=as.character(read.table("http://silico.biotoul.fr/site/images/6/61/Cleandb_Luca_1_S_1_1_65_Iso_Tr_1-CC1.cod")[,2])
plot(g, layout=lfr, vertex.size=3, vertex.label=V(g)$name)

Les paramètres de la fonction plot de igraph sont décrits dans la documentation.

On peut redéfinir les paramètres par défaut :

igraph.options(vertex.label.cex=.8) # font size
igraph.options(vertex.label.family='sans')
igraph.options(vertex.size=3) 
igraph.options(vertex.color=NA)

Il est possible de stocker de l’information sur le graphe, les sommets et/ou les arêtes avec les fonctions dédiées :

vertex_attr(g, name="name")

##  [1] "AfulA01.AF0890"     "AperA01.APE0061"    "BantA01.AAP27658.1"
##  [4] "BcerA01.AAP10714.1" "BjapA01.BLR6158"    "BmelA02.BMEII0702" 
##  [7] "CkorA01.ACB07774.1" "CtepA01.BMPA"       "DradA01.DR2070"    
## [10] "HbutA01.ABM80170.1" "HwalA01.RBSB"       "LinnA01.TCSA"      
## [13] "MlotA01.MLL0501"    "NspeA01.ALR5361"    "PabyA01.TMPC-LIKE" 
## [16] "PaerA01.AAG03536.1" "ParsA01.ABP50747.1" "PcalA01.ABO08469.1"
## [19] "PfurA01.PF1695"     "PhorA01.PH1714"     "PislA01.ABL88151.1"
## [22] "PyaeA01.PAE3414"    "SmarA01.ABN69244.1" "SmutA01.SMU.1121C" 
## [25] "TmarA01.AAD35196.1" "TneuA01.ACB40241.1" "TonnA01.ACJ16081.1"
## [28] "AperA01.APE2592"    "BsubA01.YUFN"       "CkorA01.ACB07069.1"
## [31] "HbutA01.ABM81242.1" "MlotA01.MLR3148"    "OiheA01.OB3388"    
## [34] "ParsA01.ABP51144.1" "PcalA01.ABO09261.1" "PislA01.ABL87654.1"
## [37] "PyaeA01.PAE3121"    "SmarA01.ABN69796.1" "TneuA01.ACB40566.1"
## [40] "HmarA01.YUFN"       "HsalA01.RBSB"       "HspeA01.VNG0903C"

vertex_attr(g, name="name", index=10)

## [1] "HbutA01.ABM80170.1"

Quel est le diamètre du graphe ? (cf. diameter ou distances)

diameter(g)

## [1] 3

la même chose avec distances

distances(g)[1:6, 1:6]

##                    AfulA01.AF0890 AperA01.APE0061 BantA01.AAP27658.1
## AfulA01.AF0890                  0               1                  1
## AperA01.APE0061                 1               0                  2
## BantA01.AAP27658.1              1               2                  0
## BcerA01.AAP10714.1              1               2                  1
## BjapA01.BLR6158                 1               2                  2
## BmelA02.BMEII0702               1               1                  2
##                    BcerA01.AAP10714.1 BjapA01.BLR6158 BmelA02.BMEII0702
## AfulA01.AF0890                      1               1                 1
## AperA01.APE0061                     2               2                 1
## BantA01.AAP27658.1                  1               2                 2
## BcerA01.AAP10714.1                  0               2                 2
## BjapA01.BLR6158                     2               0                 1
## BmelA02.BMEII0702                   2               1                 0

max(distances(g))

## [1] 3

Lister les points d’articulation (articulation.points)

articulation.points(g)

## + 1/42 vertex, named, from e6b0444:
## [1] HwalA01.RBSB

Longueur moyenne des plus courts chemins (sans valuation) (average.path.length)

average.path.length(g)

## [1] 1.702671

Pour obtenir la même chose à partir de distances (il faut ajouter la diagonale (=0) correspondant à la moyenne)

mean( distances(g) + diag(42)*1.702671)

## [1] 1.702671

Afficher sa ’‘’représentation canonique’’’ (canonical.permutation) et la matrice d’adjacence correspondante

La permutation

canonical.permutation(g)

## $labeling
##  [1] 41  6 40 39  4 15 14 37 36  5 42 35  7 38 34 16  8 13 33 32 12 11  9 31 30
## [26] 10 29 28 24 23 27 26 22 25 21 20 19 18 17  3  2  1
## 
## $info
## $info$nof_nodes
## [1] 118
## 
## $info$nof_leaf_nodes
## [1] 3
## 
## $info$nof_bad_nodes
## [1] 0
## 
## $info$nof_canupdates
## [1] 1
## 
## $info$max_level
## [1] 27
## 
## $info$group_size
## [1] "6067901693952000"

La matrice d’adjacence correspondant à cette permutation

get.adjacency(permute.vertices(g, canonical.permutation(g)$labeling))

## 42 x 42 sparse Matrix of class "dgCMatrix"

##    [[ suppressing 42 column names 'HspeA01.VNG0903C', 'HsalA01.RBSB', 'HmarA01.YUFN' ... ]]

##                                                                               
## HspeA01.VNG0903C   . 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## HsalA01.RBSB       1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
## HmarA01.YUFN       1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## BjapA01.BLR6158    . . . . . . 1 1 . 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## HbutA01.ABM80170.1 . . . . . 1 . 1 1 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## AperA01.APE0061    . . . . 1 . . 1 1 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## MlotA01.MLL0501    . . . 1 . . . 1 1 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## ParsA01.ABP50747.1 . . . 1 1 1 1 . 1 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## SmarA01.ABN69244.1 . . . . 1 1 1 1 . 1 1 1 1 1 1 1 . . . . . . . . . . . . . .
## TneuA01.ACB40241.1 . . . 1 1 1 1 1 1 . 1 1 1 1 1 1 . . . . . . . . . . . . . .
## PyaeA01.PAE3414    . . . 1 1 1 1 1 1 1 . 1 1 1 1 1 . . . . . . . . . . . . . .
## PislA01.ABL88151.1 . . . 1 1 1 1 1 1 1 1 . 1 1 1 1 . . . . . . . . . . . . . .
## PcalA01.ABO08469.1 . . . 1 1 1 1 1 1 1 1 1 . 1 1 1 . . . . . . . . . . . . . .
## CkorA01.ACB07774.1 . . . 1 1 1 1 1 1 1 1 1 1 . 1 1 . . . . . . . . . . . . . .
## BmelA02.BMEII0702  . . . 1 1 1 1 1 1 1 1 1 1 1 . 1 . . . . . . . . . . . . . .
## PaerA01.AAG03536.1 . . . 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . . . . . . . . . . .
## TneuA01.ACB40566.1 . . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1
## SmarA01.ABN69796.1 . . . . . . . . . . . . . . . . 1 . 1 1 1 1 1 1 1 1 1 1 1 1
## PyaeA01.PAE3121    . . . . . . . . . . . . . . . . 1 1 . 1 1 1 1 1 1 1 1 1 1 1
## PislA01.ABL87654.1 . . . . . . . . . . . . . . . . 1 1 1 . 1 1 1 1 1 1 1 1 1 1
## PcalA01.ABO09261.1 . . . . . . . . . . . . . . . . 1 1 1 1 . 1 1 1 1 1 1 1 1 1
## OiheA01.OB3388     . . . . . . . . . . . . . . . . 1 1 1 1 1 . 1 1 1 1 1 1 1 1
## CkorA01.ACB07069.1 . . . . . . . . . . . . . . . . 1 1 1 1 1 1 . 1 1 1 1 1 1 1
## BsubA01.YUFN       . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 . 1 1 1 1 1 1
## ParsA01.ABP51144.1 . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 . 1 1 1 1 1
## MlotA01.MLR3148    . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 . 1 1 1 1
## HbutA01.ABM81242.1 . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 . 1 1 1
## AperA01.APE2592    . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 . 1 1
## TonnA01.ACJ16081.1 . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 . 1
## TmarA01.AAD35196.1 . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 .
## SmutA01.SMU.1121C  . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## PhorA01.PH1714     . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## PfurA01.PF1695     . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## PabyA01.TMPC-LIKE  . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## LinnA01.TCSA       . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## DradA01.DR2070     . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## CtepA01.BMPA       . . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## NspeA01.ALR5361    . . . . . . . . 1 1 1 1 1 1 1 1 . . . . . . . . 1 1 1 1 1 1
## BcerA01.AAP10714.1 . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## BantA01.AAP27658.1 . . . . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## AfulA01.AF0890     . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 . . . . . . . . . . . . 1 1
## HwalA01.RBSB       1 1 1 . . . . . . . . . . . . 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##                                           
## HspeA01.VNG0903C   . . . . . . . . . . . 1
## HsalA01.RBSB       . . . . . . . . . . . 1
## HmarA01.YUFN       . . . . . . . . . . . 1
## BjapA01.BLR6158    . . . . . . . . . . 1 .
## HbutA01.ABM80170.1 . . . . . . . . . . 1 .
## AperA01.APE0061    . . . . . . . . . . 1 .
## MlotA01.MLL0501    . . . . . . . . . . 1 .
## ParsA01.ABP50747.1 . . . . . . . . . . 1 .
## SmarA01.ABN69244.1 . . . . . . . 1 . . 1 .
## TneuA01.ACB40241.1 . . . . . . . 1 . . 1 .
## PyaeA01.PAE3414    . . . . . . . 1 . . 1 .
## PislA01.ABL88151.1 . . . . . . . 1 . . 1 .
## PcalA01.ABO08469.1 . . . . . . . 1 . . 1 .
## CkorA01.ACB07774.1 . . . . . . . 1 . . 1 .
## BmelA02.BMEII0702  . . . . . . . 1 . . 1 .
## PaerA01.AAG03536.1 . . . . . . . 1 1 1 1 1
## TneuA01.ACB40566.1 1 1 1 1 1 1 1 . 1 1 . 1
## SmarA01.ABN69796.1 1 1 1 1 1 1 1 . 1 1 . 1
## PyaeA01.PAE3121    1 1 1 1 1 1 1 . 1 1 . 1
## PislA01.ABL87654.1 1 1 1 1 1 1 1 . 1 1 . 1
## PcalA01.ABO09261.1 1 1 1 1 1 1 1 . 1 1 . 1
## OiheA01.OB3388     1 1 1 1 1 1 1 . 1 1 . 1
## CkorA01.ACB07069.1 1 1 1 1 1 1 1 . 1 1 . 1
## BsubA01.YUFN       1 1 1 1 1 1 1 . 1 1 . 1
## ParsA01.ABP51144.1 1 1 1 1 1 1 1 1 1 1 . 1
## MlotA01.MLR3148    1 1 1 1 1 1 1 1 1 1 . 1
## HbutA01.ABM81242.1 1 1 1 1 1 1 1 1 1 1 . 1
## AperA01.APE2592    1 1 1 1 1 1 1 1 1 1 . 1
## TonnA01.ACJ16081.1 1 1 1 1 1 1 1 1 1 1 1 1
## TmarA01.AAD35196.1 1 1 1 1 1 1 1 1 1 1 1 1
## SmutA01.SMU.1121C  . 1 1 1 1 1 1 1 1 1 1 1
## PhorA01.PH1714     1 . 1 1 1 1 1 1 1 1 1 1
## PfurA01.PF1695     1 1 . 1 1 1 1 1 1 1 1 1
## PabyA01.TMPC-LIKE  1 1 1 . 1 1 1 1 1 1 1 1
## LinnA01.TCSA       1 1 1 1 . 1 1 1 1 1 1 1
## DradA01.DR2070     1 1 1 1 1 . 1 1 1 1 1 1
## CtepA01.BMPA       1 1 1 1 1 1 . 1 1 1 1 1
## NspeA01.ALR5361    1 1 1 1 1 1 1 . 1 1 1 1
## BcerA01.AAP10714.1 1 1 1 1 1 1 1 1 . 1 1 1
## BantA01.AAP27658.1 1 1 1 1 1 1 1 1 1 . 1 1
## AfulA01.AF0890     1 1 1 1 1 1 1 1 1 1 . 1
## HwalA01.RBSB       1 1 1 1 1 1 1 1 1 1 1 .

Lister les composantes connexes (clusters) avant et après suppression du point d’articulation

clusters(g)

## $membership
##     AfulA01.AF0890    AperA01.APE0061 BantA01.AAP27658.1 BcerA01.AAP10714.1 
##                  1                  1                  1                  1 
##    BjapA01.BLR6158  BmelA02.BMEII0702 CkorA01.ACB07774.1       CtepA01.BMPA 
##                  1                  1                  1                  1 
##     DradA01.DR2070 HbutA01.ABM80170.1       HwalA01.RBSB       LinnA01.TCSA 
##                  1                  1                  1                  1 
##    MlotA01.MLL0501    NspeA01.ALR5361  PabyA01.TMPC-LIKE PaerA01.AAG03536.1 
##                  1                  1                  1                  1 
## ParsA01.ABP50747.1 PcalA01.ABO08469.1     PfurA01.PF1695     PhorA01.PH1714 
##                  1                  1                  1                  1 
## PislA01.ABL88151.1    PyaeA01.PAE3414 SmarA01.ABN69244.1  SmutA01.SMU.1121C 
##                  1                  1                  1                  1 
## TmarA01.AAD35196.1 TneuA01.ACB40241.1 TonnA01.ACJ16081.1    AperA01.APE2592 
##                  1                  1                  1                  1 
##       BsubA01.YUFN CkorA01.ACB07069.1 HbutA01.ABM81242.1    MlotA01.MLR3148 
##                  1                  1                  1                  1 
##     OiheA01.OB3388 ParsA01.ABP51144.1 PcalA01.ABO09261.1 PislA01.ABL87654.1 
##                  1                  1                  1                  1 
##    PyaeA01.PAE3121 SmarA01.ABN69796.1 TneuA01.ACB40566.1       HmarA01.YUFN 
##                  1                  1                  1                  1 
##       HsalA01.RBSB   HspeA01.VNG0903C 
##                  1                  1 
## 
## $csize
## [1] 42
## 
## $no
## [1] 1

Une seule, bien sûr.

Suppression du point d’articulation

g1 = delete_vertices(g, articulation_points(g))
plot(g1)

ou aussi

plot(g - articulation_points(g))

Composantes connexes

clusters(g1)

## $membership
##     AfulA01.AF0890    AperA01.APE0061 BantA01.AAP27658.1 BcerA01.AAP10714.1 
##                  1                  1                  1                  1 
##    BjapA01.BLR6158  BmelA02.BMEII0702 CkorA01.ACB07774.1       CtepA01.BMPA 
##                  1                  1                  1                  1 
##     DradA01.DR2070 HbutA01.ABM80170.1       LinnA01.TCSA    MlotA01.MLL0501 
##                  1                  1                  1                  1 
##    NspeA01.ALR5361  PabyA01.TMPC-LIKE PaerA01.AAG03536.1 ParsA01.ABP50747.1 
##                  1                  1                  1                  1 
## PcalA01.ABO08469.1     PfurA01.PF1695     PhorA01.PH1714 PislA01.ABL88151.1 
##                  1                  1                  1                  1 
##    PyaeA01.PAE3414 SmarA01.ABN69244.1  SmutA01.SMU.1121C TmarA01.AAD35196.1 
##                  1                  1                  1                  1 
## TneuA01.ACB40241.1 TonnA01.ACJ16081.1    AperA01.APE2592       BsubA01.YUFN 
##                  1                  1                  1                  1 
## CkorA01.ACB07069.1 HbutA01.ABM81242.1    MlotA01.MLR3148     OiheA01.OB3388 
##                  1                  1                  1                  1 
## ParsA01.ABP51144.1 PcalA01.ABO09261.1 PislA01.ABL87654.1    PyaeA01.PAE3121 
##                  1                  1                  1                  1 
## SmarA01.ABN69796.1 TneuA01.ACB40566.1       HmarA01.YUFN       HsalA01.RBSB 
##                  1                  1                  2                  2 
##   HspeA01.VNG0903C 
##                  2 
## 
## $csize
## [1] 38  3
## 
## $no
## [1] 2

Obtenir le line graph (line.graph)

lg = line.graph((g))
plot(lg, vertex.label=NA)

Il semble ressembler à g mais en plus dense.

Fichier dressing au format ncol : dressing.ncol, Bellman-Ford.ncol et celui pour Floyd-Warshall Floyd-Warshall.ncol

dressing

Graphe utilisé pour l’algorithme Bellman-Ford

Graphe utilisé pour l’algorithme Floyd-Warshall

Effectuer le tri topologique du graphe dressing.

le graphe

dressing = read.graph('http://silico.biotoul.fr/site/images/0/02/M1BBS_Graphe_dressing.ncol', format='ncol', directed=T)
dressing

## IGRAPH 94fca13 DN-- 8 8 -- 
## + attr: name (v/c)
## + edges from 94fca13 (vertex names):
## [1] chaussettes   ->chaussures pantalon      ->chaussures
## [3] chemise       ->ceinture   cravate       ->veste     
## [5] ceinture      ->veste      pantalon      ->ceinture  
## [7] sous-vetements->chaussures sous-vetements->pantalon

tri topologique (la réponse à la question s’il est sans circuit est plus bas)

topological.sort(dressing)

## + 8/8 vertices, named, from 94fca13:
## [1] chaussettes    chemise        cravate        sous-vetements pantalon      
## [6] chaussures     ceinture       veste

Parcours en largeur et en profondeur, et Bellman-Ford (graph.bfs, graph.dfs, distances)

Parcours en largeur

p=bfs(dressing, root='sous-vetements', rank=TRUE, dist=TRUE, unreachable=FALSE)
p

## $root
## [1] 8
## 
## $neimode
## [1] "out"
## 
## $order
## + 8/8 vertices, named, from 94fca13:
## [1] sous-vetements chaussures     pantalon       ceinture       veste         
## [6] <NA>           <NA>           <NA>          
## 
## $rank
##    chaussettes     chaussures       pantalon        chemise       ceinture 
##            NaN              2              3            NaN              4 
##        cravate          veste sous-vetements 
##            NaN              5              1 
## 
## $father
## NULL
## 
## $pred
## NULL
## 
## $succ
## NULL
## 
## $dist
##    chaussettes     chaussures       pantalon        chemise       ceinture 
##            NaN              1              1            NaN              2 
##        cravate          veste sous-vetements 
##            NaN              3              0

Affichage

plot(dressing, vertex.label = paste(V(dressing)$name,':', p$dist) , vertex.size=6)

Parcours en profondeur

dfs(dressing, root='sous-vetements', unreachable=FALSE, father=T, dist=T)

## $root
## [1] 7
## 
## $neimode
## [1] "out"
## 
## $order
## + 8/8 vertices, named, from 94fca13:
## [1] sous-vetements chaussures     pantalon       ceinture       veste         
## [6] <NA>           <NA>           <NA>          
## 
## $order.out
## NULL
## 
## $father
## + 8/8 vertices, named, from 94fca13:
## [1] chaussettes    chaussures     pantalon       chemise        ceinture      
## [6] cravate        veste          sous-vetements
## 
## $dist
##    chaussettes     chaussures       pantalon        chemise       ceinture 
##            NaN              1              1            NaN              2 
##        cravate          veste sous-vetements 
##            NaN              3              0

unreachable est TRUE par défaut et correspond à refaire les appels récursifs depuis les sommets non atteints à partir du sommet source fourni.

Sans circuit/cycle ?

is_dag(dressing)

## [1] TRUE

Bellman-Ford

bf = read.graph('http://silico.biotoul.fr/site/images/b/b0/M1BBS_Graphe_Bellman-Ford.ncol', format='ncol', directed=T)
bf.layout=layout.circle(bf)
bf

## IGRAPH 9db7d37 DNW- 5 10 -- 
## + attr: name (v/c), weight (e/n)
## + edges from 9db7d37 (vertex names):
##  [1] A->B A->E B->C B->D B->E C->B D->A D->C E->C E->D

Poids présents ?

E(bf)$weight

##  [1]  6  7  5 -4  8 -2  2  7 -3  9

Affichage

plot(bf, layout=bf.layout, edge.label=E(bf)$weight, vertex.size=15)

Application de BF

d=distances(bf, v='A', mode='out', algo='bellman-ford', weights = E(bf)$weight)
d

##   A B E C  D
## A 0 2 7 4 -2

Affichage

plot(bf, layout=bf.layout, edge.label=E(bf)$weight, vertex.label = paste(V(bf)$name,':', d), vertex.size=35 )

Pour le graphe FW, se sera plutôt l’algorithme de Johnson qui sera utilisé

fw = read.graph('http://silico.biotoul.fr/site/images/d/dd/M1BBS_Graphe_Graphe.Floyd-Warshall.ncol', format='ncol', directed=T)
fw.layout=layout.circle(bf)
fw

## IGRAPH 35693c8 DNW- 5 9 -- 
## + attr: name (v/c), weight (e/n)
## + edges from 35693c8 (vertex names):
## [1] A->B A->C A->E B->D B->E C->B D->A D->C E->D

Poids

E(fw)$weight

## [1]  3  8 -4  1  7  4  2 -5  6

Affichage

plot(fw, layout=bf.layout, edge.label=E(fw)$weight, vertex.size=15)

Plus courtes distances

shortest.paths(fw, mode='out')

##   A  B  C  E D
## A 0  1 -3 -4 2
## B 3  0 -4 -1 1
## C 7  4  0  3 5
## E 8  5  1  0 6
## D 2 -1 -5 -2 0

Autre exemple à partir d’une matrice d’adjacence ou d’un data.frame

Création d’un graphe à partir d’une matrice d’adjacence :

M = matrix(c(0,0,0,0, 1,0,1,0, 0,0,1,0, 4,2,0,0), ncol=4)
M

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    0    4
## [2,]    0    0    0    2
## [3,]    0    1    1    0
## [4,]    0    0    0    0

Affichage

g=graph.adjacency(M, mode='directed', weighted=TRUE)
plot(g, edge.width=E(g)$weight)

Librairie très utilisée maintenant pour la manipulation de données :

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## x purrr::compose()       masks igraph::compose()
## x tidyr::crossing()      masks igraph::crossing()
## x dplyr::filter()        masks stats::filter()
## x dplyr::groups()        masks igraph::groups()
## x dplyr::lag()           masks stats::lag()
## x purrr::simplify()      masks igraph::simplify()

Création à partir d’un data.frame (celui du 1er TP)

links.detailed = read_delim("511145.protein.links.detailed.v11.5.txt", delim=" ")

## Rows: 1083186 Columns: 10

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: " "
## chr (2): protein1, protein2
## dbl (8): neighborhood, fusion, cooccurence, coexpression, experimental, data...

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

links.detailed

Comme lors du 1er TP, ne gardons que les liens de coexpression et experimental >=800 :

links.filtered = links.detailed %>% 
  filter(protein1<protein2 & (coexpression>=800 | experimental>=800)) %>%
  select(protein1, protein2, coexpression, experimental)
links.filtered

Utilisons des noms de sommets plus lisibles :

proteins = read_delim("511145.protein.info.v11.5.txt", delim="\t")

## Rows: 4127 Columns: 4

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (3): #string_protein_id, preferred_name, annotation
## dbl (1): protein_size

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

proteins

g = graph_from_data_frame(d = links.filtered, directed = F, vertices = proteins)
V(g)$name = V(g)$preferred_name

Exploration

Accès aux sommets/arêtes : V(g) et E(g).

length(V(g))

## [1] 4127

Arêtes

length(E(g))

## [1] 5425

Beaucoup de sommets isolés (suite au filtre score >=800)

plot(g, vertex.label=NA, vertex.size=2)

Ne gardon que la plus grande CC

CC = clusters(g)
which ( CC$csize == max(CC$csize) )

## [1] 5

Extraction du sous-graphe induit

v5 = which(CC$membership == 5)
g5 = induced_subgraph(g, v5)
fr = layout.fruchterman.reingold(g5)
plot(g5, layout=fr, vertex.label=NA)

Diamètre

diameter(g5)

## [1] 17

C’est le plus long des plus courts chemins

shortest_distances = distances(g5)
max(shortest_distances)

## [1] 17

la betweenness d’un sommet ou d’une arête est le nombre de plus courts chemins passant par le sommet ou l’arête. Utiliser les fonctions `betweenness` et `edge.betweenness` pour calculer cette valeur et l’ajouter au dessin.

Centralité des sommets

node.betweenness = betweenness(g5)
# normalization min-max
minmax = function(vec, newmin=0, newmax=1) newmin + (newmax - newmin) * (vec - min(vec)) / (max(vec) - min(vec))
nb = minmax(node.betweenness, newmin=2, newmax=15)
plot(g5, layout=fr, vertex.size = nb, vertex.label=NA)

Centralité des arêtes

V(g5)$size = nb
eb = edge_betweenness(g5)
E(g5)$width = edge.width=minmax(eb, newmin=1, newmax=10)
plot(g5, layout=fr,  vertex.label=NA)

Arbre couvrant de poids minimum (`minimum.spanning.tree`)

mst = minimum.spanning.tree(g5)
mst

## IGRAPH 130610c UN-- 688 687 -- 
## + attr: name (v/c), preferred_name (v/c), protein_size (v/n),
## | annotation (v/c), size (v/n), coexpression (e/n), experimental (e/n),
## | width (e/n)
## + edges from 130610c (vertex names):
##  [1] carA--carB ftsL--ftsQ ftsI--ftsQ ftsW--ftsQ ftsA--ftsZ aceE--aceF
##  [7] aceF--lpd  ftsI--mrcB tsf --pyrH fabZ--lpxA bamA--rcsF metQ--metN
## [13] metI--metN dnaE--dnaQ yaaJ--prpC secA--phoA sbcC--sbcD phoB--phoR
## [19] dnaJ--yajL aceE--yajL cyoE--cyoB cyoD--cyoB clpP--clpX dnaK--htpG
## [25] glsA--ybaT carA--allB purK--purE fepA--entF fes --entC entF--entC
## [31] entD--entE entF--entE entF--entB fepG--entB entF--entA entB--entH
## + ... omitted several edges

Affichage

plot(mst, vertex.label=NA)

Partitionnement et détection de communautéss

Pour partitionner le graphe en communautés, différentes méthodes sont disponibles. Vous allez utilisez la betweenness des arêtes pour effectuer le partitionnement du graphe. Cette méthode sélectionne les arêtes dont la betweenness est la plus importantes afin de former des communautés (clusters).

com = edge.betweenness.community(g5)
par(cex=.5)
com

## IGRAPH clustering edge betweenness, groups: 22, mod: 0.38
## + groups:
##   $`1`
##    [1] "yaaJ" "lpxD" "fabZ" "lpxA" "yahF" "prpC" "cyoE" "cyoD" "cyoC" "cyoB"
##   [11] "cyoA" "fdrA" "gltA" "sdhC" "sdhD" "sdhA" "sdhB" "sucA" "sucB" "sucC"
##   [21] "sucD" "ybjT" "hcr"  "serS" "pyrD" "fabA" "pyrC" "plsX" "fabH" "fabD"
##   [31] "acpP" "fabF" "dauA" "fabI" "pfo"  "paaE" "pqqL" "sodB" "yeaX" "preT"
##   [41] "preA" "yfaE" "nuoN" "nuoM" "nuoL" "nuoK" "nuoJ" "nuoI" "nuoH" "nuoG"
##   [51] "nuoF" "nuoE" "nuoC" "nuoB" "nuoA" "ackA" "pta"  "fabB" "aroC" "eutD"
##   [61] "maeB" "hyfC" "hyfD" "hyfH" "hmp"  "acpS" "hycF" "hycD" "sdhE" "tdcD"
##   [71] "yraR" "bioH" "yhjJ" "spoT" "glmU" "atpB" "fre"  "sodA" "fpr"  "frdD"
##   [81] "frdC" "frdB" "frdA" "mscM" "ytfE" "qorB" "ylbE"
##   + ... omitted several groups/vertices

Contenu

names(com)

## [1] "removed.edges"    "edge.betweenness" "merges"           "bridges"         
## [5] "modularity"       "membership"       "names"            "vcount"          
## [9] "algorithm"

modularity(com)

## [1] 0.3838229

Affichage du meilleur partitionnement

plot(com, g5, layout=fr, vertex.size=6, vertex.label=NA, main=paste0("Q = ", round(modularity(com), 3)))

Layout

L’algorithme de Fruchterman-Reingold donne un résultats assez stisfaisant généralement sur les réseaux biologiques. Il modélise assez simplement un modèle physique (cf. cours). Nous allons comparer les performances sur une autre structure de grapĥe : un treillis.

lat=graph.lattice(length=10,dim=2)
# layout
lat.FR=layout.fruchterman.reingold(lat)
lat.MDS=layout.mds(lat)
# tracé
plot(lat, layout=lat.FR, vertex.size=3, vertex.label=NA)

MDS

plot(lat, layout=lat.MDS, vertex.size=3, vertex.label=NA)

Sur un petit graphe, les résultats sont assez comparables. Essayons sur un graphe un peu plus grand

lat=graph.lattice(length=30,dim=2)
# layout
lat.FR=layout.fruchterman.reingold(lat)
lat.MDS=layout.mds(lat)
# tracé
plot(lat, layout=lat.FR, vertex.size=3, vertex.label=NA)

MDS

plot(lat, layout=lat.MDS, vertex.size=3, vertex.label=NA)

FR se retrouve bloqué dans une configuration stable non optimale.

Une implémentation plus efficace de MDS consiste à ne considérer qu’un petit nombre de sommets “pivots” : High Dimensional Embedding (HDE).

Principes :

calculer les plus courts chemins entre chaque paire de sommets (utiliser la fonction shortest.paths(g))
déterminer les sommets pivots
- le premier est pris au hasard
- les suivants sont sélectionnés l’un après l’autre en maximisant à chaque fois leur distance aux pivots déjà sélectionnés
effectuer une ACP sur les coordonnées des sommets du graphes sur chacun des axes pivots
effectuer la projection par rapport aux 2 ou 3 composantes principales

pivot_selection = function(g, n.pivots) {
    g.sp = shortest.paths(g)
    g.pivots = c( 1 )
    g.others = 2:length(V(g))
    while ( length(g.pivots) < n.pivots ) {
        p.new.i = 1 # index in g.others
        p.new.best.i = g.others[1] # intialize with index of first non pivot
        p.new.best.dist = 0
        # iterate over non pivots (g.others)
        while ( p.new.i <= length(g.others) ) {
            p.new = g.others[ p.new.i ]
            p.new.dist = 0
            for (p.old in g.pivots) { # add shortest path to each selected pivots
                p.new.dist = p.new.dist + g.sp[ p.old, p.new ]
            }
            # keep it if it maximizes distance
            if (p.new.dist > p.new.best.dist) {
                p.new.best.dist = p.new.dist
                p.new.best.i = p.new.i
            }
            p.new.i = p.new.i + 1
        }
        # add best pivot to selected pivots
        g.pivots = c(g.pivots, g.others[p.new.best.i])
        # remove best pivot from non pivots
        g.others = g.others[ -p.new.best.i ]
    }
    g.pivots
}

pca = function(m, normalization=FALSE) {
  m = as.matrix(m)
  if (normalization) {
    m = apply(m, 2, function(x) x-mean(x))
    var.n = function(x) sum((x-mean(x))**2)/length(x)
    m = apply(m, 2, function(x) x/sqrt(var.n(x)) )
  }
  n = nrow(m)
  COV = 1/n * t(m) %*% m
  E=eigen(COV)
  E$values/sum(E$values)
  m.proj = m %*% E$vectors
  list(
    variances = E$values**2,
    eigen.vectors = E$vectors,
    coordinates = m.proj
  )
}

layout.hde = function(g, n.pivots=50, ndim=2) {
  # adjust number of pivots
  if (length(V(g)) <= n.pivots) {
    n.pivots = round( length(V(g))/4 )
  }
  g.pivots = pivot_selection(g, n.pivots)
  # PCA
  # coordinates in g.pivots dimensions
  g.sp = shortest.paths(g)
  U = g.sp[, g.pivots]
  pca.res = pca(U, normalization = T)
  pca.res$coordinates[, 1:ndim]
}

Utilisation

lat.HDE=layout.hde(lat, n.pivots=10)
plot(lat, layout=lat.HDE, vertex.size=3, vertex.label=NA)

R Notebook - Prise en main igraph

Prise en main de la librairie R - igraph

Autre exemple à partir d’une matrice d’adjacence ou d’un data.frame

la betweenness d’un sommet ou d’une arête est le nombre de plus courts chemins passant par le sommet ou l’arête. Utiliser les fonctions betweenness et edge.betweenness pour calculer cette valeur et l’ajouter au dessin.

Arbre couvrant de poids minimum (minimum.spanning.tree)

Partitionnement et détection de communautéss

Layout

Prise en main de la librairie R - `igraph`

la betweenness d’un sommet ou d’une arête est le nombre de plus courts chemins passant par le sommet ou l’arête. Utiliser les fonctions `betweenness` et `edge.betweenness` pour calculer cette valeur et l’ajouter au dessin.

Arbre couvrant de poids minimum (`minimum.spanning.tree`)