silico.biotoul.fr
 

Prioritization:Transcriptome



Proximity measure

The proximity measure currently implemented is the following: a distance matrix is built, containing 1 minus the Pearson correlation coefficient for each pair of genes. This "Pearson distance" ranges from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated profiles). The proximity of a candidate gene to a set of known genes is then computed as the average of the Pearson distances from the candidate gene to each known gene.
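As an illustration, the measure described above can be sketched in a few lines of Python. This is a minimal sketch, not the code used by the site: the function names (`pearson`, `pearson_distance`, `proximity`), the dictionary layout of the profiles and the toy values are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has no defined correlation.
        raise ValueError("profile with zero variance")
    return cov / sqrt(var_x * var_y)


def pearson_distance(x, y):
    """1 - r: 0 for perfectly correlated, 2 for anti-correlated profiles."""
    return 1.0 - pearson(x, y)


def proximity(candidate, known_genes, profiles):
    """Average Pearson distance from the candidate to each known gene."""
    distances = [
        pearson_distance(profiles[candidate], profiles[gene])
        for gene in known_genes
    ]
    return sum(distances) / len(distances)


# Hypothetical toy profiles: one value per hybridization.
profiles = {
    "cand":  [1.0, 2.0, 3.0, 4.0],
    "geneA": [1.0, 2.0, 3.0, 4.0],  # same shape as the candidate
    "geneB": [2.0, 4.0, 6.0, 8.0],  # scaled copy, still r = 1
    "geneC": [4.0, 3.0, 2.0, 1.0],  # reversed, r = -1
}

close = proximity("cand", ["geneA", "geneB"], profiles)
mixed = proximity("cand", ["geneA", "geneC"], profiles)
```

With these toy values, a lower score means the candidate is closer to the known set: `close` is near 0, while `mixed` averages one distance of 0 and one of 2.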

Data

Public datasets were downloaded from GEO as GSE series. For each species, only one platform (GPL) was selected: the one with the most hybridizations at the time of data compilation. The GSE series were then screened manually to keep only those carried out with the selected strain (some platforms are also used with strains other than the one the array was designed for).

Outline:

  1. download: raw data were downloaded when available; otherwise the normalized data were used
  2. mapping of identifiers to our internal database (CGDB, for complete genomes database)
  3. normalization of all the arrays

This results in an expression profile across all hybridizations for each gene.
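How the identifier mapping leads to one profile per gene can be sketched as follows. This is only an illustrative sketch under assumptions: the function `build_profiles`, the probe and gene identifiers, and the choice of averaging several probes that map to the same gene are all hypothetical. The normalization step is left out because its method is not specified here.

```python
from collections import defaultdict


def build_profiles(arrays, probe_to_gene):
    """Assemble one expression profile per gene across all hybridizations.

    arrays: list of dicts {probe_id: value}, one dict per hybridization.
    probe_to_gene: dict {probe_id: gene_id} (hypothetical mapping table).
    Probes without a mapping are ignored; a gene with no probe on a given
    array gets None at that position.
    """
    genes = sorted(set(probe_to_gene.values()))
    profiles = {gene: [] for gene in genes}
    for array in arrays:
        # Group the values of this hybridization by gene.
        per_gene = defaultdict(list)
        for probe, value in array.items():
            gene = probe_to_gene.get(probe)
            if gene is not None:
                per_gene[gene].append(value)
        # Assumption: several probes for one gene are averaged.
        for gene in genes:
            values = per_gene.get(gene)
            profiles[gene].append(sum(values) / len(values) if values else None)
    return profiles


# Hypothetical example: two hybridizations, three mapped probes.
probe_to_gene = {"p1": "g1", "p2": "g1", "p3": "g2"}
arrays = [
    {"p1": 1.0, "p2": 3.0, "p3": 5.0},
    {"p1": 2.0, "p3": 6.0, "unmapped": 9.0},
]
example_profiles = build_profiles(arrays, probe_to_gene)
```

Each resulting list has one entry per hybridization, in the order the arrays were given, which is the form needed to compare genes pairwise.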

The gene expression profiles are used in the prioritization to evaluate the proximity of a given candidate gene to a set of known genes.

Details for each available strain: