silico.biotoul.fr
 

Prioritization:Transcriptome

Current revision as of 15:19, 20 January 2011

Proximity measure

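The proximity measure currently implemented is the following: a distance matrix is built in which each entry is 1 minus the Pearson correlation coefficient between the expression profiles of a pair of genes (the "Pearson distance", which ranges from 0 for perfectly correlated profiles to 2 for perfectly anti-correlated ones). The proximity of a candidate gene to a set of known genes is then computed as the average of the Pearson distances between the candidate gene and each known gene, so lower values indicate a closer candidate.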
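A minimal Python sketch of this measure, assuming each gene's expression profile is a list of values over the same hybridizations. The function names and gene identifiers are hypothetical and are not taken from the actual implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles.

    Undefined for a constant profile (raises ZeroDivisionError).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance_matrix(profiles):
    """Map every ordered gene pair to its Pearson distance, 1 - r."""
    genes = list(profiles)
    return {
        (g, h): 1.0 - pearson(profiles[g], profiles[h])
        for g in genes
        for h in genes
    }


def proximity(candidate, known, dist):
    """Average Pearson distance from the candidate to each known gene (lower = closer)."""
    return sum(dist[(candidate, k)] for k in known) / len(known)
```

For example, with `profiles = {"geneA": [1, 2, 3, 4], "geneB": [2, 4, 6, 8], "geneC": [4, 3, 2, 1]}`, `geneA` is at distance 0 from `geneB` and 2 from `geneC`, so its proximity to the set `["geneB", "geneC"]` is 1.0. Computing the full matrix once lets several candidates be scored against the same known set without recomputing correlations.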

Data

Public datasets were downloaded from GEO as GSE series. For each species, a single platform (GPL) was selected: the one with the most hybridizations at the time the data were compiled. The GSE series were then screened manually to keep only those performed with the selected strain (some platforms are also used with strains other than the one the array was designed for).
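The platform selection rule can be sketched in Python as follows, assuming the GEO metadata has already been reduced to one (species, GPL, number of hybridizations) record per GSE series. The record format and all identifiers are hypothetical; the strain screen was performed manually and is not modeled here.

```python
from collections import defaultdict


def select_platforms(series_records):
    """For each species, return the GPL with the most hybridizations.

    series_records: iterable of (species, gpl, n_hybridizations) tuples,
    one per GSE series (hypothetical format). Ties keep the first GPL seen.
    """
    # Sum hybridizations over all series run on the same platform.
    totals = defaultdict(int)
    for species, gpl, n_hybridizations in series_records:
        totals[(species, gpl)] += n_hybridizations

    # Keep, per species, the platform with the largest total.
    best = {}
    for (species, gpl), total in totals.items():
        current = best.get(species)
        if current is None or total > totals[(species, current)]:
            best[species] = gpl
    return best
```

Counting per platform rather than per series means a GPL used by many small series can win over one used by a single large series.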

Outline:

  1. download: raw data were downloaded when available; otherwise, the normalized data were used
  2. mapping of identifiers to our internal database (CGDB, the complete genomes database)
  3. normalization of all arrays

This results in an expression profile across all hybridizations for each gene.
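Steps 2 and 3 of the outline can be sketched in Python as below. The page does not state which normalization was applied, so this sketch uses quantile normalization purely as an illustration. It also assumes that each hybridization is a dict of probe identifier to value, that a probe-to-CGDB mapping dict is available, that several probes mapping to the same gene are averaged, and that only genes measured on every hybridization are kept. All names are hypothetical.

```python
from collections import defaultdict


def map_identifiers(hybridization, probe_to_gene):
    """Map probe values to gene identifiers, averaging probes of the same gene."""
    values = defaultdict(list)
    for probe, value in hybridization.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # unmapped probes are dropped
            values[gene].append(value)
    return {gene: sum(v) / len(v) for gene, v in values.items()}


def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one per hybridization).

    Each value is replaced by the mean, across arrays, of the values with
    the same rank. Ties are ranked in index order.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


def build_profiles(hybridizations, probe_to_gene):
    """Return {gene: expression profile across all hybridizations}."""
    mapped = [map_identifiers(h, probe_to_gene) for h in hybridizations]
    # Keep genes present on every hybridization so profiles have equal length.
    genes = sorted(set.intersection(*(set(m) for m in mapped)))
    columns = [[m[g] for g in genes] for m in mapped]
    normalized = quantile_normalize(columns)
    return {g: [column[i] for column in normalized] for i, g in enumerate(genes)}
```

The resulting dictionary has one profile per gene, which is the input assumed by the Pearson distance computation.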

The gene expression profiles are used in the prioritization to evaluate the proximity of a given candidate gene to a set of known genes.

Details for each available strain: