silico.biotoul.fr
 

Prioritization:Transcriptome

Public datasets were downloaded from [http://www.ncbi.nlm.nih.gov/geo/ GEO] as GSE series. For each species, a single platform (GPL) was selected: the one with the most hybridizations at the time of data compilation. The GSE series were then screened manually to keep only those performed with the selected strain, since some platforms are also used with strains other than the one the array was designed for.
 
Outline:
# Download: raw data were downloaded when available; otherwise the normalized data were used.
# Identifier mapping to our internal database (CGDB, the complete genomes database).
# Normalization of all the arrays.
 
This results in an expression profile across all hybridizations for each gene.
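As an illustration only, the resulting data structure can be sketched as one list of values per gene, with one entry per hybridization. The function name <code>build_profiles</code>, the dictionary <code>probe_to_gene</code> (a stand-in for the CGDB identifier mapping) and the choice to average probes that map to the same gene are assumptions made for this sketch; they are not taken from the actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def build_profiles(hybridizations, probe_to_gene):
    """Return {gene_id: [value for each hybridization]}.

    hybridizations: list of {probe_id: normalized value}, one dict per array.
    probe_to_gene:  {probe_id: gene_id}; hypothetical stand-in for the CGDB mapping.
    """
    genes = sorted(set(probe_to_gene.values()))
    profiles = defaultdict(list)
    for array in hybridizations:
        per_gene = defaultdict(list)
        for probe_id, value in array.items():
            gene_id = probe_to_gene.get(probe_id)
            if gene_id is not None:  # probes without a mapping are skipped
                per_gene[gene_id].append(value)
        for gene_id in genes:
            values = per_gene.get(gene_id)
            # Assumption: probes of the same gene are averaged;
            # None marks a gene that was not measured on this array.
            profiles[gene_id].append(mean(values) if values else None)
    return dict(profiles)


# Two toy arrays; probes p1 and p2 both map to geneA.
arrays = [
    {"p1": 1.0, "p2": 3.0, "p3": 5.0},
    {"p1": 2.0, "p3": 6.0},
]
mapping = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}
profiles = build_profiles(arrays, mapping)
```

Each profile has the same length (the number of hybridizations), which is what allows two genes to be compared position by position.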
 
The gene expression profiles are used in the '''prioritization''' to evaluate the ''proximity'' of a given candidate gene to a set of ''known genes''.
 
The proximity measure currently implemented is the following: a distance matrix is built, holding 1 minus the ''Pearson'' correlation coefficient for each pair of genes, so that distances range from 0 (perfectly correlated profiles) to 2 (perfectly anti-correlated profiles). The proximity of a candidate gene to a set of known genes is then computed as the average of the "Pearson distances" between the candidate gene and each known gene.
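The computation described above can be sketched as follows. This is a minimal illustration, not the code used on the site: the function names and the <code>profiles</code> dictionary (gene identifier to list of expression values) are hypothetical, and distances are computed on demand rather than stored in a full matrix.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant profile has zero variance and would raise ZeroDivisionError here.
    return cov / (norm_x * norm_y)


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical trends, 2 for opposite trends."""
    return 1.0 - pearson(x, y)


def proximity(candidate, known_genes, profiles):
    """Average Pearson distance from the candidate gene to each known gene."""
    distances = [
        pearson_distance(profiles[candidate], profiles[gene])
        for gene in known_genes
    ]
    return sum(distances) / len(distances)


# Toy profiles: k1 follows the candidate exactly, k2 is its mirror image.
profiles = {
    "cand": [1.0, 2.0, 3.0, 4.0],
    "k1": [2.0, 4.0, 6.0, 8.0],
    "k2": [4.0, 3.0, 2.0, 1.0],
}
score = proximity("cand", ["k1", "k2"], profiles)  # about 1.0 (distances 0 and 2)
```

Since the score is an average of distances, smaller values indicate a candidate whose expression profile is closer to those of the known genes.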
 
Details for each available strain:
* [[Prioritization:BsubA Transcriptome|''Bacillus subtilis'' 168]]
* [[Prioritization:EcolA Transcriptome|''Escherichia coli'' K12]]
* [[Prioritization:HspeA Transcriptome|''Halobacterium'' sp. NRC-1]]
* [[Prioritization:PaerA Transcriptome|''Pseudomonas aeruginosa'' PAO1]]
* [[Prioritization:SaurH Transcriptome|''Staphylococcus aureus'' NCTC 8325]]
* [[Prioritization:SspeA Transcriptome|''Synechocystis'' sp. PCC6803]]
* [[Prioritization:StypA Transcriptome|''Salmonella typhimurium'' LT2]]

Revision as of 14:31, 18 January 2011
