
Section 8: Conversion of gene symbols to Entrez IDs 

 

Expression data 

If the expression data and interactome data contain gene labels in different formats, i.e. one uses Entrez IDs and the other uses gene symbols, then the gene symbols are mapped to Entrez IDs. The mapping relies on the Bioconductor [18] annotation files, which in some instances have no Entrez ID for a given gene symbol. The gene symbols that could not be mapped to Entrez IDs are saved in the following files:
 
a. Error_Expr.txt: This file contains the gene symbols that were present in the expression data but could not be mapped to Entrez IDs.
b. Error_PPI_Hubs.txt: This file contains the hubs for which the gene symbols could not be mapped to Entrez IDs.
c. Error_PPI_Int.txt: This file contains the interactors for which the gene symbols could not be mapped to Entrez IDs.
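
The idea of separating mapped from unmapped gene symbols can be illustrated with a minimal base-R sketch. It is not the package's own code: the lookup table `symbol_to_entrez`, the label `NOT_A_GENE`, and the one-symbol-per-line layout of the error file are assumptions made for illustration. In practice the lookup would come from the Bioconductor annotation data rather than a hand-written vector.

```r
# Hypothetical lookup table standing in for the Bioconductor annotation data.
symbol_to_entrez <- c(ABL1 = "25", BRCA1 = "672", TP53 = "7157")

# Gene symbols as they might appear in an expression file; "NOT_A_GENE" is a
# made-up label that has no entry in the lookup table.
expr_symbols <- c("TP53", "NOT_A_GENE", "ABL1")

# Look up each symbol; symbols without a mapping yield NA.
entrez_ids <- unname(symbol_to_entrez[expr_symbols])
has_id <- !is.na(entrez_ids)

# Symbols that were mapped successfully, with their Entrez IDs.
mapped <- data.frame(symbol = expr_symbols[has_id],
                     entrez = entrez_ids[has_id],
                     stringsAsFactors = FALSE)

# Symbols that could not be mapped.
unmapped <- expr_symbols[!has_id]

# Save the unmapped symbols, one per line. The file name follows the list
# above; the layout is an assumption, and tempdir() is used only to keep the
# sketch from writing into the working directory.
error_file <- file.path(tempdir(), "Error_Expr.txt")
writeLines(unmapped, error_file)
```

The same pattern would apply to the hubs and interactors of the interactome file, with the unmapped labels going to Error_PPI_Hubs.txt and Error_PPI_Int.txt respectively.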

 

Interactome data 

The functions generatePpiMap and generateMicroRnaMap are used to generate the interactome data files. These functions return the hub-interactor pairs in two formats: one using Entrez IDs and the other using gene symbols. As with the expression data, the mapping is sometimes unavailable; here it is the Entrez ID to gene symbol mapping that can fail. The pairs affected are saved in the following files:
 
a. Error_Mirnome_Generation.txt: This file contains the microRNA-gene pairs for which the Entrez IDs could not be mapped to gene symbols.
b. Error_PPI_Generation.txt: This file contains the gene-gene pairs (hub-interactor pairs) for which the Entrez IDs (for at least one of the genes in the pair) could not be mapped to gene symbols.
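
The "at least one of the genes in the pair" rule can be sketched in base R as follows. This is an illustrative sketch only, not the implementation of generatePpiMap: the lookup table `entrez_to_symbol`, the placeholder IDs `999999999` and `888888888`, and the tab-delimited layout of the error file are all assumptions.

```r
# Hypothetical lookup table standing in for the Bioconductor annotation data.
entrez_to_symbol <- c("25" = "ABL1", "672" = "BRCA1", "7157" = "TP53")

# Hub-interactor pairs as Entrez IDs; 999999999 and 888888888 are made-up
# placeholder IDs that have no entry in the lookup table.
pairs <- data.frame(hub        = c("7157", "25", "999999999"),
                    interactor = c("672", "888888888", "7157"),
                    stringsAsFactors = FALSE)

# Map both members of every pair; an ID without a mapping yields NA.
hub_symbol <- unname(entrez_to_symbol[pairs$hub])
int_symbol <- unname(entrez_to_symbol[pairs$interactor])

# A pair fails if either the hub or the interactor could not be mapped.
failed <- is.na(hub_symbol) | is.na(int_symbol)

# Pairs that can be reported as gene symbols.
symbol_pairs <- data.frame(hub        = hub_symbol[!failed],
                           interactor = int_symbol[!failed],
                           stringsAsFactors = FALSE)

# Pairs that go to the error file, kept as Entrez IDs.
error_pairs <- pairs[failed, ]
error_file <- file.path(tempdir(), "Error_PPI_Generation.txt")
write.table(error_pairs, error_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

In this sketch a pair is excluded from the gene-symbol output as soon as one member is unmapped, even if the other member has a valid symbol.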

 

 
  
 
 

 

References

[18] Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 2004, 5(10).
