
the carriage return (i.e. the key labeled “Enter” on the keyboard) should be pressed immediately after LABEL_END.
 

 

Figure 2: An example input expression data file with two sets of labels 

 
b. Interactome data: The interactome data file is tab-separated and has two columns: the first column corresponds to hubs and the second column to interactors. The first row of the data file is assumed to be the header and is ignored while reading the data. For protein interactomes, the names of hubs and their interactors can be provided as Entrez IDs or gene symbols but not as a combination of the two. For microRNA interactomes, the names of interactors can be provided as Entrez IDs or gene symbols.
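The file layout described above can be checked before it is passed to the package. The following is a minimal sketch in base R, not part of the VAN package: the file contents, the column names, and the labels "entrez", "symbol" and "mixed" are all hypothetical, and the check assumes a protein interactome in which an identifier made only of digits is an Entrez ID.

```r
# Write a small, purely illustrative interactome file:
# tab-separated, one header row, hub in column 1, interactor in column 2.
map_path <- tempfile(fileext = ".txt")
writeLines(c("Hub\tInteractor",
             "ABL1\tCRK",
             "ABL1\tGRB2",
             "TP53\tMDM2"), map_path)

# Read the file, treating the first row as a header and keeping IDs as text.
ppi <- read.delim(map_path, header = TRUE, colClasses = "character")
stopifnot(ncol(ppi) == 2)

# Assumption: purely numeric identifiers are Entrez IDs, all others are symbols.
is_entrez <- grepl("^[0-9]+$", c(ppi[[1]], ppi[[2]]))
id_type <- if (all(is_entrez)) {
  "entrez"
} else if (!any(is_entrez)) {
  "symbol"
} else {
  "mixed"
}

# A protein interactome must not combine the two identifier types.
if (id_type == "mixed") {
  stop("Hubs and interactors mix Entrez IDs and gene symbols")
}
```

For a microRNA interactome the same idea would apply to the interactor column only, since the hubs are microRNA names.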

 
 

Input parameters

 

We assume that a network module comprises two types of genes: hub and interactor. The hub gene is connected to all the interactor genes and every interactor gene is connected to only the hub gene. The main function for identifying the enriched network modules is identifySignificantHubs. This function has 14 input parameters and a detailed description of the parameters is provided in the PDF file VAN_Package_Functions.pdf. Of the 14 input parameters, only four have to be explicitly specified by the user: exprFile, labelIndex, mapFile, and outFile. The default values for the remaining 10 parameters need to be changed only under certain conditions and below we describe those conditions.
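The hub-and-interactor structure assumed here is a star: one hub with its interactors attached. As a rough illustration only, the sketch below groups a hypothetical list of hub-interactor pairs into such modules in base R. The data frame, its column names, and the variable names are assumptions made for this example and do not reflect how identifySignificantHubs is implemented.

```r
# Hypothetical hub-interactor pairs (illustrative only, not package data).
ppi <- data.frame(
  hub        = c("ABL1", "ABL1", "ABL1", "TP53", "TP53"),
  interactor = c("CRK", "GRB2", "RB1", "MDM2", "EP300"),
  stringsAsFactors = FALSE
)

# Each module is a star: one hub plus the interactors attached to it.
modules <- split(ppi$interactor, ppi$hub)

# Number of interactors in each module.
module_size <- lengths(modules)
print(module_size)
```

Here `modules` is a named list with one entry per hub, so the size of a module is simply the length of its entry.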
 

Parameter    Condition where the default value should be changed

hubSize = 5    By default, only those network modules that have at least 5 interactors in the expression data set are considered for downstream analysis. This value can be changed to consider denser or sparser modules.

randomizeCount = 1000    By default, 1000 permutations are performed to determine the p-value. The user can select a higher number of permutations; however, it should be noted
