Many algorithms have been proposed for learning transcription regulatory networks from gene expression data. Bayesian network approaches, in particular the module network method, have obtained promising results. The genes in a module share a regulation program (a regression tree) consisting of a set of parents and conditional probability distributions. Sharing one program across a module significantly reduces the search space of models and consequently avoids overfitting. The regulation program of a module is normally learned by a deterministic search algorithm, which performs a series of greedy operations to maximize the Bayesian score. The major shortcoming of deterministic search is that its result may represent only one of several possible regulation programs. To account for this model uncertainty, we propose a regression tree-based Gibbs sampling algorithm for learning regulation programs in module networks. The novelty of this work is a set of tree operations for generating new regression trees from a given tree; we show that this set of operations is sufficient to produce a well-mixing Gibbs sampler even on large data sets. The effectiveness of our algorithm is demonstrated by experiments on synthetic data and real biological data.
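The contrast between greedy search and sampling can be illustrated with a minimal sketch. It is not the algorithm from the paper: it considers only one-split regression trees rather than the full set of tree operations, it replaces the Bayesian score with a plain Gaussian log-likelihood, and all names (`leaf_log_likelihood`, `split_score`, `greedy_choice`, `sample_choice`) and the toy data are hypothetical.

```python
import math
import random


def leaf_log_likelihood(values):
    """Gaussian log-likelihood of one leaf, using the leaf's own mean and variance.

    Assumption: this stands in for the Bayesian score used in module networks.
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    var = max(ss / n, 1e-6)  # floor the variance to avoid log(0)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)


def split_score(regulator, module_expr, threshold=0.0):
    """Score a one-split tree: conditions are partitioned by the regulator's level."""
    low = [m for r, m in zip(regulator, module_expr) if r <= threshold]
    high = [m for r, m in zip(regulator, module_expr) if r > threshold]
    return leaf_log_likelihood(low) + leaf_log_likelihood(high)


def greedy_choice(regulators, module_expr):
    """Deterministic search step: always return the single best-scoring regulator."""
    return max(regulators, key=lambda name: split_score(regulators[name], module_expr))


def sample_choice(regulators, module_expr, rng):
    """Sampling step: draw a regulator with probability proportional to exp(score)."""
    names = list(regulators)
    scores = [split_score(regulators[name], module_expr) for name in names]
    top = max(scores)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(names, weights=weights, k=1)[0]


# Toy data: mean expression of one module across six conditions.
module_expr = [1.0, 1.2, 0.9, -1.0, -1.1, -0.8]

# Candidate regulators; R1 and R3 induce the same partition of the conditions.
regulators = {
    "R1": [1.0, 1.0, 1.0, -1.0, -1.0, -1.0],
    "R2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    "R3": [0.8, 0.9, 0.7, -0.6, -0.9, -0.7],
}
```

On this toy data `R1` and `R3` score identically, so `greedy_choice` reports only one of them, while repeated calls to `sample_choice` return both. That is the kind of model uncertainty a sampler is meant to expose.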
|Title of host publication||2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010|
|Number of pages||8|
|Publication status||Published - 1 Jan 2010|