Xuan Guo

Assistant Professor
Discovery Park F290
  • Education
    • PhD, Georgia State University, 2015.
      Major: Computer Science
    • MS, Wuhan University, 2011.
      Major: Computer Science
    • BS, Southwest University, 2009.
      Major: Computer Science
  • Research

    High-throughput omics technologies are revolutionizing many aspects of modern biology, and biology research is entering the era of Big Data. The massive sample sizes and high dimensionality of Big Data introduce unique computational and statistical challenges, including storage bottlenecks, measurement errors, noise accumulation, and spurious correlation.

    We study data mining, machine learning, big data analysis, and data fusion. Our research focuses primarily on developing advanced frameworks and algorithms that help biologists solve practical problems in systems biology, biomedicine, and the natural sciences, and that enable them to make full use of massive, high-dimensional data for a wide range of biological inquiries.

    Metaproteomics Search

    Microbial communities drive nutrient cycling in aquatic and terrestrial ecosystems and influence the health of human, animal, and plant hosts. The metabolic activities of a microbial community can be inferred from the proteomes of its constituent microorganisms. In a typical metaproteomics experiment, total proteins are extracted from environmental samples of a microbial community and then measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using a “shotgun” proteomics approach. Because complex communities contain thousands of organisms, metaproteomics analyses yield far fewer peptide and protein identifications than comparable proteomics analyses of single organisms. In this project, we aim to develop novel database searching and filtering algorithms.

    Software: Sipros Ensemble.

    Genome-wide Association Studies

    Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than two SNPs pose computational and analytical challenges, because the computation increases exponentially as the number of SNPs in each combination grows. In this project, we design fast and powerful methods to detect genome-wide multi-locus epistatic interactions across multiple cases.
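    The growth in computation described above can be made concrete by counting how many distinct SNP subsets an exhaustive search would have to test at each interaction order. This is a back-of-the-envelope sketch, not a description of DCHE, DAM, or any other tool listed here. The panel size `N_SNPS = 500_000` is a hypothetical figure chosen for illustration, and `n_combinations` is a hypothetical helper that simply wraps `math.comb`.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of the given interaction order."""
    return comb(n_snps, order)


# Hypothetical genotyping panel size, for illustration only.
N_SNPS = 500_000

if __name__ == "__main__":
    for k in range(1, 5):
        print(f"order {k}: {n_combinations(N_SNPS, k):.3e} candidate combinations")
```

    Going from order k to order k + 1 multiplies the count by (n - k)/(k + 1). With n = 500,000 there are 124,999,750,000 pairs, and each further order multiplies that by roughly another 10^5, which is why exhaustive testing beyond pairwise interactions quickly becomes infeasible without search heuristics.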

    Software: DCHE, DAM, MSCD.

    Sequence Assembly

    Next-generation sequencing platforms not only decrease the cost of metagenomics data analysis but also greatly enlarge metagenomic sequence datasets. A common bottleneck of available assemblers is that, in large-scale sequencing projects, the trade-off between the noise in the resulting contigs and the gain in sequence length for better annotation has not received enough attention, especially for datasets with moderate coverage and a large number of nonoverlapping contigs. To address this limitation and improve both accuracy and efficiency, we develop novel metagenomic sequence assembly frameworks and algorithms that take advantage of high-performance computing and cyberinfrastructure.

    Software: DISCO, DIME.

    fMRI Analysis

    An intriguing question in brain science is: what are the origins of, and the principles behind, the functional architectures that to a great extent define who we are and what we are? Functional magnetic resonance imaging (fMRI) is one of the most common methods for exploring the functional activities of the whole brain. After decades of active research, there is substantial evidence that brain function emerges from the interaction of multiple concurrent neural processes or brain networks. We design novel statistical models to analyze functional brain dynamics among the nodes of brain networks over time.

    Software: BCCPM.

  • Publications

    Journal Articles

    Guo, X., Li, Z., Yao, Q., Mueller, R. S., Eng, J. K., Tabb, D. L., Hervey, W. J., & Pan, C. (2017). Sipros Ensemble Improves Database Searching and Filtering for Complex Metaproteomics. Bioinformatics. (Accepted).

    Yu, N., Guo, X., Zelikovsky, A., & Pan, Y. (2017). GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences. BMC Genomics, 18(4), 392. https://doi.org/10.1186/s12864-017-3731-5

    Guo, X., Zhang, J., Cai, Z., Du, D. Z., & Pan, Y. (2017). Searching genome-wide multi-locus associations for multiple diseases based on Bayesian Inference. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(3), 600-610. https://doi.org/10.1109/TCBB.2016.2527648

    Yu, N., Guo, X., Gu, F., & Pan, Y. (2016). Signalign: An ontology of DNA as signal for comparative gene structure prediction using information-coding-and-processing techniques. IEEE Transactions on NanoBioscience, 15(2), 119-130. https://doi.org/10.1109/TNB.2016.2537831

    Guo, X., Liu, B., Chen, L., Chen, G., Pan, Y., & Zhang, J. (2016). Bayesian inference for functional dynamics exploring in fMRI data. Computational and Mathematical Methods in Medicine, 2016. https://doi.org/10.1155/2016/3279050

    Ding, X., Wang, J., Zelikovsky, A., Guo, X., Xie, M., & Pan, Y. (2015). Searching high-order SNP combinations for complex diseases based on energy distribution difference. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(3), 695-704. https://doi.org/10.1109/TCBB.2014.2363459

    Guo, X., Yu, N., Ding, X., Wang, J., & Pan, Y. (2015). DIME: A novel framework for de novo metagenomic sequence assembly. Journal of Computational Biology, 22(2), 159-177. https://doi.org/10.1089/cmb.2014.0251

    Fu, Y., Chen, G., Guo, X., Zhang, J., & Pan, Y. (2015). Analyzing the effects of pretreatment diversity on HCV drug treatment responsiveness using Bayesian partition methods. Journal of Bioinformatics and Proteomics Review, 1(1), 1. PMCID: PMC4597793

    Guo, X., Yu, N., Gu, F., Ding, X., Wang, J., & Pan, Y. (2014). Genome-wide interaction-based association of human diseases: a survey. Tsinghua Science and Technology, 19(6), 596-616. https://doi.org/10.1109/TST.2014.6961029

    Guo, X., Meng, Y., Yu, N., & Pan, Y. (2014). Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering. BMC Bioinformatics, 15(1), 102. https://doi.org/10.1186/1471-2105-15-102

    Zeng, T., Guo, X., & Liu, J. (2014). Negative correlation based gene markers identification in integrative gene expression data. International Journal of Data Mining and Bioinformatics, 10(1), 1-17. https://doi.org/10.1504/IJDMB.2014.062889

    Conference Papers

    Lian, Z., Li, X., Pan, Y., Guo, X., Chen, L., Chen, G., Wei, Z., Liu, T., & Zhang, J. (2015, October 15). Dynamic Bayesian brain network partition and connectivity change point detection. In 2015 IEEE 5th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) (pp. 1-6). IEEE.

    Yu, N., Guo, X., Zelikovsky, A., & Pan, Y. (2015, October 15). GaussianCpG: A Gaussian model for detection of human CpG island. In 2015 IEEE 5th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS) (p. 1). IEEE.

    Yu, N., Guo, X., Gu, F., & Pan, Y. (2015, June 6). DNA AS X: An information-coding-based model to improve the sensitivity in comparative gene analysis. In International Symposium on Bioinformatics Research and Applications (pp. 366-377). Springer, Cham.

    Guo, X., Zhang, J., Cai, Z., Du, D. Z., & Pan, Y. (2015, June 6). DAM: A Bayesian method for detecting genome-wide associations on multiple diseases. In International Symposium on Bioinformatics Research and Applications (pp. 96-107). Springer, Cham.

    Yu, N., Gu, F., Guo, X., & He, Z. (2015, April 12). A fine-grained flow control model for cloud-assisted data broadcasting. In Proceedings of the 18th Symposium on Communications & Networking (pp. 24-31). Society for Computer Simulation International.

    Guo, X., Ding, X., Meng, Y., & Pan, Y. (2013, May 20). Cloud computing for de novo metagenomic sequence assembly. In International Symposium on Bioinformatics Research and Applications (pp. 185-198). Springer, Berlin, Heidelberg.

    Zeng, T., Guo, X., & Liu, J. (2010, December 18). Discovering negative correlated gene sets from integrative gene expression data for cancer prognosis. In 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 489-492). IEEE.

    Books and Book Chapters

    Nguyen, K., Guo, X., & Pan, Y. (2016). Multiple Biological Sequence Alignment: Scoring Functions, Algorithms and Evaluation. John Wiley & Sons. ISBN: 978-1-118-22904-0

    Guo, X., Yu, N., Li, B., & Pan, Y. (2016). Cloud Computing for Next-Generation Sequencing Data Analysis. In I. Mandoiu & A. Zelikovsky (Eds.), Computational Methods for Next Generation Sequencing Data Analysis. John Wiley & Sons, Inc., Hoboken, NJ, USA. https://doi.org/10.1002/9781119272182.ch1