- PhD, Georgia State University, 2015.
Major: Computer Science
- MS, Wuhan University, 2011.
Major: Computer Science
- BS, Southwest University, 2009.
Major: Computer Science
High-throughput omics technologies are revolutionizing many aspects of modern biology, and we are entering the era of Big Data in biological research. The massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including storage bottlenecks, measurement errors, noise accumulation, and spurious correlations.
We study data mining, machine learning, big data analysis, and data fusion. Our research focuses on developing advanced frameworks and algorithms that help biologists solve practical problems in systems biology, biomedicine, and the natural sciences, and that enable them to make full use of massive, high-dimensional data for various biological inquiries.
Microbial communities drive nutrient cycling in aquatic and terrestrial ecosystems and influence the health of human, animal, and plant hosts. The metabolic activities of a microbial community can be inferred from the proteomes of its constituent microorganisms. In a typical metaproteomics experiment, total proteins are extracted from environmental samples of a microbial community and then measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS) using a “shotgun” proteomics approach. Because complex communities contain thousands of organisms, metaproteomics analyses yield far fewer peptide and protein identifications than comparable proteomics analyses of single organisms. In this study, we aim to develop novel database searching and filtering algorithms.
Software: Sipros Ensemble.
Genome-wide Association Studies
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) are considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than two SNPs pose computational and analytical challenges, because the computation grows exponentially as the cardinality of the SNP combinations increases. In this project, we design fast and powerful methods to detect genome-wide multi-locus epistatic interactions across multiple cases.
Next-generation sequencing platforms not only decrease the cost of metagenomics data analysis but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not received enough attention in large-scale sequencing projects, especially for datasets with moderate coverage and a large number of nonoverlapping contigs. To address this limitation and improve both accuracy and efficiency, we develop novel metagenomic sequence assembly frameworks and algorithms that take advantage of high-performance computing and cyberinfrastructure.
An intriguing question in brain science is: what are the origins and principles behind the functional architectures that, to a great extent, define who and what we are? Among available methods, functional Magnetic Resonance Imaging (fMRI) is one of the most common ways to explore the functional activity of the whole brain. After decades of active research, there is ample evidence that brain function is realized by, and emerges from, the interaction of multiple concurrent neural processes or brain networks. We design novel statistical models to analyze the functional dynamics among the nodes of brain networks over time.
Guo, X., Li, Z., Yao, Q., Mueller, R. S., Eng, J. K., Tabb, D. L., Hervey, W. J., & Pan, C. (2017). Sipros Ensemble Improves Database Searching and Filtering for Complex Metaproteomics. Bioinformatics. (Accepted).
Yu, N., Guo, X., Zelikovsky, A., & Pan, Y. (2017). GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences. BMC genomics, 18(4), 392. https://doi.org/10.1186/s12864-017-3731-5
Guo, X., Zhang, J., Cai, Z., Du, D. Z., & Pan, Y. (2017). Searching genome-wide multi-locus associations for multiple diseases based on Bayesian Inference. IEEE/ACM transactions on computational biology and bioinformatics, 14(3), 600-610. DOI: 10.1109/TCBB.2016.2527648
Yu, N., Guo, X., Gu, F., & Pan, Y. (2016). Signalign: An ontology of DNA as signal for comparative gene structure prediction using information-coding-and-processing techniques. IEEE transactions on nanobioscience, 15(2), 119-130. DOI: 10.1109/TNB.2016.2537831
Guo, X., Liu, B., Chen, L., Chen, G., Pan, Y., & Zhang, J. (2016). Bayesian inference for functional dynamics exploring in fMRI data. Computational and mathematical methods in medicine, 2016. http://dx.doi.org/10.1155/2016/3279050
Ding, X., Wang, J., Zelikovsky, A., Guo, X., Xie, M., & Pan, Y. (2015). Searching high-order SNP combinations for complex diseases based on energy distribution difference. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 12(3), 695-704. DOI: 10.1109/TCBB.2014.2363459
Guo, X., Yu, N., Ding, X., Wang, J., & Pan, Y. (2015). Dime: A novel framework for de novo metagenomic sequence assembly. Journal of Computational Biology, 22(2), 159-177. https://doi.org/10.1089/cmb.2014.0251
Fu, Y., Chen, G., Guo, X., Zhang, J., & Pan, Y. (2015). Analyzing the effects of pretreatment diversity on HCV drug treatment responsiveness using Bayesian partition methods. Journal of bioinformatics and proteomics review, 1(1), 1. PMCID: PMC4597793
Guo, X., Yu, N., Gu, F., Ding, X., Wang, J., & Pan, Y. (2014). Genome-wide interaction-based association of human diseases-a survey. Tsinghua Science and Technology, 19(6), 596-616. DOI: 10.1109/TST.2014.6961029
Guo, X., Meng, Y., Yu, N., & Pan, Y. (2014). Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering. BMC bioinformatics, 15(1), 102. https://doi.org/10.1186/1471-2105-15-102
Zeng, T., Guo, X., & Liu, J. (2014). Negative correlation based gene markers identification in integrative gene expression data. International journal of data mining and bioinformatics, 10(1), 1-17. https://doi.org/10.1504/IJDMB.2014.062889
Lian Z, Li X, Pan Y, Guo X, Chen L, Chen G, Wei Z, Liu T, Zhang J. Dynamic Bayesian brain network partition and connectivity change point detection. In Computational Advances in Bio and Medical Sciences (ICCABS), 2015 IEEE 5th International Conference on 2015 Oct 15 (pp. 1-6). IEEE.
Yu N, Guo X, Zelikovsky A, Pan Y. GaussianCpG: A Gaussian model for detection of human CpG island. In Computational Advances in Bio and Medical Sciences (ICCABS), 2015 IEEE 5th International Conference on 2015 Oct 15 (pp. 1-1). IEEE.
Yu N, Guo X, Gu F, Pan Y. DNA AS X: An information-coding-based model to improve the sensitivity in comparative gene analysis. In International Symposium on Bioinformatics Research and Applications 2015 Jun 6 (pp. 366-377). Springer, Cham.
Guo X, Zhang J, Cai Z, Du DZ, Pan Y. DAM: A Bayesian method for detecting genome-wide associations on multiple diseases. In International Symposium on Bioinformatics Research and Applications 2015 Jun 6 (pp. 96-107). Springer, Cham.
Yu N, Gu F, Guo X, He Z. A fine-grained flow control model for cloud-assisted data broadcasting. In Proceedings of the 18th Symposium on Communications & Networking 2015 Apr 12 (pp. 24-31). Society for Computer Simulation International.
Guo X, Ding X, Meng Y, Pan Y. Cloud computing for de novo metagenomic sequence assembly. In International Symposium on Bioinformatics Research and Applications 2013 May 20 (pp. 185-198). Springer, Berlin, Heidelberg.
Zeng T, Guo X, Liu J. Discovering negative correlated gene sets from integrative gene expression data for cancer prognosis. In Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on 2010 Dec 18 (pp. 489-492). IEEE.
Nguyen, K., Guo, X., & Pan, Y. (2016). Multiple Biological Sequence Alignment: Scoring Functions, Algorithms and Evaluation. John Wiley & Sons. ISBN: 978-1-118-22904-0
Guo, X., Yu, N., Li, B. and Pan, Y. (2016) Cloud Computing for Next-Generation Sequencing Data Analysis, in Computational Methods for Next Generation Sequencing Data Analysis (eds I. Mandoiu and A. Zelikovsky), John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9781119272182.ch1