About Me
Hi, my name’s Wontack Han, and I’m a Ph.D. candidate in computer science at Indiana University. My current work is implementing a pre-assembly program that uses k-mer abundance information, along with a pipeline for identifying marker genes/genomes with machine learning algorithms. My research interests are the comparative analysis of both metagenome data and human multi-omics data sets using machine learning algorithms, including deep learning. I am currently looking for a position (post-doctoral or bioinformatics research).
Research
Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets individually or as a whole, followed by statistical tests to identify differentially abundant species/genes. Subtractive assembly, a de novo assembly approach for comparative metagenomics, first detects the differential reads that distinguish two groups of metagenomes and then assembles only those reads. This approach can make the entire process of finding marker genes/genomes faster and more efficient.
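As a rough illustration of the "detect differential reads first, assemble later" idea, here is a minimal Python sketch. It is a simplification under stated assumptions, not the actual subtractive assembly implementation: the function names, the pseudocount of 1, and the `min_ratio` and `min_fraction` thresholds are all hypothetical choices made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer in a read."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_kmers(reads, k):
    """Count k-mer occurrences over a whole group of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def differential_reads(case_reads, control_reads, k=4,
                       min_ratio=2.0, min_fraction=0.5):
    """Keep case reads that are dominated by k-mers enriched in the case group.

    A k-mer counts as enriched when its case count is at least
    min_ratio * (control count + 1); the +1 is an assumed pseudocount.
    A read is kept when at least min_fraction of its k-mers are enriched.
    """
    case_counts = count_kmers(case_reads, k)
    control_counts = count_kmers(control_reads, k)
    selected = []
    for read in case_reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k
        enriched = sum(
            1 for km in read_kmers
            if case_counts[km] >= min_ratio * (control_counts[km] + 1)
        )
        if enriched / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected
```

Only the reads returned by `differential_reads` would then be handed to an assembler, which is where the time savings come from.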
Microbiome research is going through an evolutionary transition from characterizing reference microbiomes associated with different environments/hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and preventing diseases (e.g., using probiotics). Microbial markers have been identified from microbiome data derived from cohorts of patients with different diseases, treatment responsiveness, etc., and predictors based on these markers have often been built for predicting host phenotype from a microbiome dataset. Unfortunately, these microbial markers and predictors are often not published, so they cannot be reused by others. In this project, I report the curation of a repository of microbial marker genes and predictors built from these markers for microbiome-based prediction of host phenotype, and a computational pipeline called Mi2P (from Microbiome to Phenotype) for using the repository.
Due to the complexity of microbial communities, the de novo partition of metagenomic space into specific biological entities remains difficult. To address this problem, researchers have utilized various features, including compositional features such as tetra-nucleotide statistics and coverage signals of genetic sequences. However, the assumptions behind those methods are not universally true. For example, methods relying on abundances of genetic sequences are admittedly weak at segregating taxonomically related organisms. In the process of exploring other features, it has been realized that utilizing co-abundance across multiple samples improves the resolution of genome segregation from metagenomic data sets.
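The two kinds of features mentioned above can be sketched in a few lines of Python. This is an illustrative example only, assuming plain (non-canonical) tetramers and Pearson correlation as the co-abundance measure; the function names are hypothetical, and real binning tools typically collapse reverse-complement tetramers and use more elaborate models.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of a sequence.

    Windows containing characters other than A, C, G, T are skipped.
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        tetramer = seq[i:i + 4]
        if tetramer in counts:
            counts[tetramer] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / sqrt(var_x * var_y)
```

Two contigs whose coverage profiles across samples correlate strongly (for example, `pearson(coverage_a, coverage_b)` close to 1) are candidates for the same genome bin, even when their composition vectors alone are too similar to tell related organisms apart.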
Experience
Indiana University Bloomington
Research Assistant
2015 - Present
Metagenome Lab Prof. Yuzhen Ye
Working on metagenome data analysis.
- Subtractive assembly for detecting biomarkers from metagenomes
- K-mer clustering using locality-sensitive hashing
- Machine learning approaches for comparative metagenomic data
Seoul National University
Research Assistant
2013 - 2015
Bioinformatics Institute Prof. Sun Kim
Distinguishing cancer phenotypes by identifying differentially expressed multi-paths in signaling pathways
- Analysis of 120 breast cancer RNA sequencing samples
Somansa
Software Engineer
2010 - 2012
QA testing for a program that monitors the leakage of personal information over the Internet.
- Packet analysis
- Testing SQL database for auditing
Education
Indiana University Bloomington
2015 - Present
Ph.D. Computer Science.
Minor Bioinformatics.
Supervisor: Prof. Yuzhen Ye.
Seoul National University
2006 - 2014
BS Electrical and Computer Engineering.
Papers
Skills
- Operating System : Linux, macOS, Windows
- Programming Language : C/C++, Java
- Script Language : Python, Bash, AWK
- Mathematical Simulation : MATLAB, R
- Parallel and GPU System : TensorFlow, OpenMP, MPI
- Database : MySQL, MSSQL