Computer Science, Bioinformatics

About Me

Hi, my name’s Wontack Han and I’m a Ph.D. candidate in computer science at Indiana University. My current work is the implementation of a pre-assembly program that uses k-mer abundance information, and of a pipeline for identifying marker genes/genomes with machine learning algorithms. My research interests are in the comparative analysis of both metagenome data and human multi-omics data sets using machine learning algorithms, including deep learning. I am currently looking for a position (postdoctoral or bioinformatics research).

Research

Subtractive assembly of metagenome data sets

https://sourceforge.net/projects/concurrentsa/

Comparative analysis of metagenomes can be used to detect sub-metagenomes (species or gene sets) that are associated with specific phenotypes (e.g., host status). The typical workflow is to assemble and annotate metagenomic datasets individually or as a whole, followed by statistical tests to identify differentially abundant species/genes. Subtractive assembly, a de novo assembly approach for comparative metagenomics, first detects differential reads that distinguish between two groups of metagenomes and then assembles only these reads. This approach can make the entire process of finding marker genes/genomes faster and more efficient.
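
The idea of selecting differential reads before assembly can be illustrated with a minimal Python sketch. It is not the implementation distributed at the link above: the function names, the k-mer count ratio test, and the thresholds (min_ratio, pseudocount, min_fraction) are assumptions chosen only for illustration, and the sketch ignores reverse complements and sequencing-depth normalization.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(group_a, group_b, k, min_ratio=2.0, pseudocount=1.0):
    """Return k-mers enriched in group_a relative to group_b.

    Hypothetical criterion: the pseudocount-smoothed count ratio
    must reach min_ratio.
    """
    counts_a = count_kmers(group_a, k)
    counts_b = count_kmers(group_b, k)
    return {
        kmer
        for kmer, n_a in counts_a.items()
        if (n_a + pseudocount) / (counts_b.get(kmer, 0) + pseudocount) >= min_ratio
    }


def differential_reads(reads, diff_kmers, k, min_fraction=0.5):
    """Keep reads in which at least min_fraction of the k-mers are differential."""
    selected = []
    for read in reads:
        total = len(read) - k + 1
        if total <= 0:
            continue  # read is shorter than k
        hits = sum(read[i:i + k] in diff_kmers for i in range(total))
        if hits / total >= min_fraction:
            selected.append(read)
    return selected


# Toy data: "ACGTACGT" appears only in group A, "TTTTGGGG" mostly in group B.
group_a = ["ACGTACGT", "ACGTACGT", "TTTTGGGG"]
group_b = ["TTTTGGGG", "TTTTGGGG"]
diff = differential_kmers(group_a, group_b, k=4)
reads_to_assemble = differential_reads(group_a, diff, k=4)
```

Only the reads in reads_to_assemble would then be handed to an assembler, which is where the savings in time and memory would come from.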

A repository of microbial marker genes

https://sourceforge.net/projects/mi2p/

Microbiome research is going through an evolutionary transition from characterizing reference microbiomes associated with different environments/hosts to translational applications, including using the microbiome for disease diagnosis, improving the efficacy of cancer treatments, and preventing diseases (e.g., using probiotics). Microbial markers have been identified from microbiome data derived from cohorts of patients with different diseases, treatment responsiveness, etc., and predictors based on these markers have often been built to predict host phenotype from a microbiome dataset. Unfortunately, these microbial markers and predictors are often not published, so they are not reusable by others. In this project, I report the curation of a repository of microbial marker genes and predictors built from these markers for microbiome-based prediction of host phenotype, and a computational pipeline called Mi2P (from Microbiome to Phenotype) for using the repository.

Clustering tons of k-mers by abundance

https://github.com/wthanone/kmerLSH

Due to the complexity of microbial communities, the de novo partitioning of metagenomic space into specific biological entities remains difficult. To address this problem, researchers have used various features, including compositional features such as tetra-nucleotide statistics and coverage signals of genetic sequences. However, the assumptions behind these methods do not hold universally. For example, methods relying on abundances of genetic sequences are admittedly weak at segregating taxonomically related organisms. In exploring other features, it has been found that using co-abundance across multiple samples improves the resolution of genome segregation from metagenomic data sets.
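
As a rough illustration of grouping k-mers by co-abundance, the sketch below buckets k-mers whose abundance profiles across samples point in a similar direction, using random-hyperplane locality sensitive hashing. This is a generic textbook scheme, not necessarily the one used in kmerLSH; the function names, the number of hash bits, and the toy profiles are all assumptions made for the example.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_samples, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in sample space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_bits)]


def signature(profile, hyperplanes):
    """One bit per hyperplane: which side of the plane the profile falls on."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, profile)) >= 0.0)
        for plane in hyperplanes
    )


def bucket_kmers(profiles, n_bits=8, seed=0):
    """Group k-mers whose abundance profiles share the same LSH signature.

    profiles maps each k-mer to its abundance in every sample.
    """
    n_samples = len(next(iter(profiles.values())))
    planes = make_hyperplanes(n_samples, n_bits, seed)
    buckets = defaultdict(list)
    for kmer, profile in profiles.items():
        buckets[signature(profile, planes)].append(kmer)
    return dict(buckets)


# Hypothetical abundances of three k-mers across three samples.
profiles = {
    "ACGT": [10, 0, 5],
    "CGTA": [20, 0, 10],  # proportional to ACGT, so it always shares a bucket
    "TTTT": [0, 8, 1],
}
buckets = bucket_kmers(profiles)
```

Profiles that differ only by a positive scale factor always receive the same signature, while profiles separated by a larger angle are more likely to be split as n_bits grows. Hashing avoids comparing every pair of k-mers, which matters when the number of k-mers is very large.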

Experience

Indiana University Bloomington

Research Assistant

2015 - Present

Metagenome Lab Prof. Yuzhen Ye

Working on metagenome data analysis.

Subtractive assembly for detecting biomarkers from metagenomes
K-mer clustering using locality sensitive hashing
Machine learning approaches for comparative metagenomic data analysis

Seoul National University

Research Assistant

2013 - 2015

Bioinformatics Institute Prof. Sun Kim

Distinguishing cancer phenotypes by identifying differentially expressed multi-paths in signaling pathways

Analysis of 120 breast cancer RNA sequencing samples

Somansa

Software Engineer

2010 - 2012

QA testing for a program that monitors the leakage of personal information over the Internet.

Packet analysis
Testing SQL database for auditing

Education

Indiana University Bloomington

2015 - Present

Ph.D. Computer Science.

Minor Bioinformatics.

Supervisor: Prof. Yuzhen Ye.

Seoul National University

2006 - 2014

BS Electrical and Computer Engineering.

Papers

Skills

Operating System : Linux, macOS, Windows
Programming Language : C/C++, Java
Scripting Language : Python, Bash, AWK
Mathematical Simulation : MATLAB, R
Parallel and GPU System : TensorFlow, OpenMP, MPI
Database : MySQL, MSSQL