University of Southern California, Los Angeles, CA

Master of Science, Computer Science


Tokyo Institute of Technology, Tokyo, Japan

International Visiting Student, Department of Communications and Computer Engineering, Takahashi Lab


Beijing University of Technology, Beijing, China

Bachelor, Software Engineering (Embedded System)


Contact Me

Facebook Home Page

LinkedIn Home Page




CSCI 526


Associate research project at Tokyo Institute of Technology


• Optimized applications on heterogeneous systems using heterogeneous parallel programming

• Developed a parallel snore signal processing program in Python/CUDA and studied its performance


Modern systems are often designed with heterogeneous architectures that contain not only latency-oriented cores but also throughput-oriented cores such as graphics processing units (GPUs). These throughput cores are now designed as programmable processors with a large number of processor cores, which makes them an effective tool for addressing Big Data problems. My work is an interdisciplinary study of heterogeneous-architecture SoCs and biomedical engineering. As an emerging technique, biomedical computing combines the diagnostic and investigative aspects of biology and medical science with the power and problem-solving capabilities of modern computing. In this rapidly evolving discipline, computers are used to accelerate research and learning, simulate patient behavior, and visualize complex biological models. However, large-scale biomedical data collection and processing can be an intensive and time-consuming task for traditional industry computing systems. I present a GPU-accelerated host system for biomedical signal processing that covers biomedical data acquisition, processing, analysis, and storage. In this work, I designed and deployed the system and used snore-related signals (SS) as an example to study its performance. Compared with traditional CPU-only computing systems, this system achieved superior performance. In addition, I studied the principles of the Compute Unified Device Architecture (CUDA) programming model and its performance in fused multiply-add operations.