I am a Ph.D. student in the Computer Science Department at the University of Southern California, Los Angeles. I work for the USC-Chevron Center of Excellence for Research and Academic Training on Interactive Smart Oilfield Technologies (CiSoft) at USC. My research interests include data integration, uncertain data management, the Semantic Web, big data, and social networks. I received my BS in Computer Science from Xi'an Jiaotong University, Xi'an, China, and my MPhil in Computer Science from The University of Hong Kong, Hong Kong SAR.


In the Big Data Analysis Group @ USC, we are developing a framework for rapid integration of heterogeneous Big Data information sources. The framework captures complex interrelationships and interdependencies across datasets and establishes probabilistic linkages among distributed content. The system is built using Semantic Web technologies, which allow complex queries to be issued across the integrated data repositories. This approach complements existing techniques by providing probabilistic queries that take into account the discovered structure among the data sources. To facilitate rich analysis of the integrated datasets, we leverage existing statistical learning, machine learning, and data mining techniques. We are also developing algorithms that identify both simple and complex patterns across datasets; these patterns are in turn used to improve the integration process. Automation achieved in this manner not only significantly reduces the manual effort involved in cleansing and processing large datasets, but also ensures consistency and effective use of compute and storage resources. The framework is being validated on real-world use cases from the petroleum industry.

