Projects and Programming Experience
- A word-based French-English Machine Translation system
- Directed Research - Linear-time decoding in Machine Translation
- Feature-phrase directed sentiment extraction from text
- Python implementations of natural-language parsing algorithms
- Text summarization using Information-Retrieval Techniques
- Causal link detection in data using Monte-Carlo methods
- An AJAX/Servlet based webapp for location-based image retrieval & upload
- Design of an automatic wireless hazard handling system
-
A word-based French-English Machine Translation system :
Advisors: Dr. D. Chiang & Dr. L. Huang, ISI Natural Language Group, Marina Del Rey, CA Duration: Oct 09 - Dec 09
I built and evaluated a statistical word-based machine-translation system, based on IBM alignment models 1 and 2. The word-alignment models were trained using the expectation-maximization algorithm and data from the Europarl parallel corpus (French-English subsection). A stack-decoder was built in C++ to generate candidate translations for a test-set, which were evaluated using the BLEU metric.
Technologies used: Python, C++ Full technical report Source code
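As an illustration of the training step described above, here is a minimal Python sketch of expectation-maximization for IBM Model 1. It is not the project's code: the function name `train_ibm_model1`, the `NULL` source token, the uniform initialization, and the choice to estimate t(f | e) are all assumptions made for this example.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM.

    bitext: list of (french_tokens, english_tokens) pairs.
    Returns a dict-like table keyed by (french_word, english_word).
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    # Assumption: start from a uniform distribution over French words.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e generating anything

        # E-step: distribute each French word over the English words
        # (plus NULL) in proportion to the current t(f | e).
        for fs, es in bitext:
            es = ["NULL"] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t


toy_bitext = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
]
table = train_ibm_model1(toy_bitext)
```

On this toy corpus the table should come to prefer "maison" over "la" as the translation of "house", since "la" is already explained by "the". Model 2 would add a position-dependent alignment distribution on top of this table.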
-
Linear-time decoding in Machine Translation :
Advisors: Dr. D. Chiang & Dr. L. Huang, ISI Natural Language Group, Marina Del Rey, CA Duration: Jan 10 - Present
We seek to adapt an SCFG-based left-to-right target-generation algorithm (T. Watanabe, 2006) to work with generic SCFGs. We also plan to investigate the effect of cube-pruning (D. Chiang, 2007) as a pruning mechanism for left-to-right target generation, using both top-down and Earley algorithms to generate hypotheses. Finally, we seek to investigate improved variants of left-corner parsing along the lines of (R.C. Moore, 2004), and to make decoding linear in the length of the source sentence through a combination of pruning and optimization techniques. This is a work in progress.
Technologies used: Python Source code
-
Feature-phrase directed sentiment extraction from text :
Advisor: Dr. V. Gopalakrishnan, Dept of Computer Science & Engg., National Institute of Technology, Trichy, India Duration: Jan 09 - May 09
I designed and implemented, in C++, a Hidden-Markov-Model based Part-of-Speech tagger for the grammatical tagging of English text. I also implemented unsupervised methods for training the PoS tagger, based on (Q.I. Wang & D. Schuurmans, 2005). The trained tagger was used to identify feature-phrases that indicate topic sentiment; these phrases were weighted by significance using a mixture-model based sentiment extractor (Y. Zhang, 2002). Sentiment polarities were then extracted from the weighted phrases.
Technologies used: C++ Full technical report Source code
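To illustrate how an HMM tagger assigns tags once it is trained, here is a minimal Viterbi decoding sketch. It is written in Python rather than the project's C++, it covers only decoding (not the unsupervised training), and the function name, the toy tag set, and every probability below are made up for this example.

```python
import math


def viterbi(words, tags, log_start, log_trans, log_emit, unk=-1e9):
    """Most probable tag sequence under a bigram HMM.

    All tables hold log-probabilities; `unk` is an assumed penalty for
    missing entries. `words` must be non-empty.
    """
    best = [{t: log_start.get(t, unk) + log_emit[t].get(words[0], unk)
             for t in tags}]
    back = []
    for w in words[1:]:
        scores, ptrs = {}, {}
        for t in tags:
            # Best previous tag leading into tag t.
            prev, s = max(((p, best[-1][p] + log_trans[p].get(t, unk))
                           for p in tags), key=lambda x: x[1])
            scores[t] = s + log_emit[t].get(w, unk)
            ptrs[t] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a table of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Toy model (all numbers are illustrative).
TAGS = ["DET", "NOUN", "VERB"]
START = logs({"DET": 0.8, "NOUN": 0.1, "VERB": 0.1})
TRANS = {
    "DET": logs({"DET": 0.05, "NOUN": 0.9, "VERB": 0.05}),
    "NOUN": logs({"DET": 0.1, "NOUN": 0.1, "VERB": 0.8}),
    "VERB": logs({"DET": 0.5, "NOUN": 0.4, "VERB": 0.1}),
}
EMIT = {
    "DET": logs({"the": 1.0}),
    "NOUN": logs({"dog": 0.6, "barks": 0.4}),
    "VERB": logs({"dog": 0.3, "barks": 0.7}),
}

tagged = viterbi("the dog barks".split(), TAGS, START, TRANS, EMIT)
# tagged == ["DET", "NOUN", "VERB"]
```

Working in log space avoids numerical underflow on long sentences, which is why the tables are converted before decoding.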
-
Python implementations of natural-language parsing algorithms :
Advisor: Dr. L. Huang, ISI Natural Language Group, Marina Del Rey, CA Duration: Feb 10 - Mar 10
As part of the coursework for CSCI-544 (Natural Language Processing, Spring 2010), I implemented a group of natural-language parsing algorithms for both generic PCFGs and PCFGs in CNF. The implemented algorithms are: CKY for CNF grammars, optimized to generate only those hyperedges that exist in the intersected PCFG; Earley's algorithm (with left-corner filtering for performance) for generic PCFG parsing; and the Knuth-77 algorithm for finding the lowest-cost tree generable by a PCFG (adapted as a parsing algorithm by finding the best parse in the intersected PCFG). The source code, performance evaluations, and usage protocols are available from the link below.
Technologies used: Python Source code
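For readers unfamiliar with CKY, here is a minimal sketch of Viterbi CKY over a PCFG in CNF. It is not the coursework implementation: it loops over every binary rule for every split point and so does not reproduce the hyperedge optimization mentioned above. The grammar encoding (`lexical`, `binary`) and the toy grammar are assumptions for this example.

```python
import math
from collections import defaultdict


def cky(words, lexical, binary, start="S"):
    """Best parse of `words` under a CNF PCFG.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns (log_prob, tree), or None if no parse exists.
    """
    n = len(words)
    # chart[i, j] maps a nonterminal to (log_prob, backpointer).
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                chart[i, i + 1][A] = (math.log(p), w)

    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                left, right = chart[i, k], chart[k, j]
                for (A, B, C), p in binary.items():
                    if B in left and C in right:
                        cand = math.log(p) + left[B][0] + right[C][0]
                        if A not in chart[i, j] or cand > chart[i, j][A][0]:
                            chart[i, j][A] = (cand, (k, B, C))

    if start not in chart[0, n]:
        return None

    def build(i, j, A):
        _, bp = chart[i, j][A]
        if isinstance(bp, str):  # lexical rule: backpointer is the word
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))

    return chart[0, n][start][0], build(0, n, start)


# Toy grammar (illustrative probabilities).
LEXICAL = {("DT", "the"): 1.0, ("NN", "dog"): 0.5,
           ("NN", "cat"): 0.5, ("VB", "saw"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0,
          ("VP", "VB", "NP"): 1.0}

result = cky("the dog saw the cat".split(), LEXICAL, BINARY)
```

Here `result` holds the log-probability of the best tree (log 0.25 for this grammar) and the tree as nested tuples. An ungrammatical input such as "dog the" returns `None`.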
-
Text summarization using Information-Retrieval Techniques :
Advisor: Dr. E. Hovy, ISI Natural Language Group, Marina Del Rey, CA Duration: Apr 10 - May 10
As part of the coursework for CSCI-544, I was asked to implement a text-summarization program that used either Information Extraction techniques (templates) or RST with manually generated rules to build and prune a discourse tree. I experimented instead with automatic Information-Retrieval techniques to cluster document sentences into coherent sub-topics, and used carefully constructed metrics to create a well-structured summary. The method was designed with an emphasis on scaling and on automated discourse-structuring for summary generation.
Technologies used: Python, ConceptNet (P. Singh & H. Liu, 2004) Source code
-
Causal link detection in data using Monte-Carlo methods :
Advisor: Dr. Fei Sha, Dept of Computer Science, University of Southern California, Los Angeles, CA Duration: Mar 10 - May 10
I implemented, in C++, a method for weighting hypotheses of Bayesian-Network structure given sparse data. Posterior probabilities of edge and path features were approximated using Monte-Carlo sampling of Bayesian-Network orderings. Bayesian-Network structures were then sampled from the sampled orderings (by sampling node-families consistent with the orderings), and count-and-divide was used to approximate the posterior weights of the features. Testing was carried out on the 5-variable Hayes-Roth dataset (UCI machine-learning repository).
Technologies used: C++ Full technical report Source code
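The final count-and-divide step can be sketched as follows, in Python rather than the project's C++. The sketch assumes the network structures have already been sampled and are given as sets of directed edges; the sampling of orderings and node-families is not shown, and the function names and sample graphs are hypothetical.

```python
def has_path(edges, src, dst):
    """True if dst is reachable from src along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for nxt in children.get(node, []):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def feature_posterior(samples, feature):
    """Count-and-divide: fraction of sampled structures with the feature."""
    return sum(1 for g in samples if feature(g)) / len(samples)


# Hypothetical sampled structures over variables A, B, C.
SAMPLES = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "C")},
    {("B", "C")},
]

edge_ab = feature_posterior(SAMPLES, lambda g: ("A", "B") in g)
path_ac = feature_posterior(SAMPLES, lambda g: has_path(g, "A", "C"))
```

With these four samples, the edge A -> B appears in two of them and a directed path from A to C exists in two, so both estimates are 0.5. The estimate is only as good as the sampler, so in practice many more samples would be needed.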
-
An AJAX/Servlet based webapp for location-based image retrieval & upload :
Advisor: Dr. Marco Papa, Dept of Computer Science, University of Southern California, Los Angeles, CA Duration: Oct 09 - Dec 09
As the final course project for Web Technologies (CSCI-571, Fall 09), I was asked to create a web application that mashed up the Google, Flickr and Facebook APIs. The web-app took a geographical location as a query and retrieved its global coordinates using the Google Maps API. It then queried Flickr for photos corresponding to that location and returned a palette of retrieved images. Finally, it prompted the user to select one image and upload it to their Facebook account using the FB.Connect API. The project tested my ability to process JSON data, manipulate DOM objects, issue AJAX queries to existing services, and use Java servlets as a proxy between client and server.
Technologies used: HTML/CSS, Tomcat/Java2 (javax.servlet.http, org.json, org.jdom), JavaScript Source code
-
Design of an automatic wireless hazard handling system :
Advisors: Dr. K. Geetha & Dr. N. Ramasubrahmanian, Dept of Computer Science & Engg., National Institute of Technology, Trichy, India Duration: May 07 - Apr 08
I led the design and implementation of an intelligent and easily configurable hazard monitoring system for a sprawling environment (e.g. a very large campus building housing multiple departments), to demonstrate the capabilities of an automated network of wirelessly linked embedded systems. The project's focus was on devising a versatile engineering solution in which hazard monitoring objects communicate and cooperate wirelessly in real time. An HDL-based design (in Verilog) provided the interface to the hazard monitoring network. Unit testing and circuit-layout generation were carried out in the Altera Quartus II hardware design package.
Technologies used: Verilog HDL, Altera Quartus II Full technical report Source code