Vasisht Raghavendra

Master's Graduate from the University of Southern California

About Me

I am a Master's graduate with a degree in Computer Science and a specialization in Data Science.

I first developed an interest in Data Science in college, when I was introduced to this fascinating subject. It intrigued me to see what a program is capable of when applied to large sets of data comprising a variety of types. Ever since then, the field has held my fascination and drawn me deeper into it.

Alongside my coursework, I enjoy working on Kaggle competitions as hobby projects and hacks in R and Python. Outside academics, my interests include browsing the web for bizarre things, start-ups, the stock market, and how things work. I also enjoy traveling and hiking, whether alone or in a group. Listening to music and dancing are also favorite pastimes.

Professional Experience

Verimatrix Inc., San Diego
Data Analytics Intern

January 2016 - April 2016

Supported the R&D department by analyzing Conditional Access System (CAS) server logs to identify fraudulent behavior. The main goal of this project was to establish norms on the data and find anomalies, where an anomaly is an instance of fraud or hacking. The tasks included parsing the logs with Python to extract important attributes, building a Hidden Markov Model to identify suspicious behavior, and scaling it in a distributed environment (Spark and Hadoop). Also helped the global services team focus on potential markets by analyzing their data to report sales and revenue, and analyzed data collected from the Periscope application to build a proof of concept showing the increase in views during important events.
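As a rough illustration of the modelling step, here is a minimal sketch of likelihood-based anomaly scoring with a Gaussian HMM via the hmmlearn library; the log parsing and feature extraction are omitted, and the number of hidden states and the score threshold below are hypothetical, not the project's actual values.

```python
# Sketch: score sessions under an HMM fit to normal behavior and flag
# sessions whose likelihood is unusually low. Feature extraction from the
# CAS logs is omitted; n_components=3 and threshold=-50.0 are illustrative.
import numpy as np
from hmmlearn import hmm

def fit_baseline(sessions):
    """Fit an HMM on normal sessions; each session is a (steps x features) array."""
    X = np.vstack(sessions)
    lengths = [len(s) for s in sessions]
    model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def flag_anomalies(model, sessions, threshold=-50.0):
    """Flag sessions whose average per-step log-likelihood falls below a threshold."""
    return [i for i, s in enumerate(sessions)
            if model.score(s) / len(s) < threshold]
```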

PinsightMedia+, Kansas City
Data Science Intern

May 2015 - August 2015

Worked in a start-up environment in the Product and Technology Development division on a deduplication project. The goal was to identify unique persons across Pinsight's multiple datasets by employing fuzzy matching algorithms and to assign each a unique PinsightMedia ID. The data was cleaned and prepared using Hive and Python, and a deduplication framework was built using Java on the Spark framework. R, Pandas, and GraphLab were used to analyze the data and generate reports.
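A minimal sketch of the fuzzy-matching idea using only Python's standard library; the real framework ran as Java on Spark with richer matching logic, and the field names ("name", "email") and the 0.85 cutoff here are illustrative assumptions.

```python
# Sketch: fuzzy record matching for deduplication with the standard library.
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def same_person(rec1, rec2, cutoff=0.85):
    """Treat two records as one person if both fields match fuzzily."""
    return (similarity(rec1["name"], rec2["name"]) >= cutoff and
            similarity(rec1["email"], rec2["email"]) >= cutoff)

# Matching records would then be assigned a single shared PinsightMedia ID.
```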

CIBERsites India Private Limited, Bangalore
Intern

December 2013 - May 2014

Worked on SAP Basis/NetWeaver technology and underwent training on SAP solutions and system landscape, installation basics, database concepts, upgrades, Java administration, Solution Manager, high-availability concepts, system monitoring, etc. Later, was part of a team supporting multiple clients. My responsibilities included monitoring the health of the SAP systems by running the daily monitoring T-codes and, in the event of any failure, fixing the system by making suitable changes and generating a report.

Bizosys Technologies Private Limited, Bangalore
Software Intern

August 2013 - December 2013

Worked in a start-up environment as part of a live project, ShopGirl, which involved crawling European fashion portals with the Nutch crawler to collect data about their various products. The data was cleansed using JavaScript and mapped onto the relevant product properties, then uploaded to Hadoop, where MapReduce was used to standardize it. The ultimate goal of the project was to create a master portal of the crawled sites.

Testimonials

Niels Thorwirth

Vice President, Advanced Technology at Verimatrix

"Vasisht was an intern an our GenOne group from Jan-April 2016. He worked on analytics of consumer behavior and hacker activity. Vasisht has an impressive working knowledge of current and relevant tools to process big data and a deep understanding of relevant machine learning algorithms. He has contributed significantly to building our knowledge, strategy and technical foundation for corresponding solutions."

Eric Yam

Advanced Technology Group, R&D, Technical Staff in the CTO Office at Verimatrix

"Vasisht was our student intern in the R&D group, he demonstrated his ability to understand and tackle challenging data security issues in the development of Conditional Access System (CAS). Vasisht took direction well, asked questions as necessary and always work with enthusiasm and concentration. Generally, his work output demonstrated a fine education in the field and potential to be a good data science professional."

Projects

Spotify user behavior analysis

Analyzed Spotify data using Python libraries in an IPython notebook to gain insights into user behavior by demographic, in terms of track listens and total listening time, and to suggest product changes.

Tweet indexer

Designed and implemented an indexer that indexes the words from a file of tweets, prompts the user for a search query (multiple space-delimited words), retrieves all tweets containing any word in the query, and outputs the tweet ID and text for every item in the result list. Results were sorted by the number of matching query words, with words appearing as hashtags weighted more heavily than plain words and phrases.
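A minimal sketch of the core index and ranking logic described above; the tweet-file parsing is omitted, and the hashtag weight of 2 is an assumed value.

```python
# Sketch: inverted index over tweets with hashtag-weighted ranking.
from collections import defaultdict

HASHTAG_WEIGHT = 2  # assumption: a hashtag match counts double

def build_index(tweets):
    """Map each word to {tweet_id: weight}, weighting hashtags higher."""
    index = defaultdict(dict)
    for tweet_id, text in tweets:
        for token in text.lower().split():
            word = token.lstrip("#")
            weight = HASHTAG_WEIGHT if token.startswith("#") else 1
            index[word][tweet_id] = max(index[word].get(tweet_id, 0), weight)
    return index

def search(index, tweets_by_id, query):
    """Return (tweet_id, text) pairs sorted by total match weight."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for tweet_id, weight in index.get(word, {}).items():
            scores[tweet_id] += weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [(tid, tweets_by_id[tid]) for tid in ranked]

# tweets = [(1, "new #ml paper"), (2, "lunch was great")]
# index = build_index(tweets)
# print(search(index, dict(tweets), "ml paper"))
```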

Grapheme to Phoneme conversion

The goals of the project were to try out different modelling techniques for this problem, to compare their relative performance on the CMU Pronouncing Dictionary dataset, and to build intuition for why the best model outperforms the others. The CMU Pronouncing Dictionary was used as the main dataset, and four models were tried: Decision Trees, Hidden Markov Models, Conditional Random Fields, and a Maximum Entropy model. The m2m-aligner was used to align each grapheme with its corresponding phoneme. For evaluation, the Phoneme Error Rate (PER) and the Word Error Rate (WER) were calculated for all the techniques.
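As a taste of the evaluation step, here is a minimal sketch of the two metrics: PER is the Levenshtein distance between predicted and reference phoneme sequences normalized by reference length, and WER (as typically defined for grapheme-to-phoneme work) is the fraction of words with at least one phoneme error.

```python
# Sketch: PER and WER over lists of phoneme sequences (one per word).
def edit_distance(pred, ref):
    """Levenshtein distance between two sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(ref) + 1)]
         for i in range(len(pred) + 1)]
    for i in range(1, len(pred) + 1):
        for j in range(1, len(ref) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (pred[i - 1] != ref[j - 1]))
    return d[len(pred)][len(ref)]

def per(predictions, references):
    """Phoneme Error Rate: total edit distance over total reference length."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return errors / sum(len(r) for r in references)

def wer(predictions, references):
    """Word Error Rate: fraction of words with any phoneme error."""
    wrong = sum(p != r for p, r in zip(predictions, references))
    return wrong / len(references)
```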

Dialogue tagging

Developed a Conditional Random Field classifier in Python to tag dialogue acts to sequences of utterances in a conversation, using the Switchboard (SWBD) corpus.
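A minimal sketch of how such a tagger can be wired up with the sklearn-crfsuite package (an assumption; the project may have used a different CRF implementation). The features, tags, and toy data below are illustrative.

```python
# Sketch: each utterance becomes a feature dict, each conversation a sequence.
import sklearn_crfsuite

def featurize(utterance, position):
    words = utterance.split()
    return {
        "first_word": words[0].lower(),
        "num_words": len(words),
        "is_question": utterance.rstrip().endswith("?"),
        "position": position,
    }

# Toy data; the real sequences come from the SWBD corpus.
conversations = [["Hello there", "How are you?", "I am fine"]]
tags = [["greeting", "question", "statement"]]

X = [[featurize(u, i) for i, u in enumerate(conv)] for conv in conversations]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, tags)
print(crf.predict(X))
```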

Community Detection in graphs

Implemented community detection using divisive hierarchical clustering (the Girvan-Newman algorithm). Made use of two libraries, networkx and community, for computing the betweenness of edges, and used the matplotlib library for plotting the communities.
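A minimal sketch of the divisive step using networkx: repeatedly remove the edge with the highest betweenness until the graph splits. The fixed target number of communities is a simplification of the real stopping criterion.

```python
# Sketch: Girvan-Newman by iterative removal of the highest-betweenness edge.
import networkx as nx

def girvan_newman(G, target_communities=2):
    g = G.copy()
    while nx.number_connected_components(g) < target_communities:
        betweenness = nx.edge_betweenness_centrality(g)
        u, v = max(betweenness, key=betweenness.get)  # most "between" edge
        g.remove_edge(u, v)
    return list(nx.connected_components(g))

# Example: communities = girvan_newman(nx.karate_club_graph(), 2)
```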

Hierarchical agglomerative clustering implementation

Implemented the hierarchical agglomerative clustering algorithm for k clusters using the Euclidean distance function in Python. The clusters were merged using a priority queue with a lazy-deletion strategy.
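A minimal sketch with a heapq-based priority queue; centroid linkage is assumed here since the linkage criterion is not specified above. Stale heap entries are skipped when popped rather than deleted eagerly, which is the lazy-deletion strategy.

```python
# Sketch: agglomerative clustering with lazy deletion of stale heap entries.
import heapq
from math import dist  # Euclidean distance, Python 3.8+

def agglomerative(points, k):
    clusters = {i: [p] for i, p in enumerate(points)}  # cluster id -> members
    centroid = {i: p for i, p in enumerate(points)}
    heap = [(dist(centroid[a], centroid[b]), a, b)
            for a in clusters for b in clusters if a < b]
    heapq.heapify(heap)
    next_id = len(points)
    while len(clusters) > k and heap:
        _, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # lazy deletion: one side was already merged, skip
        merged = clusters.pop(a) + clusters.pop(b)
        center = tuple(sum(c) / len(merged) for c in zip(*merged))
        for other in clusters:
            heapq.heappush(heap, (dist(centroid[other], center), other, next_id))
        clusters[next_id], centroid[next_id] = merged, center
        next_id += 1
    return list(clusters.values())

# Example: agglomerative([(0, 0), (0, 1), (5, 5), (5, 6)], 2)
```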

User-based Collaborative Filtering Recommendation Engine

Implemented a simple user-based collaborative filtering recommender system in Python for predicting movie ratings using k-nearest neighbors and Pearson correlation.
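A minimal sketch of the prediction step; `ratings` maps each user to a dict of movie ratings, and k=5 is an illustrative default.

```python
# Sketch: Pearson similarity over co-rated movies, then a similarity-weighted
# average of the k most similar users' ratings.
from math import sqrt

def pearson(u, v):
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu = sum(u[m] for m in common) / len(common)
    mv = sum(v[m] for m in common) / len(common)
    num = sum((u[m] - mu) * (v[m] - mv) for m in common)
    den = sqrt(sum((u[m] - mu) ** 2 for m in common) *
               sum((v[m] - mv) ** 2 for m in common))
    return num / den if den else 0.0

def predict(ratings, user, movie, k=5):
    neighbors = sorted(
        ((pearson(ratings[user], ratings[v]), v)
         for v in ratings if v != user and movie in ratings[v]),
        reverse=True)[:k]
    num = sum(sim * ratings[v][movie] for sim, v in neighbors if sim > 0)
    den = sum(sim for sim, v in neighbors if sim > 0)
    return num / den if den else None
```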

Binary classification of Emails and Sentiment Analysis on IMDB reviews

Developed a Naive Bayes model, an SVM classifier, and a Maximum Entropy classifier in Python to classify emails as Spam/Ham and IMDB reviews as POSITIVE/NEGATIVE.
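A minimal sketch of the Naive Bayes variant using scikit-learn (an assumption about tooling; the SVM and Maximum Entropy classifiers follow the same fit/predict pattern). The training data here is made up.

```python
# Sketch: bag-of-words features feeding a multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting agenda for monday",
          "free money claim now", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["claim your free prize"]))  # -> ['spam']
```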

Finding frequent Itemsets

Implemented the Park-Chen-Yu (PCY), Multihash, and Toivonen algorithms in Python to find frequent itemsets given several baskets of items.
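A minimal sketch of PCY's two passes; the bucket count and support threshold are illustrative. Multihash adds a second hash table to pass 1, and Toivonen works from a sample, but both reuse this skeleton.

```python
# Sketch: PCY for frequent pairs; baskets are iterables of hashable items.
from collections import Counter
from itertools import combinations

def pcy(baskets, support=3, n_buckets=1009):
    # Pass 1: count single items and hash every pair into a bucket.
    item_counts, buckets = Counter(), Counter()
    for basket in baskets:
        item_counts.update(basket)
        for pair in combinations(sorted(basket), 2):
            buckets[hash(pair) % n_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    bitmap = {b for b, c in buckets.items() if c >= support}

    # Pass 2: count only pairs of frequent items hashing to frequent buckets.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            if (pair[0] in frequent_items and pair[1] in frequent_items
                    and hash(pair) % n_buckets in bitmap):
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= support}
```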

MapReduce implementation

Mimicked the MapReduce framework in Python and used it to solve various problems.
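A minimal sketch of what such a mimic can look like: a generic run() that maps, shuffles by key, and reduces, shown here on word count. This is an illustration, not the project's actual interface.

```python
# Sketch: single-process map -> shuffle -> reduce pipeline.
from collections import defaultdict

def run(records, mapper, reducer):
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase emits (key, value)
            shuffled[key].append(value)     # shuffle: group values by key
    return {key: reducer(key, values)       # reduce phase folds each group
            for key, values in shuffled.items()}

# Word count as the classic example:
wordcount = run(["to be or not to be"],
                mapper=lambda line: [(w, 1) for w in line.split()],
                reducer=lambda word, counts: sum(counts))
# -> {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```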

Bayesian Network to answer questions about a patient

Developed a Python script to find the conditional probability of a symptom (or finding) given each of the diseases, by computing inferences in Bayesian networks of discrete random variables. The data was collected from a fairly extensive electronic medical record system, and the prior probabilities of the diseases were known.
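A toy sketch of the core computation (multiplying priors by CPT entries and normalizing out the hidden variable) on a two-node network; the real network and probabilities came from the medical record data, so every name and number here is made up.

```python
# Sketch: inference by enumeration on a toy disease -> symptom network.
prior = {"flu": 0.1, "cold": 0.9}                      # P(disease), assumed
cpt = {("fever", "flu"): 0.9, ("fever", "cold"): 0.2}  # P(symptom|disease), assumed

def p_disease_given_symptom(disease, symptom):
    """P(disease | symptom) by enumerating the disease variable and normalizing."""
    joint = {d: prior[d] * cpt[(symptom, d)] for d in prior}
    return joint[disease] / sum(joint.values())

print(p_disease_given_symptom("flu", "fever"))  # 0.09 / 0.27 = 1/3
```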

CPT for an insurance company

Developed a Python script to compute the conditional probability table (CPT) given a Bayesian network of factors for a disease, represented as a directed graph indicating causal effects. The CPT was then used to answer a few queries about the data.

Titanic: Machine Learning from Disaster

Worked on a project at the Applied Statistics Club to analyze and predict what sorts of people were likely to survive a disaster like the sinking of the Titanic. This project was also a challenge on Kaggle. It involved developing a model in R and Python to predict each passenger's likelihood of survival.

Fine Arts Museum of San Francisco

This project involved developing a scraper using Java and JSoup to collect data about the various artworks from the museum website, cleaning the data using Google Open Refine, and modelling and mapping the data using Karma.

SAT Solver

Developed a Boolean satisfiability solver in Python that takes a set of variables and connectives in a propositional logic statement, converts it into Conjunctive Normal Form (CNF), and either returns a satisfying assignment that makes the CNF sentence true or determines that no satisfying assignment is possible.
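A minimal sketch of the search step on CNF input: DPLL with unit propagation, with clauses represented as sets of integer literals. The CNF conversion itself is omitted, and this is an illustration rather than the project's actual code.

```python
# Sketch: DPLL over CNF clauses (sets of positive/negative integer literals).
def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal to be true.
    units = [next(iter(c)) for c in clauses if len(c) == 1]
    while units:
        lit = units[0]
        assignment[abs(lit)] = lit > 0
        clauses = [c - {-lit} for c in clauses if lit not in c]
        if any(len(c) == 0 for c in clauses):
            return None  # an empty clause means a contradiction
        units = [next(iter(c)) for c in clauses if len(c) == 1]
    if not clauses:
        return assignment  # every clause satisfied
    var = abs(next(iter(clauses[0])))  # branch on an unassigned variable
    for value in (var, -var):
        result = dpll(clauses + [{value}], assignment)
        if result is not None:
            return result
    return None  # unsatisfiable under this partial assignment

# Example: (x1 or x2) and (not x1 or x2) and (not x2 or x3)
# print(dpll([{1, 2}, {-1, 2}, {-2, 3}]))  # -> {1: True, 2: True, 3: True}
```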

Real Estate Listing

This project involved listing property details using the Zillow API. The XML data returned by Zillow was parsed by a PHP script hosted on Amazon Web Services Elastic Beanstalk and converted into JSON. An Android application and a responsive website (using Bootstrap) were developed, and a Facebook share button was provided to publish the information.

XML Database for a Company

This project involved designing and implementing an XML database for a company. The database was then queried using XQuery in the Altova XMLSpy 2015 tool, and an XML stylesheet was developed to transform the XML data into HTML.

USC Transportation Application

The USC Transportation Department wanted to improve its transportation services, for which it had selected certain buildings on campus and gathered data on student coordinates. It had also kept track of the tram stops used by students. This data was provided with the intention of moving some tram stops so that they would cover the maximum number of students and buildings. The spatial data was loaded into Oracle 11g, and a Java application was developed to query the database.
