Abhijit Kashyap

Hi! My name is Abhijit, and I'm currently pursuing a Master's in Computer Science at the University of Southern California, Los Angeles. After I graduate in Spring 2019, I'm looking for full-time opportunities in machine learning, natural language processing, and software engineering.

Outside of computer science, my interests lie in music, aviation, and space exploration.


Experience

Student Research Assistant (NLP) @ Information Sciences Institute, Los Angeles

February 2019 - Present

I’m currently working on a project that automatically replies to phishing emails in order to waste the sender’s time.


Machine Learning Mentor @ Southern California Earthquake Center, Los Angeles

June 2018 - November 2018

For the first 3 months, I mentored 10 undergraduates in machine learning as part of a new initiative by my organization to include ML in its internship program. This involved teaching weekly classes, preparing exercises, formulating machine learning tasks from a large dataset of synthesized earthquake data, and building machine learning models to predict the probability of earthquakes occurring on the San Andreas fault. In the remaining 3 months, I worked on ML diagnostics and on implementing better algorithms. The program's success last year led the organization to run it again this year.


Web Developer @ Southern California Earthquake Center, Los Angeles

September 2017 - Present
  • I manage some of my organization's websites, working on both the backend (PHP development and server administration) and the front end.
  • I created an online administrative portal for event organizers to manage past and new event registrants. Compared with the previous spreadsheet-based system, it greatly improved efficiency and increased the number of registrants.
  • I also developed an object-oriented PHP backend framework for building new pages. It made page development easier and faster, and its design inherently enforces best practices.
  • I developed an FAQ chatbot using Dialogflow that answers questions site visitors commonly ask about the organization's events, allowing employees to focus on more specific and complex questions.

R&D Intern @ Amadeus Software Labs, Bangalore, India

December 2015

I worked as an intern in the R&D division of the mobile solutions team at Amadeus Labs, Bangalore. I evaluated a few enterprise mobility platforms and developed a hybrid enterprise mobile application using IBM MobileFirst Platform. The application searches for flights and syncs data between devices. It is integrated with Facebook login, the Parse application backend, and Amadeus services through REST APIs.


Skills

Software Engineering

I've worked with a wide range of technologies in multiple fields: game development, mobile application development, web development and general software development.

Python
Java
PHP
JavaScript
C/C++
Scala

Machine Learning

I have 2+ years of practical experience in ML through Kaggle competitions, personal projects, mentorship, research, and coursework, as well as a strong theoretical foundation from various university and online courses. I've recently started working on computer vision using deep learning.

Natural Language Processing

I have 3+ years of experience with both traditional and machine-learning-based NLP tasks, including text classification, grammar development, parsing, dialogue systems, and more. I've also taken university and online courses in NLP.

Web App and Website Development

I got into web development over 5 years ago, starting with responsive websites and then moving on to backend development (PHP, Python, and JavaScript), databases (MySQL), integrating and developing APIs, and web scraping. I've worked extensively with the LAMP stack and with Python and Flask.

Linux

I've been a Linux user for 5 years and am proficient at using, debugging and configuring Linux desktops and servers, and writing advanced Bash scripts and commands.

Projects

My projects fall under the following categories: Machine Learning, Natural Language Processing, Web App Development, Android Development, Apache Spark, Game Development.

Stack Exchange Question Classifier

Natural Language Processing, Machine Learning

A text classifier that determines which website on the Stack Exchange network a given question most likely belongs on. I created a dataset of around 300,000 questions using Stack Exchange APIs and web scraping. I experimented with several machine learning models to classify the data: bag-of-words models like Multinomial Naive Bayes and a fully connected neural network, as well as sequence models like an LSTM network, CNN, and SepCNN. Using a neural bag-of-words model, I achieved a top-3 accuracy of 90%.

Concepts: Text preprocessing, text classification, word embeddings, TF-IDF vectorization, bag-of-words models, sequence models

Technologies: Python, TensorFlow, Scikit-learn
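Top-3 accuracy counts a prediction as correct when the true site is among the model's three highest-scoring sites. A minimal NumPy sketch of the metric, assuming the model outputs one score per class (the function name and toy scores are illustrative, not taken from the project):

```python
import numpy as np


def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (n_samples, n_classes) class scores or probabilities.
    labels: (n_samples,) integer class indices.
    """
    # argsort is ascending, so the last k columns hold the k best classes
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy scores for 3 questions over 4 hypothetical sites
scores = np.array([
    [0.05, 0.50, 0.30, 0.15],  # true site 2 is ranked 2nd: hit
    [0.60, 0.25, 0.10, 0.05],  # true site 3 is ranked last: miss
    [0.20, 0.10, 0.30, 0.40],  # true site 0 is ranked 3rd: hit
])
labels = np.array([2, 3, 0])
print(top_k_accuracy(scores, labels, k=3))  # two of the three rows are hits
```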

Earthquake Prediction Using Outlier Detection

Machine Learning

Using a synthetic dataset of simulated earthquakes, I created an anomaly detection model that predicts whether a major earthquake will occur on some section of the San Andreas fault in the next ten years with an F1-score of 84%.

Concepts: Data preprocessing, outlier detection, data visualization

Technologies: Python, Scikit-learn, Pandas, Matplotlib

View (Contact me for notebook password)

Kaggle Competition: Text Classification of Insincere Questions on Quora

Natural Language Processing, Machine Learning

This Kaggle competition involved classifying questions posted on Quora (the Q&A site) as sincere or insincere. I learned and implemented several advanced, practical text preprocessing tricks to improve the performance of word embeddings. Using an LSTM network with TensorFlow/Keras, I finished in the top 31% of the competition.

Concepts: Text preprocessing, text classification, word embeddings, sequence models

Technologies: Python, TensorFlow

Kaggle Competition: Titanic Survivor Prediction

Machine Learning

By participating in this competition, I learned about the importance of data exploration and feature engineering, as well as how to handle different types of data, choose a machine learning model, and preprocess data to suit that model.

Concepts: Data preprocessing, data exploration, model selection

Technologies: Python, Scikit-learn, Pandas

Mobile Phone Recommender Chatbot

Natural Language Processing, Web App Development

A chatbot that recommends mobile phones to users through conversation by understanding preferences described in natural English (for example, "I want a phone that has a really good camera and a lot of storage"). I developed a grammar tailored to the expected types of user queries to identify their key parts.

Concepts: Dialog systems, grammars, part-of-speech tagging, regular expressions, sentiment analysis

Technologies: Python, NLTK, Dialogflow, Flask, Facebook Messenger Apps

Place and Entertainment Search Application

Android Development, Web App Development

Created an Android application in Java, backed by a Node.js web service, that helps users find activities to do near a given location or their current location. It uses the Google Places and Maps APIs. This was the final project for my Web Technologies (CSCI-571) class at USC.

Concepts: App development, software development, web services

Technologies: Java, Android, Google API, JavaScript, Node.js

Conversational French Language Assistant

Natural Language Processing, Web App Development

A chatbot that runs on Facebook Messenger and helps users by answering common questions asked by French learners, like translations, verb conjugations, noun genders, and more.

Concepts: Dialog systems

Technologies: Python, Dialogflow, Flask, Facebook Messenger Apps

Class Assignments for Applied Natural Language Processing (CSCI544)

Natural Language Processing, Machine Learning

Some of the main concepts covered in this class were part-of-speech tagging, parsing, text classification, language modelling, sequence models and machine translation.

  • Created a limerick detector.
  • Created a finite-state transducer for converting French numbers to words.
  • Built a probabilistic context-free grammar from a corpus and used it in a probabilistic CYK parser to determine the most probable part-of-speech tags for a sentence.
  • Performed feature selection and implemented Viterbi inference for a Named Entity Recognition system on Twitter data.
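Viterbi inference finds the highest-scoring tag sequence by dynamic programming over per-position scores. The following is a minimal pure-Python sketch, not the assignment's code: it assumes HMM-style log-probability tables keyed by `(state, word)` and `(prev_state, state)`, and the toy `O`/`PER` tags and probabilities are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_logp: Dict[str, float],
    trans_logp: Dict[Tuple[str, str], float],
    emit_logp: Dict[Tuple[str, str], float],
    unk_logp: float = -20.0,
) -> List[str]:
    """Return the highest-scoring state sequence for `observations`.

    Scores are log-probabilities. Missing table entries fall back to
    `unk_logp`, a crude smoothing choice made only for this sketch.
    """
    if not observations:
        return []
    # best[t][s]: best log-score of any path that ends in state s at position t
    best: List[Dict[str, float]] = [{
        s: start_logp.get(s, unk_logp) + emit_logp.get((s, observations[0]), unk_logp)
        for s in states
    }]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor = highest (previous score + transition score)
            prev = max(states, key=lambda p: best[t - 1][p] + trans_logp.get((p, s), unk_logp))
            best[t][s] = (best[t - 1][prev] + trans_logp.get((prev, s), unk_logp)
                          + emit_logp.get((s, observations[t]), unk_logp))
            back[t][s] = prev
    # Follow back-pointers from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy model: "O" = outside any entity, "PER" = person name
words = ["met", "alice", "today"]
states = ["O", "PER"]
start = {"O": math.log(0.8), "PER": math.log(0.2)}
trans = {("O", "O"): math.log(0.7), ("O", "PER"): math.log(0.3),
         ("PER", "O"): math.log(0.6), ("PER", "PER"): math.log(0.4)}
emit = {("O", "met"): math.log(0.5), ("O", "alice"): math.log(0.01), ("O", "today"): math.log(0.5),
        ("PER", "met"): math.log(0.05), ("PER", "alice"): math.log(0.9), ("PER", "today"): math.log(0.05)}
print(viterbi(words, states, start, trans, emit))  # ['O', 'PER', 'O']
```

Working in log space turns products of probabilities into sums, which avoids numerical underflow on long sentences.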

Class Assignments for Foundations and Applications of Data Mining (INF553)

Apache Spark

Learned about the MapReduce programming model with Scala and Apache Spark, along with algorithms for handling large volumes of data. The class gave me in-depth knowledge of concepts such as PageRank, finding similar sets and frequent itemsets, community detection in large social graphs, mining data streams, recommender systems, and clustering massive datasets.

  • Extracted information from the Stack Overflow Developer dataset using Spark+Scala.
  • Implemented collaborative filtering on 1 million Yelp reviews using Spark MLlib.
  • Implemented the Savasere-Omiecinski-Navathe (SON) algorithm to mine frequent itemsets (words) from a dataset of 1 million Yelp reviews.
  • Implemented K-means clustering on 1 million Yelp reviews from scratch as well as with MLlib.
  • Sampled data from a Twitter stream using reservoir sampling, and implemented a Bloom filter to determine whether incoming hashtags in a continuous stream of Twitter data have been seen before.
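A minimal single-machine Python sketch of the two streaming techniques above, not the Spark/Scala assignment code: reservoir sampling keeps a uniform random sample of size k from a stream of unknown length, and a Bloom filter answers "seen before?" with no false negatives but occasional false positives. The bit-array size, number of hashes, and SHA-256-based hashing are assumptions chosen for readability.

```python
import hashlib
import random
from typing import Iterable, List


def reservoir_sample(stream: Iterable[str], k: int, rng: random.Random) -> List[str]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir: List[str] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1)
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


class BloomFilter:
    """Set-membership test with no false negatives and tunable false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> List[int]:
        # Derive num_hashes positions by salting SHA-256 with the hash index
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
            % self.num_bits
            for i in range(self.num_hashes)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


hashtags = ["#ml", "#nlp", "#spark", "#ml", "#scala"]
sample = reservoir_sample(hashtags, k=2, rng=random.Random(42))

seen = BloomFilter(num_bits=1024, num_hashes=3)
first_seen = []
for tag in hashtags:
    # True means "definitely new"; False means "probably seen before"
    first_seen.append(not seen.might_contain(tag))
    seen.add(tag)
```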

Class Assignments for Machine Learning (CSCI567)

Machine Learning

This class taught me the mathematical and theoretical details of several machine learning algorithms and how to implement them from scratch using Python libraries like NumPy. I also learned how to apply these algorithms to datasets.

  • Implemented linear regression, k-nearest neighbors and perceptron.
  • Implemented logistic regression, as well as forward and backward propagation for a neural network and applied it to the MNIST digits dataset.
  • Implemented the support vector machine algorithm, boosting and decision trees.
  • Implemented K-means clustering and expectation maximization on Gaussian Mixture Models. Used clustering to find the K best colors to represent an image, then quantized the image to contain only those colors.
  • Implemented Viterbi inference for a Hidden Markov Model, and Principal Component Analysis for image compression of MNIST data.
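The color-quantization step can be sketched with plain NumPy k-means (Lloyd's algorithm): cluster the pixel colors into k centers, then replace each pixel with its nearest center. This is an illustrative sketch, not the course solution; the random initialization, iteration cap, and function name are assumptions, and the all-pairs distance matrix is only practical for small images.

```python
import numpy as np


def quantize_colors(image: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Replace every pixel with the nearest of k k-means cluster-center colors."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k randomly chosen pixels
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest center (squared Euclidean distance)
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its pixels; keep it if the cluster is empty
        new_centers = np.array([
            pixels[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers[dists.argmin(axis=1)].reshape(image.shape)


rng = np.random.default_rng(1)
photo = rng.integers(0, 256, size=(16, 16, 3))
quantized = quantize_colors(photo, k=4)
print(len(np.unique(quantized.reshape(-1, 3), axis=0)))  # at most 4 distinct colors
```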

Older projects

Spell Checker
Natural Language Processing, Web App Development

Created a web-based English spell checker that automatically suggests corrections for misspelled words as the user types into an input box. It uses Levenshtein (edit) distance to find nearby words. New words are added to the database of known words through crowdsourcing and are verified using the Merriam-Webster Dictionary API.

View on GitHub
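The core of the suggestion step is edit distance. Below is a minimal Python sketch of the standard dynamic-programming Levenshtein distance plus a hypothetical `suggest` helper that ranks known words within a distance threshold; the threshold and word list are assumptions, and the original project's implementation may differ.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, known_words: Iterable[str], max_distance: int = 2) -> List[str]:
    """Hypothetical helper: known words within max_distance edits, closest first."""
    scored = sorted((levenshtein(word, w), w) for w in known_words)
    return [w for d, w in scored if d <= max_distance]


print(levenshtein("kitten", "sitting"))  # 3
print(suggest("speling", ["spelling", "spewing", "selling", "apple"]))
# ['spelling', 'spewing', 'selling']
```

Keeping only the previous row of the DP table reduces memory to O(len(b)) while computing the same distance.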
Web Game using HTML5 Canvas
Game Development, Web App Development

Created a graphical 2D game using HTML5 Canvas and JavaScript in which the player scores by destroying falling blocks before they reach the ground.

View on GitHub
Website for Music School
Web Development

Created a responsive informational website for a music school, integrated with the Google Maps API to show the school’s location. The site’s usage was monitored and analyzed using Google Analytics.

View website | View on GitHub
Vocabulary Helper Chatbot
Natural Language Processing, Web App Development

Created a conversational bot that connects to the Internet Relay Chat (IRC) network as a regular user. It defines words and converses with other users on the network.

View on GitHub
Instagram Image Scraper
Web Scraping

Downloads Instagram images from a public user profile.

View on GitHub

Education

M.S. Computer Science @ University of Southern California, Los Angeles

Fall 2017 - Spring 2019

Classes: CSCI567 - Machine Learning, CSCI544 - Natural Language Processing, INF553 - Foundations and Applications of Data Mining, CSCI570 - Analysis of Algorithms, CSCI585 - Database Systems, CSCI572 - Information Retrieval and Web Search Engines, CSCI571 - Web Technologies


B.Tech. Computer Science and Engineering @ VIT University, Vellore, India

2013 - 2017

Classes: Soft Computing, Data Warehousing and Data Mining, Cloud Computing, Software Engineering, Internet and Web Programming

  • In my junior year at VIT, I co-founded a fundraising club called HEARTS to raise money to help children from lower-income families access higher education.
  • I was also a designer for the university's official yearbook committee.

Certifications

Machine Learning by Stanford University on Coursera

August 2018

View Certificate


Deep Learning Specialization by deeplearning.ai on Coursera

January 2019

View Certificate