Resume
Welcome to my about page!
I am a passionate Computer Science graduate student with a keen interest in the fascinating fields of Machine Learning and Deep Learning. With over three years of experience as a dedicated Machine Learning Engineer, I have been fortunate enough to work on a diverse range of projects that have deepened my understanding and expertise in this rapidly evolving field.
Currently, I am working at a startup, where I am helping develop its Minimum Viable Product (MVP). As part of this work, I am applying large language models to build new natural language processing solutions.
My journey in the realm of Machine Learning began during my graduate studies, where I delved into the theoretical foundations and practical applications of this exciting domain. This academic background, coupled with my hands-on industry experience, has provided me with a well-rounded skill set and a deep understanding of the algorithms, frameworks, and techniques that power modern Machine Learning and Deep Learning systems.
I believe that the potential of Machine Learning and Deep Learning is limitless, and I am continuously exploring new avenues to expand my knowledge and stay up to date with the latest advancements in the field. My passion for learning and my drive to solve complex problems through innovative solutions fuel my enthusiasm for every project I undertake.
On this platform, I aim to share my insights, experiences, and learnings with fellow enthusiasts, researchers, and practitioners in the field. Whether you are a novice looking to dip your toes into the world of Machine Learning or an experienced professional seeking to stay ahead of the curve, I hope you find valuable information and inspiration within these pages.
Thank you for visiting, and I look forward to embarking on this exciting journey together!
Interests
- Software Development
- Machine Learning
- Data Science
- Large Language Models (LLMs)
- System Design
Education
University of South Dakota
MS in Computer Science
Vermillion, SD
May 2023
Visvesvaraya Technological University (VTU)
Bachelor in Computer Science and Engineering
Bangalore, India
Graduated July 2019
Experience
University of South Dakota
August 2022 - present
Graduate Assistant/Data Analyst
Machine Learning
: Creating a predictive model using machine learning algorithms to identify at-risk children and help prevent incidents of abuse.
Scripting
: Developing automated Python scripts that merge data from different sources, reducing a manual task from 1 hour to 10 minutes.
Analysis
: Analyzing data to develop sustainable solutions to reduce child sexual abuse and maltreatment in South Dakota.
Reporting
: Implementing interactive dashboards and reports in Tableau to analyze child abuse rates.
Ensemble Matrix
Nov 2020 - Dec 2021
Python Software Engineer
APIs
: Developed backend APIs that support customer-facing product features using Django, GraphQL, and EC2.
Async workloads
: Worked on backend systems that handle asynchronous workloads, such as ingesting data from and exporting data to third-party e-commerce systems, using ECS, SQS, and Airflow.
Infrastructure as Code
: Used Infrastructure as Code to define existing and new AWS resources, including EC2, as code for faster iterative deployments.
CI/CD
: Designed and implemented a CI/CD pipeline from scratch for multiple projects using AWS EC2 and GitHub Actions.
Tests
: Established code testing procedures from scratch, including linting and type checking, using pytest and SQLAlchemy.
Structured Logging
: Revamped logging across multiple projects to improve bug discoverability, using Datadog and Sentry for structured logging.
Reporting
: Added analytics report generation tools that give customers more insight into business metrics, using Pandas and Django for ETL and reporting.
Bitpoint Pvt. Ltd.
August 2019 - Nov 2020
Python Software Engineer
APIs
: Created back-end APIs for a medical entrance preparation application.
3rd Party Integrations
: Created payment integrations that let users buy subscriptions to the application.
Product Features
: Worked on the back end to add features such as progress tracking, question recommendations, comments, and notification emails (Django, Celery, SQLAlchemy, PostgreSQL, pytest).
Authorization
: Added JWT authorization to the user API.
Teaching
Achiever Groups
Python Data Analysis and Web Development using Django
2021
Projects
1. Stanford NLP Lecture Transcription using OpenAI’s Whisper
Whisper is an automatic speech recognition (ASR) model trained on a large amount of multilingual and multitask supervised data. It is implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, each chunk is converted into a log-Mel spectrogram, and the spectrogram is passed to the encoder. The decoder is trained to predict the corresponding text caption. Its output is intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and speech translation into English. For more information about Whisper, read here.
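The windowing and task-token steps described above can be sketched in plain Python. This is a simplified illustration and not the `openai-whisper` package's code. It makes the following assumptions:

- The input is 16 kHz mono audio, given as a list of floats.
- The audio is cut into fixed 30-second windows, with the last window zero-padded. The log-Mel step is omitted.
- The decoder's task prefix is built as token strings, using the special tokens named in the Whisper paper.

The real package is used through `whisper.load_model(name)` and `model.transcribe(path)`, whose result dict holds the transcript under `"text"`. Its `transcribe()` advances the window using predicted timestamps and works with token IDs rather than strings. The function names below are hypothetical.

```python
SAMPLE_RATE = 16_000                         # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                           # the encoder sees 30-second windows
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def split_into_windows(samples: list[float], window: int = CHUNK_SAMPLES) -> list[list[float]]:
    """Split audio into fixed-length windows, zero-padding the last one (simplified)."""
    windows = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # Pad with silence so every window has exactly `window` samples.
        windows.append(chunk + [0.0] * (window - len(chunk)))
    return windows


def task_prefix(language: str = "en", task: str = "transcribe", timestamps: bool = False) -> list[str]:
    """Build the special-token prefix that tells the decoder which task to perform."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task}")
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the decoder to emit text only, without timestamp tokens.
        prefix.append("<|notimestamps|>")
    return prefix


if __name__ == "__main__":
    # 75 seconds of hypothetical silent audio -> three 30-second windows
    audio = [0.0] * (75 * SAMPLE_RATE)
    print(len(split_into_windows(audio)))  # 3
    print(task_prefix("en", "transcribe"))
```

Padding the final window keeps every encoder input the same length. This is why a short clip still costs a full 30-second forward pass.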
I used the Whisper model to transcribe Stanford NLP lectures into text captions. Here is the result of the transcribed lectures. The web app is built with Flask and deployed on an AWS EC2 instance. You can find the transcribed audio files as text here.
2. Custom Named Entity Recognizer for clinical data
Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP) that involves identifying and categorizing named entities in text.
I developed a custom named entity recognition (NER) model for clinical data using the spaCy framework and deployed it with Streamlit. The model identifies entities in clinical text, such as medications and medical conditions, and classifies them into three classes: “MEDICINE”, “MEDICALCONDITION”, and “PATHOGEN”. The dataset was taken from Kaggle. You can try the application at this link.
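Training a spaCy NER model requires each example as raw text plus character-offset spans `(start, end, label)`. In spaCy v3, such a dict can be turned into a training example with `Example.from_dict(nlp.make_doc(text), {"entities": [...]})`.

The sketch below converts annotations into that offset format. It assumes, hypothetically, that the source annotations list each entity as a surface string with one of the three labels above; the Kaggle dataset's actual format may differ. It also rejects unknown labels and overlapping matches. The function name and sample sentence are illustrative only.

```python
LABELS = {"MEDICINE", "MEDICALCONDITION", "PATHOGEN"}

Offset = tuple[int, int, str]  # (start char, end char, label), end exclusive


def to_spacy_format(text: str, mentions: list[tuple[str, str]]) -> tuple[str, dict[str, list[Offset]]]:
    """Convert (surface text, label) mentions into spaCy-style character offsets."""
    entities: list[Offset] = []
    for surface, label in mentions:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        if not surface:
            raise ValueError("empty entity text")
        start = text.find(surface)
        # If this occurrence overlaps an entity already placed, look for the next one.
        while start != -1 and any(s < start + len(surface) and start < e for s, e, _ in entities):
            start = text.find(surface, start + 1)
        if start == -1:
            raise ValueError(f"{surface!r} not found (or only overlapping matches) in text")
        entities.append((start, start + len(surface), label))
    entities.sort()  # spaCy does not require order, but sorted spans are easier to inspect
    return text, {"entities": entities}


if __name__ == "__main__":
    # Hypothetical clinical sentence, not taken from the Kaggle dataset.
    sample = "Metformin treats type 2 diabetes; E. coli causes the infection."
    text, annotations = to_spacy_format(
        sample,
        [("Metformin", "MEDICINE"), ("type 2 diabetes", "MEDICALCONDITION"), ("E. coli", "PATHOGEN")],
    )
    for start, end, label in annotations["entities"]:
        print(label, text[start:end])
```

Validating offsets before training matters because a span whose offsets do not line up with token boundaries cannot be aligned to tokens. A mismatched span silently weakens the training signal.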
3. Question Answering using LangChain and OpenAI
This application is a simple example of how to build a question-answering system using LangChain, pre-trained language models from OpenAI, and Streamlit.
LangChain helps build applications with Large Language Models (LLMs) through composability: it combines LLMs with other sources of computation.
I developed a question-answering system using LangChain with OpenAI embeddings. Because LLMs have a fixed context length, LangChain works around this limit with chains: the document is split into chunks, and the chain is run over the whole document. In this application, the contents of an uploaded file are converted into embeddings with OpenAI embeddings. The embeddings are stored in the Pinecone vector database, which allows fast retrieval. When a user enters a query, a similarity search retrieves the most similar embeddings from the vector store. The LangChain chain then formats the retrieved content into a prompt and passes it to the LLM.
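The retrieval flow above (chunk, embed, store, search, prompt) can be illustrated without any external services. This is a minimal sketch with several stand-ins:

- A toy bag-of-words vector replaces OpenAI embeddings.
- An in-memory list replaces Pinecone.
- The chunk size, helper names, and prompt template are hypothetical.

The sketch returns the prompt instead of calling an LLM. It is not LangChain's API.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_words: int = 50) -> list[str]:
    """Split a document into chunks of at most `chunk_words` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as Pinecone."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self.items.append((embed(chunk), chunk))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into a single LLM prompt."""
    context = "\n---\n".join(context_chunks)
    return f"Answer the question using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    store = InMemoryVectorStore()
    doc = "Pinecone stores vectors. Streamlit builds the user interface. Embeddings map text to vectors."
    for chunk in chunk_text(doc, chunk_words=4):
        store.add(chunk)
    question = "What builds the user interface?"
    print(build_prompt(question, store.similarity_search(question, k=1)))
```

Only the top-k chunks reach the prompt, which is what keeps a long document within the model's fixed context length.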
Skills
Languages
: Python, JavaScript
Databases
: MySQL, PostgreSQL, MongoDB
Frameworks and Libraries
: PyTorch, TensorFlow, Hugging Face Transformers, OpenCV, spaCy, Django, FastAPI, Vue.js
Tools
: Git, Linux, Docker, AWS (EC2, S3, Lambda, SageMaker), Hydra, Model Packaging (ONNX)