About Me

Hi there, my name is Arpan Mishra and I currently work as a Data Science Consultant at ZS Associates. I graduated in July 2021 with a Bachelor of Science in Statistics from KMC, University of Delhi.

I’m an Applied GenAI Engineer with 4.5 years of experience architecting scalable multi-agent LLM systems, retrieval-augmented generation pipelines, and AI evaluation frameworks. I deliver production solutions in regulated domains, combining GenAI, structured data processing, and cloud infrastructure.

If I’m not coding, you can find me playing my ukulele or crushing someone on chess.com. Challenges are accepted ♚

Experience

This is how my professional journey has gone so far.


Data Science Consultant at ZS Associates
December 2025 – Present | Gurgaon, India

Clinical Trial Document Authoring Platform

  • Led development of an AI-powered platform supporting 12 clinical trial document types using a multi-agent LLM architecture, managing a team of 10 AI engineers and collaborating cross-functionally to drive production delivery.
  • Designed a Planner-based agent orchestration framework with RAG-powered context grounding to execute complex document drafting workflows with improved factual accuracy.
  • Built scalable digitization and indexing pipelines across AWS and Azure, enabling structured processing of diverse clinical document formats.
  • Developed a centralized knowledge layer (MCP server on AWS Agent Core) and established an LLM evaluation framework to ensure high-quality, transparent AI outputs in regulated environments.

Clinical Query Analysis System

  • Designed and deployed a GenAI-powered clinical query intelligence system that reduced redundant queries by 20%, improving operational efficiency across clinical workflows.
  • Architected a multi-agent LLM framework comprising an Executor Agent, Redundancy Classification Agent, and Query Category Detection Agent.
  • Processed and generated insights from ~1.3M historical clinical queries to benchmark performance and improve classification accuracy.
  • Integrated the system into the clinical query portal to enable real-time redundancy detection and query-type tagging during user input.

Data Science Associate Consultant at ZS Associates
December 2023 – December 2025 | Gurgaon, India

Safety Narrative Document Authoring

  • Developed an AI-powered pipeline to analyze SDTM and ADaM clinical trial datasets and automatically generate chronological safety narratives for patients.
  • Engineered an end-to-end ETL pipeline to transform structured clinical datasets into LLM-compatible formats for downstream reasoning and summarization.
  • Leveraged structured prompting techniques to enable temporal reasoning over patient safety events and produce coherent, medically aligned summaries.
  • Built a lightweight application to operationalize the pipeline, generating safety narratives for 600+ patients across 3 clinical trials.
  • Implemented automated Gantt chart visualizations from trial data to assist medical writers in validating event timelines.

Auto Document Redactor

  • Designed and deployed an automated document redaction pipeline to remove Patient PII from confidential clinical documents, achieving 98% recall and 90% precision.
  • Applied chain-of-thought prompting and few-shot learning techniques to accurately identify nuanced and context-sensitive PII entities.
  • Hardened the redaction workflow by integrating a rule-based and ML-driven spaCy entity recognition pipeline to improve robustness and reduce false negatives.
  • Built evaluation benchmarks and validation workflows to ensure compliance and reliability in regulated clinical environments.

Data Science Associate at ZS Associates
November 2021 – December 2023 | New Delhi, India

ISR Entity Detection Pipeline

  • Built an entity and relationship extraction pipeline from clinical protocol documents to support Industry Sponsored Research (ISR) decision-making.
  • Extracted key entities including drug, dosage, cycle, endpoints, and inclusion/exclusion criteria using spaCy NER, custom fine-tuned entity recognition models, and XGBoost-based classifiers.
  • Enabled structured protocol intelligence to help stakeholders make data-driven research sponsorship decisions.

Patient Discontinuation Prediction

  • Developed an XGBoost-based predictive model to identify patients at risk of therapy discontinuation.
  • Engineered features from claims data, patient demographics, and census datasets, applying feature selection techniques such as RFE and forward/backward elimination.
  • Packaged the solution into a reusable internal asset for client-specific model training and feature generation.

Research Intern at Inria
June 2021 – September 2021 | Lille, France

  • Worked with medical data for mental health patients with a history of suicide attempts.
  • Modeled the recurrence of suicide attempts from demographic and medical survey data from VigilanS, using both parametric and non-parametric statistical methods.
  • Conducted spatial analysis of the patients and used geostatistical techniques to incorporate the effect of spatial autocorrelation into the model.

Machine Learning Engineer (Part Time) at Omdena
August 2020 – February 2021 | Remote

  • Worked with satellite imagery and survey data from Census and DHS as part of a global team of 50 change makers.
  • Used Landsat 7 & 8 Satellite Images and census data to create a model predicting district-level census variables using a multi-modal, multi-task learning approach.
  • Used DHS data and Sentinel images to classify the Asset Wealth Index of clusters across India.
  • This project was hosted by the World Resources Institute (WRI) and falls under the UN’s Sustainable Development Goal 8 (Decent Work & Economic Growth).

Skills

Programming & Tools
Python, R, SQL, Docker, Streamlit, LangGraph, LangChain

GenAI & LLM Systems
Multi-Agent Architectures, Agent Orchestration & Workflow Automation, Retrieval-Augmented Generation (RAG), Prompt Engineering (Chain-of-Thought, Few-Shot, Structured Prompting), LLM Evaluation & Validation Frameworks

Cloud & Infrastructure
AWS (SageMaker, Agent Runtime, Bedrock Data Automation, EKS, DocumentDB), Azure Document Intelligence, Google Vertex AI

Projects

These are some of the personal projects that I have built in the past.


Rossmann Sales Prediction

Created a tool to predict the daily sales of any store in the Rossmann drug store chain, the second-largest drug store chain in Germany.

Sentiment Extraction using BERT

Used BERT to detect the sentiment of a given text and extract the words that best convey the detected sentiment.

Generating Anime Synopsis using Deep Learning

I used two techniques, LSTMs and a fine-tuned GPT-2, to compare their language modeling capabilities, and the results were astounding!

Global Suicide Analysis EDA

I analyzed global suicide data for 90+ countries from 1985 to 2015 in R, using various statistical tests and data visualization techniques to explain the data.

Text Analysis Webapp

The purpose of this app is to offer anyone starting an NLP project a fast and convenient way to explore text data, cutting down the time between EDA and modelling.

Rubik’s Cube Rotation Prediction

Predicting the x-axis rotation of a given Rubik’s cube using ResNet-50. This was part of the AI Blitz Challenge, a hackathon hosted by AIcrowd.

Selfie Filter using CNN

I used a CNN architecture for facial keypoint detection and then used OpenCV to apply a sunglasses filter that works in real time with a webcam.

Blog

Here are a few of the blog posts I have written about machine learning, data science, and the projects I have built.


Faster Machine Learning Using Hub by Activeloop

A code walkthrough of using the hub package for satellite imagery

Let’s make some Anime using Deep Learning

Comparing text generation methods: LSTM vs GPT-2

Decoding Support Vector Machines

Intuitively understand how Support Vector Machines work

Predicting HR Attrition using Support Vector Machines

Learn to train an SVM model following best practices

Contact