Soham Dandapath

I build AI systems that make it out of the notebook and into production, and I build things from scratch to understand how they really work.

01 About

Mostly, I want to know how things actually work.

I'm an applied AI engineer at C3 AI. My path ran through Singapore and New York before the Bay Area: a BE in Computer Science from NTU, a stretch of internships from Shopee to Seagate, then an MS at Columbia with a focus in machine learning. The constant across all of it has been a stubborn kind of curiosity, the sort where I'll re-implement an idea from scratch just to find out how it actually works.

These days that curiosity has a job. At C3 I take AI from a vague business problem all the way to something running in production, mostly RAG-based LLM systems and probabilistic time-series forecasting. I care about three things in particular: models you can interpret, deployments you can reproduce, and tooling that makes the next engineer's job easier. Outside work I'm usually deep in a paper I'm trying to understand, which is where most of my GitHub repos come from.

02 Work

What I've worked on.

2024 - now

Applied AI Engineer · C3 AI

I lead applied ML and LLM projects end to end for global enterprises. Right now I'm running a forecasting program for a major semiconductor customer, partnering with stakeholders from technical discovery through production, with an estimated $0.8B in annual business impact. Along the way I've shipped a RAG-based document-retrieval system for low-latency, policy-compliant search across internal docs; demand and yield forecasting apps that delivered $2.3M and roughly $5M in annual value for two customers; and an internal Python deployment toolchain that cut deploys from hours to minutes. I also own release management for our forecasting packages and mentor data scientists across teams.

2023

Data Science Intern · C3 AI

Shipped an out-of-the-box hierarchical forecasting and reconciliation system, implementing post-hoc MinT/ERM and intrinsic DeepVAR-Hierarchical approaches for cross-level coherence, and integrated probabilistic forecasts with Integrated Gradients explainability so the outputs were both uncertainty-aware and interpretable.

2022

Data Scientist · Charles & Keith

Built a tree-based sales forecasting model for seasonal planning, an image-similarity engine for product matching with 95%+ accuracy, and an order-management web app that improved accuracy while cutting manufacturing costs and stockouts.

2020 - 21

Earlier internships · Shopee, Seagate, Outstrip, CogniAble

A run of hands-on ML and data work: optimizing Airflow/HDFS pipelines and a compression tool that cut storage by 90%+ at Shopee; neural-net and tree models to forecast hard-drive test time at Seagate; a React and Rails KPI dashboard at Outstrip; and a two-stream I3D action-recognition model on AWS SageMaker for early autism screening at CogniAble.

03 Projects

Most of these began as "I don't really get this, let me build it."

Generative Models · Jupyter · PyTorch

GANs, VAEs, and normalizing flows built side by side. I wanted to feel the trade-offs between them rather than read about them: how adversarial training differs from optimizing a variational bound, and how both differ from an invertible flow. It became one of my favourite references.

Diffusion Model · Python · PyTorch

An implementation sandbox for diffusion models. I sat with the forward noising and reverse denoising process, step by step, until the math stopped feeling like magic and started feeling inevitable.

Vision Transformer · Jupyter · PyTorch

A clean, from-scratch ViT for image classification. The interesting part was watching just how much data attention needs before it overtakes a solid convolutional baseline.

Transformer from Scratch · Jupyter · PyTorch

The transformer rebuilt from first principles: attention, positional encodings, the lot. Re-deriving it by hand stuck far better than reading the paper a fourth time.

Co-Authorship Network · most-starred · Network analysis · ★ 22

A network-science study of academic co-authorship built on DBLP data for a course at NTU: graph construction, centrality, and community detection on a real, messy dataset. Quietly my most-starred repo.