Arnav Raj
Dual Degree (B.Tech + M.Tech) Computer Science & Engineering, IIT Delhi
Fourth‑year dual degree student focused on AI safety, evaluation of large language models, and reliable ML systems. Currently exploring reinforcement learning, building on experience in operating systems, networks, and performance‑minded backend work. I enjoy designing retrieval‑augmented generation pipelines, building observability and evaluation tooling that turns model behavior into measurable signals, and tuning latency/quality trade‑offs in large‑scale inference.
RAG • Model Interpretability • Long‑Context LLM Eval • Model Observability • ML for Systems • RL (Exploring)
Currently seeking internship & collaborative project roles for 2025–26.

Education

- Indian Institute of Technology Delhi
  B.Tech and M.Tech in Computer Science & Engineering • 2022–2027
- Mess Secretary, BHM
  Leadership & Operations • Jun 2024–2025
- Senior Editor, TechAmbit (Pan‑IIT Magazine)
  Editorial & Tech Strategy • 2023–Present

Honors & Awards

- JEE Advanced AIR 1158 (top 0.5% in India)
- KVPY SX Fellowship, awarded by the Government of India
- Top 250 in India, NSEA (National Standard Examination in Astronomy)
- Top 300 in India, NSEC (National Standard Examination in Chemistry)
- Codeforces Expert (rating 1700+)
- IMC Prosperity Trading Challenge – World Rank 8 (Round 1)
- 2× Smart India Hackathon National Finalist

Experience

- Harvard University
  Summer Research Intern – Edge Computing Lab
  - Built a LangChain‑based benchmarking and validation framework for LLM‑generated RTL code (syntax → testbench → PPA loop with automatic re‑prompting), evaluated across GPT‑4 and Llama; the loop is sketched after this list.
  - Compared prompting strategies (chain‑of‑thought, zero‑shot, few‑shot) under graded design complexity, tracking accuracy, efficiency, and robustness.
  - Automated validation with syntax checks and module/system‑level testbenches; failing cases were re‑prompted, and passing designs forwarded to PPA analysis.
  - Co‑authoring a paper targeting DATE 2025 (preprint in preparation).
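
A minimal sketch of the generate → validate → re‑prompt loop described above, assuming a LangChain‑style chat model (llm.invoke); the helpers run_syntax_check, run_testbench, and run_ppa are hypothetical placeholders for the actual compiler, simulator, and synthesis tooling:

```python
# Sketch of the syntax -> testbench -> PPA loop with automatic re-prompting.
# All helper names are illustrative placeholders, not the real framework.
from dataclasses import dataclass

@dataclass
class CheckResult:
    passed: bool
    log: str  # tool output; fed back into the next prompt on failure

def run_syntax_check(rtl: str) -> CheckResult:
    # Placeholder: would invoke a Verilog compiler (e.g. iverilog) here.
    ok = "module" in rtl
    return CheckResult(ok, "syntax ok" if ok else "no module declaration found")

def run_testbench(rtl: str) -> CheckResult:
    # Placeholder: would run module- and system-level testbench simulation.
    return CheckResult(True, "all tests passed")

def run_ppa(rtl: str) -> dict:
    # Placeholder: would synthesize the design and report power/perf/area.
    return {"area": 0.0, "power": 0.0, "delay": 0.0}

def benchmark_design(llm, spec: str, max_retries: int = 3):
    prompt = f"Write synthesizable Verilog for the following spec:\n{spec}"
    for _ in range(max_retries):
        rtl = llm.invoke(prompt).content  # LangChain-style chat-model call
        for check in (run_syntax_check, run_testbench):
            result = check(rtl)
            if not result.passed:
                # Auto re-prompt: feed the failure log back to the model.
                prompt += ("\n\nPrevious attempt failed:\n" + result.log +
                           "\nPlease fix and regenerate.")
                break
        else:
            return run_ppa(rtl)  # all checks passed -> PPA analysis
    return None  # design never passed validation within the retry budget
```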
- Georgia Institute of Technology
  Summer Research Intern – FSI Lab
  - Co‑authored KG‑QAGen (submitted to the NeurIPS 2025 Datasets & Benchmarks track), a framework for generating multi‑hop questions over knowledge graphs.
  - Co‑developed KG‑QAGen‑D, a dataset of 20,139 long‑context multi‑hop QA pairs for evaluating structured long‑context reasoning.
  - Built an end‑to‑end benchmarking pipeline (automatic chunking, batched question generation, multi‑chunk answer synthesis) that scales LLM evaluation across 170 agreements; a sketch follows this list.
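
A rough sketch of that pipeline's three stages, again assuming a LangChain‑style chat model; every function and parameter name here is illustrative rather than the actual KG‑QAGen code:

```python
# Chunk -> batched question generation -> multi-chunk answer synthesis.
# Names and prompts are illustrative placeholders, not the real pipeline.

def chunk_document(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split a long agreement into overlapping chunks that fit a context window."""
    chunks, start, step = [], 0, max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += step
    return chunks

def generate_questions(llm, chunks: list[str], batch_size: int = 8) -> list[dict]:
    """Batch several chunks per request to amortize per-call overhead."""
    questions = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        prompt = ("Write one multi-hop question per passage below:\n\n" +
                  "\n---\n".join(batch))
        reply = llm.invoke(prompt).content  # LangChain-style chat-model call
        questions += [{"question": q, "source_chunks": batch}
                      for q in reply.splitlines() if q.strip()]
    return questions

def synthesize_answer(llm, question: dict) -> str:
    """Answer a question using evidence drawn from multiple chunks at once."""
    context = "\n\n".join(question["source_chunks"])
    return llm.invoke(f"Context:\n{context}\n\nQuestion: {question['question']}\n"
                      "Answer using only the context above.").content
```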