Arnav Raj
Dual Degree (B.Tech + M.Tech) Computer Science & Engineering, IIT Delhi
Fourth‑year dual degree student focused on AI safety and LLM reliability. Currently creating RLHF training data at Abundant AI for top global AI labs. Actively exploring reinforcement learning while leveraging experience with operating systems, networks, and performance‑minded backend work. Submitted an ICLR 2026 workshop paper on hallucination detection in hyperbolic space. I like designing retrieval‑augmented generation pipelines, building observability + evaluation tooling that turns model behavior into measurable signals, and tuning latency/quality trade‑offs in large‑scale inference.
AI Safety
RLHF
RAG
LLM Evaluation
Model Interpretability
RL (Exploring)
Open to Summer 2026 internships and collaborative projects in AI & RL.
Education
- Indian Institute of Technology Delhi
  B.Tech + M.Tech in Computer Science & Engineering • 2022–2027
- Mess Secretary, Zanskar Hostel
  Leadership & Operations • Jun 2024 – May 2025
- Senior Editor, Tech Ambit (Pan-IIT Magazine)
  Editorial & Tech Strategy • 2023 – 2025
Honors & Awards
- Founding member of AI Safety Club, IIT Delhi
- JEE Advanced All India Rank 1158 (1M+ candidates)
- 2× Smart India Hackathon National Top 5 Finalist
- KVPY SX Fellowship (Govt. of India & IISc Bangalore)
- Best Mess Secretary, IIT Delhi
- National Science Olympiads: Top 250 Astronomy, Top 300 Chemistry
- Codeforces Expert (1700+ Rating)
Experience
- Harvard University
  Research Intern – Edge Computing Lab
- Built LangChain benchmarking framework for RTL code generation across GPT‑4 and Llama models.
- Implemented end‑to‑end validation pipeline: syntax checking → testbench validation → PPA analysis, with automated re‑prompting for failing designs (a minimal sketch of this loop follows these bullets).
- Compared Chain‑of‑Thought, zero‑shot, and few‑shot prompting strategies across graded design complexity.
- Tracked accuracy and latency metrics across different prompt engineering approaches.
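To make the re‑prompting loop concrete, here is a minimal sketch of the generate → validate → re‑prompt cycle. The helper names (`generate_rtl`, `check_syntax`, `run_testbench`) are hypothetical stand‑ins, not the framework's actual components.

```python
from typing import Optional

# Hypothetical stand-ins for the framework's real components: a model call,
# a Verilog syntax check, and functional validation against a testbench.
def generate_rtl(prompt: str) -> str:
    raise NotImplementedError  # e.g., a GPT-4 or Llama completion call

def check_syntax(code: str) -> bool:
    raise NotImplementedError  # e.g., run a Verilog linter on the design

def run_testbench(code: str) -> bool:
    raise NotImplementedError  # e.g., simulate the design against a testbench

def generate_with_retries(spec: str, max_attempts: int = 3) -> Optional[str]:
    """Generate RTL for a spec, re-prompting with failure feedback on error."""
    prompt = spec
    for _ in range(max_attempts):
        code = generate_rtl(prompt)
        if not check_syntax(code):
            # Fold the syntax failure back into the next prompt.
            prompt = f"{spec}\n\nPrevious attempt failed syntax checking:\n{code}"
            continue
        if run_testbench(code):
            return code  # valid design; ready for PPA analysis
        prompt = f"{spec}\n\nPrevious attempt failed the testbench:\n{code}"
    return None  # all attempts exhausted
```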
- Georgia Institute of Technology
  Research Intern – FSI Lab
- Co‑developed KG‑MuLQA framework for generating multi‑hop knowledge‑graph questions (ACL 2025 ARR submission).
- Created dataset of 20,139 long‑context multi‑hop QA pairs for structured reasoning evaluation.
- Built scalable LLM benchmarking pipeline with auto‑chunking, batched generation, and multi‑chunk answer synthesis across 170 credit agreements (see the chunking sketch below).
- Designed evaluation infrastructure for long‑context understanding in financial documents.
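As a rough illustration of the auto‑chunking step, here is a minimal sketch assuming fixed‑size overlapping windows with per‑chunk answering; `answer_chunk` and `synthesize_answers` are hypothetical placeholders, not the lab's actual pipeline.

```python
from typing import List

def chunk_document(text: str, window: int = 8000, overlap: int = 500) -> List[str]:
    """Split a long document into overlapping fixed-size character windows."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + window])
        start += window - overlap  # step forward, keeping some overlap
    return chunks

# Hypothetical placeholders for per-chunk answering and answer synthesis.
def answer_chunk(chunk: str, question: str) -> str:
    raise NotImplementedError  # one (batched) LLM call per chunk

def synthesize_answers(question: str, partials: List[str]) -> str:
    raise NotImplementedError  # merge per-chunk answers into a final answer

def answer_long_document(document: str, question: str) -> str:
    """Answer a question over a document longer than the model's context."""
    partials = [answer_chunk(c, question) for c in chunk_document(document)]
    return synthesize_answers(question, partials)
```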