RL Agent for Code Optimization

Reinforcement learning system that learns to optimize Python code for performance by iteratively applying transformations. The agent observes code structure and execution metrics, then proposes modifications that reduce runtime and memory footprint while preserving correctness.

  • Environment Design: Custom OpenAI Gym environment where state = AST representation of code + profiling metrics; actions = code transformations (loop unrolling, memoization insertion, data structure swaps, vectorization hints). A minimal interface sketch appears after this list.
  • PPO Implementation: Proximal Policy Optimization with clipped objective using Ray RLlib. Actor-critic architecture with shared CNN layers over AST graph embeddings and separate policy/value heads; an example RLlib configuration is sketched below.
  • Reward Function: Multi-objective reward combining: (1) execution time reduction (40% weight), (2) peak memory reduction (30% weight), (3) correctness preservation via test suite (30% weight, with a heavy penalty for breaking tests). One possible formulation is shown below.
  • Code Representation: AST nodes embedded using a TreeLSTM encoder, capturing hierarchical structure. Node types, variable names (anonymized), and control flow patterns encoded as feature vectors; a minimal TreeLSTM cell is sketched after this list.
  • Action Space: Discrete action space of 12 code transformations including: insert memoization decorator, convert list→set for membership checks, replace recursion with iteration, apply list comprehension, add type hints + Cython compilation flag. One transformation is worked through as an AST rewrite below.
  • Training Curriculum: Progressive difficulty from simple sorting algorithms to graph traversal and dynamic programming. 500+ training problems with ground-truth optimal solutions for reward shaping.
  • Results: On held-out algorithmic benchmarks (n=50), agent achieved average 35% runtime reduction and 20% memory improvement over baseline implementations. 92% of optimized solutions passed all correctness tests.
  • Safety Constraints: Rollback mechanism reverts to the last known-good version whenever optimized code fails its tests. A hard execution timeout prevents infinite loops introduced by exploratory transformations; see the sandbox sketch below.
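
To make the environment design concrete, here is a minimal sketch of the Gym interface described in the Environment Design bullet. `encode_ast`, `apply_transform`, `profile`, and `compute_reward` are hypothetical stand-ins for the project's AST and profiling pipeline, and the feature dimension and per-episode edit budget are illustrative assumptions.

```python
import gym
import numpy as np
from gym import spaces

class CodeOptEnv(gym.Env):
    """Minimal sketch of the code-optimization environment.

    `encode_ast`, `apply_transform`, `profile`, and `compute_reward` are
    hypothetical stand-ins for the project's AST/profiling pipeline; the
    feature dimension and edit budget are illustrative.
    """

    def __init__(self, source_code, test_suite, max_edits=10, feature_dim=256):
        self.initial_code = source_code
        self.test_suite = test_suite
        self.max_edits = max_edits
        self.action_space = spaces.Discrete(12)  # the 12 transformations listed above
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(feature_dim,), dtype=np.float32
        )

    def reset(self):
        self.code = self.initial_code
        self.num_edits = 0
        self.baseline = profile(self.code)  # runtime/memory of the untouched code
        return encode_ast(self.code, self.baseline)

    def step(self, action):
        candidate = apply_transform(self.code, action)  # None if inapplicable here
        if candidate is not None:
            self.code = candidate
        metrics = profile(self.code)
        reward = compute_reward(self.baseline, metrics, self.code, self.test_suite)
        self.num_edits += 1
        done = self.num_edits >= self.max_edits  # fixed edit budget per episode
        return encode_ast(self.code, metrics), reward, done, {}
```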
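
A possible RLlib training configuration for the PPO setup, assuming the Ray 2.x `PPOConfig` builder API (argument names have shifted across Ray releases). The hyperparameters are illustrative defaults rather than the project's tuned values, and `"ast_encoder"` is a hypothetical name for the custom actor-critic model.

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.registry import register_env

# Register the environment under a name RLlib can look up;
# the factory signature and config keys are illustrative.
register_env("code_opt", lambda cfg: CodeOptEnv(cfg["source"], cfg["tests"]))

config = (
    PPOConfig()
    .environment("code_opt", env_config={"source": "...", "tests": "..."})
    .framework("torch")
    .rollouts(num_rollout_workers=8)
    .training(
        clip_param=0.2,        # PPO clipped-objective epsilon
        gamma=0.99,
        lambda_=0.95,          # GAE parameter
        lr=3e-4,
        train_batch_size=4096,
        sgd_minibatch_size=256,
        num_sgd_iter=10,
        # Hypothetical custom actor-critic model, assumed registered via
        # ModelCatalog.register_custom_model("ast_encoder", ...).
        model={"custom_model": "ast_encoder"},
    )
)
algo = config.build()
for _ in range(200):
    algo.train()
```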
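
The multi-objective reward could be wired up roughly as follows. Only the 40/30/30 weighting comes from the description above; the metric field names and the size of the breakage penalty are assumptions.

```python
def compute_reward(baseline, metrics, tests_passed, total_tests):
    """Weighted multi-objective reward using the stated 40/30/30 split.

    `baseline`/`metrics` are dicts with `runtime` and `peak_mem` keys;
    the field names and the flat -5.0 penalty are assumptions.
    """
    time_gain = (baseline["runtime"] - metrics["runtime"]) / baseline["runtime"]
    mem_gain = (baseline["peak_mem"] - metrics["peak_mem"]) / baseline["peak_mem"]
    correctness = tests_passed / total_tests

    reward = 0.4 * time_gain + 0.3 * mem_gain + 0.3 * correctness
    if tests_passed < total_tests:
        reward -= 5.0  # heavy penalty: a broken test outweighs any speedup
    return reward
```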
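
For the code representation, a minimal child-sum TreeLSTM cell in PyTorch (after Tai et al., 2015), the flavor commonly used over ASTs. Batching across trees and the node-feature extraction are project-specific details omitted here.

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    """Child-sum TreeLSTM cell for encoding one AST node from its children."""

    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim, 3 * hidden_dim)    # input/output/update gates (x part)
        self.U_iou = nn.Linear(hidden_dim, 3 * hidden_dim, bias=False)
        self.W_f = nn.Linear(in_dim, hidden_dim)        # forget gate, one per child
        self.U_f = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) node features; child_h/child_c: (num_children, hidden_dim).
        # For leaves, pass empty (0, hidden_dim) tensors: the sums below are zero.
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(x) + self.U_iou(h_sum), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))  # per-child forget gates
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c
```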
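
One of the listed transformations, the list→set membership swap, expressed with Python's `ast.NodeTransformer`. This is a simplified sketch; a production pass would also verify that the list elements are hashable constants before rewriting.

```python
import ast

class ListToSetMembership(ast.NodeTransformer):
    """Rewrite `x in [a, b, c]` as `x in {a, b, c}` so membership
    checks hit a hash set instead of scanning a list."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1
                and isinstance(node.ops[0], (ast.In, ast.NotIn))
                and isinstance(node.comparators[0], ast.List)):
            node.comparators[0] = ast.Set(elts=node.comparators[0].elts)
        return node

source = "def allowed(u):\n    return u in ['alice', 'bob', 'carol']\n"
tree = ast.fix_missing_locations(ListToSetMembership().visit(ast.parse(source)))
print(ast.unparse(tree))  # the membership check now uses a set literal
```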
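
Finally, a sketch of the safety harness: candidate code and its tests run in a throwaway subprocess under a hard timeout, and any failure or hang tells the caller to roll back to the last known-good source. The single-file layout and the five-second budget are illustrative.

```python
import subprocess
import sys
import tempfile

def passes_tests(candidate_source, test_source, timeout_s=5):
    """Run candidate code plus its tests in a throwaway subprocess.

    Returns False on any test failure *or* on timeout, so the caller can
    roll back to the last known-good version.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_source + "\n\n" + test_source)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout_s,  # kills infinite loops from exploratory edits
        )
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False  # a hang counts as a failed transformation
```

The rollback itself then amounts to keeping the previous source string whenever `passes_tests` returns False.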

Tech: Python, PyTorch, OpenAI Gym, Ray RLlib, ast module, TreeLSTM, memory_profiler, cProfile.

Future work: extending the action space to include parallelization hints (multiprocessing, async) and GPU offloading suggestions.