My work sits at the intersection of reinforcement learning and combinatorial optimization: teaching neural policies to solve hard problems like Max-Cut, and learning to optimize signals in wireless systems. I'm currently focused on the L2A framework for graph problems.
I'm a researcher working on learning-based methods for optimization. Most of my time goes into designing neural network policies that learn to navigate large combinatorial search spaces by combining policy-gradient training with classical local search.
Outside the lab, I read a fair amount of science fiction and literary fiction, and I have a long-standing interest in theology and history. I think the best research, like the best stories, comes from sitting with a hard problem long enough to see it differently.
A reinforcement learning framework for graph combinatorial optimization, applied to Max-Cut on Barabási–Albert graphs. Couples a Transformer-based policy with local search and curriculum learning to progressively reshape the search operator.
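As a rough illustration of the local-search half of this pairing, here is a minimal sketch of 1-flip local search for Max-Cut. It is not the project's code: the function names, the 0/1 side encoding, and the first-improvement sweep order are assumptions made for the example, and the Transformer policy and curriculum are left out entirely.

```python
def cut_value(edges, side):
    """Count edges whose endpoints sit on opposite sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def flip_gain(adj, side, v):
    """Change in cut value if vertex v switches sides."""
    same = sum(1 for u in adj[v] if side[u] == side[v])
    cross = len(adj[v]) - same
    return same - cross


def one_flip_local_search(n, edges, side):
    """Flip vertices with positive gain until no single flip improves the cut."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = list(side)  # work on a copy so the caller's assignment is untouched
    improved = True
    while improved:
        improved = False
        for v in range(n):
            if flip_gain(adj, side, v) > 0:
                side[v] = 1 - side[v]
                improved = True
    return side


# Hypothetical toy instance: a 4-cycle, starting with every vertex on side 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = one_flip_local_search(4, edges, [0, 0, 0, 0])
print(result, cut_value(edges, result))
```

Every accepted flip strictly increases the cut, which is bounded by the number of edges, so the loop always terminates. On this toy 4-cycle the sweep reaches the optimal cut of 4; on larger graphs it only guarantees a local optimum, which is where a learned policy can help choose better starting points or moves.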
A dual-stream Transformer (L2OTransformer) for multi-user MIMO beamforming, with user-level and antenna-level attention. Explored zero-shot generalization across varying numbers of users via masking, with the network's outputs refined by projected gradient descent.
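To make the refinement step concrete, here is a minimal sketch of one projected gradient descent update. It assumes the constraint is a total transmit power budget, which is a common choice but is not stated above; the matrix layout (rows as antennas, columns as users), the names `project_to_power_budget` and `pgd_step`, and the externally supplied gradient are all hypothetical.

```python
import math


def total_power(W):
    """Sum of squared magnitudes over all entries (squared Frobenius norm)."""
    return sum(abs(w) ** 2 for row in W for w in row)


def project_to_power_budget(W, p_max):
    """Euclidean projection onto the set {W : total_power(W) <= p_max}."""
    power = total_power(W)
    if power <= p_max:
        return [list(row) for row in W]  # already feasible, return a copy
    scale = math.sqrt(p_max / power)     # shrink uniformly back to the boundary
    return [[scale * w for w in row] for row in W]


def pgd_step(W, grad, step_size, p_max):
    """One descent step on a loss gradient, then projection onto the budget.

    `grad` is assumed to be the gradient of some loss (for example a negative
    sum rate) with respect to W, computed elsewhere.
    """
    moved = [[w - step_size * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
    return project_to_power_budget(moved, p_max)


# Hypothetical 2x2 complex beamforming matrix that exceeds a unit power budget.
W = [[3 + 0j, 0j], [0j, 4j]]
W_feasible = project_to_power_budget(W, 1.0)
```

Because the feasible set is a norm ball, the projection is a uniform rescaling, so the directions of the beams are preserved and only their overall power changes.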
An open-source simulation comparing airline bumping policies, built during RCOS. Python simulation core with an HTML interface and SQL-backed data layer for analyzing overbooking strategies.
A series of RL implementations: Double DQN for CartPole and Pong, Q-learning with state discretization for Mountain Car and Pendulum, and dynamic programming for MDP environments.
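For readers unfamiliar with the tabular side of this list, here is a minimal sketch of Q-learning with state discretization. It is an illustration under assumptions rather than the code from these implementations: the uniform binning scheme, the function names, and the hyperparameter defaults are all hypothetical.

```python
from collections import defaultdict


def discretize(obs, lows, highs, bins):
    """Map a continuous observation to a tuple of bin indices, clamped to range."""
    idx = []
    for x, lo, hi, b in zip(obs, lows, highs, bins):
        frac = (x - lo) / (hi - lo)
        idx.append(min(b - 1, max(0, int(frac * b))))
    return tuple(idx)


def q_update(Q, s, a, r, s_next, done, n_actions, alpha=0.1, gamma=0.99):
    """Standard Q-learning update on a table keyed by (state, action)."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


# Unseen (state, action) pairs default to a value of 0.0.
Q = defaultdict(float)

# Bounds chosen to match the usual Mountain Car observation range
# (position, velocity); the 18 x 14 grid is an arbitrary example resolution.
lows, highs, bins = (-1.2, -0.07), (0.6, 0.07), (18, 14)
state = discretize((-0.5, 0.0), lows, highs, bins)
```

The grid resolution is the main design trade-off: finer bins represent the value function more accurately but need many more episodes to visit every cell.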
Happy to talk about reinforcement learning, optimization, or anything at the intersection of the two. Reach me here:
A small corner of the internet I built for fun: games and everyday tools.