Researcher · ML & Combinatorial Optimization

Hi, I'm Zhiyuan Wang. I study how machines learn to optimize.

My work sits at the intersection of reinforcement learning and combinatorial optimization: teaching neural policies to solve hard problems like Max-Cut, and learning to optimize signals in wireless systems. I am currently focused on the L2A framework for graph problems.

About

I'm a researcher working on learning-based methods for optimization. Most of my time goes into designing neural network policies that learn to navigate large combinatorial search spaces — combining policy-gradient training with classical local search.

Outside the lab, I read a fair amount of science fiction and literary fiction, and I have a long-standing interest in theology and history. I think the best research, like the best stories, comes from sitting with a hard problem long enough to see it differently.

Reinforcement Learning · Combinatorial Optimization · Graph Neural Networks · PyTorch · ML Systems

Skills

Languages: Python · C++ · SQL · JavaScript · HTML / CSS
ML & RL: PyTorch · Reinforcement Learning · Graph Neural Nets · Transformers · Policy Gradient
Mathematics: Combinatorial Opt. · Linear Programming · Graph Theory · Probability · Stochastic Proc.
Engineering: Git & GitHub · Linux · LaTeX · React / Vite · Data Analysis

Selected Work

L2A — Learning to Anneal

2025–26

A reinforcement learning framework for graph combinatorial optimization, applied to Max-Cut on Barabási–Albert graphs. Couples a Transformer-based policy with local search and curriculum learning to progressively reshape the search operator.

PyTorch · Policy Gradient · Graph Transformer · Curriculum Learning
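
For readers new to Max-Cut, here is a minimal sketch of the kind of classical local search a learned policy can be paired with. It is a plain 1-flip hill climb written for illustration only, not the L2A code; the function names `cut_value` and `one_flip_local_search` are hypothetical.

```python
import random


def cut_value(edges, side):
    """Count edges whose endpoints lie on opposite sides of the partition."""
    return sum(1 for u, v in edges if side[u] != side[v])


def one_flip_local_search(n, edges, side=None, seed=0):
    """Flip single vertices for as long as a flip strictly increases the cut.

    n     : number of vertices, labelled 0..n-1
    edges : list of (u, v) pairs, unweighted and undirected
    side  : optional starting partition as a list of 0/1 labels
    """
    rng = random.Random(seed)
    side = [rng.randint(0, 1) for _ in range(n)] if side is None else list(side)

    # Adjacency lists make the per-vertex gain cheap to compute.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(1 for w in adj[v] if side[w] == side[v])
            cross = len(adj[v]) - same
            # Flipping v cuts its 'same' edges and uncuts its 'cross' edges,
            # so the cut grows by (same - cross).
            if same > cross:
                side[v] = 1 - side[v]
                improved = True
    return side, cut_value(edges, side)


# Usage: a 4-cycle, starting with every vertex on the same side.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
partition, value = one_flip_local_search(4, square, side=[0, 0, 0, 0])
```

Each accepted flip raises the cut by at least one edge, so the loop stops after at most as many flips as there are edges. The result is only a local optimum, which is the gap a learned search operator is meant to close.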

Learning to Optimize Beamforming

2025

A dual-stream Transformer (L2OTransformer) for multi-user MIMO beamforming, with user-level and antenna-level attention. Explored zero-shot generalization across varying numbers of users via masking, refined with projected gradient descent.

PyTorch · Transformers · MIMO · PGD Refinement

Airline Overbooking Simulator

2025

An open-source simulation comparing airline bumping policies, built during RCOS. Python simulation core with an HTML interface and SQL-backed data layer for analyzing overbooking strategies.

Python · Simulation · SQL · HTML

Deep RL Coursework

2024–25

A series of RL implementations: Double DQN for CartPole and Pong, Q-learning with state discretization for Mountain Car and Pendulum, and dynamic programming for MDP environments.

PyTorch · DQN · Q-Learning · Dynamic Programming
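
As an illustration of Q-learning with state discretization, the sketch below bins a continuous observation and applies the standard tabular update. It is a minimal example rather than the coursework code: the helper names, the bin counts, and the dictionary-backed Q-table are assumptions, and the bounds are the usual Mountain Car position and velocity limits.

```python
import random


def discretize(obs, lows, highs, bins):
    """Map a continuous observation to a tuple of bin indices, clipped to range."""
    idx = []
    for x, lo, hi, b in zip(obs, lows, highs, bins):
        i = int((x - lo) / (hi - lo) * b)
        idx.append(min(b - 1, max(0, i)))
    return tuple(idx)


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.99, done=False):
    """Tabular Q-learning step: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)."""
    best_next = 0.0 if done else max(
        q.get((next_state, a), 0.0) for a in range(n_actions)
    )
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Usage: Mountain Car style bounds (position, velocity); bin counts are assumed.
LOWS, HIGHS, BINS = (-1.2, -0.07), (0.6, 0.07), (18, 14)
q_table = {}
rng = random.Random(0)
s = discretize((-0.5, 0.0), LOWS, HIGHS, BINS)
a = epsilon_greedy(q_table, s, n_actions=3, epsilon=0.1, rng=rng)
s_next = discretize((-0.49, 0.01), LOWS, HIGHS, BINS)
q_update(q_table, s, a, reward=-1.0, next_state=s_next, n_actions=3)
```

Coarser bins learn faster but blur states that need different actions. Finer bins do the reverse, so the bin counts are usually tuned per environment.
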
Get in Touch

Happy to talk about reinforcement learning, optimization, or anything at the intersection of the two. Reach me here:

Play Zone

A small corner of the internet I built for fun — games and everyday tools. Open the full app ↗