Gary Qiurui Ma

I am a final-year Computer Science and Business student at the Hong Kong University of Science and Technology, where I am very fortunate to be advised by Prof. Tong Zhang.

In 2019, I spent a fantastic year at the Strategic Reasoning Group, University of Michigan, Ann Arbor, where I was thankful to be supervised by Prof. Michael Wellman. My sincere gratitude also goes to Prof. Tilman Borgers, whose enlightening lectures sparked my interest in Game Theory.


After exploring Reinforcement Learning (RL) and Stochastic Control, I found myself deeply attracted to the intersection of Game Theory and Computer Science. I was initially drawn to the field by its power to combine the potential of RL with a solid framework of equilibrium concepts in multiagent systems. I am now interested in the theory of convergence to equilibrium. My current work attempts to answer the following two questions:

  1. Why and when do certain no-external-regret algorithms converge to Correlated Equilibria, or even Nash Equilibria, in some games, rather than only to the more general Coarse Correlated Equilibria?
  2. Is Nash Equilibrium identifiable through sampling in unknown stochastic games without a minimax oracle?
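
As background for the first question, here is a minimal sketch of one standard no-external-regret algorithm, regret matching, run in self-play. It is an illustration only, not the method studied in my work. The game (rock-paper-scissors) and the names `regret_matching_strategy`, `run` and `cce_gap` are assumptions chosen for the example. The gap it reports is the largest average gain either player could obtain from a fixed unilateral deviation against the empirical joint distribution of play, which is exactly the quantity that must vanish for that distribution to approach the set of Coarse Correlated Equilibria.

```python
import random

# Row player's payoffs in rock-paper-scissors (illustrative game choice).
# The game is zero-sum, so the column player's payoff is the negation.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching_strategy(cum_regret):
    """Mix over actions in proportion to their positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0:
        return [1.0 / n] * n  # no positive regret yet: play uniformly
    return [p / total for p in positive]


def run(iterations, seed=0):
    """Self-play with regret matching; returns joint play counts and regrets."""
    rng = random.Random(seed)
    n = len(PAYOFF)
    regret_row = [0.0] * n
    regret_col = [0.0] * n
    joint_counts = [[0] * n for _ in range(n)]
    for _ in range(iterations):
        a = rng.choices(range(n), weights=regret_matching_strategy(regret_row))[0]
        b = rng.choices(range(n), weights=regret_matching_strategy(regret_col))[0]
        joint_counts[a][b] += 1
        for alt in range(n):
            # External regret: payoff of always playing `alt` minus realized payoff.
            regret_row[alt] += PAYOFF[alt][b] - PAYOFF[a][b]
            regret_col[alt] += PAYOFF[a][b] - PAYOFF[a][alt]
    return joint_counts, regret_row, regret_col


def cce_gap(joint_counts):
    """Largest average gain from a fixed unilateral deviation (0 at a CCE)."""
    n = len(joint_counts)
    total = sum(sum(row) for row in joint_counts)
    cells = [(a, b) for a in range(n) for b in range(n)]
    row_value = sum(joint_counts[a][b] * PAYOFF[a][b] for a, b in cells) / total
    row_best = max(
        sum(joint_counts[a][b] * PAYOFF[alt][b] for a, b in cells)
        for alt in range(n)
    ) / total
    col_best = max(
        sum(joint_counts[a][b] * -PAYOFF[a][alt] for a, b in cells)
        for alt in range(n)
    ) / total
    return max(row_best - row_value, col_best + row_value)


if __name__ == "__main__":
    counts, _, _ = run(20000)
    print("CCE gap of empirical play:", cce_gap(counts))
```

The gap equals the larger of the two players' maximum cumulative external regrets divided by the number of rounds, so it shrinks whenever both players have sublinear regret. Nothing in this computation rules out correlated deviations, which is why convergence to Correlated or Nash Equilibria needs a separate argument.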


* indicates equal contribution.
Evaluating Strategy Exploration in Empirical Game-Theoretic Analysis
Yongzhao Wang*, Qiurui Ma*, Michael Wellman
Working Paper

In Empirical Game-Theoretic Analysis (EGTA), game models are iteratively extended to include the Nash Equilibrium of the underlying true games. The strategy exploration process dictates which new strategies to add to the game models next, based on currently available information. We investigate the methodological considerations in evaluating different strategy exploration processes in EGTA and highlight a consistency criterion that past literature violates.

Learning a Decision Module by Imitating Driver’s Control Behaviors
Junning Huang*, Sirui Xie*, Jiankai Xun, Qiurui Ma, Chunxiao Liu, Bolei Zhou
The Conference on Robot Learning (CoRL), 2020
[paper] [project page] [code]

We propose a hybrid framework to learn neural decisions in the classical modular pipeline through end-to-end imitation learning. This hybrid framework preserves the merits of the classical pipeline, such as the strict enforcement of physical and logical constraints, while learning complex driving decisions from data.


TCA-TWAS: Identification of Cell-Type-Specific Genetic Regulation of Gene Expression for Transcriptome-Wide Association Studies
Qiurui Ma*, Duo Zhang*, Brandon Jew, Sriram Sankararaman
July 2019 - Sep 2019 | supported by CSST scholarship, UCLA
[code] [poster] [presentation] [report on data simulation]

In this study, we deconvolute bulk-level gene expression into cell-type-specific gene expression with cell-type weights using Bayesian models, circumventing the centrifugation that traditional methods require to acquire cell-type-specific gene expression. We then associate the resulting cell-type-specific expression with phenotypes on UK Biobank blood tissue data.

Uncertainty-Aware Model-Based Reinforcement Learning in Autonomous Driving using PILCO
Qiurui Ma*, Sirui Xie*
Feb 2019 - June 2019 | work done at Sensetime HK
[For IP reasons, please contact me for the detailed design and implementation]

In this study, we bring uncertainty estimation to model-based RL for autonomous driving. The model is parameterized by a Bayesian neural network to approximate PILCO, and dropout is used to estimate the uncertainty. We further train a multilayer perceptron as a controller, whose gradients can flow through the model network. We demonstrate that our model can output uncertainty estimates for its predictions and can navigate safely in complex environments.

Double Q Learning for Long-Short Derivatives Trading
Qiurui Ma | Advised by James Tin-Yau Kwok
Oct 2018 - Dec 2018 | Undergraduate Research Opportunity Project at HKUST

In this project, we apply double Q-learning to long and short trading on twenty years of oil derivatives data. My work involved first scraping 20 years of oil derivative data from Bloomberg and Yahoo Finance; then implementing a support-resistance line visualization tool to aid analysis and feature engineering; and finally implementing a double DQN module to long or short the derivative, with its performance beating the benchmark buy-and-hold strategy.
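
For readers unfamiliar with the technique, the core double Q-learning update can be sketched in tabular form as follows. This is a simplified illustration under stated assumptions, not the project's double DQN code: the two-state market encoding ("down"/"up"), the action indices (0 = short, 1 = long), the reward, and the function name `double_q_update` are all hypothetical.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular double Q-learning step and return the target used.

    One table selects the greedy next action and the other evaluates it,
    which reduces the overestimation caused by taking a single max over
    noisy value estimates.
    """
    # Randomly pick which table is updated on this step.
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a

    if done:
        target = reward
    else:
        next_values = select[next_state]
        # Greedy action according to the table being updated...
        best = max(range(len(next_values)), key=next_values.__getitem__)
        # ...valued by the other table.
        target = reward + gamma * evaluate[next_state][best]

    select[state][action] += alpha * (target - select[state][action])
    return target


if __name__ == "__main__":
    # Hypothetical toy setup: two market states, actions 0 = short, 1 = long.
    q_a = {s: [0.0, 0.0] for s in ("down", "up")}
    q_b = {s: [0.0, 0.0] for s in ("down", "up")}
    double_q_update(q_a, q_b, "up", 1, reward=1.0, next_state="down", done=False)
    print(q_a, q_b)
```

A double DQN follows the same idea with neural networks: the online network selects the next action and a periodically synchronized target network evaluates it.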

Research and Work Experience

Research Assistant
Strategic Reasoning Group at University of Michigan, Ann Arbor
Feb 2020 - July 2020 | Advised by Prof. Michael Wellman
Research Assistant
Sriram Lab at University of California, Los Angeles
July 2019 - Sep 2019 | Advised by Prof. Sriram Sankararaman
Research Intern
Sensetime Hong Kong
Feb 2019 - June 2019 | Advised by Sirui Xie
Data Science Intern
Orient Overseas Container Line (OOCL) Hong Kong
June 2018 - Sep 2018 | Advised by Wan Tsz Him, Michael
OOCL employs a data science team to optimize container scheduling and routing. It was meaningful for me to witness the application of Machine Learning techniques in industry. My work involved predicting daily empty-container release and return quantities for ports across the world with time series models, with performance surpassing that of the model developed by MSRA for Long Beach Port; and imputing vessel utility and empty-container re-stowage with boosting, attaining a performance gain over the existing implementation.
Tax Intern
PricewaterhouseCoopers Hong Kong
July 2017 - Sep 2017
In my first and second years of undergrad, I was fully geared towards business. I found CS more meaningful to me only after this internship, when I got the greatest sense of fulfilment from writing a script that helped copy and scan the tax reports.


  • HKSAR Government Scholarship (2018-2020)
  • UCLA CSST Scholarship and Best Presentation Award (2019)
  • HKUST One Million Dollar International Entrepreneurship Competition Grand Finals Winning Award (2018)
  • HSBC/HKU Hong Kong Business Case Competition Championship (2018)
  • HKSAR Government Scholarship Fund - Talent Development Scholarship (2016)
  • Dean's List (2016-2020)