About me
I am a research scientist at Google DeepMind.
Previously, I was a post-doctoral researcher in the Games and AI Group at Maastricht University, working with Mark Winands.
During my PhD, I worked at the University of Alberta with Michael Bowling on sampling algorithms for equilibrium computation and decision-making in games.
You can read all about it in my thesis.
Before my PhD, I completed my undergraduate and Master's degrees at McGill University's School of Computer Science and Games Research @ McGill, under the supervision of Clark Verbrugge.
I am interested in general multiagent learning (and planning), computational game theory, reinforcement learning, and game-tree search.
For an overview of what I have been involved with over the past few years, check out my COMARL seminar (slides here).
For a longer take on my interests, how I got into research, how I do it, and what drives me, please check out this interview by Sanyam Bhutani on Chai Time Data Science.
In Nov '19, I gave a multiagent RL workshop at Laber Labs, a group at NC State University led by Eric Laber. Here are the slides, video, and handout.
If you would like to reach me, please contact me by email. My address is my first name, followed by a dot,
followed by my last name, followed by an at symbol, followed by gmail, followed by a dot, followed by com.
For more frequent updates and other things, please reach out via social media!
Code
OpenSpiel: A Framework for Reinforcement Learning in Games
[github]
[paper]
[tutorial]
[bib]
OpenSpiel is a collection of environments and algorithms for research in general
reinforcement learning and search/planning in games. OpenSpiel supports n-player
(single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and
sequential, strictly turn-taking and simultaneous-move, perfect- and
imperfect-information games, as well as traditional multiagent environments such
as (partially and fully observable) grid worlds and social dilemmas. OpenSpiel
also includes tools to analyze learning dynamics and other common evaluation
metrics. The accompanying paper serves both as an overview of the code base and
an introduction to the terminology, core concepts, and algorithms across the
fields of reinforcement learning, computational game theory, and search.
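For a quick taste of the API, here is a minimal sketch of a random rollout using OpenSpiel's Python bindings (Kuhn poker is just an example; any registered game is loaded the same way):

    import random
    import pyspiel

    # Load a registered game and play one episode to the end.
    game = pyspiel.load_game("kuhn_poker")
    state = game.new_initial_state()
    while not state.is_terminal():
        if state.is_chance_node():
            # Chance nodes expose explicit (outcome, probability) pairs.
            outcomes, probs = zip(*state.chance_outcomes())
            state.apply_action(random.choices(outcomes, weights=probs)[0])
        else:
            state.apply_action(random.choice(state.legal_actions()))
    print(state.returns())  # one return per player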
CFR and MCCFR variants
[bluff11.zip]
This code contains simple examples of a number of CFR algorithms: vanilla CFR,
chance-sampled CFR, outcome sampling MCCFR, external sampling MCCFR, public chance sampling,
and pure CFR.
It also includes an expectimax-based best response algorithm, so that the
exploitability of the average strategies can be computed to measure each
algorithm's convergence rate.
The algorithms are applied to the game Bluff(1,1), also called Dudo, Perudo,
and Liar's Dice.
Please read the README.txt contained in the archive before building
or running the code.
The code is written in C++ and has been tested using g++ on Linux, MacOS,
and Windows.
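All of these variants share the same per-information-set update, regret matching, which plays each action with probability proportional to its positive cumulative counterfactual regret. Below is a minimal illustrative sketch of just that update in Python (the archive itself is C++; names here are hypothetical):

    def regret_matching(cumulative_regrets):
        """Strategy proportional to positive cumulative regret."""
        positives = [max(r, 0.0) for r in cumulative_regrets]
        total = sum(positives)
        if total == 0.0:
            # No action has positive regret: fall back to uniform.
            return [1.0 / len(positives)] * len(positives)
        return [p / total for p in positives]

    # Example: cumulative regrets for three actions at one information set.
    print(regret_matching([2.0, -1.0, 3.0]))  # -> [0.4, 0.0, 0.6]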
hexIT
[hexIT-0.62.zip]
hexIT is a set of Java classes for representing and displaying a hexagonal board.
It has been used to implement hexagonal board games and for course assignments.
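A common way to index such boards is with axial coordinates, under which all six neighbor offsets are constant. The Python sketch below only illustrates that general idea; hexIT's actual Java classes may be organized differently:

    # The six neighbor offsets of any cell in axial (q, r) coordinates.
    HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

    def neighbors(q, r, size):
        """On-board neighbors of (q, r) on a size-by-size rhombic board."""
        for dq, dr in HEX_NEIGHBORS:
            nq, nr = q + dq, r + dr
            if 0 <= nq < size and 0 <= nr < size:
                yield (nq, nr)

    print(list(neighbors(0, 0, 3)))  # this corner cell has only two neighbors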
Publications
Journal Articles and Book Chapters
-
Student of Games: A Unified Learning Algorithm for both Perfect and Imperfect Information Games.
Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard,
Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling.
Science Advances, 2023.
[paper]
[arXiv]
-
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning.
Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat.
TMLR, 2023.
[paper]
[arXiv]
[code]
-
Negotiating team formation using deep reinforcement learning.
Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z. Leibo, Marc Lanctot, Michael Johanson, Wojciech M. Czarnecki, Thore Graepel.
AIJ, 2020.
[paper]
[arXiv]
-
The Hanabi challenge: A new frontier for AI research.
Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling.
AIJ, 2019.
[paper]
[arXiv]
[bib]
-
Bounds and dynamics for empirical game theoretic analysis.
Karl Tuyls, Julien Pérolat, Marc Lanctot, Edward Hughes, Richard Everett, Joel Z. Leibo, Csaba Szepesvari, Thore Graepel.
JAAMAS, 2019.
[paper]
[bib]
-
α-Rank: Multi-Agent Evaluation by Evolution.
Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Pérolat, Remi Munos.
Scientific Reports, 2019.
[paper]
[bib]
-
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis.
Science, 2018.
[paper]
[preprint pdf]
[blog post]
[bib]
-
Symmetric Decomposition of Asymmetric Games.
Karl Tuyls, Julien Pérolat, Marc Lanctot, Georg Ostrovski, Rahul Savani, Joel Z. Leibo, Toby Ord, Thore Graepel, Shane Legg.
Scientific Reports, 2018.
[paper]
[arXiv]
[blog post]
[bib]
-
Algorithms for Computing Strategies in Two-Player Simultaneous Move Games.
Branislav Bosansky, Viliam Lisy, Marc Lanctot, Jiri Cermak, Mark H.M. Winands.
Artificial Intelligence, 2016.
[paper]
[preprint pdf]
[bib]
-
Mastering the Game of Go with Deep Neural Networks and Tree Search.
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis.
Nature, 2016.
[paper]
[nature video]
[deepmind video]
[web]
[bib]
-
Real-time Monte-Carlo Tree Search in Ms Pac-Man.
Tom Pepels, Mark H.M. Winands, Marc Lanctot.
IEEE Transactions on Computational Intelligence and AI in Games, 2014.
[paper]
[bib]
-
Computing Approximate Nash Equilibria and Robust Best Responses Using Sampling.
Marc Ponsen, Steven de Jong, Marc Lanctot.
Journal of Artificial Intelligence Research, 2011.
[paper]
[bib]
-
Simulation-Based Planning in RTS Games.
Michael Buro, Marc Lanctot, Frantisek Sailer.
AI Game Programming Wisdom 4, Charles River Media, February 2008.
[bib]
Conference Papers
-
Approximating the Core via Iterative Coalition Sampling.
Ian Gemp, Marc Lanctot, Luke Marris, Yiran Mao, Edgar Duenez-Guzman, Sarah Perrin, Andras Gyorgy, Romuald Elie, Georgios Piliouras, Michael Kaisers, Daniel Hennes, Kalesha Bullard, Kate Larson, Yoram Bachrach.
AAMAS 2024.
[pdf]
[arXiv]
[code]
-
Neural Population Learning Beyond Symmetric Zero-Sum Games.
Siqi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess.
AAMAS 2024.
[pdf]
[arXiv]
-
Search-Improved Game-Theoretic Multiagent Reinforcement Learning in General and Negotiation Games.
Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman.
AAMAS 2023 (Extended Abstract).
[pdf]
[arXiv]
-
A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games.
Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer.
ICLR 2023.
[pdf]
[arXiv]
-
ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret.
Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm.
ICLR 2023.
[pdf]
[arXiv]
-
Approximate Exploitability: Learning a Best Response.
Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, and Michael Bowling.
IJCAI 2022.
[paper]
[pdf]
[arXiv]
-
Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games.
Siqi Liu, Marc Lanctot, Luke Marris, and Nicolas Heess.
ICML 2022.
[paper]
[pdf]
[arXiv]
-
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent.
Ian Gemp, Rahul Savani, Marc Lanctot, Yoram Bachrach, Thomas Anthony, Richard Everett, Andrea Tacchetti, Tom Eccles, and János Kramár.
AAMAS 2022.
[pdf]
[arXiv]
-
Dynamic population-based meta-learning for multi-agent communication with natural language.
Abhinav Gupta, Marc Lanctot, and Angeliki Lazaridou.
NeurIPS 2021.
[paper]
[arXiv]
-
Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers.
Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, and Thore Graepel.
ICML 2021.
[pdf]
[slides]
[arXiv]
-
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games.
Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, and Amy Greenwald.
ICML 2021.
[pdf]
[slides]
[arXiv]
-
From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization.
Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls.
ICML 2021.
[pdf]
[poster]
[arXiv]
-
Hindsight Rationality of Correlated Play.
Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright,
Amy Greenwald, Michael Bowling.
AAAI 2021.
[arXiv]
-
Solving Common-Payoff Games with Approximate Policy Iteration.
Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio,
Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot.
AAAI 2021.
[arXiv]
-
Learning to Play No-Press Diplomacy with Best Response Policy Iteration.
Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach.
NeurIPS 2020.
[pdf]
[arXiv]
-
Fast Computation of Nash Equilibria in Imperfect Information Games.
Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, and Karl Tuyls.
ICML 2020.
[pdf]
-
Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients.
Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, and Karl Tuyls.
AAMAS 2020.
[arXiv]
-
A Generalized Training Approach for Multiagent Learning.
Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos.
ICLR 2020.
[arXiv]
-
Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent.
Edward Lockhart, Marc Lanctot, Julien Perolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls.
IJCAI 2019.
[pdf]
[arXiv]
[bib]
-
Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines.
Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling.
AAAI 2019.
[pdf]
[arXiv]
[bib]
-
Actor-Critic Policy Optimization in Partially Observable Multiagent Environments.
Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling.
NeurIPS 2018.
[pdf]
[arXiv]
[bib]
-
Emergent Communication through Negotiation.
Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark.
ICLR 2018.
[pdf]
[arXiv]
[bib]
-
A Generalized Method for Empirical Game Theoretic Analysis.
Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z. Leibo, Thore Graepel.
AAMAS 2018.
[pdf]
[arXiv]
[bib]
-
Deep Q-learning from Demonstrations.
Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys.
AAAI 2018.
[pdf]
[arXiv]
[bib]
-
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning.
Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel.
NIPS 2017.
[pdf]
[arXiv]
[poster]
[bib]
-
Multi-agent Reinforcement Learning in Sequential Social Dilemmas.
Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel.
AAMAS 2017.
[pdf]
[arXiv]
[blog post]
[bib]
-
Memory-Efficient Backpropagation through Time.
Audrunas Gruslys, Remi Munos, Ivo Danihelka, Marc Lanctot, Alex Graves.
NIPS 2016.
[arXiv]
-
Dueling Network Architectures for Deep Reinforcement Learning.
Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas.
ICML 2016. Won best paper award.
[pdf]
[arXiv]
[bib]
-
Convolution by Evolution: Differentiable Pattern Producing Networks.
Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra.
GECCO 2016.
[pdf]
[arXiv]
[bib]
-
Fictitious Self-Play in Extensive-Form Games.
Johannes Heinrich, Marc Lanctot, David Silver.
ICML 2015.
[pdf]
[presentation]
[bib]
-
Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games.
Viliam Lisy, Marc Lanctot, Michael Bowling.
AAMAS 2015.
[pdf]
[bib]
-
Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups.
Marc Lanctot, Mark H.M. Winands, Tom Pepels, Nathan R. Sturtevant.
CIG 2014. Nominated for best paper award.
[pdf]
[arXiv]
[bib]
-
Monte Carlo Tree Search Variants for Simultaneous Move Games.
Mandy J.W. Tak, Marc Lanctot, Mark H.M. Winands.
CIG 2014.
[pdf]
[bib]
-
Quality-based Rewards for Monte-Carlo Tree Search Simulations.
Tom Pepels, Mandy J.W. Tak, Marc Lanctot, Mark H.M. Winands.
ECAI 2014.
[pdf]
[bib]
-
Further Developments of Extensive-Form Replicator Dynamics using the Sequence-Form Representation.
Marc Lanctot.
AAMAS 2014.
[pdf]
[bib]
-
Monte Carlo Tree Search for Simultaneous Move Games: A Case Study in the Game of Tron.
Marc Lanctot, Christopher Wittlinger, Mark H.M. Winands, Niek G.P. Den Teuling.
BNAIC 2013.
[pdf]
[bib]
-
Convergence of Monte Carlo Tree Search in Simultaneous Move Games.
Viliam Lisy, Vojtech Kovarik, Marc Lanctot, Branislav Bosansky.
NIPS 2013.
[pdf]
[tech report]
[bib]
-
Improving Best-Reply Search.
Markus Esser, Michael Gras, Mark H.M. Winands, Maarten P.D. Schadd, Marc Lanctot.
CG 2013.
[pdf]
[bib]
-
Monte Carlo *-Minimax Search.
Marc Lanctot, Abdallah Saffidine, Joel Veness, Chris Archibald, Mark H.M. Winands.
IJCAI 2013.
[pdf]
[tech report]
[bib]
-
Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions.
Richard Gibson, Neil Burch, Marc Lanctot, Duane Szafron.
NIPS 2012.
[pdf]
[bib]
-
No-Regret Learning in Extensive-Form Games with Imperfect Recall.
Marc Lanctot, Richard Gibson, Neil Burch, Martin Zinkevich, Michael Bowling.
ICML 2012.
[pdf]
[tech report]
[presentation]
[bib]
-
Generalized Sampling and Variance in Counterfactual Regret Minimization.
Richard Gibson, Marc Lanctot, Neil Burch, Duane Szafron, Michael Bowling.
AAAI 2012.
[pdf]
[tech report]
[bib]
-
Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization.
Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, Michael Bowling.
AAMAS 2012. Nominated for best paper award.
[pdf]
[bib]
-
Variance Reduction in Monte Carlo Tree Search.
Joel Veness, Marc Lanctot, Michael Bowling.
NIPS 2011.
[pdf]
[bib]
-
Monte Carlo Sampling for Regret Minimization in Extensive Games.
Marc Lanctot, Kevin Waugh, Martin Zinkevich, Michael Bowling.
NIPS 2009.
[pdf]
[appendix]
[poster]
[tech report]
[COLT'09 workshop presentation]
[bib]
-
The Second Annual Real-Time Strategy AI Competition.
Michael Buro, Marc Lanctot, Sterling Orsten.
GameOn'NA 2007.
[pdf]
[bib]
-
Adversarial Planning Through Strategy Simulation.
Frantisek Sailer, Michael Buro, Marc Lanctot.
CIG 2007.
[pdf]
[bib]
-
Path-finding for Large Scale Multi-player Games.
Marc Lanctot, Nicolas NgManSun, Clark Verbrugge.
GameOn'NA 2006.
[pdf]
[bib]
-
Adaptive Virtual Environments in Multi-player Computer Games.
Marc Lanctot, Clark Verbrugge.
GameOn 2004.
[pdf]
[bib]
Theses
-
Monte Carlo Sampling and Regret Minimization for Equilibrium Computation and Decision-Making in Large Extensive Form Games.
Ph.D. Thesis, University of Alberta, Computing Science Dept. (2013).
[pdf]
[bib]
-
Adaptive Virtual Environments in Multi-player Computer Games.
M.Sc. Thesis, McGill University School of Computer Science (2005).
[pdf]
[bib]
Page last updated: Jul 28th, 2024