Bilkent University

Online Academic Catalog

Undergraduate and Graduate Programs 2023-2024


IE 456 Reinforcement Learning and Dynamic Programming

Markov chains. Markov decision processes. Dynamic programming: policy iteration, value iteration. Model-free reinforcement learning: Monte Carlo, Temporal Difference, Q-learning. Policy gradient methods. Model-based reinforcement learning: classical multi-armed bandits, stochastic multi-armed bandits, adversarial multi-armed bandits. Credit units: 3, ECTS Credit units: 5, Prerequisite: (MATH 255 or MATH 230 or MATH 250) and (MATH 241 or MATH 225 or MATH 220 or MATH 224).

Autumn Semester (Staff)

Last regenerated automatically on April 25, 2024 by OAC - Online Academic Catalog Software