Exploiting and Generalizing Epistemic Uncertainty in Reinforcement Learning and Planning

Likmeta, Amarildo (2024) Exploiting and Generalizing Epistemic Uncertainty in Reinforcement Learning and Planning, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Data science and computation, 35 Ciclo. DOI 10.48676/unibo/amsdottorato/11445.
Full text available: PDF (English), 12 MB. License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0).

Abstract

Solving sequential decision-making problems with complex and non-linear dynamics has been a goal of Artificial Intelligence since the conception of the field. Reinforcement Learning (RL) offers a general framework for solving such problems. Its approach of learning by direct interaction with the environment, which allows for speculating on the value of candidate solutions, testing them, and reasoning counterfactually, has enabled remarkable results on a multitude of challenging problems, both simulated and real-world. Nonetheless, successfully applying RL to new problems still requires a large degree of task-specific tuning. One of the main open challenges in RL remains the exploration-exploitation dilemma: an agent that optimizes a cumulative objective in an unknown environment while learning must decide whether to trust the information gathered so far and exploit it by executing the best-known strategy, or to explore in order to gather more information in the hope of finding a better one. The exploration problem has been thoroughly studied in the literature, and a multitude of solutions exist for tabular domains and for continuous domains with known structure. However, in complex domains where neural networks are employed as function approximators, deep and directed exploration remains a challenge. In this dissertation, we tackle the exploration problem in RL by proposing Wasserstein TD-Learning (WTD), a novel framework that models the uncertainty over the value function in a model-free manner and propagates it across the state-action space through variational updates. These updates give us enough control to prove desirable theoretical properties in the tabular setting while keeping the method easily scalable to the Deep RL setting. This lets us adapt WTD to a multitude of different settings, extending algorithms from the literature to handle the distributional nature of our value function and thereby enabling deep and directed exploration.
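To make the idea concrete, below is a minimal, illustrative sketch of a Wasserstein-style TD update in the tabular setting. It assumes one common parametrization from this line of work: an independent Gaussian posterior over each Q-value. For 1-D Gaussians, the 2-Wasserstein geodesic between two distributions interpolates both means and standard deviations linearly, which yields the closed-form update used here. All names, hyperparameters, and the greedy-bootstrap choice are assumptions for illustration, not the thesis implementation.

import numpy as np

n_states, n_actions = 10, 4
gamma, alpha = 0.99, 0.1

# Gaussian posterior over each Q-value: means and (optimistic) std devs.
mu = np.zeros((n_states, n_actions))
sigma = np.full((n_states, n_actions), 10.0)

def wtd_update(s, a, r, s_next, done):
    """Move the Gaussian at (s, a) toward the TD target distribution
    along the W2 geodesic: for 1-D Gaussians this interpolates the
    means and the standard deviations with step size alpha.
    (Hypothetical sketch; the thesis update may differ.)"""
    if done:
        target_mu, target_sigma = r, 0.0
    else:
        # One simple choice: bootstrap greedily w.r.t. the posterior means.
        a_next = np.argmax(mu[s_next])
        target_mu = r + gamma * mu[s_next, a_next]
        target_sigma = gamma * sigma[s_next, a_next]
    mu[s, a] = (1 - alpha) * mu[s, a] + alpha * target_mu
    sigma[s, a] = (1 - alpha) * sigma[s, a] + alpha * target_sigma

def act(s, kappa=2.0):
    """Directed exploration via optimism: pick the action with the
    highest upper bound mu + kappa * sigma on its posterior."""
    return int(np.argmax(mu[s] + kappa * sigma[s]))

Because the standard deviations shrink only where updates actually occur, the optimistic action rule keeps steering the agent toward state-action pairs whose value is still uncertain, which is the "deep and directed" exploration behavior the abstract refers to.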

Document type: Doctoral thesis
Author: Likmeta, Amarildo
PhD programme: Data science and computation
Cycle: 35
Keywords: sequential decision-making, reinforcement learning, online planning, epistemic uncertainty, Monte-Carlo tree search
DOI: 10.48676/unibo/amsdottorato/11445
Date of defense: 21 June 2024