Aggarwal Leads NSF Grant
Many companies, government agencies, and individuals make sequences of challenging decisions over time, choosing from among many possible options. They may have limited knowledge of the outcomes of their decisions and receive only limited feedback. For example, search engines and content providers must decide which sets of websites, products, or media to recommend each time a user logs on to their system or submits a query, in some cases with limited knowledge of the user's underlying preferences. If users' privacy is protected, then only users' past actions, such as which links or media were selected by earlier users, will be available as feedback to inform the search engine or content provider on what to recommend next.
This project aims to develop provably good strategies that decision makers can use in such settings, aiding their decision making under uncertainty and with limited feedback. The project will also develop strategies for the more challenging setting where multiple decision makers must coordinate with each other on such problems but have limited communication available to do so. Furthermore, it will support undergraduate and graduate research training, as well as graduate-level course development, in machine learning and artificial intelligence, preparing students for careers in advanced technical fields.
The goal of this project is to develop novel, provably good strategies for solving sequential decision problems (multi-armed bandit problems) when the actions available have a combinatorial structure (such as choosing subsets of products to recommend), the rewards have a diminishing returns property (submodularity), and there is no side-information available—the only feedback comes from the reward itself. The proposed work builds on the rich literatures of multi-armed bandits and of submodular optimization. The technical aims of the project are divided into two thrusts. The first thrust focuses on developing algorithms and identifying their regret bounds for combinatorial multi-armed bandit problems with submodular rewards and no additional feedback. The second thrust extends those strategies and regret analyses to a decentralized setting, where multiple agents coordinate to solve combinatorial multi-armed bandit problems, despite limited resources for communication.
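To make the setup above concrete, here is a minimal toy sketch, not the project's proposed algorithm, of a combinatorial recommendation problem with a submodular (diminishing-returns) reward. The reward models the chance a user clicks at least one recommended item, and a greedy rule builds a size-k subset from estimated per-item click rates; all item affinities and parameter names here are hypothetical illustrations.

```python
import random

def submodular_reward(subset, affinities):
    # Probability the user clicks at least one recommended item: a classic
    # coverage-style objective with diminishing returns (submodularity) --
    # each added item helps less once the subset already covers the user.
    p_no_click = 1.0
    for item in subset:
        p_no_click *= 1.0 - affinities[item]
    return 1.0 - p_no_click

def greedy_subset(estimates, k):
    # Greedily add the item with the largest marginal gain under the current
    # estimates; for monotone submodular objectives this greedy rule carries
    # the classic (1 - 1/e) approximation guarantee.
    chosen = []
    for _ in range(k):
        best = max(
            (i for i in range(len(estimates)) if i not in chosen),
            key=lambda i: submodular_reward(chosen + [i], estimates)
                          - submodular_reward(chosen, estimates),
        )
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    random.seed(0)
    true_aff = [0.1, 0.5, 0.3, 0.05, 0.4]  # hypothetical per-item click rates
    n, k = len(true_aff), 2
    counts, means = [0] * n, [0.0] * n

    # A crude explore/exploit loop standing in for a bandit strategy: on
    # exploration rounds play a singleton to learn that item's click rate;
    # otherwise recommend the greedy subset under the current estimates.
    # Note the bandit-feedback difficulty the abstract describes: on
    # exploitation rounds only the set-level reward is observed.
    for t in range(5000):
        if random.random() < 0.1:
            i = random.randrange(n)
            r = 1.0 if random.random() < true_aff[i] else 0.0
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
        else:
            s = greedy_subset(means, k)
            _ = 1.0 if random.random() < submodular_reward(s, true_aff) else 0.0

    print(greedy_subset(means, k))
```

With enough exploration the estimates concentrate and the greedy subset tends to recover the highest-affinity items; the hard part the project targets is doing this with provably low regret from set-level feedback alone, and in a decentralized multi-agent version of the loop.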
To stay up to date on this award, please contact Dr. Aggarwal.
Congratulations, Dr. Aggarwal, for your team's recent success!