OMNI:
Open-endedness via Models of human Notions of Interestingness
Abstract
Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast environment search space, but there are thus infinitely many possible tasks. Even after filtering for tasks the current agent can learn (i.e., learning progress), countless learnable yet uninteresting tasks remain (e.g., minor variations of previously learned tasks). An Achilles Heel of open-endedness research is the inability to quantify (and thus prioritize) tasks that are not just learnable, but also interesting (e.g., worthwhile and novel). We propose solving this problem by Open-endedness via Models of human Notions of Interestingness (OMNI). The insight is that we can utilize large (language) models (LMs) as a model of interestingness (MoI), because they already internalize human concepts of interestingness from training on vast amounts of human-generated data, where humans naturally write about what they find interesting or boring. We show that LM-based MoIs improve open-ended learning by focusing on tasks that are both learnable and interesting, outperforming baselines based on uniform task sampling or learning progress alone. This approach has the potential to dramatically advance the ability to intelligently select which tasks to focus on next (i.e., auto-curricula), and could be seen as AI selecting its own next task to learn, facilitating self-improving AI and AI-Generating Algorithms.
Method
Provided that the real, significant challenges of AI safety and existential risk can be solved, there are tremendous gains to be had by creating more powerful AI or even AGI. Our approach combines a learning progress auto-curriculum with a model of interestingness to train a Reinforcement Learning (RL) agent in a task-conditioned manner.
Learning Progress Curriculum
The task pool in open-ended environments can be very large and diverse, making it challenging for an agent to learn effectively through uniform sampling. Most randomly sampled tasks are likely to be too easy or too difficult for the agent. To automatically identify tasks at the frontier of the agent's capabilities, we extend the learning-progress-based curriculum (LP) from Kanitscheider et al. The high-level idea is for the curriculum to predominantly sample tasks with high learning progress, defined as an agent's recent change in task success probability.
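To make the idea concrete, here is a minimal sketch of a learning-progress sampler. It is not the formulation from Kanitscheider et al. or from this work: it assumes learning progress can be approximated as the absolute gap between a fast and a slow exponential moving average of task success. The class name, the two rates, and the `eps` floor are all hypothetical choices made for illustration.

```python
import random


class LearningProgressTracker:
    """Illustrative tracker: per-task success via a fast and a slow moving average."""

    def __init__(self, tasks, fast_rate=0.3, slow_rate=0.03):
        # Both rates are assumed values, not taken from the paper.
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate
        self.fast = {t: 0.0 for t in tasks}
        self.slow = {t: 0.0 for t in tasks}

    def update(self, task, success):
        """Fold one episode outcome (True/False) into both averages."""
        x = 1.0 if success else 0.0
        self.fast[task] += self.fast_rate * (x - self.fast[task])
        self.slow[task] += self.slow_rate * (x - self.slow[task])

    def learning_progress(self, task):
        """Recent change in success probability, approximated as |fast - slow|."""
        return abs(self.fast[task] - self.slow[task])

    def sampling_probs(self, eps=1e-3):
        """Probabilities proportional to learning progress, with a small floor."""
        raw = {t: self.learning_progress(t) + eps for t in self.fast}
        total = sum(raw.values())
        return {t: w / total for t, w in raw.items()}

    def sample(self, rng=random):
        """Draw the next training task according to the current probabilities."""
        probs = self.sampling_probs()
        tasks = list(probs)
        return rng.choices(tasks, weights=[probs[t] for t in tasks], k=1)[0]


if __name__ == "__main__":
    tracker = LearningProgressTracker(["easy", "frontier", "hard"])
    for _ in range(200):
        tracker.update("easy", True)    # long mastered: both averages near 1
        tracker.update("hard", False)   # never solved: both averages stay at 0
    for _ in range(100):
        tracker.update("frontier", False)
    for _ in range(10):
        tracker.update("frontier", True)  # recently started succeeding
    print(tracker.sampling_probs())
```

In this sketch, a task that has just started to succeed has a large gap between the two averages and is sampled most often, while tasks that are consistently solved or consistently failed have a gap near zero.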
Modeling what Humans Find Interesting
This paper capitalizes on the capabilities of autoregressive LMs, specifically GPT-3 and GPT-4, to emulate human notions of interestingness. LMs are pretrained on vast and diverse text corpora, enabling them to amass a significant amount of world knowledge. The LM is prompted in a few-shot manner with a few examples of choosing which tasks are interesting. It takes into account the agent's existing proficiency on a given set of tasks and predicts which tasks humans would typically find interesting. The input prompt consists of several components:
- Directives encouraging interestingly different behaviors, such as "The ultimate goal that [the agent] would like your help with is to learn as many interestingly different skills as possible ..."
- Environment description, including the possible objects in the environment, and how a task in the environment is specified
- Tasks that the agent has done well on, and the tasks whose interestingness is to be predicted
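The prompt components listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the prompt used in this work: the function names, the section labels, the few-shot example format, and the "task: interesting/boring" reply format are all hypothetical. The directive is passed in as an argument because only a fragment of it is quoted above, and no LM API call is shown.

```python
def build_moi_prompt(directive, env_description, few_shot_examples,
                     mastered_tasks, candidate_tasks):
    """Assemble a few-shot prompt asking which candidate tasks are interesting.

    few_shot_examples: iterable of (mastered_tasks, task, label) tuples,
    where label is "interesting" or "boring" (hypothetical format).
    """
    lines = [directive, "", "Environment:", env_description, ""]
    for ex_mastered, ex_task, ex_label in few_shot_examples:
        lines.append("Tasks done well: " + ", ".join(ex_mastered))
        lines.append(f"Task to judge: {ex_task}")
        lines.append(f"Answer: {ex_label}")
        lines.append("")
    lines.append("Tasks done well: " + ", ".join(mastered_tasks))
    lines.append("Tasks to judge:")
    lines.extend(f"- {task}" for task in candidate_tasks)
    lines.append("For each task, answer 'task: interesting' or 'task: boring'.")
    return "\n".join(lines)


def parse_moi_reply(reply, candidate_tasks):
    """Parse lines such as '- collect wood: interesting' into {task: bool}."""
    verdicts = {}
    for line in reply.splitlines():
        name, sep, label = line.strip().lstrip("- ").partition(":")
        name = name.strip()
        if sep and name in candidate_tasks:
            verdicts[name] = label.strip().lower().startswith("interesting")
    return verdicts


if __name__ == "__main__":
    prompt = build_moi_prompt(
        directive="Help the agent learn as many interestingly different skills as possible.",
        env_description="A grid world with trees, stone, and a crafting table.",
        few_shot_examples=[(["collect wood"], "collect wood twice", "boring")],
        mastered_tasks=["collect wood", "place table"],
        candidate_tasks=["make wood pickaxe", "collect wood three times"],
    )
    print(prompt)
```

Keeping prompt construction and reply parsing separate from the model call makes it straightforward to swap in a different LM or to cache verdicts for tasks that have already been judged.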
Experimental results
OMNI significantly outperforms baselines based on uniform sampling or learning progress alone. Uniform sampling, the most naive baseline, selects every task with equal probability and therefore spends most of its time on tasks that are too easy or too difficult. LP alone is distracted by the many boring tasks. OMNI (LP + MoI) focuses on the subset of tasks with high learning progress that are also interesting.
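The difference between the three sampling strategies can be illustrated with a small sketch. The function names and the `boring_scale` down-weighting factor are assumptions introduced here for illustration; the text above does not specify how the MoI verdict is combined with learning progress.

```python
def uniform_weights(tasks):
    """Uniform baseline: every task gets the same sampling probability."""
    return {t: 1.0 / len(tasks) for t in tasks}


def lp_weights(lp):
    """LP baseline: probability proportional to learning progress."""
    total = sum(lp.values())
    if total == 0:
        # No signal yet, so fall back to uniform sampling.
        return uniform_weights(list(lp))
    return {t: v / total for t, v in lp.items()}


def omni_weights(lp, interesting, boring_scale=0.001):
    """LP + MoI: shrink the weight of tasks the MoI labels boring.

    boring_scale is a hypothetical value chosen for this example.
    Tasks without a verdict are treated as interesting.
    """
    scaled = {
        t: v * (1.0 if interesting.get(t, True) else boring_scale)
        for t, v in lp.items()
    }
    return lp_weights(scaled)


if __name__ == "__main__":
    lp = {"make pickaxe": 0.5, "collect wood twice": 0.5, "defeat boss": 0.0}
    moi = {"make pickaxe": True, "collect wood twice": False, "defeat boss": True}
    print("uniform:", uniform_weights(list(lp)))
    print("LP only:", lp_weights(lp))
    print("LP + MoI:", omni_weights(lp, moi))
```

In this toy setting, LP alone splits its budget evenly between the novel task and the minor variation, whereas the combined weighting concentrates almost all sampling on the task that is both learnable and interesting.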
Conclusion
In conclusion, our work demonstrates the potential of using an MoI to significantly enhance auto-curricula and the quest for open-ended learning algorithms by intelligently focusing on learnable and interesting tasks. In the long run, it hints at a synergy between LMs and open-endedness that simultaneously addresses looming challenges for both: how will LMs ultimately rise to the level of creativity seen in the best of human innovation, and how will open-endedness overcome the trap of diverging into a vast space of uninspiring mediocrity? By playing off each other’s strengths, LMs can perhaps someday become essential engines of open-ended discovery and begin to participate in the creative dance that has defined civilization since its inception.
Citation
@article{zhang2023omni,
  title={OMNI: Open-endedness via Models of human Notions of Interestingness},
  author={Jenny Zhang and Joel Lehman and Kenneth Stanley and Jeff Clune},
  year={2023},
  journal={arXiv preprint arXiv:2306.01711},
}
Acknowledgements
This work was supported by the Vector Institute, a grant from Schmidt Futures, an NSERC Discovery Grant, and a generous donation from Rafael Cosman. We also thank Andrew Dai, Cedric Colas, and members of our lab at the University of British Columbia, namely Aaron Dharna, Ben Norman, and Shengran Hu, for insightful discussions and feedback.
The website template was borrowed from Jon Barron.