A new AI evaluation cosmos

Ready to play the game?

José Hernández-Orallo, Marco Baroni, Jordi Bieger, Nader Chmait, David L. Dowe, Katja Hofmann, Fernando Martínez-Plumed, Claes Strannegård, Kristinn R. Thórisson

Research output: Contribution to journal › Article › Other › peer-review

4 Citations (Scopus)

Abstract

Through the integration of more and better techniques, more computing power, and the use of more diverse and massive sources of data, AI systems are becoming more flexible and adaptable, but also more complex and unpredictable. There is thus an increasing need for a better assessment of their capacities and limitations, as well as concerns about their safety (Amodei et al. 2016). Theoretical approaches might provide important insights, but only through experimentation and evaluation tools will we achieve a more accurate assessment of how an actual system operates over a series of tasks or environments. Several AI experimentation and evaluation platforms have recently appeared, setting a new cosmos of AI environments. These facilitate the creation of various tasks for evaluating and training a host of algorithms. The platform interfaces usually follow the reinforcement learning (RL) paradigm, where interaction takes place through incremental observations, actions, and rewards. This is a very general setting, and seemingly every possible task can be framed within it.
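The observation-action-reward interface described above can be sketched minimally. The class and function names below are illustrative (loosely modeled on Gym-style interfaces), not taken from the article or any specific platform:

```python
import random

class GuessEnv:
    """Toy environment following the incremental observation-action-reward
    loop of the RL paradigm. A purely hypothetical example environment:
    the agent earns reward for guessing a hidden target number."""

    def __init__(self, target=3, horizon=10):
        self.target = target    # hidden state the agent must guess
        self.horizon = horizon  # fixed episode length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == self.target else 0.0
        observation = action            # agent observes its last action
        done = self.t >= self.horizon   # episode ends after `horizon` steps
        return observation, reward, done

def run_episode(env, policy):
    """Generic agent-environment loop: the interaction pattern that
    RL-style evaluation platforms expose in some form."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total

random.seed(0)
env = GuessEnv()
episode_return = run_episode(env, policy=lambda obs: random.randint(0, 5))
```

Any task that can be cast as such a step loop fits this interface, which is why the platforms can host so many different evaluation environments behind one API.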

Original language: English
Pages (from-to): 66-69
Number of pages: 4
Journal: AI Magazine
Volume: 38
Issue number: 3
Publication status: Published - 1 Sep 2017

Cite this

Hernández-Orallo, J., Baroni, M., Bieger, J., Chmait, N., Dowe, D. L., Hofmann, K., ... Thórisson, K. R. (2017). A new AI evaluation cosmos: Ready to play the game? AI Magazine, 38(3), 66-69.
@article{afdf04141998414c90baa602fd423251,
  title = "A new AI evaluation cosmos: Ready to play the game?",
  author = "Jos{\'e} Hern{\'a}ndez-Orallo and Marco Baroni and Jordi Bieger and Nader Chmait and Dowe, {David L.} and Katja Hofmann and Fernando Mart{\'i}nez-Plumed and Claes Stranneg{\aa}rd and Th{\'o}risson, {Kristinn R.}",
  year = "2017",
  month = "9",
  day = "1",
  language = "English",
  volume = "38",
  pages = "66--69",
  journal = "AI Magazine",
  issn = "0738-4602",
  publisher = "Association for the Advancement of Artificial Intelligence (AAAI)",
  number = "3",
  url = "http://www.scopus.com/inward/record.url?scp=85030545578&partnerID=8YFLogxK",
}



