TY - JOUR
T1 - A new AI evaluation cosmos
T2 - Ready to play the game?
AU - Hernández-Orallo, José
AU - Baroni, Marco
AU - Bieger, Jordi
AU - Chmait, Nader
AU - Dowe, David L.
AU - Hofmann, Katja
AU - Martínez-Plumed, Fernando
AU - Strannegård, Claes
AU - Thórisson, Kristinn R.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - Through the integration of more and better techniques, more computing power, and the use of more diverse and massive sources of data, AI systems are becoming more flexible and adaptable, but also more complex and unpredictable. There is thus an increasing need for a better assessment of their capacities and limitations, as well as concerns about their safety (Amodei et al. 2016). Theoretical approaches might provide important insights, but only through experimentation and evaluation tools will we achieve a more accurate assessment of how an actual system operates over a series of tasks or environments. Several AI experimentation and evaluation platforms have recently appeared, setting a new cosmos of AI environments. These facilitate the creation of various tasks for evaluating and training a host of algorithms. The platform interfaces usually follow the reinforcement learning (RL) paradigm, where interaction takes place through incremental observations, actions, and rewards. This is a very general setting and seemingly every possible task can be framed under it.
UR - http://www.scopus.com/inward/record.url?scp=85030545578&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85030545578
SN - 0738-4602
VL - 38
SP - 66
EP - 69
JO - AI Magazine
JF - AI Magazine
IS - 3
ER -