Towards a fault-tolerant multi-agent system architecture

Sanjeev Kumar, Philip R. Cohen

Research output: Contribution to conference (Paper)

62 Citations (Scopus)


Multi-agent systems are prone to failures typical of any distributed system. Agents and resources may become unavailable due to machine crashes, communication breakdowns, process failures, and numerous other hardware and software failures. Most work on fault handling in multi-agent systems deals with detecting and recovering from faults such as state inconsistencies, and relies on traditional techniques for recovering from failures in other distributed systems. However, the traditional fault-tolerance techniques are designed for specific situations and require special infrastructural support. We argue for fault-tolerance techniques that can be readily implemented using generic agents with minimal or no modification to the agent infrastructure. We propose that theories from the multi-agent systems literature can be effectively combined with basic fault-tolerance principles to design robust multi-agent systems. In particular, we argue that (1) teamwork can be used to create a robust brokered architecture that will recover a multi-agent system from broker failures without incurring undue overhead, (2) teamwork can also be used to guarantee a specified number of brokers in a large multi-agent system, and (3) agent autonomy can be used to prevent thrashing and guarantee acceptable levels of quality of service by an agent. To validate our approach, we present experimental evidence using the Adaptive Agent Architecture (AAA).
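
As a rough illustration of claim (2) in the abstract, the sketch below shows one way a broker team might keep a specified number of brokers available after a failure. It is a toy model under stated assumptions, not the paper's method and not the Adaptive Agent Architecture API: the names BrokerTeam, report_failure, restore, and the idea of a pool of spare brokers are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BrokerTeam:
    """Toy model of a broker team that tries to keep `target` brokers active.

    Hypothetical illustration only; this is not the AAA interface.
    """
    target: int                                 # specified number of brokers
    active: set = field(default_factory=set)    # brokers currently running
    spares: list = field(default_factory=list)  # assumed pool of replacements

    def report_failure(self, broker: str) -> None:
        # A surviving teammate has noticed that `broker` is unavailable.
        self.active.discard(broker)
        self.restore()

    def restore(self) -> None:
        # Recruit spares until the team is back at the target size,
        # or until no spares remain.
        while len(self.active) < self.target and self.spares:
            self.active.add(self.spares.pop(0))


team = BrokerTeam(target=3, active={"b1", "b2", "b3"}, spares=["b4", "b5"])
team.report_failure("b2")
print(sorted(team.active))  # ['b1', 'b3', 'b4']
```

In this sketch the team drops below the target only when the spare pool is exhausted; how failures are detected and how replacements are actually recruited is left out entirely.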

Original language: English
Number of pages: 8
Publication status: Published - 3 Dec 2000
Externally published: Yes
Event: 4th International Conference on Autonomous Agents - Barcelona, Spain
Duration: 3 Jun 2000 to 7 Jun 2000


Conference: 4th International Conference on Autonomous Agents
City: Barcelona, Spain
