Towards a fault-tolerant multi-agent system architecture

Sanjeev Kumar, Philip R. Cohen

Research output: Contribution to conference › Paper › peer-review

58 Citations (Scopus)


Multi-agent systems are prone to failures typical of any distributed system. Agents and resources may become unavailable due to machine crashes, communication breakdowns, process failures, and numerous other hardware and software failures. Most work on fault handling in multi-agent systems deals with detecting and recovering from faults such as state inconsistencies, and relies on traditional techniques to recover from other distributed-system failures. However, traditional fault-tolerance techniques are designed for specific situations and require special infrastructural support. We argue for fault-tolerance techniques that can be readily implemented using generic agents with minimal or no modification to the agent infrastructure. We propose that theories from the multi-agent systems literature can be effectively combined with basic fault-tolerance principles to design robust multi-agent systems. In particular, we argue that (1) teamwork can be used to create a robust brokered architecture that will recover a multi-agent system from broker failures without incurring undue overheads, (2) teamwork can also be used to guarantee a specified number of brokers in a large multi-agent system, and (3) agent autonomy can be used to prevent thrashing and guarantee acceptable levels of quality of service by an agent. To validate our approach, we present experimental evidence using the Adaptive Agent Architecture (AAA).

Original language: English
Number of pages: 8
Publication status: Published - 3 Dec 2000
Externally published: Yes
Event: International Conference on Autonomous Agents 2000 - Barcelona, Spain
Duration: 3 Jun 2000 → 7 Jun 2000
Conference number: 4th (Proceedings)


Conference: International Conference on Autonomous Agents 2000
Abbreviated title: AGENTS 2000