The goal of the project is to provide a better
understanding of fault tolerance/performance trade-offs, in particular:
- What are the overheads in the absence of failures?
- How fast is the failover mechanism with different strategies?
- How are these affected by application size (state), the number of replicas,
and the choice of middleware?
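To make the first two questions concrete, here is a minimal sketch of how the two quantities could be computed from measured timings. The function names, the millisecond units, and the definitions themselves (overhead as relative increase in mean latency, failover time as the gap between a crash and the next client reply) are assumptions for illustration, not the project's actual measurement method.

```python
from statistics import mean


def relative_overhead(baseline_ms, replicated_ms):
    """Extra mean latency of the replicated run, as a fraction of the baseline.

    Both arguments are lists of per-request latencies measured in
    failure-free runs (hypothetical inputs).
    """
    base = mean(baseline_ms)
    return (mean(replicated_ms) - base) / base


def failover_time(crash_at_ms, reply_times_ms):
    """Time from an injected crash to the first reply the client sees after it."""
    later = [t for t in reply_times_ms if t >= crash_at_ms]
    if not later:
        raise ValueError("no reply observed after the crash")
    return min(later) - crash_at_ms
```

Repeating such measurements while varying the state size, the number of replicas, and the middleware would give one data point per configuration for the third question.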
Two platforms have been built: one based on the
OMG FT-CORBA standard and the other built on a consensus-based
algorithm that provides full availability (FA-CORBA).
The FT-CORBA platform provides tolerance against application crashes, but
not infrastructure crashes. The FA-CORBA platform, on the other hand, is robust against both.
The FT-CORBA architecture:
- Fault Monitor (FM)
- Object Factory (OF)
- Replication Manager (RM)
- Fault Notifier (FN)
- Logging and Recovery Controller (LRC)
The FA-CORBA architecture:
- Leader Election Unit (LEU)
- Consensus Object (CO)
- Server Interceptor (SI)
- Application Server Object (ASO)
- Application Client Object (ACO)
Evaluation results based on a telecom application
server have provided insights into the overheads of providing various
degrees of availability.
For more details see the publications.