Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as enhanced name servers. Such architectures provide an increased number of redundancy configuration choices. The influence of a (wide area) network connection can be quite significant and induce trade-offs between dependability and user-perceived performance. This paper develops a quantitative stochastic model using stochastic activity networks (SAN) for the evaluation of performance and dependability metrics of a generic transaction-based service implemented on a distributed replication architecture. The composite SAN model can be easily adapted to a wide range of client-server applications deployed in replicated server architectures. In order to obtain insight into the system behaviour, a set of relevant environment parameters and controllable fault-tolerance parameters are chosen and the dependability/performance trade-off is evaluated.
International Journal of Critical Computer-based Systems, 2013, Vol 4, Issue 2, p. 144-172
Dependability; availability; model-based evaluation; stochastic activity networks; distributed architectures; fault tolerance; replicated server access; server replication; fault diagnosis; fault recovery; wide area networks; WANs; stochastic modelling.