How to Select a Replication Protocol According to Scalability, Availability, and Communication Overhead.


R. Jiménez-Peris, M. Patiño-Martínez, G. Alonso, B. Kemme
Abstract:
Data replication is playing an increasingly important role in the design of parallel information systems. In particular, the widespread use of cluster architectures in high-performance computing has created many opportunities for applying data replication techniques in new areas. For instance, as part of work related to cluster computing in bioinformatics, we have been confronted with the problem of having to choose an optimal replication strategy in terms of scalability, availability, and communication overhead. Thus, we have evaluated several representative replication protocols in order to better understand their behavior in practice. The results obtained are surprising in that they challenge many of the assumptions behind existing protocols. Our evaluation indicates that the conventional read-one/write-all approach is the best choice for a large range of applications requiring data replication. We believe this is an important result for anybody developing code for computing clusters, as the read-one/write-all strategy is much simpler to implement and more flexible than quorum-based approaches. In this paper we show that it is also the best choice according to a number of other selection criteria.
Proc. of the 20th IEEE Symposium on Reliable Distributed Systems, SRDS'01, New Orleans, Oct. 2001
