Consistency in distributed systems, Part 1 of 3
09.06.2010 14:54, by Zak Croft
Consistency is a broad and intricate subject, and it matters greatly in the design and operation of large-scale distributed systems. One characteristic of a distributed system is that it should be transparent: the many nodes that make up the system should appear to the user as a single coherent system. This is known as distribution transparency. Problems arise when replicas of data are held on different nodes and are accessed and modified by different nodes and users, either simultaneously or consecutively. The problem grows with scale: as a system grows, more replicas of the data come into existence, and keeping them consistent becomes paramount. If replicas diverge, users can see inconsistent data, and distribution transparency fails.
In the past, implementations of distributed systems would fail a user's request if the requested data was inconsistent, so the availability of the system suffered because of these inconsistencies. More recently, a different stance has emerged, which holds that availability should take priority over a consistent view of the data. Ways of handling the trade-off between availability and a consistent view of the data, which can also be framed as a trade-off between performance and reliability, were therefore needed. Consistency models and protocols were defined to deal with this issue and have become an important part of distributed systems. Consistency has many aspects in this field, so it is important to define them before continuing.
Consistency defined
Consistency in distributed systems refers to keeping replicated data identical, to some agreed degree, across the nodes that store it, whether in cache, memory or on hard drives across multiple machines. The required degree of consistency can be chosen to suit a particular system or scenario. When a write occurs to the data on any node, the degree of consistency maintained in the system dictates how that write is propagated to the other nodes, and when and how users get to read the updated data. Consistency models define what processes can expect when they read and write data. There are a great many consistency models, and the choice between them depends on the needs of the system; some of them are discussed in this series. How a specific consistency model is implemented is described by a consistency protocol. Protocols allow the theory encapsulated in a model to be realised using the technologies that make up a distributed system. First, however, we need to understand consistency models.
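To make the idea of a "degree of consistency" a little more concrete, here is a minimal sketch in Python. It is not taken from any real system: the `Replica` and `ReplicatedStore` classes and the `eager` flag are hypothetical names invented for this illustration, and everything lives in memory in a single process. It only shows how the choice of when to propagate a write changes what a reader on another node sees.

```python
class Replica:
    """One node holding a copy of the data (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Toy store that propagates writes eagerly or lazily (an assumption
    made for illustration, not a real consistency protocol)."""

    def __init__(self, replica_names, eager):
        self.replicas = [Replica(name) for name in replica_names]
        self.eager = eager
        self.pending = []  # queued (replica, key, value) updates

    def write(self, replica_index, key, value):
        target = self.replicas[replica_index]
        target.data[key] = value
        for other in self.replicas:
            if other is target:
                continue
            if self.eager:
                # Update every copy before the write returns.
                other.data[key] = value
            else:
                # Defer the update; other copies stay stale for now.
                self.pending.append((other, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        """Deliver all queued updates to the other replicas."""
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


lazy = ReplicatedStore(["node-a", "node-b"], eager=False)
lazy.write(0, "x", 1)
stale = lazy.read(1, "x")   # node-b has not received the update yet
lazy.propagate()
fresh = lazy.read(1, "x")   # node-b now holds the written value

eager = ReplicatedStore(["node-a", "node-b"], eager=True)
eager.write(0, "x", 1)
immediate = eager.read(1, "x")  # visible on node-b straight away
```

With lazy propagation the write returns quickly but a reader on the second node can briefly see old data; with eager propagation every reader sees the new value at the cost of touching every replica on each write. A real system sits somewhere on this spectrum and also has to cope with failures and concurrent writes, which this sketch ignores.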
Consistency Models
A consistency model can be defined as a contract between processes and a data store: if the processes agree to obey certain rules, the store promises to work correctly. Here "store" means a data store and refers to the global view of the data. Many models have been suggested, and each has benefits and disadvantages depending on the requirements of the system. Models can be divided into data-centric and client-centric models. Data-centric models range from strong to weak and concentrate on the state of the data throughout the system, specifying the level of consistency that holds before and after read or write operations. Client-centric models are weaker and concentrate on situations where most operations on a data store are reads and updates do not occur simultaneously. They focus on what an individual user needs in order for their own view of the data to remain consistent.
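As a rough illustration of the client-centric viewpoint, the sketch below has a client remember the newest version it has seen and decline to read from a replica that is further behind. The class names, the version counter and the specific rule are all assumptions chosen for this example; they are not a definition of any particular model covered later in this series.

```python
class VersionedReplica:
    """Hypothetical replica that stores one value with a version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Only move forward; ignore updates that are older than what we hold.
        if version > self.version:
            self.version = version
            self.value = value


class SessionClient:
    """Hypothetical client that tracks the newest version it has read."""

    def __init__(self):
        self.last_seen = 0

    def read(self, replica):
        """Return (True, value) if the replica is at least as new as what
        this client has already seen, otherwise (False, None)."""
        if replica.version < self.last_seen:
            return (False, None)
        self.last_seen = replica.version
        return (True, replica.value)


up_to_date = VersionedReplica()
lagging = VersionedReplica()
up_to_date.apply(1, "draft")
up_to_date.apply(2, "final")
lagging.apply(1, "draft")   # this replica has not yet received version 2

client = SessionClient()
first = client.read(up_to_date)   # accepted: client now remembers version 2
second = client.read(lagging)     # refused: replica is behind this client
```

The replicas themselves are allowed to disagree here; what stays consistent is the single client's view, which never moves backwards. That is the sense in which such a guarantee is weaker than a system-wide one.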
Next week we will start to look at these models, beginning with data-centric models.
