VYRE Company:Blog

Consistency in distributed systems Part 3 of 3

22.06.2010 17:36 ( 0 comments )

 

By Zak Croft

 
This week we continue with a look at client centric models and consistency protocols

Client-centric models

Client centric models, as stated earlier, concerns itself with the client view of the data returned so relaxes the recommendations that the system view of the system be consistent. Models put forward are monotonic reads, monotonic writes, read your writes and writes follow reads. These can fall into the category of causal or eventual consistency with regards to the way the state of the system wide view of the data is maintained before and after operations. These models can be summed up succinctly in single sentences and are as follows.

  1. Monotonic reads define that when any read on a datum occurs by a process, any following read on the datum will be the same or a more recent updated datum. 
  2. Monotonic writes define that any write operation by a process is finalized before any proceeding write by the same process can take place.
  3. Read your writes defines that any process that writes should always see this write on proceeding reads.
  4. Write follows reads define that any write made by a process will always occur on any preceding reads on that datum.

 

This in essence sums up these four models. The way that any individual or combination of each of these occur is down to the replica that is used when interacting with the system. If the same or a more recently updated replica is used every time a read and write is carried out then any of these models together can be realised. This shows why and how these kinds of models are client-centric. They can only be realised if the consistency model takes the view of a user using the system. These consistency models cannot be maintained on a system wide view and so pertain to an individual temporal view of the data. This does not cater for reliable data but again greatly enhances performance and when reliability can be traded for performance then these models are strategically beneficial. With these models now defined we will be able to see how using consistency protocols can be used to implement these models.

Consistency protocols

This article will not delve deep into these protocols as they are of great depth and complexity. This article will try to summarise a few of these protocols so a broad view can be obtained.  One of the protocols that pertain to strong and weaker forms of consistency systems is the bounding numerical deviation protocol. In essence this protocol states that a replica will keep track of its own writes with a log. When an update is performed on another replica, this other replica will propagate the update to the next replica, where this next replica will propagate it to the next. This is known as an epidemic protocol. When the update arrives, at any particular replica, the updates take place and that replicas log is updated. If the forwarding replica notices that the log of the replica it is forwarding to has an outdated log then the log, of the forwarding replica, is also forwarded. The receiving replica can then obtain the missing updated writes and so become consistent with the other replicas. This then ensures at all times replicas will be updated to the latest data value with-in the bounds of the degree of consistency specified by that model.

One of the protocols that pertain to the sequential consistency model is the remote write protocol. This is a type of primary based protocol. The idea of a primary based protocol is that there is a single replica that all writes by other replicas, are written too. This replica, known as the primary replica, co-ordinates and sends updates to all the other replicas. When the other replicas have been updated they send an acknowledgment back to the primary replica. Then the primary replica sends an acknowledgment back to the process that performed the write initially. This has problems as the process that performed the write has to wait for the acknowledgment to return before continuing. This is therefore known as a blocking protocol. A non blocking protocol would be where the initial process could continue without obtaining the acknowledgment. This has the problems that there is no guarantee that the other replicas have been updated, so the system suffers from unreliability issues. If a non blocking protocol is used then the primary replica can create a global time order of the writes so all other replicas will see this ordering. This is why this protocol pertains to the sequential consistency model. Therefore this protocol has the trade of between performance and reliability of data.

The last consistency protocol to look at is a protocol to implement a client-centric model. This is known as a naive implementation. This entails that write operation made by a user are given system wide unique identifiers. Write sets that are the identifiers and read sets that are the relevant writes, to be read for that user, are used in conjunction to give the user a consistent view of the data. All of the client-centric protocols can be implemented using this protocol, however this implementation can become unwieldy, in reference to the amount of read and write sets created. Another way of implementing this is to use sessions and timestamps. Timestamps are returned by the server with each write and associated with a session. Then the timestamp is used to locate the required read set of the latest write set with that timestamp from a particular server. This allows for a more compact and efficient representation of the writes.  This summarises a few of the consistency protocols available but there are a great deal more that cannot be covered in the article.

Conclusion

In conclusion it can be seen that there are a great many consistency models that can be implemented using a great many consistency protocols. Data-centric models or client-centric models are available and are concerned with either the system wide view of the data or how the user sees the data during their time interacting with the system.  What model needs to be implemented can be dictated by the type of system being built and should take into consideration the need for the reliability to performance trade off that it brings. If the view of the data needs to be definitive then strong models should be used but performance issues become apparent. Weaker models can be used when a need for the definitive view of the data can be relaxed. This then brings greater performance but decreased reliability. Weaker models are client centric so bring the greatest performance to the system. There are many protocols that specify how these models can be implemented and each can be specific to one or a group of models. Consistency is a very broad subject with great depth and this paper has only touched the surface of some of the models and protocols. In essence consistency is an important part of any implementation of a distributed system and so applying of the right model for the system in question is paramount. 

 

And this concludes this three part series on consistency in distributed systems.

 

Comments