About Distributed Systems (ADS)

Why Are Distributed Systems Necessary?

Because they allow autonomous computers, distributed geographically, to communicate over cable, fiber, or wireless connections. This enables communication, cooperation, and the sharing of resources.

Other benefits include reduced costs, increased data and information availability, and improved performance.


Understanding Distributed Systems

Module Objectives

  • Introduce the principles and concepts involved in distributed system design.
  • Become familiar with the mechanisms and protocols for inter-process communication.
  • Gain an overview of fundamental problems and problem-solving techniques.

Introduction

An introduction to distributed systems includes:

  1. Definition
  2. Examples, with their typical characteristics
  3. Challenges

1. Definition

A distributed system is a collection of autonomous computers interconnected by a computer network and supported by distributed system hardware and software, forming an integrated computing facility.

Its processes:

  1. Execute concurrently,
  2. Interact in order to work together toward common goals,
  3. Coordinate their activities and exchange information by sending messages over communication networks.

1.1 Importance of Distributed Systems

Most organizations now depend heavily on distributed systems, including those in:


Banking, Transportation and Telecommunications

With a distributed system, each organization operates more efficiently. For example, suppose a bank customer needs to apply for a new debit card because the old one was swallowed by an ATM in another city (not the customer's home city). The customer handles it at the nearest branch office of the bank. Should the bank then request the customer's records from the head office by post, or send a staff member to fetch them? That would be absurd if it happened today, because we live in the era of distributed systems, not the dinosaur era.

So what do banks, and other organizations that rely on distributed systems, actually do? For more information, see the next material.

2. Examples (Based on Typical / Characteristics)

2.1 Internet

A network of computers connected globally that communicate with each other using the Internet Protocol (IP).

So the Internet is an example of a distributed system on the largest scale, because its coverage spans countries (it is global).

Just a note:

We were once given a simple but slightly misleading question: which is bigger, the Internet or a computer network?

The answer is that computer networks are the broader concept, because the Internet is just one kind of computer network, while computer networks also include LANs, WANs, intranets, and so on; moreover, a computer network (a LAN, for example) is not necessarily connected to the Internet.

2.2 Intranet

A private network whose boundaries are set by the organization's local security policies.

For example, a LAN (Local Area Network) is a computer network that covers only a small area in one location (a campus LAN, an office LAN, a boarding-house LAN, etc.), while a WAN (Wide Area Network) covers a larger area because it connects LANs, for example linking a head office with branch offices in other cities.

Just a note:
WiMAX has wider coverage than a LAN/WLAN, because it spans inter-regional/city areas. A few people in Jogja once wanted to develop it, but since 4G was already available, the project was not continued.

2.3 Parts and Characteristics of the Internet


A typical portion of the Internet

2.4 Characteristics of the Internet

Referring to the image above, "A typical portion of the Internet", here are some important points:

  • The Internet has a very broad scope and consists of diverse elements of different natures and types.
  • It allows sending messages/emails, transferring data or files, multimedia communication (video calls, voice calls, live streaming, etc.) and the WWW (surfing, searching, browsing, downloading, uploading, etc.).
  • It is open-ended.
  • For typical computer networks such as LANs, WANs, and WLANs to reach the Internet (the widest-scope computer network), they can use an ISP, a modem (dial-up/ADSL), or a satellite link.
  • Meanwhile, within an intranet, LANs are connected by backbones (Ethernet, LocalTalk, Token Ring, FDDI, or ATM). An intranet is, for example, the communication network of a small office, factory, or university, connecting its departments.

Just a note:
Not all computers run the same platform/operating system; there are Linux, Unix, and Windows. Can they still communicate with each other? Take data transfer via FTP (port 21) or SSH (port 22) as an example. They are mediated by transaction middleware, which includes web-service technologies such as ASP.NET, JSON, or XML.

For computers with different platforms/operating systems to communicate with each other, communication standards are needed, and web services are one way these standards are realized.
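
As a small, hedged illustration of that standardization (my own sketch, not taken from the module): the code below uses only Python's standard library to exchange JSON over HTTP, which any platform can produce and parse in the same way. The endpoint URL is purely hypothetical.

```python
# Sketch: cross-platform data exchange via a standard format (HTTP + JSON).
# The URL below is a placeholder, not a real service.
import json
from urllib import request, error

payload = json.dumps({"student_id": 42, "query": "grades"}).encode("utf-8")
req = request.Request(
    "http://example.com/api/grades",          # hypothetical web-service endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)

# Any client -- Linux, Unix, or Windows -- that follows the same standards
# can build this request and parse the JSON reply in the same way.
try:
    with request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read().decode("utf-8")))
except error.URLError as exc:
    print("no real service behind the placeholder URL:", exc)
```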

2.5 Parts and Characteristics of the Intranet


A typical Intranet

2.6 Characteristics of Intranet

  • Several LANs connected by backbones
  • Allows information to flow within the organization. Such as electronic data, documents, etc.
  • Provides services such as email server, file server, print server, etc.
  • Often connected to the internet via a router
  • Any communication that requires outbound access (meaning leaving the organization's network) is protected by a firewall.

Just a note:
Backbone is a high-speed channel or connection that is the main path in a network.

A backbone network connects several lower-speed networks through a gateway; a backbone can carry up to 10 Gbps.

A firewall is a network security system, either hardware-based (for example, built into a router) or software-based (for example, part of an application or antivirus, or a browser plugin/extension), that uses a set of rules to control the traffic entering and leaving a network.

2.7 Portable and Handheld Devices


Portable and handheld devices

2.8 Mobile and ubiquitous computing

That is, computing that is increasingly flexible and can be done anywhere, at any time.

For example, laptops, PDAs, mobile phones, printers, and home devices all support wireless connectivity today. Take a shared printer: anyone with any PC or device can use it, as long as they are on the same network.

The scope includes:

WLAN (Wireless LANs)

  • Connectivity for devices such as laptops, PDAs (Personal Digital Assistants), mobile phones, digital cameras, etc.
  • For WAP (Wireless Applications Protocol).

Home Intranet

  • Embedded devices in the home (hi-fi, washing machines, etc.).
  • Universal remote control plus communication (as in the "Jarvis" home assistant demonstrated by Mark Zuckerberg).

2.9 Web Servers and Web Browsers


Web servers and web browsers

Just a note:
Computer networks actually work with IP addresses, not domain names, so when we access a domain / web address / web server we are really accessing a computer. Our computers can communicate with other computers because they are interconnected and know each other's IP addresses.

The analogy is shown in the picture below:

Analogy of Communication Between Client and Server

*An IP version 4 (IPv4) address is 32 bits long, consisting of 4 segments (octets) of 8 bits each.
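
To make the note concrete, here is a small Python sketch (my own illustration, assuming any reachable domain name) that resolves a name to the IPv4 address behind it and splits that address into its 4 octets:

```python
# The network really uses the IP address behind the "domain" we type.
import socket

host = "www.example.com"          # any reachable domain name
ip = socket.gethostbyname(host)   # DNS lookup: name -> IPv4 address
print(host, "->", ip)

# IPv4 = 32 bits = 4 octets of 8 bits each, e.g. "93.184.216.34"
octets = ip.split(".")
print("octets:", octets, "->", [format(int(o), "08b") for o in octets])
```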

2.10 WWW (World-Wide Web)

It is a publishing system for accessing resources or services via the Internet, for example the site you are reading now.

Here are the characteristics:

  • Worldwide sharing of resources via the Internet.
  • Based on technology: HTML (HyperText Markup Language), URLs (Uniform Resource Locators), and client-server architecture.
  • An open system: it can be extended/expanded and re-applied.

3. Distributed System Challenges Triggered By:

  • Complexity
  • Size
  • Technological changes
  • Society's dependence on them

3.1 Heterogeneity (Diversity of Types and Characteristics)

  • Given the variety of software and hardware, standardization is needed so that components can communicate with each other, for example via protocols, middleware, etc.
  • Support for mobile code, for example via a virtual machine (such as the Java VM).

3.2 Openness

  • Makes it easier for independent vendors to participate.
  • Publishing key interfaces, for example CORBA.
  • Publishing communication mechanisms, for example Java RMI.

3.3 Security

  • Confidentiality: keeping data secure and protecting it against breach or theft (hacking). Take medical records: an attacker targeting a person might hack a hospital's information system to obtain the target's health history; knowing the target's allergies or weak organs lets the attacker choose the most effective weapon (even a biological one).
  • Integrity: protection against illegal changes, manipulation, or interference with data. For example, party 1 sends an email to party 2, but before it arrives it is modified by party 3 (an attacker). Another example is financial data: your account balance gains an extra zero without you ever depositing anything.
  • The need for encryption and identity validation (a small integrity sketch follows this list).
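
As a rough illustration of the integrity point (not a prescription from the module), the sketch below uses Python's standard hmac module so a receiver can detect whether a message was altered in transit; the shared key is only an example, and an HMAC by itself does not provide confidentiality.

```python
# Sketch of one common integrity mechanism: a message authentication code.
import hmac, hashlib

secret_key = b"shared-secret-between-party-1-and-party-2"   # illustrative key
message = b"Transfer Rp 1.000.000 to account 123"

tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()

# Party 2 recomputes the tag; if party 3 altered the message, the tags differ.
def verify(msg: bytes, received_tag: str) -> bool:
    expected = hmac.new(secret_key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_tag)

print(verify(message, tag))                                   # True
print(verify(b"Transfer Rp 10.000.000 to account 999", tag))  # False
```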

3.4 Scalability

Will the system remain effective as it grows in the future?

To meet this challenge, resource costs, performance, and so on must be kept under control, for example in the face of the growth in the number of computers and web servers.


Evidence that distributed systems are growing

3.5 Failure handling

The ability to continue despite facing failure factors. Here are some efforts that can be made:

  • Detect/mask/tolerate failures.
  • Recovery from failure
  • Redundancy (replicating files on different devices) / backup. Redundancy here increases availability; this is different from redundancy in the DBMS paradigm, where it is strongly avoided.

3.6 Concurrency

It is the simultaneous execution of processes that share resources. It can be managed through:

Synchronization

A mechanism that controls several processes running at the same time. It aims to avoid data inconsistency caused by access from several different processes (mutual exclusion), and to regulate the order in which those processes run, so that they proceed smoothly and avoid deadlock or starvation.

Example:

You can see how the synchronization mechanism works when you and other friends comment on a friend's status: every change to the status page is seen by everyone, you can all act at the same time, and the updated information you each receive is the same.

If the example above is expressed technically, the result will be more or less like this:

When you (the clients) comment on your friend's status at the same time, you are all accessing the Facebook server, and Facebook (the server) synchronizes to serve your requests so that every change is displayed in real time. For example, while you are typing, "typing" appears, and if several comments are submitted at the same moment, say 12:16:59, all of them are displayed, not just one; only their order is decided by a scheduling algorithm.
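
If we strip the Facebook example down to a toy program, the mutual-exclusion part of synchronization might look like the following sketch (my own illustration, using threads as stand-ins for clients updating one shared status):

```python
# Several "clients" (threads) comment on the same status at once; a lock gives
# mutual exclusion so the shared comment list stays consistent and everyone
# ends up seeing the same set of comments.
import threading

comments = []                 # shared state on the "server"
lock = threading.Lock()

def post_comment(user: str, text: str) -> None:
    with lock:                # only one thread updates the shared list at a time
        comments.append(f"{user}: {text}")

threads = [
    threading.Thread(target=post_comment, args=(f"user{i}", "nice status!"))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(comments)   # all 5 comments present, in the order the lock granted access
```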

You can see the analogy of the Synchronization Mechanism in the following video:

https://youtu.be/W1TMZASCR-I

Inter-Process Communication (IPC)

A method or mechanism for exchanging data between one process and another, whether the processes are on the same computer or on remote computers connected via a network (a minimal socket sketch follows).
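
A minimal IPC sketch, under the assumption of a free local port: two processes exchange one message over a TCP socket, and the same code would work with a remote peer.

```python
# Two processes exchange a message over a TCP socket (local or remote peer).
import socket
import time
from multiprocessing import Process

ADDRESS = ("127.0.0.1", 50007)   # illustrative host and port

def server() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(ADDRESS)
        srv.listen(1)
        conn, _ = srv.accept()           # wait for the other process
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"got: " + data)

def client() -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect(ADDRESS)
        cli.sendall(b"hello from another process")
        print(cli.recv(1024).decode())   # "got: hello from another process"

if __name__ == "__main__":
    p = Process(target=server)
    p.start()
    time.sleep(0.5)                      # crude way to let the server start first
    client()
    p.join()
```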

3.7 Transparency

The inner workings of the system are hidden from the user/programmer, so they cannot peek into the processes of the system they are using.

Example:

  • Network transparency, such as log on, email on the SoCS network.
  • When users open an online game, they are not aware of whether its resources (the enemy/opponent characters, for example) are on the local computer (the one the user is using) or on a remote machine (another computer on the network / a server). This is a form of transparency: users do not need to be told about the internal processes of the system they are using.

Summary

Distributed systems are:

  • Reaching society (individuals, businesses, companies, industries).
  • Using various technologies (e.g., e-banking, SMS banking, electronic ID cards, etc.).
  • Built on basic concepts and management issues that we must understand and implement in programming (for example, implementing business rules such as those of e-commerce).

Just a note:
Next we were given the task of naming examples of distributed systems in each student's environment. After all the papers were collected, it turned out that distributed systems are very diverse: almost every student mentioned something different. Some examples were interesting, such as:

UGM SSO (Single Sign-On): one account can be used to log in to many UGM services. It is built in collaboration with Google, so it is not surprising that it works much like a Google account, where one account gives access to Google Drive, Blogger, Email, YouTube, etc.

UNBK (Computer Based National Examination), more info  https://ubk.kemdikbud.go.id/

Reference:

Principles of Distributed System Communication

Communication is required for a variety of tasks in distributed systems. Communication primitives greatly affect the ease and efficiency of applications.

1. Addressing and Naming

A name is an identifier that can be interpreted by a user or a program. An address is an identifier that can be interpreted by a machine. Names are translated into addresses in order to perform actions on the resources they refer to.

Naming and addressing of resources/objects in distributed systems must be global, independent of object location, scalable to large systems, and translated efficiently so as not to degrade system performance.

Addressing

On the Internet (TCP/IP), an identifier consists of two parts, namely:

  • Host identifier, called the IP address, consisting of four octets (bytes).
  • Port number (a 2-byte integer), identifying a specific communication port.

How Addressing Is Implemented:

  • Hard-coded directly into the client code.
  • Via a name server, which looks up the address for a given name (a toy sketch follows this list).
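
As a toy illustration (not an actual name-server protocol), a name server can be imagined as a table mapping service names to (address, port) pairs, so client code stays free of hard-coded addresses; all entries below are made up.

```python
# A toy "name server": names map to machine-usable (IP address, port) pairs.
NAME_SERVER = {
    "file-service":  ("10.0.0.5", 6000),
    "print-service": ("10.0.0.7", 6001),
}

def lookup(service_name: str) -> tuple[str, int]:
    """Translate a human-readable name into a machine-usable address."""
    return NAME_SERVER[service_name]

host, port = lookup("file-service")
print(f"connect to {host}:{port}")
```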

2. Communication Primitives

Communication networks provide a means of sending streams of data bits. Communication primitives are high-level constructs, used by programs to communicate over a communication network. Communication primitives greatly affect the usability and efficiency of an application.

Communication primitives can be categorized into the following:

  • Two-party communication primitives
  • Group communication primitives

Two Party Communication

There are two popular primitive models of two-party communication, namely:

  • The message-passing model
  • Remote Procedure Call (RPC)

The Message-Passing Model

Communication services are reduced to two system calls, namely:

  • Send (destination, message)
  • Receive (sourceAddress, message)

Send (Destination, Message)

Sending a message to the destination process.

Receive (SourceAddress, Message)

Causes the caller to wait for a message from the source address. When the message arrives, it is copied to the process buffer.
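
A sketch of how these two primitives might be imitated with local mailboxes (queues) shared between two processes; the send/receive names mirror the calls above, but this is only an illustration, not a real kernel interface.

```python
# Each process owns a queue that plays the role of its mailbox.
from multiprocessing import Process, Queue

def send(destination: Queue, message: dict) -> None:
    destination.put(message)                 # deliver into the target mailbox

def receive(mailbox: Queue) -> dict:
    return mailbox.get()                     # block until a message arrives

def worker(mailbox: Queue, reply_to: Queue) -> None:
    msg = receive(mailbox)
    send(reply_to, {"source": "worker", "body": "ack " + msg["body"]})

if __name__ == "__main__":
    worker_box, main_box = Queue(), Queue()
    Process(target=worker, args=(worker_box, main_box)).start()
    send(worker_box, {"source": "main", "body": "hello"})
    print(receive(main_box))                 # {'source': 'worker', 'body': 'ack hello'}
```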

Problems

  • Messages can be lost on the network.
  • Authentication: ensuring the communication really is with the authorized party.
  • Performance, when message passing is applied to processes on the same machine.

Summary of alternative designs and implementations of message passing

3. Message-Passing System Design Problems

Problems in designing and implementing message-passing systems:

1. Addressing issues; the alternatives are:

Direct addressing (the sender process names the receiving process directly). The receiving process then has two alternatives:

  • Explicit addressing
  • Implicit addressing

Indirect addressing, with alternatives:

  • Static addressing
  • Dynamic addressing
  • Ownership addressing

2. Process synchronization problems

In the sending process

  • Blocking: after sending, the process is blocked, waiting for a response from the receiving process.
  • Non-blocking: after sending, the process continues executing the next instructions.

In the receiving process (receive)

  • Blocking: after issuing a receive, the process is blocked, waiting for a message from the desired process.
  • Non-blocking: after issuing a receive, the process continues executing the next instructions.
  • The problem at the receiver is how to implement the check for message arrival (see the sketch after this list).
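
The blocking/non-blocking distinction at the receiver can be illustrated with a local queue (my own sketch, not part of the original material):

```python
# Blocking vs non-blocking receive on a local message queue.
import queue

mailbox = queue.Queue()

# Non-blocking receive: return immediately, whether or not a message is there.
try:
    msg = mailbox.get(block=False)
except queue.Empty:
    print("non-blocking receive: no message yet, keep executing")

mailbox.put("hello")

# Blocking receive: wait (here at most 1 second) until a message arrives.
msg = mailbox.get(block=True, timeout=1.0)
print("blocking receive got:", msg)
```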

3. Buffering issues

  • Buffered primitives
  • Unbuffered primitives

4. Reliability issues

  • Reliable primitives
  • Unreliable primitives

5. Message format issues

Message content

Message length

  • Fixed-length messages
  • Variable-length messages

6. Queueing discipline issues

  • FIFO
  • Priority

Message passing has many advantages, including that it can be implemented transparently on both shared-memory uniprocessor systems and distributed systems.

Remote Procedure Calls

The purpose of RPC is to call remote procedures (executed on another machine) just like calling local procedures. The programmer writes the program as usual without caring whether the procedure will be executed on the local processor or on the remote processor.

RPC allows a process on one system to call a procedure in another process on another machine. The calling and called procedures can reside on separate machines with separate memory address spaces. Since there is no shared global memory as in a single process, information is transferred via the parameters of the RPC call.

RPC is usually implemented on top of a message-passing system. A call to a procedure on another machine is translated into message passing (a minimal RPC sketch follows this list):

  • Send (RemoteProcess, InputParameters)
  • Receive (RemoteProcess, OutputParameters)
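
A minimal RPC sketch using Python's standard xmlrpc module (my own illustration; the host and port are arbitrary): the client calls add(2, 3) as if it were a local procedure, while the library turns the call into request/reply messages behind the scenes.

```python
# RPC sketch: a remote add() that looks like a local call on the client side.
from multiprocessing import Process
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client
import time

def run_server() -> None:
    server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")   # the remote procedure
    server.serve_forever()

if __name__ == "__main__":
    Process(target=run_server, daemon=True).start()
    time.sleep(0.5)                                        # give the server time to start
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
    print(proxy.add(2, 3))                                 # looks like a local call -> 5
```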

RPC Mechanism


Figure 1 RPC Mechanism Schematic

In the picture, a remote call consists of 10 steps:

Step 1

The client program (procedure) calls a client stub linked into the program's address space. Parameters are passed as in a local call. The client stub packs the parameters into a message; this is called parameter marshalling.
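
Parameter marshalling can be pictured with a toy stub pair (an illustration only; real RPC systems use their own wire formats, not necessarily JSON):

```python
# The procedure name and parameters are packed into a flat byte string that
# can travel inside a message, and unpacked again on the server side.
import json

def marshal(procedure: str, *args) -> bytes:
    return json.dumps({"proc": procedure, "args": list(args)}).encode("utf-8")

def unmarshal(message: bytes) -> tuple[str, list]:
    call = json.loads(message.decode("utf-8"))
    return call["proc"], call["args"]

msg = marshal("add", 2, 3)          # what the client stub sends (steps 1-2)
print(msg)                          # b'{"proc": "add", "args": [2, 3]}'
print(unmarshal(msg))               # what the server stub recovers (step 4)
```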

Step 2

The message is passed to the transport layer for transmission. On many systems, step 2 is a trap to the operating system.

Step 3

On a connectionless LAN, the transport entity simply attaches a header to the message and places it on the network. On a WAN, the transport layer is more complicated.

Step 4

When the message arrives at the server, the transport entity passes it to the server stub, which unmarshals the message back into parameters.

Step 5

The server stub calls the server procedure, passing parameters in the standard way. The server procedure is unaware that it is being invoked remotely because the caller obeys all the standard rules of local procedures. Only the stub knows.

Step 6

After completing its work, the server procedure completes just as any other procedure completes. The server can send results to the caller.

Step 7

The stub server marshals the results as messages and releases them to the transport layer with a system call like step 2.

Step 8

The answer arrives at the client machine

Step 9

The reply is handled by the client stub.

Step 10

The client stub returns to its caller, the client procedure, and the value sent by the server in step 6 is returned to the client.

The purpose of the RPC mechanism is to give the client procedure the illusion that it is making a local call rather than a call to a remote server; the client does not know that a remote server is serving it. RPC hides all network communication inside the stub procedures. This frees client and server application programs from worrying about sockets and other network details, making it easier to write distributed applications.

Sources of difficulty

  • Since the calling and called procedures are on different machines, they execute in different address spaces.
  • Parameters and results must be passed over the network, which is more complicated if the machines are not identical.
  • The machines on which the calling and called procedures run can crash, and these failures can cause complex problems.

RPC Discussion Points

  1. parameter passing
  2. binding
  3. transport protocol selection
  4. error handling (exception handling)
  5. semantics of calling
  6. data representation
  7. orphan handling
  8. performance
  9. security
  10. implementation specific issues

Parameter Passing

  • Call-by-value can be implemented easily, by sending a copy of the data in the message.
  • Call-by-reference is more troublesome; many solutions to the call-by-reference problem have been proposed.

Binding

Binding establishes the communication target (the server that executes the procedure). There are two binding components:

  • Finding the remote host that acts as the target server
  • Finding the matching server process on that host

One way is to provide a central database. Clients contact a central authority to find a particular service. Another way is for clients to know which host to contact.

Transport Protocol Selection

Transport protocols can vary, including:

  • Sun RPC uses UDP and TCP, connectionless or connection-oriented.
  • Xerox Courier RPC uses SPP, connection-oriented.
  • Apollo RPC uses UDP and DDS, connectionless.

When connectionless is used, the client stub must handle lost packets and so on. Connection-oriented protocols handle cases of lost, duplicated packets and so on, with higher overhead than connectionless.

Error Handling (Exception Handling)

Common sources of errors are:

  • The client cannot find the server's location.
  • The request message from the client to the server is lost.
  • The reply message from the server to the client is lost.
  • The server crashes after receiving the request.
  • The client crashes after sending the request message.

Semantics of Calling

Alternative call semantics in the presence of failures:

  • At least once: even across server crashes and retries, the call is guaranteed to have been executed one or more times.
  • At most once: no call is ever executed more than once (see the sketch after this list).
  • Maybe: the system guarantees nothing; this is the easiest to implement.
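
One common way to obtain at-most-once behaviour is to tag each request with an ID and have the server cache replies, so retransmitted duplicates are answered from the cache instead of being re-executed. The sketch below is only an illustration of that idea, not the implementation of any particular RPC system.

```python
# At-most-once on the server side: duplicates are detected by request ID.
executed: dict[str, int] = {}            # request_id -> cached result

def handle_request(request_id: str, a: int, b: int) -> int:
    if request_id in executed:           # duplicate (e.g. the client retried)
        return executed[request_id]
    result = a + b                       # the actual procedure, run at most once
    executed[request_id] = result
    return result

print(handle_request("req-001", 2, 3))   # executes: 5
print(handle_request("req-001", 2, 3))   # duplicate: replayed from cache, still 5
```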

Data Representation

Different machines use different internal data representations, so conversion between them is required.

Orphan Handling

A client crash after it has started an RPC can cause problems: once the server procedure is started, it keeps running even if the calling client has crashed. A server computation that is running with no client waiting for its result is called an orphan.

Difficulties caused by orphans:

  • Wasting CPU cycles and other resources.
  • Orphans may lock files and other objects so that the server denies access to them by other processes.
  • If the client reboots and runs the RPC again, results sent back by the orphan can cause confusion.

Handling orphan problems:

  • Extermination
  • Expiration
  • Reincarnation
  • Gentle reincarnation.

Extermination

When the client machine recovers from a crash, it checks whether any of its RPCs are still running; if so, it requests that the corresponding running processes be killed as needed. For this, the client stub must log each RPC before execution; when an RPC completes, its log entry is erased. The log must survive a crash (e.g., be kept on disk).

Orphans can create grand-orphans (and even great-grand-orphans), so the extermination algorithm must run recursively, killing the entire chain of descendants.

Expiration

No log is required. When an RPC is started, the server is given a certain amount of time (a quantum) to complete the call. If the call does not complete within that interval, the server stub asks the client stub for a new quantum. If the client machine has crashed or rebooted, this is detected, the quantum is not renewed, and the server abandons the orphaned computation.

Reincarnation

Extermination can fail to eliminate orphans; for example, if the network is partitioned while extermination is performed, some orphans remain unreachable. It would be a big mistake, however, to simply kill all remote activity on the network.

In the reincarnation method, time is divided into sequentially numbered epochs. When a client fails to exterminate its orphans, it broadcasts the start of a new epoch to all machines, and the machines react by killing all server processes.

Gentle Reincarnation

Similar to the third approach (reincarnation), but less drastic.

This method also uses epochs, but when a new epoch is declared each machine does not kill all remote activity; instead it tries to find the client that started each server computation. Only if that client cannot be found is the corresponding server process killed.

Performance

In general, the performance of RPC is a factor of 10 or more worse than that of a local procedure call.

Security

RPC introduces security issues such as remote command execution. Allowing a program to call a procedure on a remote server is similar to executing a command on that system.

Implementation Specific Issues

  • ACK implementation
  • Critical path mitigation
  • Copying
  • Timer management

3. Group Communication

Message passing and RPC are two-party (client-server) communication. In distributed systems, communication can involve many processes at once, not just two. For example, a group of file servers may jointly provide a file service in order to offer fault tolerance; in such a fault-tolerant system, the client sends its message to all the servers to ensure that the service is performed by at least one of them. Distributed systems therefore need a group communication mechanism.

RPC cannot handle one-sender-to-many-receivers communication except by issuing a separate RPC to each server.

A group is a collection of processes acting together. The main property of group communication is that when a message is sent to a group, all members of the group receive it. One-to-many communication (one sender, many receivers) is different from point-to-point communication (one sender, one receiver).

The main purpose of groups is to allow processes to deal with a collection of processes as a single abstraction. Processes can send messages to a group of servers without knowing the number of servers or their locations, which can change at any time.

Group Management

Groups are dynamic:

  • New groups can be created.
  • Old groups can be deleted.
  • Processes can join a group or leave a group.
  • A process can be a member of several groups at the same time.

For this, group management and group membership are required.

Group management issues include:

  • Basic operations of group management.
  • Closed or open group selection
  • Selection of groups at the same level or hierarchy
  • Handling group membership
  • Handling group addressing issues.
  • Handling the problem of overlapping groups.
  • Handling scalability issues.

Basic Operations of Group Management:

  • New group creation operation
  • Group delete operation
  • Join operation to a group
  • Operation leaving a group

Closed or open group selection

A closed group means that only members of the group can send messages to the group. Processes outside the group cannot send messages directly to the group as a whole, although they can send messages to individual group members.

Closed groups are typically used for parallel processing where the group has its own goal and does not interact with external processes.

An open group means that any process can send to a group.

Used when a process outside the group sends to the group.

Selection of Groups of the Same Level or Hierarchy

A peer (same-level) group means that the processes in the group are considered equals. In a hierarchical group, processes have ranks: for example, one process is the coordinator and the others are workers, and the coordinator decides which worker handles each request.

A peer group is symmetric and has no single point of failure, but its decision making is more complex and carries more processing overhead.

In a hierarchical group, failure of the coordinator brings the whole group down.

Handling group membership

The simplest solution is to create a single group server for group management. This method has the disadvantage of centralization, namely a single point of failure.

The opposite approach is distributed: in this case, a joining client sends a message to all group members to announce its presence.

Two problems:

  • group member crashes
  • if many machines are down and the group is no longer able to operate at all.

Group member crashes

The solution differs depending on the type of system:

  • Asynchronous
  • Synchronous

Asynchronous

Another process that notices the crash notifies the system. Once the crash is confirmed, the crashed process is removed from the group membership list.

Synchronous

Leaving and joining a group are done synchronously.

If Many Machines Go Down and the Group Is No Longer Able to Operate at All.

A protocol is needed for group rebuilding.

Group Addressing Problem Handling

If the network supports multicasting, the group address is associated with a multicast address, so any message sent to the group address can be multicast and reaches only the machines that need it (see the sketch after this list).

  • If the network supports only broadcasting, the message can be broadcast.
  • If the network supports neither multicasting nor broadcasting, the kernel sends the message to each machine on the target group's membership list, one by one.
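
A hedged sketch of the multicast case (the group address and port are arbitrary examples): one send to the group address reaches every member that has joined the group.

```python
# Group addressing with IP multicast, using the standard socket recipe.
import socket
import struct

GROUP, PORT = "224.1.1.1", 5007          # illustrative multicast address and port

def send_to_group(message: bytes) -> None:
    """Sender: a single send() reaches the whole group."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    s.sendto(message, (GROUP, PORT))

def join_group() -> socket.socket:
    """Receiver: join the multicast group, then recvfrom() on the socket."""
    r = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    r.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    r.bind(("", PORT))
    membership = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    r.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    return r

# Example use (in separate member processes):
#   member = join_group(); print(member.recvfrom(1024))
#   send_to_group(b"hello group")
```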

Handling the Problem of Overlapping Groups.

Systems that allow overlapping group membership can result in inconsistencies if a message addressed to multiple groups is simultaneously received by processes belonging to multiple groups.

Handling Scalability Issues.

A good algorithm must work for both small and large groups, so that the mechanism does not need to be replaced wholesale as the system grows.

Types of Broadcast

Group communication is easiest to implement with broadcast. A broadcast is a message that is sent directly to, and delivered at, all active components of the system.

There are various types of broadcasts, namely:

  • Reliable broadcast
  • FIFO broadcast
  • Causal broadcast
  • Atomic broadcast
  • FIFO atomic broadcast
  • Causal atomic broadcast

Reliable broadcast

Reliable broadcast has three properties:

  • If a correct process broadcasts a message m, then all correct processes eventually deliver m (Validity).
  • If a correct process delivers a message m, then all correct processes deliver m (Agreement).
  • For any message m, each correct process delivers m at most once, and only if m was previously broadcast by its sender (Integrity).

Reliable broadcast is the weakest type of broadcast and may be sufficient for some applications, but it does not guarantee the order in which messages are delivered. In some applications the order of messages matters; ordering per sender is guaranteed by FIFO broadcast.

FIFO Broadcast

FIFO broadcast ensures that messages broadcast by the same sender are delivered in the order in which they were sent.

FIFO order

If a process broadcasts a message m before it broadcasts a message m', then no correct (non-faulty) process delivers m' unless it has previously delivered m.
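
A small sketch of how a receiver can enforce FIFO order, assuming each sender numbers its messages (my own illustration, not an algorithm from the text): messages that arrive ahead of the next expected number are held back until the gap is filled.

```python
# FIFO delivery per sender, using sequence numbers and a hold-back buffer.
from collections import defaultdict

next_expected = defaultdict(lambda: 1)    # sender -> next sequence number to deliver
held_back = defaultdict(dict)             # sender -> {seq: message}

def fifo_receive(sender: str, seq: int, message: str) -> list[str]:
    """Return the messages that may now be delivered, in FIFO order."""
    held_back[sender][seq] = message
    delivered = []
    while next_expected[sender] in held_back[sender]:
        delivered.append(held_back[sender].pop(next_expected[sender]))
        next_expected[sender] += 1
    return delivered

print(fifo_receive("p1", 2, "m2"))   # [] -- m2 arrived early, held back
print(fifo_receive("p1", 1, "m1"))   # ['m1', 'm2'] -- delivered in send order
```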

Causal Broadcast

Causal broadcast is stronger than FIFO broadcast: messages are delivered respecting the causal precedence relation, an important concept in distributed systems. If a process delivers a message m and then broadcasts a message m' (so that m' may depend on the earlier delivery of m), causal broadcast guarantees that every process delivers m before it delivers m'. Causally unrelated messages may still be delivered in different orders at different processes.

Causal order

If the broadcast of a message m causally precedes the broadcast of a message m', then no correct process delivers m' unless it has previously delivered m.

Atomic Broadcast

Atomic broadcast delivers all messages in a total order.

Total order

If correct processes p and q both deliver messages m and m', then p delivers m before m' if and only if q delivers m before m'.

Fifo Atomic Broadcast

FIFO atomic broadcast is a FIFO broadcast whose messages are also totally ordered.

Causal Atomic Broadcast

Causal atomic broadcast is a causal broadcast whose messages are completely ordered.

Broadcast Types Connectivity


Broadcast Types Relationship Schema

So that:

  • Reliable broadcast = validity + agreement + integrity
  • FIFO broadcast = reliable broadcast + FIFO order
  • Causal broadcast = reliable broadcast + causal order
  • Atomic broadcast = reliable broadcast + total order
  • FIFO atomic broadcast = reliable broadcast + FIFO order + total order
  • Causal atomic broadcast = reliable broadcast + causal order + total order

Causal atomic broadcast satisfies the causal order and total order requirements, stronger than FIFO atomic broadcast. Causal atomic broadcast is the basic mechanism of the state-machine approach to fault-tolerant systems.

Timed Broadcast

Many applications require the following property: when a message is delivered at all, it is delivered within a bounded (finite) time after it is broadcast. This property is called Δ-timeliness.

In distributed systems, the time spent can be interpreted in two different ways, namely:

  • real time measured by an external observer
  • local time is measured by the local clock of the processes

(Real-time) Δ-Timeliness

There is a known constant Δ such that if the broadcast of a message m is initiated at real time t, then no correct process delivers m after real time t + Δ.

(Local-time) Δ-Timeliness

There is a known constant Δ such that if a message m is broadcast with local timestamp ts(m), then no correct process p delivers m after local time ts(m) + Δ on p's clock.

In asynchronous systems, there is no reliable broadcast algorithm that satisfies real-time Δ - timeliness or local-time Δ timeliness. Timed broadcast cannot be implemented.

In a synchronous system, reliable broadcast can be implemented that satisfies local-time timeliness. No reliable broadcast can satisfy real-time Δ timeliness in a system that has timing failure.

4. Uniform Broadcast

Uniform agreement

If any process (correct or faulty) delivers a message m, then all correct processes deliver m.

Uniform Integrity

For any message m, each process (correct or faulty) delivers m at most once, and only if some process broadcast m.

Uniform Local-Time Δ-Timeliness

There is a known constant Δ such that no process p (correct or faulty) delivers a message m after local time ts(m) + Δ on p's clock.

Uniform FIFO Order

If a process broadcasts a message m before it broadcasts a message m', then no process (correct or faulty) delivers m' unless it has previously delivered m.

Uniform Causal Order

If the broadcast of a message m causally precedes the broadcast of a message m', then no process (correct or faulty) delivers m' unless it has previously delivered m.

Uniform Total Order

If any processes p and q (correct or faulty) both deliver messages m and m', then p delivers m before m' if and only if q delivers m before m'.

Uniform timed reliable broadcast satisfies the properties of validity, uniform agreement, uniform integrity and uniform Δ -timeliness. This type is used to solve the non-blocking atomic commitment problem.

Terminating Reliable Broadcast (TRB)

TRB is a type of broadcast in which correct processes always deliver some message, even if the sender is faulty and crashes before broadcasting. The delivered message may be a special message SF ∉ M that marks the sender as faulty. The set of messages that can be delivered is M ∪ {SF}.

The TRB properties are similar to those of reliable broadcast, with the addition of a termination property:

  1. Every correct process eventually delivers exactly one message (Termination).
  2. If a correct process broadcasts a message m, then all correct processes eventually deliver m (Validity).
  3. If a correct process delivers a message m, then all correct processes deliver m (Agreement).
  4. For any message m, each correct process delivers m at most once, and if it delivers m ≠ SF, then m was broadcast by the sender (Integrity).

TRB has been studied extensively in the case of arbitrary failure in the Byzantine Agreement Problem or Byzantine Generals Problem.

In other words, in TRB a correct process delivers the SF message only if the sender has failed.

Consensus

In TRB, a single process is assumed to broadcast a message and all correct processes must agree on that message. In the consensus problem, every correct process proposes a value, and all correct processes must agree on a value related to the proposed values.

The primitives of the consensus problem are:

  • Propose (v)
  • Decide (v)

Where v is a value.

Propose (v)

When a process executes propose(v), it proposes v.

Decide (v)

When a process executes decide(v), it decides on v.

V is the set of all possible proposed values. The set of all possible decided values is V ∪ {NU}, where NU ∉ V. NU is a special value indicating that not all processes proposed the same value; NU stands for "no unanimity".
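
To make the propose/decide interface concrete, here is a deliberately naive, failure-free sketch (my own illustration, not an actual consensus algorithm): all proposals are collected in a single round, and every process decides the common value if the proposals are unanimous, or NU otherwise.

```python
# Toy illustration of propose/decide and the special value NU ("no unanimity").
NU = object()                       # special value, not a member of V

def run_consensus(proposals: list[str]):
    """One failure-free round: decide the unanimous value, or NU otherwise."""
    decided = proposals[0] if len(set(proposals)) == 1 else NU
    return [decided for _ in proposals]   # every process decides the same value

print(run_consensus(["v1", "v1", "v1"]))  # all propose v1 -> all decide v1
print(run_consensus(["v1", "v2", "v1"]))  # not unanimous -> all decide NU
```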

Consensus problem specification:

  1. Every correct process eventually decides on a value (Termination).
  2. If all processes that propose a value propose the same value v, then all correct processes decide v (Validity).
  3. If a correct process decides the value v, then all correct processes decide v (Agreement).
  4. Each correct process decides at most one value, and if a correct process decides v ≠ NU, then v must have been proposed by some process (Integrity).

Stronger properties

Strong validity

If all correct processes propose the same value v, then all correct processes decide v.

Strong integrity

Each correct process decides at most one value, and if a correct process decides the value v, then v must have been proposed by some process.

With these stronger properties, a correct process never decides NU.

Impossibility of consensus in asynchronous systems: no deterministic algorithm can solve the consensus problem in an asynchronous system while tolerating even a single crash failure.

Implementation of Group Communication

Primitive implementation of group communication:

  • Multicasting
  • Broadcasting
  • Unicasting

If the network does not provide a multicasting or broadcasting mechanism, group communication is implemented by sending a separate packet to each member. Although inefficient, this is still workable, especially for small groups. Sending from one sender to one receiver is called unicasting.

Options in designing group communication:

  • Selection of blocking or non-blocking primitives.
  • Selection of buffered and unbuffered primitives.

Understanding Distributed System Architecture

Almost all large-scale computer-based systems are now distributed systems. A distributed system is one in which information processing is distributed across multiple computers rather than confined to a single machine. Distributed systems engineering obviously has much in common with software engineering in general, but there are specific issues that must be taken into account when designing this type of system, and software engineers must be aware of them because distributed systems are so widely used. Until recently, most large systems were centralized systems running on a mainframe with terminals connected to it. Such systems had the disadvantage that the terminals had little processing power and depended entirely on the central computer.

The main classes of system today are:

  1. A Personal System that is not distributed and is designed for a single workstation.
  2. Embedded systems that run on a single processor or on an integrated group of processors.
  3. Distributed Systems where system software runs on a loosely integrated group of cooperating processors, connected by a network. Examples include bank ATM systems, groupware systems, etc.

According to Coulouris et al. (1994), six important characteristics of distributed systems can be identified:

  1. Sharing of resources
  2. Openness. The system is open to hardware and software from many vendors and can run on many operating systems.
  3. Concurrency. Distributed systems allow multiple processes to operate at the same time on multiple networked computers. These processes can (but do not need to) communicate with each other during their normal operations.
  4. Scalability. Distributed systems can be scaled by upgrading or adding new resources to meet the needs of the system.
  5. Fault tolerance. Distributed systems are tolerant to some hardware and software failures and degraded services can be provided when failures occur.
  6. Transparency. The distributed nature of the system is hidden from users, who perceive it as a single integrated system.

Apart from these things, there are also weaknesses of distributed systems, namely:

  1. Complexity. Distributed systems are more complex than centralized systems.
  2. Security. Distributed systems can be accessed from multiple computers and network paths are easily eavesdropped, making distributed system network security a major issue.
  3. Controllability. Computers in a distributed system can be of different types and may run on different operating systems. A failure on one machine can affect the others, so it takes a lot of effort to control them.
  4. Unpredictable. Distributed systems are unpredictable in their response. The response depends on the total system load, its organization, and its network load.

There are 2 generic types of distributed system architecture, namely:

  1. Client Server Architecture. The system is considered as a set of services provided to clients. Servers and clients are treated differently.
  2. Distributed Object Architecture. There is no distinction between server and client; the system is a set of interacting objects whose location is irrelevant, and there is no distinction between service providers and service users.
