This transparency is like a centralized DBMS. In a distributed environment, a system does not have to experience degradation during the implementation of a distributed architecture, for example the emergence of a network. This transparency requires the DBMS to create a strategy to save the cost incurred to perform a request.
In a centralized DBMS, the query processor (QP) must evaluate each data request and execute an optimal strategy, which consists of a sequence of operations ordered on the database.
In a distributed environment the distributed query processor (DQP) maps a data request into a sequence of ordered operations on the local database. This has the added complexity of accessing it in terms of fragmentation, replication and allocation schemes. The DQP must decide:
- Which fragment to access
- Which copy of the fragment to use if the fragment is to be replicated
- Which location to use.
DQP creates an optimal execution strategy by running several cost functions. In general, the costs associated with a distributed request include:
- Access time (I/O) costs involve accessing physical data on disk.
- CPU time cost when executing data operations in main memory
- Communication costs with data transmission over the network.
The first factor is the only one considered in a centralized system. In a distributed environment, the DDBMS must account for communication costs, which are most significant in WANs with a bandwidth of the order of small kilobytes per second. In such cases, optimization may ignore I/O and CPU costs. However, LANs have bandwidths that make it impossible to ignore I/O and CPU costs entirely.
One approach to query optimization minimizes the total cost of the time that will be incurred in executing the query (Sacco and Yao, 1982). As an alternative approach, this can minimize the query response time, in the case of DQP. Sometimes the response time will be significantly less than the total time cost.
Understanding Transaction Transparency
This transparency in the DDBMS environment ensures that all distributed transactions maintain the consistency and integrity of their distributed database. Distributed transactions access data stored in more than one place. Each transaction is divided into several subtransactions, one to access the site that must be accessed; a subtransaction is represented by an agent.
DDBMS Transaction Transparency
Example:
There is a transaction T that prints the names of all staff, using the fragmentation scheme defined as S1, S2, S22, and S23. Subtransactions can be defined as TS3, TS5, and TS7 to represent agents located at locations 3, 5, and 7. Each subtransaction prints the names of the staff at each location.
| Time | TS3 | TS5 | TS7 |
|------|------------------------|------------------------|------------------------|
| t1 | Begin transaction | Begin transaction | Begin transaction |
| t2 | Read(fname,lname) | Read(fname,lname) | Read(fname,lname) |
| t3 | Print (fname,lname) | Print (fname,lname) | Print (fname,lname) |
| t4 | End_transaction | End_transaction | End_transaction |
The unity of distributed transactions is the basis of the transaction concept, but the DDBMS must also ensure the unity of each subtransaction. Therefore, it must not only ensure the synchronization of subtransactions with other local transactions that are operated simultaneously at a location. But also ensure the synchronization of subtransactions with global transactions that run simultaneously at the same location or at different locations. Transaction transparency in a distributed DBMS is complemented by fragmentation charts, distribution charts and replication charts.
1. Concurrency Transparency
Concurrency transparency is owned by DDBMS if the results of all concurrent transactions (distributed or not distributed) are executed independently or at one time and guarantee the resulting data is consistent and updated correctly, this is in accordance with the basic principles of a centralized database but there is an addition because the form of DDBMS must guarantee that local or global transactions do not conflict with each other. In the same way, DDBMS must ensure the consistency of all global subtransactions.
Replication makes concurrency more complex. When a copy of a replicated data is updated, the latest update must be propagated as quickly as possible to all copies. The strategy is to propagate each data change into a single operational data unit of a transaction. However, if one of the sites holding a copy of the data cannot be reached while the update is being performed, because the site or its communication link is failing, then the transaction is delayed until the site can be reached. If there are many copies of a data item, it is likely that concurrent transactions will fail.
Another alternative to limit this is to update data only for the site that currently exists. The next strategy allows updates to copies of the data that are not done simultaneously, sometimes after the original database has been updated. The delay to regain consistency of the data can be anywhere from a few seconds to several hours.
2. Failure Transparency
Centralized DBMS has the ability to recover data used in the event of a transaction failure. The types of failures that centralized DBMS has are: system crash, media error, software error, natural disaster and sabotage. DDBMS also has the following types of failures:
- Data loss
- Communication failure
- Failure on site
- Network partition.
DDBMS must ensure the unity of the global transaction, meaning that all subtransactions in the global transaction are either successful or aborted. Therefore, DDBMS must equate global transactions to ensure that all subtransactions have succeeded before they are recorded as SUCCESSFUL / COMMIT.
3. Transaction Classification
Before completing the explanation of transactions, a brief explanation will be given regarding the classification of transactions that have been defined in IBM's distributed relational database architecture (DRDA). In this architecture there are four types of transactions, each level has an additional complexity in the interaction with the DBMS.
a. Remote access request
An application at one location can send requests (SQL commands) to multiple remote locations to execute the data submission. The request is executed as a whole at that location and can become reference data at those remote locations.
b. Remote Unit of Work
An application at one location can send all SQL commands in a work unit (transaction) to several remote locations for execution. All SQL commands are executed entirely at the remote location and only become reference data at that location. However, the local site decides which transactions to commit and which to rollback.
c. One distribution work
An application at one location can send part or all of a request (SQL command) in a transaction to one or more remote locations to execute the data delivery. The request is executed in its entirety at those locations and can become reference data at those remote locations.
d. Distributed Requests
An application at one location can send part or all of a request (SQL command) in a transaction to one or more remote locations to execute the data delivery. However, SQL commands require access to data from one or more locations (SQL commands need to be able to join or union a relation/fragment that is in different locations).
Understanding DBMS Distribution Transparency
Distributed transparency allows users to see the database as a single logical entity. If a DDBMS exhibits distributed transparency, users do not need to know about the fragmentation of their data or the location where the data is stored.
Database Management System Distribution Transparency
There is a transparency that allows users to know whether data has been fragmented and stored somewhere in a location, the name of this transparency is: Transparency Local Mapping.
Example:
- S1 = ∏ staffno, position, sex, DOB, salary (STAFF) placed at site 3
- S2 = ∏ staffno, fname, lname, branchno, sex, DOB, salary (STAFF)
- S21 = σ Bno = B3 (S2) is placed at site 3
- S22 = σ Bno = B5 (S) is placed at site 5
- S23 = σ Bno = B7 (S) is placed at site 7
1. Fragmentation Transparency
Fragmentation is the highest level of distributed transparency provided by a DDBMS, so that users do not need to know about the fragmented data. Database access is based on its global schema, so users do not need to specify fragment names or data locations.
Example:
Select fname, lname From Staff
Where position = 'Manager';
This is the SQL statement that must be written on a centralized system.
2. Location Transparency
Location transparency in the transparency distribution is at the medium level. With this transparency, the user knows the data is fragmented without having to know where the data is located.
Example:
SELECT fname, lname FROM S21
WHERE Staffno IN (SELECT Staffno FROM S1 WHERE Position='Manager')
UNION
SELECT fname, lname FROM S22
WHERE Staffno IN (SELECT Staffno FROM S1 WHERE Position='Manager')
UNION
SELECT fname, lname FROM S23
WHERE Staffno IN (SELECT Staffno FROM S1 WHERE Position='Manager');
Now it is necessary to name the fragment in the query. Also used join (subquery) because the position and fname or lname appear in several different vertical fragmentation. The main advantage of location transparency is that the database can be physically organized without having to affect the applications that access the database.
3. Replication Transparency
Same as location transparency is transparency to duplicate a data, meaning the user does not know the data has been fragmented. This transparency is a result of location transparency. However, it is possible to not have location transparency but have replication transparency.
4. Local Mapping Transparency
This is the lowest level of transparency distribution. With this transparency, the user needs to specify the fragment name and location of the data items.
Example:
SELECT fname,lname FROM S21 AT SITE 3
WHERE Staffno IN (SELECT Staffno FROM S1 AT SITE 5 WHERE Position='Manager')
UNION
SELECT fname,lname FROM S22 AT SITE 5
WHERE Staffno IN (SELECT Staffno FROM S1 AT SITE 5 WHERE Position='Manager')
UNION
SELECT fname,lname FROM S23 AT SITE 7
WHERE Staffno IN (SELECT Staffno FROM S1 AT SITE 5 WHERE Position='Manager');
5. Naming Transparency
Each item in a distributed database has a unique name. Therefore, the DDBMS ensures that no two sites create database objects with the same name. One solution to this problem is to create a centralized name server, which contains all the names of the system so that if there are any duplicates, they can be detected.
However, this problem has obstacles, namely:
- Lack of local autonomy capabilities
- Performance issues, if centralized then a bottleneck will occur
- Low availability, if the central site fails, other sites cannot create other database objects.
There is an alternative solution, namely by using the 'prefix' of an object as the location identifier that created the object. For example, the Branch relation is created at site S1 so that the object can be named S1.Branch.
However, if you want to identify each fragment and each copy of the fragment, you can create S1.Branch.F3.C2, which contains 2 copies of fragment 3 in the Branch relation created at site S1. However, this will result in data loss in distributed transparency.
Another approach is to use aliases (synonyms) for each database object. For example, S1.Branch.F3.C2 is known as the Localbranch used by users at site S1. The DDBMS has the task of mapping the aliases to the appropriate database objects.
The distributed R* system distinguishes between its printname objects and its system wide-names. A printname is the name that a user uses to refer to an object. A system wide-name is a unique internal identifier for an object that is guaranteed never to be changed. A system wide-name consists of four parts:
- Creator ID - A unique identifier location for the user who created the object.
- Creator site ID - a unique global identifier for the site where the object was created.
- Local name - a name that does not meet the requirements for an object.
- Birth-site ID - a unique identifier for the site where the object is stored.
For example, the system wide-name: Manager@London.localbranch @glasgow Represents an object with the local name localBranch, created by the Manager user in London and stored at the site in Glasgow.