There are four components in the DDBMS architecture, namely:
- Local DBMS (LDBMS) components
- Data Communication (DC) components
- Global System Catalog (GCS)
- Distributed DBMS (DDBMS) components.
1. Local DBMS Components
The LDBMS component is a standard DBMS, responsible for controlling the local data at each site that holds part of the database. Each such site therefore has its own local system catalog containing information about the data stored there. In a homogeneous system the LDBMS component is the same DBMS product, replicated at every site; in a heterogeneous system at least two sites run different DBMS products, possibly based on different data models.
2. Data Communication Components
This component consists of the software and hardware that allow the sites to communicate with one another. It contains information about the sites and the network links between them.
3. Global System Catalog (GCS)
The GCS has the same functionality as the system catalog of a centralized DBMS. It holds information specific to the distributed nature of the system, such as the fragmentation, replication, and allocation of the data. The catalog can itself be managed like a distributed database, so it may be fragmented and distributed, fully replicated, or centralized. A fully replicated GCS compromises site autonomy, because every modification to the catalog must be notified to all connected sites. A centralized GCS also compromises site autonomy and makes the whole system very sensitive to a failure at the central site.
An alternative approach is used in the distributed system R* (Williams et al., 1982). In this system there is a local catalog at each site containing the metadata for the data stored at that site. For a relation that is fragmented across several sites, the local catalog is responsible for recording the definition of each fragment and of each replica of each fragment, and for recording where each fragment or replica is allocated. Whenever a fragment or replica is moved to a different site, the local catalog must be updated accordingly so that the fragment or replica can still be located.
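As a rough illustration only (the table and column names below are hypothetical and are not the actual R* catalog layout), a local catalog of this kind could record fragment definitions and replica allocations with relations along these lines:

    -- Hypothetical sketch of local catalog metadata, not the real R* schema.
    CREATE TABLE fragment_def (
        fragment_id    VARCHAR(30) PRIMARY KEY,
        relation_name  VARCHAR(30) NOT NULL,                      -- global relation the fragment belongs to
        frag_type      CHAR(1) CHECK (frag_type IN ('H', 'V')),   -- horizontal or vertical
        frag_predicate VARCHAR(200)                                -- e.g. the selection predicate of a horizontal fragment
    );

    CREATE TABLE replica_allocation (
        fragment_id VARCHAR(30) NOT NULL REFERENCES fragment_def (fragment_id),
        site_id     VARCHAR(30) NOT NULL,                          -- site holding this copy of the fragment
        PRIMARY KEY (fragment_id, site_id)
    );

Whenever a fragment or replica moves, the corresponding replica_allocation rows would have to be updated.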
4. Distributed DBMS Components
The DDBMS component is the controlling unit of the entire system.
Distributed Relational Database Design
Distributed relational database design must consider the following factors:
- Fragmentation: a relation is divided into sub-relations called fragments, which are then distributed. There are two main kinds of fragmentation, horizontal and vertical: a horizontal fragment is a subset of the tuples (rows) of a relation, while a vertical fragment is a subset of its attributes (columns). Both are illustrated in the sketch after this list.
- Allocation: each fragment is stored at the site that gives an optimal distribution.
- Replication: the DDBMS may maintain a copy of a fragment at several different sites.
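As an example, using a hypothetical Staff relation (the table and column names below are illustrative only, not taken from any particular system), horizontal and vertical fragments can be expressed in ordinary SQL along these lines:

    -- Hypothetical base relation.
    CREATE TABLE staff (
        staff_no  VARCHAR(10) PRIMARY KEY,
        name      VARCHAR(50),
        salary    NUMERIC(10, 2),
        branch_no VARCHAR(10)
    );

    -- Horizontal fragment: a subset of the tuples (rows), selected by a predicate.
    CREATE TABLE staff_branch_b3 AS
        SELECT * FROM staff WHERE branch_no = 'B003';

    -- Vertical fragment: a subset of the attributes (columns); the primary key is kept
    -- so the original relation can be rebuilt by joining the vertical fragments.
    CREATE TABLE staff_salary AS
        SELECT staff_no, salary FROM staff;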
The definition and allocation of fragments must be based on how the database is to be used. The design should rely on both quantitative and qualitative information: quantitative information is used mainly for data allocation, and qualitative information mainly for fragmentation.
Quantitative information includes:
- How often the application is run
- Which site the application is running on
- Performance criteria for transactions and applications.
Qualitative information concerns the transactions executed by the applications, including the relations, attributes and tuples they access, the type of access (read or write), and the predicates of the operations.
Fragments are defined and allocated strategically so as to achieve the following objectives:
1. Locality of reference
Where possible, data should be stored close to where it is used. If a fragment is used at several sites, it may be beneficial to store copies of that fragment at those sites as well.
2. Improved Reliability and Availability
Reliability and availability are improved by replication: if one copy fails, another copy is still available at a different site.
3. Acceptable performance
A poor allocation can result in bottlenecks, where a site is flooded with requests from other sites that it cannot serve promptly, so responses are delayed and overall performance decreases.
4. Balance between storage capacity and cost
Consideration should be given to the availability and cost of storage at each site, so that cheaper mass storage can be used where it is efficient to do so.
5. Minimal communication costs
Consideration should be given to the cost of remote access. Communication costs are minimized when locality of reference is high or when each site holds its own copy of the data. However, whenever replicated data is updated, the update must be propagated to every site that holds a copy, which increases communication costs.
DBMS Data Allocation
There are four strategies for data placement: centralization, partitioning (fragmentation), complete replication, and selective replication.
1. Centralization
In this strategy a single database and DBMS are stored at one site, with users distributed across the network (this is really distributed processing rather than a distributed database). Locality of reference is at its lowest, because every site other than the central site must use the network for all data access, which also means high communication costs.
Reliability and availability are low, since a failure at the central site affects the entire database system.
2. Partition (Fragmentation)
This strategy partitions the database into disjoint fragments, with each fragment allocated to a single site. If data items are located at the site where they are used most frequently, locality of reference is high. Because there is no replication, storage costs are low, but reliability and availability are also low, although they are better than in the centralized case.
Compared with centralization, the advantage is that a failure at one site results in the loss of only that site's data; the data held at the other sites remains available. Performance should be good and communication costs low, provided the distribution is well designed.
3. Complete replication
This strategy maintains a complete copy of the database at every site, so locality of reference, availability, reliability, and performance are maximized. However, storage costs and the communication costs of updates are very high. To reduce these costs, snapshots are often used; a snapshot is a copy of the data taken at a particular point in time.
The copies are refreshed periodically, for example every hour or every week, so they are not necessarily up to date. Snapshots are also used to implement views in a distributed database, to improve the performance of database operations.
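A minimal sketch of a snapshot, using Oracle-style materialized view syntax (the object names and the database link hq_site are hypothetical, and the exact syntax differs between DBMS products):

    -- Hypothetical weekly snapshot of a table held at a central site.
    CREATE MATERIALIZED VIEW staff_snapshot
        REFRESH COMPLETE
        START WITH SYSDATE
        NEXT SYSDATE + 7               -- refresh the copy once a week
    AS SELECT * FROM staff@hq_site;    -- hq_site: hypothetical link to the central site

Between refreshes the local copy can be read cheaply, at the price of being up to a week out of date.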
4. Selective replication
This strategy is a combination of partitioning, replication, and centralization. Some data items are partitioned to achieve high locality of reference; others, which are used at many sites but are not updated frequently, are replicated; the remainder are centralized. The aim is to gain the advantages of the other strategies while avoiding their weaknesses. Because of its flexibility, this is the most commonly used strategy.
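For instance, a selective replication design could partition a large, frequently updated relation by site while fully replicating a small, rarely updated lookup relation at every site (all names below are hypothetical):

    -- Partitioned: each branch's staff fragment is allocated only to that branch's site.
    CREATE TABLE staff_branch_b5 AS
        SELECT * FROM staff WHERE branch_no = 'B005';   -- stored only at branch B005's site

    -- Replicated: a small, read-mostly lookup relation is copied to every site.
    CREATE TABLE job_title (
        job_code VARCHAR(10) PRIMARY KEY,
        title    VARCHAR(50)
    );                                                  -- an identical copy is created at each site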
Understanding Parallel DBMS
A parallel DBMS is a database management system that uses multiple processors and disks in parallel, so that the performance of the DBMS improves.
A parallel DBMS links multiple smaller machines so that together they achieve the throughput of a single large machine, with greater scalability and reliability for its database.
To support multiple processors with equal access to a single database, a parallel DBMS must provide management of shared resources.
Which resources are shared depends on the architecture adopted, and this in turn affects the performance and scalability of the system.
There are three architectures used in parallel DBMSs, namely:
- shared memory
- shared disk
- shared nothing.
Homogeneous & Heterogeneous DDBMS
A DDBMS can be classified as homogeneous or heterogeneous. In a homogeneous system, all sites use the same DBMS product. In a heterogeneous system, the sites may run different DBMS products and need not even use the same data model, so the system can include relational, network, hierarchical, and object-oriented DBMSs.
Homogeneous systems are much easier to design and manage. This approach provides incremental growth, making it straightforward to add a new site to the DDBMS, and improves performance by exploiting the parallel processing capabilities of several sites.
Heterogeneous systems usually arise when individual sites have already implemented their own databases and integration is considered only at a later stage. In such a system, translation is needed so that the different DBMSs can communicate. To provide DBMS transparency, users must be able to make requests in the language of the DBMS at their local site; the system is then responsible for locating the data and performing any necessary translation.
Data may be required from another site that:
- has different hardware
- has a different DBMS product
- has both different hardware and a different DBMS product.
If the hardware is different but the DBMS product is the same, the translation is straightforward, involving only the change of character codes and word lengths. If the DBMS products are different, the translation is more complex, because the data structures of one data model must be mapped onto the equivalent data structures of another data model.
For example, a relation in the relational data model is mapped onto several records and sets in the network data model. The query language must also be translated (for example, an SQL SELECT statement is mapped onto the network model's FIND and GET statements). If both the hardware and the DBMS products differ, both kinds of translation are required, which makes the processing considerably more complex.
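As a rough illustration (reusing the hypothetical staff relation from the earlier sketches), the same request is set-oriented in SQL but must become record-at-a-time navigation at a network-model site:

    -- Set-oriented request at the relational (global) level:
    SELECT staff_no, name
    FROM   staff
    WHERE  branch_no = 'B003';

    -- At a network-model site the DDBMS would have to translate this into navigational
    -- statements, conceptually:
    --   FIND the Branch record with key 'B003';
    --   FIND FIRST Staff record WITHIN the Branch-Staff set, then GET it;
    --   FIND NEXT Staff record WITHIN the set and GET it, repeating until the end of the set.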
A further complexity is the need for a common conceptual schema, formed by integrating the individual local conceptual schemas. To handle the translation, a gateway is often used: a gateway converts the query language and data model of each different DBMS into the language and data model of a relational system. However, this method also has limitations. First, it may not support transaction management, even for a pair of systems.
In other words, a gateway between two systems is essentially only a query translator; for example, it cannot coordinate concurrency control and recovery for transactions that involve updates to both databases. Second, the gateway approach addresses only the problem of translating a query expressed in one language into an equivalent query in another language.
Distributed DBMS Concept
Distributed Database
Logically, a distributed database is a collection of interrelated, shared data that is physically distributed over a computer network.
Distributed DBMS
The software system that manages the distributed database and makes the distribution transparent to users.
A DDBMS has a single logical database that is split into a number of fragments. Each fragment is stored on one or more computers, under the control of a separate DBMS, with the computers connected by a communication network.
Each site is able to process user requests that need only local data and is also able to process data stored on other computers connected to the network.
Users access the distributed database through two kinds of applications: local applications (those that do not require data from other sites) and global applications (those that do). A DDBMS therefore has the following characteristics:
- A collection of logical data that is used together
- Data is divided into several fragments
- Fragments may have copies (replicas)
- Fragments and replicas are allocated to the sites where they are used
- Each site is connected to a communication network.
- Data at each site is under DBMS supervision.
- The DBMS at each site can handle local applications autonomously.
- Each DBMS participates in at least one global application.
From this definition, the system is expected to make the distribution transparent. The distributed database is split into fragments that are stored on several computers and may be replicated, and this storage allocation is hidden from the user. Transparency makes the distributed system appear to users like a centralized database, which reflects the fundamental principle of the DBMS (Date, 1987b). Transparency provides good functionality for users, but unfortunately it also gives rise to many problems that the DDBMS must overcome.
Distributed Processing
A centralized database that can be accessed over a computer network.
The key point of the distributed database definition is that the system consists of data that is physically distributed across multiple sites connected by a network.
If the data is centralized, then even though other users access it over the network, the system is distributed processing rather than a DDBMS.
12 Rules of DDBMS
In this final section, we explain the twelve rules of a DDBMS (Date, 1987b). The basis of these rules is that a distributed DBMS should feel to the user just like a non-distributed DBMS. They are analogous to Codd's twelve rules for relational systems.
Basic principle: to its users, a DDBMS should look exactly like a non-distributed DBMS.
1. Local Autonomy
The sites in a distributed system should be autonomous. Autonomy means that:
- local data is owned and managed by the local DBMS itself;
- local operations remain purely local;
- all operations at a given site are controlled by that site's local DBMS.
2. No reliance on a central site
There should be no central site on which the whole system depends. Services such as transaction management, deadlock detection, query optimization, and management of the system catalog are the responsibility of each local DBMS; no central site has sole authority over them.
3. Continuous operation
The DDBMS should allow modular, incremental growth: if the network is expanded and new sites are added, building the new infrastructure should not interrupt the ongoing flow of data or require the system to be shut down.
4. Location independence
Location independence is the same as location transparency: users can access the database from any site, and all data behaves as if it were stored close to the user, regardless of where it is physically stored.
5. Freedom of Fragmentation
Users can access the database without having to know how the data is fragmented.
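For example (reusing the hypothetical staff fragments from the earlier sketches), the user writes a query against the global relation and it is the DDBMS, not the user, that rewrites it against the fragments:

    -- What the user writes:
    SELECT name, salary FROM staff WHERE salary > 40000;

    -- One possible internal rewrite by the DDBMS over two horizontal fragments:
    SELECT name, salary FROM staff_branch_b3 WHERE salary > 40000
    UNION ALL
    SELECT name, salary FROM staff_branch_b5 WHERE salary > 40000;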
6. Freedom of replication
Users do not need to know whether data has been replicated, do not need to access a particular copy of a data item directly, and do not have to update every copy themselves when they modify a data item.
7. Distributed query processing
The system must be able to handle query processing that references data across multiple connected sites.
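A minimal illustration, assuming hypothetical staff and branch relations held at two different sites: the user submits a single query, and the DDBMS decomposes it into sub-queries, ships them to the relevant sites, and merges the results.

    -- One logical query joining data stored at two sites; decomposition, data
    -- shipping and result assembly are handled by the DDBMS, not the user.
    SELECT s.name, b.city
    FROM   staff  s
    JOIN   branch b ON s.branch_no = b.branch_no
    WHERE  b.city = 'London';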
8. Distributed transaction processing
The system must support the transaction as the unit of recovery, and must ensure that both global and local transactions conform to the ACID properties: atomicity, consistency, isolation, and durability.
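A minimal sketch of why this matters (the account tables are hypothetical): a single transaction updates data stored at two different sites, and the DDBMS must make it take effect at both sites or at neither.

    -- Hypothetical funds transfer touching accounts stored at two different sites.
    BEGIN;
    UPDATE accounts_site1 SET balance = balance - 100 WHERE acc_no = 'A101';
    UPDATE accounts_site2 SET balance = balance + 100 WHERE acc_no = 'B207';
    COMMIT;   -- must commit at both sites atomically, or be rolled back at both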
9. Hardware freedom
DDBMS must be able to be used on a wide variety of hardware platforms.
10. Operating system freedom
In accordance with previous rules, DDBMS must also be able to be used on various operating system platforms.
11. Network freedom
Similar to the previous rule, DDBMS must be able to be used on a variety of different communication network platforms.
12. Database freedom
A DDBMS may be formed from different local DBMSs, possibly based on different data models. In other words, the DDBMS must be able to support heterogeneous systems.
The last four rules are ideals. Because the rules are so general, and because of the lack of standards in computer and network architectures, only partial compliance can be expected from vendors for the foreseeable future.