This section covers the core principles of database design, management, and security. You will learn what data normalization is and understand the different normal forms used to organize data efficiently and reduce redundancy. It also introduces DBMS concepts, including its key functions such as data storage, retrieval, updating, catalog management, and backup and recovery. Additionally, the module explores database and network security challenges in the modern world, explains the process and techniques of normalization, and highlights the differences between data mining and data warehousing.
DATABASE
A database is an organized, consistent, and logical collection of data that can easily be updated, accessed, and managed. A database usually contains sets of tables or objects, which consist of records and fields.
DATABASE NORMALIZATION
✓ Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and update anomalies, so minimizing it helps to avoid them.
✓ Data normalization is an important aspect of data management and analysis that plays a crucial role in both data storage and data analysis.
✓ It helps you avoid redundancy and maintain the integrity of the database.
✓ It also helps you eliminate undesirable characteristics associated with insertion, deletion, and updating.
✓ It ensures that each data item is stored in only one place which reduces the overall disk space requirement and improves the consistency and reliability of the system.
✓ In databases, normalization organizes fields and tables. In data analysis and machine learning, the same term refers to a different technique: preprocessing (rescaling) data before it is used in an analysis.
✓ The main purpose of database normalization is to avoid complexities, eliminate duplicates, and organize data in a consistent way.
✓ In normalization, the data is divided into several tables linked together with relationships. Database administrators are able to achieve these relationships by using primary keys, foreign keys, and composite keys.
✓ In database management systems (DBMS), normal forms are a series of guidelines that help to ensure that the design of a database is efficient, organized, and free from data anomalies. There are several levels of normalization, each with its own set of guidelines, known as normal forms.
Need for Database Normalization
Database normalization helps you design and structure your tables properly so that you have proper relationships between them. It helps you with the following:
- Make the database more efficient
- Facilitate Data Integrity
- Promote Data consistency
- Eliminate data redundancy
- Prevent update anomalies
- Simplify data retrieval
- Enhance database performance
- Avoid bugs
- Prevent the same data from being stored in more than one place
- Prevent updates being made to some data but not others
- Prevent data from lingering when it should be deleted, or from being lost when it should be kept
- Ensure the data is accurate
- Reduce the storage space that a database takes up
- Ensure the queries on a database run as fast as possible
Different normal forms
1) First Normal Form also known as 1NF
2) Second Normal Form or 2NF
3) Third Normal Form or 3NF
4) Boyce-Codd Normal Form or BCNF
5) Fourth Normal Form or 4NF
The First Normal Form – 1NF
✓ This is the most basic level of normalization. In 1NF, each table cell should contain only a single value, and each column should have a unique name. The first normal form helps to eliminate repeating groups and simplify queries.
✓ For a table to be in the first normal form, it must meet the following criteria:
- a single cell must not hold more than one value (atomicity)
- there must be a primary key for identification
- each column must have only one value for each row in the table
Example:
Consider the following table:
| StudentID | Name | Courses |
|---|---|---|
| 1 | Alice | Math, Science |
| 2 | Bob | History, Geography |
To convert this table to 1NF, we need to ensure that each column contains atomic values:
| StudentID | Name | Course |
|---|---|---|
| 1 | Alice | Math |
| 1 | Alice | Science |
| 2 | Bob | History |
| 2 | Bob | Geography |
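The conversion above can be sketched in a few lines of Python. This is only an illustration: the list-of-dictionaries layout and the helper name `to_1nf` are assumptions made for the example, not part of any DBMS.

```python
# Un-normalized rows: the Courses cell holds a comma-separated list.
students = [
    {"StudentID": 1, "Name": "Alice", "Courses": "Math, Science"},
    {"StudentID": 2, "Name": "Bob", "Courses": "History, Geography"},
]


def to_1nf(rows):
    """Emit one row per (student, course) so that every cell is atomic."""
    result = []
    for row in rows:
        for course in row["Courses"].split(","):
            result.append(
                {
                    "StudentID": row["StudentID"],
                    "Name": row["Name"],
                    "Course": course.strip(),
                }
            )
    return result


rows_1nf = to_1nf(students)
for r in rows_1nf:
    print(r["StudentID"], r["Name"], r["Course"])
```

After the split, StudentID alone no longer identifies a row; the pair (StudentID, Course) does.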
The Second Normal Form – 2NF
✓ The 1NF only eliminates repeating groups, not redundancy. That’s why there is 2NF.
✓ A table is said to be in 2NF if it meets the following criteria:
- It’s already in 1NF
- It has no partial dependency. That is, every non-key attribute is fully dependent on the whole primary key.
✓ Partial dependency – when a non-key attribute depends on only part of a composite primary key (a key made up of two or more columns).
Example:
Consider the table in 1NF:
| StudentID | Name | Course | Instructor |
|---|---|---|---|
| 1 | Alice | Math | Prof. A |
| 1 | Alice | Science | Prof. B |
| 2 | Bob | History | Prof. C |
| 2 | Bob | Geography | Prof. D |
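Here, the primary key is (StudentID, Course). Name depends only on StudentID and Instructor depends only on Course, so both are partial dependencies. To convert this to 2NF, we create separate tables for Students and Courses, plus an Enrollments table that records which student takes which course:
Students Table:
| StudentID | Name |
|---|---|
| 1 | Alice |
| 2 | Bob |
Courses Table:
| Course | Instructor |
|---|---|
| Math | Prof. A |
| Science | Prof. B |
| History | Prof. C |
| Geography | Prof. D |
Enrollments Table:
| StudentID | Course |
|---|---|
| 1 | Math |
| 1 | Science |
| 2 | History |
| 2 | Geography |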
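As a rough sketch, the 2NF design can be tried out with Python's built-in `sqlite3` module. The table and column names below are assumptions chosen for the example, and the `enrollments` link table is included so that the original four rows can still be rebuilt with a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses (course TEXT PRIMARY KEY, instructor TEXT NOT NULL);
    -- Link table (assumed for this sketch): which student takes which course.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course TEXT NOT NULL REFERENCES courses(course),
        PRIMARY KEY (student_id, course)
    );
    """
)
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)",
    [("Math", "Prof. A"), ("Science", "Prof. B"),
     ("History", "Prof. C"), ("Geography", "Prof. D")],
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [(1, "Math"), (1, "Science"), (2, "History"), (2, "Geography")],
)

# Joining the three tables rebuilds the original 1NF rows.
rows = conn.execute(
    """
    SELECT s.student_id, s.name, e.course, c.instructor
    FROM enrollments AS e
    JOIN students AS s ON s.student_id = e.student_id
    JOIN courses AS c ON c.course = e.course
    ORDER BY s.student_id, e.course
    """
).fetchall()
for row in rows:
    print(row)
```

Each student name and each instructor is now stored once, so renaming either requires changing a single row.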
The Third Normal Form – 3NF
✓ When a table is in 2NF, it eliminates repeating groups and partial dependencies, but it does not eliminate transitive dependencies.
✓ A transitive dependency means a non-prime attribute (an attribute that is not part of any candidate key) is dependent on another non-prime attribute. This is what the third normal form (3NF) eliminates.
✓ So, for a table to be in 3NF:
- It must be in 2NF
- It must have no transitive dependency.
Example:
Consider the Courses table from 2NF, extended with an InstructorOffice column:
| Course | Instructor | InstructorOffice |
|---|---|---|
| Math | Prof. A | Room 101 |
| Science | Prof. B | Room 102 |
| History | Prof. C | Room 103 |
| Geography | Prof. D | Room 104 |
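Here, InstructorOffice depends on Instructor, which in turn depends on Course. This is a transitive dependency, so we move the office into its own table:
Instructors Table: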
| Instructor | InstructorOffice |
|---|---|
| Prof. A | Room 101 |
| Prof. B | Room 102 |
| Prof. C | Room 103 |
| Prof. D | Room 104 |
Courses Table:
| Course | Instructor |
|---|---|
| Math | Prof. A |
| Science | Prof. B |
| History | Prof. C |
| Geography | Prof. D |
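One way to make the dependency chain concrete is to test functional dependencies against the sample rows. The helper `fd_holds` below is a hypothetical name for this sketch; note that sample data can only refute a dependency, never prove that it holds for all future rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of `lhs` values maps to one `rhs` value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


courses = [
    {"Course": "Math", "Instructor": "Prof. A", "InstructorOffice": "Room 101"},
    {"Course": "Science", "Instructor": "Prof. B", "InstructorOffice": "Room 102"},
    {"Course": "History", "Instructor": "Prof. C", "InstructorOffice": "Room 103"},
    {"Course": "Geography", "Instructor": "Prof. D", "InstructorOffice": "Room 104"},
]

# Course -> Instructor and Instructor -> InstructorOffice form the transitive chain.
chain_holds = (
    fd_holds(courses, ["Course"], ["Instructor"])
    and fd_holds(courses, ["Instructor"], ["InstructorOffice"])
)

# 3NF decomposition: project onto the two smaller tables.
instructors = sorted({(r["Instructor"], r["InstructorOffice"]) for r in courses})
courses_3nf = sorted({(r["Course"], r["Instructor"]) for r in courses})
print(chain_holds, instructors, courses_3nf)
```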
Boyce-Codd Normal Form (BCNF)
✓ Boyce-Codd Normal Form is a stricter version of the Third Normal Form.
✓ This form deals with a certain type of anomaly that is not handled by 3NF.
✓ A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF.
✓ For a table R to be in BCNF, the following conditions must be satisfied:
- R must be in the 3rd Normal Form
- and, for each non-trivial functional dependency (X → Y), X should be a super key
Example:
Consider the following table:
| StudentID | Course | Instructor |
|---|---|---|
| 1 | Math | Prof. A |
| 2 | Science | Prof. B |
| 3 | Math | Prof. A |
| 4 | Science | Prof. C |
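Assume a student can enroll in several courses and each instructor teaches exactly one course. The candidate key is then (StudentID, Course), and the dependency Instructor → Course violates BCNF because Instructor is not a super key. (Course → Instructor does not hold here, since Science is taught by both Prof. B and Prof. C.) To convert this table to BCNF, we separate it into two tables:
Instructors Table:
| Instructor | Course |
|---|---|
| Prof. A | Math |
| Prof. B | Science |
| Prof. C | Science |
Enrollment Table:
| StudentID | Instructor |
|---|---|
| 1 | Prof. A |
| 2 | Prof. B |
| 3 | Prof. A |
| 4 | Prof. C |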
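A quick check on the four sample rows shows which dependency actually holds in the data and whether a decomposition can be joined back without inventing rows. This is a sketch; the helper name `determines` and the variable names are assumptions for illustration.

```python
rows = [
    (1, "Math", "Prof. A"),
    (2, "Science", "Prof. B"),
    (3, "Math", "Prof. A"),
    (4, "Science", "Prof. C"),
]


def determines(pairs):
    """True if every left value is paired with exactly one right value."""
    mapping = {}
    for left, right in pairs:
        mapping.setdefault(left, set()).add(right)
    return all(len(values) == 1 for values in mapping.values())


# Science is taught by two instructors, so Course -> Instructor fails.
course_to_instructor = determines((c, i) for _, c, i in rows)
# Each instructor teaches one course, so Instructor -> Course holds.
instructor_to_course = determines((i, c) for _, c, i in rows)

# Decomposition on Instructor -> Course.
instructor_course = {(i, c) for _, c, i in rows}
student_instructor = {(s, i) for s, _, i in rows}
rejoined = {
    (s, c, i)
    for s, i in student_instructor
    for i2, c in instructor_course
    if i == i2
}
lossless = rejoined == set(rows)

# Splitting on Course instead produces spurious rows when joined back.
course_instructor = {(c, i) for _, c, i in rows}
student_course = {(s, c) for s, c, _ in rows}
lossy_join = {
    (s, c, i)
    for s, c in student_course
    for c2, i in course_instructor
    if c == c2
}
print(course_to_instructor, instructor_to_course, lossless, len(lossy_join))
```

Splitting on the attribute that really is a determinant (Instructor) gives back exactly the original rows, while splitting on Course pairs every Science student with both Science instructors.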
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
- It is in the Boyce-Codd Normal Form.
- And, it doesn’t have Multi-Valued Dependency.
Example
STUDENT
| STU_ID | COURSE | HOBBY |
|---|---|---|
| 21 | Computer | Dancing |
| 21 | Math | Singing |
| 34 | Chemistry | Dancing |
| 74 | Biology | Cricket |
| 59 | Physics | Hockey |
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing. So there are multi-valued dependencies on STU_ID (STU_ID →→ COURSE and STU_ID →→ HOBBY), which lead to unnecessary repetition of data.
So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
| STU_ID | COURSE |
|---|---|
| 21 | Computer |
| 21 | Math |
| 34 | Chemistry |
| 74 | Biology |
| 59 | Physics |
STUDENT_HOBBY
| STU_ID | HOBBY |
|---|---|
| 21 | Dancing |
| 21 | Singing |
| 34 | Dancing |
| 74 | Cricket |
| 59 | Hockey |
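The decomposition amounts to two projections of the STUDENT table, which can be sketched with plain Python sets. The tuple layout is an assumption made for the example.

```python
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY); sets drop repeated pairs.
student_course = sorted({(sid, course) for sid, course, _ in student})
student_hobby = sorted({(sid, hobby) for sid, _, hobby in student})

# Joining on STU_ID lists every course/hobby combination per student.
combinations = sorted(
    (sid, course, hobby)
    for sid, course in student_course
    for sid2, hobby in student_hobby
    if sid == sid2
)
print(student_course)
print(student_hobby)
print(len(combinations))
```

Joining the two tables back pairs each of a student's courses with each of their hobbies (four combinations for student 21). That is what independence of the two facts means, and it is why storing them in one table forces repetition.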
Advantages of Normalisation
- It reduces data redundancy
- It improves database organisation and data consistency
- It makes queries easier to understand
- It makes database design simpler
- It can speed up queries and updates that touch a single, smaller table
- It simplifies database maintenance
- It gives greater flexibility in database design
- It maintains relational integrity
Disadvantages of Normalisation
- Increased complexity
- Decomposing tables into higher normal forms is a time-consuming process.
- Slower write performance
- Queries over normalized data can require more CPU, memory, and I/O, which may reduce database performance
- More joins are required to get the desired result, and a poorly written query can severely degrade the database
- Maintenance overhead. The higher the level of normalization, the greater the number of tables in the database.
DATABASE MANAGEMENT SYSTEM
✓ A DBMS is software that manages databases, providing an interface to store, retrieve, and manipulate data efficiently and securely.
✓ The DBMS serves as an interface between the database and the users or application programs.
✓ It enables users to create, modify, and maintain the database, as well as control access to the data.
✓ It provides a way to store, retrieve, and manipulate data in a structured manner.
✓ It also includes features such as data validation, indexing, and data security.
✓ Some examples of DBMS include MySQL, Oracle, and Microsoft SQL Server.
✓ DBMS provides an interface to perform various operations like database creation, storing data in it, updating data, creating a table in the database and a lot more.
✓ It provides protection and security to the database. In the case of multiple users, it also maintains data consistency.
Characteristics of DBMS
- Data independence – The DBMS allows users to make changes to the database schema without affecting the application programs that use the data.
- Concurrent access – The DBMS allows multiple users to access the database simultaneously, and provides mechanisms to ensure that concurrent access does not result in conflicts.
- Data integrity – The DBMS ensures that data is entered correctly and follows certain rules, such as constraints and data types.
- Data security – The DBMS provides features to control access to the data and protect it from unauthorized access.
- Data recovery – The DBMS provides mechanisms to recover the database from crashes or other failures, and to create backups of the data for disaster recovery.
- Data sharing – The DBMS allows multiple applications to access the same data, which can improve data consistency and reduce data redundancy.
- High-performance – The DBMS uses various techniques to improve the performance of data retrieval, such as indexing, caching and partitioning.
- Scalability – The DBMS can handle large amounts of data and support increasing numbers of users and transactions.
- Data Validation – The DBMS checks the data entered by the user and ensures that it is valid as per the defined constraints and rules.
- Data Abstraction – The DBMS provides a level of abstraction between the user and the physical storage of the data, which makes it easier to manage and manipulate the data.
- Data Representation – The DBMS stores data in a logical and organized manner, which makes it easy to access, retrieve and manipulate the data.
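The data integrity and data validation characteristics above can be illustrated with SQLite through Python's `sqlite3` module. This is a minimal sketch: the `accounts` table and its rules are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        account_id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        balance REAL NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.execute("INSERT INTO accounts (owner, balance) VALUES (?, ?)", ("Alice", 100.0))
conn.commit()

# Both rows break a declared rule, so the DBMS refuses them.
rejected = []
for owner, balance in [(None, 10.0), ("Bob", -5.0)]:
    try:
        conn.execute(
            "INSERT INTO accounts (owner, balance) VALUES (?, ?)", (owner, balance)
        )
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(count, rejected)
```

The rules live in the table definition, so every application that writes to the table is held to them without repeating the checks in its own code.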
Functions of DBMS
1) Data Storage Management – This is one of the most important tasks of a DBMS. The DBMS creates the complex structures required for data storage, so users are freed from defining, programming, and implementing the complex physical data characteristics.
2) Security Management – The DBMS creates a security system that enforces user security and data privacy within the database. Security rules determine the access rights of the users, such as whether a user is given read or write access.
3) Multiuser Access Control – Modern database systems allow more than one user to access the database at the same time without conflicts, while preserving data integrity and consistency. The DBMS follows the ACID properties (atomicity, consistency, isolation, durability), so the database stays consistent while multiple users access it concurrently. This is very useful in organizations where multiple database engineers work concurrently.
4) Backup and Recovery Management – The DBMS provides backup and data recovery procedures to ensure data safety and integrity. It provides special utilities that allow the DBA (database administrator) to perform routine and special backup and restore procedures. Recovery management deals with the recovery of the database after a failure.
5) Data Integrity Management – Database systems provide data integrity management by maximizing data consistency and minimizing data redundancy. The data dictionary is the feature the database system uses to store the relationships between data and so maintain data integrity. Data integrity is especially important in transaction-based database systems.
6) Access Languages and APIs – A query language is used to access the data in the database. Writing queries is easy because users specify what data they need, not how it should be retrieved; the DBMS takes care of that. Structured Query Language (SQL) is the most widely used database language and is supported by most relational database vendors.
7) Data Dictionary Management – The data dictionary stores the definitions of data elements and their relationships. This information is termed as metadata. The metadata includes definition of data, data types, relationship between data, integrity constraints etc. Any changes made in a database structure are automatically reflected in the data dictionary. In short the DBMS provides data abstraction and it removes structural and data dependency from the system.
8) Data Transformation and Presentation – The DBMS provides data transformation, which means programmers need not worry about the difference between the logical and physical representation of the data. The DBMS stores the data in the determined data structure.
For example, a user who asks for a date may receive it as 14 December 2022, even though the database stores it in separate columns for month, day, and year.
9) Database Communication – The DBMS accepts requests from users through various applications or web browsers. This communication can be achieved, for example, through predefined reports on websites that are published with the help of the DBMS.
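The data dictionary function described in item 7 can be observed in SQLite, which keeps its catalog in a built-in table called `sqlite_master`. The sketch below uses Python's `sqlite3` module; the `students` and `courses` tables and the helper `table_names` are assumptions for the example, and other DBMSs expose their dictionaries through different views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)


def table_names(connection):
    """List the tables recorded in SQLite's catalog."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


before = table_names(conn)
# A structural change is reflected in the catalog without extra work.
conn.execute("CREATE TABLE courses (course TEXT PRIMARY KEY, instructor TEXT)")
after = table_names(conn)

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(students)")
]
print(before, after, columns)
```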
Advantages of DBMS
- Data Independence.
- Efficient Data Access.
- Data Integrity and security.
- Data administration.
- Concurrent access and Crash recovery.
- Reduced Application Development Time
- Controls database redundancy
- Easy maintenance
- Data security
- High-performance
- Multiple user interface
Disadvantages of DBMS
- It requires a high-speed processor and a large amount of memory to run the DBMS software.
- The cost of the hardware and software for a DBMS is quite high, which increases the budget of your organization.
- It occupies a large amount of disk space and memory to run efficiently.
- A database system creates additional complexity and requirements.
- A failure has a high impact because, in most organizations, all the data is stored in a single database; if that database is damaged by a power failure or corruption, the data may be lost forever.
- Multiple users using the same program at the same time can sometimes lead to data loss.
- A DBMS can’t perform sophisticated calculations.
Application of DBMS
A Database Management System is used in a wide variety of applications and industries. Some common applications of DBMS include:
- Business – DBMSs are used to store and manage data in many business-related applications, such as customer relationship management (CRM), enterprise resource planning (ERP), and supply chain management.
- Banking and finance – DBMSs are used to store and manage financial data, such as customer information, account balances, and transaction history.
- Healthcare – DBMSs are used to store and manage patient information, medical records, and other healthcare-related data.
- Education – DBMSs are used to store and manage student information, grade records, and other educational data.
- E-commerce – DBMSs are used to store and manage data for e-commerce applications, such as product catalogs, customer information, and order history.
- Government – DBMSs are used to store and manage data for various government-related applications, such as tax records, voter registration, and public safety.
- Manufacturing – DBMSs are used to store and manage data for manufacturing and production processes, such as inventory control, scheduling, and quality control.
- Retail – DBMSs are used to store and manage data for retail operations, such as inventory, sales, and customer data.
- Transportation – DBMSs are used to store and manage data for transportation operations, such as fleet management, scheduling, and logistics.
- Social Media – DBMSs are used to store and manage data for social media applications, such as user information, posts, and interactions.
DATA RETRIEVAL
✓ In databases, data retrieval is the process of identifying and extracting data from a database, based on a query provided by the user or application.
✓ Data retrieval typically requires writing and executing data retrieval or extraction commands or queries on a database. Based on the query provided, the database looks for and retrieves the data requested.
✓ Applications and software generally use various queries to retrieve data in different formats.
✓ Data retrieval enables users to efficiently access and utilize data, making it crucial for data analysis, decision-making, and reporting.
✓ Moreover, data retrieval should preserve the confidentiality, integrity, and availability of the data, protecting it from unauthorized access or tampering.
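As a small illustration of query-based retrieval, the sketch below uses Python's `sqlite3` module with a parameterized query. The `orders` table, its rows, and the function `orders_over` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 35.5), ("Alice", 60.0), ("Carol", 250.0)],
)


def orders_over(connection, minimum):
    """Return (customer, amount) rows above `minimum`, largest first."""
    query = (
        "SELECT customer, amount FROM orders "
        "WHERE amount > ? ORDER BY amount DESC"
    )
    # The ? placeholder keeps user input out of the SQL text itself.
    return connection.execute(query, (minimum,)).fetchall()


result = orders_over(conn, 50)
print(result)
```

Passing the threshold as a parameter rather than formatting it into the string is a common way to reduce the security risk that retrieval queries can otherwise introduce.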
Advantages of Data retrieval
- Quick access to stored data.
- It improves decision-making and analysis.
- Flexible filtering and sorting.
- Compatibility with various query languages.
Disadvantages of Data retrieval
- It can cause security risks.
- May require specialized knowledge or skills.
- It can negatively impact database performance.
DATA CATALOG
✓ A Data Catalog is a collection of metadata, combined with data management and search tools, that helps analysts and other data users to find the data that they need.
✓ Fundamentally, metadata is data that provides information about other data. In other words, it’s “data about data”. It consists of labels or markers that describe information, making it easier to find, understand, organize, and use.
✓ It serves as an inventory of data and provides the necessary information to evaluate the fitness of data for intended uses.
✓ It also helps analysts and other data users find the target data they need for specific purposes.
✓ It helps organizations discover, understand, and consume data better — all in one place.
✓ Using a data catalog the right way means better data usage, all of which contributes to:
- Cost savings
- Operational efficiency
- Competitive advantages
- Better customer experience
- Reduced fraud and risk
Fundamental features of a data catalog
A data catalog reduces the time to insight for data users. It ensures:
- Data is made readily accessible – A data catalog automatically crawls, identifies, inventories, and classifies data assets from multiple sources. Data catalog tools allow you to run a search across data lakes, data warehouses, databases, tables, columns, SQL queries, and business glossaries.
- Context is provided – People with no context of the data can learn more about it to decide if they have the right data.
- The data lifecycle is visible – Data catalogs enable you to visualize the complete lifecycle of a data asset, its transformation, and its dependency both upstream and downstream.
- Access permissions are defined – A data catalog helps enforce robust access control policies as guard rails to help you protect confidentiality and comply with various data protection regulations.
Benefits of a Data Catalog
- Improved data efficiency – Data catalogs make data work better. They help you find data easily, avoid duplication, understand data better, ensure data rules are followed, make data integration smoother, and encourage teamwork.
- Improved data context – Data catalogs boost data understanding with detailed info about datasets. This includes where they come from, their quality, who uses them, how they should be used, and how they connect to other datasets. This info makes it easier for users to grasp the data’s meaning, importance, and suitability. As a result, users can perform better decision-making and analysis.
- Reduced risk of error – Data catalogs cut down on errors. They offer quality data info and detailed descriptions, track data history, comply with metadata rules, promote teamwork, limit access, and assist with data prep. All this helps users handle data more accurately, reducing errors in analysis and usage.
- Improved data analysis – Data catalogs improve data analysis in many ways. They make it easier to find data, offer context with metadata, ensure data quality, enable teamwork, and simplify data integration. Users can find and use datasets faster, saving time on prep. Detailed metadata provides insights into data quality, and relationships. Collaboration tools help teams share insights. With a data catalog, analysts can make informed decisions, reduce errors, and analyze data more efficiently.
- More accurate analytics – By making it easier for users to find all of the applicable data for analytics applications, a data catalog helps improve the accuracy of the results.
- Productivity improvements – A data catalog reduces the time users spend looking for data, enabling them to do more analytics work. It can also eliminate duplicate data transformation and preparation tasks by different analysts.
- Higher-quality data that’s more reliable – Embedded data governance, data quality and data security functions help create trusted data sets for analytics users.
BACK UP AND RECOVERY
✓ Backup and recovery is the process of creating copies of your data to guard against loss and setting up secure systems that allow you to recover that data afterwards.
✓ A backup is a copy of the data in a database that you can use to reconstruct that data. A backup can be either a physical backup or a logical backup.
✓ Data backup requires the copying and archiving of computer data to make it accessible in case of data corruption or deletion.
✓ In general, the purpose of a backup and recovery strategy is to protect the database against data loss and reconstruct the database after data loss.
✓ You can only recover data from an earlier time if you have backed it up with a reliable backup device.
✓ Data backup is one form of disaster recovery making it an essential part of any sensible disaster recovery plan.
✓ Backing up data cannot always restore all of your business operating systems data and settings. For instance, computer clusters, database servers, or active directory servers may need additional types of disaster recovery since a backup and recovery may not entirely reconstitute them.
✓ There are two major backup types:
- Physical Backup – This is a copy of physical database files such as data, control files, log files, and archived redo logs. It is a copy of the files that store database information in another location and forms the foundation of the database recovery mechanism.
- Logical Backup – It contains the logical data that is extracted from a database, and it consists of tables, procedures, views, functions, etc. Logical backups are a useful supplement to physical backups, but on their own they are generally not considered sufficient protection against data loss.
Data recovery is the process of restoring lost, corrupted, or inaccessible data from backup copies. Effective data recovery ensures minimal downtime and data loss, maintaining business continuity.
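The two backup types can be loosely illustrated with SQLite through Python's `sqlite3` module: `Connection.backup` copies the database page by page (closer to a physical backup), while `Connection.iterdump` produces SQL statements (a logical backup). This is a sketch under those assumptions; the `notes` table is invented for the example, and production DBMSs ship their own backup utilities.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
source.execute("INSERT INTO notes (body) VALUES ('first note')")
source.commit()

# Page-level copy of the whole database into another connection.
copy = sqlite3.connect(":memory:")
source.backup(copy)

# Logical backup: the schema and rows expressed as SQL text.
dump_sql = "\n".join(source.iterdump())

# Recovery: replay the SQL dump into a fresh, empty database.
restored = sqlite3.connect(":memory:")
restored.executescript(dump_sql)

copied_rows = copy.execute("SELECT body FROM notes").fetchall()
restored_rows = restored.execute("SELECT body FROM notes").fetchall()
print(copied_rows, restored_rows)
```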
Recovery Techniques
- Restore Full Backup – Involves restoring the most recent full backup. It is straightforward but may not include the latest data changes.
- Restore Incremental Backup – Requires restoring the last full backup followed by each incremental backup since then. This can be complex and time-consuming.
- Restore Differential Backup – Involves restoring the last full backup followed by the most recent differential backup. This method is faster than restoring multiple incremental backups.
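The difference between the incremental and differential restore paths can be sketched with plain dictionaries standing in for the database. The keys, values, and the helper `restore` are assumptions for illustration, and deletions are not modelled.

```python
def restore(full, change_sets):
    """Apply change sets, oldest first, on top of a copy of the full backup."""
    state = dict(full)
    for changes in change_sets:
        state.update(changes)
    return state


full_backup = {"a": 1, "b": 2}   # state captured by the last full backup
monday = {"b": 3}                # changes made on Monday
tuesday = {"c": 4}               # changes made on Tuesday

# Incremental: one change set per backup since the previous backup.
incrementals = [monday, tuesday]
# Differential: one change set holding everything since the full backup.
differential = {**monday, **tuesday}

via_incremental = restore(full_backup, incrementals)
via_differential = restore(full_backup, [differential])
print(via_incremental, via_differential)
```

Both paths reach the same state. The incremental path has to replay every change set in order, while the differential path needs only the full backup plus the latest differential.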
Importance of data backup and recovery in DBMS
Data Protection
- Accidental Deletion or Modification – Users or applications might accidentally delete or modify data. Regular backups ensure that a previous state of the data is available.
- Data Corruption – Data can become corrupted due to software bugs, hardware failures, or malicious activities. Backup and recovery mechanisms help restore the database to a stable state.
Disaster Recovery
- Hardware Failures – Disk crashes, power outages, or server failures can render data inaccessible or lost. Backups stored on separate media or locations allow recovery after such incidents.
- Natural Disasters – Events like floods, fires, or earthquakes can destroy physical data storage. Off-site backups ensure data recovery is possible even in such scenarios.
Business Continuity
- Minimize Downtime – Regular backups and efficient recovery processes help minimize downtime during data loss events, maintaining business operations and services.
- Regulatory Compliance – Many industries are required by law to maintain data integrity and availability. Regular backups help meet these compliance requirements.
Historical Data Preservation
- Archiving – Backups can serve as archives for historical data, allowing businesses to access past data for analysis, reporting, or compliance purposes.
- Version Control – Regular backups help maintain different versions of the data, allowing rollbacks to specific points in time if necessary.
Security
- Recovery from Malicious Attacks – In case of data breaches or unauthorized access, backups ensure that a clean and secure version of the database can be restored.
Operational Efficiency
- Testing and Development – Backups provide a means to create test environments without affecting the live database, helping in development and testing processes.
- Data Migration – During database migrations or upgrades, backups ensure that the original data can be restored in case of migration issues.
Difference between data mining and data warehousing
| Criteria | Data Mining | Data Warehousing |
|---|---|---|
| Definition | Process of discovering patterns in large datasets | Process of collecting, storing and managing data from various sources |
| Purpose | To extract useful insights and knowledge from data | To provide a comprehensive view of an organization’s data |
| Focus | Analyzing data to identify patterns, correlations and trends | Storage and management of data for reporting and analysis |
| Source of data | Large datasets from various sources | Multiple sources, including internal and external systems |
| Data processing | Advanced techniques like machine learning algorithms | Aggregating, transforming and organizing data |
| Analysis methods | Techniques such as clustering, classification and regression | Queries, reports and online analytical processing (OLAP) |
| Timeframe | Historical and current data | Primarily historical data |



