Database Design & Modeling Interview Questions - DBMS

Database Design & Modeling is the process of planning and structuring how data will be stored, organized and managed in a database. Database Design involves deciding what data needs to be stored, how different pieces of data relate to each other, and how to avoid duplication or inconsistency.

Database Modeling is the step where we create visual representations (models) of the database structure.

1. What are the features of using a DBMS?

The features of using a DBMS are:

Efficient Data Retrieval: Optimizes queries and indexing, allowing faster data retrieval.
Reduced Redundancy: Avoids duplicate data by enforcing normalization.
Backup and Recovery: Offers automatic backup and recovery mechanisms.
Concurrent Access: Allows multiple users to access the database at the same time without conflicts.
Data Integrity: Ensures that the data is accurate and consistent.

2. What is the difference between DBMS and RDBMS?

DBMS (Database Management System): A system that allows users to create, store, modify, and delete data. It does not require a relational structure for data organization. Examples: Microsoft Access, XML databases.
RDBMS (Relational Database Management System): A subset of DBMS that stores data in a structured format, using tables (relations), and supports relational operations like joins. It enforces data integrity through keys and supports SQL for querying. Examples: MySQL, Oracle, SQL Server.

3. What are the different types of DBMS?

The four types of DBMS are:

Hierarchical DBMS: Data is organized in a tree-like structure with parent-child relationships. Example: IBM’s IMS.
Network DBMS: Data is represented as a graph with many-to-many relationships. Example: Integrated Data Store (IDS).
Relational DBMS (RDBMS): Data is organized in tables (relations) and managed through SQL. Example: MySQL, PostgreSQL.
Object-Oriented DBMS: Data is stored as objects, like in object-oriented programming. Example: ObjectDB.

4. What is a relation, table, rows and columns in DBMS?

A relation in DBMS is a table that consists of rows and columns. Each row represents a record, and each column represents an attribute or property of the entity being described. Relations are defined by a schema, which specifies the attributes (columns) of the table.

5. What are the primary components of a DBMS?

The primary components of a DBMS are:

Database Engine: Responsible for storing, retrieving, and managing data.
Database Schema: The structure that defines the organization of the database.
Query Processor: Interprets and executes SQL queries.
Transaction Manager: Ensures the ACID properties of transactions.
Storage Manager: Manages the physical storage of data.

6. How does the relational model resolve redundancy better than hierarchical and network models?

here are the reasons why the relational model resolves redundancy better:

Data stored once: Each entity (like Employee, Department) is stored in its own table, avoiding repetition.
Keys for relationships: Foreign keys link related tables instead of copying the same data.
Normalization – Breaks large tables into smaller ones, removing duplicate values and repeating groups.
Update efficiency – Changes made in one table automatically reflect across relationships, so no multiple updates.

7. Why are object-oriented data models not widely used in enterprise systems despite supporting encapsulation and inheritance?

They are complex to design and maintain compared to relational models.
No standard query language like SQL, which limits ease of use.
Performance issues arise when handling complex objects at scale.
Limited ecosystem and tooling compared to mature relational databases.
Integration difficulties with existing enterprise systems that rely on RDBMS.

8. How can an ER diagram lead to unnormalized schemas if not properly mapped? Explain with an example.

An ER (Entity-Relationship) diagram is a conceptual design tool, but if its entities and attributes are not carefully mapped into relational tables, it can lead to unnormalized schemas. The main reason is improper handling of:

Multivalued attributes (e.g., multiple phone numbers, multiple skills)
Composite attributes (e.g., Address with street, city, state)
Improper relationship mapping (e.g., embedding many-to-many relationships directly into a single table).

This causes repeating groups or redundant data, violating First Normal Form (1NF).

9. Why is BCNF considered stricter than 3NF? Provide a situation where 3NF is satisfied but BCNF is violated.

Third Normal Form (3NF): A relation is in 3NF if, for every functional dependency X -> Y, X is a superkey, OR Y is a prime attribute (part of some candidate key). Thus, 3NF allows certain dependencies where a non-superkey determinant exists, provided the dependent attribute is prime.

Boyce-Codd Normal Form (BCNF): A relation is in BCNF if, for every functional dependency X -> Y, X must be a candidate key. This is stricter because it eliminates all cases where a non-superkey determines attributes, even if they are prime.

Example: Consider relation R(A, B, C) with functional dependencies: {A -> B, B -> A, B -> C}

Candidate Keys: From A -> B and B -> A, we know A and B are interchangeable. So, A and B are candidate keys.

Check 3NF:

A -> B (A is a candidate key).
B -> A (B is a candidate key).
B -> C (B is a candidate key) -> Relation is in 3NF.

Check BCNF:

Dependency B -> C: Here B is a candidate key, so this is fine.
But if we modify slightly: Suppose FDs are A -> B and B -> C (without B -> A).
Candidate Key = {A}.
A -> B (A is candidate key).
B -> C (B is not a candidate key).

3NF is satisfied (since C is non-prime and determinant B is not a superkey, but transitive dependencies via superkey A are allowed). However, BCNF is violated because every determinant must be a candidate key.

10. What is a primary key? Explain with an example.

A Primary Key is a unique identifier for each record in a table. It ensures that no two records have the same value for the primary key field. It cannot contain NULL values. Example: In a STUDENT table, ROLL_NO could be the primary key because each student has a unique roll number.

ROLL_NO	NAME	ADDRESS
1	Rose	Los Angles
2	Merry	California

11. What is a foreign key? Explain with an example.

A Foreign Key is an attribute in a table that links to the primary key in another table. It creates a relationship between two tables, ensuring referential integrity. Example: In a STUDENT table, the BRANCH_CODE could be a foreign key referencing the primary key BRANCH_CODE in the BRANCH table.

Student Table

ROLL_NO	NAME	BRANCH_CODE
1	Rose	CS
2	Merry	IT

BRANCH Table

BRANCH_CODE	BRANCH_NAME
CS	Computer Science
IT	Information Technology

Here, BRANCH_CODE in STUDENT is a foreign key referencing BRANCH_CODE in BRANCH.

12. What is normalization? Why is it important in DBMS?

Normalization is the process of organizing data in a way that reduces redundancy and dependency. It involves dividing large tables into smaller ones and defining relationships between them to ensure data integrity.

Importance:

Eliminates redundant data.
Prevents anomalies during data operations (insertion, update, deletion).
Improves data integrity and consistency.

13. What is an entity-relationship diagram (ERD)?

An Entity-Relationship Diagram (ERD) is a visual representation of the entities within a system and the relationships between those entities. It is used in database design to model the structure of data and how different pieces of data relate to each other.

Entities: Objects or things within the system (e.g., Student, Course).
Attributes: Properties or details about an entity (e.g., Student Name, Course Duration).
Relationships: How entities interact with each other (e.g., Student enrolls in Course).

Example of ERD:

A Student entity might have attributes like ID, Name, and Age.
A Course entity might have attributes like CourseID, CourseName.
A relationship Enrolls connects Student and Course with attributes like EnrollmentDate.

14. What is denormalization? How does it differ from normalization?

Denormalization is the process of combining tables to improve query performance, often by introducing redundancy. While normalization minimizes redundancy, denormalization sacrifices some of it to improve speed for read-heavy operations.

Difference:

Normalization: Focuses on minimizing redundancy and improving data integrity.
Denormalization: Adds redundancy for performance optimization in some cases.

15. Can a schema be in 2NF but still have insertion anomalies? Justify your answer.

Yes. 2NF removes partial dependencies but not transitive dependencies. If a non-prime attribute depends transitively on a key, it can still cause insertion/deletion anomalies. Only 3NF or BCNF removes these anomalies.

Example:

Relation: Student(CourseID, StudentID, StudentName, InstructorName)

Functional Dependencies:

{CourseID, StudentID} -> StudentName
CourseID -> InstructorName
Candidate Key = {CourseID, StudentID}

Check 2NF:

The key is composite ({CourseID, StudentID}).
No partial dependency exists because:
StudentName depends on the full key.
InstructorName depends only on CourseID (which is part of the key).
Relation is in 2NF.

Problem (Insertion Anomaly): To insert a new course with its instructor (say CourseID = C10, Instructor = Dr. Rao), we must also insert at least one student record. This shows an insertion anomaly caused by the transitive dependency CourseID -> InstructorName.

16. How do multivalued dependencies affect normalization, and how are they resolved?

Multivalued dependencies (MVDs) occur when one attribute determines multiple independent values of another. They're not resolved in BCNF but in 4NF. Decomposition is used to eliminate MVD anomalies.

Impact on Normalization:

MVDs are not handled by 1NF, 2NF, 3NF, or BCNF.
They lead to data redundancy and update anomalies, similar to functional dependencies, but across sets of values.
Example anomaly: If a student can have multiple hobbies and multiple skills, storing them in a single table causes unnecessary repetition of combinations.

Example: Relation: Student(SID, Hobby, Skill)

MVDs: SID ->-> Hobby & SID ->-> Skill
Problem: A student with 2 hobbies and 2 skills produces 4 redundant tuples (all possible hobby-skill combinations).

Resolution - 4NF:

4NF extends BCNF to remove redundancy caused by MVDs.
Rule: A relation is in 4NF if, for every non-trivial MVD X ->-> Y, X is a superkey.

Decomposition: Split the relation into two independent ones:

StudentHobby(SID, Hobby)
StudentSkill(SID, Skill)

Note: This decomposition is lossless and eliminates redundancy since hobbies and skills are stored separately. Multivalued dependencies cause repetition and anomalies in schemas. They are not resolved in BCNF but are eliminated by decomposing relations into 4NF, ensuring each fact is stored independently.

17. What are constraints in DBMS?

Constraints in DBMS are rules that limit the type of data that can be inserted into a table to ensure data integrity and consistency. Common types of constraints include:

NOT NULL
PRIMARY KEY
FOREIGN KEY
UNIQUE, CHECK, DEFAULT.

18 What are the risks of over-normalization in schema design? Provide a scenario.

Over-normalization happens when a schema is broken down into too many small relations in pursuit of strict normal forms. While it reduces redundancy, it introduces practical challenges:

Risks:

Excessive Joins -> Performance Overhead: Queries require joining multiple tables, which increases execution time and complexity.
Reduced Query Readability: Developers must write longer, harder-to-maintain queries.
Application Complexity: ORMs or application logic become complicated due to many relations.
Possible Denormalization Later: DBAs may need to deliberately add redundancy back (denormalization) for performance optimization.

Scenario Example: Suppose we design a Customer Address schema and normalize it excessively:

Customer(cust_id, name, address_id)
Address(address_id, city_id, street_id)
City(city_id, state_id, city_name)
State(state_id, state_name)

Note: While this is highly normalized, retrieving a full customer address (state + city + street) requires 3–4 joins. In practice, a single CustomerAddress table (cust_id, street, city, state) would be more efficient without significant redundancy issues.

19. Why can some schemas not be decomposed into BCNF without losing dependency preservation?

Some decompositions eliminate redundancy (BCNF) but do not preserve all original functional dependencies. A classic example is when R(A, B, C) with FDs A -> B, B -> C must be decomposed in a way that loses B -> C.

Problem:

Some schemas have overlapping dependencies where it is impossible to decompose into BCNF relations while still being able to enforce all original FDs using only the decomposed relations.
This creates a trade-off between BCNF (eliminating redundancy) and dependency preservation (enforcing all FDs locally).

20. How do surrogate keys impact schema design and normalization?

A surrogate key is an artificial key (like an auto-incremented ID) introduced to uniquely identify tuples when a natural key exists but is inconvenient to use.

Impact on Schema Design:

Surrogate keys simplify joins and schema readability, especially when natural keys are composite or lengthy (e.g., PassportNo + CountryCode).
They make schema implementation-friendly but introduce extra attributes that do not carry business meaning.

Impact on Normalization:

Surrogate keys can mask functional dependencies (FDs).
Example: If SSN -> Name, DOB, and we replace SSN with a surrogate PersonID, the FD must now be documented as PersonID -> Name, DOB.
This may hide the real-world dependency (SSN -> Name, DOB) and make schema analysis harder.
Overuse of surrogate keys may lead to unnecessary normalization since natural FDs are not directly visible.
However, surrogate keys themselves do not violate normal forms; they just change how dependencies are represented.

21. Compare candidate key selection in hierarchical vs. relational models.

Hierarchical models use implicit pointer structures, making candidate key definition ambiguous. Relational models require explicit key declaration, enabling consistent dependency enforcement and normalization.

Aspect	Hierarchical Model	Relational Model
Data Organization	Tree (parent–child pointers)	Tables (relations)
Candidate Key Definition	Implicit (via path + parent linkage)	Explicit (declared in schema)
Uniqueness Enforcement	Based on navigation structure	Based on candidate key constraints
Normalization	Not directly supported	Strongly supported through key dependencies
Consistency	Ambiguous; path-dependent	Clear; schema-defined

Note: In hierarchical models, candidate key identification is implicit and path-dependent, making it ambiguous. In contrast, relational models explicitly declare candidate keys, ensuring consistent enforcement of dependencies and enabling systematic normalization.

22. What is the role of the Database Administrator (DBA)?

A Database Administrator (DBA) is responsible for managing and overseeing the entire database environment. Their key responsibilities include:

Database Installation & Configuration: Setting up and configuring database systems.
Performance Monitoring & Tuning: Ensuring queries and database operations run efficiently.
Backup and Recovery: Ensuring regular backups and providing recovery solutions in case of failures.
Performance Tuning: Monitoring and optimizing the database's performance.
Security Management: Managing user access, privileges, and enforcing security policies.
Data Integrity: Ensuring data consistency and integrity through constraints and checks.
Upgrades and Patches: Keeping the database software up-to-date with patches and upgrades.
Troubleshooting: Identifying and resolving database-related issues.

23. What is an entity-relationship diagram (ERD)?

An Entity-Relationship Diagram (ERD) is a visual representation of the entities within a system and the relationships between those entities. It is used in database design to model the structure of data and how different pieces of data relate to each other.

Entities: Objects or things within the system (e.g., Student, Course).
Attributes: Properties or details about an entity (e.g., Student Name, Course Duration).
Relationships: How entities interact with each other (e.g., Student enrolls in Course).

Example of ERD:

A Student entity might have attributes like ID, Name, and Age.
A Course entity might have attributes like CourseID, CourseName.
A relationship Enrolls connects Student and Course with attributes like EnrollmentDate.

24. What is the difference between a superkey and a candidate key?

Superkey: A set of one or more attributes that can uniquely identify a row in a table. It may contain unnecessary attributes.
Candidate Key: A minimal superkey that uniquely identifies a row, with no redundant attributes. A table can have multiple candidate keys, and one is chosen as the Primary Key.

25. How would you convert a ternary relationship in ER diagram into a relational schema without loss of information?

A ternary relationship is a relationship among three entities (say E1, E2, E3). To convert it into a relational schema without loss of information:

Create a new relation representing the ternary relationship.
Include the primary keys of all three participating entities as foreign keys in this new relation.
Add any attributes of the ternary relationship into the same relation.
The combination of the three foreign keys usually forms the primary key of the new relation (unless constraints specify otherwise).

26. Can a relation be in 3NF and still be lossy when decomposed? Explain with example.

Yes. 3NF doesn’t guarantee lossless decomposition unless we ensure that at least one relation in decomposition contains a candidate key. Without this, natural joins can introduce extra tuples.

Example: Consider relation R(A, B, C) with FDs: {A -> B, B -> C}

Candidate Key: {A}
Relation R is in 3NF: A -> B (A is key) & B -> C (C is non-prime, B is not a key, but C depends transitively on key A -> allowed in 3NF).

Decomposition Attempt: Decompose R into:

R1(A, B)
R2(B, C)
Intersection = {B}

Check for lossless join:

{B} -> A = Lossy
{B} -> C = Lossless (but C is only in R2, not in R1) -> Condition not satisfied -> Lossy decomposition.

Problem: When we join R1 and R2 back, extra tuples (spurious tuples) may appear because B is not a key in either relation.

27. Can transitive dependencies exist in a schema even if it’s in 1NF? Explain.

Yes. 1NF only ensures atomic values, not dependency structure. Transitive dependencies like A -> B, B -> C may still exist, which violate 3NF and cause redundancy and anomalies.

Note: Elimination of transitive dependencies is handled only in 3NF (or BCNF), not in 1NF.

28. What normalization form would best prevent update anomalies in a product-supplier database? Justify.

To prevent these anomalies, the schema should be normalized to at least Third Normal Form (3NF), and preferably Boyce-Codd Normal Form (BCNF) if possible.

Example (Unnormalized Design): ProductSupplier(ProductID, ProductName, SupplierID, SupplierName, SupplierAddress)

Functional Dependencies (FDs):

ProductID -> ProductName, SupplierID
SupplierID -> SupplierName, SupplierAddress

Problems:

Supplier details (SupplierName, SupplierAddress) are repeated for every product supplied.
Updating supplier address requires multiple changes -> update anomaly.

Normalized Design (3NF/BCNF):

Product(ProductID, ProductName, SupplierID)
Supplier(SupplierID, SupplierName, SupplierAddress)
Now supplier information is stored once in Supplier.
Product table just references suppliers through a foreign key.

Justification:

3NF removes transitive dependencies (e.g., ProductID -> SupplierAddress via SupplierID).
BCNF goes one step further, ensuring every determinant is a candidate key, eliminating subtle redundancy cases.
In practice, 3NF is usually sufficient for avoiding anomalies, while BCNF is applied if the FDs allow without sacrificing dependency preservation.

Note: The best normalization form to prevent update anomalies in a product-supplier database is 3NF (or BCNF, if applicable). This ensures supplier details are separated from product details, removing redundancy and anomalies.

29. Design a schema for hospital patient data that allows tracking historical address changes efficiently.

To efficiently track historical address changes without repeating patient details, we can design the schema with two relations:

1. Patient Table: Stores core patient details that do not change frequently.

Patient(
patient_id PRIMARY KEY,
name VARCHAR(100),
dob DATE,
gender CHAR(1),
contact_number VARCHAR(15)
);

2. PatientAddressHistory Table: Stores all address records with start and end dates for each patient.

PatientAddressHistory(
history_id PRIMARY KEY,
patient_id FOREIGN KEY REFERENCES Patient(patient_id),
address VARCHAR(255),
city VARCHAR(100),
state VARCHAR(100),
postal_code VARCHAR(20),
start_date DATE,
end_date DATE -- NULL if it is the current address
);

How It Works:

When a patient changes address, a new row is inserted into PatientAddressHistory with the new address and start_date.
The previous record’s end_date is updated.
This avoids duplicating patient details (name, DOB, gender, etc.) for every address change.

30. Can two schemas have same FDs but differ in normalization? Explain.

Yes, Two schemas can have the same set of functional dependencies (FDs) but be designed at different levels of normalization. This happens because normalization is about the structure of relations, not about the FDs themselves. The underlying FDs remain consistent, but how they are represented in tables (normalized vs denormalized design) can differ.

Example: Functional Dependencies

StudentID -> StudentName, DeptID
DeptID -> DeptName

Schema 1 (Unnormalized / 2NF):

Relation: Student(StudentID, StudentName, DeptID, DeptName)
Issues: Repetition of DeptName for every student in the same department. Leads to redundancy and anomalies.

Schema 2 (Normalized to 3NF):

Student(StudentID, StudentName, DeptID)
Department(DeptID, DeptName)

Same FDs hold (StudentID -> DeptID and DeptID -> DeptName).
But transitive dependency (StudentID -> DeptName) is eliminated by decomposition.

31. Can attribute redundancy ever be intentional? Justify with an example.

Yes. For caching or fast lookup, redundancy like duplicating a computed column (e.g., total cost = quantity * price) is acceptable when performance outweighs normalization benefits.

When Redundancy is Acceptable:

Precomputed/Derived Attributes: For example, Storing TotalCost = Quantity × Price in an Order table. Although it can be derived dynamically, pre-storing it avoids repeated computation in queries involving large datasets.
Denormalization for Performance: For example, In a data warehouse, redundant attributes are stored to reduce join operations and speed up analytical queries.
Caching Frequently Used Data: For example, Storing a customer’s FullName alongside FirstName and LastName to simplify display queries.
Distributed Databases / NoSQL Systems: Redundancy is often used to improve read performance and availability (e.g., duplicating user profile data across collections in MongoDB).

Example (Order Table):

OrderID	ProductID	Quantity	Price	TotalCost
101	P1	3	100	300
102	P2	5	50	250

Here, TotalCost is redundant (can be computed as Quantity × Price), but storing it saves computation time for frequent reporting queries.

32. Explain the difference between a primary key and a unique key.

Primary Key:

Uniquely identifies each record in a table.
Cannot contain NULL values.
A table can have only one primary key.

Unique Key:

Ensures that all values in a column (or a set of columns) are unique across all rows.
Can contain NULL values (unlike a primary key).
A table can have multiple unique keys.

33. What is referential integrity in DBMS?

Referential Integrity ensures that relationships between tables are maintained correctly. It requires that the foreign key in one table must match a primary key or a unique key in another table (or be NULL). This ensures that data consistency is maintained, and there are no orphan records in the database.

Example: In the Orders table, if the CustomerID is a foreign key, it should match a valid CustomerID in the Customers table or be NULL.

34. In what scenarios is 1NF violation acceptable in modern database systems?

Scenarios Where 1NF Violation is Acceptable:

Document Databases (e.g., MongoDB): A "Customer" document can embed multiple addresses or orders as nested arrays.