This project is a database management system designed to efficiently handle both transactional and
aggregate-query workloads. The system consists of three main components: the Transaction Manager, the
Data Manager, and the Scheduler. In this overview, I’ll outline the requirements for each one.
Database Schema
For this project, the system will support a simplified version of a Coffee Chain Management System,
where all records (tuples) have the following fields (schema):
• coffeeID: 4-byte integer that identifies the coffee type. This attribute is also the Primary Key.
• coffeeName: 8-byte string that holds the name of the coffee type.
• intensity: 4-byte integer that represents the flavor profile of the coffee type.
• countryOfOrigin: 8-byte string that identifies the coffee’s country of origin.
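Taken together, each record occupies 24 bytes (4 + 8 + 4 + 8). For illustration, here is a minimal
fixed-width packing sketch in Python; the null-padding convention and little-endian byte order are
assumptions, not part of the spec:

import struct

# 4-byte int, 8-byte string, 4-byte int, 8-byte string -> 24 bytes per record.
RECORD_FORMAT = "<i8si8s"  # little-endian layout is an assumption
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # == 24

def pack_record(coffee_id, coffee_name, intensity, country):
    # Strings are truncated/null-padded to exactly 8 bytes (assumed convention).
    return struct.pack(RECORD_FORMAT, coffee_id,
                       coffee_name.encode()[:8], intensity, country.encode()[:8])

def unpack_record(buf):
    cid, name, intensity, country = struct.unpack(RECORD_FORMAT, buf)
    return cid, name.rstrip(b"\x00").decode(), intensity, country.rstrip(b"\x00").decode()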
Data Manager
• Disk Organization:
Records are kept on persistent storage (disk) in a columnar fashion, meaning that each attribute of a table
is kept in a separate data file. Each data file consists of slotted pages of 512 bytes each. You need
to design the organization of each column data file (hash, ordered, or heap file), given that
intensity is updated frequently, whereas coffeeID, coffeeName, and countryOfOrigin are rarely
updated. For hash files, use the hash function h(x) = x mod 16. Collisions are resolved with chaining, by
appending new slotted pages at the end of the file as necessary.
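As a sketch of how a lookup might walk a bucket’s overflow chain, assuming each page carries a
records dict and a next link (this page layout is purely illustrative, not prescribed by the spec):

NUM_BUCKETS = 16

def bucket_of(coffee_id):
    # h(x) = x mod 16, as required by the spec.
    return coffee_id % NUM_BUCKETS

def find_in_bucket(pages, bucket, coffee_id):
    # `pages` maps page_id -> {"records": {...}, "next": page_id or None};
    # page ids 0..15 are assumed to be the primary bucket pages.
    page_id = bucket
    while page_id is not None:
        page = pages[page_id]
        if coffee_id in page["records"]:
            return page["records"][coffee_id]
        page_id = page["next"]  # follow the overflow chain
    return None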
You need to define a Bloom filter access method on the coffeeID attribute in order to optimize insert and
update operations. The goal is a false-positive rate of 2% using two hash functions. The attribute coffeeID
is unique and randomly generated by the store’s manager out of a space of 64K possible values.
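For sizing, the standard approximation p ≈ (1 − e^(−kn/m))^k with k = 2 and p = 0.02 yields
m ≈ 13.1 bits per expected key. Below is a minimal sketch, where n_expected (the number of keys you
plan to insert) and the double-hashing scheme are assumptions for illustration:

import math
import hashlib

K = 2             # number of hash functions (given by the spec)
TARGET_FP = 0.02  # 2% false-positive target (given by the spec)

def bits_needed(n):
    # Solve p = (1 - e^(-K*n/m))^K for m:
    # m = -K*n / ln(1 - p^(1/K)) ~= 13.1 * n bits for K = 2, p = 0.02.
    return math.ceil(-K * n / math.log(1 - TARGET_FP ** (1 / K)))

class BloomFilter:
    def __init__(self, n_expected):
        self.m = bits_needed(n_expected)
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Two positions derived from one SHA-256 digest; the exact hashing
        # scheme is an assumption, not mandated by the spec.
        digest = hashlib.sha256(str(key).encode()).digest()
        return (int.from_bytes(digest[:8], "little") % self.m,
                int.from_bytes(digest[8:16], "little") % self.m)

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))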
In addition to the required Bloom filter access method, you should include any auxiliary structure(s) you
plan to use (e.g., access methods) to speed up access to the data.
• Main Memory Organization:
Records must be brought into main memory (the database buffer) before they can be manipulated. However, the
number of buffer pages available in the database buffer is limited and must be specified at the beginning
of each execution. You are required to implement the Least Recently Used (LRU) page replacement policy
to swap pages in and out.
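One minimal way to sketch LRU is an ordered dictionary, assuming load and write_back callbacks for
disk I/O (the names below are illustrative, not required by the spec):

from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity, write_back):
        self.capacity = capacity      # number of buffer pages, set at startup
        self.pages = OrderedDict()    # page_id -> (page, dirty)
        self.write_back = write_back  # called as write_back(page_id, page)

    def get(self, page_id, load):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id][0]
        if len(self.pages) >= self.capacity:
            victim, (page, dirty) = self.pages.popitem(last=False)  # evict LRU page
            if dirty:
                self.write_back(victim, page)  # propagate updates to disk
        self.pages[page_id] = (load(page_id), False)  # swap the page in
        return self.pages[page_id][0]

    def mark_dirty(self, page_id):
        page, _ = self.pages[page_id]
        self.pages[page_id] = (page, True)
        self.pages.move_to_end(page_id)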
The available database buffer should be divided equally between the row and column layouts. Assume
a page size of 512 bytes. Recall that the operations R, I, U, and T are performed on the row buffer,
and the operations M and G on the column buffer. Hint: do not prefetch column values into the column
buffer in advance; the columnar store is retrieved on demand. All updates made to the row buffer
should be propagated to disk when the corresponding pages are swapped out of the database buffer, or
when the script execution completes by invoking Q (quit).
All meta-data and control structures in main memory (such as page tables and the recovery log) that are
needed for efficient processing are kept outside of the database buffer. Unlike the database buffer, there is
always sufficient space for meta-data and control structures.
You must keep the data synchronized and consistent between the row buffer, the column buffer, and the
disk while executing multiple transactions simultaneously. The Data Manager also ensures transaction
atomicity by adopting an undo recovery strategy in which all before images (i.e., the recovery log) are
kept in a main-memory buffer and discarded at commit or abort time, when they are no longer needed.
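One way to realize this strategy: record a before image the first time a transaction touches a record,
discard the images at commit, and replay them in reverse order at abort. A minimal sketch, with an
assumed in-memory layout and a hypothetical restore callback that writes an image back to the buffer:

class UndoLog:
    def __init__(self):
        # txn_id -> list of (table, coffee_id, before_image); kept in main
        # memory, outside the database buffer, as the spec requires.
        self.before_images = {}

    def record(self, txn_id, table, coffee_id, before_image):
        # Only the first before image per record matters for undo.
        log = self.before_images.setdefault(txn_id, [])
        if not any(t == table and k == coffee_id for t, k, _ in log):
            log.append((table, coffee_id, before_image))

    def commit(self, txn_id):
        self.before_images.pop(txn_id, None)  # images no longer needed

    def abort(self, txn_id, restore):
        # Undo in reverse order of the original updates.
        for table, key, image in reversed(self.before_images.pop(txn_id, [])):
            restore(table, key, image)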
• Integration with the Transaction Manager:
In the first phase, you implemented the Transaction Manager to handle each of the discussed operations and
write them out to a simple log file. In this phase, you are also required to integrate the Transaction
Manager with the Data Manager by passing operations to the Data Manager for execution.
Similar to Phase 1, the Data Manager should keep a log file in which it records all of its actions, such
as performed operations, creating new pages, and swapping existing pages in and out of main memory. The
expected logging format is as follows:
R Starbucks 13
SWAP OUT T-Starbucks P-6 B-11
SWAP OUT T-Starbucks P-2 B-11
SWAP IN T-Starbucks P-8 B-13
Read: 13, Mocha, 4, Yemen
I RedHawk (18, Americano, 10, Italy)
CREATE T-RedHawk P-15 B-2
SWAP IN T-RedHawk P-15 B-2
Inserted: 18, Americano, 10, Italy
M Starbucks France
MRead: Frappe
MRead: Iced Coffee
G Starbucks 4
GCount: 89
Note that T-Starbucks P-6 B-11 means Table Starbucks, Page 6, Hash-Bucket 11.
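A small helper that emits lines in exactly this format can keep the log consistent as the project
grows; the function names below are illustrative, not required:

def log_swap(logfile, direction, table, page, bucket):
    # direction is "IN" or "OUT"; produces e.g. "SWAP OUT T-Starbucks P-6 B-11"
    logfile.write(f"SWAP {direction} T-{table} P-{page} B-{bucket}\n")

def log_create(logfile, table, page, bucket):
    # Produces e.g. "CREATE T-RedHawk P-15 B-2"
    logfile.write(f"CREATE T-{table} P-{page} B-{bucket}\n")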
Logging will significantly help you while debugging your project. You should develop it incrementally
as you add features.