Database is full of datas and records. What we see in rows and columns is quite different when it reaches the memory. What we see is the actual data. But when they are stored in the memory like disks or tapes, they are stored in the form of bits. But any users will not understand these bits. He needs to see the actual data to understand. But all the details about the data stored in the memory are not necessary for the users. He needs only little information that he is interested or wants to work with. Masking the unwanted data from the users happens at different levels in the database. This masking of data is called data abstraction. There are 4 levels of data abstraction.
- External level - This is the highest level in data abstraction. At this level users see the data in the form of rows and columns. This level illustrates the users how the data is stored in terms of tables and relations. Users view full or partial data based on the business requirement. The users will have different views here, based on their levels of access rights. For example, student will not have access to see Lecturers salary details, one employee will not have access to see other employees details, unless he is a manager. At this level, one can access the data from database and perform some calculations based on the data. For example calculate the tax from the salary of employee, calculate CGPA of a Student, Calculate age of a person from his Date of Birth etc. These users can be real users or any programs.
- Logical/ Conceptual level - This is the next level of abstraction. It describes the actual data stored in the database in the form of tables and relates them by means of mapping. This level will not have any information on what a user views at external level. This level will have all the data in the database.
- Internal level - This is one of the intermediary levels. In most of the cases this level is not mentioned and usually it is said that we have 3 levels of data abstraction. This level depends on the DBMS software. This level is how the database is seen from DBMS. We can even combine logical level and this level.
- Physical level - This is the lowest level in data abstraction. This level describes how the data is actually stored in the physical memory like magnetic tapes, hard disks etc. In this level the file organization methods like hashing, sequential, B+ tree comes into picture. At this level, developer would know the requirement, size and accessing frequency of the records clearly. So designing this level will not be much complex for him.
Any changes/ computations done at this level will not affect other levels of data. That means, if we retrieve the few columns of the STUDENT table, it will not change the whole table, or if we calculate the CGPA of a Student, it will not change/update the table. This level of data is based on the below levels, but it will not alter the data at below levels.
Any changes done in this level will not affect the external or physical levels of data. That is any changes to the table structure or the relation will not modify the data that the user is viewing at the external view or the storage at the physical level. For example, suppose we have added a new column ‘skills’ which will not modify the external view data on which the user was viewing Ages of the students. Similarly, it will have space allocated for ‘Skills’ in the physical memory, but it will not modify the space or address of Date of Birth (using which Age will be derived) in the memory. Hence external and physical independence is achieved.