Quote of the Day

more Quotes

Categories

Buy me a coffee

Notes on Barbara Liskov paper on data abstraction and hierarchy

Published June 26, 2021 in Architecture - 0 Comments

In October 1987, Barbara Liskov published a research paper in which she discussed about different but related concepts: data abstraction, inheritance, encapsulation, implementation hierarch, type hierarchy and polymorphism. I’ve found the paper to be insightful and informative. In this post, I simply give a recap of what I have learned and share my thoughts from reading the paper.

Data abstraction, inheritance and encapsulation are fundamental concepts in object oriented programming. Data abstraction separates the implementation from the specification of a class, and having the callers of the class to rely on the specification. Encapsulation hides away the implementation details of a class such that no outside classes can depend on those details. Together, data abstraction and encapsulation help to reduce coupling in a program and allow classes to change independently from one another.

One way to design the implementation of a class is using inheritance which is a way of relating one class to another, with a superclass consisting of properties and behaviors from which subclasses “inherit”. Inheritance can be useful in sharing codes or extend the behaviors of a class. However, inheritance allows us to violate encapsulation. For example, when looking at a subclass, it is necessary to also look at the superclass to have a complete understanding of the code. If the superclass inherits from another class, we may need to also look at that class as well and so on. In addition, you may have variables and methods in a superclass that are inaccessible to outside classes but are accessible to subclasses. However, languages like C# and Java have built in mechanisms to help with encapsulation between related classes via inheritance. For example, in both C# and Java, you can declare a variable or a method of a class as private to make it accessible only from within the class, thus hiding the info from not just outsiders but also subclasses.

An alternative to inheritance to design the implementation of a class is using a class as a rep of another class. I suspect this is what we know today as composition. For example, you can have a class A that wraps around another class B to extend it. A call to a method in A may simply delegate to a method in B. In addition, A can extend B by doing some operations of its own before or after delegating to B. A can also define additional behaviors that B does not have. Using composition makes it harder to violate encapsulation. For instance, we can treat A as just another caller of B, and have A relies on the specification of B instead of the implementation.

When we base one class on another either via inheritance or composition, we end up with an Implementation hierarchy. When used properly, implementation hierarchy can be helpful to reuse codes and group related classes together. However, an implementation hierarchy does not allow us to do anything we cannot do with data abstraction. Implementation hierarchy by itself does not enhance data abstraction which is about separating the implementation from the specification.

The author discusses about another concept relating to implementation hierarchy, which is type hierarchy. A type hierarchy consists of subtypes and supertypes. The author uses “subtype” and “supertype” to semantically distinguish from “subclass” and “superclass” which are specific concepts in programming languages. However, type hierarchy has the following substitution property between a subtype and a supertype:

If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for oz, then S is a subtype of T.

Data abstraction and hierarchy

This property is later known as the Liskov Substitution Principle. This principle is also part of the SOLID principles.

In the paper, the author gives examples of types and subtypes. For instance, arrays are subtypes of collections. An array provides all the behaviors of a collection. In C#, you can initialize a collection with an array, and everything should work fine. Similarly, a HashMap is a subtype of a Map. A Golden Retriever is a type of Dog etc …

Some examples of types that are not subtypes of one another include lists vs sets. A list is not a subtype of a set and vice versa because their behaviors are different from each other. For instance, if an element already exists in a list, adding the same element again causes the size to increase by one. However, adding the same element again to a set is a no-op operation because the set cannot have duplicate elements. A more subtle example could be a superclass and a subclass which does not implement all the behaviors of the superclass. If you cannot substitute the superclass with the subclass without breaking the behavior of your program, then the classes are not type and subtype of each other. Similarly, if a concrete class implements an interface, but does not provide all the functionalities such that you cannot replace the interface with the concrete class, then you don’t have a type and a subtype.

Type hierarchy is helpful in grouping related types together, incremental design, and organizing types in a library. Obviously, a supertype and its subtypes are related to one another. In addition, using type hierarchy can help with incremental design. In an early design stage, a programmer may come up with three types: P, Q, and T. Communication between P and Q is via T. The programmer initially defines some operations for T, but may not know all the features he needs. In studying Q, he may come up with additional behaviors for T, and create S, a subtype of T. Type hierarchy may also help with organizing types in a library. The author suggests that similar types in a library usually are placed together or close together such that when looking for a type, a user can see all the related types.

Polymorphism occurs when a procedure or data structure can work on different types. When you use inheritance and have implementation or type hierarchy, you likely have polymorphism. For example, a function that accepts a superclass as a parameter may not care whether you pass it a subclass of the superclass, as long as the subclass inherits the properties and/or the behaviors of the superclass. Similarly, in C# or Java, you may define an interface and provide multiple implementations for the interface. Classes that depend on the interface can work with any of the concrete classes that implement the interface.

Overall, the paper gave me an insight of the Liskov Substitution Principle, and how the core concepts in object oriented programming relate to one another. The full PDF version of the paper can be found here.

References

Data abstraction and hierarchy

Clean Architecture: A Craftsman’s Guide to Software Structure and Design (Robert C. Martin Series)

Liskov Substitution Principle

No comments yet