Tuesday, November 22, 2005

Data Polymorphism

INTRODUCTION
----------------------------

Languages that support polymorphism seem to require functions to model polymorphic requirements. This suffices in most cases, but there are cases in which only the data is polymorphic: it takes a different form in a different context.

While modeling data polymorphism is possible through the use of templates (to be discussed in a later post), this article makes a case for the language to allow interfaces to have abstract data members, whose types are unknown in the base class and become meaningful only in the derived context.

Interfaces (and Abstract Base Classes) are an expressive notation for mandating the implementation of required functions in derived classes. Why not utilize the same notation to accommodate data as well?

In a nutshell, the following is possible :-

class Abstract
{
    virtual void pureVirtualFunction()=0; //Unknown behaviour, specifiable
    void concreteFunction()
    {
        pureVirtualFunction(); // We can call the undefined function
    }
};

But the following is not :-

class Abstract
{
    virtual pureVirtualData; //Unknown data, Not allowed

    void concreteFunction()
    {
        print pureVirtualData.Count; // We know how to use the data though
    }
};

What does this imply? Could there be cases where such a feature would help?

Some months back, I encountered a requirement which, I thought, really called out for more direct support for data polymorphism in the language. I explain the problem in a generalized way, leaving out some domain-specific details.

Please do let me know if more details are needed to understand the problem and why the limitations mentioned are significant.

THE REQUIREMENT
--------------------------------

An algorithm needs to be implemented on the transmissions of an existing client-server system to improve its performance. Broadly, the algorithm transparently intercepts the packets, applies some transforms based on the packet and relays them.

In order to implement the algorithm, components should be deployed at both the server and the client. The algorithm depends on the functionality in a packet and a factor which is determined by whether the component is at the server or at the client.

DESIGN WITH BEHAVIOURAL POLYMORPHISM
----------------------------------------------------------------------

Packet is a concrete class which provides a buffer-storage area and some domain-specific functionality.

class Packet
{
    Byte Buffer[];

    virtual double toDouble();
    virtual int toInteger();

    void applyXXXToPacket();
    ...//Other domain-specific functions
    ...
};

The algorithm uses Packet functions to transform the packets by a factor derived from Packet::toDouble() and Packet::toInteger(). Both conversions must be done slightly differently from the implementations in the Packet class, and also differently on the client and server sides.

We have the inheritance,
ServerPacket: public Packet, and we reimplement toInteger and toDouble for server
ClientPacket: public Packet, and we reimplement toInteger and toDouble for client

Algorithm components only need to apply some transforms on Packets. We generalize to the abstraction,
class AlgorithmComponent
{
    Packet& p; //Reference type required to achieve behavioural polymorphism

    void ProcessPacketXXX();
    ... //Functions which use Packet::functions
    ... //
};

The component implementing the algorithm only differs slightly on the server and client sides. But they do differ and so we have the specializations,

ServerComponent: public AlgorithmComponent
ClientComponent: public AlgorithmComponent
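
The design up to this point can be sketched in compilable form as below. This is only an illustration under assumptions: the conversion bodies, the byte-vector buffer and the name processPacket() are hypothetical stand-ins, since the real conversions and the ProcessPacketXXX() functions are domain-specific and not shown in this post.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

class Packet {
public:
    explicit Packet(std::vector<std::uint8_t> bytes) : buffer(std::move(bytes)) {}
    virtual ~Packet() = default;
    // Placeholder conversion (assumption): the real one is domain-specific.
    virtual int toInteger() const { return static_cast<int>(buffer.size()); }
protected:
    std::vector<std::uint8_t> buffer;
};

class ServerPacket : public Packet {
public:
    using Packet::Packet;
    // Hypothetical server-side variant of the conversion.
    int toInteger() const override { return static_cast<int>(buffer.size()) * 2; }
};

class ClientPacket : public Packet {
public:
    using Packet::Packet;
    // Hypothetical client-side variant of the conversion.
    int toInteger() const override { return static_cast<int>(buffer.size()) + 1; }
};

class AlgorithmComponent {
public:
    explicit AlgorithmComponent(Packet& packet) : p(packet) {}
    // Written once in the base class; the virtual call through the
    // reference selects the server or client conversion at run time.
    int processPacket() const { return p.toInteger() * 10; }
private:
    Packet& p; // reference type, so the call is dispatched dynamically
};
```

Every call to processPacket() re-runs the conversion through the virtual function, which matters for what follows.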

It worked! But...

PERFORMANCE ISSUES...
--------------------------------------

It turned out that both the required conversion functions were very expensive, leading to erratic performance. All that was really required was a one-time computation of each conversion, which could be done in the constructors of the specialized Packets. Saving the results in member variables would solve the performance issues. The ClientComponent and the ServerComponent could then use the saved values directly.
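
The one-time computation could be sketched roughly as follows. The conversion body, the string buffer and the names cachedInteger and conversions are assumptions made for illustration; the counter exists only to show that the expensive path runs once per packet.

```cpp
#include <string>
#include <utility>

class Packet {
public:
    explicit Packet(std::string bytes) : buffer(std::move(bytes)) {}
    virtual ~Packet() = default;
protected:
    std::string buffer;
};

class ServerPacket : public Packet {
public:
    explicit ServerPacket(std::string bytes)
        : Packet(std::move(bytes)), cachedInteger(convert()) {}

    int cachedInteger;       // computed once in the constructor, read many times
    static int conversions;  // how many times the expensive path has run

private:
    // Hypothetical stand-in for the expensive conversion: sums the bytes.
    int convert() const {
        ++conversions;
        int sum = 0;
        for (unsigned char c : buffer) sum += c;
        return sum;
    }
};

int ServerPacket::conversions = 0;
```

Note that cachedInteger belongs to ServerPacket, so it is visible only through the derived type.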

But it is not possible for the generalized AlgorithmComponent to access a data member of the generalized Packet class through a Packet reference and get the values set by the derived classes.

Consider the code snippet below.

#include <iostream>
using namespace std;

class Base
{
public:
    int i;
    Base(){i=2;} //In Base, i is 2
};
class Derived: public Base
{
public:
    int i; //Hides Base::i, does not override it
    Derived(){i=5;} //In Derived, i is 5
};

int main()
{
    Derived d;
    Base& b=d;
    cout << b.i; //Prints 2, not 5. No polymorphism for data access. :-(
}

This means that the abstraction of AlgorithmComponent would collapse - breaking the class hierarchy.

...AND A TRIVIAL(?) VIOLATION OF A DESIGN CONSTRAINT
------------------------------------------------------------------------

An AlgorithmComponent necessarily has to be either server-side or client-side. It was just a design abstraction which accurately summarized the operation of the algorithm. AlgorithmComponent should not be instantiable; it is an abstract base class by nature.

Implementing this particular design constraint is possible if we make the functions which depend on Packet pure virtual. Technically, for AlgorithmComponent to be abstract, we need to make only one of the ProcessPacketXXX()s pure virtual. But that would not accurately model the dependency on Packet for the other functions.

Functions that depend on Packet can be completely specified. We could avoid code duplication if we code it in the base class itself. The abstractness of AlgorithmComponent is because of its dependency on Packet. So, Packet is what should be pure virtual really.

What we should have had was an AlgorithmComponent with completely specified functions but with an unspecified Packet that had to be compulsorily supplied by the derived class.
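
For comparison, the nearest approximation in the language as it stands is probably a pure virtual accessor, sketched below. This is an assumption-laden illustration and not the approach we settled on: packet(), processPacket() and the placeholder return values are hypothetical, and the dependency is still expressed as a function rather than as data.

```cpp
#include <type_traits>

class Packet {
public:
    virtual ~Packet() = default;
    virtual int toInteger() const { return 0; }
};

class ServerPacket : public Packet {
public:
    int toInteger() const override { return 7; } // placeholder value
};

class AlgorithmComponent {
public:
    virtual ~AlgorithmComponent() = default;
    // Completely specified in the base class, with no duplication.
    int processPacket() { return packet().toInteger() + 1; }
protected:
    // The "unspecified Packet": every derived class must supply one.
    virtual Packet& packet() = 0;
};

class ServerComponent : public AlgorithmComponent {
protected:
    Packet& packet() override { return p; }
private:
    ServerPacket p;
};

// The base class is abstract only because of its dependency on Packet.
static_assert(std::is_abstract<AlgorithmComponent>::value,
              "AlgorithmComponent must not be instantiable");
```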

CONCLUSION
----------------------

What if, in the discussed example, ServerComponent depended on a double/class Foo and ClientComponent depended on an int/class Bar? Shouldn't abstract base classes be allowed to have undefined data members? Or equivalently, why not allow data members in interfaces?

Workarounds to these do exist:
1) With templates (the discussion stopped just short of it, don't you think?), but that would be a compile-time solution. This will be discussed in a later post.
2) With composition instead of inheritance (which is how we finally chose to implement it), but this lacks the simplicity and elegance of the design explained above, and brings code duplication and what not.
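
As a rough idea of the first workaround, and without anticipating the later post, a template parameter can stand in for the undefined data member. The names Data, data and doubled() and the initial values are hypothetical, chosen to mirror the double/int example above.

```cpp
// The base class is fully specified, but the type of its data member is
// left open and fixed by each derived class at compile time.
template <typename Data>
class AlgorithmComponent {
public:
    explicit AlgorithmComponent(Data d) : data(d) {}
    // Works for any Data that supports operator+.
    Data doubled() const { return data + data; }
protected:
    Data data;
};

class ServerComponent : public AlgorithmComponent<double> {
public:
    ServerComponent() : AlgorithmComponent<double>(1.5) {}
};

class ClientComponent : public AlgorithmComponent<int> {
public:
    ClientComponent() : AlgorithmComponent<int>(2) {}
};
```

ServerComponent and ClientComponent no longer share a common base type here, so they cannot be handled through a single AlgorithmComponent reference at run time. That is the compile-time limitation mentioned in the first workaround.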

Would it not be simpler and cleaner to express such a design if the language just allowed the data also to be truly polymorphic?

- Thomas Jay Cubb

1 comment:

Thomas Jay Cubb said...

Version 1.0

Some time back, I encountered a situation which, I thought, required "data" polymorphism support in the language. Abstract base classes are not allowed to have undefined data members.

REQUIREMENT
-------------------
An algorithm needs to be implemented on the transmissions of an existing client-server system to improve its performance. Broadly, the algorithm transparently intercepts the packets, applies some transforms based on the packet-parameters and relays them.

In order to implement the algorithm, components should be deployed at both the server and the client. The algorithm depends on the functionality in a packet and a factor which is determined by whether the component is at the server or at the client.

Below is a possible design for the requirement.

DESIGN WITH BEHAVIOURAL POLYMORPHISM
------------------------------------------
Packet is a concrete class which provides a buffer-storage area, some domain-specific functionality and two basic conversion functions, say, Packet::toString() and Packet::toInteger().

class Packet
{
    Byte Buffer[];

    virtual string toString();
    virtual int toInteger();

    void applyXXXToPacket();
    ... //Other functions
    ...
};

The algorithm only requires that toString and toInteger be computed differently at the server and at the client.
Inheritance
ServerPacket: public Packet
ClientPacket: public Packet
and we implement the required functions.

Algorithm components only need to apply some transforms on Packets. We generalize,

class AlgorithmComponent
{
    Packet& p; //Reference type required to achieve behavioural polymorphism

    void ProcessPacketXXX();
    ... //There are many such functions which depend on Packet's conversion functions
    ... //
};

The component implementing the algorithm only differs slightly on the server and client sides.
Inheritance
ServerComponent: public AlgorithmComponent
ClientComponent: public AlgorithmComponent

Both specializations need some extra maintenance functionality as well, different at client- and server- sides.

PERFORMANCE ISSUES...
----------------------
It turned out that both the required conversion functions were very expensive, leading to erratic performance. All that was really required was a one-time computation of each conversion, which could be done in the constructors of the specialized Packets. Saving the results in variables would solve the performance issues.

But this could not be done without breaking the class hierarchy and the resulting code. It is not possible to specify a data member in an interface.

...AND A TRIVIAL VIOLATION OF DESIGN
-------------------------------------------------
Now, an AlgorithmComponent has to be either server-side or client-side. So AlgorithmComponent should not be instantiable; it is an abstract base class by nature.

Implementing this design constraint is possible only if we make (at least one of) ProcessPacketXXX() pure virtual. Functions that depend on Packet can be completely specified. So we should have been able to implement those completely in the class AlgorithmComponent itself, avoiding code duplication.

The pure-abstractness of AlgorithmComponent is only because of its dependency on Packet. So, Packet is what should be pure virtual really.

(Making the constructor protected to prevent instantiation would be an ugly hack and quite unnecessary from a design perspective.)

IN A NUTSHELL
------------------
The following is possible :-

class Abstract
{
    virtual void pureVirtualFunction()=0;
    void concreteFunction() {
        pureVirtualFunction(); // We can call the undefined function
    }
};

But the following is not :-

class Abstract
{
    virtual int Count;

    void displayCount() {
        cout << Count;
    }
};

Shouldn't abstract base classes be allowed to have undefined data members?
Or equivalently, why not allow data members in interfaces?

Workarounds to these do exist - with templates (aaaaarrrrrghhh) or with composition instead of inheritance (which is how we implemented it finally). But, would it not be simpler and cleaner this way - if we just allowed data to be polymorphic?

- Thomas Jay Cubb