Tuesday, November 29, 2005

Heart For The GNU

This short poem tries to encapsulate some Linux history.

Linus Torvalds initially wanted to modify Andrew Tanenbaum's MINIX operating system. But Tanenbaum did not give him permission, so he set about writing his own operating system. This became the kernel for the GNU project, which aimed at making a free operating system.

--------------------------------
A HEART FOR THE GNU
--------------------------------
Tanenbaum forbade,
Linus obeyed.
Fiat Linux!
Freedom redux.

- Thomas Jay Cubb


Notes
--------
1. "Fiat Lux" is a Latin phrase meaning "let there be light", from the Book of Genesis, which relates God's creation of the world. Linus, the Creator! So, Fiat Linux.
2. redux = brought back. The GNU OS project was losing steam and face for lack of a suitable kernel for its operating system. The Linux kernel, so to speak, resurrected it. (GNU/Linux)


You may also be interested in reading Punnix.

Monday, November 28, 2005

On The Hacked Track

1. Viewing the dependencies of a binary
ldd gives you the dynamic libraries that a file depends on.
$ ldd binaryfile

2. Viewing the symbols in a file
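nm gives you the list of symbols in a file, whether an executable or a library. Functions that are exported by a library will be present as symbols. Many files are stripped of symbols before they are released, though; even then, nm -D lists the dynamic symbols that a shared library exports.
$ nm binaryfile
$ nm -D libraryfile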

3. Interposing
On many *NIX variants, you can make the dynamic linker search a library of your choice first when resolving dynamic library calls.
Set the value of LD_PRELOAD to the full path of the library file whose definitions you want used in preference to those in other libraries.

Take an executable and determine its dependencies (ldd). View the symbols in those library files (nm). Build your own library file with implementations of the symbols that seem relevant to you. Set the LD_PRELOAD variable to the full path of your library file. Run the executable.
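As a minimal sketch of those steps, assuming g++ on a glibc system: the hypothetical library below interposes on rand(), counts how often it is called, and forwards each call to the next definition in the library search order (normally libc's) using dlsym(RTLD_NEXT, ...). The choice of rand(), the file name countrand.cpp and the helper interposedRandCalls() are illustrative assumptions, not part of any real tool.

```cpp
// countrand.cpp: hypothetical LD_PRELOAD interposer (illustrative sketch).
// Build, assuming g++ on glibc:  g++ -shared -fPIC -o libcountrand.so countrand.cpp -ldl
// Run:                           LD_PRELOAD=/full/path/libcountrand.so ./binaryfile
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes RTLD_NEXT in <dlfcn.h>
#endif
#include <cassert>
#include <dlfcn.h>

namespace {
int randCalls = 0;  // how many calls the interposed rand() has seen
}

// Same name and C linkage as the libc symbol that nm lists, so the dynamic
// linker binds the program's calls to this definition first.
extern "C" int rand(void) {
    using RandFn = int (*)(void);
    // Find the next "rand" in the library search order (normally libc's).
    static const RandFn realRand =
        reinterpret_cast<RandFn>(dlsym(RTLD_NEXT, "rand"));
    assert(realRand != nullptr && "no underlying rand() found");
    ++randCalls;
    return realRand ? realRand() : 0;
}

// Hypothetical helper so the call count can be inspected.
extern "C" int interposedRandCalls(void) { return randCalls; }
```

Forwarding through RTLD_NEXT keeps the original behaviour intact while letting you observe or alter it; replacing the body entirely is equally possible if you only need a stub.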


Tuesday, November 22, 2005

Data Polymorphism

INTRODUCTION
----------------------------

Languages that support polymorphism seem to require functions to model polymorphic requirements. This suffices in most cases, but there are cases in which only the data is polymorphic: it is the data, not the behaviour, that takes a different form in a different context.

While modeling data polymorphism is possible through the use of templates (will be discussed in a later post), this article makes a case for the language to allow interfaces to have abstract data members, whose types are unknown in the base class and will be meaningful only in the derived context.

Interfaces (and Abstract Base Classes) are an expressive notation for mandating the implementation of required functions in derived classes. Why not utilize the same notation to accommodate data as well?

In a nutshell, the following is possible :-

class Abstract
{
public:
    virtual ~Abstract() {}
    virtual void pureVirtualFunction()=0; //Unknown behaviour, specifiable
    void concreteFunction()
    {
        pureVirtualFunction(); // We can call the undefined function
    }
};

But the following is not :-

class Abstract
{
    virtual pureVirtualData; //Unknown data, Not allowed

    void concreteFunction()
    {
        print pureVirtualData.Count; // We know how to use the data though
    }

};

What does this imply? Could there be cases where such a feature would help?

Some months back, I encountered a requirement which, I thought, really called out for more direct support for data polymorphism in the language. I explain the problem in a generalized way, leaving out some domain-specific details.

Please do let me know if more details are needed to understand the problem and why the limitations mentioned are significant.

THE REQUIREMENT
--------------------------------

An algorithm needs to be implemented on the transmissions of an existing client-server system to improve its performance. Broadly, the algorithm transparently intercepts the packets, applies some transforms based on the packet and relays them.

In order to implement the algorithm, components should be deployed at both the server and the client. The algorithm depends on the functionality in a packet and a factor which is determined by whether the component is at the server or at the client.

DESIGN WITH BEHAVIOURAL POLYMORPHISM
----------------------------------------------------------------------

Packet is a concrete class which provides a buffer-storage area and some domain-specific functionality.

class Packet
{
public:
    std::vector<Byte> Buffer; // buffer-storage area (Byte: the domain's byte type)

    virtual ~Packet() {}
    virtual double toDouble();
    virtual int toInteger();

    void applyXXXToPacket();
    // ... other domain-specific functions
};

The algorithm transforms packets using Packet functions and a factor based on Packet::toDouble() and Packet::toInteger(). Both conversions have to be done slightly differently from the implementations in the Packet class, and also differently on the client and server sides.

We have the inheritance,
ServerPacket: public Packet, and we reimplement toInteger and toDouble for server
ClientPacket: public Packet, and we reimplement toInteger and toDouble for client

Algorithm components only need to apply some transforms on Packets. We generalize to the abstraction,
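class AlgorithmComponent
{
public:
    explicit AlgorithmComponent(Packet& packet) : p(packet) {}
    virtual ~AlgorithmComponent() {}

    void ProcessPacketXXX();
    // ... functions which use Packet:: functions

protected:
    Packet& p; //Reference type required to achieve behavioural polymorphism
};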

The component implementing the algorithm only differs slightly on the server and client sides. But they do differ and so we have the specializations,

ServerComponent: public AlgorithmComponent
ClientComponent: public AlgorithmComponent

It worked! But...

PERFORMANCE ISSUES...
--------------------------------------

It turned out that both the required conversion functions were very expensive, leading to erratic performance. All that was really required was a one-time computation of each conversion, which could be done in the constructors of the specialized Packets. Saving the results in member variables would solve the performance issues. The ClientComponent and the ServerComponent could then use these saved values directly.
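A minimal sketch of that caching, assuming a byte buffer and placeholder conversions (the real conversions are domain-specific and not shown in the post): the expensive work runs once in the ServerPacket constructor, and the accessors just return the saved values. The conversionsRun counter is an illustrative addition that makes the one-time computation visible.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

using Byte = unsigned char;  // assumption: the domain's byte type

class Packet {
public:
    explicit Packet(std::vector<Byte> data) : Buffer(std::move(data)) {}
    virtual ~Packet() = default;

protected:
    std::vector<Byte> Buffer;  // buffer-storage area
};

// Hypothetical server-side packet: each expensive conversion runs once,
// in the constructor, and the result is saved in a member variable.
class ServerPacket : public Packet {
public:
    explicit ServerPacket(std::vector<Byte> data)
        : Packet(std::move(data)),
          cachedInteger(computeInteger()),
          cachedDouble(computeDouble()) {}

    // Cheap accessors: they return the saved values, no recomputation.
    int toInteger() const { return cachedInteger; }
    double toDouble() const { return cachedDouble; }

    inline static int conversionsRun = 0;  // counts runs of the expensive code

private:
    // Placeholders for the real, domain-specific conversions.
    int computeInteger() const {
        ++conversionsRun;
        return std::accumulate(Buffer.begin(), Buffer.end(), 0);
    }
    double computeDouble() const {
        ++conversionsRun;
        return Buffer.empty() ? 0.0
                              : std::accumulate(Buffer.begin(), Buffer.end(), 0.0) / Buffer.size();
    }

    int cachedInteger;  // declared, and therefore initialized, before cachedDouble
    double cachedDouble;
};
```

This removes the repeated cost, but the saved values live in ServerPacket itself, which is exactly what code holding only a Packet reference cannot reach as data.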

But it is not possible for the generalized AlgorithmComponent to access a data member of the generalized Packet class through a Packet reference and get the implementations of the derived classes.

Consider the code snippet below.

#include <iostream>

class Base {
public:
    int i;
    Base() { i = 2; }     // In Base, i is 2
};

class Derived : public Base {
public:
    int i;                // hides Base::i; it does not override it
    Derived() { i = 5; }  // In Derived, i is 5
};

int main()
{
    Derived d;
    Base& b = d;
    std::cout << b.i << std::endl; // Prints 2, not 5. No polymorphism for data access. :-(
    return 0;
}

This means that the abstraction of AlgorithmComponent would collapse - breaking the class hierarchy.

...AND A TRIVIAL(?) VIOLATION OF A DESIGN CONSTRAINT
------------------------------------------------------------------------

An AlgorithmComponent necessarily has to be either server-side or client-side. It was just a design abstraction that accurately summarized the operation of the algorithm. AlgorithmComponent should not be instantiable; it is an abstract base class by nature.

Implementing this particular design constraint is possible if we make the functions which depend on Packet pure virtual. Technically, for AlgorithmComponent to be abstract, we need to make only one of the ProcessPacketXXX()s pure virtual. But that would not accurately model the dependency on Packet for the other functions.

Functions that depend on Packet can be completely specified. We could avoid code duplication if we code it in the base class itself. The abstractness of AlgorithmComponent is because of its dependency on Packet. So, Packet is what should be pure virtual really.

What we should have had was an AlgorithmComponent with completely specified functions but with an unspecified Packet that had to be compulsorily supplied by the derived class.
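One way to approximate this in today's C++, sketched under assumptions: replace the missing "pure virtual data member" with a pure virtual accessor that each derived component must override to supply its Packet. The class names follow the post; the packet values 5 and 2 and the transform (multiply by 10) are placeholders. This still routes the data through a function, which is the indirection the post argues the language should make unnecessary.

```cpp
#include <cassert>

// Minimal stand-ins for the Packet hierarchy (values are illustrative assumptions).
class Packet {
public:
    virtual ~Packet() = default;
    virtual int toInteger() const { return 0; }  // placeholder base implementation
};

class ServerPacket : public Packet {
public:
    int toInteger() const override { return 5; }
};

class ClientPacket : public Packet {
public:
    int toInteger() const override { return 2; }
};

// The processing logic is written once, in the base class. What stays
// unspecified is which Packet the component works on; the pure virtual
// accessor forces every concrete component to supply one, and it keeps
// AlgorithmComponent itself non-instantiable.
class AlgorithmComponent {
public:
    virtual ~AlgorithmComponent() = default;

    // Fully specified, shared by server and client (hypothetical transform).
    int ProcessPacketXXX() const { return packet().toInteger() * 10; }

protected:
    virtual const Packet& packet() const = 0;  // stands in for "pure virtual data"
};

class ServerComponent : public AlgorithmComponent {
protected:
    const Packet& packet() const override { return p; }

private:
    ServerPacket p;
};

class ClientComponent : public AlgorithmComponent {
protected:
    const Packet& packet() const override { return p; }

private:
    ClientPacket p;
};
```

Both design constraints from the previous section hold here: the functions that use Packet are coded once, and AlgorithmComponent cannot be instantiated on its own.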

CONCLUSION
----------------------

What if, in the discussed example, ServerComponent depended on a double/class Foo and ClientComponent depended on an int/class Bar? Shouldn't abstract base classes be allowed to have undefined data members? Or equivalently, why not allow data members in interfaces?

Workarounds to these do exist -
1) with templates (the discussion stopped just short of it, don't you think?) - but that would be a compile-time solution. This will be discussed in a later post.
2) with composition instead of inheritance (which is how we finally chose to implement it) - but it lacks the simplicity and elegance of the explained design, with code duplication and what not.
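For the double/int case raised above, the template workaround might look roughly like the following sketch. It is one possible shape under stated assumptions, not necessarily what the later post has in mind: the data member's type becomes a template parameter, so it is fixed only in the derived context, and a protected constructor keeps the generic base from being instantiated directly. The transform in ProcessPacketXXX is a placeholder.

```cpp
#include <cassert>

// Hypothetical compile-time sketch: the member whose type is "unknown in the
// base" is declared with a template parameter instead.
template <typename Data>
class AlgorithmComponent {
public:
    // Fully specified once; works for any Data that supports operator*.
    Data ProcessPacketXXX(Data factor) const { return data * factor; }

protected:
    explicit AlgorithmComponent(Data d) : data(d) {}  // not instantiable directly

    Data data;  // concrete type known only in the derived context
};

// Server side depends on a double, client side on an int.
class ServerComponent : public AlgorithmComponent<double> {
public:
    explicit ServerComponent(double d) : AlgorithmComponent<double>(d) {}
};

class ClientComponent : public AlgorithmComponent<int> {
public:
    explicit ClientComponent(int i) : AlgorithmComponent<int>(i) {}
};
```

The trade-off is the compile-time nature noted in item 1: AlgorithmComponent<double> and AlgorithmComponent<int> are unrelated types, so there is no common base reference through which server and client components can be handled interchangeably at run time.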

Would it not be simpler and cleaner to express such a design if the language just allowed the data also to be truly polymorphic?

- Thomas Jay Cubb

Monday, November 21, 2005

Everything Changes

-------------------------------
EVERYTHING CHANGES
-------------------------------

Systems are built to satisfy the requirements of users. Successful systems -
a) meet those requirements and,
b) will continue to be used by the users, who would prefer it to any other.

It is possible for a system to succeed even if it is not flexibly designed. My system does the job and they like it! These are the requirements, my system is designed for that! Who cares that it is not flexible! I have done the requirement-collection to perfection. There's nothing more they could possibly want!

Now who wouldn't want his system to be successful? In fact, most people would want their creation to be indispensable! But with great power, comes great responsibility. Good design is all the more essential if your system has even the remotest chance of being successful - if you want to be a star, not just a shooting star.

Successful systems will be used by their users over a long period of time. Any successful system will, in the course of its lifetime, be required to accommodate changes. No matter how well you did the requirement-collection, over time the set of users will change and, by extension, so will the requirements! All of a sudden your "successful" system becomes a failure, and users choose to switch away from it.

We must be able to adapt, and quickly at that. Accommodating new requirements or changing the behaviour should not become an unnecessarily complex or time-consuming chore requiring the complete revalidation of the system. Resilience to change is one of the hallmarks of a good design.

Changes in a system may be required because -
a) The users of the system have new requirements. Wouldn't it be cool if this thing could do that as well?... I want to use my credit-card to pay..... or ... I want this bus to fly!
b) Better, more suitable ways of doing things may emerge only after the system is deployed. Would it not be better and simpler to do it like this?... I want the system to start when I press the end button...
c) There were slight errors in our understanding of the requirement. This is largely unavoidable, because only hindsight is 20/20. Oh, was that the order in which you wanted things to happen?... The person should be given the receipt before he pays....

While bus-to-airbus scenarios are well-nigh impossible to design for, with a little bit of thoughtful design, the other two can be designed for more easily. What's more, often you will find that many of the seemingly bus-to-airbus requirement-changes could have been handled in less cumbersome ways, if only you had thought about possible changes during design.

-Thomas Jay Cubb

Wednesday, November 16, 2005

SunOS vs Solaris

What's the difference between SunOS and Solaris?

SunOS refers to the actual operating system that underlies the Solaris OE (operating environment). SunOS is often used to refer to the old SunOS 4.x, a BSD-like operating system with some SVR4 features and OpenWindows.

Solaris is typically used to refer to SunOS 5.x releases of the operating system and environment from Sun Microsystems. From SunOS 5.7 onwards, the Solaris version can be derived from the SunOS 5.x designation by dropping the leading 5, e.g., Solaris 7 is SunOS 5.7. Earlier versions, however, were numbered 2.x: Solaris 2.5.1, for example, is SunOS 5.5.1.
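The numbering rule above is simple enough to write down as a small function. This is a sketch assuming the input is a SunOS 5.x release string such as "5.7" or "5.5.1"; the name solarisNameFor is made up for illustration. SunOS 5.7 and later drop the leading "5.", while SunOS 5.0 to 5.6 correspond to Solaris 2.0 to 2.6.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: maps a SunOS 5.x release string to its Solaris name.
// "5.7" -> "7", "5.10" -> "10", "5.5.1" -> "2.5.1". Anything that does not
// start with "5.<digit>" (e.g. SunOS 4.x) yields an empty string.
std::string solarisNameFor(const std::string& sunosVersion) {
    const std::string prefix = "5.";
    if (sunosVersion.size() <= prefix.size() ||
        sunosVersion.compare(0, prefix.size(), prefix) != 0 ||
        !std::isdigit(static_cast<unsigned char>(sunosVersion[prefix.size()])))
        return "";
    const std::string rest = sunosVersion.substr(prefix.size());  // e.g. "7", "5.1"
    const int minor = std::stoi(rest);  // parses the leading number: 7, 5, ...
    return minor >= 7 ? rest : "2." + rest;  // SunOS 5.0-5.6 were Solaris 2.0-2.6
}
```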

Monday, November 14, 2005

Peter Deutsch's Eight Fallacies

Picked this up from digg.com

-----------------------------------------------------
The eight fallacies of distributed computing
By Peter Deutsch
-----------------------------------------------------

Essentially everyone, when they first build a distributed application, makes the following eight assumptions.
All prove to be false in the long run, and all cause big trouble and painful learning experiences.

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Thursday, November 10, 2005

Finding Bliss - UNIX find Introduction

----------------------
FIND-ING BLISS
----------------------
The UNIX find command is, in my book, one of the most powerful time-saving utilities ever. I would even go so far as to say that you have not appreciated the power of UNIX utilities fully unless you have used, and can habitually use, the find command.

What's more, it offers great flexibility: not only does it let you find stuff, it also lets you invoke a command you specify on each item it finds.

STEP-BY-STEP EXAMPLES
-------------------------------
1. Finding stuff
$ find /
gives you a list of everything that's under the / directory. And that, as far as your system is concerned, is everything under the sun. You can replace / with any other directory of your choice.

2. Finding a file based on name
$ find / -name foo
finds all items (files and directories) named foo on your system. You can use wildcards if you wish, but remember to quote the pattern so that the shell does not expand the wildcard before find sees it, e.g. find / -name 'foo*'.

3. Getting a list of all directories on your system
$ find / -type d

4. Executing a command on what find found
$ find / -exec ls -l '{}' \;
generates a detailed ls listing of each item found in the system

This needs a little bit of explanation -
-exec specifies that what follows is a command
\; denotes the end of the command
'{}' is where the current item gets substituted in the command for each item.
You can give any command you want instead of ls, of course.

NOTES
---------
1. -name, -type and -exec are like extensions to the basic find utility, which builds a list of everything under a directory. It spiders the subdirectory structure and builds a list.

2. Think of -name and -type as filter conditions for cutting the list down to just what we need. For example, -type f means files and -type d means directories.

3. -exec specifies the action to invoke on each item: whatever is between -exec and \; is taken by find as the command.

4. There are many more modifiers. Please refer to the man pages on your system. These only serve as quick but useful intros.

PUTTING IT ALL TOGETHER
----------------------------------
Let's say you want to view and edit all readme files under a directory, say, /home using an editor, say vi. Think about it.

You can accomplish this in one line with the find command.
$ find /home -type f -name readme.txt -exec vi '{}' \;

So much less troublesome, less strenuous and less time-consuming than getting a list of the files and clicking on them one by one. All in a line's work!

ON THE DESIGN
(external view)
---------------------
find allows you to iterate over a filtered collection, invoking a specified action on each item. Call it... I'm-making-it-up... the Moulding Iterator design pattern, if you will. I have found that this is an extremely common problem; it deserves a name, so forgive me!

find epitomizes the engine-addon idiom. At its heart, it is a list generator that can spider directory structures. The list generator has unobtrusive facilities for specifying filter conditions and the action to invoke, each to be used only if needed. Unobtrusive: if you want the feature, you can use it; if you don't care about it, you need not even know it exists.
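The shape described above (a tree-walking list generator with an optional filter and an optional action) can be sketched in C++17 with std::filesystem. This illustrates the idiom only; it is not how find itself is implemented, and findAndApply is a made-up name. An empty filter or action simply means "no filter" or "no action", mirroring how find's add-ons stay out of the way when unused.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>

namespace fs = std::filesystem;

// Roughly: find root <filter> -exec <action> '{}' \;
// Returns the number of entries that passed the filter.
std::size_t findAndApply(const fs::path& root,
                         const std::function<bool(const fs::directory_entry&)>& filter,
                         const std::function<void(const fs::path&)>& action) {
    std::size_t matched = 0;
    // The engine: spider the directory tree, yielding every entry under root
    // (root itself is not included, unlike find).
    for (const auto& entry :
         fs::recursive_directory_iterator(root, fs::directory_options::skip_permission_denied)) {
        if (filter && !filter(entry)) continue;  // add-on 1: filter (like -name, -type)
        ++matched;
        if (action) action(entry.path());        // add-on 2: action (like -exec)
    }
    return matched;
}
```

The readme example above would correspond roughly to a filter that checks is_regular_file() and a filename of "readme.txt", with an action that launches the editor.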

Flexibility=Good design. find has its limits, yes, but considering the fact that it was designed when it was, it's very, very good indeed. find out for yourself!

- Thomas Jay Cubb

Sunday, November 06, 2005

Printing Call-Stack Tracebacks

pstack on Solaris gives you a stack dump for all active threads. But this might be too heavy for some applications.

libunwind is a Linux library that provides stack unwinding routines.

The link below gives a method of doing it with printstack() on Solaris.
techno's scratchpad: C/C++ : Printing Stack Trace with printstack() on Solaris
But this works only for Solaris 9 onwards...

Check this for some Linux info
http://www.codecomments.com/archive286-2004-7-236422.html
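On Linux with glibc, another option besides libunwind is the backtrace() family declared in <execinfo.h>. The sketch below uses that swapped-in technique, not the method from the linked pages; printStackTrace is a made-up name. Frame names are typically resolved only for functions in the dynamic symbol table, so linking the program with -rdynamic makes the output more readable.

```cpp
#include <cassert>
#include <execinfo.h>  // glibc: backtrace(), backtrace_symbols_fd()
#include <unistd.h>    // STDERR_FILENO

// Hypothetical helper: captures up to maxFrames return addresses of the
// current call stack and writes one line per frame to stderr.
// Returns the number of frames captured.
int printStackTrace(int maxFrames = 64) {
    constexpr int kCapacity = 128;
    void* frames[kCapacity];
    if (maxFrames > kCapacity) maxFrames = kCapacity;
    if (maxFrames < 0) maxFrames = 0;
    const int depth = backtrace(frames, maxFrames);
    // Writes straight to the file descriptor instead of returning a
    // malloc'd array of strings, as backtrace_symbols() would.
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    return depth;
}
```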

For Windows information
http://www.codeproject.com/threads/StackWalker.asp
Homepage: http://blog.kalmbachnet.de