Revised 2006-09-03 DMB


A Conceptual Introduction to C++

This file will cover in more detail some of the concepts needed to understand C++. The goal here is not to prepare you to sit down and write your own program from scratch.


Objects and Classes

An object in C++ is a single, conceptual unit that contains data (information about the state of that object) and methods (functions associated with that class of objects) by which the object is modified or can interact with other objects. The data in an object can be either normal variables (e.g. characters, floating point numbers, or integers) or previously-defined objects. A category of objects is a class; an object is a single instance of a class.

For example, in Avida one of the most important classes is called cOrganism; it is the class that all organism objects belong to. Here is an abbreviated version of the cOrganism class definition (explained further below).

class cOrganism
{
private:                     // Data in this class cannot be directly accessed from outside.
  const cGenome m_genome;    // The initial genome that this organism was born with.
  cPhenotype m_phenotype;    // Maintains the status of this organism's phenotypic traits.

  cOrgInterface* m_interface;  // Interface back to the population.

  cGenotype* m_genotype;     // A pointer to the genotype that this organism belongs to.
  cHardwareBase* m_hardware; // The virtual machinery that this organism's genome is run on.

public:                      // The methods are accessible to other classes.
  cOrganism(cWorld* world, cAvidaContext& ctx, const cGenome& in_genome);
  ~cOrganism();

  // This batch of methods involves interaction with the population to resolve.
  cOrganism* GetNeighbor() { return m_interface->GetNeighbor(); }
  int GetNeighborhoodSize() { return m_interface->GetNumNeighbors(); }
  void Rotate(int direction) { m_interface->Rotate(direction); }
  int GetInput() { return m_interface->GetInput(); }
  void Die() { m_interface->Die(); }

  // Accessors -- these are used to gain access to private data.
  const cGenome& GetGenome() const { return m_genome; }
  cHardwareBase& GetHardware() { return *m_hardware; }
  cPhenotype& GetPhenotype() { return m_phenotype; }
  cGenotype* GetGenotype() { return m_genotype; }
};


Style and Syntax Guide

Don't worry too much about how the syntax works. The code presented above is the definition of a class in C++. It is broken into two parts: one labeled private:, for those portions of the definition that can only be accessed from within the class, and another labeled public:, which defines the interface to the outside. In this case, we've made all of the variables private and the methods public.

A variable is declared by giving its type (such as cPhenotype) followed by the name of this particular instance of the variable. In this case, since organisms only have one phenotype, we called it simply m_phenotype. Note that because this variable is a member of cOrganism instances, it is prefixed with 'm_'.

Methods are slightly more complex. The declaration of a method starts with the type of data the method returns (such as int for integer), or void if there is no return value. Then the method name is given, followed by a set of parentheses (which are what indicate to C++ that you are declaring a method). Inside those parentheses can be arguments, which are variables that must be given to the method in order for it to operate correctly. The declaration can stop at this point (ending in a semicolon) if the method body is defined elsewhere. The body of the method is the source code that details how the method operates, and can be included immediately after the declaration (within braces) or be placed elsewhere in the code. Typically short method bodies are included in the class definition, while longer ones are placed outside of it. A method is called on an object by listing the object name, followed by a dot ('.'), and then the name of the method with all necessary arguments in parentheses. If you have a pointer to the object instead, use an arrow ('->') in place of the dot, as in m_interface->Die() above. This is explained further below.
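As a concrete illustration of the paragraph above, the following minimal sketch uses a hypothetical class cCounter (not part of Avida). It shows a short method whose body sits inside the class definition, a longer method that is only declared there and defined afterwards, and calls made both on an object and through a pointer.

```cpp
// A hypothetical class, used only to illustrate how methods are declared,
// defined, and called.
class cCounter
{
private:
  int m_count;

public:
  cCounter() : m_count(0) { }

  // Short method: the body is included directly in the class definition.
  int GetCount() const { return m_count; }

  // Longer method: only declared here (note the semicolon); the body is below.
  void Add(int amount);
};

// Out-of-class definition; the "cCounter::" prefix ties this body to the class.
void cCounter::Add(int amount)
{
  if (amount > 0) {
    m_count += amount;   // zero or negative amounts are ignored
  }
}

// Calls methods on an object (with '.') and through a pointer (with '->').
int DemoCounter()
{
  cCounter counter;
  counter.Add(3);                   // object name, dot, method name, arguments

  cCounter* counter_ptr = &counter;
  counter_ptr->Add(4);              // pointer name, arrow, method name, arguments

  return counter.GetCount();
}
```

Calling DemoCounter() returns 7, since both calls to Add() modify the same object.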

C++ accepts variable names, class names, and method names made up of letters, digits, and underscores ('_'), as long as a name does not begin with a digit and is not a C++ keyword (such as int or class). Names that begin with an underscore are best avoided, since many of them are reserved for the compiler and the standard library. To make reading code easier, we have adopted certain conventions.

Variable names (including object names) are always all in lowercase letters, with individual words separated by underscores. Variables are either objects of user-defined classes, numbers (integers, boolean values, floating point numbers, etc.), or characters (single symbols).
Method names always have the first letter of each word capitalized, with the remainder of the word in lowercase. The only exceptions are constructors and destructors, which must have the same name as the class (see below).
Classes use a similar format to methods, but always begin with a single, lowercase 'c'. Some other specialized types also use this format, but with a different initial letter. For example, an initial 't' indicates a template, which is a special type of class.
Any constant values (that is, numerical values that will never change during the course of the run) are given in all upper-case letters, with individual words separated by underscores.
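The sketch below applies each of these conventions in one place. All of the names in it (MAX_BUFFER_SIZE, cInstBuffer, AddInst(), and so on) are hypothetical and chosen only for illustration; the Coding Standards remain the authoritative reference.

```cpp
// Hypothetical names, chosen only to illustrate the naming conventions.
const int MAX_BUFFER_SIZE = 100;      // constant: upper case, words separated by '_'

class cInstBuffer                     // class: leading 'c', then capitalized words
{
private:
  int m_num_stored;                   // member variable: lower case, 'm_' prefix

public:
  cInstBuffer() : m_num_stored(0) { } // constructor: same name as the class
  ~cInstBuffer() { }                  // destructor: '~' followed by the class name

  bool AddInst()                      // method: first letter of each word capitalized
  {
    bool is_full = (m_num_stored >= MAX_BUFFER_SIZE);  // local variable: lower case
    if (is_full) return false;
    m_num_stored++;
    return true;
  }

  int GetNumStored() const { return m_num_stored; }
};
```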

Different software projects will each use their own style conventions; these are the ones you'll end up working with in Avida. Some exceptions do exist. For example, the C++ language itself does not follow many style rules; built-in C++ names are all lowercase letters, regardless of what they represent. For more details, including spacing and other code formatting standards you must follow in Avida, see the Coding Standards.


Description of Data Elements

The section labeled private above lists the data that are unique to each organism; these are objects and pointers that exist inside of an organism object. First, m_genome holds the genome the organism was born with. Since we never want this genome to change over the organism's life, we place the const qualifier in front of it. The const keyword tells the compiler that an object (or variable) is not supposed to be altered, so the compiler reports an error if the programmer accidentally writes code that tries to change it.
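A minimal sketch of a const data member, assuming a hypothetical cSimpleOrganism whose genome is modeled as a plain std::string. The commented-out Mutate() line shows the kind of change the compiler would refuse to build.

```cpp
#include <string>

// Hypothetical stand-in for an organism. Modeling the genome as a std::string
// is an assumption made only to keep the sketch short.
class cSimpleOrganism
{
private:
  const std::string m_genome;   // fixed at birth; cannot be changed afterwards
  int m_age;                    // ordinary member; free to change

public:
  // A const member must be given its value in the constructor's initializer
  // list; it cannot be assigned to inside the constructor body.
  explicit cSimpleOrganism(const std::string& in_genome)
    : m_genome(in_genome), m_age(0) { }

  void IncAge() { m_age++; }                    // fine: m_age is not const
  // void Mutate() { m_genome[0] = 'x'; }       // compile error: m_genome is const

  const std::string& GetGenome() const { return m_genome; }
  int GetAge() const { return m_age; }
};
```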

The internal m_phenotype object is used to record the behaviors and abilities that the organism demonstrates during its life. This class has variables to track everything from the tasks performed to the gestation time of the organism and the number of offspring it has ever produced. The m_interface pointer allows an organism to communicate with the environment (either the cPopulation or the cTestCPU) that it is part of. This is used, for example, when an organism is ready to finish replicating and needs its offspring to be placed into the population. If an organism is being run on a test CPU rather than in a proper population object, then this interface records the statistics about the offspring for later use instead of placing the offspring into a population.

Next, we have two more pointers. A pointer is a value that represents ("points to") a location in the computer's memory. A pointer can be identified by the asterisk ('*') that follows the type name. The code "cGenotype* m_genotype" indicates that the variable m_genotype points to a location in memory where an object of class cGenotype is stored. In this case, all of the organisms of a single genotype point to the same cGenotype object, so that the genotypic information is accessible to all organisms that may need to make use of it.
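The following sketch, built from hypothetical and heavily simplified classes, shows several objects sharing one object through pointers, the way organisms of one genotype share a single cGenotype.

```cpp
// Hypothetical, heavily simplified classes: many organisms share one genotype
// object by storing pointers to it, rather than each keeping its own copy.
class cSimpleGenotype
{
private:
  int m_num_organisms;   // how many organisms currently point here

public:
  cSimpleGenotype() : m_num_organisms(0) { }
  void AddOrganism() { m_num_organisms++; }
  int GetNumOrganisms() const { return m_num_organisms; }
};

class cSharingOrganism
{
private:
  cSimpleGenotype* m_genotype;   // location of a genotype stored elsewhere

public:
  explicit cSharingOrganism(cSimpleGenotype* genotype) : m_genotype(genotype)
  {
    m_genotype->AddOrganism();   // '->' reaches the object the pointer points to
  }

  cSimpleGenotype* GetGenotype() { return m_genotype; }
};
```

After two cSharingOrganism objects are built from the address of the same cSimpleGenotype, both store that same address, and the genotype reports two organisms.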

The final data element is m_hardware, a pointer to an object of type cHardwareBase. This variable is a pointer for a different reason than the genotype. Whereas a single genotype is shared by many different organisms, each organism possesses its own hardware. However, Avida supports more than one type of hardware, and any of them can be at the other end of that hardware pointer. The cHardwareBase class is used as an interface to the actual hardware that is used. This is explained in more detail later in the section on inherited classes. For the moment, the key idea is that a pointer to a general (base) class can point to an object of any more specific class derived from it.
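A sketch of the idea, using hypothetical classes rather than Avida's real hardware types: a pointer declared with the base class type can hold the address of either kind of derived hardware. Declaring the method virtual in the base class is what lets a call through that pointer reach the derived class's own version.

```cpp
#include <string>

// Hypothetical stand-ins for a hardware base class and two kinds of hardware.
class cSketchHardwareBase
{
public:
  virtual ~cSketchHardwareBase() { }
  virtual std::string GetTypeName() const = 0;   // each kind of hardware supplies this
};

class cSketchHardwareA : public cSketchHardwareBase
{
public:
  std::string GetTypeName() const override { return "type A"; }
};

class cSketchHardwareB : public cSketchHardwareBase
{
public:
  std::string GetTypeName() const override { return "type B"; }
};

// The caller only knows it has "some kind of hardware"; the call is routed to
// the version belonging to the object's actual class.
std::string DescribeHardware(const cSketchHardwareBase* hardware)
{
  return hardware->GetTypeName();
}
```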


Description of Methods

Every class has two special methods called the constructor and the destructor; if you do not declare them yourself, the compiler supplies simple default versions. The constructor always has the same name as the class (it's called cOrganism(...) in this case), and is executed in order to create a new object of that class. The arguments for the constructor must include all of the information required to build an object of the desired class. For an organism, we need the world object within which the organism resides, the current execution context, and perhaps most importantly the genome of the organism. The constructor is not defined here, only declared. A declared method must be defined elsewhere in the program. Every method must at least be declared in the class definition. Note that if there are many ways to create an object, multiple constructors are allowed as long as they take different inputs.
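As an illustration of multiple constructors, here is a hypothetical cSketchGenome that can be built either from a length or from an explicit instruction sequence. Modeling instructions as characters, with 'a' as the default, is an assumption for this sketch only.

```cpp
#include <string>

// Hypothetical genome-like class with two constructors taking different inputs.
class cSketchGenome
{
private:
  std::string m_sequence;   // one character per instruction (an assumption)

public:
  // Build a genome of the given length, filled with the default instruction 'a'.
  explicit cSketchGenome(int length) : m_sequence(length, 'a') { }

  // Build a genome from an explicit instruction sequence.
  explicit cSketchGenome(const std::string& sequence) : m_sequence(sequence) { }

  int GetSize() const { return static_cast<int>(m_sequence.size()); }
  const std::string& GetSequence() const { return m_sequence; }
};
```

The compiler chooses a constructor by matching the argument types: an int selects the first, a string the second.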

Whereas the constructor is called when an object is created, the destructor is called when the object is destroyed, whereupon it must do any cleanup, such as freeing allocated memory (see the section on memory management below). The name of a destructor is always the same as the class name, but with a tilde ('~') in front of it. Thus, the cOrganism's destructor is called ~cOrganism(). A destructor can never take any arguments, and there can be only one of them per class.
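A minimal sketch, assuming a hypothetical class cTrackedBuffer that allocates an array in its constructor and frees it in its destructor. The static counter s_num_live is there only so that the destructor's effect can be observed.

```cpp
// Hypothetical class that allocates memory in its constructor and frees it in
// its destructor. The static counter exists only to make the effect visible.
class cTrackedBuffer
{
private:
  int* m_values;   // array allocated with new[]

public:
  static int s_num_live;   // number of cTrackedBuffer objects that currently exist

  explicit cTrackedBuffer(int size) : m_values(new int[size]()) { s_num_live++; }

  // Destructor: takes no arguments and runs automatically on destruction.
  ~cTrackedBuffer()
  {
    delete [] m_values;    // release what the constructor allocated
    s_num_live--;
  }

  // Copying would leave two objects deleting the same array, so forbid it.
  cTrackedBuffer(const cTrackedBuffer&) = delete;
  cTrackedBuffer& operator=(const cTrackedBuffer&) = delete;
};

int cTrackedBuffer::s_num_live = 0;
```

When a local cTrackedBuffer goes out of scope, or when a heap-allocated one is passed to delete, its destructor runs and the counter drops back by one.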

The next group of five methods is used when an organism needs to perform some behavior, which in all of these cases involves interacting with the population. For example, if you need to know which organism a given organism is facing, you can call the method GetNeighbor() on it, and a pointer to the neighbor currently faced will be returned. Likewise, if you need to kill an organism, you can call the method Die() on it, and it will be terminated. Since each of these requires interaction at the population level, each method simply forwards the request through m_interface, and the population itself takes care of the bulk of the functionality.
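The forwarding pattern used by these five methods can be sketched with two hypothetical classes. The organism keeps a pointer to the object that does the real work (here a hypothetical cSketchPopulation; in cOrganism it is m_interface) and passes each request along.

```cpp
// Hypothetical stand-ins: the population does the real work, and the
// organism's method simply forwards the request to it.
class cSketchPopulation
{
private:
  int m_num_deaths;

public:
  cSketchPopulation() : m_num_deaths(0) { }
  void KillOrganism() { m_num_deaths++; }   // in this sketch, just counts deaths
  int GetNumDeaths() const { return m_num_deaths; }
};

class cForwardingOrganism
{
private:
  cSketchPopulation* m_population;   // not owned by the organism

public:
  explicit cForwardingOrganism(cSketchPopulation* population)
    : m_population(population) { }

  void Die() { m_population->KillOrganism(); }   // one-line forwarding method
};
```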

The 'Accessors' are methods that provide access to otherwise private data. For example, the method GetGenome() hands the organism's genome to the object that calls it. In particular, the hardware object associated with an organism will often call GetPhenotype() in order to get the current state of the organism's phenotype and update it with something new the organism has done.

Several things are worth noting here. In the first three accessors, the name of the class being returned is followed by an ampersand ('&'). This means that a reference to the actual object is being passed back, not just a copy of all the values contained in it. See the next section on pointers, references, and values for more information about this. Also, in the very first accessor, the keyword const is used twice. The first time says that the object being passed out of the method is constant (that is, the compiler will report an error if code elsewhere tries to change the genome through it), and the second time says that this method will never change anything about the object it is being run on (that is, the object is left unchanged when the method is run). This matters because an object marked const can only have const methods run on it: the compiler assumes that any non-const method may change the object, so calling one on a const object is an error.
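The following hypothetical sketch mirrors the accessor pattern described above: a const accessor returning a read-only reference, and a non-const accessor returning a modifiable reference. The class names are illustrative only.

```cpp
#include <string>

// Hypothetical, simplified classes mirroring the accessor pattern in cOrganism.
class cSketchPhenotype
{
private:
  int m_num_tasks;

public:
  cSketchPhenotype() : m_num_tasks(0) { }
  void IncTasks() { m_num_tasks++; }                 // non-const: changes the object
  int GetNumTasks() const { return m_num_tasks; }    // const: allowed on const objects
};

class cAccessorOrganism
{
private:
  const std::string m_genome;
  cSketchPhenotype m_phenotype;

public:
  explicit cAccessorOrganism(const std::string& in_genome) : m_genome(in_genome) { }

  // Read-only reference to the real genome; the method changes nothing.
  const std::string& GetGenome() const { return m_genome; }

  // Modifiable reference to the organism's own phenotype (not a copy).
  cSketchPhenotype& GetPhenotype() { return m_phenotype; }
};
```

Through a const reference such as `const cAccessorOrganism& read_only = org;`, only GetGenome() can be called; `read_only.GetPhenotype()` would not compile, because GetPhenotype() is not a const method.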

This section has contained information about a particular C++ class found in Avida. The next sections will more generally explain some of the principles of the language. If you haven't already, now might be a good time to take a deep breath before you dive back in.


Pointers, References, and Values

There are three ways of passing information around in a C++ program: sending a pointer to the location of that information, sending a reference to it, or sending a copy of its value. For the moment, let's consider the return value of a method. Consider the three methods below:

  cGenome GetGenomeValue();
  cGenome* GetGenomePointer();
  cGenome& GetGenomeReference();

These three cases are all very different. In the first case (pass-by-value), a copy of the genome is returned: the exact sequence of instructions in it is duplicated and sent to the object calling this method. Once the requesting object gets this copy, however, any changes made to it do not affect the original genome. In the second case (pass-by-pointer), only a few bytes of information are returned that give the location of this genome in memory. The requesting object can then modify the genome at that location if it chooses to, but it must first dereference the pointer (with '*' or '->') to do so. Finally, the last case (pass-by-reference) returns the original object itself rather than a copy. It is used with the same syntax as pass-by-value, but any changes made to the genome after it is returned will affect the genome in the actual organism itself! Pass-by-reference does not add any new functionality over pass-by-pointer, but in practice it is often easier to use.
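The three cases can be compared side by side with a hypothetical cGenomeHolder, in which the genome is modeled as a std::string for brevity. The three accessor methods mirror the declarations above.

```cpp
#include <string>

// Hypothetical holder with the three kinds of accessor; the genome is modeled
// as a std::string purely for brevity.
class cGenomeHolder
{
private:
  std::string m_genome;

public:
  explicit cGenomeHolder(const std::string& genome) : m_genome(genome) { }

  std::string GetGenomeValue() { return m_genome; }        // returns a copy
  std::string* GetGenomePointer() { return &m_genome; }    // returns the address
  std::string& GetGenomeReference() { return m_genome; }   // returns the original

  const std::string& Peek() const { return m_genome; }     // read-only view for checks
};
```

Changing the copy returned by GetGenomeValue() leaves the holder's genome untouched, while changes made through the pointer (after dereferencing it) or through the reference modify the holder's genome directly.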


Memory Management

Memory management in C++ can be as simple or complicated as the programmer wants it to be. If you never explicitly allocate a chunk of memory, then you never need to worry about freeing it up when you're done using it. However, there are many occasions where a program can be made faster or more flexible by dynamically allocating objects. The new operator is used to allocate memory; for example, if you wanted to allocate memory for a new genome containing 100 instructions, you could type:

  cGenome* created_genome = new cGenome(100);

The variable created_genome is defined as a pointer to a memory location containing a genome. It is assigned the location of the newly allocated genome in memory, which is the value returned by new. The cGenome constructor (called by new) takes as an argument a single number that is the sequence length of the genome.

Unfortunately, C++ won't know when we're done using this genome. If we need to create many different genomes and we only use each of them once, our memory can rapidly fill up. So we need to tell the program explicitly when we are finished with the object. Thus, when we're done using it, we type:

  delete created_genome;

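The object pointed to by created_genome is then destroyed (its destructor runs) and its memory is freed. The variable created_genome itself still holds the old address, so it must not be used to reach the genome again.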
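Putting the allocation and deallocation together, here is a minimal sketch. cHeapGenome is a hypothetical stand-in for cGenome whose constructor takes the sequence length, with instructions modeled as plain ints.

```cpp
#include <vector>

// Hypothetical genome class whose constructor takes the sequence length, as in
// the cGenome(100) example. Instructions are modeled as ints, all starting at 0.
class cHeapGenome
{
private:
  std::vector<int> m_insts;

public:
  explicit cHeapGenome(int length) : m_insts(length) { }
  int GetSize() const { return static_cast<int>(m_insts.size()); }
};

// Allocates a genome, uses it, and frees it again before returning.
int MeasureTemporaryGenome(int length)
{
  cHeapGenome* created_genome = new cHeapGenome(length);  // allocate on the heap
  int size = created_genome->GetSize();                   // use it through the pointer
  delete created_genome;                                  // free it when done
  return size;                                            // only the copied value is used now
}
```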

An excellent example of when allocation and freeing of memory is employed in Avida is with the genotype. Every time a new genotype is created during evolution, Avida needs to allocate a new object of class cGenotype. During the course of a run, millions of genotypes may be created, so we need to be sure to free genotypes whenever they will no longer be needed in the run.


Inherited Classes

One of the beauties of C++ is that well-written code is highly reusable. Part of that comes from class inheritance. When a new class is built in C++, it is possible to build it off of an existing class, then referred to as a base class. The new derived class inherits the methods of the base class (it can call any that are public or protected) and can override them; that is, it can change how those methods operate. For an override to be used when the object is accessed through a pointer or reference to the base class, the base-class method must be declared virtual.

For example, in the Avida scheduler, we use a class called cSchedule to determine which organism's virtual CPU executes the next instruction. Well, this cSchedule object is not all that clever. In fact, all that it does is run through the list of organisms that need to go, lets them each execute a single instruction, and then does it all over again. But sometimes we need to make some organisms execute instructions at a faster rate than others. For that reason, there are several derived classes, including cIntegratedSchedule, which takes in a merit value for each organism, and assigns CPU cycles proportional to that merit. Since this new class uses cSchedule as its base class, it can be dynamically plugged in during run time, after looking at what the user chooses in the configuration file.
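The sketch below shows the shape of this design with hypothetical classes. The base class hands out turns in simple round-robin order; the derived class overrides GetNextID() to hand out turns in proportion to merit using smooth weighted round-robin, a standard technique swapped in here for illustration and not necessarily what cIntegratedSchedule does. BuildSchedule() stands in for the run-time choice driven by the configuration file.

```cpp
#include <vector>

// Hypothetical base scheduler: plain round-robin, one turn per organism in order.
class cSketchSchedule
{
protected:
  int m_num_items;
  int m_next;

public:
  explicit cSketchSchedule(int num_items) : m_num_items(num_items), m_next(0) { }
  virtual ~cSketchSchedule() { }

  virtual void SetMerit(int, int) { }   // the base class ignores merit

  virtual int GetNextID()
  {
    int id = m_next;
    m_next = (m_next + 1) % m_num_items;
    return id;
  }
};

// Hypothetical derived scheduler: turns proportional to merit (smooth weighted
// round-robin). Every item starts with a merit of 1.
class cSketchMeritSchedule : public cSketchSchedule
{
private:
  std::vector<int> m_merit;
  std::vector<int> m_credit;

public:
  explicit cSketchMeritSchedule(int num_items)
    : cSketchSchedule(num_items), m_merit(num_items, 1), m_credit(num_items, 0) { }

  void SetMerit(int item_id, int merit) override { m_merit[item_id] = merit; }

  int GetNextID() override
  {
    int total = 0;
    int best = 0;
    for (int i = 0; i < m_num_items; i++) {
      m_credit[i] += m_merit[i];            // every item earns credit equal to its merit
      total += m_merit[i];
      if (m_credit[i] > m_credit[best]) best = i;
    }
    m_credit[best] -= total;                // the chosen item pays for its turn
    return best;
  }
};

// Picks a scheduler at run time, e.g. from a (hypothetical) configuration flag.
// The caller owns the returned object and must delete it.
cSketchSchedule* BuildSchedule(bool use_merit, int num_items)
{
  if (use_merit) return new cSketchMeritSchedule(num_items);
  return new cSketchSchedule(num_items);
}
```

With merits of 3 and 1, eight calls to GetNextID() pick the first organism six times and the second twice. Because the base class declares a virtual destructor, deleting a derived scheduler through a cSketchSchedule* is safe.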

If a method is not implemented in a base class (left to be implemented in the derived classes), it is called an abstract method; in C++ it is declared pure virtual by writing '= 0' after its declaration, and a class containing one cannot be instantiated directly. If a base class does not implement any of its methods and is only used in order to specify what methods need to be included in the derived classes, it is referred to as an abstract base class (or sometimes an interface class or protocol) and is used simply as a method contract for derived classes. An example in Avida where this is used is the organism's interface with the environment. The class cOrgInterface is an abstract base class, with cPopulationInterface and cTestCPUInterface as derived classes. This organization allows organism objects to interact with both the population and the test environment, without having to write separate code for each.
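A minimal sketch of an abstract base class, using hypothetical names modeled loosely on cOrgInterface. The real Avida interface declares many more methods; only GetInput() and Die() are kept here, and the input values and the rule in RunOneStep() are arbitrary.

```cpp
// Hypothetical abstract base class: it declares what an organism may ask of its
// environment but implements nothing (every method is pure virtual, "= 0").
class cSketchOrgInterface
{
public:
  virtual ~cSketchOrgInterface() { }
  virtual int GetInput() = 0;
  virtual void Die() = 0;
};

// One environment: a (much simplified) population.
class cSketchPopInterface : public cSketchOrgInterface
{
private:
  int m_num_deaths = 0;

public:
  int GetInput() override { return 42; }   // arbitrary fixed input for the sketch
  void Die() override { m_num_deaths++; }
  int GetNumDeaths() const { return m_num_deaths; }
};

// Another environment: a (much simplified) test CPU that just records the call.
class cSketchTestCPUInterface : public cSketchOrgInterface
{
private:
  bool m_died = false;

public:
  int GetInput() override { return 7; }
  void Die() override { m_died = true; }
  bool GetDied() const { return m_died; }
};

// Organism-side code, written once against the interface.
int RunOneStep(cSketchOrgInterface& iface)
{
  int input = iface.GetInput();
  if (input < 10) iface.Die();   // arbitrary rule, for illustration only
  return input;
}
```

RunOneStep() works unchanged with either derived class. Trying to create a cSketchOrgInterface object directly would not compile, because the class has pure virtual methods.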

