Revised 2006-09-05 DMB


The Avida Configuration File

The Avida configuration file (avida.cfg) is the main configuration file for Avida. With this file, the user can set up all of the basic conditions for a run. Below are detailed descriptions for some of the settings in the configuration file, with particularly important settings highlighted in green. The non-colored entries will probably never need to change unless you are performing a very specialized experiment.


Architecture Variables

This section covers all of the basic variables that describe the Avida run. This is effectively a miscellaneous category for settings that don't fit anywhere below.

These settings allow the user to determine how long the run should progress, in generations and in updates, and whether one or both criteria need to be met for the run to end. The run will also end if the entire population ever dies out. A setting of -1 for either ending condition indicates no limit. End conditions can also be set in the events file, as is done by default, so you typically won't need to worry about this.
The settings determine the size of the Avida grid that the organisms populate. In mass action mode the shape of the grid is not relevant, only the number of organisms that are in it.
RANDOM_SEED The random number seed initializes the random number generator. You should alter only this seed if you want to perform a collection of replicate runs. Setting the random number seed to zero (or a negative number) will base the seed on the starting time of the run -- effectively a random random number seed. In practice, you want to always be able to re-do an exact run in case you want to get more information about what happened.
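
As a rough illustration of the RANDOM_SEED behavior described above, the following Python sketch resolves a configured seed and generates seeds for a set of replicate runs. The helper names (resolve_seed, replicate_seeds) are hypothetical and are not part of Avida, and using the start time in whole seconds is an assumption; the text only says that the seed is based on the starting time of the run.

```python
import random
import time


def resolve_seed(configured_seed, now=None):
    """Return the seed a run would actually use (illustrative helper only)."""
    if configured_seed > 0:
        # A positive seed is used as given, so the run can be repeated exactly.
        return configured_seed
    # Zero or negative: derive the seed from the start time of the run.
    # Truncating to whole seconds is an assumption made for this sketch.
    now = time.time() if now is None else now
    return int(now)


def replicate_seeds(first_seed, count):
    """Seeds for `count` replicate runs that differ only in RANDOM_SEED."""
    return [first_seed + i for i in range(count)]


# Two generators built from the same positive seed produce the same sequence.
rng_a = random.Random(resolve_seed(42))
rng_b = random.Random(resolve_seed(42))
same_sequence = [rng_a.random() for _ in range(3)] == [rng_b.random() for _ in range(3)]
```

Recording the resolved seed alongside the output of each run is what makes an exact re-run possible later.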


Configuration Files

This section relates Avida to other files that it requires.

DATA_DIR The name (or path) of the directory where output files generated by Avida should be placed.
These settings indicate the names of all of the other configuration files used in an Avida run. See the individual documents for more information about how to use these files.



These settings control how creatures are born and die in Avida.

BIRTH_METHOD The birth method sets how the placement of a child organism is determined. Currently, there are six ways of doing this -- the first four (0-3) are all grid-based (offspring are only placed in the immediate neighborhood), and the last two (4-5) assume a well-stirred population. In all non-random methods, empty sites are preferred over replacing a living organism.
By default, replacement is the only way for an organism to die in Avida. However, if a death method is set, organisms will die of old age. In method 1, organisms will die when they reach the user-specified age limit. In method 2, the age limit is a multiple of their length, so larger organisms can live longer.
ALLOC_METHOD During the replication process in the default virtual CPU, parent organisms must allocate memory space for their child-to-be. Before the child is copied into this new memory, it must have an initial value. Setting the alloc method to zero sets this memory to a default instruction (typically nop-A). Mode 1 leaves it uninitialized (and hence keeps the contents of the last organism that inhabited that space; if only a partial copy occurs, the child is a hybrid of the parent and the dead organism, hence the name necrophilia). Mode 2 just randomizes each instruction, which means that the organism will behave unpredictably if any of this uncopied code is executed.
DIVIDE_METHOD When a divide occurs, does the parent divide into two children, or else do we have a distinct parent and child? The latter method will allow more age structure in a population where an organism may behave differently when it produces its second or later offspring.
GENERATION_INC_METHOD The generation of an organism is the number of organisms in the chain between it and the original ancestor. Thus, the generation of a population can be calculated as the average generation of the individual organisms. When a divide occurs, the child always receives a generation one higher than the parent, but what should happen to the generation of the parent itself? In general, this should be set the same as divide method.
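
The three ALLOC_METHOD modes described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, its parameters, and the padding of short leftover memory with the default instruction are all hypothetical and do not come from the Avida source.

```python
import random


def allocate_child_memory(method, size, previous_contents, instruction_set,
                          rng, default_inst="nop-A"):
    """Return the initial contents of newly allocated child memory."""
    if method == 0:
        # Mode 0: fill with a default instruction.
        return [default_inst] * size
    if method == 1:
        # Mode 1: keep whatever the last occupant left behind.
        # Padding with the default when the leftover is shorter is an
        # assumption of this sketch.
        leftover = list(previous_contents[:size])
        return leftover + [default_inst] * (size - len(leftover))
    if method == 2:
        # Mode 2: every position gets a random instruction.
        return [rng.choice(instruction_set) for _ in range(size)]
    raise ValueError(f"unknown alloc method: {method}")


instructions = ["nop-A", "nop-B", "nop-C", "inc", "dec"]
rng = random.Random(1)
blank = allocate_child_memory(0, 4, [], instructions, rng)
hybrid = allocate_child_memory(1, 4, ["inc", "dec", "inc", "dec", "inc"], instructions, rng)
noise = allocate_child_memory(2, 4, [], instructions, rng)
```

With mode 1, any position the parent fails to overwrite keeps the dead organism's instruction, which is how a partial copy yields a hybrid.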


Divide Restrictions

These place limits on when an organism can successfully issue a divide command to produce an offspring.

CHILD_SIZE_RANGE This is the maximal difference in genome size between a parent and offspring. The default of 2.0 means that the genome of the child must be between one-half and twice the length of the parent. This is to prevent out-of-control size changes. Setting this to 1.0 will ensure fixed-length organisms (but make sure to also turn off insertion and deletion mutations).
These settings place limits on what the parent must have done before the child can be born; they set the minimum fraction of instructions that must have been copied into the child (vs. left as default) and the minimum fraction of instructions in the parent that must have been executed. If either of these is not met, the divide will fail. These settings prevent organisms from producing pathological offspring. In practice, either of them can be set to 0.0 to turn it off.
REQUIRE_ALLOCATE Is an allocate required between each successful divide (in virtual hardware types where allocate is meaningful)? If so, this will limit the flexibility of how organisms produce children (they can't make multiple copies and divide them off all at once, for example). But if we don't require allocates, the resulting organisms can be a lot more difficult to understand.
REQUIRED_TASK This was originally a hack. It allows the user to set the ID number for a task that must occur for a divide to be successful. At -1, no tasks are required. Ideally, this should be incorporated into the environment configuration file. NOTE: A task can fire without triggering a reaction. To add a required reaction see below.
IMMUNITY_TASK Allows the user to set the ID number for a task which, if it occurs, provides immunity from the required task (above) -- the divide will proceed even if the required task is not done, as long as the immunity task is done. Defaults to -1, no immunity task present.
REQUIRED_REACTION Allows the user to set the ID number for a reaction that must occur for a divide to be successful. At -1, no reactions are required.
DIE_PROB Determines the probability of an organism dying when the 'die' instruction is executed. Set to 0 by default, making the instruction neutral.
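
The size-range and minimum-fraction checks described in this section can be illustrated with a small Python sketch. The function and parameter names are hypothetical, the 0.5 defaults for the two fractions are placeholders chosen for the example, and the real divide test in Avida may order or combine the checks differently.

```python
def divide_allowed(parent_len, child_len, copied, executed,
                   child_size_range=2.0, min_copied=0.5, min_executed=0.5):
    """Return True if a divide passes the size and fraction restrictions.

    copied:   number of child instructions actually copied from the parent
    executed: number of parent instructions that were executed
    """
    if parent_len <= 0 or child_len <= 0:
        return False
    # Child length must lie within [parent / range, parent * range].
    if not (parent_len / child_size_range <= child_len <= parent_len * child_size_range):
        return False
    # Enough of the child must have been copied rather than left as default.
    if copied / child_len < min_copied:
        return False
    # Enough of the parent must have been executed.
    if executed / parent_len < min_executed:
        return False
    return True


ok = divide_allowed(parent_len=100, child_len=100, copied=100, executed=90)
too_big = divide_allowed(parent_len=100, child_len=201, copied=201, executed=90)
```

Setting min_copied or min_executed to 0.0 makes the corresponding check always pass, which matches the way these restrictions are turned off.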



These settings control how and when mutations occur in organisms. Ideally, there will be more options here in the future.

POINT_MUT_PROB Point mutations (sometimes referred to as "cosmic ray" mutations) occur every update; the rate set here is a probability for each site that it will be mutated each update. In other words, this should be a very low value if it is turned on at all. If a mutation occurs, that site is replaced with a random instruction. In practice this also slows Avida down if it is non-zero because it requires so many random numbers to be tested every update.
COPY_MUT_PROB The copy mutation probability is tested each time an organism copies a single instruction. If a mutation occurs, a random instruction is copied to the destination. In practice this is the most common type of mutation that we use in most of our experiments.
These probabilities are tested once per gestation cycle (when an organism is first born) at each position where an instruction could be inserted or deleted, respectively. Each of these mutations changes the genome length. Deletions just remove an instruction while insertions add a new, random instruction at the position tested. Multiple insertions and deletions are possible each generation.
Divide mutation probabilities are tested when an organism is being divided off from its parent. If one of these mutations occurs, a random site is picked for it within the genome. At most one divide mutation of each type is possible during a single divide.
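
To make the per-instruction nature of copy mutations concrete, here is a minimal Python sketch of the test described for COPY_MUT_PROB. The helper names are hypothetical, and drawing the replacement uniformly from the whole instruction set (so that it may coincide with the original instruction) is an assumption of this sketch.

```python
import random


def copy_instruction(inst, copy_mut_prob, instruction_set, rng):
    """Copy one instruction, replacing it with a random one on mutation."""
    if rng.random() < copy_mut_prob:
        return rng.choice(instruction_set)
    return inst


def copy_genome(genome, copy_mut_prob, instruction_set, rng):
    """Copy a whole genome; the probability is tested once per instruction."""
    return [copy_instruction(i, copy_mut_prob, instruction_set, rng) for i in genome]


def expected_copy_mutations(genome_length, copy_mut_prob):
    """Expected number of mutation events in one full copy of the genome."""
    return genome_length * copy_mut_prob


instructions = ["nop-A", "nop-B", "nop-C", "inc", "dec"]
parent = ["inc", "dec", "nop-A", "inc"] * 25  # 100 instructions
child = copy_genome(parent, 0.0075, instructions, random.Random(7))
```

Because the test is applied per copied instruction, longer genomes accumulate proportionally more copy mutations per generation at the same setting.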


Mutation Reversions

This section covers tests that are very CPU intensive, but allow for Avida experiments that would not be possible in any other system. Basically, each time a mutation occurs, we can run the resulting organism in a test CPU, and determine whether the effect of the mutation was lethal, detrimental, neutral, or beneficial. This section allows us to act on this. (Note that as soon as anything here is turned on, the mutations need to be tested. Turning multiple settings on will not cause an additional speed decrease.)

When a mutation of the specified type occurs, the number listed next to that entry is the probability that the mutation will be reverted. That is, the child organism's genome will be restored as if the mutation had never occurred. This allows us either to manually manipulate the abundance of certain mutation types, or to entirely eliminate them.
The sterilize options work similarly to revert; the difference being that an organism never has its genome restored. Instead, if the selected mutation category occurs, the child is sterilized so that it still takes up space, but can never produce an offspring of its own.
FAIL_IMPLICIT If this toggle is set, organisms must be able to produce exact copies of themselves or else they are sterilized and cannot produce any offspring. An organism that naturally (without any external effects) produces an inexact copy of itself is said to have implicit mutations. If this flag is set, explicit mutations (as described in the mutations section above) can still occur.
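
The revert and sterilize options can be pictured with the following Python sketch. It assumes the effect of the mutation has already been classified by a test CPU run, that "restoring" the genome means falling back to an unmutated copy passed in by the caller, and that reversion is tested before sterilization. None of these details, nor the function and dictionary names, are taken from the Avida source.

```python
import random

EFFECTS = ("lethal", "detrimental", "neutral", "beneficial")


def apply_reversion(unmutated_genome, child_genome, effect,
                    revert_prob, sterilize_prob, rng):
    """Return (genome, is_sterile) for a child whose mutation had `effect`."""
    if effect not in EFFECTS:
        raise ValueError(f"unknown mutation effect: {effect}")
    # Revert: the child gets the genome it would have had without the mutation.
    if rng.random() < revert_prob.get(effect, 0.0):
        return list(unmutated_genome), False
    # Sterilize: the genome is kept, but the child can never reproduce.
    if rng.random() < sterilize_prob.get(effect, 0.0):
        return list(child_genome), True
    return list(child_genome), False


rng = random.Random(11)
# Example: remove every detrimental mutation, leave all other effects alone.
genome, sterile = apply_reversion(
    ["inc", "dec"], ["inc", "nop-A"], "detrimental",
    revert_prob={"detrimental": 1.0}, sterilize_prob={}, rng=rng)
```

A probability of 1.0 eliminates a mutation category entirely, while intermediate values only reduce its abundance.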


Time Slicing

These settings describe exactly what an update is, and how CPU time is allocated to organisms during that update.

AVE_TIME_SLICE This sets the average number of instructions an organism should execute each update. Organisms with a low merit will consistently obtain fewer, while organisms of a higher merit will receive more.
SLICING_METHOD This setting determines the method by which CPU time is handed out to the organisms. Method 0 ignores merit and hands out time on the CPU evenly; every organism in the population executes one instruction before any organism moves on to its second. Method 1 is probabilistic; each organism has a chance of executing the next instruction proportional to its merit. This method is slow due to the large number of random values that need to be obtained and evaluated (and it only gets slower as merits get higher). Method 2 is fully integrated; the organisms get CPU time proportional to their merit, but in a fixed, deterministic order.
SIZE_MERIT_METHOD This setting determines the base value of an organism's merit. Merit is typically proportional to genome length; otherwise there is a strong selective pressure for shorter genomes (shorter genome => less to copy => reduced copying time => replicative advantage). Unfortunately, organisms will cheat if merit is proportional to the full genome length -- they will add unexecuted and uncopied code to their genomes, creating code bloat. This isn't the most elegant fix, but it works.
MAX_LABEL_EXE_SIZE Labels are sequences of nop (no-operation) instructions used only to modify the behavior of other instructions. Quite often, an organism will have these labels in its genome where the nops are used by another instruction, but never executed directly. To represent the executed length of an organism correctly, we need to somehow count these labels. Unfortunately, if we count the entire label, the organisms will again "cheat", artificially increasing their length by growing huge labels. This setting limits the number of nops that are counted as executed when a label is used.
MAX_CPU_THREADS Determines the number of simultaneous processes that an organism can run. That is, basically, the number of things it can do at once. This setting is meaningless unless threads are supported in the virtual hardware and the instructions are available within the instruction set.
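
The three slicing methods can be contrasted with a short Python sketch. Each function returns the order in which organisms (identified by index) would execute single instructions. The credit-based scheme used for the deterministic case is a stand-in chosen for this example and is not claimed to be the algorithm Avida uses; all function names are hypothetical.

```python
import random


def constant_schedule(num_orgs, steps):
    """Method 0: ignore merit and cycle through the population in order."""
    return [step % num_orgs for step in range(steps)]


def probabilistic_schedule(merits, steps, rng):
    """Method 1: each step goes to an organism with probability ~ merit."""
    return rng.choices(range(len(merits)), weights=merits, k=steps)


def integrated_schedule(merits, steps):
    """Method 2 (sketch): merit-proportional time in a deterministic order.

    Every step, each organism earns credit equal to its share of total
    merit; the organism with the most credit runs and pays one credit.
    """
    total = sum(merits)
    credit = [0.0] * len(merits)
    order = []
    for _ in range(steps):
        for i, merit in enumerate(merits):
            credit[i] += merit / total
        winner = max(range(len(merits)), key=lambda i: credit[i])
        credit[winner] -= 1.0
        order.append(winner)
    return order


merits = [1.0, 3.0]
even = constant_schedule(len(merits), 8)
lottery = probabilistic_schedule(merits, 8, random.Random(2))
fixed = integrated_schedule(merits, 8)
```

With AVE_TIME_SLICE set to some value, the number of steps in one update would be roughly that value times the population size; the deterministic version needs no random numbers, which is why it avoids the cost noted for method 1.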


Genealogy Info

These settings control how Avida monitors and deals with genotypes, species, and lineages.

THRESHOLD For some statistics, we only want to measure organisms that we are sure are alive, but it's not worth taking the time to run them all in isolation, without outside effect (and in some eco-system situations that isn't even possible!). For these purposes, we call a genotype "threshold" if there have ever been more than a certain number of organisms of that genotype. A higher number here ensures a greater probability that the organisms are indeed "alive". Recently, we've been shifting away from using threshold genotypes and instead finding other, more accurate testing methods.
GENOTYPE_PRINT Should all genotypes be printed out upon reaching threshold? Each will receive its own file in the archive directory, so this can get very hard disk intensive. Many runs will have organisms numbering in the millions.
GENOTYPE_PRINT_DOM Printing only the dominant genotype keeps track of the most successful individual genotypes without costing a huge amount of memory. The number you place here is the total number of updates that a genotype must remain dominant for it to be printed out. A 0 turns this off.
SPECIES_THRESHOLD In Avida, two organisms are said to be of the same species if you can perform all possible crossovers between them, and no more than a certain threshold (set here) fail to be viable offspring. The crossovers are done in isolation, and never affect the population as a whole.
SPECIES_RECORDING This entry sets if and how species should be recorded in Avida. A setting of 0 turns all species tests off. A setting of 1 means that every time a genotype reaches threshold, it is tested against all currently existing species to determine if it is part of any of them. If so, its species is set, and if not, it becomes the prototype of a new species. Finally, a setting of 2 only tests a new threshold genotype against the species of its parent (since each species test can take a long time) and if that fails immediately creates a new species. In practice, methods 1 and 2 produce similar results, but method 1 can take a lot longer to run.
SPECIES_PRINT Toggle: Should new species be printed as soon as they are created?
TEST_CPU_TIME_MOD Many of our analysis methods (such as species testing) require that we be able to run organisms in isolation. Unfortunately, some of the organisms we test might be non-viable. At some point, we have to give up the test and label the organism as non-viable, but we can't give up too soon or else we might miss a viable, though slow, replicator. This setting is multiplied by the length of the organism's genome in order to determine how many CPU cycles to run the organism for. A setting of 20 effectively means that the average instruction must be executed twenty times before we give up. In practice, most organisms have an efficiency here of about 5, so 20 works well, but for accurate tests on some pathological organisms, we will be required to raise this number.
TRACK_MAIN_LINEAGE In a normal Avida run, the genebank keeps track of all existing genotypes, and deletes them when the last organism of that genotype dies out. With this flag set, a genotype will not be deleted unless both it and all of its descendants have died off. This allows us to track back from any genotype to its distant ancestors, monitoring all of the differences along the way. Once this information is being saved, see the events file for how to output it.
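
The arithmetic behind TEST_CPU_TIME_MOD is simple enough to show directly. In this Python sketch the function names are hypothetical; the only relationship taken from the text is that the cycle budget equals the setting multiplied by the genome length.

```python
def cpu_cycle_limit(genome_length, time_mod=20):
    """CPU cycles a test run is allowed before the organism is called non-viable."""
    return time_mod * genome_length


def replicates_within_limit(gestation_cycles, genome_length, time_mod=20):
    """True if an organism that needs `gestation_cycles` finishes in time."""
    return gestation_cycles <= cpu_cycle_limit(genome_length, time_mod)


# A typical organism of length 100 with an efficiency of about 5
# needs roughly 500 cycles, well inside the budget of 2000.
typical = replicates_within_limit(gestation_cycles=500, genome_length=100)

# A slow replicator needing 2500 cycles is missed at 20 but found at 30.
slow_at_20 = replicates_within_limit(2500, 100, time_mod=20)
slow_at_30 = replicates_within_limit(2500, 100, time_mod=30)
```

Raising the setting trades longer test runs for fewer viable organisms being mislabeled.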


Log Files

Log files are printed every time a specified event occurs. By default, all log settings are 0 (i.e. the logs are turned off). Each time a logged event is printed, the update and identifying information on the individual that triggered it are always included.

LOG_CREATURES If toggle is set, print an entry to creature.log whenever a new organism is born. Includes position information, parent organism, and a link to its genotype so the run can be reconstructed. This gets very large.
LOG_GENOTYPES If toggle is set, print an entry to genotype.log whenever a new genotype is created. Includes information on its parent genotype.
LOG_THRESHOLD If toggle is set, print an entry to threshold.log whenever a genotype reaches threshold. Includes information on what species it is.
LOG_SPECIES If toggle is set, print an entry to species.log whenever a new species is created. Includes information on the genotype that triggered the creation.
LOG_LINEAGES Lineages can be given unique identifiers and printed (into the file lineage.log) whenever they are created. Includes details about the event that created the lineage.
LINEAGE_CREATION_METHOD Details when lineages are created. See config file comments for more detailed information.
