Weasel Ware 2.0 The Evolutionary Informatics Lab 2009 All Rights Reserved Coded By: Atom for The Evolutionary Informatics Lab (www.atomthaimmortal.com) Changes in Version 2.0: ================= - Two new Searches: Proximity Reward Search and Proximity Neutral Search - Multi-Runs Mode: Useful for running the GUI many times in a row, for gathering data on average perfromance. - Sortable Searches: You can now reorder the searches, based on which searches you're interested in viewing. Just grab the search by its label and move up or down. - Disabling of Searches: You can disable searches, so that you spend your CPU cycles on only the searches you are interested in. - Proximity Neutral Search - New Reward Functions: Simple Sum, CRC32, Wave Interference, Partially Neutral: CRC32 Based, Partially Neutral: Anagram, Custom Fitness Function mode (where you can code your own fitness function to run in the GUI.) Weasel Ware 2.0 Highlights: ===================== To latch or not to latch?: Version 2.0 contains both a Partitioned Search, that uses information about the target to freeze correct letters in place, as well as a Proximity Reward Search, that does not freeze letters, but uses information about the target to assign values to candidate strings. Weasel Ware now allows you to run both types of searches side by side. Proximity Neutral Search: Since choosing a reward function rich in target-specific active information, such as a simple Proximity Reward function, is unrealistic for biological purposes, we have a Proximity Neutral Search available. This search can use fitness functions with varying levels of target-specific active information. There are fitness functions that hardly reference the target string at all, other than using information about the target length (such as CRC32, Simple Sum and Wave Interference) and there are those that use some information about the target string to further narrow the search space. These functions (prefixed with "Partially Neutral: ") do narrow the relevant search space, but not to the extent of a Proximity Reward function. For example, the Anagram fitness function ranks all Anagrams of the target string (which by definition will include the target string itself) as the highest rank, and will rank all other strings based on how many letters they contain that the target does not, or vice versa. (In other words, based on their distance from the Anagram subset.) In this way, it quickly converges on a subspace of only roughly n! options from the full set of m ^ n options, where n is the string length and m is the number of letters in the currently used alphabet. Selection will then keep the random walk close to and within this subset, granted that your mutation rate is not too high and your population size is large enough. For an english word ten characters long with no repeated letters, we reduce the relevant search space from 205,891,132,094,649 options (27^10, the 27th "letter" being the space symbol) to only 3,628,800 options (10!), which we'd then have to search through using random methods. (Though even here we can further intelligently tailor our mutation rate and population size to aid our search.) (Note: I use relevant in an informal manner: though our mutations can push an individual candidate string far outside this subset, a small mutation rate will allow the random walk to continue checking for solutions close to and within this anagram subset.) The available Reward (Fitness) Functions for Proximity Neutral Search are: - Simple Sum: This function creates a reward mapping based on the ASCII sum of the characters of the string. It plugs this sum into a sin() function and takes the absolute value to get a number between 0.0 and 1.0, which it then multiplies by the string length. This value becomes the number of "errors" in the string. The string with the lowest number of errors is then chosen. - CRC32: This function uses a CRC32 checksum of the string to get a 32-bit integer, which it then plugs into a sin() function and multiplies against the string length, similar to the Simple Sum method. - Wave Interference: This function uses the interference pattern created by two sin waves of different wavelength and phase to get the value for each string. The lowest value represents the best string. - Partially Neutral CRC32: This function uses the distance between the CRC32 checksum of the string compared to the target string. It then selects the string with the smallest CRC32 distance from that target. -Partially Neutral Simple Sum Distance: This function simply takes the ASCII sum of the target string and that of the candidate strings and assigns fitness values based on the distance from the target string's ASCII sum. The offspring with the ASCII sum closest to that of the target will be considered the "fittest" for selection purposes. - Partially Neutral Anagram: See discussion above. Selects based on distance from being an anagram of the target string. - Custom Fitness Function: In addition to the above mentioned functions, users can test any other fitness function they can come up with by using the Custom Fitness Function mode. A button labeled "Edit Custom Code" appears when that mode is selected, where users can enter valid javascript to assign two values used for comparison: aError and bError. Whichever has the lowest value becomes the "fittest" string, and is selected. The user has access to any standard javascript function and can also code their own functions, as the default examples show. (The default code loaded is a javascript snippet for a Proximity Reward function.) The two strings the users will compare are available as variables "a" and "b" in the javascript code, and the target string is simply the variable "target". The user's code will be evaluated using eval() and if an exception occurs it will be trapped and a message displayed to the user in red text. (On exceptions, aError and bError are both assigned a fitness value of 0.) Does the fitness function we use matter? ------------------------------------------------------ There is no way to code for all the possible Reward Matrices created by all the different possible fitness functions, so the above are just a tiny sampling. Some (like the Proximity Reward matrix) will help a search. Others will not, or may even hinder it. For a string of length n, where each position can have one of m characters, the possible number of string permutations is m ^ n. If we assign each of those permutations a value between 0 and F (in other words, give it a "fitness" value), we get (F + 1) ^ (m ^ n) distinct matrices, which can be generated by as many different fitness functions. To put real numbers to this, assume we have a binary string of three bits that we are searching for. (Let's say we're looking for the string 011.) Since we can have two possible characters (0 or 1) and our string is three bits, we have 2 ^ 3 = 8 possible permutations we'd have to search through. (This is our Search Space.) If we now want to use a fitness function to help find the string, we can decide to limit ourselves to fitness functions that assign values to the three bit strings that are between 0 and 3. Therefore, we now have 4 ^ (2 ^ 3) = 65,536 unique fitness/reward matrices and as many fitness functions that output a unique reward matrix for the string permutations. We must therefore choose our fitness function wisely from the 65,536 available, since not every fitness function will help us in our search. Since Weasel Ware allows you to try a variety of different fitness functions you can see for yourself that it isn't the power of selection that allows a difficult search to be successful, but the choosing of an information-rich fitness matrix that aligns closely enough with your target. Using just any fitness reward matrix (such as one that is neutral to the target) will by no means guarantee success, even if mutation, reproduction and selection are available, as they are in the Proximity Neutral Search. USAGE NOTES: ============ "Offspring" in the Proximity Reward and Proximity Neutral controls how many strings per generation are generated based on the current best string. Mutation Rate is the per-letter mutation rate for offspring strings. The default of 4.0% will change roughly one letter in a 28-letter string each replication, on average. Multi-Run Mode will allow you to test a given search strategy over a fixed number of runs. The "View Results" button will then show the results in both HTML display and CSV display, for pasting into a spread-sheet analysis program such as Excel. Atom's Notes: ========== Any errors in interpretation or understanding in the version 2.0 release are strictly mine, are unintentional, and should not reflect negatively on the EIL or EIL staff. If you do find any errors in the implementation, please let me know by contacting me via my website (atomthaimmortal.com, the contact form.) If you do choose to contact me, please be as friendly as you would in person.