WRITTEN BY LEEANN WAGNER AND BUD WRIGHT
THIS UNIT IS INTENDED FOR GRADES 8 THROUGH 12.
I. Artificial Intelligence (AI) is the ability of an artificial mechanism to exhibit intelligent behavior. AI programs have developed from the primitive stage to the point where they include computer programs that perform medical diagnoses, mineral prospecting, speech understanding, and vision interpretation. The term Artificial Intelligence was coined in 1956 when a group of interested scientists met for an initial summer workshop.
Early work in Artificial Intelligence consisted of attempts to simulate the neural networks of the brain with numerically modeled nerve cells. Success was very limited due to the great complexity of the problem and the primitive state of computers. Interest was revived in the 1980's and has continued into the 1990's because of advances in computer technology. Early and current systems manipulate numbers and symbols. For example, if an AI system is told "If x is a bird, then x can fly," and it then determines that a robin is a bird, it can conclude that the robin can fly.
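The bird rule above is the kind of symbol manipulation such systems perform. As a rough illustration (the rule set, facts and function names here are invented for this sketch, not taken from any actual AI system), a few lines of Python can chain a rule with a known fact:

```python
# Minimal sketch of symbolic rule application, echoing the bird example.
# The rules and facts are illustrative only.
rules = [
    ({"bird"}, "can_fly"),        # "If x is a bird, then x can fly."
]
facts = {"robin": {"bird"}}       # the system has determined a robin is a bird

def infer(entity):
    """Apply every rule whose conditions are satisfied by the entity's facts."""
    derived = set(facts[entity])
    changed = True
    while changed:                # keep applying rules until nothing new appears
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(infer("robin"))             # includes "can_fly"
```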
The first knowledge-based expert program, Dendral, was written in 1967. It could predict the structures of unknown chemical compounds based on routine analysis. More sophisticated rule-based expert systems were subsequently developed, notably the Mycin program, which uses rules derived from the medical domain to reason backwards (deduce) from a list of symptoms to a particular disease.
Artificial Intelligence systems make decisions by using a "neural network." A neural network is a computer with an internal structure that imitates the human brain's interconnected system of neurons. In a neural network, transistor circuits (gates on a computer chip) are the electronic analog of neurons. Neural networks do not follow rigidly programmed rules as more conventional digital computers do. Rather, they build a knowledge base through a trial and error method. A programmer, for instance, will digitally input a photographic image for a neural network to identify, and the network will "guess" which circuits to "fire" (activate). It then outputs its identification of the photograph.
Pathways between individual circuits are "strengthened" (resistance turned down) when a task is performed correctly and "weakened" (resistance turned up) if performed incorrectly. In this way a neural network "learns" from its mistakes and gives a more accurate output with each repetition of a task. At a fundamental level, all networks learn by association. For example, a neural network can learn to identify a pumpkin by associating the inputs "large, round, orange and vegetable" with the output "pumpkin."
The neurons in a neural network are usually organized in three layers: input, hidden, and output. Sometimes more than one hidden layer is used. In Diagram 1, each circle represents a neuron. Each column of neurons is a layer and every neuron in one layer is connected to every neuron in the next layer.
The network shown in the diagram is trained to recognize vegetables from their descriptions. Information flows from the input layer through the hidden layer to the output layer. The hidden layer makes the associations between the inputs and outputs. It is called a hidden layer because it has no direct connection to the outside world. You present information to the input layer and the network gives you an answer in the output layer.
There are many different ways a network can learn. The most popular learning method is by example and repetition, also called Back Propagation. The vegetable neural network is trained by this method. Many example pairs of inputs and outputs are collected and presented to the network. Each time any input ("orange, round, large and vegetable") is presented to the network, it guesses what the output is supposed to be. When a network is brand new and has not learned anything yet, it will probably make a wrong guess.
Suppose our untrained network initially decides that a large, round, orange vegetable is a zucchini. The training example, which has the correct output, indicates the vegetable is really a pumpkin. The network compares its output to the training example output and makes changes to its internal connections so that the next time it sees the same inputs it will be more likely to produce the correct answer. The connections adjust so that the inputs are associated more strongly with the pumpkin output and less strongly with the zucchini output. This training is repeated for a set of examples until the network learns the correct answer. Once the network is trained using pre-selected inputs and outputs, we can run it on new input information (without any supplied outputs) and have it recognize, generalize or predict the answer for us.
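The train-by-example cycle described above can be sketched in code. The following is a minimal, illustrative Python network (the layer sizes, starting weights and vegetable encodings are invented for this sketch; it is not the actual vegetable network) that repeatedly presents input/output pairs and adjusts its connections by back propagation:

```python
import math, random

random.seed(0)
N_IN, N_HID, N_OUT = 4, 3, 2   # inputs: large, round, orange, vegetable
                               # outputs: pumpkin, zucchini (illustrative)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# small random starting weights: the untrained network can only "guess"
w1 = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(N_HID)] for _ in range(N_OUT)]

def forward(x):
    """Information flows input -> hidden -> output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w1]
    o = [sigmoid(sum(w * hi for w, hi in zip(ws, h))) for ws in w2]
    return h, o

def train(x, target, rate=0.5):
    """Compare output with the training example, then adjust connections."""
    h, o = forward(x)
    d_out = [(t - oi) * oi * (1 - oi) for t, oi in zip(target, o)]
    d_hid = [hi * (1 - hi) * sum(d_out[k] * w2[k][j] for k in range(N_OUT))
             for j, hi in enumerate(h)]
    for k in range(N_OUT):
        for j in range(N_HID):
            w2[k][j] += rate * d_out[k] * h[j]
    for j in range(N_HID):
        for i in range(N_IN):
            w1[j][i] += rate * d_hid[j] * x[i]

# training pairs: [large, round, orange, vegetable] -> [pumpkin, zucchini]
examples = [([1, 1, 1, 1], [1, 0]),   # large round orange vegetable -> pumpkin
            ([0, 0, 0, 1], [0, 1])]   # some other vegetable         -> zucchini
for _ in range(5000):                 # repetition: present the pairs many times
    for x, t in examples:
        train(x, t)

_, out = forward([1, 1, 1, 1])
print(out)  # pumpkin output near 1, zucchini output near 0
```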
II. The human brain is a complex biological network of billions of special cells called neurons. These neurons send information back and forth to each other through connections; the result is an intelligent being capable of learning, analysis, prediction and recognition. Artificial neural networks are formed from hundreds of thousands of simulated neurons that are connected in much the same way as the brain's neurons and are thus able to learn in a similar manner to people. Diagram 2 illustrates biological and artificial neurons.
Some early neural network systems used individual electronic devices. Now we can use a neural network simulator to test neural network theories or to make useful applications. A neural network simulator is a program (a set of computer instructions) that creates a model of neurons and the connections between them and then trains this model.
There are many types of neural networks, but all have three things in common. A neural network can be described in terms of its individual neurons, the connections between them (topology) and its learning rule. While neural networks can do some impressive things, they cannot replicate all aspects of a human brain. The brain is far too massive and complex for even a super-computer to fully simulate. There are two important types of simulation a neural network can do: modeling brain processes and modeling brain capabilities.
A brain process model tests theories about brain function. For example, the human brain is able to recognize speech but only within a certain temperature range. If the brain temperature goes below 80 degrees or above 110 degrees, the brain is unable to recognize speech at all. Thus, a neural network that models brain processes might well include a temperature factor. The purpose of the brain capability model is to perform some of the brain's functions, though not necessarily in the same way that the brain does. Thus, a neural network that is used to model speech recognition capability would probably be designed without a temperature factor, and the inter-connections between neurons and the learning method might be simplified. Most neural networks attempt to model only brain capabilities.
The human brain is a complex network of billions of highly inter-connected cells, or neurons; each of these cells can receive information from as many as 10,000 other cells. A neuron in the brain has four basic parts: the body, the incoming channel, the outgoing channel and the connecting points between neurons, which are called synapses. These are shown in Diagram 3.
The synapses attach "weights" to incoming signals so that each of the signals will have a different effect on the neuron. A synapse can cause a signal to "turn on" (excite) or "turn off" (inhibit) the neuron. A highly excited neuron sends out an output signal, an inhibited one does not. The job of the neuron body is to add up all the incoming signals and decide if the total is enough to send out a signal. Each neuron detects and sends out only one simple thing. It is the job of the inter-connected neurons to determine such things as judging the speed of an oncoming car. Such a group of inter-connected neurons is called a neural network. Learning occurs in the brain in the form of changes to the synapses.
In an artificial neural network, each neuron also receives the output signals of many other neurons. A neuron calculates its own output by finding the weighted sum of its inputs. The point where two neurons communicate is called a connection (analogous to a synapse). The strength of the connection between two neurons is called a weight. As described previously, the neurons are usually connected in three layers: input, hidden and output. An artificial neural network built with today's technology has very few connections compared to the number in the brain. The human brain has about one hundred billion neurons and ten million billion connections. Nettalk, which converts printed text into speech, has about 325 neurons and 20,000 connections. A neural network learns by the system of back propagation, in which an error signal is fed back through the network, altering weights as it goes, to prevent the same error from happening again. The network is trained by presenting it with input and output pairs. The weights are changed so that the network will eventually produce the matching output pattern when given the corresponding input pattern of the pair.
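A single artificial neuron of the kind just described can be sketched in a few lines: it adds up its weighted inputs and decides whether the total is enough to send out a signal. The weights and threshold below are invented for illustration:

```python
# One simulated neuron: weighted sum of inputs, then a threshold decision.
# Weights and threshold are illustrative values only.
def neuron(inputs, weights, threshold=0.5):
    total = sum(w * x for w, x in zip(weights, inputs))   # body adds up signals
    return 1 if total >= threshold else 0                 # fire, or stay quiet

# positive weights excite the neuron; the negative weight inhibits it
print(neuron([1, 1, 0], [0.4, 0.3, -0.6]))  # 0.7 >= 0.5 -> fires (1)
print(neuron([1, 1, 1], [0.4, 0.3, -0.6]))  # 0.1 <  0.5 -> stays off (0)
```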
Neural networks are best known for their pattern recognition ability. If you need to recognize or classify something, in some instances a neural network can do it faster and more accurately than a person. A neural network can look at something and identify it even, sometimes, with missing or invalid data. Neural networks can recognize cancer from image analysis, aircraft from radar returns and the sex of insects from wing beat frequencies. They are not known for precision: if you ask a neural network for the sum of 2.01 and 2.02, it will probably give an answer of 4. If we wonder how smart neural networks can get, Diagram 4 shows the level of technology today. The number of neurons (as a log) is on the vertical axis and the compute speed (as a log) in connections per second is on the horizontal axis. The latest chip technology (Intel's 80170NX) has about the compute speed of a cockroach. If we project that neural network performance will double every three years, as have the performances of memory and microprocessors, we are about 120 years away from electronic devices with the same performance potential as a person. Although neural networks cannot "think" as fast, they can organize data and results into meaningful categories to an extent of which no human is capable.
III. Much of our summer research consisted of training and testing sessions on neural networks under the direction of our mentor, Zoran Obradovic, and graduate assistants Tim Chenoweth and Radu Drossu. These neural networks were prepared by following five basic steps.
A. Our first task was to become familiar with the UNIX operating system and practice on a routine network with data describing the features of industrial robots. There are three standard benchmark problems for performance comparison of different learning algorithms. The input data was very simple, consisting of 16 indicators, all 0's and 1's, which identified the features of the robots. For example: large-1, small-0; lifting ability above 5 kg-1, not-0; spot weld capability-1, not-0. The outputs were also 0's and 1's, as is true of many neural networks; they simply meant acceptable for a company's use or not acceptable.
We trained the network on the known data (124, 169 and 122 robots for problems 1, 2 and 3 respectively) and then tested it on new data, with high success rates expected. Again, the back propagation program uses a weighted sum which it keeps changing by trial and error until it can separate the data with multi-layered planes, categorizing robots with different features into acceptable and not-acceptable groups.
B. For the breast cancer recurrence prediction, our mentor, Zoran, had a file of "real live data" which we trained and tested on neural network simulator Ver 1.0, written by one of our instructors, Radu Drossu. The data consisted of vital statistics from 286 breast cancer patients from University Medical Center, Ljubljana, Slovenia. We were supplied with inputs such as pulse, blood pressure, blood cell counts, etc. There were nine indicators and an output for each patient. The output was 0 or 1, indicating whether cancer recurred within some interval of time.
The task was to train the program, from this known data, to predict whether cancer would recur in a former patient given their nine vital statistics. With the 286 cases, we split the data, using 4/5 of it to train the program and the other 1/5 to test it; we used five different splits. The results are listed in an included table. Using Radu's program is a tedious trial-and-error process. The program has essentially four main settings which the operator can alter to obtain better prediction results. One is the number of iterations, or "epochs," which usually number in the thousands; the other three settings are learning rate, momentum, and tolerance.
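The 4/5 train, 1/5 test procedure with five different splits can be sketched as follows. The patient records here are randomly generated stand-ins (nine inputs and a 0/1 output each), not the real Ljubljana data:

```python
import random
random.seed(1)

# Stand-in for the 286 patient records: (nine_inputs, output) pairs.
records = [([random.random() for _ in range(9)], random.randint(0, 1))
           for _ in range(286)]

def five_splits(data):
    """Yield five different 4/5 training / 1/5 testing splits of the data."""
    shuffled = data[:]
    random.shuffle(shuffled)
    fold = len(shuffled) // 5
    for k in range(5):
        test = shuffled[k * fold:(k + 1) * fold]          # 1/5 held out
        train = shuffled[:k * fold] + shuffled[(k + 1) * fold:]  # the rest
        yield train, test

for i, (train, test) in enumerate(five_splits(records), 1):
    print(f"split {i}: {len(train)} training, {len(test)} test cases")
```

Each split holds out a different fifth of the cases, so the prediction rate can be averaged over all five runs.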
Radu, who wrote the program, uses the analogy of a ball rolling along an undulating path which the program must learn. If the learning rate is set too high, the ball takes big jumps and misses parts of the curve. If it comes to a hill after a plateau, it must have increased momentum to climb it. However, just setting these two parameters low and high, respectively, does not always work. These parameters are described in Diagram 5.
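The rolling-ball analogy can be made concrete with a small gradient-descent sketch. The curve, learning rate and momentum values below are invented for illustration; this is not Radu's program:

```python
import math

def f(x):  return x * x + 2 * math.sin(3 * x)   # undulating curve with dips
def df(x): return 2 * x + 6 * math.cos(3 * x)   # its slope

def descend(x, rate, momentum, steps=200):
    """Roll a 'ball' downhill; momentum carries old motion into the next step."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - rate * df(x)
        x += velocity
    return x

# with momentum the ball can roll over small hills after a plateau;
# without it, the ball may stall in the first shallow dip it meets
print(descend(3.0, rate=0.02, momentum=0.9))
print(descend(3.0, rate=0.02, momentum=0.0))
```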
Our first attempts yielded dismal results. Then we realized that the data was not "normalized," meaning the inputs were not in the range from 0 to 1. Many neural networks only accept normalized data as inputs and then use this and a weighted sum in the decision function. Input data can be readily normalized with a linear function, and an exercise on this is included in this module.
By adjusting the parameters, using many iterations and repeating this process many times, we were able to obtain an average correct prediction rate of over 70%. Keep in mind that we used known results for testing, so this means that in predicting cancer recurrence in a new patient the program would be correct about 70% of the time. This result is comparable to previously reported generalizations obtained using different data modeling techniques, and it was obtained in a relatively short time period.
C. We also worked on a program developed at WSU for feature selection in predictive models of the stock market, which, given suitable inputs, predicts future movements in the Standard & Poor Composite Index, an average of the values of 500 selected stocks. As stated by the school of engineering and science:
"It is well known that the stock market does a very good job of reflecting the actual value of the underlying stock. However, as recently indicated, it is still possible that there are nonlinear relationships between market information and the value of the stock that so far have not been identified and therefore, not reflected in stock prices. Our aim is to explore if their nonlinear relationships can be captured using a machine learning approach of problem tailored artificial neural networks."
The input data for this program were 32 financial indicators such as the consumer price index, the US Treasury T-bill rate, etc. Many of the inputs were the same indicator taken several time intervals back. The writers of the program felt the data contained "noise," i.e., too many inputs which did not appreciably affect the predicted outcome. A complex algorithm was devised which would sequentially remove the indicator which least changed the outcome. By using this process we decreased the number of inputs to the eight "best," which was the desired amount. Working with this program gave us a better understanding of neural networks and an appreciation for the complexity of UNIX.
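The sequential removal described above can be sketched as backward elimination. The indicator names and "importance" scores below are invented for illustration; the real WSU program measured how much each removal actually changed the network's predicted outcome:

```python
# Sketch of sequential backward elimination: repeatedly drop the input whose
# removal changes the outcome the least.  Importance scores are invented.
importance = {"cpi": 0.9, "tbill": 0.8, "cpi_lag1": 0.1, "tbill_lag1": 0.3,
              "gold": 0.7, "gold_lag1": 0.05, "volume": 0.6, "volume_lag1": 0.2,
              "noise_a": 0.01, "noise_b": 0.02}

def error_change(feature):
    """Toy stand-in: dropping an important input changes the outcome more."""
    return importance[feature]

def backward_eliminate(features, target_count):
    kept = set(features)
    while len(kept) > target_count:
        # remove the indicator whose absence changes the outcome least
        least = min(kept, key=error_change)
        kept.discard(least)
    return kept

best = backward_eliminate(importance, 4)
print(sorted(best))  # only the most influential indicators remain
```

The same loop, with the toy scoring replaced by a retrained network's prediction error, reduces 32 indicators to the eight "best."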
This system is not PC based so it will be left to "BrainMaker" to teach neural networking to our students.
A. In this example real estate problem, certain steps must be followed to get your neural network into operating condition before it can predict house selling prices. You will create input and output files, make BrainMaker files, train the network, evaluate the results and, finally, change the data to predict the selling price of any house based on this data. This will give you an estimated price for a house if it were to go on the market tomorrow.
The training network has 217 examples of houses and their individual data which consists of the following:
|SALEPRIC||actual sale price of home||$103,000-250,000|
|DWLUN||number of dwelling units||1-3|
|RDOS||reverse data of sale (months since sale)||0-23|
|TOTFIXT||number of plumbing fixtures||5-17|
|HEATING||heating system type||coded as 2 or 3|
|WBFPSTKS||wood burning fireplace stacks||0-1|
|ATTFRGAR||attached frame garage area|
|TOTLIVAR||total living area||714-4185|
|DECK/OFP||deck/open porch area||0-738|
|ENCLPOR||enclosed porch area||0-452|
|NBHDGRP||neighborhood group||coded as 1 or 2|
|RECROOM||recreation room area||0-672|
|FINBSMT||finished basement area||0-810|
|TOTOBY||total other value (building and yard)||0-16400|
Open NetMaker (click its icon). Data files need to be made in this program, while the training work is done in BrainMaker.
Now we need to let BrainMaker make use of the data in NetMaker.
The last four steps have made it possible to watch the program as it learns the data; the statistics tell you which test run was the best.
Now we have to evaluate training (the statistics)
Now we want to retrieve the network that was saved just before the best run. If it was 25, the last saved would be 24.
Now, here is where we can enter new data, or a description of a house and our training file will tell us the price of the new home.
B. This neural network will predict the length of stay for hospital patients. This type of neural network is one that is used in a quality improvement and cost reduction medical system at Anderson Memorial Hospital in South Carolina.
Since knowing the length of stay at a hospital is another way of stating the severity of an illness, a treatment program can be planned with this in mind and new patients can be more easily compared to past similar cases. This network will try to predict, from a patient's current data, just how long they will stay at the hospital, which in turn helps the hospital figure out the "costs" that the patient will bring. It is important for hospitals to try to predict what costs certain patients will bring them.
C. We might want to predict the winner of the annual Apple Cup game based on the UW's and WSU's performances in the Pac 10 leading up to the final game. Actually, we would use the league results for each team, considering both home and away games, as inputs to train the network. This does involve entering data files, which is time consuming.
Typical statistical values for each game played are listed in the table below. Note the ranges, which must be normalized for the network to train on. As outputs, we could have the point spread (0-10) and win (1), loss (0) or tie (.5). For each contest played we would have 22 input neurons. When the network has trained on past statistics and outcomes, we would present the Cougars' and Huskies' pre-Apple Cup statistics. Note: the first game of the season would be trained on last year's statistics.
The pattern 1 input table shows typical and normalized values which would be presented to the network. The ranges are taken from the statistical value table for normalization. Notice that in the example, team B's average of 80 yards per game allowed (sounds like the Cougs) registered as a 0 because it is outside the specified 100 to 500 range. After presenting the training facts and then testing on a small percentage of the inputs, we could refine the system and have it predict the outcome.
|Statistical Values for Each Game Played|
Inputs (one set for each team):
|Avg. yards gained per game||100 to 500|
|Avg. yards allowed per game||100 to 500|
|Avg. points scored per game||0 to 50|
|Avg. points allowed per game||0 to 50|
|Percentage wins at home||0 to 100|
|Percentage wins away||0 to 100|
|Net turnovers||-30 to 30|
|Avg. penalties per game||2 to 15|
|Avg. penalty yards per game||10 to 150|
|Avg. point spread per game||0 to 10|
|Home/visit team||0 or 1|
|Point spread this game||0 to 10|
|Team A win/loss||0(loss), .5(tie), 1(win)|
|Team B win/loss||0(loss), .5(tie), 1(win)|
|Pattern 1 Input|
|1||Team A avg. yards gained per game||250||0.375|
|2||Team A avg. yards allowed per game||200||0.250|
|3||Team A avg. points scored per game||29||0.580|
|4||Team A avg. points allowed per game||15||0.300|
|5||Team A % wins at home||73||0.730|
|6||Team A % wins away||52||0.520|
|7||Team A net turnovers||2||0.533|
|8||Team A avg. penalties per game||5||0.231|
|9||Team A avg. penalty yards per game||20||0.071|
|10||Team A avg. point spread per game||4||0.400|
|11||Team A home/visit team||1||1.000|
|12||Team B avg. yards gained per game||220||0.300|
|13||Team B avg. yards allowed per game||80||(0)|
|14||Team B avg. points scored per game||23||0.460|
|15||Team B avg. points allowed per game||10||0.200|
|16||Team B % wins at home||65||0.650|
|17||Team B % wins away||61||0.610|
|18||Team B net turnovers||-3||0.450|
|19||Team B avg. penalties per game||8||0.462|
|20||Team B avg. penalty yards per game||80||0.500|
|21||Team B avg. point spread per game||2||0.200|
|22||Team B home/visit team||0||0.000|
|Pattern 1 Output|
|2||Team A win/loss||0||0.000|
|3||Team B win/loss||1||1.000|
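The normalized column in the tables above comes from a linear map of each statistic's range onto 0 to 1, with out-of-range values clamped (which is why team B's 80 yards allowed registers as 0). A minimal sketch, checked against a few table entries:

```python
# Linear normalization of a statistic onto [0, 1], clamping values that
# fall outside the stated range (as with team B's yards allowed).
def normalize(value, low, high):
    scaled = (value - low) / (high - low)
    return min(1.0, max(0.0, scaled))   # clamp out-of-range values

print(normalize(250, 100, 500))   # Team A avg. yards gained  -> 0.375
print(normalize(2, -30, 30))      # Team A net turnovers      -> 0.533...
print(normalize(80, 100, 500))    # Team B yards allowed      -> 0.0 (below range)
```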
Many neural network programs only accept normalized data, i.e. numbers in the range from 0 to 1.
By using the point-slope form for the equation of a line, data may be normalized as follows. Starting with (X1, Y1), we want the equation of the line connecting it with (X2, Y2):

f(X) = Y1 + ((Y2 - Y1)/(X2 - X1))(X - X1)

If the interval range for the data is [4, 64] and we normalize onto (0, 1), then f(4) = 0 and f(64) = 1, so f(X) = (X - 4)/60. Normalize the following input values:
X = 12
X = 20
X = 8
X = 44
X = 56
Give 7 descriptive inputs that would identify the fruit outputs.
For the output units, choose 4 sports balls, e.g. football, golfball, ... Then pick 7 input characteristics, e.g. smooth, dimpled, sewn, ..., of which 3 or more would identify each of these outputs.
How many connections are in this neural network?
On the back of this sheet (or another sheet), create a neural network as follows: use 5 automobiles as the outputs. As the input units, use 8 characteristics which would identify these autos; e.g., pre-1950, 2-door, American made, Classic might yield "Little Deuce Coupe."