Dan Morris
CG102 Final Project
5-7-98
Java-based Observation of the Hopfield Network
I. Running JOHN :
This paper will describe a Java Applet written to explore the properties of the Hopfield network as an autoassociative learning system with Widrow-Hoff error correction, and observations made in that environment. This text is thus incomplete without observation of the program itself, available for online, in-browser viewing at :
http://techhouse.brown.edu/dmorris/JOHN/StinterNet.htmlNote that earlier versions of Internet Explorer will not run the Applet, as it is written in Java 1.1 . Also note that Netscape users will be asked to download Sun’s Java plugin, which everyone should have anyway, because Java is all the rage these days. While you’re noting that, also note that the Applet may take a few moments to load (but it really will). For some reason, despite its status as the Internet hub of the Northeastern United States, MIT has a painfully slow webserver.
The purpose of this paper is to provide a brief background and mathematical basis for the program, to provide a concise description of my results, and to draw conclusions about the applicability of the Hopfield network to character recognition. The details of the program – complete with information on the GUI components, examples to try, conclusions about network performance, and implications for character recognition – are presented online at the above addresses, and are included here as Appendix A (probably easier to read here than in a browser). The Java code for the program is included as Appendix B. Although these documents are included as appendices, they represent the bulk of the project. Furthermore, the component-by-component description presented in Appendix A is the most appropriate format for describing the system; this writeup may thus actually be considered supplementary to the appendices.
II. Background and Project Motivations :
Of the limited number of neural network algorithms to which I have been exposed, I have found the Hopfield network the most interesting. And the binary nature of the elements in the traditional Hopfield system suggests application to graphical pattern recognition/reconstruction. I thus set out to implement a Hopfield network for a character recognition application; hence the graphic-based nature of the final product. The original plan was to allow the user to draw a character in the input field, and run it through a network whose previously trained stable states are the Arabic digits.
I’ll begin by briefly describing the limited success that I did achieve. Numerical demonstration (the CORRECTION TRIAL portion of the program) confirms that error-correction works very well on a Hopfield network. Autoassociation for a limited number of patterns can be virtually perfect with proper error correction, and performance is maximized by placing certain constraints on element and weight values.
Unfortunately, I have discovered that this provides little support for the cause of character recognition. Basically, there is much more to character recognition than neural networking. If I were going to really implement a useful neural-net-based OCR algorithm, I would almost certainly not use the Hopfield model. Inherent limitations on the architecture – resulting in spurious stable states, limited recall capacity, and disproportionate emphasis on certain patterns – make it perhaps less applicable than algorithms like backpropagation, which is more popular for such things for a reason.
But in reality, if my primary goal were quality character recognition, I might well not use a neural net at all. And if I did, it would be secondary to thousands of lines of image parsing code. A vast literature is available on extracting various features – other than active pixel locations – from digital characters. Additionally, one might like to simply pass a large template library over an image to be analyzed. These procedures can be quite effective, but are dissociated entirely from neural networking.
However, my primary goal was not quality character recognition. My primary goal was to attempt an application of the Hopfield network. Hence, in the absence of successful character recognition, the primary final product is an elaborate analysis of the Hopfield network, of my implementation of the Hopfield network as a learning algorithm, and of my application of Widrow-Hoff error correction to the network. JOHN represents a visual environment for demonstrating this analysis.
III. Mathematical Background : The Hopfield and Widrow-Hoff Algorithms :
The primary mechanism for memory storage is the binary Hopfield network, proposed originally by Hopfield (1982) and implemented for autoassociative learning largely as described by Dayhoff (1990). Elements take values of +1 or –1 (unless binary nonlinearization is removed), and Hebbian learning occurs by a simple weighted sum learning rule (in two dimensions) :
where wij-->ab is the weight between elements i,j and a,b, p is one of a set of n patterns, and vxy is the value of element x,y. As implemented in the program, training takes place as patterns are presented, so weights are simply incremented when elements carry the same value in a new pattern, or decremented when elements carry opposite values.
Hopfield later proposed a similar network in which elements are varied continuously, rather than subjected to a binary threshold (1984). This is the basis for the network implementation without binary nonlinearization, as described in Hertz et al (1991). It seems Hopfield had a great deal more luck with this methodology than I did. I attempted to use the same discrete-time updating procedure that was applied to the binary network, and discovered that this will not work terribly well. In fact, it leads to instability, oscillation, and programmer frustration. However, continuous nonlinearization can be applied in the final project. In this case, a "soft threshold" function is applied to the above sum for each element. The chosen function was tanh(nE ) where n is a constant less than one and E is the above activity sum. This constrains element values to the range between +1 and –1, but allows for a continuous distribution of values.
Error-correction for simultaneously learned sets of patterns was applied using the Widrow-Hoff gradient descent algorithm. Error values were generated by propagating each pattern through the Hopfield network and comparing input to output. Weights were modified by the traditional delta rule :
where a is the learning constant, inputxy is the input value (or desired output value) of element xy, and outputzyis the actual output value of element xy. Provided weights are constrained to symmetry, a change made to any connection is also applied to the reciprocal connection.
IV. Results –
Although the primary goal of the project was to implement the described algorithms for a graphical application, it is difficult to objectively assess or report performance in a graphical environment. Hence purely numerical simulation was used to determine performance parameters. These results, therefore, may not be perfectly accurate representations of the network as applied to character recognition.
The most important numerical results are displayed in Figure 1. Here it is clear that error correction was very successful in its application to the Hopfield network, and that constraining connection weights to symmetry is a distinct advantage. In both cases, for fifteen learned patterns, autoassociation can be perfect with enough learning trials. It should be noted that fifteen is not a particularly significant number of patterns, but it is on the same order of magnitude as the number of patterns that would be required for a simple character or digit recognition scheme.
Of course, the initial objective was graphical in nature, and the "result" is the program itself. In qualitative description, JOHN is an excellent autoassociator for graphical patterns, but is not very good at deducing an intended character, even after supervised training on a prototype character set.
V. Conclusion - Implications for Character Recognition :
The binary nature of elements in the traditional Hopfield network suggests a potential application in character recognition, where binary pixels are the primary image elements, and recall of a limited set size is the primary goal. However, despite the intuitive connection, data representation in any useful character recognition scheme would likely not involve elements that are directly correlated with pixels. Even after scaling and sharpening (crucial and relatively simple preprocessing), pixel location is simply not the best indicator of character form.
A wide variety of other feature sets have been proposed that are perhaps less intuitive. For example, in order to account for variability in handwriting styles, characters may be decomposed into vertices, lines, polygons, and arcs (Chianese et al, 1987); in many ways a reversal of the process used to construct "TrueType" scalable fonts. Other features may include intensity changes along various linear projections, occurrence of loops (especially when analyzing script characters), and other geometrical properties that may be compared with template characters. Template matching of geometrical properties may even be insufficient; Fourier transformation into the frequency domain before matching may provide significant performance improvement (Skrzypek et al, 1991).
In any case, for whatever feature set is chosen, the process of comparison with templates may or may not involve a neural network in practical application. But proposed architectures for neural-network-based recognition are rarely as simple as the Hopfield net. Hence I may have been optimistic in approaching character recognition in this paradigm, or in trying to assess the Hopfield net through character recognition. The Hopfield network has gained its popularity due to its strong analogy to energy-minimizing physical systems and its ready implementation in VLSI (Maren, 1990), not for its promise in OCR.
Additional Conclusions –
Further conclusions are presented online and in Appendix A. If you’re not interested in reading about the Applet itself, summary moments have been blatantly designated as PROJECT CONCLUSION fields.
Appendix A : Description of the Program
With Conclusions Regarding Performance and Character Recognition
This is a screen shot of the program.

The appendix that follows assumes that reader is viewing the Applet online, but this should suffice. The text is identical to that presented online along with the Applet itself.
From here on I will assume you have the Applet properly loaded, and will make reference to various GUI components. In fact, the following is a description of each of the components and the specifics of the network implementation where appropriate; this seems an appropriate approach for description of the project. Examples, conclusions, and interesting observations will be interspersed throughout the discussion of the components. If you just want to skip to the primary conclusions, look for the PROJECT CONCLUSION fields; you can’t miss them.
I. The input fields :
PROJECT CONCLUSION #1 : if I were really implementing a character recognition algorithm, a simple neural network would be inadequate, and a great deal of image parsing would be required (well beyond the simple scaling employed here). Furthermore, even if I were going to implement a neural-net-based OCR procedure, a Hopfield net would likely not be the most efficient choice. End project conclusion #1.
PROJECT CONCLUSION #2 : The system is much better at recognizing a figure that has had considerable noise applied to it than it is at discerning a hand-drawn shape. In fact, it is quite good at reconstructing noisy patterns that are unrecognizable to the human eye, but is completely incompetent at building a figure that differs in shape from the prototype, even for figures easily recognizable to a literate reader.
As a quick example, try drawing a simple figure (my favorite is an 'x' that is 5 squares from corner to corner). Click 'scale' (because it looks nicer that way), then click 'train'. Then set the noise field to 20%, and click the 'noise' button. Likely you would never recognize the figure in front of you as an 'x'. However, clicking 'propagate' should fully reconstruct the figure.
This is, of course, a silly demonstration, because only one figure has been stored. But it does show that on some level, the network is capable of reconstructing badly damaged prototypes that are beyond human recognition (but still really bad at handwritten characters). Similar properties are indeed demonstrated for larger training sets. Perhaps this network would thus be well-suited to recovering damaged typewritten characters into digital text (fuzzy-looking faxes are a great example).
II. The output fields :
This leads to a unique property of the Hopfield Network : the possibility of a stable state at the inverse of a stored pattern. For example, train the network on any arbitrary figure. Set the input field's noise to 50% (complete randomness), and propagate a few times. You'll note that, for random input, the original learned pattern and its inverse occur equally often as output. Hence both represent equally stable states.
This is an inconvenience, and will affect the performance of the network when the only performance criterion is the absolute number of 'correct' elements (see numerical performance analysis below). But the 'inverse pattern' effect probably does not have tremendous implications for character recognition, where shape is the primary parameter. In order to apply the Hopfield Network to character recognition, a post-processing algorithm would be required to compare the network output to prototype characters. It is trivial to deal with inverse patterns at this stage.
For each iteration, a random element a will be chosen from the field. The connection weight between a and each other element b will be multiplied by the value of b to obtain a weighted sum of connection influences. Provided the 'binary nonlinearization' option is checked below (more on this later), the new value of a will simply be +1 or -1, bearing the sign of the weighted sum. It is relevant to note here that in the Hopfield model, the previous state of an element has no influence whatsoever on its current state. This seems logical, as it would not be sensible for an element to have a connection weight other than 1.0 with itself, and self-weights would thus not influence the network in any useful way.
Iteration ceases when 10 * N (in this case 2250) iterations have passed without a change in an element's value. This is somewhat arbitrary, but provides reasonable confidence that the network is 'finished.' The total number of iterations required (including the 2250 uninteresting iterations at the tail end of propagation) will be displayed in the message window.
Virtually all instructions to propagate result in stable states within a few thousand iterations (again provided binary nonlinearization is applied, again more on this later). The required number of iterations for a given pattern may well be an indication of certainty (or proximity of the initial pattern to a stored state), again useful for a character recognition application where a rejection criteria is essential.
Unfortunately, the stable states reached often do not correspond to trained patterns; these 'spurious stable states' are a major drawback of the Hopfield network. One will often observe stable patterns that represent superpositions of trained patterns and/or their inverses. The occurrence of these states is dramatically reduced when error-correction is applied (discussed below), especially for larger numbers of stored figures.
III. The animation fields :
IV. The pattern set fields :
It is important to note that large numbers of learning trials can take a while, but will vastly improve the stability of the learned patterns (the autoassociative capabilities of the network). And again, the error-correction process is stable provided binary nonlinearization is applied (I still promise to discuss this later).
PROJECT CONCLUSION #3: Widrow-Hoff correction, or some similar supervised learning algorithm, is so effective as to be ESSENTIAL to any potential memory/recall applications of the Hopfield net. The numerical demonstration discussed below will make it very clear that pattern recognition with simple Hebbian learning is, so to speak, really hokey. Widrow-Hoff learning, however, can allow for perfect autoassociation - and thus good recovery from noise that does not damage figure shape - for numbers of patterns on the same order as the number of letters in the alphabet or digits available in Western numbering.
V. The file fields :
The StinterNetLocal.class file is a Java 1.1 class that will load the above Applet with security restrictions disabled. I'll put a quarter on the fact that no one's interested enough to click that link. If you do, or even if you can just convince your browser to relax security enough to read a URL (which IE4.0 is SUPPOSED to let you do), an error-corrected set of weights representing the 10 Arabic digits in block form is available at :
http://techhouse.brown.edu/dmorris/JOHN/digitweights.txtThis is essentially the data file that represents my 'character-recognition' prototypes. If you are in a position to read files, simply enter the above URL in the FILENAME field, and click 'READ WEIGHTS'. Then watch digits that you probably didn't intend to draw emerge as stable patterns. This would be a central feature of the program (and I would try much harder to make it more accessible) if character recognition was really all that good. Also, read weights at your own risk... since most browsers won't support it, I haven't been able to fully debug it and funny things happen sometimes...
VI. The correction trial fields :
The results should be rather convincing, provided sufficient learning trials are applied. For example, try running the simulation with a learning constant of .2, 15 patterns, a random seed of 5555, and 350 learning trials. Actually you might not want to actually try it, since it will take five minutes or so. But you will see that Hebbian learning alone gives 1671 errors (varying slightly, perhaps, with other implementations of the random number generator), and Widrow-Hoff learning with these parameters corrects the system to perfect autoassociation.
PROJECT CONCLUSION #4: See project conclusion #3... I just wanted to reiterate how useless the Hopfield network would be for OCR with no correction, and how much potential is added with Widrow-Hoff weight adjustments. Perfect autoassociation is, of course, a LONG way from useful pattern recognition. But it is certainly a prerequisite, and perhaps the only of the many prerequisites for OCR that can be achieved solely with the Hopfield net.
Incidentally, for a very small number of learning trials and a small learning constant, you may find that no change at all occurs; weight changes need to be large enough to actually change the sign of elements during propagation. So don't hold it against me if you run a trial with an LC of .2 and 20 learning trials, and nothing happens. In other words, good things come to those who can wait a few iterations.
VII. Nonlinearization to binary elements :
Unfortunately, the behavior of the network is VERY unpredictable with continuous-valued units, especially when weights are not constrained to symmetry (see below). As expected, when it works, one sees more effective learning with the Widrow-Hoff algorithm for a given number of iterations. This is because 'corrections' can be made when more information than the sign of the weighted connection sum is available.
I encourage you to try this only at risk of your own patience, as it does have a tendency to result in infinite propagation. This is a consequence of allowing very small values, which can lead to oscillations between signs. Hence the 10*N iterations necessary to declare the network 'stable' will never be reached.
PROJECT CONCLUSION #5: The binary nature of the elements in the traditional Hopfield network may seem a simplification, but it does not prevent high-quality autoassociation and it avoids potentially oscillatory states.
VIII. Constraining weight symmetry :
PROJECT CONCLUSION #6: While constraining the weights in the network to symmetry may intuitively seem to place limitations on possible weight solutions, it results in an immediately observable performance increase and should be applied whenever there is a limit on possible iterations.
Note that Hebbian learning alone inherently results in symmetric weights, with no applied constraints.
And so that's about it. An interesting exploration, though a disappointment with regard to character recognition. A task for graphics-types...
References
Chianese A, Cordella LP, De Santo M, Marcelli A, Vento M. "A Structural Method for Handprinted Character Recognition." In Lecture Notes in Computer Science (Goos and Hartmanis, eds.) Volume 399 : Recent Issues in Pattern Recognition (Cantoni, Creutzburg, Levialdi, and Wolf, eds.). 1989, pages 289-315.
Dayhoff, Judith E. Neural Network Architectures. Van Nostrand Reinhold : 1990.
Hertz, Krogh, and Palmer. Introduction to the Theory of Neural Computation. Addison-Wesley : 1991.
Hopfield, J. "Neural Networks and Physical Systems with Emergent Collective Computational Abilities." Proceedings of the National Academy of Sciences, 1982:79:2554-2558.
Hopfield, J. "Neurons with graded response have collective computational properties like those of two-state neurons." Proceedings of the National Academy of Sciences, 1984:81:3088-3092.
Maren, A. "Laterally-Connected, Autoassociative Networks." In Handbook of Neural Computing Applications; Maren, Harston, and Pap. 1990, pages 125-140.
Skrzypek, J; and Hoffman, J. "Visual Recognition of Script Characters and Neural Network Architectures." In Neural Networks : Advances and Applications, E. Gelenbe (ed.), 1991, pages 109-144.
Appendix B : Java Source
Following is the Java source for the project, unless you're viewing this online, in which case you can download the entire source at: