NEURAL NETS FOR PERSONAL INVESTING: ISSUES

by William Arnold

This version, submitted to us by the author, is an adaptation of his original article submitted to HEURISTICS: The Journal of Intelligent Technologies, published in their special issue: Neural Networks for Financial Systems, v9, #1.

------------------------------------------------------------------------

In business, Neural Nets have found a diverse range of applications, being used to predict everything from the highest achievable selling price for a two-bedroom house with a swimming pool and good "curb appeal" to the effect of various wage levels on the corporate staff attrition rate. Financial institutions now use the networks almost routinely to aid in securities analysis, and the applications used in this process are understandably complex and expensive to create and maintain.

This does not mean, though, that only financial professionals are capable of constructing and using such programs. Many commercial Neural Net products include financial forecasting tutorials because these provide effective demonstrations of the most powerful program features. Examining these demonstration models, it is apparent that the architecture of the networks is similar to that of chemical, environmental or character-recognition programs. Perhaps this observation stands the matter on its head: in the development process, many software packages were initially designed with the lucrative financial-forecasting market in mind, with other applications added later to increase marketability.

Although they are useful in demonstrating features, these tutorials are more or less toy applications -- they don't deal with real securities, real data or real money. Knowledge of setting up and implementing the networks is only part of what is required to bridge the gap between a tutorial and a practical personal investment tool. The ultimate goal may be to develop a network or group of networks to aid in investment decisions, but even if this goal is not achieved, the effort almost certainly will yield knowledge both of the applications and of the financial markets as a corollary benefit.

Actually, constructing the network itself may not be the primary problem. It may be putting a network together without expending more time and money than we want to commit. We can apply this insight immediately. We need to select a software package, either for purchase or online access. There are many such packages available, falling into two categories: stand-alone applications and those designed to add into spreadsheets such as Microsoft Excel. The choice between them depends on the availability of the spreadsheet, and perhaps as importantly on the operator's facility with the spreadsheet program. Various data-entry routines and data manipulations are obviously required in constructing the networks, and utilities included in the stand-alone packages are specifically designed to facilitate these. On the other hand, making networks using an add-in package may serve to increase operator capabilities with the spreadsheet program.

Very low-cost shareware packages are available on the Internet; some of these have excellent features. The drawback of getting programs this way is that when it comes to support, you get what you pay for. Moving into a new application, particularly in unfamiliar terrain, can be expected to entail getting help from those who have been through the exercise before.
Additionally, for commercial software packages, published evaluations are available (1) to facilitate easy comparison. The value of the time saved by using the telephone-help service provided by commercial vendors, and the more complete support that comes with them, must be weighed against the additional upfront cost. In selecting the software, it is perhaps best to look at the cost of the telephone support as well; this argues for buying a package from a nearby company.

Once the software is bought, unpacked and installed, it is a good idea to go through the tutorials ... all of them. It doesn't hurt to see all of the features of the software deemed worthy of display by the programmers. As a side excursion, particular attention should be paid to any horse-racing applications that are included, since handicappers are apparently among the most successful users of Neural Net programs. It may be that there is a higher signal-to-noise ratio in data from Saratoga than there is in data from Wall Street.

The software documentation, and the programs themselves, are tempting, but at this point they are distractions. We could well experiment with transfer functions, learning parameters, and the like; that's what engineers and scientists like to do. For financial applications, though, the network nuts and bolts are in many cases considered fine-tuning tools even by advanced practitioners. To see how, or even whether, such tinkering will be useful, we need a first-approximation network. To get this, there are a few tiny decisions we must make: for example, what are we going to try to predict (in neurocomputing terms, what is the pattern?), and what data will be used to predict it?

The network is a market-timing tool, but a well-selected target security is crucial to the overall success of the investment. In the search for a pattern worthy of analysis and eventual investment, some additional guidance is provided by Peter Lynch, an enormously successful Mutual Fund manager turned author. His five-word advice to investors: Invest In What You Know. As technologists, we may know more about the future of technology than most investors do, so a technology area that is currently doing well is an excellent choice. However, some investors may not recognize the importance of selecting securities that are already showing good economic performance. One of the oldest saws on Wall Street is "don't fight the tape", referring to the ticker-tape where transactions are displayed. It is indeed possible to find bargains in areas that are not generally prosperous, but this is more an exercise for gamblers, or for those who invest lots of time unearthing opportunities, than for ordinary investors.

More specifically, what sort of investment vehicle should be selected as the target? The network is being counted upon to cover for a lack of sophistication in securities analysis, but we face a signal-to-noise problem, as individual securities may fluctuate in price unpredictably. Fortunately, the Mutual Fund industry has developed investment vehicles that are very well tailored to network requirements. Several fund families offer what are called sector funds: baskets of various stocks concentrated in a particular industry. Fidelity, Vanguard, T. Rowe Price and Invesco, among others, offer such funds. The sales charges and expense ratios for these funds are available at any library; many investors feel that these charges are a fair price for the diversification of investments offered by the funds.
This diversification is meant to shield investors somewhat from adversities affecting individual companies, but it also has the effect of damping fluctuations in share prices. An additional advantage of Mutual Funds is the ease with which they can be bought and sold, and this may come in handy if the investment regime indicates frequent buying and selling. Of course, considerations of restrictions on, and costs of, such transactions apply here as elsewhere.

One criterion peculiar to the selection of a pattern for the Neural Nets involves the magnitude of the share price. If the price per share is relatively high, then a given percentage fluctuation will be a larger value, which may be easier for the network to recognize. Another way of looking at this is to observe that more shadings of value increase or decrease are possible if the magnitude of the price change can be large or small. Additionally, when the Net Asset Value of the security is calculated for each time period, there is some rounding error involved; as the share price of a security grows, this error becomes less important.

In addition to selecting a target security, consideration must be given to the data that will be used to predict its future value. If we are approaching this as a pattern-recognition problem, we will be looking to interpret some sort of business-cycle data. But opening up a copy of Barron's or the Wall Street Journal reveals page after page of data, any column of which is a potential input. A sort of overall, fundamental approach is needed to select the useful inputs.

The number and variety of the sector funds suggest an approach based on the observation that uptrends in the performance of certain economic sectors tend to lead or lag those of others in a fairly predictable fashion (2). Observers of market history agree that with each passing year institutions have more influence in the market. Collectively, Mutual Fund managers paid on the basis of performance against benchmark indices such as the Dow Jones Industrials or NASDAQ Composite have a great deal of influence over the direction of the market. In quest of superior performance, these managers tend to jump from one area of the market to the next, riding industry trends and switching emphasis with the ebb and flow of profitability in various sectors.

The idea of using the price performance of the sector funds themselves as inputs is appealing enough. There is even a chance that we could construct networks consisting entirely of the price action of the funds, with some funds serving as the pattern in some networks and as inputs in others. But this may not be a good idea, for several reasons. For one thing, even the dozens of sector funds leave some areas of the economy sparsely covered. For another, the funds are managed for maximum performance, regardless of how the sector or the economy is doing at the time; holdings of such funds are thus often confined to just a few companies, and substantial cash may accumulate in some funds. In addition, the funds pay distributions, giving back a portion of their earnings to shareholders. When the fund's Net Asset Value is adjusted for these distributions, the result is a sudden one-day drop in the share price unrelated to the actual market action for that day. Since the price for later days reflects this charge, there is no straightforward way to adjust for this phenomenon. In the pattern security, we can dig into the spreadsheet and make an acceptable adjustment by hand.
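That hand adjustment can also be scripted for the single pattern security. The Python sketch below illustrates one acceptable approach, with made-up numbers rather than actual fund data: given weekly closing NAVs and a known per-share distribution, it adds the distribution back to the ex-date close and all later closes, so the week-to-week differences seen by the network are not distorted by the payout:

    # Minimal sketch: undo the one-day NAV drop caused by a distribution.
    # All values are hypothetical, for illustration only.
    weekly_nav = [34.10, 34.55, 33.20, 33.65, 34.02]   # weekly closes
    ex_index   = 2       # week on which the fund went ex-distribution
    per_share  = 1.25    # per-share distribution

    adjusted = [
        nav + per_share if i >= ex_index else nav
        for i, nav in enumerate(weekly_nav)
    ]

    # The adjusted series no longer shows the spurious one-week drop,
    # so differences fed to the network reflect market action rather
    # than the accounting event.
    print(adjusted)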
For dozens or hundreds of inputs, however, this process is not too satisfactory. There is also the question of where to get data for our inputs. Virtually every kind of material imaginable is available on the Internet. As of this writing, though, data of the type that we need is not available online -- not for free, in any event. No matter how the data is obtained, some sort of underlying rationale is needed as a guide in selecting it.

Data on stock-market sectors, suitable and seemingly tailor-made for Neural Net use, is in fact cheaply available. Standard & Poor's has formulated composites of dozens of industry groups (and sub-groups and, it often appears, sub-sub-groups). These composites are pre-adjusted for distributions to shareholders, and are published in tables. Their most convenient form is the table of index weekly closing prices, entitled Current Statistics, available at most libraries. If we use these data as inputs, we can match them with the weekly closing prices of our pattern Mutual Fund, to predict the closing price of the fund one week in advance. These data are also available, on a daily as well as a weekly basis, from download services, but the use of stock-analysis software and a download service may add hundreds of dollars to the cost of the network, and may wipe out any potential profits we stand to make. Many professional securities traders find weekly data sufficient for their purposes (3). There is an additional argument for sticking to weekly data: to cover a given time period, a weekly network will require only one-fifth as many spreadsheet entries as a daily network.

The cheapest way of accessing the data is indeed to hand-keypunch the S&P data into the spreadsheet. Presumably we will do this ourselves, as we wouldn't dream of asking a subordinate or spouse to toil at such drudgery. This is a lot of data entry, so a careful look at the data is merited: is there some way to avoid entering data on all of the over 100 groups covered by S&P? There is in fact ample reason to believe that some trimming can be done. Practically, each Neural Net seems to function best with some optimum number of inputs: given too few, the network doesn't "learn" all of the dimensions of the data surface, and given too many, the network takes an excessively long time adjusting weights between neurons reflecting redundant information -- and we run the risk of "memorization" as well (4). Without a lot of experimentation, there is no way of determining how many of these groups we can cut out. As a reasonable approximation, and keeping in mind that we will find it easier to cut data later than to add it, we can go through the groups, eliminating obvious redundancies, and come up with a reduction of about 50%.

We should be careful, though. In general Neural Net theory, and also specifically in financial applications, considerable thought has been given to the process of "pruning" inputs (5). On the face of things, it makes sense not to include two inputs possessing a Correlation Coefficient of, say, 0.8. However, the advisability of deleting one of them depends on the importance of the uncorrelated information that will be lost (with a correlation of 0.8, about 36% of the variance is unshared, since 1 - 0.8^2 = 0.36). The information discarded may turn out to have been indispensable.
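As a rough illustration of this trade-off, the following Python sketch (the series are randomly generated stand-ins, not actual S&P group data, and the 0.8 threshold is simply the figure used above) computes pairwise correlation coefficients among candidate inputs and reports how much unshared variance would be discarded by dropping one member of a highly correlated pair:

    import numpy as np

    # Hypothetical weekly closes for three candidate group inputs.
    rng = np.random.default_rng(0)
    base = rng.normal(size=52).cumsum()
    groups = {
        "chemicals": base + rng.normal(scale=0.5, size=52),
        "spec_chem": base + rng.normal(scale=0.5, size=52),  # overlaps heavily
        "gold":      rng.normal(size=52).cumsum(),           # largely unrelated
    }

    threshold = 0.8
    names = list(groups)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = np.corrcoef(groups[names[i]], groups[names[j]])[0, 1]
            if abs(r) > threshold:
                unshared = 1 - r ** 2   # variance not explained by the other series
                print(f"{names[i]} vs {names[j]}: r = {r:.2f}; "
                      f"dropping one discards about {unshared:.0%} unshared variance")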
As an example, consider the following list of Industry Groups as the input matrix for the networks: S&P 100, Capital Goods, Consumer Goods, Energy Composite, Entertainment/Leisure, Aluminum, Automobiles, Heavy Duty Trucks, Building Materials, Chemicals Composite, Specialty Chemicals, Communications Eqpt, Conglomerates, Containers (Metal/Glass), Insurance Composite, Paper Containers, Electrical Equipment, Defense Electronics, Electronic Instruments, Engineering/Construction, Foods, Gold Mining, Hardware/Tools, Homebuilding, Health Care Composite, Health Care/Drugs, Medical Products, Hotel/Motel, Household Products, Machine Tools, Diversified Industrials, Manufactured Housing, Metals Misc., Office Supplies, Oil Composite, Oil Equipment, Paper/Forest Products, Pollution Control, Savings and Loan, Publishing, Retail Store Composite, Retail/Department Stores, Retail/Food, Retail/Gen. Merchandise, Steel, Textile, Electric Companies, Natural Gas, Telephone, Airlines, Railroads, Truckers, Bank Composite, Center Banks, Regional Banks, and Computers.

This would leave out the following groups, among many others: S&P Small Cap Index, High Tech Composite, Low Price Common Stock, Autos (except GM), Beverages/Alcoholic, Beverages/Soft Drink, Broadcast Media, Computer Software, Computers (except IBM), Elec/Semiconductors, Cnsmr. Product Dist, Healthcare/Hospital Management, Leisure Time, and Shoes.

Even with the loss of information from the omitted groups, we have a sizable input matrix. Nonetheless, the selections cover most of the economy, with overlapping coverage of the areas most likely to help in our prediction. Yet these selections reflect a conscious effort to cover diverse areas of the economy instead of concentrating on technology itself.

There will of necessity be strict limits on the time period that the network covers, and there are some clues as to how long this might be. In its own fashion, our network is to make value judgments on the economic factors driving the overall economy. These change over time -- after all, that is part of the rationale for constructing the network from sector data. Over how long a period can these factors be expected to remain consistent enough to give us a solid pattern? Mutual Funds advertise performance based on one-year and ten-year figures, set in type of various sizes depending on how favorable those figures appear. While Fund Managers might grudgingly reconcile themselves to being evaluated on performance for periods as short as one year, it is no secret that such evaluations are short-sighted. At the other end of the scale, we can't expect coherent economic forces over ten years. Three to five years is a reasonable compromise, and many Technical Analysts evaluate time-series data using weighted or exponentially smoothed averages, which by greatly decreasing the influence of the earliest data give extra, well, weight to the argument for a three-year period.

In constructing the network, difference columns are formed from the raw price data; this enables the network to see the price fluctuations more dramatically than if the raw prices were used. To filter out some random noise, moving averages are formed, and it is at this point that a difference between daily and weekly data becomes apparent. Daily data obviously contains more noise than weekly data -- effectively, the weekly data is self-smoothing. While financial institutions are not particularly forthcoming regarding the particulars of their network programs, experiment has shown that daily data often requires five-period smoothing, rendering it almost equivalent to weekly data; for weekly figures, two or three periods of smoothing are often used.
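As a minimal sketch of this preprocessing, assuming the weekly closes have already been keyed into a pandas DataFrame (the column names and prices below are placeholders rather than actual fund or index data), the difference columns and three-week moving averages can be formed in a few lines of Python:

    import pandas as pd

    # Hypothetical weekly closes for the pattern fund and one S&P group.
    closes = pd.DataFrame({
        "pattern_fund": [44.2, 44.9, 44.1, 45.3, 46.0, 45.7],
        "sp_computers": [310.5, 314.2, 311.8, 318.0, 321.3, 320.1],
    })

    diffs = closes.diff()                       # week-to-week price changes
    smoothed = diffs.rolling(window=3).mean()   # three-week moving average

    # The network sees changes and their smoothed values, not raw prices,
    # so fluctuations stand out and some week-to-week noise is filtered.
    print(smoothed.dropna())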
Each weekly set of inputs, or fact, presented to the network must contain some time history to produce a prediction. Ignoring the moving-average smoothing for the moment, note that a three-period time displacement for the oldest input takes into account events that occurred a month before the prediction date. Without more information regarding the relative weight accorded to older data, it is reasonable to use inputs representing prices one, two and three weeks ago, with the possibility of sorting things out better after obtaining performance figures for the network.

Inputs may be grouped in several ways. Using, for instance, the S&P Gold Mining Stock Index as a starting point, we can group the inputs as the Gold Index raw price difference, the Gold Index moving average, and this moving average one, two, and three periods ago. We can also look at the inputs in classes of all the raw index price differences, all the moving averages, and all the averages for the various time periods. In constructing the spreadsheet, column headings will facilitate our analysis of the network by designating the Gold Index as, say, p18, the difference as d18, the moving average as m18, and the time-displaced averages as t18, 2t18 and 3t18. If we get a statistical report of the importance of the various inputs, tabulated by the name of each input, we can (somewhat) easily form conclusions about which inputs have the highest influence. We might also happen into a situation where, for example, the time-displaced values from two periods ago turn out to have high predictive capacity. Without systematic naming of inputs, this sort of hidden relationship is difficult to recognize.

With 56 inputs processed through moving averages and time displacements, the spreadsheet contains a total of 280 inputs. Using a commercial stand-alone Neural Network package and a hand-keypunched textfile spreadsheet containing a 143-week run of S&P Index data, a number of networks were set up and tested. After some experimentation, the Fidelity Select Technology Fund was picked as the target security. Networks trained with the T. Rowe Price Science and Technology Fund, the Invesco Strategic Technology Fund and the yield on the 30-year U.S. Treasury bond as targets generated less accurate predictions from this input set, with the Invesco Fund networks showing some usefulness and the bond-yield networks being the least effective.

The software divided the data set into training (90%) and test (10%) sets. The noisiness of the S&P data means that this division is more likely to yield an unrepresentative test set than would be the case if the data were more homogeneous (5). Instead of the usual practice of reserving 10% of cases for testing, an increase to 15% was considered; for these experiments, the problem was adequately minimized by making many networks with shuffled fact files.

Once the spreadsheet was made and the various time-displaced moving averages were formed, some preliminary experiments were done to determine optimum network size. The rule of thumb for establishing the number of neurons is to begin at a figure midway between the number of inputs and outputs; for our 280 inputs, this yielded an initial size of 140 neurons. Experiment showed, however, that a much smaller number still produced effective networks.
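The fact-file bookkeeping described above (systematically named difference, moving-average and time-displaced columns, shuffled facts, and a 90/10 training/test split) lends itself to a short script. The Python sketch below is illustrative only: it uses a handful of randomly generated price columns rather than the 56 S&P groups, and its midpoint rule for the hidden layer is the same rule of thumb mentioned above.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)

    # Hypothetical weekly closes for a few groups, named p1..pn.
    n_weeks, n_groups = 143, 4
    prices = pd.DataFrame(
        rng.normal(size=(n_weeks, n_groups)).cumsum(axis=0) + 100,
        columns=[f"p{k}" for k in range(1, n_groups + 1)],
    )

    facts = pd.DataFrame(index=prices.index)
    for k in range(1, n_groups + 1):
        facts[f"d{k}"] = prices[f"p{k}"].diff()            # raw difference
        facts[f"m{k}"] = facts[f"d{k}"].rolling(3).mean()  # moving average
        for lag, tag in [(1, "t"), (2, "2t"), (3, "3t")]:
            facts[f"{tag}{k}"] = facts[f"m{k}"].shift(lag) # time-displaced

    facts = facts.dropna()
    facts = facts.sample(frac=1.0, random_state=1)         # shuffle the facts
    split = int(len(facts) * 0.9)
    train, test = facts.iloc[:split], facts.iloc[split:]   # 90% train, 10% test

    # Rule-of-thumb starting point for the hidden layer: midway between the
    # number of inputs and the single output (280 inputs -> about 140 neurons).
    hidden = (facts.shape[1] + 1) // 2
    print(len(train), len(test), hidden)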
The networks were trained using one of the approaches commonly used with Neural Network programs (6). Two parameters were varied: the Training Tolerance, or the maximum proportional error requiring no adjustment of the network weights, and the required proportion of correct facts (a fact comprises all of the inputs corresponding to a single day or week, together with the pattern value corresponding to those inputs). Training was started with a Training Tolerance of 0.20 and a requirement that the network train until 80% of predictions were correct. These specifications were easily met, but yielded an unfinished and not very useful network. The Tolerance was subsequently tightened and the number of required correct facts increased, in turn.

For example, the program used offers histograms indicating not only how many facts cause correct predictions, but also some indication of the size of the errors on incorrectly predicted facts. With this information, useful values for this progressive tightening process become apparent (Figure 1). If in many cases the network misses the first tolerance by a small amount, it should pay to tighten tolerances slightly. For other networks, it may become obvious that such tightening is ineffective until more facts are brought into line at a looser initial Tolerance. The overall predictive capacity of the network may actually be damaged if training is continued until the most extreme facts are brought within the training tolerance. An overall statistical error for the predictions can also be easily calculated, but it has often been observed that this parameter is no more than a loose guide to actual network performance.

In any case, varying the Tolerance and the number of acceptable facts in this way has been found to work, and it is common for such networks to behave in an almost organic fashion. Typically, when a parameter is varied after having been kept constant for a while, the network will in a sense lose its equilibrium and perform poorly. With more adjustments, performance will improve, sometimes quite suddenly. Once this alternating variation seems to have reached the point of diminishing returns, statistical analysis may be of value in refining the network.

Once training has proceeded to its limit (governed in some cases by the amount of time available, in others by limitations of the data), predictions are tested. For the experiments set forth below, the testing tolerance was the same value as the permissible training error; in practice, testing tolerances are often less stringent than those used for training. Note the training times and overall error rates for the following neuron numbers, based on 4 shuffled replicates of the 143 weekly closes, for a total exposure of 572 facts:

Number of Neurons     Avg. Training Time (sec)     Number of Errors
-------------------------------------------------------------------
4+18 (2 layers)                 263                       62
24                              234                       65
18                              261                       61
12                              280                       65

The latest date covered by the data set was April 28, 1995. Shortly after this date, the S&P data began to reflect a virtually uninterrupted rise in the market; networks constructed from such trending data reflect the overall trend rather than the influence of specific inputs. Several representative 18-neuron networks were constructed, and their performance against the last six weeks of data was evaluated.
                   Actual            Network Predicted Value
End of Period      Closing Change    "A"        "B"        "C"
---------------------------------------------------------------------
3/22/95            +1.30             -0.25      +0.30      +0.22
3/29/95            -0.78             -0.79      -0.38      -0.83
4/05/95            +1.16             +0.06      +0.28      +0.22
4/12/95            -1.90             -1.55      -1.60      -1.65
4/19/95            +2.42             +1.39      +1.27      +2.01
4/26/95            +1.02             +0.50      -0.25      +0.52

It should be noted that even with very brief training and limited use of the program's fine-tuning features, the networks produced good predictions of market direction, and passable predictions of the magnitudes of price changes. This period was marked by no fewer than four reversals of market direction; if the networks were mere trend-following devices, the predictions would not have been as accurate as those tabulated above. Similar results were found for networks of other sizes.

The April 5 move was poorly forecast by all of the networks observed. The most plausible explanation for this is that over that period, market conditions not reflected in the input set had a pronounced impact on prices. Reflecting on the many short-term influences that do affect individual securities and even sectors, the effectiveness of even the more sophisticated networks used by institutions is surprising. Using the statistics generated by the program, we were able to determine that, despite their simplicity, the predictive capacity of these weekly S&P Index networks was approximately equal to that of many networks in use by financial institutions (7).

For the S&P data set, it seemed plausible that even at these smaller neuron numbers the inputs could be pruned. Using the statistical report accompanying the software, the sensitivity to the inputs was tabulated and the 10% of inputs ranked as least influential were dropped. With fewer inputs to process, training times were similar to those of networks facing the full input set, even though more runs were required. Again, several representative networks were tested on the March and April data.

                   Actual            Network Predicted Value
End of Period      Closing Change    "D"        "E"        "F"
---------------------------------------------------------------------
3/22/95            +1.30             +0.29      +0.99      +0.29
3/29/95            -0.78             -0.48       0.00      -0.51
4/05/95            +1.16             +0.52      +0.73      +0.48
4/12/95            -1.90             -1.79      -1.38      -0.83
4/19/95            +2.42             +1.32      +0.01      +1.41
4/26/95            +1.02             +0.67      -0.08      +0.95

For this small sample, the average error was smaller for the full-input networks than for the pruned-input networks. These results are intended to indicate the type of adjustments that are easily done, and to give a rough outline of the results that may be obtained. While the short training times for these networks were a byproduct of the effort to minimize the expenditure of time and money on input data, the speed with which such networks can be produced means that replicate networks can be made to yield a sort of network consensus decision. Because each network begins with random weights between neurons, producing several such replicates makes the fullest use of the information content of the inputs.

The decision of whether to use the network could be based solely on whether it traces the target security's price with sufficient precision. Yet it should be noted that if testing indicates that the network at least predicts the direction of price moves, even predictions that might be classified as incorrect by the testing criteria may be of value in reinforcing investment decisions made using other information; the networks then function as a classification mechanism.
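A consensus of replicates and a direction-only reading of the output can be combined in a few lines. In the Python sketch below, the replicate predictions are placeholder values, not output from the networks tabulated above; it averages the replicates and reports the predicted direction together with how many replicates agree:

    from statistics import mean

    # Hypothetical predicted weekly changes from three replicate networks.
    replicate_predictions = {
        "net_a": +0.45,
        "net_b": +0.30,
        "net_c": -0.05,
    }

    consensus = mean(replicate_predictions.values())
    agreeing = sum(1 for p in replicate_predictions.values()
                   if (p > 0) == (consensus > 0))

    direction = "up" if consensus > 0 else "down"
    print(f"consensus change {consensus:+.2f} ({direction}), "
          f"{agreeing} of {len(replicate_predictions)} replicates agree")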
As with any management tool (and indeed, like some Neural Net applications not directly concerned with business), the practical function of the network is to give added confidence that decisions are correct. Such confidence is a valuable commodity in investing; as a rough estimate of its possible value, consider that full-commission brokers may charge several percent of the value of transactions for their services, the most highly valued of which is market intelligence. Still, there may be data sets, or periods in the fluctuations of particular securities, for which useful networks are difficult or even impossible to construct. Whatever market knowledge is gained in constructing the network remains of value whether or not the network can be used for real-life decisions.

References

1. M. Jurik, "Consumer's Guide to Neural Network Software," Futures, July 1993.
2. L. Valentine and D. Ellis, Business Cycles and Economic Forecasting, South-Western Publishing, 1991.
3. R. Colby and T. Myers, Encyclopedia of Technical Market Indicators, Dow Jones-Irwin, 1988, Chapter 9.
4. J. Lawrence, Introduction to Neural Networks, CSS Publishing, 1993.
5. M. Jurik, Financial Forecasting and Neural Networks, Box 2379, Aptos, CA 95001, 1991.
6. BrainMaker User's Manual, California Scientific Software, 1993.
7. J. Lawrence, personal communication.