Jamal Munshi, Sonoma State University, 1992
Stock market data have thwarted decades of effort by mathematicians and statisticians to discover their hidden pattern. Simple time series analyses including AR, MA, ARMA, and ARIMA were eventually replaced with more sophisticated instruments of torture such as spectral analysis. But the data refused to confess.
The failure to discover the structure in price movements convinced many researchers that the movements were random. The so-called random walk hypothesis (RWH) of Osborne and others was developed into the efficient market hypothesis (EMH) by Eugene Fama. The `weak form' of the EMH says that movements in stock returns are random events that are independent of historical values. The rationale is that if patterns did exist, arbitrageurs would take advantage and thereby quickly eliminate them.
Both the RWH and the EMH came under immediate attack from market analysts, and this attack continues to this day, partly because the statistics used in tests of the EMH are controversial. The null hypothesis states that the market is efficient. The test then consists of presenting convincing evidence that it is not. The tests usually fail. Many argue that the failure of these tests represents a Type II error, that is, a failure to detect a real effect because of the low power of the statistical test employed.
Besides, the methods of analysis assume a normal and linear world that is difficult to defend. All residuals are assumed to be independent and normally distributed, all relationships are assumed to be linear, and all effects are assumed to be linearly additive with no interactions. At each point in time the data are assumed to be taken from identically distributed independent populations of numbers, the other members of which are unobservable. Econometric models such as ARIMA assume that all dependencies in time are linear.
It is therefore logical to conjecture that the failure of statistics to reject the EMH is due not to the strength of the theory but to the weakness of the statistics. Many hold that a different and more powerful mathematical device that allowed for non-linearities to exist might be more successful in discovering the hidden structure of stock prices.
In the early seventies, it appeared that Catastrophe Theory was just such a device. It had a seductive ability to model long bull market periods followed by catastrophic crashes. But it proved to be a mathematical artifact whose properties could not be generalized. It yielded no secret structure or patterns in stock prices. The results of other non-EMH models such as the Rational Bubble theory and the Fads theory are equally unimpressive.
Many economists feel that the mathematics of time series implied by Chaos Theory is a promising alternative. If time series data are allowed to be non-linearly dependent, rather than independent as the EMH requires, or linearly dependent as the AR models require, then much of what appears to be erratic random behavior or "white noise" may be part of the deterministic response of the system. Certain non-linear dynamical systems of equations can generate time series data that appear remarkably similar to the observed stock market data.
By using new mathematical techniques, hidden structures can be discovered in what appears to be a random time series. One technique, attributed to Lorenz, uses a plot of the data in phase space to detect patterns called strange attractors. Another method, proposed by Takens, uses an algorithm to determine the `correlation dimension' of the data. A low correlation dimension indicates a deterministic system. A high correlation dimension is indicative of randomness.
The correlation dimension technique has yielded mixed results with stock data. Halbert, Brock, and others working with daily returns of IBM concluded that the correlation dimension was sufficiently high to regard the time series as white noise. However, Schenkmann et al. claim that weekly data of IBM returns have a significant deterministic component. These structures may not be inconsistent with the EMH if the discovery of the structure, though providing insight to economic theorists, does not provide arbitrage opportunities.
A third technique for discovering structure in time series data has been described by Mandelbrot, Hurst, Feder, and most recently by Peters. Called `rescaled range analysis', or R/S, it is a test for randomness of a series not unlike the runs test. The test rests on the relationship that in a truly random series, a serial selection of sub-samples without replacement should produce a random sampling distribution with a standard deviation given by
sigmaXbar = [ sigma/n^0.5 ] * [ (N-n)/(N-1) ]^0.5
Here sigmaXbar is the standard deviation of the distribution of sample means obtained by drawing samples without replacement of size n from a population of size N, and sigma is the standard deviation of the population, i.e., when n=1.
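The relationship can be checked numerically with a small sketch. It assumes the usual form of the finite population correction, in which the factor (N-n)/(N-1) enters under a square root; the function name is illustrative and does not come from Peters.

```python
import math

def se_mean_without_replacement(sigma, n, N):
    """Standard deviation of the sample mean for samples of size n
    drawn without replacement from a population of size N."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("require N >= 2 and 1 <= n <= N")
    # Finite population correction (assumed to enter as a square root).
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * fpc

# With n = 1 the result equals sigma; with n = N it is zero, because the
# single possible sample is the whole population.
print(se_mean_without_replacement(2.0, 1, 100))
print(se_mean_without_replacement(2.0, 100, 100))
```

The n^0.5 in the denominator is the term whose exponent the rescaled range test examines.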
However, when the time series has runs, it can be shown that the exponent of n in the term `n^0.5' will differ from 0.5. The paper by Peters describes the following relationships.
R/S = N^H (Peters equation 4)
where R is the range of subsample sums, S is the standard deviation of the large sample, and N is the size of the sub-samples. The `H' term is called the Hurst constant and is equal to 0.5 if no runs exist and the data are sequenced randomly. If there is a tendency for positive runs, that is, increases are more likely to be followed by increases and decreases are more likely to be followed by decreases, then H will be greater than 0.5 but less than 1.0. Values of H between 0 and 0.5 are indicative of negative runs, that is, increases are more likely to be followed by decreases and vice versa. Hurst and Mandelbrot have found that many natural phenomena previously thought to be random have H-values around 0.7. These values are indicative of serious departures from independence.
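A minimal sketch of how R/S might be computed for one sub-sample size is given below. It assumes the classical Hurst construction: R is the range of the cumulative deviations from the sub-sample mean, and S is the standard deviation of the same sub-sample. Peters' exact procedure may differ in detail (the text above takes S from the large sample), and the function names are hypothetical.

```python
from itertools import accumulate
from statistics import fmean, pstdev

def rescaled_range(sub):
    """R/S for one sub-sample (classical construction, an assumption)."""
    m = fmean(sub)
    # Running sum of deviations from the sub-sample mean.
    cumulative = list(accumulate(v - m for v in sub))
    r = max(cumulative) - min(cumulative)
    s = pstdev(sub)  # raises ZeroDivisionError below if the sub-sample is constant
    return r / s

def mean_rescaled_range(series, n):
    """Average R/S over consecutive non-overlapping sub-samples of size n."""
    chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
    return fmean([rescaled_range(c) for c in chunks])

# A strictly alternating series never wanders far from its mean.
print(mean_rescaled_range([1, -1, 1, -1, 1, -1, 1, -1], 4))
```

Repeating the calculation for several values of n gives the (N, R/S) pairs from which H is estimated.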
Once `H' is determined for a time series, the autocorrelation in the time series is computed as follows:
CN = 2^(2H-1) - 1
CN is the correlation coefficient and its magnitude is indicative of the degree to which the elements of the time series are dependent on historical values. The interpretation of this coefficient used by Peters to challenge the EMH is that it represents the percentage of the variation in the time series that can be explained by historical data. The weak form of the EMH would require that this correlation be zero; i.e., the observations are independent of each other. Therefore, any evidence of such a correlation can be interpreted to mean that the weak form does not hold.
Peters studied 463 monthly observations of S&P500 index returns, 30-year government T-bond returns, and the excess of stock returns over bond returns. He found, using R/S analysis, that these time series were not random but that they contained runs or persistence as evidenced by values of CN ranging from 16.8% to 24.5%. The correlation estimates indicate that a significant portion of the returns are determined by past returns. This finding appears to present a serious challenge to the efficient market hypothesis.
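As a quick numerical illustration, the sketch below reads the formula as CN = 2^(2H-1) - 1, with 2H-1 as an exponent. That reading is an assumption about the typeset equation, but it reproduces the pairing of H = 0.56 with a correlation of 0.0867 reported later in this paper. The function name is illustrative.

```python
def serial_correlation(h):
    """CN implied by a Hurst exponent h, read as 2^(2h - 1) - 1."""
    return 2 ** (2 * h - 1) - 1

# H = 0.5 (no runs) gives zero correlation; larger H gives positive
# correlation, smaller H gives negative correlation.
for h in (0.4, 0.5, 0.56, 0.7):
    print(h, round(serial_correlation(h), 4))
```

Because the exponent grows with H, small changes in the estimate of H near 0.5 translate into proportionally large changes in CN.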
Peters obtained sequential subsamples for eleven different values of N and computed R/S for each N. To estimate H he converted his equation 4 to linear form by taking logarithms to yield
log(R/S) = H * log(N)
and then used OLS linear regression between log(R/S) and log(N). The slope of the regression is taken to be an unbiased estimate of H. The results are summarized in Table 1.
Table 1. Summary of Results Using Logarithmic Transformations
Returns    Regression intercept    Regression slope (H)    Serial correlation (CN)
Stocks     -0.103                  0.611                   0.168
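The logarithmic estimation procedure can be sketched as follows. The OLS formulas are the standard ones; the sub-sample sizes and R/S values in the example are hypothetical and are not Peters' data, and the function names are illustrative.

```python
import math

def ols_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def hurst_from_logs(sizes, rs_values):
    """Regress log(R/S) on log(N); the slope is the estimate of H."""
    log_n = [math.log(n) for n in sizes]
    log_rs = [math.log(v) for v in rs_values]
    return ols_line(log_n, log_rs)

# Hypothetical data generated exactly from R/S = N^0.6.
sizes = [8, 16, 32, 64]
rs_values = [n ** 0.6 for n in sizes]
intercept, h = hurst_from_logs(sizes, rs_values)
print(round(intercept, 6), round(h, 6))
```

With data that follow the power law exactly, the fitted intercept is zero and the slope recovers the exponent; real data carry noise, and the intercept is then free to move away from zero.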
A Re-examination of the Analysis
However, the logarithmic transformation used by Peters and the interpretation of the linear regression parameters raise some questions that require a re-examination of his results. First, consider the logarithmic conversion.
The OLS regression procedure minimizes the error sum of squares between the predicted log(R/S) and the observed log(R/S). However, it does not necessarily follow that the value of H at which the error sum of squares of the log(R/S) is at a minimum is coincident with the value of H at which the error sum of squares of R/S is also at a minimum. This is because of the nature of exponential functions, which assures that R/S changes more rapidly at the high end than at the low end for the same change in log(R/S).
For instance, an error of 0.1 when ln(R/S) = 4 implies an error in R/S of about 6, but the same error in logarithms at ln(R/S) = 8 carries an error in R/S of 313. To the OLS regression routine working on logarithms, these errors are equivalent. This means that it would give up an error of 300 on the high end to gain an error reduction of 6 on the low end.
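The arithmetic in this example can be verified directly. The sketch below converts an error of 0.1 in natural logarithms into the corresponding error in R/S at the two levels mentioned; the function name is illustrative.

```python
import math

def error_in_rs(log_rs, log_error=0.1):
    """Change in R/S produced by adding log_error to ln(R/S)."""
    return math.exp(log_rs + log_error) - math.exp(log_rs)

print(round(error_in_rs(4), 1))  # about 5.7
print(round(error_in_rs(8), 1))  # about 313.5
```

The ratio of the two errors is e^4, roughly 55, which is the weight that a fit on logarithms implicitly removes from the high end of the data.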
Secondly, the equation to be fitted, R/S = N^H, may also be written as
R/S = 1 * N^H
and taking logarithms would yield
log(R/S) = log(1) + H * log(N)
or, specifically, since log(1) = 0, we can write
log(R/S) = 0 + H * log(N).
This means that to fit the model as stated, the intercept term must be tested against zero. If the intercept term is significantly different from zero, then the model must be rejected. In all three regression equations above, the intercept is negative and significantly different from zero. Therefore, we would expect that the computed slope is an over-stated estimate of `H'.
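One way to carry out such a check is sketched below, using the standard OLS t-statistic for the intercept together with the slope of a regression forced through the origin. The data points are hypothetical and stand in for pairs of log(N) and log(R/S); they are not Peters' values, and the function names are illustrative.

```python
import math

def intercept_t_statistic(xs, ys):
    """Return (intercept, slope, t) for an OLS fit of y = a + b*x,
    where t is the intercept divided by its standard error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    se = math.sqrt(s2 * (1 / n + mx ** 2 / sxx))
    return intercept, slope, intercept / se

def slope_through_origin(xs, ys):
    """Least squares slope when the intercept is constrained to zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical points with a negative fitted intercept.
xs, ys = [1, 2, 3, 4], [0.5, 1.4, 2.6, 3.5]
a, b, t = intercept_t_statistic(xs, ys)
print(round(a, 3), round(b, 3), round(t, 2))
print(round(slope_through_origin(xs, ys), 4))
```

The t value would be compared with a Student t distribution on n - 2 degrees of freedom. In this illustration the negative intercept goes with a free slope that is larger than the slope through the origin, which is the direction of overstatement described above.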
An Alternative Interpretation
Both problems with the logarithmic transformation mentioned above may be avoided by applying a non-linear least squares fit directly to the model. The results of such a procedure are so different from those obtained by logarithmic transformations that the interpretation and conclusion must be re-evaluated.
Table 2 shows the data used by Peters to infer his regression parameters. Figure 1 shows the error sum of squares plotted against values of H. An unbiased estimator of H is that at which the error sum of squares is at a minimum. These values of H, shown in Table 3, are significantly different from those shown in Table 1 and they are closer to 0.5 than previously thought. In particular, the correlations are much lower; that is, a much lower proportion of the variance in security returns is determined by runs or persistence. Rather than 16% to 25%, past prices only explain 5% to 13% of returns variance.
Table 2. The Data Used in the Regression Models
Table 3. Summary of Results Using Non-Linear Regression
Returns    Regression intercept    Regression slope (H)    Serial correlation (CN)
Stocks     0                       0.56                    0.0867
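The direct fit can be sketched as a one-dimensional search over H for the value that minimizes the error sum of squares of R/S itself. The grid search below is only one simple way to locate that minimum and is not necessarily the routine used for the results reported here. The data are hypothetical, generated from R/S = N^0.56, since Peters' observations are not reproduced in this sketch, and the function names are illustrative.

```python
def error_sum_of_squares(h, sizes, rs_values):
    """Sum of squared errors of R/S about the model R/S = N^h."""
    return sum((rs - n ** h) ** 2 for n, rs in zip(sizes, rs_values))

def fit_hurst_direct(sizes, rs_values, lo=0.0, hi=1.0, steps=1000):
    """Grid search for the h in [lo, hi] with the smallest error sum of squares."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda h: error_sum_of_squares(h, sizes, rs_values))

# Hypothetical data that follow the model exactly with H = 0.56.
sizes = [8, 16, 32, 64]
rs_values = [n ** 0.56 for n in sizes]
print(fit_hurst_direct(sizes, rs_values))
```

Plotting error_sum_of_squares against the grid of H values gives a curve of the kind described for Figure 1. Because the errors are measured in units of R/S and not in logarithms, the large sub-sample sizes carry their full weight in the fit.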
A comparison of the logarithmic fit to the direct non-linear fit is shown in Figures 2, 3, and 4. In each case, the non-linear fit follows the data more closely while the logarithmic fit shows wide dispersions at the high end as expected.
This analysis shows that the amount of variance in returns explained by the fractal model is very low. It has not been established that the correlation is significantly different from zero. Even if it were, the low magnitude of the correlation precludes any conclusions of practical significance either in terms of arbitrage profits or financial theory. Therefore, the derived model parameters may not be subjected to interpretations with regard to behavior of the market and the results may not be considered to be inconsistent with the efficient market hypothesis.