Does Stock-Market Data Really Go Back 200 Years?

By JASON ZWEIG

As of June 30, U.S. stocks have underperformed long-term Treasury bonds for the past five, 10, 15, 20 and 25 years. Still, brokers and financial planners keep reminding us: "Ever since 1802, there's never been a 30-year period when stocks have underperformed bonds."

These true believers rely on the gospel of "Stocks for the Long Run," the book by finance professor Jeremy Siegel of the Wharton School at the University of Pennsylvania, first published in 1994.

Using data assembled by other scholars, Prof. Siegel extended the history of U.S. stock returns all the way back to 1802. He came to two conclusions that became articles of faith to millions of investors. First, ever since Thomas Jefferson was in the White House, stocks have generated a "remarkably constant" average return of nearly 7% a year after inflation. (Adding inflation at 3% a year yields the commonly cited 10% annual stock return.) Second, declared Prof. Siegel, "the risks of holding stocks decrease over time."

There is just one problem with tracing stock performance all the way back to 1802: It isn't really valid.

Prof. Siegel's numbers are based on data first gathered in the early 1930s by two economists, Walter Buckingham Smith and Arthur Harrison Cole. For the years 1802 through 1820, Profs. Smith and Cole collected prices on 37 banking, insurance and transportation stocks -- but ended up including only seven, all banks, in their stock-market index. Through 1845, they tracked 19 insurance stocks but rejected 95% of them, adding only one to their index. From 1834 onward, they added a grand total of 27 railroad stocks.

To be a good measure of stock returns, an index should be comprehensive (by including many stocks) and representative (by including the stocks commonly held by investors).
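The parenthetical about adding 3% inflation to a 7% real return treats the two rates as simply additive. A minimal sketch, assuming exactly 7% after inflation (the column says "nearly 7%") and 3% inflation, shows that compounding the two, as the Fisher relation does, lands a little above 10%:

```python
# Round figures quoted in the column; using exactly 7% is an assumption
# (the text says "nearly 7%").
REAL_RETURN = 0.07   # average annual return after inflation
INFLATION = 0.03     # assumed average annual inflation


def nominal_return(real: float, inflation: float) -> float:
    """Combine a real return and an inflation rate multiplicatively
    (Fisher relation): (1 + nominal) = (1 + real) * (1 + inflation)."""
    return (1 + real) * (1 + inflation) - 1


additive = REAL_RETURN + INFLATION
compounded = nominal_return(REAL_RETURN, INFLATION)
print(f"additive:   {additive:.2%}")    # additive:   10.00%
print(f"compounded: {compounded:.2%}")  # compounded: 10.21%
```

The gap is small for a single year, which is why "about 10%" is a fair shorthand; it widens only when the rates themselves are large.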
The Smith and Cole indexes are neither comprehensive nor representative, as the professors conceded in their 1935 book, "Fluctuations in American Business." They cherry-picked their indexes by throwing out any stock that didn't survive for the whole period, whose share prices were too hard to estimate, or whose returns seemed "inflexible," "erratic," or not "typical."

The online database of early U.S. securities at eh.net has so far identified 1,184 stocks that were listed on 10 different exchanges -- including Charleston, S.C., New Orleans, and Norfolk, Va. -- between 1790 and 1860. Thus the indexes relied on by Prof. Siegel exclude 97% of all the stocks that existed in the earliest years of the U.S. market and include only the bluest of the blue-chip survivors. Never mind all of the canals, wooden turnpikes, rubber-hat companies and other doomed stocks that investors lost millions on -- and whose returns may never be reconstructed.

There is a second problem with Prof. Siegel's data. In an academic article published in 1992, he estimated the average annual dividend yield from 1802 to 1870 at 5.0%. Two years later, in his book, it had grown to 6.4%, raising the average annual return in the early years from 5.7% to 7.0% after inflation.

Why does that matter? Using the higher number for the earlier period raised Prof. Siegel's estimate of the rate of return for the entire period by about half a percentage point annually. Prof. Siegel estimated in his 1992 article that $1 invested in stocks in 1802 would have grown, after inflation, to $86,100 by 1990. In his book just two years later, however, he estimated that $1 in 1802 would have mushroomed into $260,000 by 1992. But stocks gained 30.5% in 1991 and 7.6% in 1992, which should have taken the cumulative value up to only about $121,000. Nearly all of that huge difference seems to have come from Prof. Siegel's revised number for early dividends.

"I made an estimate of the dividend yield," Prof. Siegel told me, "through looking at a smaller set of securities and projecting it out."

Money manager Robert Arnott of Research Affiliates LLC has recently estimated the early dividend yield at 5.2%. "Arnott has a much lower estimate, and that's a big difference," said Prof. Siegel. "I mean, I don't know what more to say." I later called Prof. Siegel again to ask him about the difference between his original research and his book, but he didn't get back to me by press time.

What, then, are the odds that stocks will continue to lag behind bonds for the long run? The sad truth is that history can't tell us the answer. The 1802-to-1870 stock indexes are rotten with methodological flaws. So we have only the data since then, or four distinct 30-year stretches of stock returns, on which to base our long-term investment decisions.

Another emperor of the late bull market, it seems, has turned out to have no clothes.

Write to Jason Zweig at intelligentinvestor AT wsj.com
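For readers who want to retrace the column's arithmetic, here is a minimal back-of-the-envelope sketch. Every input is a figure quoted in the column; how the figures are combined is an illustrative assumption, not Prof. Siegel's calculation: the 35 index stocks (seven banks, one insurer, 27 railroads) are counted against the 1,184 stocks in the eh.net database, the 1991 and 1992 gains are applied directly to the inflation-adjusted $86,100, and the 1.3-point boost for 1802-1870 is time-weighted across 1802-1992.

```python
# Back-of-the-envelope checks on figures quoted in the column.
# The way the figures are combined is an illustrative assumption.


def roll_forward(start: float, annual_returns: list[float]) -> float:
    """Compound a starting value through a sequence of annual returns."""
    value = start
    for r in annual_returns:
        value *= 1 + r
    return value


# 1) Index coverage: 7 banks + 1 insurer + 27 railroads, counted
#    against the 1,184 early stocks identified at eh.net.
included = 7 + 1 + 27
coverage = included / 1184
print(f"included {included} stocks; excluded {1 - coverage:.0%}")
# included 35 stocks; excluded 97%

# 2) Extend the 1992 article's $86,100 (1802-1990, after inflation)
#    through the 1991 and 1992 gains of 30.5% and 7.6%. Like the column,
#    this applies those gains directly and ignores inflation in those years.
extended = roll_forward(86_100, [0.305, 0.076])
print(f"$86,100 rolled to 1992: ${extended:,.0f}")  # $86,100 rolled to 1992: $120,900
print(f"book's $260,000 is {260_000 / extended:.2f}x that")  # book's $260,000 is 2.15x that

# 3) Spread the 1.3-point boost (5.7% -> 7.0%) for 1802-1870 across the
#    whole 1802-1992 span, assuming a simple time-weighted average.
early_years = 1870 - 1802
total_years = 1992 - 1802
boost = (0.070 - 0.057) * early_years / total_years
print(f"effect on full-period average: {boost:.2%} per year")
# effect on full-period average: 0.47% per year
```

The third check works on average annual rates rather than on compounded wealth, so it is only a rough confirmation of the "about half a percentage point" figure.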