3.9 STOXPRED: Stock Market Prediction As A Service
Far out in the uncharted backwaters of the unfashionable end of the Western Spiral arm of the Galaxy lies a small unregarded yellow sun.
Orbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue-green planet whose ape-descendent life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.
This planet has — or rather had — a problem, which was this: most of the people living on it were unhappy for pretty much of the time. Many solutions were suggested for this problem, but most of these were largely concerned with the movements of small green pieces of paper, which is odd because it wasn't the small green pieces of paper that were unhappy.
Douglas Adams, The Hitch Hiker's Guide to the Galaxy
Valuable services on the Internet are usually not implemented as mobile agents. There are much simpler ways of implementing services. All Unix systems provide, for example, the cron service. Unix system users can write a list of tasks to be done each day, each week, twice a day, or just once. The list is entered into a file named ‘crontab’. For example, to distribute a newsletter on a daily basis this way, use cron to call a script early each morning.
    # run at 8 am on weekdays, distribute the newsletter
    0 8 * * 1-5   $HOME/bin/daily.job >> $HOME/log/newsletter 2>&1
The script first looks for interesting information on the Internet, assembles it in a nice form and sends the results via email to the customers.
The following is an example of a primitive newsletter on stock market prediction. It is a report which first tries to predict the change of each share in the Dow Jones Industrial Index for the particular day. Then it mentions some especially promising shares as well as some shares which look remarkably bad on that day. The report ends with the usual disclaimer which tells every child not to try this at home and hurt anybody.
    Good morning Uncle Scrooge,

    This is your daily stock market report for Monday, October 16, 2000.
    Here are the predictions for today:

            AA      neutral
            GE      up
            JNJ     down
            MSFT    neutral
            …
            UTX     up
            DD      down
            IBM     up
            MO      down
            WMT     up
            DIS     up
            INTC    up
            MRK     down
            XOM     down
            EK      down
            IP      down

    The most promising shares for today are these:

            INTC            http://biz.yahoo.com/n/i/intc.html

    The stock shares to avoid today are these:

            EK              http://biz.yahoo.com/n/e/ek.html
            IP              http://biz.yahoo.com/n/i/ip.html
            DD              http://biz.yahoo.com/n/d/dd.html
            …
The script as a whole is rather long. In order to ease the pain of studying other people's source code, we have broken the script up into meaningful parts which are invoked one after the other. The basic structure of the script is as follows:
    BEGIN {
      Init()
      ReadQuotes()
      CleanUp()
      Prediction()
      Report()
      SendMail()
    }
The earlier parts store data into variables and arrays which are subsequently used by later parts of the script. The Init function first checks if the script is invoked correctly (without any parameters). If not, it informs the user of the correct usage. What follows are preparations for the retrieval of the historical quote data. The names of the 30 stock shares are stored in an array name along with the current date in day, month, and year.
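All users who are separated from the Internet by a firewall and have to direct their Internet accesses to a proxy must supply the name of the proxy to this script with the ‘-v Proxy=name’ option (and, if the proxy does not listen on port 80, its port with ‘-v ProxyPort=number’). Without these options, the script connects directly to chart.yahoo.com on port 80, which should suffice for most users.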
    function Init() {
      if (ARGC != 1) {
        print "STOXPRED - daily stock share prediction"
        print "IN:\n    no parameters, nothing on stdin"
        print "PARAM:\n    -v Proxy=MyProxy -v ProxyPort=80"
        print "OUT:\n    commented predictions as email"
        print "JK 09.10.2000"
        exit
      }
      # Remember ticker symbols from Dow Jones Industrial Index
      StockCount = split("AA GE JNJ MSFT AXP GM JPM PG BA HD KO " \
                         "SBC C HON MCD T CAT HWP MMM UTX DD IBM MO WMT DIS INTC " \
                         "MRK XOM EK IP", name)
      # Remember the current date as the end of the time series
      day   = strftime("%d")
      month = strftime("%m")
      year  = strftime("%Y")
      # Without a proxy, talk to the Yahoo server directly
      if (Proxy     == "")  Proxy     = "chart.yahoo.com"
      if (ProxyPort ==  0)  ProxyPort = 80
      YahooData = "/inet/tcp/0/" Proxy "/" ProxyPort
    }
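The way the ‘-v Proxy=name’ and ‘-v ProxyPort=number’ options end up in the name of gawk's TCP special file can be tried in isolation. The following is only a minimal sketch: the helper ServiceName is a hypothetical name that does not occur in the script, and nothing is actually connected.

```awk
# Sketch only: build the name of gawk's TCP special file the way Init() does.
# ServiceName() is a hypothetical helper, not part of the original script.
function ServiceName(proxy, port) {
  if (proxy == "") proxy = "chart.yahoo.com"   # no proxy: contact the server directly
  if (port  ==  0) port  = 80                  # default HTTP port
  # a local port of 0 leaves the choice of the local port to the system
  return "/inet/tcp/0/" proxy "/" port
}

BEGIN {
  print ServiceName("", 0)
  print ServiceName("MyProxy", 8080)
}
```

With the made-up proxy name MyProxy, the second line shows that only the host and port of the connection change; the request sent over it stays the same.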
There are two really interesting parts in the script. One is the function which reads the historical stock quotes from an Internet server. The other is the one that does the actual prediction. In the following function we see how the quotes are read from the Yahoo server. The data which comes from the server is in CSV format (comma-separated values):
    Date,Open,High,Low,Close,Volume
    9-Oct-00,22.75,22.75,21.375,22.375,7888500
    6-Oct-00,23.8125,24.9375,21.5625,22,10701100
    5-Oct-00,24.4375,24.625,23.125,23.50,5810300
Each line holds the values for one time instant; columns are separated by commas and contain the kind of data that is described in the header (first) line. First, gawk is instructed to separate columns by commas (‘FS = ","’). In the loop that follows, a connection to the Yahoo server is first opened, then a download takes place, and finally the connection is closed. All this happens once for each ticker symbol. In the body of this loop, an Internet address is built up as a string according to the rules of the Yahoo server. The starting and ending dates fall on the same calendar day, with the starting date exactly one year before the ending date (today). All the action is initiated by the printf command, which transmits the request for data to the Yahoo server.

In the inner loop, the server's data is read and scanned line by line. Only lines which have six columns and the name of a month in the first column contain relevant data. This data is stored in the two-dimensional array quote; one dimension being time, the other being the ticker symbol. During retrieval of the first stock's data, the calendar names of the time instances are stored in the array days because we need them later.
    function ReadQuotes() {
      # Retrieve historical data for each ticker symbol
      FS = ","
      for (stock = 1; stock <= StockCount; stock++) {
        # Same calendar day one year ago (a, b, c) up to today (d, e, f)
        URL = "http://chart.yahoo.com/table.csv?s=" name[stock] \
              "&a=" month "&b=" day "&c=" year-1 \
              "&d=" month "&e=" day "&f=" year \
              "&g=d&q=q&y=0&z=" name[stock] "&x=.csv"
        printf("GET %s HTTP/1.0\r\n\r\n", URL) |& YahooData
        while ((YahooData |& getline) > 0) {
          # Keep only data lines: six columns, month name in the date column
          if (NF == 6 && $1 ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
            if (stock == 1)
              days[++daycount] = $1;
            quote[$1, stock] = $5      # fifth column is the closing price
          }
        }
        close(YahooData)
      }
      FS = " "
    }
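The line filter can be tried without any network connection. The sketch below applies the same two tests (six columns, a month name in the first column) to a canned reply; the function ParseQuotes and the sample text are assumptions made up for illustration, and split is used in place of FS so that the sketch does not depend on input records.

```awk
# Sketch only: the ReadQuotes() line filter applied to canned text.
# ParseQuotes() and the sample reply are hypothetical, for illustration.
function ParseQuotes(text, close_of,    lines, n, i, f, count) {
  n = split(text, lines, "\n")
  for (i = 1; i <= n; i++) {
    # keep only rows with six columns and a month name in the date column
    if (split(lines[i], f, ",") == 6 &&
        f[1] ~ /Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec/) {
      close_of[f[1]] = f[5]      # fifth column is the closing price
      count++
    }
  }
  return count + 0
}

BEGIN {
  sample = "HTTP/1.0 200 OK\n" \
           "Date,Open,High,Low,Close,Volume\n" \
           "9-Oct-00,22.75,22.75,21.375,22.375,7888500\n" \
           "6-Oct-00,23.8125,24.9375,21.5625,22,10701100\n"
  n = ParseQuotes(sample, closing)
  print n " quote lines; close on 9-Oct-00 was " closing["9-Oct-00"]
}
```

The status line and the CSV header are both rejected: the first has only one column, the second has six columns but no month name in the first one.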
Now that we have the data, it can be checked once again to make sure that no individual stock is missing or invalid, and that all the stock quotes are aligned correctly. Days for which the quote of any stock is missing are dropped entirely. Furthermore, we renumber the time instances. The most recent day gets day number 1 and all other days get consecutive numbers. All quotes are rounded to the nearest whole US dollar.
    function CleanUp() {
      # clean up time series; eliminate incomplete data sets
      for (d = 1; d <= daycount; d++) {
        # a missing quote pushes the loop index far beyond StockCount
        for (stock = 1; stock <= StockCount; stock++)
          if (! ((days[d], stock) in quote))
            stock = StockCount + 10
        if (stock > StockCount + 1)
          continue
        datacount++
        for (stock = 1; stock <= StockCount; stock++)
          data[datacount, stock] = int(0.5 + quote[days[d], stock])
      }
      delete quote
      delete days
    }
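To see what the clean-up does to the data, here is a small sketch on a hand-made table with two stocks and three days. All numbers are made up, Round is a hypothetical helper, and an explicit flag replaces the loop-index trick used in CleanUp; the effect on the data is meant to be the same.

```awk
# Sketch only: rounding and the "complete days only" rule on made-up data.
# Round() is a hypothetical helper; it is only valid for non-negative prices.
function Round(x) { return int(0.5 + x) }

BEGIN {
  nstocks = 2
  ndays = split("9-Oct-00 6-Oct-00 5-Oct-00", days, " ")
  quote["9-Oct-00", 1] = 22.375; quote["9-Oct-00", 2] = 101.5
  quote["6-Oct-00", 1] = 22                   # stock 2 is missing on this day
  quote["5-Oct-00", 1] = 23.50;  quote["5-Oct-00", 2] = 99.4375

  for (d = 1; d <= ndays; d++) {
    complete = 1
    for (s = 1; s <= nstocks; s++)
      if (! ((days[d], s) in quote))
        complete = 0
    if (! complete)
      continue                                # drop the whole day
    kept++
    for (s = 1; s <= nstocks; s++)
      data[kept, s] = Round(quote[days[d], s])
  }
  print kept " complete days; data[2,1] = " data[2, 1]
}
```

The incomplete day 6-Oct-00 disappears, so 5-Oct-00 becomes day number 2 and its quote of 23.50 is stored as 24.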
Now we have arrived at the second really interesting part of the whole affair. What we present here is a very primitive prediction algorithm: If a stock fell yesterday, assume it will also fall today; if it rose yesterday, assume it will rise today. (Feel free to replace this algorithm with a smarter one.) If a stock changed in the same direction on two consecutive days, this is an indication which should be highlighted. Two-day advances are stored in hot and two-day declines in avoid.
The rest of the function is a sanity check. It counts the number of correct predictions in relation to the total number of predictions one could have made in the year before.
    function Prediction() {
      # Predict each ticker symbol by prolonging yesterday's trend
      for (stock = 1; stock <= StockCount; stock++) {
        if         (data[1, stock] > data[2, stock]) {
          predict[stock] = "up"
        } else if  (data[1, stock] < data[2, stock]) {
          predict[stock] = "down"
        } else {
          predict[stock] = "neutral"
        }
        if ((data[1, stock] > data[2, stock]) && (data[2, stock] > data[3, stock]))
          hot[stock] = 1
        if ((data[1, stock] < data[2, stock]) && (data[2, stock] < data[3, stock]))
          avoid[stock] = 1
      }
      # Do a plausibility check: how many predictions proved correct?
      for (s = 1; s <= StockCount; s++) {
        for (d = 1; d <= datacount-2; d++) {
          if         (data[d+1, s] > data[d+2, s]) {
            UpCount++
          } else if  (data[d+1, s] < data[d+2, s]) {
            DownCount++
          } else {
            NeutralCount++
          }
          if (((data[d, s]  > data[d+1, s]) && (data[d+1, s]  > data[d+2, s])) ||
              ((data[d, s]  < data[d+1, s]) && (data[d+1, s]  < data[d+2, s])) ||
              ((data[d, s] == data[d+1, s]) && (data[d+1, s] == data[d+2, s])))
            CorrectCount++
        }
      }
    }
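The prediction rule and the plausibility check are easier to follow on a single short series. In the following sketch the prices are made up, index 1 is the most recent day as in the script, and Trend and HitRate are hypothetical helper names that do not occur in the script.

```awk
# Sketch only: "prolong yesterday's trend" on one made-up price series.
# Trend() and HitRate() are hypothetical helpers, not part of the script.
function Trend(newer, older) {
  newer += 0; older += 0                      # force a numeric comparison
  if (newer > older) return "up"
  if (newer < older) return "down"
  return "neutral"
}

# Percentage of days on which the trend of the day before repeated itself.
function HitRate(p, n,    d, hits, total) {
  for (d = 1; d <= n - 2; d++) {
    total++
    if (Trend(p[d], p[d+1]) == Trend(p[d+1], p[d+2]))
      hits++
  }
  return total ? int(100 * hits / total) : 0  # avoid dividing by zero
}

BEGIN {
  n = split("25 24 22 22 23 21", price, " ")  # most recent quote first
  print "prediction for today: " Trend(price[1], price[2])
  print "backtest hit rate: " HitRate(price, n) "%"
}
```

For this series the prediction is "up", and only one of the four checkable days repeated the trend of the day before, which gives a hit rate of 25%.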
At this point the hard work has been done: the array predict contains the predictions for all the ticker symbols. It is up to the function Report to find some nice words to introduce the desired information.
    function Report() {
      # Generate report
      report =        "\nThis is your daily "
      report = report "stock market report for "strftime("%A, %B %d, %Y")".\n"
      report = report "Here are the predictions for today:\n\n"
      for (stock = 1; stock <= StockCount; stock++)
        report = report "\t" name[stock] "\t" predict[stock] "\n"
      for (stock in hot) {
        if (HotCount++ == 0)
          report = report "\nThe most promising shares for today are these:\n\n"
        report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
          tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
      }
      for (stock in avoid) {
        if (AvoidCount++ == 0)
          report = report "\nThe stock shares to avoid today are these:\n\n"
        report = report "\t" name[stock] "\t\thttp://biz.yahoo.com/n/" \
          tolower(substr(name[stock], 1, 1)) "/" tolower(name[stock]) ".html\n"
      }
      report = report "\nThis sums up to " HotCount+0 " winners and " AvoidCount+0
      report = report " losers. When using this kind\nof prediction scheme for"
      report = report " the 12 months which lie behind us,\nwe get " UpCount
      report = report " 'ups' and " DownCount " 'downs' and " NeutralCount
      report = report " 'neutrals'. Of all\nthese " UpCount+DownCount+NeutralCount
      report = report " predictions " CorrectCount " proved correct next day.\n"
      report = report "A success rate of "\
                 int(100*CorrectCount/(UpCount+DownCount+NeutralCount)) "%.\n"
      report = report "Random choice would have produced a 33% success rate.\n"
      report = report "Disclaimer: Like every other prediction of the stock\n"
      report = report "market, this report is, of course, complete nonsense.\n"
      report = report "If you are stupid enough to believe these predictions\n"
      report = report "you should visit a doctor who can treat your ailment."
    }
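Report builds the link for a ticker symbol in two places with the same expression. As a minimal sketch, the expression could be wrapped in a helper; NewsURL is a hypothetical name that does not occur in the script.

```awk
# Sketch only: the link-building expression from Report() as a helper.
# NewsURL() is a hypothetical name, not part of the original script.
function NewsURL(symbol,    s) {
  s = tolower(symbol)
  # the first letter of the symbol selects the directory on the server
  return "http://biz.yahoo.com/n/" substr(s, 1, 1) "/" s ".html"
}

BEGIN {
  print NewsURL("INTC")
}
```

For INTC this gives the same address that appears in the sample newsletter.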
The function SendMail goes through the list of customers and opens a pipe to the mail command for each of them. Each one receives an email message with a proper subject heading and is addressed by full name.
    function SendMail() {
      # send report to customers
      customer["uncle.scrooge@ducktown.gov"] = "Uncle Scrooge"
      customer["more@utopia.org"           ] = "Sir Thomas More"
      customer["spinoza@denhaag.nl"        ] = "Baruch de Spinoza"
      customer["marx@highgate.uk"          ] = "Karl Marx"
      customer["keynes@the.long.run"       ] = "John Maynard Keynes"
      customer["bierce@devil.hell.org"     ] = "Ambrose Bierce"
      customer["laplace@paris.fr"          ] = "Pierre Simon de Laplace"
      for (c in customer) {
        # the space after the quoted subject separates it from the address
        MailPipe = "mail -s 'Daily Stock Prediction Newsletter' " c
        print "Good morning " customer[c] "," | MailPipe
        print report "\n.\n" | MailPipe
        close(MailPipe)
      }
    }
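The command line handed to the shell can be inspected without sending any mail. The sketch below only builds and prints the string; MailCommand is a hypothetical helper, and it assumes that the subject contains no single quotes.

```awk
# Sketch only: build the command string for the mail pipe and print it.
# MailCommand() is a hypothetical helper; the subject must not contain
# single quotes, because it is wrapped in single quotes for the shell.
function MailCommand(subject, address) {
  # the blank after the closing quote keeps subject and address apart
  return "mail -s '" subject "' " address
}

BEGIN {
  print MailCommand("Daily Stock Prediction Newsletter", "uncle.scrooge@ducktown.gov")
}
```

Printing the string first is a cheap way to check that the subject and the recipient arrive at the mail command as two separate arguments.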
Be patient when running the script by hand.
Retrieving the data for all the ticker symbols and sending the emails
may take several minutes to complete, depending upon network traffic
and the speed of the available Internet link.
The quality of the prediction algorithm is likely to be disappointing.
Try to find a better one.
Should you find one with a success rate of more than 50%, please tell
us about it! It is only for the sake of curiosity, of course. :-)