| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] | 
3.6 STATIST: Graphing a Statistical Distribution
In the HTTP server examples we've shown thus far, we never present an image to the browser and its user. Presenting images is one task. Generating images that reflect some user input and presenting these dynamically generated images is another. In this section, we use GNUPlot for generating ‘.png’, ‘.ps’, or ‘.gif’ files.(9)
The program we develop takes the statistical parameters of two samples
and computes the t-test statistics. As a result, we get the probabilities
that the means and the variances of both samples are the same. In order to
let the user check plausibility, the program presents an image of the
distributions. The statistical computation follows
Numerical Recipes in C: The Art of Scientific Computing
by William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.
Since gawk does not have a built-in function
for the computation of the beta function, we use the ibeta function
of GNUPlot. As a side effect, we learn how to use GNUPlot as a
sophisticated calculator. The comparison of means is done as in tutest,
paragraph 14.2, page 613, and the comparison of variances is done as in ftest,
page 611 in Numerical Recipes.
As usual, we take the site-independent code for servers and append
our own functions SetUpServer and HandleGET:
| function SetUpServer() {
  TopHeader = "<HTML><title>Statistics with GAWK</title>"
  TopDoc = "<BODY>\
   <h2>Please choose one of the following actions:</h2>\
   <UL>\
    <LI><A HREF=" MyPrefix "/AboutServer>About this server</A></LI>\
    <LI><A HREF=" MyPrefix "/EnterParameters>Enter Parameters</A></LI>\
   </UL>"
  TopFooter  = "</BODY></HTML>"
  GnuPlot    = "gnuplot 2>&1"
  m1=m2=0;    v1=v2=1;    n1=n2=10
}
 | 
Here, you see the menu structure that the user sees. Later, we
will see how the program structure of the HandleGET function
reflects the menu structure. What is missing here is the link for the
image we generate. In an event-driven environment, request,
generation, and delivery of images are separated.
Notice the way we initialize the GnuPlot command string for
the pipe. By default,
GNUPlot outputs the generated image via standard output, as well as
the results of print(ed) calculations via standard error.
The redirection causes standard error to be mixed into standard
output, enabling us to read results of calculations with getline.
By initializing the statistical parameters with some meaningful
defaults, we make sure the user gets an image the first time
he uses the program.
Following is the rather long function HandleGET, which
implements the contents of this service by reacting to the different
kinds of requests from the browser. Before you start playing with
this script, make sure that your browser supports JavaScript and that it also
has this option switched on. The script uses a short snippet of
JavaScript code for delayed opening of a window with an image.
A more detailed explanation follows:
| function HandleGET() {
  if(MENU[2] == "AboutServer") {
    Document  = "This is a GUI for a statistical computation.\
      It compares means and variances of two distributions.\
      It is implemented as one GAWK script and uses GNUPLOT."
  } else if (MENU[2] == "EnterParameters") {
    Document = ""
    if ("m1" in GETARG) {     # are there parameters to compare?
      Document = Document "<SCRIPT LANGUAGE=\"JavaScript\">\
        setTimeout(\"window.open(\\\"" MyPrefix "/Image" systime()\
         "\\\",\\\"dist\\\", \\\"status=no\\\");\", 1000); </SCRIPT>"
      m1 = GETARG["m1"]; v1 = GETARG["v1"]; n1 = GETARG["n1"]
      m2 = GETARG["m2"]; v2 = GETARG["v2"]; n2 = GETARG["n2"]
      t = (m1-m2)/sqrt(v1/n1+v2/n2)
      df = (v1/n1+v2/n2)*(v1/n1+v2/n2)/((v1/n1)*(v1/n1)/(n1-1) \
           + (v2/n2)*(v2/n2) /(n2-1))
      if (v1>v2) {
          f = v1/v2
          df1 = n1 - 1
          df2 = n2 - 1
      } else {
          f = v2/v1
          df1 = n2 - 1
          df2 = n1 - 1
      }
      print "pt=ibeta(" df/2 ",0.5," df/(df+t*t) ")"  |& GnuPlot
      print "pF=2.0*ibeta(" df2/2 "," df1/2 "," \
            df2/(df2+df1*f) ")"                    |& GnuPlot
      print "print pt, pF"                         |& GnuPlot
      RS="\n"; GnuPlot |& getline; RS="\r\n"    # $1 is pt, $2 is pF
      print "invsqrt2pi=1.0/sqrt(2.0*pi)"          |& GnuPlot
      print "nd(x)=invsqrt2pi/sd*exp(-0.5*((x-mu)/sd)**2)" |& GnuPlot
      print "set term png small color"             |& GnuPlot
      #print "set term postscript color"           |& GnuPlot
      #print "set term gif medium size 320,240"    |& GnuPlot
      print "set yrange[-0.3:]"                    |& GnuPlot
      print "set label 'p(m1=m2) =" $1 "' at 0,-0.1 left"  |& GnuPlot
      print "set label 'p(v1=v2) =" $2 "' at 0,-0.2 left"  |& GnuPlot
      print "plot mu=" m1 ",sd=" sqrt(v1) ", nd(x) title 'sample 1',\
        mu=" m2 ",sd=" sqrt(v2) ", nd(x) title 'sample 2'" |& GnuPlot
      print "quit"                                         |& GnuPlot
      GnuPlot |& getline Image
      while ((GnuPlot |& getline) > 0)
          Image = Image RS $0
      close(GnuPlot)
    }
    Document = Document "\
    <h3>Do these samples have the same Gaussian distribution?</h3>\
    <FORM METHOD=GET> <TABLE BORDER CELLPADDING=5>\
    <TR>\
    <TD>1. Mean    </TD>
    <TD><input type=text name=m1 value=" m1 " size=8></TD>\
    <TD>1. Variance</TD>
    <TD><input type=text name=v1 value=" v1 " size=8></TD>\
    <TD>1. Count   </TD>
    <TD><input type=text name=n1 value=" n1 " size=8></TD>\
    </TR><TR>\
    <TD>2. Mean    </TD>
    <TD><input type=text name=m2 value=" m2 " size=8></TD>\
    <TD>2. Variance</TD>
    <TD><input type=text name=v2 value=" v2 " size=8></TD>\
    <TD>2. Count   </TD>
    <TD><input type=text name=n2 value=" n2 " size=8></TD>\
    </TR>                   <input type=submit value=\"Compute\">\
    </TABLE></FORM><BR>"
  } else if (MENU[2] ~ "Image") {
    Reason = "OK" ORS "Content-type: image/png"
    #Reason = "OK" ORS "Content-type: application/x-postscript"
    #Reason = "OK" ORS "Content-type: image/gif"
    Header = Footer = ""
    Document = Image
  }
}
 | 
As usual, we give a short description of the service in the first
menu choice. The third menu choice shows us that generation and
presentation of an image are two separate actions. While the latter
takes place quite instantly in the third menu choice, the former
takes place in the much longer second choice. Image data passes from the
generating action to the presenting action via the variable Image
that contains a complete ‘.png’ image, which is otherwise stored
in a file. If you prefer ‘.ps’ or ‘.gif’ images over the
default ‘.png’ images, you may select these options by uncommenting
the appropriate lines. But remember to do so in two places: when
telling GNUPlot which kind of images to generate, and when transmitting the
image at the end of the program.
Looking at the end of the program, the way we pass the ‘Content-type’ to the browser is a bit unusual. It is appended to the ‘OK’ of the first header line to make sure the type information becomes part of the header. The other variables that get transmitted across the network are made empty, because in this case we do not have an HTML document to transmit, but rather raw image data to contain in the body.
Most of the work is done in the second menu choice. It starts with a
strange JavaScript code snippet. When first implementing this server,
we used a short "<IMG SRC=" MyPrefix "/Image>" here. But then
browsers got smarter and tried to improve on speed by requesting the
image and the HTML code at the same time. When doing this, the browser
tries to build up a connection for the image request while the request for
the HTML text is not yet completed. The browser tries to connect
to the gawk server on port 8080 while port 8080 is still in use for
transmission of the HTML text. The connection for the image cannot be
built up, so the image appears as “broken” in the browser window.
We solved this problem by telling the browser to open a separate window
for the image, but only after a delay of 1000 milliseconds.
By this time, the server should be ready for serving the next request.
But there is one more subtlety in the JavaScript code.
Each time the JavaScript code opens a window for the image, the
name of the image is appended with a timestamp (systime).
Why this constant change of name for the image? Initially, we always named
the image Image, but then the Netscape browser noticed the name
had not changed since the previous request and displayed the
previous image (caching behavior). The server core
is implemented so that browsers are told not to cache anything.
Obviously HTTP requests do not always work as expected. One way to
circumvent the cache of such overly smart browsers is to change the
name of the image with each request. These three lines of JavaScript
caused us a lot of trouble.
The rest can be broken down into two phases. At first, we check if there are statistical parameters. When the program is first started, there usually are no parameters because it enters the page coming from the top menu. Then, we only have to present the user a form that he can use to change statistical parameters and submit them. Subsequently, the submission of the form causes the execution of the first phase because now there are parameters to handle.
Now that we have parameters, we know there will be an image available.
Therefore we insert the JavaScript code here to initiate the opening
of the image in a separate window. Then,
we prepare some variables that will be passed to GNUPlot for calculation
of the probabilities. Prior to reading the results, we must temporarily
change RS because GNUPlot separates lines with newlines.
After instructing GNUPlot to generate a ‘.png’ (or ‘.ps’ or
‘.gif’) image, we initiate the insertion of some text,
explaining the resulting probabilities. The final ‘plot’ command
actually generates the image data. This raw binary has to be read in carefully
without adding, changing, or deleting a single byte. Hence the unusual
initialization of Image and completion with a while loop.
When using this server, it soon becomes clear that it is far from being perfect. It mixes source code of six scripting languages or protocols:
-  GNU awkimplements a server for the protocol:
- HTTP which transmits:
- HTML text which contains a short piece of:
- JavaScript code opening a separate window.
- A Bourne shell script is used for piping commands into:
- GNUPlot to generate the image to be opened.
After all this work, the GNUPlot image opens in the JavaScript window where it can be viewed by the user.
It is probably better not to mix up so many different languages. The result is not very readable. Furthermore, the statistical part of the server does not take care of invalid input. Among others, using negative variances will cause invalid results.
| [ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] | 
