Chapter 5: How Gmail Works

By now you’ve learned how to use Gmail with some flair, and you can change the way it looks to a certain extent. Now you have to look into exactly how it works. You already know that the majority of the Gmail functionality is enacted client-side — that is, on the browser, rather than at the server — and is done with JavaScript. This chapter describes exactly how this works and how you can exploit it.

What the Devil Is Going On?

Before revealing just what’s happening, let’s recap. In Chapter 4
you used the DOM inspector inside Firefox to help you dissect
the HTML, and this will help you again. So, as before, open up
Gmail in Firefox, and open the DOM inspector.
You already know that the main document is made of two frames,
the first made of many subframes and the second one with nothing but a huge chunk of JavaScript. Figure 5-1 shows you that in
the DOM inspector.
Using the DOM inspector’s right-click menu Copy as XML
function, you can grab the text of the script and copy it to a text
editor. Ordinarily, I would include this code as a listing right
here, but when I cut and pasted it into the manuscript of this
book, it added another 120 pages in a single keystroke. This does
not bode well, especially as Google has tried as hard as it can to
format the JavaScript as tightly as possible. This saves bandwidth
but doesn’t help anyone else read what Google is doing. We’ll
reach that problem in a page or two.

Back to the browser, then, and you find you have a very complicated page seemingly made up of close to 250KB of JavaScript, one iFrame you can see, and
apparently ten or more that don’t appear on the screen. Furthermore, the eagle-eyed in our midst will have noticed that the Gmail URL doesn’t change very
much when you’re moving around the application. Changing from Inbox to All
Mail for the subset of your mail you want to see on the screen changes the page
but not the URL. For anyone used to, say, Hotmail, this is all very puzzling.

Preloading the Interface

What is actually happening is this: Gmail loads its entire interface into the one
single HTML page. When you move around the application, you’re not loading
new pages, but triggering the JavaScript to show you other parts of the page you
have already in your browser’s memory. This is why it is so fast: There’s no network connection needed to bring up the Compose window, or show the Settings
page, as you’ve already loaded it. You can see this inside the DOM inspector.
Figure 5-2 shows the section of the page with the various divs, each containing
part of the interface.
You’ll remember from Chapter 4 that the div d_tlist contains the majority of
the interface for the Inbox. Well, further inspection shows that d_comp holds the
Compose window, d_prefs holds the Settings window, and so on.
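To make the idea concrete, here is a minimal sketch of switching between panels that are already loaded. It is not Gmail's own code: the showPanel function is invented for illustration, and plain objects stand in for the d_tlist, d_comp, and d_prefs divs so that the sketch runs outside a browser. In a real page you would look the divs up with document.getElementById instead.

```javascript
// Hypothetical model of a preloaded interface: each panel is an
// element-like object with a style.display property, as a real div has.
function showPanel(panels, name) {
  if (!(name in panels)) {
    throw new Error("Unknown panel: " + name);
  }
  // Show the requested panel and hide every other one. No network
  // request is involved; the markup is already in memory.
  for (const id of Object.keys(panels)) {
    panels[id].style.display = id === name ? "block" : "none";
  }
  return panels[name];
}

// Stand-ins for the divs named in the text.
const panels = {
  d_tlist: { style: { display: "block" } },
  d_comp: { style: { display: "none" } },
  d_prefs: { style: { display: "none" } },
};

// Switch from the Inbox to the Compose window.
showPanel(panels, "d_comp");
```

Because nothing is fetched, the switch costs only a handful of property assignments, which is why moving around the application feels instant.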
This is all very interesting, but it doesn’t really show how the application works. If
anything, it raises a difficult question: if the page never refreshes, how does it send
or receive any messages? The answer lies in the JavaScript, and in the use of one
very clever function, XMLHttpRequest.

Introducing XMLHttpRequest

I like to think of this as quite a romantic story. JavaScript, you see, has had a bad rap over the years: it’s commonly misconceived as a scrappy language for dodgy website effects circa 1999, and up there with the <blink> tag as something to be avoided by the truly righteous web developer. This is, of course, utter rot: modern JavaScript is a rich and powerful language, and is rapidly regaining momentum. Perhaps since IE5 was launched, and certainly since Mozilla and Safari became mainstream, the majority of browsers have been capable of doing some very clever things in JavaScript. It’s just that no one bothered to look.
One such function is XMLHttpRequest. Invented by Microsoft and now universally implemented, it allows a JavaScript program to communicate with a server in the background, without refreshing the page. This is key for Gmail. It means that the JavaScript code can, upon a button push or any other trigger, send a tiny request to the Gmail server, parse the response, and throw it onto the screen, entirely without refreshing the page or causing any more traffic than is really necessary. It’s blazingly fast, especially if you have a server optimized for just such a thing. Google, naturally, does.

Using XMLHttpRequest Yourself

To get an idea of just what is going on, it’s a good idea to use XMLHttpRequest
yourself. In this section you’ll use it to create a little application of your own. You
can skip this section if you’re not interested in a deep understanding, but it’s pretty
cool stuff to play with anyway.
10_59611x ch05.qxp 11/28/05 11:15 PM Page 55
56 Part II — Getting Inside Gmail
First, open up a directory on a website. You’ll need to access it via a proper
domain, you see. Create the directory, and make sure your browser can see it. In
that directory, place a text file, called Listing.txt, and put the exclamation
“Horrible!” inside the file. Bear with me.
Then create an HTML file, containing the code in Listing 5-1, and save this file
to the directory you created earlier.
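Separately from Listing 5-1, here is a minimal sketch of the kind of request such a page makes. The fetchText function and its callback shape are my own invention for illustration. The XHR parameter defaults to the browser's XMLHttpRequest and exists only so the sketch can be exercised outside a browser.

```javascript
// Fetch a text file in the background and hand its contents to a callback.
// XHR defaults to the browser's XMLHttpRequest; it is a parameter only so
// that a stand-in can be supplied when no browser is available.
function fetchText(url, onDone, XHR = globalThis.XMLHttpRequest) {
  const request = new XHR();
  request.onreadystatechange = function () {
    // readyState 4 means the whole response has arrived.
    if (request.readyState === 4) {
      if (request.status === 200) {
        onDone(null, request.responseText);
      } else {
        onDone(new Error("HTTP status " + request.status), null);
      }
    }
  };
  request.open("GET", url, true); // third argument true = asynchronous
  request.send(null);
}

// In a browser, with Listing.txt in the same directory as the page:
// fetchText("Listing.txt", function (err, text) {
//   if (!err) alert(text);
// });
```

The page itself never reloads: the callback simply receives the text of the file, and what it does with that text is up to you.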

Finding XMLHttpRequest within the Gmail code

Don’t take the presence of XMLHttpRequest within Gmail on trust. You can see
this in action in Gmail’s own code. Go back to the DOM inspector and open the
second frameset — the one with all of the JavaScript in it. Copy the entire script
into a text editor and save it, as you’re going to refer to it a lot in this section.
Once you’ve done that, search for the string xmlhttp. You’ll find the function in
Listing 5-2.
Listing 5-2: Gmail’s XMLHttpRequest Function
function zd(){var R=null;if(da){var vN=lJ?"Microsoft.XMLHTTP":"Msxml2.XMLHTTP";try{R=new ActiveXObject(vN)}catch(f){C(f);alert("You need to enable active scripting and activeX controls.")}}else{R=new XMLHttpRequest();if(!R){;alert("XMLHttpRequest is not supported on this browser.")}}return R}
As with all of the Gmail JavaScript, this is compressed and slightly confusing.
Reformatted, it looks like Listing 5-3.
Listing 5-3: Gmail’s XMLHttpRequest Function, Tidied
function zd(){
  var R=null;
  if(da){
    var vN=lJ?"Microsoft.XMLHTTP":"Msxml2.XMLHTTP";
    try{R=new ActiveXObject(vN)}
    catch(f){
      C(f);alert("You need to enable active scripting and activeX controls.")}
  }else{
    R=new XMLHttpRequest();
    if(!R){
      ;alert("XMLHttpRequest is not supported on this browser.")}
  }
  return R}
This listing does much the same thing you did earlier: if the da flag is set, it tries
the Microsoft ActiveX controls; otherwise it tries the more standard XMLHttpRequest; and, if
that fails, it bails with an error message. For future reference, and remember this because
you’ll need it later, the XMLHttpRequest object in the Gmail code is called R.

Sniffing the Network Traffic

So now that you understand how XMLHttpRequest works, you’re led to some further questions: What is being sent and received using the XMLHttpRequest functions, and what are the URLs? Once you know the answers to these questions,
you can write your own code to spoof these requests, and can then interface
directly with the Gmail system. The rest of the book relies on this idea.
To find out what Gmail is saying to the browser, use a new tool: the packet sniffer.
This is a generic term for a range of applications that can listen to raw network
traffic, display it on the screen, log it, analyze it, and so on. What you’re interested
in is watching what your browser is doing in the background: what it is sending,
where it is sending it to, and then the replies it is getting.
My packet sniffer of choice for this job is Jeremy Elson’s Tcpflow, available at
www.circlemud.org/~jelson/software/tcpflow/.
I use Marc Liyanage’s OS X package, which you can download from
www.entropy.ch/software/macosx/#tcpflow.
Tcpflow is available under the GPL, and can be compiled on most proper computing platforms. Windows users will need to look elsewhere, but the following
techniques remain the same.

Firing Up Tcpflow

Install Tcpflow, and set it running inside a terminal window, monitoring port 80. On my machine, that means typing the following:
sudo tcpflow -c port 80
Then open a browser and request a page. Any will do: Figure 5-5 shows the start of a typical result.
FIGURE 5-5: The start of a Tcpflow session
As you can see from the figure and your own screen, Tcpflow captures all of the traffic flowing backward and forward across port 80: all your web traffic, in other words. It shows the requests and the answers: headers, content, and all. Tcpflow is perfect for the job. But there’s a snag. Open up Gmail, and let it sit there for a while. After it settles down, you will notice that Tcpflow regularly burps up new traffic looking very similar to Listing 5-4. This is Gmail’s heartbeat: checking for new mail. But it’s very odd looking.
Listing 5-4: Gmail Checking for New Mail
216.239.057.107.00080-192.168.016.050.59607: HTTP/1.1 200 OK
Set-Cookie: SID=AfzuOeCbwFixNvWd6vNt7bUR2DpPxRzYhOB54dzyYwHeLIHjVq_eeHH5s6MYQbPE0hVUK_LMROFuRWkMhfSR-U=; Domain=.google.com;Path=/;Expires=Tue, 06-Jan-2015 00:12:12 GMT
Set-Cookie: GBE=; Expires=Fri, 07-Jan-05 00:12:12 GMT; Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Sat, 08 Jan 2005 00:12:12 GMT
a
..........
216.239.057.107.00080-192.168.016.050.59607: 2c8
R...A{[uj...*..lQ...D.M.”.h...}...”G...RD..7../}.c...K H$g.....U.........M-.J
4......Y.......&....M.(..=.b..t...t.M.*...S!.....dZ.r.........
..w..iy....RQ.T.....n.....n.*.sqK.0.e.Y.m..g...h....{.k[i.k...
..,d!....X..”...Y.a..v......;...J.f29.4....E...Q..,.gA.D.<....
l....r...n0X..z.]0...~g>o1.. x1,...U..f.VK....R++.6.
.YG......Q...Y......V.O...v Oh7.D.M.X..3{%f.6].N...V*[email protected]..)8..?Z./o....j*o
.........3.. !=*.a.v.s..........”\..i{.;o..nh....K+q.\||...G.3]....x.;h.].r
...+..U?,...c........s..PF.%!....i2...}..’+.zP._.
....M...a35u]9.........-A...2.].F|.=..eQK ..5k.qt.....Wt..@Wf{.y.I..
X..*;.D...<*.r.E>...?.uK9p...RC..c..C.~.<..<..0q..9..I.pg.>...
. ...x$..........
The headers are understandable enough, but the content is very strange indeed. This is because your browser is taking advantage of Gzip encoding. Most modern web servers can serve content encoded with the Gzip algorithm, and most modern browsers are happy to decode it on the fly. Human brains, of course, cannot, so you need to force Gmail to send whatever it is sending over unencoded.
In the first few chapters of this book, you’ve been using Firefox, so return to that browser again now. In the address bar, type the URL about:config. You should see a page looking like Figure 5-6.
FIGURE 5-6: The Firefox secret settings page
This page allows you to change the more fundamental browser settings. You need to change only one. Scroll down to network.http.accept-encoding and click on the string. By default it reads gzip,deflate. Just delete that, and leave it blank, as shown in Figure 5-7.
FIGURE 5-7: The changed HTTP setting
Empty Firefox’s cache to prevent a strange bug, and restart the browser for good measure. Now go back to Gmail and watch for the heartbeat. It will now look like Listing 5-5.
Listing 5-5: Gmail’s Heartbeat, Unencoded
192.168.016.050.59622-216.239.057.107.00080: GET /gmail?ik=344af70c5d&view=tl&search=inbox&start=0&tlt=1014fb79f15&fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479 HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gmail.google.com/gmail?ik=344af70c5d&search=inbox&view=tl&start=0&zx=24c4d6962ec6325a116384500
Cookie: GV=101014fb09ab5-af53c8c5457de50bec33d5d6436e82c6; PREF=ID=2dfd9a4e4dba3a9f:CR=1:TM=1100698881:LM=1101753089:GM=1:S=nJnfdWng4uY7FKfO; SID=AcwnzkuZa4aCDnqVeiG6-pM487sZLlfXBz2JqrHFdjIueLIHjVq_eeHH5s6MYQbPE4wm3vinOWMnavqPWq3SNNY=; GMAIL_AT=e6980e93d906d564-1014fb09ab7; S=gmail=h7zPAJFLoyE:gmproxy=bnNkgpqwUAI; TZ=-60
216.239.057.107.00080-192.168.016.050.59622: HTTP/1.1 200 OK
Set-Cookie: SID=AbF6fUKA6tCIrC8Hv0JZuL5cLPt3vlO6qonGit87BAlMeLIHjVq_eeHH5s6MYQbPE-F6IjzxJjnWuwgSIxPn3GQ=;Domain=.google.com;Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Sat, 08 Jan 2005 00:31:09 GMT
62
This you can recognize: The heartbeat had my browser requesting the following URL:
/gmail?ik=344af70c5d&view=tl&search=inbox&start=0&tlt=1014fb79f15&fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479
Likewise, the heartbeat had my browser passing the following cookie:
Cookie: GV=101014fb09ab5-af53c8c5457de50bec33d5d6436e82c6; PREF=ID=2dfd9a4e4dba3a9f:CR=1:TM=1100698881:LM=1101753089:GM=1:S=nJnfdWng4uY7FKfO; SID=AcwnzkuZa4aCDnqVeiG6-pM487sZLlfXBz2JqrHFdjIueLIHjVq_eeHH5s6MYQbPE4wm3vinOWMnavqPWq3SNNY=; GMAIL_AT=e6980e93d906d564-1014fb09ab7; S=gmail=h7zPAJFLoyE:gmproxy=bnNkgpqwUAI; TZ=-60
The browser then received a new cookie:
SID=AbF6fUKA6tCIrC8Hv0JZuL5cLPt3vlO6qonGit87BAlMeLIHjVq_eeHH5s6MYQbPE-F6IjzxJjnWuwgSIxPn3GQ=;Domain=.google.com;Path=/
Along with the new cookie, my browser also received a snippet of JavaScript as the contents of the page:
What can you tell from all of this? Well, you now know how Gmail on your browser communicates with the server, and you know how to listen in on the conversation. Two things remain in this chapter, therefore: collecting as many of these phrases as possible and then working out what they mean.
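The heartbeat request is an ordinary URL with a query string, so the first step in working out what the phrases mean is to split it into named parameters. The following is a small sketch of that step. The parseGmailQuery function is a name I have made up; URLSearchParams is a standard part of modern browsers and Node.js. What each parameter means is still unknown at this point.

```javascript
// Split a Gmail request path such as "/gmail?ik=...&view=tl" into an
// object of parameter names and values.
function parseGmailQuery(pathAndQuery) {
  const queryStart = pathAndQuery.indexOf("?");
  const query = queryStart === -1 ? "" : pathAndQuery.slice(queryStart + 1);
  const params = {};
  for (const [key, value] of new URLSearchParams(query)) {
    params[key] = value;
  }
  return params;
}

// The heartbeat URL captured in the trace.
const heartbeat = parseGmailQuery(
  "/gmail?ik=344af70c5d&view=tl&search=inbox&start=0&tlt=1014fb79f15" +
    "&fp=54910421598b5190&auto=1&zx=24c4d6962ec6325a216123479"
);

console.log(Object.keys(heartbeat).join(", "));
```

Run over every request in a capture, a function like this makes it easy to see which parameters stay fixed and which change from one request to the next.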

Prodding Gmail to Hear It Squeak

The technique to further learn Gmail’s secrets is obvious. Use it — sending mail,
receiving mail, and so on — and watch what it does in the background. From
these clues, and the JavaScript listing you already have, you can piece together a
complete picture of the Gmail server’s interface. And it’s that interface that you
ultimately want to deal with directly.
To get a clear idea of what is going on, you need to capture everything that happens when Gmail is loaded, when it sits idle, and when you perform the common
actions with it.

Preparing to Watch the Gmail Boot Sequence

To start the process with gusto, open up Firefox again, and clear all of the caches.
Then open up a terminal window, and set Tcpflow running, and save its output to
a text file, like so:
sudo tcpflow -c '(port 80 or 443)' >> login_capture.txt
This records everything that goes over HTTP or HTTPS. Then log in to Gmail
and wait until you reach a nice, calm, idle Inbox like the one shown in Figure 5-8.
FIGURE 5-8: A nice, calm Inbox at the end of the boot sequence
You’ll be referring back to this figure in a page or two.
Now, stop the Tcpflow application with a judicious Control+c and open up the
login_capture.txt file.

Cleaning Up the Log

Before looking through the log properly, it needs to be cleaned up a bit. There’s a
lot of information that you don’t need. For instance, every request sent by my
browser has this code, which is superfluous to your needs:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Search for this code and replace it with a single new line. Next, toward the end
(starting at line 1862 in my working version), is a whole collection of requests and responses
for image files. You’re not interested in these at all, so you can reduce them until
they look like so:
192.168.016.053.64150-216.239.057.106.00080: GET /gmail/help/images/logo.gif
216.239.057.106.00080-192.168.016.053.64150: HTTP/1.1 200 OK
This makes things much more readable. Now, between lines 394 and 1712 (more
or less, it may be slightly different in your log file) is the serving of the one enormous JavaScript file. Strip the code out, and replace it with your own comment.
Finally, right at the beginning, are a few pages going backward and forward that
seem to be made of utter nonsense. These are encrypted. So, again, strip them out
and replace them with a comment.
You should now have around 500 lines of traffic between your browser and Gmail.
It’s time to step through it and see what is going on. To see the entire boot
sequence log, flip to Appendix A and look through Listing A-3.
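If you would rather not do the first part of that cleanup by hand, a few lines of script will do it. This is only a sketch: stripNoiseHeaders and NOISE_HEADERS are names I have invented, the list of headers is simply the block quoted earlier in this section, and the function drops those lines outright rather than leaving a blank line in their place. It assumes each header sits on a single line, as it does in a raw capture.

```javascript
// Request headers that repeat on every request and add nothing to the
// analysis. The list mirrors the block quoted in the text.
const NOISE_HEADERS = [
  "User-Agent",
  "Accept",
  "Accept-Language",
  "Accept-Charset",
  "Keep-Alive",
  "Connection",
];

// Remove every line whose header name (the text before the first colon)
// appears in NOISE_HEADERS. All other lines pass through untouched.
function stripNoiseHeaders(capture) {
  return capture
    .split("\n")
    .filter(function (line) {
      const name = line.split(":", 1)[0].trim();
      return !NOISE_HEADERS.includes(name);
    })
    .join("\n");
}

const sample = [
  "192.168.016.053.64142-216.239.057.106.00080: GET / HTTP/1.1",
  "Host: gmail.google.com",
  "User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0",
  "Accept-Language: en-gb,en;q=0.5",
  "Keep-Alive: 300",
  "Connection: keep-alive",
].join("\n");

console.log(stripNoiseHeaders(sample));
```

Stripping the image requests, the JavaScript file, and the encrypted blocks is easier to do by eye, because you want to leave your own comment where each one used to be.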

Stepping Through the Gmail Boot Sequence

To be able to write an API, you need to know how the login works, so we shall start
there. In all of the following, my machine has the IP address 192.168.016.053.
This Is Going to Break
During the writing of this book, the Gmail login sequence has changed at least three times. Not massively so, it must be said, but enough to break code until I worked out just what had changed. This section, and the chapters following, therefore, must be taken as guides to reverse engineering the thing yourself, and not as a definitive reference to the Gmail login sequence. If what I describe here no longer matches reality completely, I apologize. Take solace in the fact that I have no idea what Google is up to either.

Logging In

Start by requesting the page http://gmail.google.com. Gmail replies with an HTTP 302 redirect to https://gmail.google.com/?dest=http%3A%2F%2Fgmail.google.com%2Fgmail, which the browser automatically follows, switching to encrypted traffic:
192.168.016.053.64142-216.239.057.106.00080: GET / HTTP/1.1
Host: gmail.google.com
216.239.057.106.00080-192.168.016.053.64142: HTTP/1.1 302 Moved Temporarily
Location: https://gmail.google.com/?dest=http%3A%2F%2Fgmail.google.com%2Fgmail
Cache-control: private
Content-Length: 0
Content-Type: text/html
Server: GFE/1.3
Date: Sun, 16 Jan 2005 17:11:18 GMT
192.168.016.053.64143-216.239.057.106.00443
LOTS OF ENCRYPTED TRAFFIC CLIPPED OUT FROM THIS SECTION
Because the login page is encrypted (the traffic flows over HTTPS, not HTTP), you can’t follow what it does using the log. You need to use a script to follow the URLs until you get back to the trace. I used the following snippet of Perl code to pretend to be a browser to see what is going on:
#!/usr/bin/perl -w
use LWP::UserAgent;
use HTTP::Request;
use Crypt::SSLeay;    # gives LWP its HTTPS support

# Pretend to be Internet Explorer 6 on Windows XP.
my $ua = LWP::UserAgent->new();
$ua->agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)");

# Request the Gmail front page; LWP follows the redirects for us.
my $request = HTTP::Request->new(GET => 'https://gmail.google.com/');
my $result = $ua->request($request);

# Print the page on success, or the HTTP status line on failure.
if ($result->is_success) {
    print $result->content;
} else {
    print $result->status_line;
}
You can infer from actually doing it, or by using a script like the one above, that the page continues with another redirect (or perhaps more than one), finally ending up at https://www.google.com/accounts/ServiceLogin?service=mail&continue=http%3A%2F%2Fgmail.google.com%2Fgmail, as you can see in Figure 5-9.
FIGURE 5-9: The Gmail login screen
Viewing source on this page shows you two important things. First, there is the username and password form itself; second, some JavaScript that sets a cookie. Deal with the form first. Listing 5-6 gives a cleaned-up version of the code, with the styling removed.
Listing 5-6: The Gmail Login Form

Username: Password: Don’t ask for my password for 2 weeks.

From this we can see that the URL the page POSTs to in order to log in is produced as follows, split here for clarity:
https://www.google.com/accounts/ServiceLoginBoxAuth/continue=https://gmail.google.com/gmail
&service=mail
&Email=XXXXX
&Passwd=XXXXX
&PersistentCookie=yes
&null=Sign%20in
You will need this later on, but now, the cookie setting.
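As a sketch of how a script might assemble those fields, the following builds the form-encoded string from the names listed above. The buildLoginBody function is hypothetical, and the field names and values are taken on trust from the form as it stood at the time of writing. Note one difference in output: URLSearchParams encodes the space in "Sign in" as a plus sign rather than %20. Both decode to a space under form encoding.

```javascript
// Assemble the login fields listed in the text into a form-encoded string.
// The credentials are parameters; nothing here contacts any server.
function buildLoginBody(email, password) {
  const fields = new URLSearchParams();
  fields.append("continue", "https://gmail.google.com/gmail");
  fields.append("service", "mail");
  fields.append("Email", email);
  fields.append("Passwd", password);
  fields.append("PersistentCookie", "yes");
  fields.append("null", "Sign in");
  return fields.toString();
}

const loginBody = buildLoginBody("XXXXX", "XXXXX");
console.log(loginBody);
```

Keeping the credentials as arguments means the same function serves any account, and the encoding of awkward characters in a password is handled for you.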

The First Cookie

This JavaScript sets two cookies. The first, GMAIL_LOGIN2, is set with a value of
Tstart_time/start_time/now, where both start_time and now are the date and time at that moment. As you can see from the comments in the code, Google intends
to replace this in the future.
The second cookie is called GMAIL_RTT2 and contains the time it takes to retrieve
a 1-pixel image file from the Gmail servers. RTT, presumably, stands for Round
Trip Time.
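The format of the first cookie's value is simple enough to reproduce. The sketch below assumes the times are millisecond timestamps, which is how they appear in the Cookie headers of the traces in this chapter. The loginCookieValue function is a name of my own choosing.

```javascript
// Build a value of the form Tstart_time/start_time/now, where both
// arguments are millisecond timestamps such as Date.now() returns.
function loginCookieValue(startTime, now) {
  return "T" + startTime + "/" + startTime + "/" + now;
}

// A fresh value, with the start time and "now" taken moments apart.
const startedAt = Date.now();
const freshValue = loginCookieValue(startedAt, Date.now());
console.log(freshValue);
```

A script that pretends to be a browser has to send this cookie itself, since it never runs the page's JavaScript.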
You won’t look at it in this book, but the rest of the JavaScript code on that page
presents a very nice listing of a browser check that removes the login window if
the browser isn’t capable of using Gmail.
If you watch the Gmail login sequence from your own browser, you will see that it
goes through more redirects before it settles into HTTP again, and you can see
what is going on from the Tcpflow trace file.
Hitting stop on the browser at just the right time (and that is, to quote the fine
words of my editor, a total crapshoot), gives you this URL:

You have seen this sort of URL before: Look back again at Listing A-3, after the
second excised block of encrypted code. So now you know that between the form
submission and the page you get in Listing 5-8, something else happens. You can
also guess that something happens to the cookie you set on the first page — it is
being checked for something. Considering that those cookies do not contain anything but the time they were set, I am guessing that this step is to ensure that the
connection is current and not the result of caching from someone’s browser. It’s to
ensure a good, fresh session with Gmail on the part of the browser application and
the user himself. Or so I would guess.
Either way, the boot sequence continues from here automatically, with everything
in standard HTTP. You will see within the trace that the boot sequence loads the
Inbox next. So that’s what the next section considers.

Loading the Inbox

As you come to the end of the boot sequence you have nothing to do but load in the Inbox and address book. This section deals specifically with the Inbox loading. The output from the Tcpflow program earlier in this chapter doesn’t contain enough mail to be of use in this regard, but if you do the trace again, only this time with a few more messages in the Inbox, you can see what is going on. Figure 5-10 shows the new Inbox, loaded with messages.
FIGURE 5-10: Gmail with some new, unread messages
Listing 5-9 shows the new trace.
A Summary of the Login Procedure
As I have said before, the login procedure for Gmail seems to be changing on a very regular basis. Check with the libraries examined in Chapter 6 for the latest news on this. Basically, however, the login procedure goes like this, with each step moving on only if the previous was reported successful.
1. Request the Gmail page.
2. Set the two cookies.
3. Send the contents of the form.
4. Request the cookie check page.
5. Request the Inbox.
Listing 5-9: The Inbox with More Messages Within
192.168.016.051.59905-064.233.171.107.00080: GET /gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=vzmurwe44cpx6l HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gmail.google.com/gmail/html/hist2.html
Cookie: GV=1010186d43b2b-b6b21a87a46b00d1bc5abf1a97357dd7; PREF=ID=0070250e68e17190:CR=1:TM=1106068639:LM=1106068639:S=O1Nivj_xqk7kvdGK; GMAIL_LOGIN=T1106068635841/1106068635841/1106068648645; SID=DQAAAGoAAAC06FIY2Ix4DJlCk7ceaOnWPvpK4eWn9oV6xpmOT4sNhdBPkZ2npQE8Vi8mWY9RybWVwJet9CHeRBw99oUdRqQHvBb8IWxhLcurTBFZJstXoUbWFDZTmxZKt55eUxnspTHLanel9LsAU1wqHcHhlHI7; GMAIL_AT=5282720a551b82df-10186d43b2e; S=gmail=WczKrZ6s5sc:gmproxy=UMnFEH_hYC8; TZ=-60
064.233.171.107.00080-192.168.016.051.59905: HTTP/1.1 200 OK
Set-Cookie: SID=DQAAAGoAAAC06FIY2Ix4DJlCk7ceaOnWPvpK4eWn9oV6xpmOT4sNhdBPkZ2npQE8Vi8mWY9RybWVwJet9CHeRBw99oUdRqQHvBb8IWxhLcurTBFZJstXoUbWFDZTmxZKt55eUxnspTHLanel9LsAU1wqHcHhlHI7;Domain=.google.com;Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 17:17:36 GMT
936
What to make of these traces? First, you can see that to call the contents of the Inbox, the browser requests two URLs. First, this one:
/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=z6te3fe41hmsjo
And next, this one:
/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=781ttme448dfs9
And second, it appears that the real workings of the Inbox are contained in the JavaScript function that starts D(["t"]), as Listings 5-10 and 5-11 show.
Listing 5-10: With One Message
D(["t",["101480d8ef5dc74a",0,0,"Jan 6","Ben Hammersley","» ","Here\'s a nice message.",,[]
,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"]
]
);
Listing 5-11: With Three Messages
D(["t",["101865c04ac2427f",1,0,"4:06pm","Ben Hammersley","» ","This is the third message",,[]
,"","101865c04ac2427f",0,"Tue Jan 18 2005_7:06AM"]
,["101865b95fc7a35a",1,0,"4:05pm","Ben Hammersley","» ","This is the second message",,[]
,"","101865b95fc7a35a",0,"Tue Jan 18 2005_7:05AM"]
,["101480d8ef5dc74a",0,1,"Jan 6","Ben Hammersley","» ","Here\'s a nice message.",,["^t","Heads"]
,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"]
]
);
From looking at these listings, you can deduce that the Inbox structure consists of one or more of the following arrays (I’ve added in line breaks for clarity):
[
"101480d8ef5dc74a",
0,
0,
"Jan 6",
"Ben Hammersley",
"» ",
"Here\'s a nice message.",
,[]
,""
,"101480d8ef5dc74a"
,0
,"Thu Jan 6 2005_4:44AM"
]
From further deduction, where I sent different types of e-mail to Gmail and watched what it did (I’ll omit all of that here for the sake of brevity, but you should have the idea), you can see that the array consists of the following:
[
"101480d8ef5dc74a", -> The message id.
0, -> Unread=1, Read=0
0, -> Starred=1, plain=0
"Jan 6", -> The date displayed
"Ben Hammersley", -> Who sent it
"» ", -> The little icon in the inbox
"Here\'s a nice message.", -> The subject line
,[] -> Labels
,"" -> Attachments
,"101480d8ef5dc74a" -> The message ID
,0 -> Unknown
,"Thu Jan 6 2005_4:44AM" -> The full date and time
]
You now know how to decode the Gmail mail listing. You can also see how to request this data structure: by calling the URL, and parsing the returned JavaScript function. You can do this in simple regular expressions, a topic explored in Chapter 7.
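To show how the positions map onto something friendlier, here is a sketch that turns the argument of a D(["t", ...]) call into a list of objects. The function names and property names are my own. The positions come from the breakdown above, with one detail worth noting: the doubled comma after the subject leaves an empty slot at index 7, so the labels sit at index 8 and the full date at index 12. The sketch works on the array once it is in hand and does not cover extracting it from the raw response text.

```javascript
// Turn one row of the thread list into an object with named fields.
// Index 7 is the empty slot left by the doubled comma, so it is skipped.
function decodeThreadRow(row) {
  return {
    id: row[0],
    unread: row[1] === 1,
    starred: row[2] === 1,
    displayDate: row[3],
    sender: row[4],
    subject: row[6],
    labels: row[8],
    attachments: row[9],
    fullDate: row[12],
  };
}

// Decode the whole argument of a D(["t", row, row, ...]) call.
function decodeThreadList(call) {
  if (call[0] !== "t") {
    throw new Error("Not a thread list: " + call[0]);
  }
  return call.slice(1).map(decodeThreadRow);
}

// Two of the rows from Listing 5-11, written as a JavaScript literal.
const threadListCall = ["t",
  ["101865c04ac2427f",1,0,"4:06pm","Ben Hammersley","» ","This is the third message",,[]
  ,"","101865c04ac2427f",0,"Tue Jan 18 2005_7:06AM"],
  ["101480d8ef5dc74a",0,1,"Jan 6","Ben Hammersley","» ","Here's a nice message.",,["^t","Heads"]
  ,"","101480d8ef5dc74a",0,"Thu Jan 6 2005_4:44AM"],
];

const threads = decodeThreadList(threadListCall);
console.log(threads.map(function (t) { return t.subject; }));
```

Because the data Gmail sends is already valid JavaScript array syntax, empty slot and all, the rows can be pasted into a script exactly as they appear in the trace.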

Storage Space

The detail of the mail in the Inbox isn’t the only information sent when you
request that URL. Look above the mail function and you can see the following:
D(["qu","1 MB","1000 MB","0%","#006633"]);
This line of data sent from Gmail’s servers clearly corresponds to the display at
the bottom of the screen giving your mailbox usage statistics:
• D(["qu", : The name of the Gmail function that deals with the usage information.
• "1 MB", : The amount of storage used.
• "1000 MB", : The maximum amount available.
• "0%", : The percentage used.
• "#006633" : The hex value for a nice shade of green.
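The same positional approach works for the usage line. In this sketch the decodeQuota function and its property names are invented; the order of the values is the one given in the list above.

```javascript
// Decode the argument of a D(["qu", ...]) call into named fields.
function decodeQuota(call) {
  if (call[0] !== "qu") {
    throw new Error("Not a quota record: " + call[0]);
  }
  // Skip the tag, then take the four values in the order listed above.
  const [, used, total, percent, colour] = call;
  return { used: used, total: total, percent: percent, colour: colour };
}

const quota = decodeQuota(["qu", "1 MB", "1000 MB", "0%", "#006633"]);
console.log(quota.used + " of " + quota.total + " (" + quota.percent + ")");
```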

Labels

In Figure 5-10 I have added some labels to the Gmail system. Spotting them in
the Tcpflow trace is easy:
D(["ct",[["Heads",0],["Knees",0],["Shoulders",0],["Toes",0]]]);
You can deduce straight away that the function starting with D(["ct" contains
the names of the labels, each paired with an unknown value (perhaps it’s a Boolean, perhaps it’s a string; you
don’t know as yet). You can harvest this data more easily when you
come to write your own API.
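Harvesting the labels can be sketched in a few lines. The decodeLabels function is a hypothetical name, and because the meaning of the second element in each pair is unknown, the sketch simply carries it along under the neutral name value.

```javascript
// Decode the argument of a D(["ct", [[name, value], ...]]) call.
// The meaning of `value` is not yet known, so it is passed through as-is.
function decodeLabels(call) {
  if (call[0] !== "ct") {
    throw new Error("Not a label list: " + call[0]);
  }
  return call[1].map(function (pair) {
    return { name: pair[0], value: pair[1] };
  });
}

const labels = decodeLabels(
  ["ct", [["Heads", 0], ["Knees", 0], ["Shoulders", 0], ["Toes", 0]]]
);
console.log(labels.map(function (label) { return label.name; }).join(", "));
```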

Reading an Individual Mail

Fire up Tcpflow again, and click one of the messages in the Inbox in Figure 5-10. The trace resulting from this action is shown in Listing 5-12.
Listing 5-12: Trace from Reading a Message
192.168.016.051.59936-064.233.171.105.00080: GET /gmail?ik=344af70c5d&view=cv&search=inbox&th=101865c04ac2427f&lvp=-1&cvp=0&zx=9m4966e44e98uu HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gmail.google.com/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=iv37tme44d1tx5
Cookie: GV=1010186dcc455-ce01891ce232fa09b7f9bcfb46adf4e7; PREF=ID=0070250e68e17190:CR=1:TM=1106068639:LM=1106068659:GM=1:S=3jNiVz8ZpaPf0GW0; S=gmail=WczKrZ6s5sc:gmproxy=UMnFEH_hYC8; TZ=-60; SID=DQAAAGoAAACm_kF5GqnusK0rbFcAlLKoJUx26l6npH5Een1P_hN--yWqycLWSJUZt3G9Td_Cgw_ZK1naS891aWxZ6IkbNiBFN1J4lmOCOTvOn7r3bnYjWlOqB6netb06ByuEf56Cd12ilfgika0MxmuamO3FWzw; GMAIL_AT=29a3f526e2461d87-10186dcc456; GBE=d-540-800
064.233.171.105.00080-192.168.016.051.59936: HTTP/1.1 200 OK
Set-Cookie: SID=DQAAAGoAAACm_kF5GqnusK0rbFcAlLKoJUx26l6npH5Een1P_hN--yWqycLWSJUZt3G9Td_Cgw_ZK1naS891aWxZ6IkbNiBFN1J4lmOCOTvOn7r3bnYjWlOqB6netb06ByuEf56Cd12ilfgika0MxmuamO3FWzw;Domain=.google.com;Path=/
Set-Cookie: GBE=; Expires=Mon, 17-Jan-05 18:00:37 GMT; Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 18:00:37 GMT
4d5
First thing first: the URL. Requesting this message caused Gmail to load this URL:
/gmail?ik=344af70c5d&view=cv&search=inbox&th=101865c04ac2427f&lvp=-1&cvp=0&zx=9m4966e44e98uu
Or, to put it more understandably:

/gmail?
ik=344af70c5d
&view=cv
&search=inbox
&th=101865c04ac2427f
&lvp=-1
&cvp=0
&zx=9m4966e44e98uu

As you can see, th is the message ID of the message I clicked on. But the others are mysterious at the moment.

At this point in the proceedings, alarms went off in my head. Why, I was thinking, is the variable for the message ID called th, when that probably stands for thread? So, I sent a few mails back and forth to create a thread, and loaded the Inbox and the message back up again under Tcpflow. Listing 5-13 shows the resulting trace. It is illuminating.

Listing 5-13: Retrieving a Thread, Not a Message

THE INBOX LOADING:

D([“t”,[“10187696869432e6”,1,0,”9:00pm”,”Ben, me, Ben (3)”,”» ”,”This is the third message”,,[]
,””,”10187696869432e6”,0,”Tue Jan 18 2005_12:00PM”]
,[“101865b95fc7a35a”,1,0,”4:05pm”,”Ben Hammersley”,”» ”,”This is the second message”,,[]
,””,”101865b95fc7a35a”,0,”Tue Jan 18 2005_7:05AM”]
,[“101480d8ef5dc74a”,0,1,”Jan 6”,”Ben Hammersley”,”» ”,”Here\’s a nice message.”,,[“^t”,”Heads”]
,””,”101480d8ef5dc74a”,0,”Thu Jan 6 2005_4:44AM”]
]
);
D([“te”]);

THE GETTING MESSAGE EXCHANGE

192.168.016.051.61753-216.239.057.105.00080: GET /gmail?ik=344af70c5d&view=cv&search=inbox&th=10187696869432e6&lvp=-1&cvp=0&zx=24lfl9e44iyx7g HTTP/1.1
Host: gmail.google.com
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://gmail.google.com/gmail?ik=&search=inbox&view=tl&start=0&init=1&zx=cs149e44iu4pd
Cookie: GV=101018770f6a0-36b4c5fcaa4913584af2219efa21740e; SID=DQAAAGoAAACTZryXzUYHgTI4VWtHGXDY5J8vchRrqp_Ek4XjEgdZYQwBUEpXOuyokCt-EOOmsaL8J8_bQ3jkrMfskffoH8Mb6GvEJJPAhS6noKP8IjnREcWN8MTvIPeqOYYoxE52oLva00EWdOrsGhtCy18RphU; GMAIL_AT=aa5dcfedda2d8658-1018770f6a2; S=gmail=pl14BJCt_4:gmproxy=c9z4V0uxx2o; TZ=-60; GMAIL_SU=1; PREF=ID=e38a980ef675b953:TM=1106078936:LM=1106078936:GM=1:S=T0D_V1EFUHr7faSw; GBE=d-540-800

216.239.057.105.00080-192.168.016.051.61753: HTTP/1.1 200 OK
Set-Cookie: SID=DQAAAGoAAACTZryXzUYHgTI4VWtHGXDY5J8vchRrqp_Ek4XjEgdZYQwBUEpXOuyokCt-EOOmsaL8J8_bQ3jkrMfskffoH8Mb6GvEJJPAhS6noKP8IjnREcWN8MTvIPeqOYYoxE52oLva00EWdOrsGhtCy18RphU;Domain=.google.com;Path=/
Set-Cookie: GBE=; Expires=Mon, 17-Jan-05 20:12:34 GMT; Path=/
Set-Cookie: GMAIL_SU=; Expires=Mon, 17-Jan-05 20:12:34 GMT; Path=/
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Tue, 18 Jan 2005 20:12:34 GMT

b23

As you can deduce, th does indeed stand for thread. In Gmail, it turns out, you do not just retrieve single messages. Rather, you retrieve the requested message and also the entire set of headers for the rest of the messages in the thread. You can see this quite clearly in the example above. The lines in bold type show the headers for all three messages, and the whole thing finishes with the entire content of the requested message. You then allow the JavaScript code to wrangle the interface afterward. This is a clever trick: it allows the interface to be very quick at the point the user wants it to be (when you’re reading through a thread) instead of loading each message individually.

So, you now know how to retrieve messages. But how do you read them? Listing 5-14 shows the relevant bit of JavaScript.
Listing 5-14: The Message Itself

D([“mi”,0,3,”10187696869432e6”,0,”0”,”Ben Hammersley”,”[email protected]”,”me”,”8:59pm (12 minutes ago)”,[“Ben Hammersley <[email protected]>”]
,[]
,[]
,[]
,”Tue, 18 Jan 2005 20:59:40 +0100”,”Re: This is the third message”,””,[]
,1,,,”Tue Jan 18 2005_11:59AM”]
);
D([“mb”,”And this is another reply back yet again
”,1]
);
D([“mb”,”

- Show quoted text -

On 18 Jan 2005, at 20:59, Ben Hammersley wrote:

> And this is a reply back
>
>
> On Tue, 18 Jan 2005 16:05:17 +0100, Ben Hammersley
> <[email protected]> wrote:
>> 3rd! THREE! THIRD!
>>
>>

”,0]
);

From this you can see that the message is sent in three JavaScript arrays. D([“mi” contains the header information (its status, the message ID, who sent it, and so on), and then there are two arrays starting with D([“mb” that contain the first line and the whole rest of the message, respectively, marked up in HTML. Parsing this out, as you will in Chapter 8, will be easy.

So you now know how to request a message and read it.
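
As a rough illustration of both halves, here is a minimal JavaScript sketch. It is not Gmail's own code and not the approach Chapter 8 takes. The function names (buildThreadUrl, collectDataPacks, readMessage) are hypothetical, the field positions are simply read off Listing 5-14, and the sample data is an abbreviated stand-in for a real response. It also assumes the typographic quotes printed in the listings are plain ASCII quotes on the wire, and it passes ik and zx through untouched because their meaning has not been established at this point.

```javascript
// Hypothetical helper: rebuilds the conversation-view URL from Listing 5-12.
// The parameter names and their order are copied from the trace.
function buildThreadUrl(ik, threadId, zx) {
  const params = new URLSearchParams({
    ik: ik,
    view: "cv",
    search: "inbox",
    th: threadId,
    lvp: "-1",
    cvp: "0",
    zx: zx
  });
  return "/gmail?" + params.toString();
}

// Hypothetical helper: runs the response text with our own D() in scope,
// so every D([...]) call hands us its array. This executes whatever text
// it is given, so it is only suitable for experiments on traces you trust.
function collectDataPacks(scriptText) {
  const packs = [];
  const D = function (pack) { packs.push(pack); };
  new Function("D", scriptText)(D);
  return packs;
}

// Assumption from Listing 5-14: in an "mi" array, index 3 is the ID and
// index 6 is the sender's name; in an "mb" array, index 1 is the HTML text.
function readMessage(packs) {
  const header = packs.find(function (p) { return p[0] === "mi"; });
  const bodyParts = packs
    .filter(function (p) { return p[0] === "mb"; })
    .map(function (p) { return p[1]; });
  return { id: header[3], sender: header[6], bodyParts: bodyParts };
}

// Abbreviated sample in the shape of Listing 5-14 (most "mi" fields omitted).
const sample = [
  'D(["mi",0,3,"10187696869432e6",0,"0","Ben Hammersley"]);',
  'D(["mb","And this is another reply back yet again",1]);',
  'D(["mb","- Show quoted text -",0]);'
].join("\n");

const message = readMessage(collectDataPacks(sample));
console.log(buildThreadUrl("344af70c5d", message.id, "9m4966e44e98uu"));
console.log(message.sender + ": " + message.bodyParts[0]);
```

Evaluating the text is used here only because the arrays contain empty slots (the ,, runs), which a strict JSON parser would reject. A real client would want a safer parser than this.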

And Now . . .

In this chapter, you learned how Gmail works, and you looked at the techniques
you would use to probe the system for the knowledge you need to communicate
with the Gmail server directly. You can log in, request mail, read mail, and access
label titles and other sorts of information. In the next chapter, you will look
at the existing APIs for Gmail, which confirm what you have learned here, and
learn how to put your new expertise to use.
