
How To Use Google AdSense Within XML/XHTML


Before starting, a better plan is to convert your old HTML4/XHTML to modern HTML

First, How Does AdSense Work?

You set up an account with Google and specify the ads you want. I said "Images if you have them but text if not" and specified some medium and large banners and rectangles in certain color schemes. Google then generated some small JavaScript blocks for me to place as I want within my pages.

The JavaScript I place within my page is short and simple so that I and my server have little to do.

When you view my page, you direct your browser to retrieve data from a specified URL. My server sends over an HTML document. If you have JavaScript enabled in your browser, then your computer executes each JavaScript block within the page.

The simple JavaScript blocks I added to my pages just set four variables and then they retrieve and execute a JavaScript program from pagead2.googlesyndication.com. That automatically retrieved JavaScript code does the real work of getting the ad itself.

The ad retrieved is the result of doing something like the reverse of the typical Google search. Instead of answering the question, "What pages relate to this search string?" it's more like "To which search strings or ad descriptions would this page relate?" Put another way, it tries to automatically select ads on topics similar to the topic of the page. One of those four original variables specifies my Google account, so if you happen to retrieve an ad that interests you and you click on it, Google knows whom to credit.

A side effect that I didn't anticipate is that it shows me what Google thinks my pages are about. It does a good job selecting relevant ads on many of my pages, like TCP/IP, Linux/Unix, some of my information security pages, my attempts to understand Turkish grammar, my travel suggestions, and the Toilets of the World.

Google gets a little confused on many of my information security pages, obsessing on the term "security" appearing in the URL and throughout the page, and frequently offering ads for the vast and largely non-technical physical security industry in the U.S.

It's harder for it to automatically figure out what some pages are about, like one explaining how to create Cyrillic text in Unicode, the LaTeX markup language, and PostScript. If it cannot decide what the page might be about, it might offer "public service ads", generally promotions for charities. However, if you have enough content on your site and at least some pages clearly on some topic, Google will reasonably assume that a mystery page should get ads from what it sees as the general theme of the site.

OK, that's what AdSense is and how it works. Why is this page here?


The Problem and an Attempted Solution

Google AdSense ads are based on JavaScript using document.write() calls. However, that doesn't work within an XML/XHTML document. Here is a workaround! However, as discussed below, we will also have to solve far worse problems caused by Microsoft Internet Explorer's inability to handle XHTML.

In more detail, a Google AdSense ad looks like the following within a web page. The first block sets values for four variables, and the second effectively says "Retrieve a JavaScript program from the following location and execute it." As for those variables, google_ad_client identifies me so I get the credit for ad views and clicks, while google_ad_slot refers to one of my specific ad definitions: specific dimensions, the color scheme to use if this page load yields a text-only ad instead of a graphical one, and a specification of the types of pages and usual location within the page where this ad appears. The remaining two, google_ad_width and google_ad_height, give the ad's size in pixels. Google suggests that you simply insert this JavaScript into an HTML file, but that doesn't necessarily work, hence the reason for this page!

<script type="text/javascript">
    <!--
        google_ad_client = "pub-5845932372655417";
        google_ad_slot = "1979399418";
        google_ad_width = 728;
        google_ad_height = 90;
    //-->
</script>
<script type="text/javascript"
        src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

The problem is that the retrieved program show_ads.js contains calls to JavaScript's document.write() function, and that function does not work within a page that the browser parses as XML, which includes XHTML served as application/xhtml+xml. Depending on your browser and how strict it is, you might get the ad, or maybe an error message, or maybe nothing at all.

How do I know what's in the JavaScript program? I wondered why it didn't work and so I retrieved a copy with wget:

$ wget http://pagead2.googlesyndication.com/pagead/show_ads.js
$ vim show_ads.js

Sure enough, document.write(); plays a crucial role. Some XML/XHTML solution is needed....
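
As a rough illustration of why the MIME type matters here, the sketch below classifies a Content-Type value by the parser a browser would use for it. The function names parserFor and documentWriteWorks are hypothetical, and the rule for recognizing XML types is a simplification that covers only the types discussed on this page.

```javascript
// Hypothetical helper: which parser does a browser use for this Content-Type?
// Simplified rule: text/html -> HTML parser; XML types -> XML parser.
function parserFor(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  if (mime === "text/html") return "html";
  if (mime === "application/xml" || mime === "text/xml" || mime.endsWith("+xml")) {
    return "xml";
  }
  return "other";
}

// document.write() is only usable when the page went through the HTML parser.
function documentWriteWorks(contentType) {
  return parserFor(contentType) === "html";
}

console.log(documentWriteWorks("text/html; charset=utf-8"));                  // true
console.log(documentWriteWorks("application/xhtml+xml; charset=iso-8859-1")); // false
```

This is why the workaround that follows isolates Google's script inside a small document served as text/html.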

Here is a workaround, step by step:

Create a Proper HTML Document Containing the JavaScript

Note that this page assumes you are using Apache and using the default locations for things:

/var/www/conf     Configuration files, including httpd.conf and mime.types as discussed below.
/var/www/logs     Log files, including access_log and error_log as discussed below.
/var/www/htdocs     The web site itself is located here. The file /var/www/htdocs/Index.html is the default page, what the server provides when asked for simply http://server-name-here/.

I created the below HTML file as:
  /var/www/htdocs/ads/content-banner-technical.html
so it could be retrieved as:
  http://cromwell-intl.com/ads/content-banner-technical.html
It's just the JavaScript from above as the body of a small HTML file. Notice the CSS properties width: 100% and overflow: visible. Those are critical to get the ad to appear as it should, without scroll bars or other visual oddities.

<?php header("Content-Type: text/html;charset=utf-8"); ?>
<html>
    <head>
        <title>Sponsorship</title>
        <style type="text/css">
	    body { margin: 0; padding: 0; width: 100%; overflow: visible;}
        </style>
    </head>
    <body>
	<div style="padding: 0; width: 100%; overflow: visible;">
            <script type="text/javascript">
            <!--
                google_ad_client = "pub-5845932372655417";
                /* Top Banner for technical content pages */
                google_ad_slot = "1979399418";
                google_ad_width = 728;
                google_ad_height = 90;
            //-->
            </script>
            <script type="text/javascript"
                    src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
            </script>
        </div>
    </body>
</html>

Create a "Wrapper" to Minimize Code Maintenance

I next created the below HTML file as:
  /var/www/htdocs/ads/banner-technical.html
Notice that it sets the MIME type of just this encapsulated object as text/html so that document.write(); will work within what will be this embedded HTML document. Also notice that it specifies the width and height of the included HTML object in pixels, in both the outer DIV wrapper and the object itself. This wrapper's width is defined as 100% of the page, so the enclosed banner will be centered. The other wrappers for rectangles and squares will have their width defined in pixels so they can be floated to the left and right and the text can then flow around them. If you do not specify the height and width as I have done here, the browser will not know in advance how to lay out the page and the result may look very strange.

<div style="width: 100%; height: 90px;">
  <table style="width: 100%; background: #f0a0f0; height: 90px; padding: 0;">
    <tr>
        <td style="text-align: center;">
        <object data="/ads/content-banner-technical.html"
		type="text/html"
		style="width: 728px; height: 90px; padding: 0;">
        </object>
        </td>
    </tr>
  </table>
</div>

I could have put the above "wrapper" block directly in an HTML file, but then I would have to maintain twelve-line blocks scattered across hundreds of HTML files! The next step makes it much easier.
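
Since several wrappers differ only in the embedded file and its pixel dimensions, one way to keep the outer DIV and the object in agreement is to generate the wrapper files from a single template. The following is only a sketch of that idea. The function buildAdWrapper is hypothetical, it omits the centering table used above, and it is not how the files on this site were produced.

```javascript
// Hypothetical generator for a simplified wrapper (no centering table).
// The same height is written into both the outer div and the object,
// so the browser always knows the layout size in advance.
function buildAdWrapper(src, width, height) {
  return [
    `<div style="width: 100%; height: ${height}px;">`,
    `  <object data="${src}"`,
    `          type="text/html"`,
    `          style="width: ${width}px; height: ${height}px; padding: 0;">`,
    `  </object>`,
    `</div>`,
  ].join("\n");
}

console.log(buildAdWrapper("/ads/content-banner-technical.html", 728, 90));
```

The output could be saved once per ad size and then pulled into each page exactly as described in the next step.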

Include The Ad by Pulling in the Wrapper with PHP

All I have to do now is add one PHP line to a file to include the wrapper and let it pull in the code. This page, for example, literally begins as shown here. See how it pulls in two ads at the very beginning, a 728x90 banner across the top before the large header "How To Use Google AdSense Within XML/XHTML", and then a 300x250 box that floats to the right, followed by the rest of the page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
	"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
	<head>
		<title>How To Use Google AdSense Within XML/XHTML</title>
		<meta name="description" content="How to use Google AdSense
			within XML/XHTML pages.
			Google AdSense uses JavaScript document.write();,
			which is not allowed within XML/XHTML.
			Here is a simple solution to the problem." />
		<meta http-equiv="content-type" content="application/xhtml+xml; charset=iso-8859-1" />
		<link rel="stylesheet" type="text/css" href="../css/style.css" media="screen" />
	</head>

	<body style="background: #f4e4b0">

		<?php @ include ('../ads/banner-728x90.html'); ?>

		<h1>How To Use Google AdSense Within XML/XHTML</h1>

		<?php @ include ('../ads/responsive-rectangle-technical.html'); ?>

		<div class="bordered" style="font-size: 9pt;">
		<p style="margin-top: 1px; margin-bottom: 1px;">
		<b> Table of Contents / Summary: </b>
		</p>
		<ul style="margin-top: 1px; margin-bottom: 1px;">
			<li>
			<a href="#first">
				First, How Does AdSense Work? </a>
			</li>
		[ .... and so on with the rest of the page .... ]

The overall result making up the entire page is a valid XML/XHTML document encapsulating a short HTML block of specified size containing nothing but Google's JavaScript program.

Keep Logging Under Control

The Apache log file /var/www/logs/access_log will be just exploding with unwanted details by now — every request for images to complete the page, plus the style sheet, and now plus these ad pages. So, let's tell Apache to just log the pages themselves and not their "prerequisites", the things they require.

Below is a section of my /var/www/conf/httpd.conf. The lines I added are the "###" comment lines, the SetEnvIf lines, and the CustomLog lines containing env=. The original lines that I commented out are the CustomLog lines for access_log and referer_log without env=. The %h in the referer LogFormat is something I added to capture the client IP for each logged referral:

[ ... over 500 preceding lines not shown ...]
#
# The following directives define some format nicknames for use with
# a CustomLog directive (see below).
#
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h -- %{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

#

#
# The location and format of the access logfile (Common Logfile Format).
# If you do not define any access logfiles within a <VirtualHost>
# container, they will be logged here.  Contrariwise, if you *do*
# define per-<VirtualHost> access logfiles, transactions will be
# logged therein and *not* in this file.
#

### Stop requests for images, style sheets, etc, as described here:
### http://www.vbulletin.com/forum/showthread.php?t=25287
SetEnvIf Request_URI \.gif not-logged
SetEnvIf Request_URI \.png not-logged
SetEnvIf Request_URI \.jpg not-logged
SetEnvIf Request_URI \.jpeg not-logged
SetEnvIf Request_URI \.ico not-logged
SetEnvIf Request_URI style\.css not-logged
SetEnvIf Request_URI ads/content not-logged
CustomLog /dev/null combined env=not-logged
CustomLog logs/access_log common env=!not-logged
# CustomLog logs/access_log common

#
# If you would like to have agent and referer logfiles, uncomment the
# following directives.
#
CustomLog logs/referer_log referer env=!not-logged
#CustomLog logs/referer_log referer
#CustomLog logs/agent_log agent

[ ... over 500 more lines not shown ...]
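
To check which requests the SetEnvIf lines above would suppress, the filter can be mimicked in a few lines of JavaScript. This is only a sketch: the function isLogged is hypothetical, and it assumes Apache's documented behavior that SetEnvIf patterns are case-sensitive regular expressions matched anywhere in the request URI.

```javascript
// Patterns copied from the SetEnvIf lines above. Each one is an
// unanchored, case-sensitive regular expression.
const notLoggedPatterns = [
  /\.gif/, /\.png/, /\.jpg/, /\.jpeg/, /\.ico/, /style\.css/, /ads\/content/,
];

// Returns true if the request would be written to logs/access_log,
// that is, if no pattern sets the not-logged variable.
function isLogged(requestUri) {
  return !notLoggedPatterns.some((re) => re.test(requestUri));
}

console.log(isLogged("/technical/google-adsense-and-xhtml.html")); // true
console.log(isLogged("/pictures/logo.png"));                        // false
console.log(isLogged("/ads/content-banner-technical.html"));        // false
console.log(isLogged("/notes.gift.html"));                          // false, ".gif" matches inside ".gift"
console.log(isLogged("/pictures/LOGO.PNG"));                        // true, the match is case-sensitive
```

The last two cases show the rough edges of unanchored patterns. Anchoring each one at the end of the URI (for example \.gif$) would be stricter, at the cost of being a change from the configuration shown above.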

Dealing with Bigger Problems Caused by Explorer's Inability to Handle XHTML

Microsoft's Internet Explorer cannot handle XHTML documents. See this description for the details, which include IE8. If you serve an XHTML document to Explorer, it doesn't know how to handle it and asks if you want to save it to a file. All other browsers can handle XHTML. See the W3C's answer to the question of which browsers accept media type application/xhtml+xml, where I have emphasized a significant comment they made on the page they wrote in 2004:

Browsers known to us include all Mozilla-based browsers, such as Mozilla, Netscape 5 and higher, Galeon and Firefox, as well as Opera, Amaya, Camino, Chimera, DocZilla, iCab, Safari, and all browsers on mobile phones that accept WAP2. In fact, any modern browser. Most accept XHTML documents as application/xml as well. See the XHTML Media-type test for details.

Stupid Microsoft and their worthless software. 'Mouse Movement Detected: Windows has detected that your mouse has moved.  Please reboot for changes to take effect.'

Windows: It's almost this stupid.

I would happily have an entire web site that required you to use any browser other than Explorer. However, that would mean fewer page views, fewer ad clicks, and less ad income for me. So, I need to find a way to support the lame Microsoft Explorer, the world's most dangerous software, especially when coupled with ActiveX.

There are at least four possible solutions:

1 — Re-write all my pages to use plain old HTML instead of XHTML

This is the preferred solution!

2 — Modify the MIME type and document content using PHP as the page is served up, based on what the browser says it can handle

There are PHP tricks to use the browser's advertised Accept string to figure out what the user agent, be it a browser or crawler or whatever, can handle, and give it precisely that. One example is Neil Crosby's detailed solution. As an explanation of the Accept string, see the below Wireshark capture of Firefox viewing this page. The first block, starting "GET", is from my client; the second, starting "HTTP/1.1", is from the server:

GET /technical/google-adsense-and-xhtml.html HTTP/1.1
Host: cromwell-intl.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.3) Gecko/20090914 \
                Mageia Linux/1.9.1.3-2mdv2010.0 (2010.0) Firefox/3.5.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://cromwell-intl.com/technical/
Cache-Control: max-age=0

HTTP/1.1 200 OK
Date: Thu, 15 Oct 2009 23:46:01 GMT
Server: Apache
X-Powered-By: PHP/5.2.8
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

[... page content appears here...]

I am impressed by the careful attention to detail in solutions such as those by Neil Crosby, but I just don't want to add too much bulk and overhead to every single page. And as you can see on Neil's page, the server's PHP engine needs to modify all the page content, for example, changing every instance of XHTML <br /> to the HTML <br>.
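
The core decision in such a scheme can be sketched briefly. The code below is not Neil Crosby's solution and is JavaScript rather than PHP. It is a simplified illustration, with hypothetical function names, of choosing a MIME type from the Accept string. It assumes the rule "serve XHTML only if the client lists application/xhtml+xml explicitly, with a q value above zero and at least as high as that of text/html", and it ignores wildcards such as */*.

```javascript
// Parse an Accept header into a Map of media type -> q value (default 1).
function parseAccept(header) {
  const prefs = new Map();
  for (const part of header.split(",")) {
    const [type, ...params] = part.split(";").map((s) => s.trim());
    if (!type) continue;
    let q = 1;
    for (const p of params) {
      const m = /^q=([0-9.]+)$/i.exec(p);
      if (m) q = parseFloat(m[1]);
    }
    prefs.set(type.toLowerCase(), q);
  }
  return prefs;
}

// Assumed rule: XHTML only when explicitly accepted and not ranked below text/html.
function chooseContentType(acceptHeader) {
  const prefs = parseAccept(acceptHeader || "");
  const xhtmlQ = prefs.get("application/xhtml+xml") ?? 0;
  const htmlQ = prefs.get("text/html") ?? 0;
  return xhtmlQ > 0 && xhtmlQ >= htmlQ ? "application/xhtml+xml" : "text/html";
}

// The Accept string from the Firefox capture above:
console.log(chooseContentType(
  "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
)); // application/xhtml+xml

// An illustrative header from a client that lists no XHTML type:
console.log(chooseContentType("image/gif, image/jpeg, */*")); // text/html
```

Choosing the header is the easy part. The bulk of the work lies in rewriting the page content to match the chosen type.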

3 — Use the W3C XML trick

The W3C suggests inserting the first two lines shown below, the XML declaration and the xml-stylesheet processing instruction, at the beginning of each XHTML document:

<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/xsl" href="/copy.xsl"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
	"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
	<head>
		<title>How To Use Google AdSense Within XML/XHTML</title>
		[.... and so on ....]

Then you would create that /copy.xsl file:

<stylesheet version="1.0"
     xmlns="http://www.w3.org/1999/XSL/Transform">
     <template match="/">
	 <copy-of select="."/>
     </template>
</stylesheet>

4 — Write the pages in XHTML, but serve them with a MIME type of text/html

Careful readers of the above packet capture will have already seen that I took the easy and common way out. The W3C says:

Furthermore see "Sending XHTML as text/html Considered Harmful".

Some day I will really need to put in the work to add the PHP modification to every web page on my site. And that will definitely need a Unix shell script instead of a marathon editing session.

For now, though, I'm taking the lazy way out. I modified the mime.types file on my server to serve out files named *.html as MIME type text/html.


And in conclusion — It just can't be done....

I really thought I had this solved, and all that remained was for me to implement Neil Crosby's solution for modifying the MIME type and document content field with PHP, based on what the browser says it can handle.

But then I watched someone looking at my site with Explorer, and saw what happened when they clicked on an ad. With Explorer, and only with Explorer, those embedded HTML objects within the page remain precisely that: embedded objects. A clicked ad on Explorer does not open in the main browser window. Instead, it opens within a small window on the main page. You don't see the entire advertisement, you see just a small rectangle of it viewed through a small window with scroll bars below and to its right.

So, I went back to an earlier plan. No "wrapper", my PHP include now just pulls this in:

<table style="width: 100%; height: 90px; padding: 0;">
  <tr>
    <td style="text-align: center;">
	<script type="text/javascript">
	<!--
		google_ad_client = "pub-5845932372655417";
		/* Top Banner 728x90 */
		google_ad_slot = "5257000457";
		google_ad_width = 728;
		google_ad_height = 90;
	//-->
	</script>
	<script type="text/javascript"
		src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
	</script>
    </td>
  </tr>
</table>

What's Left?

The only remaining annoyance, and it's fairly minor, is PHP's inability to easily deal with absolute paths. The include is relative to that page's location, so I need the ../ shown above.

The "easy" fix is to instead use all this:

<?php include($_SERVER['DOCUMENT_ROOT'].'/ads/banner-728x90-wrapper.html'); ?>

Ugh. I'll just keep track of how many instances of "../" are needed....
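
Keeping track of the "../" count is mechanical, so it could be scripted. The sketch below uses a hypothetical helper, prefixToRoot, and assumes the path is site-relative and ends in a file name rather than a bare directory.

```javascript
// Hypothetical helper: how many "../" does a page need to reach the
// document root? Assumes pagePath ends with a file name.
function prefixToRoot(pagePath) {
  // Directory depth is the number of path segments minus the file name.
  const segments = pagePath.split("/").filter((s) => s.length > 0);
  const depth = Math.max(segments.length - 1, 0);
  return "../".repeat(depth);
}

console.log(prefixToRoot("/technical/google-adsense-and-xhtml.html") + "ads/banner-728x90.html");
// ../ads/banner-728x90.html
console.log(prefixToRoot("/index.html") + "ads/banner-728x90.html");
// ads/banner-728x90.html
```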

Now I'm ready to harness the awesome power of a converted Ukrainian tanker full of Click Monkeys! And if you find that amusing, also see the same group's Pets or Food and ZooBQ.

The next step is search engine optimization (SEO), the art of making search engines pay more attention to your pages.

