Scraped Templates

This is a very neat trick. A common problem with templating systems, such as WebMake, is that they don't actually help at all in certain areas.

Here's one of the problems. When an HTML Guy edits a page template, he's typically going to edit an entire page, not just small snippets; he has to see what the overall page looks like, align the items correctly, make sure this font looks OK with that font, this bgcolor with that bgcolor, etc.

However, as Talin mentions in this thread on Advogato, there's a problem: most large web sites use the notion of "components" - that is, re-usable fragments of dynamic HTML which are assembled to form a complete page.

So once the HTML Guy has designed a good-looking page to display "a list of top 10 selling movies on a site that sells VHS tapes", as the example in the Advogato thread suggests, the page contains the following templates:

overall page template
top-10 page content
top-10 list table template
one-row-of-the-table template (which could in turn be broken down into 2 templates: one for odd rows, one for even, etc.)

So someone has to cut up the page the HTML Guy has created into components (template and content items, in WebMake terminology). What a pain.

How do we deal with this problem?

Scraping

WebMake has some features which help here:

Content "src" attribute: templates can be loaded from a named file (or even a remote webpage). Multiple templates or content items can be loaded from the same file.
Pre-processing: Using the preproc attribute, you can specify a block of Perl code to execute over each content item's text.
Scraping: The scrape_xml() and scrape_out_xml() Perl code library functions allow you to easily cut out the bits of the page you want, based on patterns in the page text or HTML.

What you need to do is isolate -- or specify to the HTML Guy -- some patterns in the text that delimit the areas of the page you will be turning into templates. You then set up WebMake commands which will scrape the templates from the designer-provided page.
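To make the two operations concrete, here is a minimal standalone sketch of "keep what is between the markers" and "strip what is between the markers". The helpers sketch_scrape(), sketch_scrape_out() and comment_matching() are hypothetical names invented for this illustration; they are not WebMake's scrape_xml() and scrape_out_xml(), whose exact behaviour (for example, how the delimiters themselves are treated) may differ. The sketch also assumes each marker sits inside its own HTML comment.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for illustration only. They are NOT WebMake's
# scrape_xml() and scrape_out_xml(), and they assume that every marker
# sits inside its own HTML comment.

# Build a regex matching a whole comment whose text matches $pattern.
sub comment_matching {
    my ($pattern) = @_;
    return qr/<!--[^>]*?$pattern[^>]*?-->/;
}

# Keep only the text between the start and end marker comments.
sub sketch_scrape {
    my ($text, $start, $end) = @_;
    my ($s, $e) = (comment_matching($start), comment_matching($end));
    return $text =~ /$s(.*?)$e/s ? $1 : '';
}

# Remove the start marker, the end marker, and everything between them.
sub sketch_scrape_out {
    my ($text, $start, $end) = @_;
    my ($s, $e) = (comment_matching($start), comment_matching($end));
    $text =~ s/$s.*?$e//s;
    return $text;
}

# Tiny inline page: a "body" zone that contains a nested "list" zone.
my $page = 'nav <!-- Start of body --> intro '
         . '<!-- start of list --><ul></ul><!-- end of list -->'
         . ' <!-- end of body --> footer';

my $body     = sketch_scrape($page, qr/start of body/i, qr/end of body/i);
my $template = sketch_scrape_out($page, qr/start of body/i, qr/end of body/i);
my $intro    = sketch_scrape_out($body, qr/start of list/i, qr/end of list/i);

print "body:     [$body]\n";
print "template: [$template]\n";
print "intro:    [$intro]\n";
```

Chaining the two helpers, as in the last assignment, is how a zone can be extracted and then have a nested zone stripped from it.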

Let's go with the 'top-10 videos on VHS' list page example from the Advogato thread. That contains the following templates:

overall page template
top-10 page content (text, images maybe etc.)
top-10 list table template
one-row-of-the-table template (which could in turn be broken down into 2 templates: one for odd rows, one for even, etc.)

Let's say the designer has provided you with this page, called "top10.htm" (hopefully he's filled in the ... bits, of course!):


    <html>
      <head>
      <title>Top 10 Movies on VHS</title>
      </head><body>

      .... blah blah navigation, other generic-page-template stuff ...

      <!-- start of top-10 page content -->

      Lorem ipsum dolor sit amet, consectetaur adipisicing elit, sed do
      eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
      ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
      aliquip ex ea commodo consequat. ...

      <!-- start of top-10 table -->
      <table bgcolor=nice etc.>

	<!-- start of even row -->
	<tr>
	  <td>....</td> <td>....</td> <td>....</td>
	</tr>
	<!-- end of even row -->

	<!-- start of odd row -->
	<tr>
	  <td>....</td> <td>....</td> <td>....</td>
	</tr>
	<!-- end of odd row -->

      </table>
      <!-- end of top-10 table -->

      <!-- end of top-10 page content -->

      .... blah blah more generic-page-template stuff ....
      </body>
    </html>

We can see that the following content or template items can be scraped out:

overall page template: everything between the html tags, but with text from start of top-10 page content to end of top-10 page content stripped out
top-10 page content: start of top-10 page content to end of top-10 page content, strip out top-10 table section
top-10 list template: top-10 table, strip out even row and odd row sections
even-table-row template: even row
odd-table-row template: odd row

That translates into this WebMake code:

  <{perl        # define the scraping functions we will use.

  sub scrape_page_template {
    return scrape_out_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
  }

  sub scrape_top10_content {
    my $text = scrape_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
    return scrape_out_xml ($text,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
  }

  sub scrape_top10_list_template {
    my $text = scrape_xml (shift,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
    $text = scrape_out_xml ($text,
        qr/start of even row/i, qr/end of even row/i);
    return scrape_out_xml ($text,
        qr/start of odd row/i, qr/end of odd row/i);
  }

  sub scrape_top10_even_row_template {
    return scrape_xml (shift, qr/start of even row/i, qr/end of even row/i);
  }

  sub scrape_top10_odd_row_template {
    return scrape_xml (shift, qr/start of odd row/i, qr/end of odd row/i);
  }

  # (Note that the qr// search patterns use the 'i' modifier;
  # non-programmers love to mess with capitalisation ;)

  '';           # replace this perl block with an empty string

  }>

  <!-- and now define the templates, using those functions: -->
  <template name="page_template" src="top10.htm"
                          preproc=scrape_page_template></template>
  <content name="top10_content" src="top10.htm"
                          preproc=scrape_top10_content></content>
  <template name="top10_list_template" src="top10.htm"
                          preproc=scrape_top10_list_template></template>
  <template name="top10_even_row_template" src="top10.htm"
                          preproc=scrape_top10_even_row_template></template>
  <template name="top10_odd_row_template" src="top10.htm"
                          preproc=scrape_top10_odd_row_template></template>

That's it. Those templates can now be used safely in the site logic, and will work as long as the page designer doesn't muck about with the comments too much.
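Since everything depends on the markers surviving the designer's edits, it may be worth checking for them before a build. The following is a minimal standalone sketch, not a WebMake feature: bad_markers() is a hypothetical helper, and the inline page is a cut-down stand-in for top10.htm in which one comment has been deleted on purpose.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebMake: list every "start of X" /
# "end of X" marker that does not occur exactly once in the page text.
sub bad_markers {
    my ($text, @names) = @_;
    my @bad;
    for my $name (@names) {
        for my $marker ("start of $name", "end of $name") {
            # Count case-insensitive literal occurrences of the marker.
            my $count = () = $text =~ /\Q$marker\E/gi;
            push @bad, $marker unless $count == 1;
        }
    }
    return @bad;
}

# Inline sample standing in for top10.htm; the "end of odd row" comment
# has been deleted to simulate a designer's edit.
my $page = <<'HTML';
<!-- Start of top-10 table -->
<table>
<!-- start of even row --><tr><td>....</td></tr><!-- end of even row -->
<!-- start of odd row --><tr><td>....</td></tr>
</table>
<!-- end of top-10 table -->
HTML

my @bad = bad_markers($page, 'top-10 table', 'even row', 'odd row');
print "missing or duplicated marker: $_\n" for @bad;
```

The check is case-insensitive for the same reason the scraping patterns are, so a designer's "Start of top-10 table" still counts as present.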

You don't have to use comments, by the way; if your HTML Guy's editor allows him to mark out "zones" of a page in some way, then just use whatever zone markers it provides instead, or even just use patterns in the HTML tags or text.
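As an illustration of delimiting by tags rather than comments, the sketch below picks out elements by a pattern in their opening tag. It is plain Perl, not WebMake's API: sketch_scrape_element() is a hypothetical helper, the id and class attributes are assumed to be what the designer's editor emits, and the approach only works if elements of the same tag name are not nested inside each other.

```perl
use strict;
use warnings;

# Hypothetical stand-in, not WebMake's scrape_xml(): return the first
# <$tag ...>...</$tag> element whose opening tag matches $attr_pattern.
# Assumes elements of that tag name are not nested inside each other.
sub sketch_scrape_element {
    my ($text, $tag, $attr_pattern) = @_;
    while ($text =~ m{(<\Q$tag\E\b([^>]*)>.*?</\Q$tag\E>)}gis) {
        my ($element, $attrs) = ($1, $2);
        return $element if $attrs =~ $attr_pattern;
    }
    return '';
}

# Inline sample page; the attribute values are invented for the example.
my $page = <<'HTML';
<table id="nav"><tr><td>Home</td></tr></table>
<table id="top10">
<tr class="even"><td>....</td></tr>
<tr class="odd"><td>....</td></tr>
</table>
HTML

my $table    = sketch_scrape_element($page,  'table', qr/id="top10"/i);
my $even_row = sketch_scrape_element($table, 'tr',    qr/class="even"/i);
my $odd_row  = sketch_scrape_element($table, 'tr',    qr/class="odd"/i);

print "$even_row\n$odd_row\n";
```

Tag-based patterns like these are more fragile than dedicated comments, because an ordinary styling change can alter the very attribute the pattern relies on.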

WebMake Documentation (version 2.4)
