Scraped Templates
This is a very neat trick. A common problem with templating systems, such as
WebMake, is that they don't actually help at all in certain areas.
Here's one of the problems. When an HTML Guy edits a page template, he
typically edits an entire page, not just small snippets;
he has to see what the overall page looks like, align the items correctly,
check that this font looks OK with that font, this bgcolor with that bgcolor,
etc.
However, as Talin mentions in this thread on Advogato,
there's a problem: most large web sites use the notion of "components" -
that is, re-usable fragments of dynamic HTML which are assembled to form a
complete page.
So once the HTML Guy has designed a good-looking page to display "a
list of top 10 selling movies on a site that sells VHS tapes", as the example
in the Advogato thread suggests, the page now contains the following templates:
-
overall page template
-
top-10 page content
-
top-10 list table template
-
one-row-of-the-table template (which could in turn be broken down
into 2 templates: one for odd rows, one for even, etc.)
So someone has to go and cut up the page the HTML Guy has created into
components (template and content items, in WebMake terminology). What a pain.
How do we deal with this problem?
Scraping
WebMake has some features which help here:
-
Content "src" attribute: templates can be loaded from a named
file (or even a remote webpage). Multiple templates or content
items can be loaded from the same file.
-
Pre-processing: Using the preproc attribute, you can specify
a block of perl code to execute over each content item's text.
-
Scraping: The
scrape_xml() and scrape_out_xml() perl code
library functions allow you to easily cut out the bits of the page you
want, based on patterns in the page text or HTML.
What you need to do is isolate -- or specify to the HTML Guy -- some patterns
in the text that delimit the areas of the page you will be turning
into templates. You then set up WebMake commands which will scrape the
templates from the designer-provided page.
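To make the idea concrete, here is a minimal plain-perl sketch of the two operations involved: keeping the text between a pair of marker comments, and cutting that text out. The keep_between() and cut_between() functions are hypothetical stand-ins written for this illustration; they are not WebMake's scrape_xml() and scrape_out_xml(), which may treat the delimiters differently.

```perl
use strict;
use warnings;

# Hypothetical stand-ins, NOT WebMake's scrape_xml()/scrape_out_xml().
# Each delimiter is assumed to be an HTML comment whose body matches
# the given pattern.

# Return only the text between the start and end marker comments.
sub keep_between {
    my ($text, $start, $end) = @_;
    return $text =~ /<!--[^>]*?$start[^>]*?-->(.*?)<!--[^>]*?$end[^>]*?-->/s
        ? $1 : '';
}

# Return the text with the marked region (markers included) removed.
sub cut_between {
    my ($text, $start, $end) = @_;
    $text =~ s/<!--[^>]*?$start[^>]*?-->.*?<!--[^>]*?$end[^>]*?-->//s;
    return $text;
}

# An invented one-line page; note the designer's stray capital "S".
my $page = '<body>nav<!-- Start of list --><ul><li>x</li></ul>'
         . '<!-- end of list -->footer</body>';

my $list  = keep_between($page, qr/start of list/i, qr/end of list/i);
my $shell = cut_between($page, qr/start of list/i, qr/end of list/i);
```

Here $list holds just the marked fragment and $shell holds the page with that fragment removed, which is the split between a component and the template around it.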
Let's go with the 'top-10 videos on VHS' list page example from the Advogato
thread. That contains the following templates:
-
overall page template
-
top-10 page content (text, maybe images, etc.)
-
top-10 list table template
-
one-row-of-the-table template (which could in turn be broken down
into 2 templates: one for odd rows, one for even, etc.)
Let's say the designer has provided you with this page, called "top10.htm"
(hopefully he's filled in the ... bits, of course!):
<html>
<head>
<title>Top 10 Movies on VHS</title>
</head><body>
.... blah blah navigation, other generic-page-template stuff ...
<!-- start of top-10 page content -->
Lorem ipsum dolor sit amet, consectetaur adipisicing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. ...
<!-- start of top-10 table -->
<table bgcolor=nice etc.>
<!-- start of even row -->
<tr>
<td>....</td> <td>....</td> <td>....</td>
</tr>
<!-- end of even row -->
<!-- start of odd row -->
<tr>
<td>....</td> <td>....</td> <td>....</td>
</tr>
<!-- end of odd row -->
</table>
<!-- end of top-10 table -->
<!-- end of top-10 page content -->
.... blah blah more generic-page-template stuff ....
</body>
</html>
We can see that the following content or template items can be scraped
out:
-
overall page template: everything between the
html tags, but with the text from "start of top-10 page content"
to "end of top-10 page content" stripped out
-
top-10 page content: the text from "start of top-10 page content" to
"end of top-10 page content", with the "top-10 table" section stripped out
-
top-10 list template: the "top-10 table" section, with the "even row"
and "odd row" sections stripped out
-
even-table-row template: the "even row" section
-
odd-table-row template: the "odd row" section
That translates into this WebMake code:
<{perl # define the scraping functions we will use.

  # the page shell: everything except the top-10 page content
  sub scrape_page_template {
    return scrape_out_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
  }

  # the top-10 page content, minus the table
  sub scrape_top10_content {
    my $text = scrape_xml (shift,
        qr/start of top-10 page content/i, qr/end of top-10 page content/i);
    return scrape_out_xml ($text,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
  }

  # the table, minus both row sections
  sub scrape_top10_list_template {
    my $text = scrape_xml (shift,
        qr/start of top-10 table/i, qr/end of top-10 table/i);
    $text = scrape_out_xml ($text,
        qr/start of even row/i, qr/end of even row/i);
    return scrape_out_xml ($text,
        qr/start of odd row/i, qr/end of odd row/i);
  }

  sub scrape_top10_even_row_template {
    return scrape_xml (shift, qr/start of even row/i, qr/end of even row/i);
  }

  sub scrape_top10_odd_row_template {
    return scrape_xml (shift, qr/start of odd row/i, qr/end of odd row/i);
  }

  # (Note the qr// for the search patterns use the 'i' modifier;
  # non-programmers love to mess with capitalisation ;)
  ''; # replace this perl block with an empty string
}>
<!-- and now define the templates, using those functions: -->
<template name="page_template" src="top10.htm"
preproc=scrape_page_template></template>
<content name="top10_content" src="top10.htm"
preproc=scrape_top10_content></content>
<template name="top10_list_template" src="top10.htm"
preproc=scrape_top10_list_template></template>
<template name="top10_even_row_template" src="top10.htm"
preproc=scrape_top10_even_row_template></template>
<template name="top10_odd_row_template" src="top10.htm"
preproc=scrape_top10_odd_row_template></template>
That's it. Those templates can now be used safely in the site logic,
and will work as long as the page designer doesn't muck about with
the comments too much.
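As a rough idea of what "site logic" might do with the two row templates, here is a plain-perl sketch that alternates between them while building the table rows. Everything in it is an assumption made for illustration: the row strings stand in for the scraped top10_even_row_template and top10_odd_row_template items, and the __RANK__ and __TITLE__ placeholders, the render_rows() function and the movie titles are invented. It does not use WebMake's own content-reference syntax.

```perl
use strict;
use warnings;

# Hypothetical row templates, standing in for the scraped
# top10_even_row_template and top10_odd_row_template items.
# The __RANK__ and __TITLE__ placeholders are invented for this sketch.
my $even_row = '<tr class="even"><td>__RANK__</td><td>__TITLE__</td></tr>';
my $odd_row  = '<tr class="odd"><td>__RANK__</td><td>__TITLE__</td></tr>';

# Fill one row template per title, alternating between the two templates.
# The first row uses the even template, matching the order in top10.htm.
sub render_rows {
    my @titles = @_;
    my $html = '';
    for my $i (0 .. $#titles) {
        my $row  = ($i % 2 == 0) ? $even_row : $odd_row;
        my $rank = $i + 1;
        $row =~ s/__RANK__/$rank/g;
        $row =~ s/__TITLE__/$titles[$i]/g;
        $html .= "$row\n";
    }
    return $html;
}

my $rows = render_rows('First Movie', 'Second Movie', 'Third Movie');
```

The resulting $rows string would then be dropped into the list template at the point where the row sections were scraped out.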
You don't have to use comments, by the way; if your HTML Guy's editor allows
him to mark out "zones" of a page in some way, then just use whatever zone
markers it provides instead, or even just use patterns in the HTML tags or
text.
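For instance, if the designer gives the table a recognisable attribute, the patterns can match the tags themselves. The following is a plain-perl regex sketch of that idea, not a demonstration of how scrape_xml() itself handles tag patterns; the page text and the id="top10" attribute are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical page with no marker comments; the table is found by its id.
my $page = '<body><p>intro</p><table id="top10"><tr><td>1</td></tr></table>'
         . '<p>outro</p></body>';

# Patterns that match the HTML tags themselves instead of comments.
my $start = qr/<table\b[^>]*\bid="top10"[^>]*>/i;
my $end   = qr/<\/table>/i;

# Keep the region from the opening tag to the closing tag, tags included.
my ($table) = $page =~ /($start.*?$end)/s;

# Remove the same region to get the surrounding page.
(my $rest = $page) =~ s/$start.*?$end//s;
```

This approach is more fragile than comments if the designer renames the id or nests another table inside, so it is worth agreeing on the markers up front.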