WebMake
WebMake Documentation (version 2.4)

Metadata

What Is Metadata?

Everyone is familiar with data, but the term meta-data is not so familiar. Here's a brief primer.

To illustrate, I'll use an example familiar to most readers. Most computer operating systems nowadays have the concept of files in a filesystem. If you consider the files as data, then details such as file size, modification times, username of the owner etc. are metadata, ie. data about the files.

In WebMake, metadata is used to refer to properties of textual content items. For example, a newspaper article may have a title, an abstract (ie. a brief summary), etc.

This kind of data is very useful for building indices and catalogues, in the same way that Windows Explorer or the UNIX ls(1) command uses filesystem metadata to display file listings. As a result, a good way to think of it is as "catalog data", as opposed to "narrative data", which is what a normal content item is. (thanks to Vaibhav Arya, vaibhav /at/ mymcomm.com, for that analogy.)

To extend this metaphor, you should use metadata for anything that would be used to describe your pages in a catalog. For example, given the page title, a quick abstract of the page, and a number to indicate its importance relative to other pages, one could easily create a list of pages automatically. In fact, this is how the indexes in the WebMake documentation are generated, and it's how sitemaps, breadcrumb trails and site trees are implemented.

How to Define Metadata

WebMake can load metadata from a number of sources:

  • Inferred from the content text itself: WebMake supports "magic" metadata, which contains some inferred data about the content, such as its last modification date (which can be inferred from the filesystem storage of the content file itself). In addition, title metadata can be inferred from several sources, such as the <title> tag in HTML, or =head1 tags in POD text.
  • Tags embedded within the content text: This is handled using the <wmmeta> tag.
  • Set as defaults before the content items are defined: the <metadefault> WebMake tag.
  • Defined in bulk and "attached" to the content items: the <metatable> tag.

Referring to Metadata

Metadata is referred to using the deferred content ref format:

$[content.metaname]

Where content is the name of the content item, and metaname is the name of the metadatum. So, for example, $[blurb.txt.title] would return the title metadatum from the content item blurb.txt.

Meta tag names are case-insensitive, for compatibility with HTML meta tags.

Any content chunk can access metadata from other content chunks within the same <out> tag, using this as the content name, i.e. $[this.title] . This is handy, for example, in setting the page title in the main content chunk, and accessing it from the header chunk.

If more than one content item sets the same item of metadata inside the <out> tag, the first one will take precedence.

The example files "news_site.wmk" and "news_site_with_sections.wmk" demonstrate how meta tags can be used to generate a SlashDot or Wired News-style news site. The index pages in those sites are generated dynamically, using the metadata to decide which pages to link to, their ordering, and the titles and abstracts to use.

How Do I Use Metadata In WebMake?

WebMake provides extra support for metadata in an efficient way. A metadatum is like a normal content item, except it is exposed to all other pages in the WebMake file. This data is accessible, both to other pages in the site (as $[contentname.metaname]), and to other content items within the same page (as
$[this.metaname]).

In addition, WebMake caches metadata in the site cache file between runs, so that a subsequent partial site build will not require loading all the content text, just to read a page title.

Note that content items representing metadata cannot, themselves, have metadata.

What Metadata Should I Use?

The items marked (built-in) are supported directly inside WebMake, and used internally for functionality like building site maps and indices. All the other suggested metadata names here are just that, suggestions, which support commonly-required functionality.

Also note that the names are case-insensitive, they're just capitalised here for presentation.

Title
the title of a content item. The default title for content items is inferred from the content text where possible, or (Untitled) if no title can be found. (built-in)
Score
a number representing the "priority" of a content item; used to affect how the item should be ranked in a list of stories. The default value is 50. Items with the same score will be ranked alphabetically by title. (built-in)
Abstract
a short summary of a content item.
Up
used to map the site's content; this metadata indicates the content item that is the parent of the current content item. This metadatum is used to generate dynamic sitemaps. (built-in)
Section
the section of a site under which a story should be filed.
Author
who wrote the item.
Approved
has this item been approved by an editor; used to support workflow, so that content items need to be approved before they are displayed on the site.
Visible_Start
the start of an item's "visibility window", ie. when it is listed on an index page. (TODO: define a recommended format for this, or replace with DC.Coverage.temporal)
Visible_End
the end of an item's "visibility window", ie. when it is listed on an index page.
DC.Publisher
a Dublin Core metadatum. The organisation or individual that publishes the entire site.

The Dublin Core is a whole load of suggested metadata names and formats, which can be used either to replace or supplement the optional metadata named above. Regardless of whether you replace or supplement the metadata above internally, it is definitely recommended to use the DC names for metadata that's made visible in the output HTML through conventional HTML <meta> tags.

Built-In Metadata

These are some built-in "magic" items of metadata that do not need to be defined manually. Instead, they are automatically inferred by WebMake itself:

declared
the item's declaration order. This is a number representing when the content item was first encountered in the WebMake file; earlier content items have a lower declaration order. Useful for sorting.
url

the first <out> URL which contains that content item (you should order your <out> tags to ensure each stories' "primary" page is listed first, or set ismainurl=false on the "alternative" output pages, if you plan to use this). See also the get_url() method on the HTML::WebMake::Content object.

is_generated
0 for items loaded from a <content> or <contents> tag, 1 for items created by Perl code using the add_content() function.
mtime
The modification date, in UNIX time_t seconds-since-the-epoch format, of the file the content item was loaded from. Handy for sorting.

Why Use Metadata

Support for metadata is an important CMS feature.

It is used by Midgard and Microsoft's SiteServer, and is available as user-contributed code for Manila. It provides copious benefits for flexible index and sitemap generation, and, with the addition of an Approved tag, adds initial support for workflow.

It allows the efficient generation of site maps, back/forward navigation links, and breadcrumb trails, and enables index pages to be generated using Perl code easily and in a well-defined way.

WebMake Documentation (version 2.4)
Built With WebMake