Crawdad provides HTML generation and CGI response support to Python programmers. See the Crawdad project on SourceForge for downloads and current activity.

There are two common approaches to generating HTML: write HTML from program code (Perl, C, ...) or embed commands within HTML pages (PHP, Cold Fusion, ...).

[Figure: HTML tags represented as a tree]

This system takes a third approach. A Python program builds a DOM-like, in-memory tree that represents an HTML document. Tree nodes represent tagged HTML elements; subtrees represent the contents of those tags. HTML is generated by walking the tree.

The main advantage of this approach is that tree nodes can be added in any order. For example, the page title can be added to the tree after the body is complete. This is useful for presenting summary information: "Phone Book: Coster to Dunning." A second advantage is that the HTML is always correct and properly nested. (A disadvantage is that the tree must be completed before any output is sent to the browser.)
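
The idea can be illustrated with a minimal sketch. This is not Crawdad's actual API: the Element class and its add and render methods are hypothetical names invented for the example, and it is written for current Python 3 rather than the Python 1.5.2 the original code targeted. It shows the title node being filled in only after the body has been built.

```python
from html import escape


class Element:
    """One tagged HTML element; children are nested elements or text.

    Hypothetical class for illustration, not part of Crawdad.
    """

    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)

    def add(self, child):
        """Append a child and return it so the caller can keep a handle."""
        self.children.append(child)
        return child

    def render(self):
        """Walk the subtree depth-first and return its HTML text."""
        inner = "".join(
            c.render() if isinstance(c, Element) else escape(c)
            for c in self.children
        )
        return "<%s>%s</%s>" % (self.tag, inner, self.tag)


# Build the skeleton first; the title node starts out empty.
html = Element("html")
head = html.add(Element("head"))
title = head.add(Element("title"))
body = html.add(Element("body"))

# Fill the body, then derive the title from what was actually listed.
names = ["Coster", "Crane", "Dunning"]
listing = body.add(Element("ul"))
for name in names:
    listing.add(Element("li", name))
title.add("Phone Book: %s to %s" % (names[0], names[-1]))

page = html.render()
```

Because every closing tag is produced by the same render call that produced the opening tag, the output cannot be improperly nested, whatever order the nodes were added in.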

The system can generate static HTML pages or dynamic HTML pages responding to CGI requests.

HtmlDoc: An HTML Generator

My early web applications (1993-1994) generated dubious HTML. Although the HTML 1.0 standard was extremely simple, these applications generated HTML with improper nesting, unclosed tags, and incorrect attributes. The problem, as usual, was that treating HTML as text left too many opportunities for errors:

print "<p>This is <i>wrong</b>"

In the summer of 1995, I developed htmlDoc, a Python module that constructed an in-memory tree representation of an HTML document. Each HTML tag was represented by an object in the tree. Factory methods ensured that only standard attributes would be allowed. The tree structure guaranteed correct nesting of tags. Tag objects knew whether to generate closing tags as they were visited by the HTML generator tree walker. The system was introduced with a paper at the December 1995 Python workshop.
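
The two checks described above, attribute validation in a factory and per-tag knowledge about closing tags, might look roughly like the following. This is a sketch under stated assumptions, not the htmlDoc source: the ALLOWED and VOID tables, the Tag class, and the make function are all hypothetical, and the attribute lists are deliberately tiny rather than a faithful copy of any HTML version.

```python
from html import escape

# Hypothetical whitelist: tag name -> attributes the factory will accept.
ALLOWED = {
    "p": {"align"},
    "i": set(),
    "img": {"src", "alt"},
}
# Tags that are written without a closing tag.
VOID = {"img"}


class Tag:
    """A node that knows how to write itself, including whether to close."""

    def __init__(self, name, attrs, children):
        self.name = name
        self.attrs = attrs
        self.children = list(children)

    def render(self):
        attrs = "".join(
            ' %s="%s"' % (key, escape(value))
            for key, value in sorted(self.attrs.items())
        )
        start = "<%s%s>" % (self.name, attrs)
        if self.name in VOID:
            return start  # no content and no closing tag
        inner = "".join(
            c.render() if isinstance(c, Tag) else escape(c)
            for c in self.children
        )
        return "%s%s</%s>" % (start, inner, self.name)


def make(name, *children, **attrs):
    """Factory: reject unknown tags and non-standard attributes."""
    if name not in ALLOWED:
        raise ValueError("unknown tag: %s" % name)
    bad = set(attrs) - ALLOWED[name]
    if bad:
        raise ValueError("%s does not allow %s" % (name, sorted(bad)))
    if name in VOID and children:
        raise ValueError("%s cannot have content" % name)
    return Tag(name, attrs, children)


para = make("p", "This is ", make("i", "right"), align="left")
```

With this arrangement the mismatched tags from the print example cannot be expressed at all, and a call such as make("p", color="red") fails when the tree is built rather than producing bad HTML.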

The abstract class HtmlDocument represents complete HTML documents. Web applications define subclasses of HtmlDocument to implement common design elements.
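
As a rough illustration of that subclassing pattern, the sketch below puts shared design elements in an intermediate class and page-specific content in a further subclass. Only the name HtmlDocument comes from the text above; its constructor, the build and add_content hooks, and the Node, SitePage, and ListPage classes are assumptions made for the example.

```python
from html import escape


class Node:
    """Minimal tree node, standing in for the real tag objects."""

    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)

    def render(self):
        inner = "".join(
            c.render() if isinstance(c, Node) else escape(c)
            for c in self.children
        )
        return "<%s>%s</%s>" % (self.tag, inner, self.tag)


class HtmlDocument:
    """Hypothetical abstract document: builds the html/head/body skeleton."""

    def __init__(self, title):
        self.body = Node("body")
        self.root = Node("html", Node("head", Node("title", title)), self.body)
        self.build()

    def build(self):
        raise NotImplementedError("subclasses add their content here")

    def render(self):
        return self.root.render()


class SitePage(HtmlDocument):
    """Shared design: every page gets the same heading and footer."""

    def build(self):
        self.body.children.append(Node("h1", "Phone Book"))
        self.add_content()
        self.body.children.append(Node("address", "webmaster"))

    def add_content(self):
        """Page-specific subclasses override this hook."""


class ListPage(SitePage):
    def add_content(self):
        self.body.children.append(Node("p", "Coster to Dunning"))


page = ListPage("Listing").render()
```

Changing the heading or footer in SitePage changes every page derived from it, which is the point of collecting common design elements in one subclass.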

Since 1995, HTML and Python have improved. The htmlDoc code was revised to include new Python features and to support new versions of HTML. I reached the "good enough" level at Python 1.5.2 and HTML 3.2. 

Future plans include:

  • Documentation - both docstrings and a manual (in work)
  • Examples
  • Support for HTML 4
  • Elimination of text wrapping, which eases debugging but slows HTML generation

CgiResponder - A Simple Web Application Framework

Although htmlDoc can generate static web pages, it is more commonly used to generate dynamic web pages in response to user requests. The cgiResponder class adds capabilities to Python's cgi module and knows how to generate htmlDoc documents.

CgiResponder provides an action-based framework. Applications subclass cgiResponder to define action methods. For example:

def action_list (self):
    # Build the list page if the request is valid; otherwise refuse it.
    if self.validate_list ():
        doc = self.makeListDocument ()
    else:
        doc = self.forbidden ()
    self.deliverHtml (doc)

The CgiMaintResponder subclass supports common actions needed to maintain database records: list, review, edit, add, update, delete, and new. To implement a maintenance application, subclass cgiMaintResponder and define methods like makeListDocument and validate_list.
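
One plausible way an action-based framework routes a request to a method like action_list is to look the method up by name. The sketch below assumes that mechanism; it is not taken from the cgiResponder source. The Responder and PhoneBook classes, the respond method, the "action" field name, and the default action are all hypothetical, and a plain dict stands in for the parsed CGI form so the example does not depend on the cgi module.

```python
class Responder:
    """Hypothetical base: maps an 'action' form field to an action_* method."""

    default_action = "list"

    def respond(self, form):
        """Pick the action named in the form and call the matching method."""
        name = form.get("action", self.default_action)
        method = getattr(self, "action_" + name, None)
        if not callable(method):
            # Unknown actions are refused rather than guessed at.
            return self.forbidden()
        return method(form)

    def forbidden(self):
        return "403 Forbidden"


class PhoneBook(Responder):
    """An application defines one method per action it supports."""

    def action_list(self, form):
        return "listing entries"

    def action_edit(self, form):
        return "editing entry %s" % form.get("id", "?")


app = PhoneBook()
listing = app.respond({})
editing = app.respond({"action": "edit", "id": "7"})
refused = app.respond({"action": "drop"})
```

Only methods whose names begin with action_ are reachable from a request, so helper methods such as forbidden cannot be invoked directly by a crafted form value.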

Why Crawdad?

  • CGI Responder Automatic Web Dynamic Automatic D...
  • I just visited New Orleans
  • Several projects are named HtmlDoc
  • All the good names were taken

Many alternative approaches to web applications have been introduced since htmlDoc and cgiResponder went into daily production in 1995. Zope, Webware, and Quixote are among the Python-based approaches.

The possibility of a DOM-like HTML generator was mentioned recently on the Quixote mailing list. I created the Crawdad project to expose my early work along those lines.

Copyright 2002 by Joel Shprentz. All rights reserved.
