HTML::Paragraphs --- inserts paragraph markers and transforms HTML documents
HTML::Transform
use HTML::Paragraphs;
my $p = HTML::Paragraphs->new;
$p->parse(\<<STOP);
<h1>Test document</h1>

Paragraph markers will be automatically inserted
into this text document.

This saves you some headache.
STOP
HTML::Paragraphs is an HTML parser/transformation module that detects
paragraphs and inserts <p>...</p> tags automatically. If you run a
document such as
<H1>Test document</H1>
Here we have one
paragraph.

And here we have another.
through the paragraph transformer, the result will be:
<H1>Test document</H1>
<P>Here we have one
paragraph.</P>
<P>And here we have another.</P>
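The core idea can be sketched in a few lines of plain Perl. This is not the module's implementation: the function name wrap_paragraphs and the short %BLOCK list are assumptions made for illustration, and the sketch expects a heading to be separated from the following text by a blank line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's block level test: a few tags only.
my %BLOCK = map { $_ => 1 }
    qw(h1 h2 h3 h4 h5 h6 p pre ul ol dl table blockquote address);

# Wrap every chunk (text separated by blank lines) in <P>...</P>,
# unless the chunk starts with a block level tag.
sub wrap_paragraphs {
    my ($text) = @_;
    my @out;
    for my $chunk (split /\n{2,}/, $text) {
        $chunk =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $chunk eq '';
        my ($tag) = $chunk =~ /^<\s*([A-Za-z][A-Za-z0-9]*)/;
        if (defined $tag && $BLOCK{ lc $tag }) {
            push @out, $chunk;       # block level: leave untouched
        }
        else {
            push @out, "<P>$chunk</P>";
        }
    }
    return join "\n", @out;
}

print wrap_paragraphs(
    "<H1>Test document</H1>\n\nHere we have one\nparagraph.\n\nAnd here we have another.\n"
), "\n";
```

A real parser has to work on the tag stream rather than on raw text chunks, which is why the module builds on HTML::Transform instead of a simple split.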
HTML::Paragraphs is a subclass of HTML::Transform. See the HTML::Transform manpage
for a full description of the functions supported by this module. This document
describes only the differences between HTML::Paragraphs and
HTML::Transform.
Transforms doc according to the parser's transformation
rules. The default behavior is to insert <P>...</P> around all
paragraphs in the document. If you want to add additional rules, you must create
a new class or use the set_handler() function.
See the HTML::Transform manpage for more information.
Inserts <p>...</p>
around each paragraph in the text (each block separated by double
newlines). If you want to change the behaviour, you can override
the function.
set_block(TAGS) declares that TAGS are block level tags.
Block level tags are tags such as <h1> and <table>
that should not be enclosed in <p>...</p> blocks.
To understand the difference between block and non-block tags, note that
<b>Bold text</b>
should be converted to
<p><b>Bold text</b></p>
while
<h1>Header</h1>
should not be converted to
<p><h1>Header</h1></p>
HTML::Paragraphs recognizes all block level tags in the HTML standard,
so you do not need to call set_block() for those tags.
block(TAG) returns true if TAG is a block level tag. If you create
a subclass of this class, you can override block() to return the
right value for the tags you have defined, as an alternative to
calling set_block().
The block() implementation in HTML::Paragraphs returns the
correct value for all standard HTML tags so you probably want to call
it for the tags you do not handle.
sub block {
    my ($self, $tag) = @_;
    return ($tag eq "myblock") || $self->SUPER::block($tag);
}
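To see the delegation at work without installing anything, the override can be tried against a stand-in base class. Both package names here (My::BaseParagraphs and My::Paragraphs) are hypothetical, and the stand-in knows only two block level tags; with the real module you would inherit from HTML::Paragraphs instead.

```perl
use strict;
use warnings;

# Stand-in for HTML::Paragraphs so that the sketch runs on its own.
# It recognizes only <h1> and <pre> as block level tags.
package My::BaseParagraphs;
sub new   { my ($class) = @_; return bless {}, $class }
sub block { my ($self, $tag) = @_; return $tag eq 'h1' || $tag eq 'pre' }

# Hypothetical subclass that adds one custom block level tag and
# defers to the superclass for everything else.
package My::Paragraphs;
our @ISA = ('My::BaseParagraphs');
sub block {
    my ($self, $tag) = @_;
    return ($tag eq 'myblock') || $self->SUPER::block($tag);
}

package main;
my $p = My::Paragraphs->new;
for my $tag (qw(myblock h1 b)) {
    printf "%-8s %s\n", $tag, $p->block($tag) ? 'block' : 'inline';
}
```

Because the override falls through to SUPER::block(), the subclass only has to know about its own tags.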
set_block_container(TAGS) declares that TAGS are block containers,
i.e. tags that can contain <P>...</P> blocks. A typical
example is <TD>...</TD>, since each table cell can contain
several paragraphs.
HTML::Paragraphs can correctly handle all tags in the HTML standard,
so you only need to call set_block_container() for the tags you have
defined.
block_container(TAG) returns true if TAG is a block container tag. If you create
a subclass, you can override this function to specify which tags
are block containers, as an alternative to calling the set_block_container()
function.
You should probably call the method in the superclass for all tags you do not handle.
Inserts <p>...</p> around each paragraph in the
text (each block separated by double newlines).
One of the most annoying things about writing HTML documents is
having to insert <P>...</P> tags around each paragraph in the document.
HTML::Paragraphs lets you avoid this hassle. You can pass a document
such as
<H1>My story</H1>
I was born many years ago. I was very small
then. I don't remember very much of it.

Later I grew up. I don't remember much of
that either, but it seemed to involve a lot
of ants.
and paragraph markers will be added automatically:
<H1>My story</H1>
<p>I was born many years ago. I was very small
then. I don't remember very much of it.</p>
<p>Later I grew up. I don't remember much of
that either, but it seemed to involve a lot
of ants.</p>
If you want to do additional parsing, you have to create a subclass of HTML::Paragraphs. See the HTML::Transform manpage for more information on this.
Note that you shouldn't use the autofix() function together with the
HTML::Paragraphs module. They will interfere destructively.
To be fair, when you have created a subclass and introduced your own tags, the insertion of paragraph markers is not completely automatic. You have to provide the parser with some information about the tags in the document. The reason is that you want different tags to be treated differently. Consider the example:
<H1>Test</H1>
<B>This is a test</B>
You want this to be transformed to:
<H1>Test</H1>
<P><B>This is a test</B></P>
So the <H1> tag needs to be treated differently from the <B> tag.
For the tags in the HTML standard this does not pose a problem, because we can
enumerate them and define their behavior, but for the tags you define in your
rule document, the parser will not know how to treat them unless you tell it.
You specify that a tag is a block level tag by calling set_block() or
overriding block(). Block level tags are tags that should not be wrapped up in
<P> tags. For example, <H1> is a block level tag, since you do
not want it replaced with
<P><H1>Header</H1></P>
Other typical block level tags are: <ADDRESS>, <BLOCKQUOTE>,
<PRE>, <DL>, <OL>, <UL> and <P> (since you
do not want <P><P>...</P></P>). If you have defined your own tag that
behaves like these tags, you need to override block() or call set_block().
Note that the implementation of block() in HTML::Paragraphs handles all
the standard HTML tags, so you probably want to call it in your overriding
method.
The second thing you need to do is call set_block_container() or override
block_container(). This method should return true for every tag that can
contain <P>...</P> blocks. For example, you will probably want to do
paragraph parsing inside <TD> tags, to make sure that
<TD>
Paragraph 1.
Paragraph 2.
</TD>
is replaced by
<TD>
<P>Paragraph 1.</P>
<P>Paragraph 2.</P>
</TD>
But you probably do not want to do paragraph parsing inside <PRE> tags.
Just as before, call the superclass method block_container() to get the default
behavior for all the standard tags.
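The effect of a block container can be illustrated with a small standalone sketch. It is only an approximation built on a regular expression, not the module's parser: the names wrap_in_containers, rewrite_element and wrap_chunks and the short %CONTAINER list are assumptions, and nested elements or tags with attributes are not handled.

```perl
use strict;
use warnings;

# Hypothetical container list; the module's real list is not shown here.
my %CONTAINER = map { $_ => 1 } qw(td th li blockquote);

# Wrap each blank-line-separated chunk of text in <P>...</P>.
sub wrap_chunks {
    my ($text) = @_;
    my @out;
    for my $chunk (split /\n{2,}/, $text) {
        $chunk =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @out, "<P>$chunk</P>" if length $chunk;
    }
    return join "\n", @out;
}

# Rewrite one element: containers get paragraph parsing applied to
# their content, every other element is returned unchanged.
sub rewrite_element {
    my ($whole, $tag, $body) = @_;   # copy before running any regex
    return $whole unless $CONTAINER{ lc $tag };
    return "<$tag>\n" . wrap_chunks($body) . "\n</$tag>";
}

# Apply rewrite_element() to every simple <TAG>...</TAG> pair.
sub wrap_in_containers {
    my ($html) = @_;
    $html =~ s{(<(\w+)>(.*?)</\2>)}{ rewrite_element($1, $2, $3) }gise;
    return $html;
}

print wrap_in_containers("<TD>\nParagraph 1.\n\nParagraph 2.\n</TD>"), "\n";
print wrap_in_containers("<PRE>\nline 1\n\nline 2\n</PRE>"), "\n";
```

The <TD> content is split into two paragraphs, while the <PRE> content passes through untouched because pre is not in the container list.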
Using set_block() your code may look like this:
$p->set_block(qw(program block));
$p->set_block_container("block");
Using overrides, your code may look like this:
sub block {
    my ($self, $tag) = @_;
    # Compare strings exactly instead of interpolating $tag into a regex.
    return 1 if grep { $_ eq $tag } qw(program block);
    return $self->SUPER::block($tag);
}
sub block_container {
    my ($self, $tag) = @_;
    return ($tag eq "block") || $self->SUPER::block_container($tag);
}
Current version: 1.0 beta 3
is_block_level() and has_paragrahs() functions.
set_block() and set_block_container() to support non-subclass style
programming.
Niklas Frykholm, niklas@kagi.com
This program can be used and distributed freely.