genshi.input

Support for constructing markup streams from files, strings, or other sources.

genshi.input.ET(element)

Convert a given ElementTree element to a markup stream.

Parameters: element – an ElementTree element
Returns: a markup stream
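To make the conversion more concrete, the following is a minimal sketch of how an ElementTree element can be walked to produce stream-like events. It uses only the standard library and is not Genshi's implementation: `element_to_events` is a hypothetical helper, and it yields plain `(kind, data)` pairs with `str` and `dict` values, whereas real Genshi events are `(kind, data, pos)` triples built from `QName` and `Attrs` objects.

```python
from xml.etree import ElementTree


def element_to_events(element):
    """Yield (kind, data) pairs for an element and its descendants.

    Hypothetical stand-in for illustration; not part of genshi.input.
    """
    yield ("START", (element.tag, dict(element.attrib)))
    if element.text:
        yield ("TEXT", element.text)
    for child in element:
        yield from element_to_events(child)
        # Text that follows a child's end tag is stored on the child as `tail`.
        if child.tail:
            yield ("TEXT", child.tail)
    yield ("END", element.tag)


root = ElementTree.fromstring('<root id="2"><child>Foo</child>bar</root>')
events = list(element_to_events(root))
```

The `tail` handling is the part that is easy to miss: ElementTree attaches text that follows an element to that element rather than to its parent.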

exception genshi.input.ParseError(message, filename=None, lineno=-1, offset=-1)

Exception raised when fatal syntax errors are found in the input being parsed.

class genshi.input.XMLParser(source, filename=None, encoding=None)

Generator-based XML parser, derived from roughly equivalent code in Kid/ElementTree.

The parsing is initiated by iterating over the parser object:

>>> from io import StringIO
>>> from genshi.input import XMLParser
>>> parser = XMLParser(StringIO('<root id="2"><child>Foo</child></root>'))
>>> for kind, data, pos in parser:
...     print('%s %s' % (kind, data))
START (QName('root'), Attrs([(QName('id'), u'2')]))
START (QName('child'), Attrs())
TEXT Foo
END child
END root

parse()

Generator that parses the XML source, yielding markup events.

Returns: a markup event stream
Raises ParseError: if the XML text is not well-formed
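The generator-based design can be sketched with the standard library's expat bindings. This is an illustration of the general technique, not Genshi's code: `iter_xml_events` and its `bufsize` parameter are hypothetical, the events carry plain strings and dicts instead of `QName` and `Attrs`, and malformed input surfaces as `xml.parsers.expat.ExpatError` rather than `ParseError`.

```python
from io import StringIO
from xml.parsers import expat


def iter_xml_events(source, bufsize=4096):
    """Yield (kind, data, pos) triples from a file-like object of XML text.

    Hypothetical helper for illustration; not part of genshi.input.
    """
    parser = expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data as one call
    pending = []

    def position():
        return (parser.CurrentLineNumber, parser.CurrentColumnNumber)

    def on_start(tag, attrs):
        pending.append(("START", (tag, attrs), position()))

    def on_end(tag):
        pending.append(("END", tag, position()))

    def on_text(text):
        pending.append(("TEXT", text, position()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text

    while True:
        chunk = source.read(bufsize)
        is_final = not chunk
        # Raises xml.parsers.expat.ExpatError if the input is not well-formed.
        parser.Parse(chunk, is_final)
        # Hand over whatever the callbacks collected for this chunk.
        while pending:
            yield pending.pop(0)
        if is_final:
            break


events = list(iter_xml_events(StringIO('<root id="2"><child>Foo</child></root>')))
```

Because the source is read in chunks and events are yielded as soon as they are available, nothing is parsed until the caller starts iterating, and a syntax error is only reported once iteration reaches it.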

genshi.input.XML(text)

Parse the given XML source and return a markup stream.

Unlike with XMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:

>>> from genshi.input import XML
>>> xml = XML('<doc><elem>Foo</elem><elem>Bar</elem></doc>')
>>> print(xml)
<doc><elem>Foo</elem><elem>Bar</elem></doc>
>>> print(xml.select('elem'))
<elem>Foo</elem><elem>Bar</elem>
>>> print(xml.select('elem/text()'))
FooBar

Parameters: text – the XML source
Returns: the parsed XML event stream
Raises ParseError: if the XML text is not well-formed
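The difference between a one-shot parser and a reusable stream comes down to whether the events are buffered. The sketch below shows the idea with the standard library only; `one_shot_events` is a hypothetical generator, and the assumption that a reusable stream is obtained by buffering events in a list is an illustration, not a statement about Genshi's internals.

```python
from io import BytesIO
from xml.etree.ElementTree import iterparse


def one_shot_events(xml_bytes):
    """Generator of (event, tag) pairs; it can be consumed only once."""
    for event, elem in iterparse(BytesIO(xml_bytes), events=("start", "end")):
        yield event, elem.tag


source = b"<doc><elem>Foo</elem><elem>Bar</elem></doc>"

stream = one_shot_events(source)
first_pass = list(stream)
second_pass = list(stream)  # the generator is already exhausted

# Buffering the events in a list lets every pass see the full sequence.
reusable = list(one_shot_events(source))
passes = [list(reusable), list(reusable)]
```

The trade-off is memory: a buffered stream holds every event of the document at once, while a generator only ever holds the events of the chunk being parsed.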

class genshi.input.HTMLParser(source, filename=None, encoding=None)

Parser for HTML input based on the Python HTMLParser module.

This class provides the same interface for generating stream events as XMLParser, and attempts to automatically balance tags.

The parsing is initiated by iterating over the parser object:

>>> from io import BytesIO
>>> from genshi.input import HTMLParser
>>> parser = HTMLParser(BytesIO(u'<UL compact><LI>Foo</UL>'.encode('utf-8')), encoding='utf-8')
>>> for kind, data, pos in parser:
...     print('%s %s' % (kind, data))
START (QName('ul'), Attrs([(QName('compact'), u'compact')]))
START (QName('li'), Attrs())
TEXT Foo
END li
END ul

parse()

Generator that parses the HTML source, yielding markup events.

Returns: a markup event stream
Raises ParseError: if the HTML text is not well-formed
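Tag balancing of the kind described above can be approximated with a stack of open elements. The following sketch builds on the standard library's `html.parser.HTMLParser` (the Python 3 home of the HTMLParser module). It is not Genshi's implementation: `BalancingParser`, `balanced_events`, and the `VOID_ELEMENTS` set are hypothetical names, the set is deliberately incomplete, and events are plain `(kind, data)` pairs.

```python
from html.parser import HTMLParser

# Assumption for this sketch: a small, incomplete set of elements without end tags.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class BalancingParser(HTMLParser):
    """Collect (kind, data) events and close any tags the source left open."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Minimized attributes such as `compact` arrive with a value of None.
        fixed = [(name, name if value is None else value) for name, value in attrs]
        self.events.append(("START", (tag, fixed)))
        if tag in VOID_ELEMENTS:
            self.events.append(("END", tag))
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything opened after `tag`, then `tag` itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.events.append(("END", open_tag))
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.events.append(("TEXT", data))

    def close(self):
        super().close()
        # Anything still open at the end of the input is closed here.
        while self.open_tags:
            self.events.append(("END", self.open_tags.pop()))


def balanced_events(html_text):
    parser = BalancingParser()
    parser.feed(html_text)
    parser.close()
    return parser.events


events = balanced_events("<UL compact><LI>Foo</UL>")
```

For the input used in the doctest above, the sketch produces the same sequence of event kinds: the unclosed `li` is ended when `</UL>` is seen, just before `ul` itself.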

genshi.input.HTML(text, encoding=None)

Parse the given HTML source and return a markup stream.

Unlike with HTMLParser, the returned stream is reusable, meaning it can be iterated over multiple times:

>>> from genshi.input import HTML
>>> html = HTML('<body><h1>Foo</h1></body>', encoding='utf-8')
>>> print(html)
<body><h1>Foo</h1></body>
>>> print(html.select('h1'))
<h1>Foo</h1>
>>> print(html.select('h1/text()'))
Foo

Parameters: text – the HTML source
Returns: the parsed HTML event stream
Raises ParseError: if the HTML text is not well-formed and error recovery fails