genshi.core

Core classes for markup processing.

class genshi.core.Stream(events, serializer=None)

Represents a stream of markup events.

This class is basically an iterator over the events.

Stream events are tuples of the form:

(kind, data, position)

where kind is the event kind (such as START, END, TEXT, etc), data depends on the kind of event, and position is a (filename, line, offset) tuple that contains the location of the original element or text in the input. If the original location is unknown, position is (None, -1, -1).
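As a rough illustration of the event shape described above, the sketch below builds a few events by hand as plain tuples. The kinds are written as plain strings, and the data layout chosen for START and END events (a tag name with a list of attributes, or just the tag name) is an assumption made for illustration. Real streams are produced by Genshi's parsers and templates rather than written out like this.

```python
# Hand-built events in the (kind, data, position) shape described above.
UNKNOWN_POS = (None, -1, -1)  # used when the original location is not known

events = [
    ("START", ("p", []), ("page.html", 1, 0)),
    ("TEXT", "Hello", ("page.html", 1, 3)),
    ("END", "p", ("page.html", 1, 8)),
    ("TEXT", "generated text", UNKNOWN_POS),
]

def describe(event):
    """Format one event as '<kind> at <location>'."""
    kind, data, (filename, line, offset) = event
    if filename is None:
        where = "unknown"
    else:
        where = "%s:%d:%d" % (filename, line, offset)
    return "%s at %s" % (kind, where)

descriptions = [describe(e) for e in events]
```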

Also provided are ways to serialize the stream to text. The serialize() method will return an iterator over generated strings, while render() returns the complete generated text at once. Both accept various parameters that impact the way the stream is serialized.

filter(*filters)

Apply filters to the stream.

This method returns a new stream with the given filters applied. The filters must be callables that accept the stream object as parameter, and return the filtered stream.

The call:

stream.filter(filter1, filter2)

is equivalent to:

stream | filter1 | filter2

Parameters:	filters – one or more callable objects that should be applied as filters
Returns:	the filtered stream
Return type:	Stream
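The filter contract can be sketched without Genshi: a filter is any callable that takes an iterable of events and returns an iterable of events. In the sketch below, apply_filters is a hypothetical helper that mimics what chaining filters amounts to. It is not Genshi's implementation, and the events are plain tuples.

```python
from functools import reduce

def upper_text(events):
    """Filter: upper-case the data of every TEXT event."""
    for kind, data, pos in events:
        if kind == "TEXT":
            data = data.upper()
        yield kind, data, pos

def drop_whitespace(events):
    """Filter: drop TEXT events that contain only whitespace."""
    for kind, data, pos in events:
        if kind == "TEXT" and not data.strip():
            continue
        yield kind, data, pos

def apply_filters(events, *filters):
    """Apply each filter in turn, feeding it the output of the previous one."""
    return reduce(lambda stream, f: f(stream), filters, events)

POS = (None, -1, -1)
events = [("TEXT", "foo", POS), ("TEXT", "  ", POS), ("TEXT", "bar", POS)]

chained = list(apply_filters(events, drop_whitespace, upper_text))
nested = list(upper_text(drop_whitespace(events)))
```

Both spellings give the same events, which is the equivalence stated above for filter() and the | operator.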

render(method=None, encoding=None, out=None, **kwargs)

Return a string representation of the stream.

Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.

Parameters:
	method – determines how the stream is serialized; can be either “xml”, “xhtml”, “html”, “text”, or a custom serializer class; if None, the default serialization method of the stream is used
	encoding – how the output string should be encoded; if set to None, this method returns a unicode object
	out – a file-like object that the output should be written to instead of being returned as one big string; note that if this is a file or socket (or similar), the encoding must not be None (that is, the output must be encoded)
Returns:	a str or unicode object (depending on the encoding parameter), or None if the out parameter is provided
Return type:	basestring
See:	XMLSerializer, XHTMLSerializer, HTMLSerializer, TextSerializer
Note:	Changed in 0.5: added the out parameter

select(path, namespaces=None, variables=None)

Return a new stream that contains the events matching the given XPath expression.

>>> from genshi import HTML
>>> stream = HTML('<doc><elem>foo</elem><elem>bar</elem></doc>', encoding='utf-8')
>>> print(stream.select('elem'))
<elem>foo</elem><elem>bar</elem>
>>> print(stream.select('elem/text()'))
foobar

Note that the outermost element of the stream becomes the context node for the XPath test. That means that the expression “doc” would not match anything in the example above, because it only tests against child elements of the outermost element:

>>> print(stream.select('doc'))

You can use the "." expression to match the context node itself (although that usually makes little sense):

>>> print(stream.select('.'))
<doc><elem>foo</elem><elem>bar</elem></doc>
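The context-node behaviour has a close analogue in the standard library. The sketch below uses xml.etree.ElementTree, not Genshi, and ElementTree supports only a limited XPath subset, so it should be read as an analogy for how paths are evaluated relative to the outermost element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><elem>foo</elem><elem>bar</elem></doc>")

# Paths are evaluated relative to the outermost element (the context node).
children = [el.text for el in root.findall("elem")]  # both child elements
no_match = root.findall("doc")   # empty: "doc" is the context node, not a child
context = root.findall(".")      # "." selects the context node itself
```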

Parameters:
	path – a string containing the XPath expression
	namespaces – mapping of namespace prefixes used in the path
	variables – mapping of variable names to values
Returns:	the selected substream
Return type:	Stream
Raises PathSyntaxError:	if the given path expression is invalid or not supported

serialize(method='xml', **kwargs)

Generate strings corresponding to a specific serialization of the stream.

Unlike the render() method, this method is a generator that returns the serialized output incrementally, as opposed to returning a single string.

Any additional keyword arguments are passed to the serializer, and thus depend on the method parameter value.

Parameters:	method – determines how the stream is serialized; can be either “xml”, “xhtml”, “html”, “text”, or a custom serializer class; if None, the default serialization method of the stream is used
Returns:	an iterator over the serialization results (Markup or unicode objects, depending on the serialization method)
Return type:	iterator
See:	XMLSerializer, XHTMLSerializer, HTMLSerializer, TextSerializer
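The difference between incremental and all-at-once output can be modelled in a few lines. This is a toy model, not one of Genshi's serializers: serialize_events and render_events are hypothetical names, START data is simplified to a bare tag name, and no escaping is performed.

```python
def serialize_events(events):
    """Yield one output string per event (a generator, like serialize())."""
    for kind, data, pos in events:
        if kind == "START":
            yield "<%s>" % data
        elif kind == "END":
            yield "</%s>" % data
        elif kind == "TEXT":
            yield data

def render_events(events, encoding=None):
    """Join all chunks into one string (like render()); encode if requested."""
    text = "".join(serialize_events(events))
    return text if encoding is None else text.encode(encoding)

POS = (None, -1, -1)
events = [("START", "p", POS), ("TEXT", "hi", POS), ("END", "p", POS)]

chunks = list(serialize_events(events))            # produced piece by piece
whole = render_events(events)                      # one text string
encoded = render_events(events, encoding="utf-8")  # one byte string
```

The generator form lets a caller write output as it is produced, while the joined form is convenient when the whole document is needed as a single value.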

class genshi.core.Markup

Marks a string as being safe for inclusion in HTML/XML output without needing to be escaped.

classmethod escape(text, quotes=True)

Create a Markup instance from a string and escape special characters it may contain (<, >, & and ").

>>> escape('"1 < 2"')
<Markup u'&#34;1 &lt; 2&#34;'>

If the quotes parameter is set to False, the " character is left as is. Escaping quotes is generally only required for strings that are to be used in attribute values.

>>> escape('"1 < 2"', quotes=False)
<Markup u'"1 &lt; 2"'>

Parameters:
	text – the text to escape
	quotes – if True, double quote characters are escaped in addition to the other special characters
Returns:	the escaped Markup string
Return type:	Markup
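The replacements involved can be sketched as a plain function. escape_text is a hypothetical name; unlike Markup.escape it returns an ordinary string rather than a Markup instance. The entities used are the ones visible in the examples above.

```python
def escape_text(text, quotes=True):
    """Escape &, < and > (and " when quotes is true) for markup output."""
    # "&" must be replaced first so the entities added below are not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quotes:
        text = text.replace('"', "&#34;")
    return text

with_quotes = escape_text('"1 < 2"')
without_quotes = escape_text('"1 < 2"', quotes=False)
```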

join(seq, escape_quotes=True)

Return a Markup object which is the concatenation of the strings in the given sequence, where this Markup object is the separator between the joined elements.

Any element in the sequence that is not a Markup instance is automatically escaped.

Parameters:
	seq – the sequence of strings to join
	escape_quotes – whether double quote characters in the elements should be escaped
Returns:	the joined Markup object
Return type:	Markup
See:	escape
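The escaping rule for join() can be illustrated with a small stand-in class. SafeText is hypothetical and only mimics the behaviour described above: elements already marked safe pass through unchanged, everything else is escaped before joining.

```python
class SafeText(str):
    """Hypothetical stand-in for Markup: a str marking text as already escaped."""

    @classmethod
    def escape(cls, text, quotes=True):
        if isinstance(text, cls):
            return text  # already safe, do not escape twice
        text = str(text).replace("&", "&amp;")
        text = text.replace("<", "&lt;").replace(">", "&gt;")
        if quotes:
            text = text.replace('"', "&#34;")
        return cls(text)

    def join(self, seq, escape_quotes=True):
        # Escape every element that is not already SafeText, then join with self.
        parts = [self.escape(item, quotes=escape_quotes) for item in seq]
        return SafeText(str.join(self, parts))

joined = SafeText("<br />").join(["a < b", SafeText("<em>c</em>")])
```

Here the plain string is escaped, while the separator and the element already marked safe keep their tags.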

stripentities(keepxmlentities=False)

Return a copy of the text with any character or numeric entities replaced by the equivalent Unicode characters.

If the keepxmlentities parameter is provided and evaluates to True, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are not stripped.

Returns:	a Markup instance with entities removed
Return type:	Markup
See:	genshi.util.stripentities
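A minimal sketch of entity stripping using only the standard library follows. strip_entities is a hypothetical function, not genshi.util.stripentities; in particular, leaving unknown entity names untouched is a choice made for this sketch.

```python
import re
from html.entities import name2codepoint

_ENTITY_RE = re.compile(r"&(?:#(\d+|[xX][0-9a-fA-F]+)|(\w+));")
_XML_ENTITIES = ("amp", "apos", "gt", "lt", "quot")
# name2codepoint covers HTML 4 names; "apos" is added for XML.
_NAMES = dict(name2codepoint, apos=39)

def strip_entities(text, keepxmlentities=False):
    """Replace numeric and named entities with the characters they stand for."""
    def replace(match):
        number, name = match.groups()
        if number:  # numeric reference such as &#233; or &#xE9;
            if number[0] in "xX":
                return chr(int(number[1:], 16))
            return chr(int(number))
        if keepxmlentities and name in _XML_ENTITIES:
            return match.group(0)  # leave the core XML entity as written
        if name in _NAMES:
            return chr(_NAMES[name])
        return match.group(0)  # unknown name: left untouched in this sketch
    return _ENTITY_RE.sub(replace, text)

plain = strip_entities("caf&eacute; &amp; t&#233;")
kept = strip_entities("caf&eacute; &amp; t&#233;", keepxmlentities=True)
```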

striptags()

Return a copy of the text with all XML/HTML tags removed.

Returns:	a Markup instance with all tags removed
Return type:	Markup
See:	genshi.util.striptags
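Tag stripping can be approximated with a regular expression. strip_tags below is a hypothetical sketch, not genshi.util.striptags, and it assumes that attribute values contain no ">" characters.

```python
import re

# Match a whole comment first, otherwise anything between "<" and ">".
_TAG_RE = re.compile(r"<!--.*?-->|<[^>]*>", re.DOTALL)

def strip_tags(text):
    """Remove comments and anything that looks like a tag, keeping the text."""
    return _TAG_RE.sub("", text)

stripped = strip_tags('<p class="x">Hello <b>world</b><!-- note --></p>')
```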

unescape()

Reverse-escapes &, <, >, and " and returns a unicode object.

>>> Markup('1 &lt; 2').unescape()
u'1 < 2'

Returns:	the unescaped string
Return type:	unicode
See:	genshi.core.unescape

genshi.core.unescape(text)

Reverse-escapes &, <, >, and " and returns a unicode object.

>>> unescape(Markup('1 &lt; 2'))
u'1 < 2'

If the provided text object is not a Markup instance, it is returned unchanged.

>>> unescape('1 &lt; 2')
'1 &lt; 2'

Parameters:	text – the text to unescape
Returns:	the unescaped string
Return type:	unicode
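The reverse mapping can be sketched as a chain of replacements. unescape_text is a hypothetical name, and the Markup type check described above is left out, so this version unescapes any string it is given.

```python
def unescape_text(text):
    """Reverse the escaping of ", >, < and &."""
    # "&amp;" is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    return (text.replace("&#34;", '"')
                .replace("&gt;", ">")
                .replace("&lt;", "<")
                .replace("&amp;", "&"))

simple = unescape_text("1 &lt; 2")
nested = unescape_text("&amp;lt;")
```

Handling the ampersand last keeps the function a true inverse of escaping: text that was escaped twice is unescaped by exactly one level.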

class genshi.core.Attrs

Immutable sequence type that stores the attributes of an element.

Ordering of the attributes is preserved, while access by name is also supported.

>>> attrs = Attrs([('href', '#'), ('title', 'Foo')])
>>> attrs
Attrs([('href', '#'), ('title', 'Foo')])

>>> 'href' in attrs
True
>>> 'tabindex' in attrs
False
>>> attrs.get('title')
'Foo'

Instances may not be manipulated directly. Instead, the operators | and - can be used to produce new instances that have specific attributes added, replaced or removed.

To remove an attribute, use the - operator. The right hand side can be either a string or a set/sequence of strings, identifying the name(s) of the attribute(s) to remove:

>>> attrs - 'title'
Attrs([('href', '#')])
>>> attrs - ('title', 'href')
Attrs()

The original instance is not modified, but the operator can of course be used with an assignment:

>>> attrs
Attrs([('href', '#'), ('title', 'Foo')])
>>> attrs -= 'title'
>>> attrs
Attrs([('href', '#')])

To add a new attribute, use the | operator, where the right hand value is a sequence of (name, value) tuples (which includes Attrs instances):

>>> attrs | [('title', 'Bar')]
Attrs([('href', '#'), ('title', 'Bar')])

If the attributes already contain an attribute with a given name, the value of that attribute is replaced:

>>> attrs | [('href', 'http://example.org/')]
Attrs([('href', 'http://example.org/')])
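The operator semantics described above can be modelled with a small tuple subclass. ToyAttrs is a hypothetical class written for illustration and does not reflect how Attrs is implemented. It shows that both operators build a new object and leave the original untouched.

```python
class ToyAttrs(tuple):
    """Hypothetical immutable attribute list: a tuple of (name, value) pairs."""

    def __contains__(self, name):
        return any(attr == name for attr, _ in self)

    def get(self, name, default=None):
        for attr, value in self:
            if attr == name:
                return value
        return default

    def __sub__(self, names):
        # Accept a single name or a collection of names to remove.
        if isinstance(names, str):
            names = (names,)
        return ToyAttrs((a, v) for a, v in self if a not in names)

    def __or__(self, pairs):
        # Replace values of existing names in place, then append new names.
        updates = dict(list(pairs))
        kept = [(a, updates.pop(a, v)) for a, v in self]
        return ToyAttrs(kept + list(updates.items()))

attrs = ToyAttrs([("href", "#"), ("title", "Foo")])
removed = attrs - "title"
replaced = attrs | [("title", "Bar")]
extended = removed | [("title", "Bar")]
```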

get(name, default=None)

Return the value of the attribute with the specified name, or the value of the default parameter if no such attribute is found.

Parameters:
	name – the name of the attribute
	default – the value to return when the attribute does not exist
Returns:	the attribute value, or the default value if that attribute does not exist
Return type:	object

totuple()

Return the attributes as a markup event.

The returned event is a TEXT event whose data is the values of all attributes joined together.

>>> Attrs([('href', '#'), ('title', 'Foo')]).totuple()
('TEXT', '#Foo', (None, -1, -1))

Returns:	a TEXT event
Return type:	tuple

class genshi.core.Namespace(uri)

Utility class for creating and testing elements with a namespace.

Internally, namespace URIs are encoded in the QName of any element or attribute, the namespace URI being enclosed in curly braces. This class helps create and test these strings.

A Namespace object is instantiated with the namespace URI.

>>> html = Namespace('http://www.w3.org/1999/xhtml')
>>> html
Namespace('http://www.w3.org/1999/xhtml')
>>> html.uri
u'http://www.w3.org/1999/xhtml'

The Namespace object can then be used to generate QName objects with that namespace:

>>> html.body
QName('http://www.w3.org/1999/xhtml}body')
>>> html.body.localname
u'body'
>>> html.body.namespace
u'http://www.w3.org/1999/xhtml'

The same works using item access notation, which is useful for element or attribute names that are not valid Python identifiers:

>>> html['body']
QName('http://www.w3.org/1999/xhtml}body')

A Namespace object can also be used to test whether a specific QName belongs to that namespace using the in operator:

>>> qname = html.body
>>> qname in html
True
>>> qname in Namespace('http://www.w3.org/2002/06/xhtml2')
False
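The curly-brace encoding behind these examples can be sketched with a small helper class. ToyNamespace is hypothetical: it returns plain strings rather than QName objects and only mirrors the attribute access, item access, and membership test shown above.

```python
class ToyNamespace:
    """Hypothetical helper that builds and tests "{uri}local-name" strings."""

    def __init__(self, uri):
        self.uri = uri

    def __getitem__(self, name):
        return "{%s}%s" % (self.uri, name)

    def __getattr__(self, name):
        # Only called for attributes that are not found normally.
        if name.startswith("__"):
            raise AttributeError(name)
        return self[name]

    def __contains__(self, qname):
        return qname.startswith("{%s}" % self.uri)

html = ToyNamespace("http://www.w3.org/1999/xhtml")
body = html.body
dashed = html["accept-charset"]  # not a valid Python identifier
```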

class genshi.core.QName

A qualified element or attribute name.

The unicode value of instances of this class contains the qualified name of the element or attribute, in the form {namespace-uri}local-name. The namespace URI can be obtained through the additional namespace attribute, while the local name can be accessed through the localname attribute.

>>> qname = QName('foo')
>>> qname
QName('foo')
>>> qname.localname
u'foo'
>>> qname.namespace

>>> qname = QName('http://www.w3.org/1999/xhtml}body')
>>> qname
QName('http://www.w3.org/1999/xhtml}body')
>>> qname.localname
u'body'
>>> qname.namespace
u'http://www.w3.org/1999/xhtml'
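Splitting a qualified name into its two parts can be sketched as follows. split_qname is a hypothetical helper, not part of Genshi; it treats the leading "{" as optional, mirroring the form shown in the examples above.

```python
def split_qname(qname):
    """Split "{namespace-uri}local-name" into (namespace, localname)."""
    # The leading "{" is optional here, as in the repr shown above.
    parts = qname.lstrip("{").split("}", 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return None, qname  # no namespace part present

unqualified = split_qname("foo")
qualified = split_qname("{http://www.w3.org/1999/xhtml}body")
no_brace = split_qname("http://www.w3.org/1999/xhtml}body")
```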