genshi.util

Various utility classes and functions.

class genshi.util.LRUCache(capacity)

A dictionary-like object that stores only a certain number of items, and discards its least recently used item when full.

>>> cache = LRUCache(3)
>>> cache['A'] = 0
>>> cache['B'] = 1
>>> cache['C'] = 2
>>> len(cache)
3

>>> cache['A']
0

Adding a new item to a full cache does not increase its size. Instead, the least recently used item is dropped, here 'B', because 'A' was read in the previous step:

>>> cache['D'] = 3
>>> len(cache)
3
>>> 'B' in cache
False

Iterating over the cache returns the keys, starting with the most recently used:

>>> for key in cache:
...     print(key)
D
A
C
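
The behaviour shown above can be approximated with collections.OrderedDict. This is a minimal Python 3 sketch for illustration only, not Genshi's implementation; the class name SimpleLRUCache is hypothetical, and it assumes that only reads and writes (not membership tests) count as a use.

```python
from collections import OrderedDict


class SimpleLRUCache:
    """Hypothetical stand-in for illustration; not genshi.util.LRUCache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # least recently used first, newest last

    def __contains__(self, key):
        # Assumption: a membership test does not count as a use.
        return key in self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)  # a read counts as a use
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)  # a write counts as a use
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used

    def __iter__(self):
        # Yield keys starting with the most recently used.
        return iter(list(reversed(self._data)))


cache = SimpleLRUCache(3)
cache['A'] = 0
cache['B'] = 1
cache['C'] = 2
cache['A']        # reading 'A' leaves 'B' as the least recently used key
cache['D'] = 3    # the cache is full, so 'B' is evicted
order = list(cache)
```

After these steps, order holds the keys in the same sequence as the loop output above, and 'B' is no longer in the cache.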

This code is based on the LRUCache class from myghtyutils.util, written by Mike Bayer and released under the MIT license. See:

http://svn.myghty.org/myghtyutils/trunk/lib/myghtyutils/util.py

genshi.util.flatten(items)

Flattens a potentially nested sequence into a flat list.

Parameters:	items – the sequence to flatten

>>> flatten((1, 2))
[1, 2]
>>> flatten([1, (2, 3), 4])
[1, 2, 3, 4]
>>> flatten([1, (2, [3, 4]), 5])
[1, 2, 3, 4, 5]
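
The recursion behind these results can be sketched as follows. The name flatten_sketch is hypothetical, and the set of container types it descends into (list, tuple, set, frozenset) is an assumption; strings are deliberately treated as single items rather than as sequences of characters.

```python
def flatten_sketch(items):
    """Hypothetical re-implementation for illustration only."""
    result = []
    for item in items:
        # Assumption: only these container types are unpacked.
        if isinstance(item, (list, tuple, set, frozenset)):
            result.extend(flatten_sketch(item))
        else:
            result.append(item)
    return result


nested = flatten_sketch([1, (2, [3, 4]), 5])
words = flatten_sketch(['ab', ('cd',)])
```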

genshi.util.plaintext(text, keeplinebreaks=True)

Return the text with all entities and tags removed.

>>> plaintext('<b>1 &lt; 2</b>')
u'1 < 2'

Set the keeplinebreaks parameter to False to replace line breaks with spaces:

>>> plaintext('''<b>1
... &lt;
... 2</b>''', keeplinebreaks=False)
u'1 < 2'

Parameters:	text – the text to convert to plain text
	keeplinebreaks – whether line breaks in the text should be kept intact
Returns:	the text with tags and entities removed
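
One plausible way to get this behaviour is to remove tags first and decode entities second, so that a decoded "<" is never mistaken for the start of a tag. The sketch below is an assumption about the approach, not Genshi's code: plaintext_sketch is a hypothetical name, and the standard library's html.unescape stands in for entity decoding.

```python
import html
import re

# Comments are matched before generic tags; DOTALL lets them span lines.
_TAG_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def plaintext_sketch(text, keeplinebreaks=True):
    """Hypothetical re-implementation for illustration only."""
    # Step 1: strip tags. Step 2: decode entities such as &lt;.
    result = html.unescape(_TAG_RE.sub('', text))
    if not keeplinebreaks:
        result = result.replace('\n', ' ')
    return result


kept = plaintext_sketch('<b>1\n&lt;\n2</b>')
joined = plaintext_sketch('<b>1\n&lt;\n2</b>', keeplinebreaks=False)
```

Decoding first would turn "1 &lt; 2" into "1 < 2" before tag removal, which risks treating the literal "<" as markup.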

genshi.util.stripentities(text, keepxmlentities=False)

Return a copy of the given text with any character or numeric entities replaced by the equivalent Unicode characters.

>>> stripentities('1 &lt; 2')
u'1 < 2'
>>> stripentities('more &hellip;')
u'more \u2026'
>>> stripentities('&#8230;')
u'\u2026'
>>> stripentities('&#x2026;')
u'\u2026'

If the keepxmlentities parameter is provided and is a truth value, the core XML entities (&amp;, &apos;, &gt;, &lt; and &quot;) are left intact.

>>> stripentities('1 &lt; 2 &hellip;', keepxmlentities=True)
u'1 &lt; 2 \u2026'
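
The replacement rules above can be sketched with a single regular expression and the standard library's entity table. This is a Python 3 illustration under stated assumptions, not Genshi's implementation: stripentities_sketch is a hypothetical name, unknown entity names are left unchanged, and numeric references outside the Unicode range are also left unchanged.

```python
import re
from html.entities import name2codepoint

XML_ENTITIES = ('amp', 'apos', 'gt', 'lt', 'quot')
_ENTITY_RE = re.compile(r'&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def stripentities_sketch(text, keepxmlentities=False):
    """Hypothetical re-implementation for illustration only."""
    def replace(match):
        ref = match.group(1)
        if ref.startswith('#'):
            # Numeric reference: decimal (&#8230;) or hexadecimal (&#x2026;).
            digits = ref[1:]
            try:
                if digits[0] in 'xX':
                    return chr(int(digits[1:], 16))
                return chr(int(digits))
            except (ValueError, OverflowError):
                return match.group(0)  # out of range: keep the original text
        if keepxmlentities and ref in XML_ENTITIES:
            return match.group(0)  # leave the core XML entities intact
        if ref in name2codepoint:
            return chr(name2codepoint[ref])
        if ref == 'apos':
            return "'"  # defined by XML, absent from the HTML 4 table
        return match.group(0)  # assumption: unknown names stay as written
    return _ENTITY_RE.sub(replace, text)


decoded = stripentities_sketch('1 &lt; 2 &hellip;')
kept_xml = stripentities_sketch('1 &lt; 2 &hellip;', keepxmlentities=True)
```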

genshi.util.striptags(text)

Return a copy of the text with any XML/HTML tags removed.

>>> striptags('<span>Foo</span> bar')
'Foo bar'
>>> striptags('<span class="bar">Foo</span>')
'Foo'
>>> striptags('Foo<br />')
'Foo'

HTML/XML comments are stripped, too:

>>> striptags('<!-- <blub>hehe</blah> -->test')
'test'

Parameters:	text – the string to remove tags from
Returns:	the text with tags removed
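
The comment example above shows why comments need separate handling: a pattern that only matches "<...>" stops at the first ">" inside the comment. The sketch below contrasts the two approaches; striptags_sketch and naive_striptags are hypothetical names, and the patterns are an assumption rather than Genshi's actual regular expression.

```python
import re

# Generic tags only: stops at the first '>' even inside a comment.
_NAIVE_RE = re.compile(r'<[^>]*>')

# Comments are tried first, so everything up to '-->' is removed as a unit.
_TAG_RE = re.compile(r'<!--.*?-->|<[^>]*>', re.DOTALL)


def naive_striptags(text):
    """Hypothetical helper showing the comment pitfall."""
    return _NAIVE_RE.sub('', text)


def striptags_sketch(text):
    """Hypothetical re-implementation for illustration only."""
    return _TAG_RE.sub('', text)


sample = '<!-- <blub>hehe</blah> -->test'
naive_result = naive_striptags(sample)    # comment content leaks through
sketch_result = striptags_sketch(sample)  # whole comment is removed
```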