Sean O'Donnell's Weblog
I've been working on an application that needs to convert old-fashioned HTML to XHTML. Almost all of the libraries I have come across that do a good job are just wrappers around HTML Tidy, and very few of them can be installed with a simple apt-get. As a result, I didn't feel comfortable requiring them. I hate to think of someone tearing their hair out just trying to get my software installed before they can use it.
twisted.web came to my rescue. Its microdom module does a good job, is a simple apt-get away on Ubuntu, and installs easily on Debian Sarge.
Check out the following example:
>>> from twisted.web import microdom
>>> x = microdom.parseString("<div>hello<br>world</DIV>",beExtremelyLenient=1)
>>> x.toprettyxml()
'<?xml version="1.0"?><div>hello<br />world</div>'
Short and sweet. There is also an excellent introductory article on XML.com. So thank you to the twisted.web developers.
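If even Twisted is too heavy a dependency, the same basic idea can be roughed out with nothing but Python 3's standard library. The sketch below is a swapped-in technique built on html.parser, not how microdom works internally, and the names XHTMLWriter and html_to_xhtml are hypothetical. It only covers the basics: lowercased tag names, self-closed void elements such as <br />, quoted attribute values, and closing any tags the original markup left open.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never take a closing tag; in XHTML they must self-close.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class XHTMLWriter(HTMLParser):
    """Hypothetical helper: re-serialize lenient HTML as well-formed XHTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized output fragments
        self.stack = []  # non-void elements that are still open

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            # Minimized attributes like <input checked> become checked="checked".
            value = name if value is None else value
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        if tag in VOID_ELEMENTS:
            self.out.append("<%s />" % " ".join(parts))
        else:
            self.out.append("<%s>" % " ".join(parts))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Void elements were already self-closed; stray end tags are dropped.
        if tag in VOID_ELEMENTS or tag not in self.stack:
            return
        # Close anything left open inside this element, e.g. <b><i>x</b>.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Entities were decoded by the parser; re-escape &, < and > for XML.
        self.out.append(escape(data, quote=False))


def html_to_xhtml(markup):
    """Hypothetical convenience wrapper around XHTMLWriter."""
    writer = XHTMLWriter()
    writer.feed(markup)
    writer.close()
    # Close whatever the source markup never closed.
    while writer.stack:
        writer.out.append("</%s>" % writer.stack.pop())
    return "".join(writer.out)


print(html_to_xhtml("<div>hello<br>world</DIV>"))
# prints: <div>hello<br />world</div>
```

Unlike microdom's output above, this sketch emits no XML declaration and drops comments and doctypes; it is meant to show the shape of the problem, not to replace a real parser.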