Parsing .asx playlist files in python

Michael L Torrie torriem at chem.byu.edu
Fri Sep 28 21:29:30 MDT 2007


So I'm trying to parse asx playlist files in python and then, using
mplayer, download each stream in the playlist and dump it to a wav file
which I will encode to mp3 or ogg.  I've run into problems.  Rather than
brute-force parse the playlist, I thought I'd just use an XML parser,
since it appears to be xml. Turns out it is not.  It's
almost-but-not-quite-xml.  Totem appears to have a generic playlist
parser in library form, which would be very slick to use since it
handles all the urls automatically.  There's an example program in
python that shows how it works:

http://www.koders.com/python/fid0BEF6F4DF8160B66BE169648C2085F4963D3A694.aspx?s=%22smtp+server%22

Sadly, though, totem's parser appears either to treat the playlist
as real xml, or else to expect the playlist to be a bit more
well-formed than it is, because it fails miserably on
NPR's playlists.  Even Totem the player won't play them.  mplayer
and xine will (they must have their own parsers), but those
parsers aren't in library form for use from python.

Here's an example that fails on totem on my machine:
$ totem "http://www.npr.org/templates/dmg/dmg_wmref_em.php?id=2&type=2&date=28-Sep-2007&au=1&pid=36064625&random=3317012573&guid=00056CB93AD1065B53CE3F6861626364&upf=Linux%20i686&v1st=B593B169B3D5A462&mtype=WM&ssid=&topicName=&subtopicName=&prgCode=ATC&hubId=&thingId=&tableModifier="
Entity: line 2: parser error : EntityRef: expecting ';'
<ENTRYREF href="http://www.npr.org/templates/dmg/dmg_em.php?id=2&type=2&date=28-
                                                                     ^
Entity: line 2: parser error : EntityRef: expecting ';'
<ENTRYREF href="http://www.npr.org/templates/dmg/dmg_em.php?id=2&type=2&date=28-
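
The parser error points at the bare '&' characters in the href, which
XML requires to be written as '&amp;'.  A minimal sketch of one possible
workaround, assuming that the unescaped ampersands are the only
well-formedness problem in the playlist: escape them first, then hand the
text to an ordinary XML parser.  The function name asx_hrefs and the
sample playlist (with example.com URLs) are made up for illustration, and
the tag and attribute names are compared case-insensitively because ASX
files mix cases freely.

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT start an entity such as &amp; or &#38;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def asx_hrefs(text):
    """Return href values of REF and ENTRYREF elements, in document order."""
    # Escape bare ampersands so the XML parser accepts the query strings.
    fixed = BARE_AMP.sub("&amp;", text)
    root = ET.fromstring(fixed)
    urls = []
    for elem in root.iter():
        # XML is case-sensitive, ASX is not, so normalise before comparing.
        if elem.tag.lower() in ("ref", "entryref"):
            for name, value in elem.attrib.items():
                if name.lower() == "href":
                    urls.append(value)
    return urls


# Hypothetical playlist, modelled on the failing ENTRYREF line above.
sample = """<ASX version="3.0">
<ENTRYREF href="http://example.com/a.php?id=2&type=2&date=28-Sep-2007" />
<Entry><Ref HREF="mms://example.com/stream?x=1&amp;y=2" /></Entry>
</ASX>"""

for url in asx_hrefs(sample):
    print(url)
```

This still fails on playlists whose opening and closing tags differ in
case, or that use entities XML does not predefine, so it is only a first
step short of a fully tolerant parser.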

Besides ad-hoc parsing the thing, does anyone have any better ideas?

thanks.

Michael



