import re

def process_html(str):
    pattern = re.compile(r'<object ([\w="\d+"]|\s)+>([\x20-\x7E\s])+</object>')
    match = pattern.match(str)
    return match.group()
Refactorings
nicerobot
March 17, 2010 12:46
1. Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses http://wiki.tcl.tk/4164
2. Your code refers to the match object as 'm' (line 6), but it is named 'match' (line 5).
3. Groups are referenced by number: capturing groups are numbered from 1, and group 0 (what group() returns with no argument) is the entire match.
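To illustrate point 3, here is a minimal sketch that uses only the opening-tag part of the question's pattern and a made-up input string. It shows that group() and group(0) return the whole match, while group(1) returns only the last repetition of a repeated capturing group:

```python
import re

# Opening-tag part of the question's pattern: one capturing group repeated by +.
pattern = re.compile(r'<object ([\w="\d+"]|\s)+>')

m = pattern.match('<object width="400">')

# group() and group(0) both return the entire matched text.
print(m.group())   # <object width="400">
print(m.group(0))  # <object width="400">

# group(1) is the first capturing group. Because it is repeated by +,
# only its LAST repetition is kept: here, the closing quote character.
print(m.group(1))  # "
```

So with this pattern, group(1) yields a single character rather than the whole <object> element.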
Note: I didn't change your re. I just changed lines 5 and 6.
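import re

def process_html(str):
    pattern = re.compile(r'<object ([\w="\d+"]|\s)+>([\x20-\x7E\s])+</object>')
    m = pattern.match(str)
    return m.group(0)  # group 0 is the whole match; group(1) would hold only the last character captured by the repeated first group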
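Following point 1, here is a minimal sketch that uses Python's standard-library html.parser instead of a regex. The names ObjectExtractor and extract_objects are hypothetical, and it assumes the embed code is reasonably well-formed. It returns the original markup of each top-level <object> element verbatim, and it handles nested <object> tags by tracking depth:

```python
from html.parser import HTMLParser


class ObjectExtractor(HTMLParser):
    """Collects the raw source of each top-level <object>...</object> element."""

    def __init__(self, html_source):
        super().__init__()
        self.html_source = html_source
        self.object_depth = 0     # current nesting level of <object> tags
        self.object_start = None  # absolute index where the current top-level <object> began
        self.objects = []         # raw markup of every complete top-level <object>

    def _absolute_pos(self):
        # getpos() gives a 1-based line number and a 0-based column;
        # convert that to an absolute index into html_source.
        lineno, offset = self.getpos()
        pos = 0
        for _ in range(lineno - 1):
            pos = self.html_source.index("\n", pos) + 1
        return pos + offset

    def handle_starttag(self, tag, attrs):
        if tag == "object":
            if self.object_depth == 0:
                self.object_start = self._absolute_pos()
            self.object_depth += 1

    def handle_endtag(self, tag):
        if tag == "object" and self.object_depth > 0:
            self.object_depth -= 1
            if self.object_depth == 0:
                # getpos() points at "</object"; include everything up to its ">".
                end = self.html_source.index(">", self._absolute_pos()) + 1
                self.objects.append(self.html_source[self.object_start:end])


def extract_objects(html):
    """Return a list with the raw markup of each top-level <object> element."""
    parser = ObjectExtractor(html)
    parser.feed(html)
    parser.close()
    return parser.objects
```

Because the parser understands tags, the trailing <p> caption is skipped without any pattern tuning. A surrounding <embed> or <param /> inside the object is simply copied along with it.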
rullon.myopenid.com
March 17, 2010 13:33
nicerobot, thanks for the reply!
The goal was to clean up Vimeo (or any other service's) embed-player code, so I decided not to parse anything.
We have:
--------
<object width="400" height="300"><param name="allowfullscreen" value="true" />
<param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=9851483&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1" />
<embed src="http://vimeo.com/moogaloop.swf?clip_id=9851483&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed>
</object>
<p><a href="http://vimeo.com/9851483">Gorillaz - Stylo</a> from <a href="http://vimeo.com/uccimaru">mario ucci</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
We want:
--------
<object width="400" height="300"><param name="allowfullscreen" value="true" />
<param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=9851483&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1" />
<embed src="http://vimeo.com/moogaloop.swf?clip_id=9851483&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed>
</object>
Is this a good way to extract a pattern match from a string?
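If you do stay with a regex for the "we have" to "we want" transformation, here is a minimal sketch under the assumption that the embed code contains a single, non-nested <object> element (a lazy .*? stops at the FIRST </object>, so nested objects would be cut short). OBJECT_RE and extract_object are hypothetical names:

```python
import re

# Lazy match from "<object" to the next "</object>".
# DOTALL lets "." cross newlines; IGNORECASE also accepts <OBJECT>.
OBJECT_RE = re.compile(r"<object\b.*?</object>", re.DOTALL | re.IGNORECASE)


def extract_object(html):
    """Return the first <object>...</object> block in html, or None if there is none."""
    # search() scans the whole string; match() would only try position 0.
    m = OBJECT_RE.search(html)
    # Checking for None avoids an AttributeError when nothing matches.
    return m.group(0) if m else None
```

Compared with the original pattern, this does not restrict which characters may appear in attributes or content, and it returns None instead of raising when the input has no <object> block.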