#!/usr/bin/python
# -*- coding: utf-8 -*-

from urllib import urlencode
from urllib2 import urlopen
from base64 import encode
from hashlib import sha224

ERROR_MSG = 'No image'
IMG_URL = 'http://eur.i1.yimg.com/us.yimg.com/i/us/we/intl/26.gif'

def hash_img(img):
    return sha224(img).hexdigest()

def get_img(img_url):
    try:
        response = urlopen(img_url)
        img = response.read()

        return {'base64_img': img.encode('base64'),
                'content-type': response.info()['Content-Type'],
                'hash': hash_img(img)}
    except:
        return ERROR_MSG

print get_img(IMG_URL)
Refactorings
akaihola
October 9, 2008, 11:22
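You should probably be more specific when catching exceptions on line 23. A bare except: also swallows unrelated errors such as programming mistakes or a KeyboardInterrupt, so catch only the exceptions urllib2 is known to raise in this case (URLError, for instance) and let everything else propagate back to the caller.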
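As a rough illustration of the difference, here is a minimal sketch of narrow exception handling. It assumes Python 3, where urlopen and URLError live in urllib.request and urllib.error instead of urllib2. The names fetch_or_none, failing_opener and buggy_opener, and the opener parameter, are hypothetical and exist only so that both error paths can be tried without a network connection.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_or_none(url, opener=urlopen):
    """Return the body at `url`, or None if the fetch fails with URLError.

    `opener` is a hypothetical injection point (it defaults to urlopen)
    so the error paths can be demonstrated offline.
    """
    try:
        response = opener(url)
        return response.read()
    except URLError:
        # Expected network-level failure (HTTPError is a URLError subclass).
        return None
    # Anything else, such as a TypeError caused by a programming mistake,
    # is deliberately not caught and propagates to the caller.


def failing_opener(url):
    # Stand-in for urlopen hitting an unreachable host.
    raise URLError("simulated network failure")


def buggy_opener(url):
    # Stand-in for a bug that a bare except: would silently hide.
    raise TypeError("simulated programming error")
```

Calling fetch_or_none with failing_opener returns None, while calling it with buggy_opener raises TypeError, which is the behaviour a bare except: would have masked.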
Whether to assign values to variables before constructing the dictionary is a matter of taste, since you're using the values only once.
See http://en.wikipedia.org/wiki/SHA for a comparison of different SHA algorithms. If there's no specific reason to do otherwise, I'd use either SHA-1 or MD5.
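To make the comparison a little more concrete, the sketch below reports the hex digest length produced by three of the algorithms available in hashlib. It assumes Python 3, where the input must be bytes; the helper name hex_digest_lengths and the sample payload are made up for illustration. Digest length is only one criterion: MD5 and SHA-1 are no longer considered collision-resistant, so they fit change detection better than security-sensitive uses.

```python
import hashlib


def hex_digest_lengths(data, names=("md5", "sha1", "sha224")):
    """Map each algorithm name to the length of its hex digest for `data`."""
    return {name: len(hashlib.new(name, data).hexdigest()) for name in names}


# Made-up sample payload; real code would pass the downloaded bytes.
lengths = hex_digest_lengths(b"example image bytes")
print(lengths)  # {'md5': 32, 'sha1': 40, 'sha224': 56}
```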
In case of an error, I'd prefer to return a dictionary of the same shape with None as the values. An additional 'error' key could indicate what went wrong (None = no error, string = error description).
No need to import base64.encode or urllib.urlencode.
Since the functions are not specific to images, I'd rename identifiers accordingly.
#!/usr/bin/python
# -*- coding: utf-8 -*-

from urllib2 import urlopen, URLError
from hashlib import sha1

ERROR_MSG = 'Image not found'
IMG_URL = 'http://eur.i1.yimg.com/us.yimg.com/i/us/we/intl/26.gif'

def create_hash(data):
    return sha1(data).hexdigest()

def get_doc_info(url):
    result = dict.fromkeys(('base64_data', 'content-type', 'hash', 'error'))
    try:
        response = urlopen(url)
        data = response.read()
        result.update({'base64_data': data.encode('base64'),
                       'content-type': response.info()['Content-Type'],
                       'hash': create_hash(data)})
    except URLError:
        result['error'] = ERROR_MSG
    return result

print get_doc_info(IMG_URL)
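The refactoring above targets Python 2: urllib2, the print statement and data.encode('base64') are not available in Python 3. For readers on Python 3, a possible port is sketched below. It is an assumption about how the same idea might be expressed, not part of the original refactoring: it uses urllib.request, urllib.error and base64.b64encode, and it inlines the one-line hash helper.

```python
import base64
from hashlib import sha1
from urllib.error import URLError
from urllib.request import urlopen

ERROR_MSG = 'Image not found'


def get_doc_info(url):
    """Fetch `url` and describe it; every key is always present."""
    result = dict.fromkeys(('base64_data', 'content-type', 'hash', 'error'))
    try:
        with urlopen(url) as response:
            data = response.read()
            content_type = response.headers.get('Content-Type')
    except URLError:
        # Only network-level failures are reported through the 'error' key.
        result['error'] = ERROR_MSG
        return result
    result.update({
        # b64encode returns bytes; decode to get a plain ASCII string.
        'base64_data': base64.b64encode(data).decode('ascii'),
        'content-type': content_type,
        'hash': sha1(data).hexdigest(),
    })
    return result
```

A caller can then check result['error'] first and read the remaining keys only when it is None, for example with get_doc_info(IMG_URL) using the IMG_URL constant from the listing above.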
Is this a correct way to fetch an image, read its content type, encode it in base64, and compute a hash of it?
Should I assign img.encode('base64'), response.info()['Content-Type'], and hash_img(img) to variables first and then pass those variables as the dictionary values?
Should I use a specific function to get the hash?