Python urllib2.urlopen()

2016-01-29 Python

Fetching a web page with Python:

import urllib2
import bs4

url = 'http://mimvp.com'
req = urllib2.Request(url)
content = urllib2.urlopen(req, timeout=600).read()
content = bs4.BeautifulSoup(content)
content = content.prettify()

Printing content shows garbled bytes instead of HTML, which suggests the response body is still compressed.

Warning message:

/usr/local/lib/python2.7/dist-packages/bs4/dammit.py:231: UnicodeWarning: Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.
  "Some characters could not be decoded, and were "
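
One way to confirm that the garbled output is a compressed body rather than a character-encoding problem is to look at its first two bytes: a gzip stream starts with the magic number 0x1f 0x8b. The snippet below is only a minimal sketch; the helper name looks_gzipped is hypothetical and does not come from the original post, and the sample payload is built in memory so the check can be tried offline.

```python
import gzip
import io


def looks_gzipped(data):
    """Return True if data starts with the gzip magic number (0x1f 0x8b)."""
    return data[:2] == b'\x1f\x8b'


# Build a small gzip payload in memory instead of fetching a real page.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(b'<html></html>')

print(looks_gzipped(buf.getvalue()))    # compressed body -> True
print(looks_gzipped(b'<html></html>'))  # plain HTML -> False
```

If the check returns True for the bytes read from urlopen(), the body needs to be decompressed before it is handed to BeautifulSoup.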

Solution:

import gzip
import StringIO
import urllib2
import zlib

import bs4

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36',
    # Advertise only the encodings that are decoded below (sdch is not handled).
    'Accept-Encoding': 'gzip, deflate',
}

url = 'http://mimvp.com'
req = urllib2.Request(url, headers=headers)
content = urllib2.urlopen(req, timeout=600).read()

try:
    # Try gzip first.
    content = gzip.GzipFile(fileobj=StringIO.StringIO(content)).read()
except IOError:
    # Not a gzip stream; fall back to zlib (deflate).
    content = zlib.decompress(content)

content = bs4.BeautifulSoup(content)
content = content.prettify()
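
The try/except above guesses the compression format by trial. A possible alternative, sketched here under assumptions, is to choose the decoder from the Content-Encoding response header. The function name decode_body is hypothetical, and the fallback to raw deflate is an addition that the original post does not cover (some servers send deflate data without the zlib header).

```python
import gzip
import io
import zlib


def decode_body(raw, content_encoding):
    """Decompress raw bytes according to a Content-Encoding header value."""
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    if encoding == 'deflate':
        try:
            # Standard form: deflate data wrapped in a zlib header.
            return zlib.decompress(raw)
        except zlib.error:
            # Assumed fallback: raw deflate data without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No encoding, or one this sketch does not handle: return bytes untouched.
    return raw
```

With urllib2 this could be called as decode_body(resp.read(), resp.info().getheader('Content-Encoding')), where resp is the object returned by urllib2.urlopen(). An uncompressed response passes through unchanged, which the try/except version does not handle.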

Recommended reference:

Url open encoding (Stack Overflow)

Repost attribution: Python urllib2.urlopen() (米扑博客)

Original link: https://blog.mimvp.com/article/12414.html
