Stephen's SEM Blog

Ultra-compact Python code for checking a Baidu keyword ranking, in 9 lines

December 11, 2019

----------------------------------------

Content:

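# -*- coding: utf-8 -*-
# Python 2 script: urllib2 and the print statement do not exist in Python 3.
from bs4 import BeautifulSoup
import re, urllib2, time
key = "杀价"        # keyword to search for
url = "elong.com"   # domain whose result we are looking for
key = urllib2.quote(key)   # percent-encode the keyword for the query string
t = time.time()
html = urllib2.urlopen("http://www.baidu.com/s?word=%s" % key).read()
soup = BeautifulSoup(html, "html.parser")
# First <span> whose text contains the domain; re.escape keeps "." literal.
# Note: find returns None if the domain is not on the page, and the lines below would then fail.
cache = soup.find("span", text=re.compile(re.escape(url)))
print cache.find_previous("table").get("id")    # id of the preceding result table
print cache.find_previous("a").get("href")      # link of that result as given by Baidu
print urllib2.urlopen(cache.find_previous("a").get("href")).geturl()   # URL after redirects
print cache.get_text().split(" ")[3]            # fourth space-separated field of the span text
print time.time() - t                           # elapsed time in seconds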

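The trick: use find to locate the span tag whose text contains the site's URL, then use find_previous to look up the preceding table, and get to return its id value.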
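As a small offline sketch of the lookup just described, the snippet below runs the same find / find_previous / get chain on a made-up HTML fragment. The markup, the ids, and the helper name find_rank are assumptions for illustration only, not Baidu's real page structure. It is written for Python 3 and uses string=, the newer bs4 name for the text= argument.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates a result page:
# one <table> per result, with the displayed URL in a <span>.
SAMPLE_HTML = """
<table id="1"><tr><td><a href="http://example.com/link1">First</a>
<span>www.other.com/ 12K 2013-1-1</span></td></tr></table>
<table id="2"><tr><td><a href="http://example.com/link2">Second</a>
<span>www.elong.com/ 30K 2013-1-2</span></td></tr></table>
"""

def find_rank(html, domain):
    """Return (table id, link href) for the first span mentioning domain, or None."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", string=re.compile(re.escape(domain)))
    if span is None:
        return None  # the domain does not appear in any span
    table = span.find_previous("table")  # nearest <table> opened before the span
    link = span.find_previous("a")       # nearest <a> before the span
    return table.get("id"), link.get("href")

print(find_rank(SAMPLE_HTML, "elong.com"))
```

Returning None when nothing matches avoids the AttributeError that a bare chain of calls would raise when the domain is not on the page.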

To find the URL after a redirect in Python, use urllib2.urlopen(url).geturl(): urlopen opens the address, and geturl returns the URL that was finally reached.
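The redirect lookup can be tried without touching the network. This is a minimal sketch under a few assumptions: it uses Python 3, where urllib2.urlopen lives at urllib.request.urlopen, and the local server, its /go and /final paths, and the helper name resolve_final_url are all invented for the demo.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    """Tiny local server: /go redirects to /final, everything else answers 200."""

    def do_GET(self):
        if self.path == "/go":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet

def resolve_final_url(url):
    """Open url, follow any redirects, and return the address finally reached."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.geturl()

# Ignore any system proxy so the local demo server is reached directly.
urllib.request.install_opener(
    urllib.request.build_opener(urllib.request.ProxyHandler({})))

server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

final_url = resolve_final_url(base + "/go")
server.shutdown()
server.server_close()
print(final_url)  # the /final address, not the /go address that was requested
```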

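Variable substitution uses %s (lowercase; an uppercase %S is not a valid format code) inside the string, followed right after the closing quote by % and the variable name.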
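A short sketch of the %s substitution, assuming Python 3, where urllib2.quote is available as urllib.parse.quote. The helper name build_search_url is hypothetical; the URL template is the one used in the script.

```python
from urllib.parse import quote

def build_search_url(keyword):
    """Percent-encode keyword and substitute it for %s in the URL template."""
    return "http://www.baidu.com/s?word=%s" % quote(keyword)

print(build_search_url("hotel deals"))

# Several values are passed as a tuple and fill the %s placeholders in order.
message = "%s was found in result table %s" % ("elong.com", 2)
print(message)
```

Encoding the keyword before substituting it matters for non-ASCII keywords and for spaces, which are not valid in a raw query string.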

To read a tag's text, call get_text(); the resulting string can then be broken into fields with split.
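The following sketch shows get_text() followed by split on an invented span. The fragment and the order of its fields are assumptions for the demo; the real layout of a result page may differ, so the index to pick has to be checked against actual output.

```python
from bs4 import BeautifulSoup

# Hypothetical span: nested tags are flattened by get_text().
snippet = "<span>www.elong.com/ <b>30K</b> 2013-1-2</span>"
span = BeautifulSoup(snippet, "html.parser").find("span")

text = span.get_text()    # all text inside the span, tags removed
fields = text.split(" ")  # split on single spaces
print(fields)
print(fields[2])          # third field of this made-up example
```

split(" ") produces empty strings when two spaces occur in a row; calling split() with no argument splits on any run of whitespace, which is usually more forgiving for scraped text.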

