Data Collection Overview
Crawler: a script that automatically fetches data in bulk from specific web pages.
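Python Crawler Techniques
Python crawler skills:
- Static page scraping (urllib/requests/BeautifulSoup/lxml)
- Dynamic page scraping (ajax/phantomjs/selenium)
- Crawler framework (scrapy)
- Supporting knowledge: front-end basics, databases, text processing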
Python crawler environment setup
- Platform: Windows 7/10
- Python distribution: Anaconda 3.5 or later (Python 3.6)
- MySQL database
- MongoDB database
- Navicat database client
- PyCharm IDE
- Chrome browser
The basic four-step crawler workflow in Python
- Request: urllib/requests
- Parse: BeautifulSoup/lxml
- Extract: CSS selectors / XPath expressions / regular expressions
- Store: CSV/MySQL/MongoDB, etc.
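The four steps listed above can be sketched end to end without touching the network. This is only an illustration under stated assumptions: the HTML string stands in for a downloaded page, the standard library's html.parser replaces BeautifulSoup/lxml so the snippet runs without extra installs, and the names SAMPLE_HTML, LinkCollector and to_csv_text are made up for the example.

```python
import csv
import io
from html.parser import HTMLParser

# Step 1 (request): stand-in for the text a real request would return.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.com/a">First story</a></li>
  <li><a href="https://example.com/b">Second story</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Steps 2 and 3 (parse, extract): collect the href and text of every <a>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append({"title": "".join(self._text).strip(), "link": self._href})
            self._href = None


def to_csv_text(rows):
    """Step 4 (store): serialise the rows as CSV; a real file object works the same way."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "link"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
csv_text = to_csv_text(collector.rows)
print(csv_text)
```

In a real crawler, step 1 would be a urllib or requests call, and step 4 would write to a file or a database instead of an in-memory buffer.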
urllib: part of Python's standard library; it provides a set of tools for working with URLs.
Requesting a page directly with urllib
from urllib.request import urlopen
url = "https://www.python.org/"
response = urlopen(url)
content = response.read()
# read() returns bytes, so the body has to be decoded
content = content.decode('utf-8')
print(content)
# Calling urlopen on a bare URL is rather blunt; sometimes the request should be built more carefully first
import urllib.request
url = "https://www.python.org/"
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
#print(content)
print(response.geturl())
print(response.info())
# Print the HTTP status code of the response
print(response.getcode())
print(type(response))
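One common reason to build a Request object first is to attach headers before anything is sent. A minimal sketch, assuming a placeholder User-Agent string; constructing the Request does not contact the server, so it can be inspected offline.

```python
import urllib.request

url = "https://www.python.org/"
# Placeholder User-Agent for illustration; substitute your browser's real string.
headers = {"User-Agent": "Mozilla/5.0 (example)"}

request = urllib.request.Request(url, headers=headers)

# Building the Request sends nothing; it can be inspected first.
print(request.full_url)                  # https://www.python.org/
print(request.get_method())              # GET, because no data was attached
print(request.get_header("User-agent"))  # urllib stores header names capitalised

# Sending it works exactly as before:
# response = urllib.request.urlopen(request, timeout=10)
```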
The requests library
import requests
res = requests.get('https://www.python.org/')
print(res.status_code)
print(res.text)
#print(res.content)
Setting request headers
url = 'https://www.python.org/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'
}
res = requests.get(url, headers=headers)
print(res)
requests methods
- get
- post
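The practical difference between the two methods is where the parameters travel. The sketch below prepares one request of each kind without sending it, so the result can be inspected offline; httpbin.org and the parameter names are placeholders chosen for illustration.

```python
import requests

# httpbin.org is used only as a placeholder endpoint; nothing is sent here.
get_req = requests.Request(
    "GET", "https://httpbin.org/get", params={"q": "python", "page": 1}
).prepare()
post_req = requests.Request(
    "POST", "https://httpbin.org/post", data={"q": "python", "page": 1}
).prepare()

print(get_req.method, get_req.url)    # parameters travel in the query string
print(post_req.method, post_req.url)  # the URL carries no parameters
print(post_req.body)                  # they are form-encoded into the body

# Sending for real would look like:
# res = requests.get("https://httpbin.org/get", params={"q": "python", "page": 1})
# res = requests.post("https://httpbin.org/post", data={"q": "python", "page": 1})
```

The POST form matters for sites that load their listings through Ajax calls, where the page number and keyword are sent as form data.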
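Parsing library: BeautifulSoup
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document. It can save hours or even days of work.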
import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'
}
url = 'http://news.qq.com/'
Soup = BeautifulSoup(requests.get(url=url, headers=headers).text.encode("utf-8"), 'lxml')
# Each headline is an <a> inside an <em class="f14 l24">
em = Soup.find_all('em', attrs={'class': 'f14 l24'})
for i in em:
    title = i.a.get_text()
    link = i.a['href']
    print({'title': title,
           'link': link
           })
{'title': '人民日报:美国贸易政策给世界经济增添下行风险', 'link': 'https://new.qq.com/omn/20180923/20180923A06RBP.html'}
{'title': '外媒关注广深港高铁开通 尝鲜乘客:我给它打9分', 'link': 'https://new.qq.com/omn/20180923/20180923A0QKU4.html'}
{'title': '哈梅内伊:阅兵式袭击黑手是美在中东“傀儡国”', 'link': 'https://new.qq.com/omn/20180923/20180923A0CAG0.html'}
{'title': '美智库公布2018全球军力排行:中国名列第几名?', 'link': 'https://new.qq.com/omn/20180923/20180923A0EKRA.html'}
{'title': '南航一毕业生用300架无人机表白母校照亮南京上空', 'link': 'https://new.qq.com/omn/20180923/20180923V0TUVP.html'}
{'title': '升级!外交部、文化和旅游部提醒中国公民在瑞典注意安全', 'link': 'http://new.qq.com/omn/20180923/20180923A0MZXW.html'}
{'title': '美国发生一起入室行凶案 致2名中国留学生一死一伤', 'link': 'http://new.qq.com/cmsn/20180923/20180923006000.html'}
{'title': '中国驻澳大利亚使馆提醒赴澳中国公民注意换汇安全', 'link': 'http://new.qq.com/omn/20180923/20180923A0ZWZN.html'}
{'title': '鸿茅药酒风波后销量回升 “神药”的命为啥这么硬?', 'link': 'http://new.qq.com/omn/20180923/20180923A0SZ68.html'}
{'title': 'iPhone XS四次摔落实验结果:前后玻璃完好无损', 'link': 'http://new.qq.com/cmsn/20180923/TEC2018092300765100'}
{'title': '这两天为了捍卫领土主权,英国人叫嚣要与这个欧洲大国开战', 'link': 'http://new.qq.com/omn/20180923/20180923A0MSEC.html'}
{'title': '定性!俄国防部:俄伊尔-20被击落,以色列空军应负全责', 'link': 'http://new.qq.com/omn/20180923/20180923A16T9E.html'}
{'title': '因为发了本国总统和特朗普的这张合影 电视台小编被开除', 'link': 'http://new.qq.com/omn/20180923/20180923A0ORF5.html'}
{'title': '20:30视频直播西汉姆vs切尔西 23时阿森纳vs埃弗顿', 'link': 'http://new.qq.com/zt/template/?id=SPO2018072003223400'}
{'title': '杨振宁:对中国科学家贡献的记载工作“一塌糊涂”', 'link': 'http://new.qq.com/omn/20180923/20180923A12ZA6.html'}
{'title': '较真|放过码农吧!“代码不规范导致枪击案”是自媒体瞎编的', 'link': 'http://new.qq.com/omn/20180923/20180923A1A5KV.html'}
{'title': '一口气吃完17家互联网公司的月饼 我真的想家了', 'link': 'http://new.qq.com/omn/20180923/20180923A0IVKK.html'}
{'title': '男子地铁里赤脚横躺座椅 乘客看到后默默将其鞋子踢出车厢', 'link': 'http://new.qq.com/omn/20180923/20180923A06XVJ.html'}
{'title': '河南南召一村民羁押期间死亡 警方回应:初步认定因病死亡', 'link': 'http://new.qq.com/omn/20180923/20180923B03ZYS.html'}
{'title': '庄河一小区发生重大刑事案件 警方悬赏万元通缉嫌疑人', 'link': 'http://new.qq.com/omn/20180923/20180923A0P2Y4.html'}
{'title': '湖南安化三中学生出现疑似感染性腹泻病例 累计报告发病55例', 'link': 'http://new.qq.com/cmsn/20180923/20180923007573.html'}
{'title': '重庆农民揣大学文凭 20年间未摆脱做搬运工命运', 'link': 'http://new.qq.com/omn/20180922/20180922A1624C.html'}
{'title': '大家丨月薪4K妹子不愿嫁月薪15K外卖小哥 是有道理的', 'link': 'http://new.qq.com/cmsn/20180923/20180923006755.html'}
{'title': '今日话题丨“颜值即正义”的时代:是什么将女性推上求美之路', 'link': 'http://new.qq.com/cmsn/20180923/20180923005457.html'}
{'title': '贾樟柯:拍了二十年电影,才知道人为什么活着', 'link': 'http://new.qq.com/omn/20180922/20180922A1OKHY.html'}
...
{'title': '金鹰奖投票又反转,热巴破150万胡歌又进前三,李易峰有望第一', 'link': 'http://new.qq.com/omn/20180923A0U23O.html'}
{'title': '“富察皇后”秦岚再度抵港人气旺,帮捡话筒罩超暖心', 'link': 'http://new.qq.com/omn/20180923A0U3EX.html'}
{'title': '《如懿传》嘉贵妃失宠,挑唆儿子造反,被皇帝扇耳光', 'link': 'http://new.qq.com/omn/20180923A0T1CS.html'}
{'title': '暑期电视剧最红的6个配角,海兰排第3,第1无可争议', 'link': 'http://new.qq.com/omn/20180923A0T69T.html'}
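Because the live page changes over time, the same find_all pattern is easier to try out on a fixed snippet. A small sketch under assumptions: the HTML below is invented to mimic the em/a structure targeted above, and Python's built-in html.parser is used as the parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Invented markup that mimics the <em class="f14 l24"><a ...> structure.
html = """
<div>
  <em class="f14 l24"><a href="https://example.com/1">Headline one</a></em>
  <em class="f14 l24"><a href="https://example.com/2">Headline two</a></em>
  <em class="f14 l24">No link here</em>
  <em class="other"><a href="https://example.com/3">Skipped</a></em>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for em in soup.find_all("em", attrs={"class": "f14 l24"}):
    # em.a is None when the <em> has no <a> child, so guard before using it.
    if em.a is None:
        continue
    results.append({"title": em.a.get_text(strip=True), "link": em.a["href"]})

print(results)
```

The guard on em.a is worth keeping in a real crawl as well, since one malformed entry would otherwise stop the loop with an error.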
Parsing library: lxml
import requests
from lxml import etree
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
url = 'http://news.qq.com/'
html = requests.get(url=url, headers=headers)
con = etree.HTML(html.text)
# One XPath query for the headline text, one for the link target
title = con.xpath('//em[@class="f14 l24"]/a/text()')
link = con.xpath('//em[@class="f14 l24"]/a/@href')
for i in zip(title, link):
    print({'title': i[0],
           'link': i[1]
           })
The output is identical to that of the BeautifulSoup example above.
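Zipping two independent XPath result lists assumes every matched element yields exactly one title and one link; if one <a> has no text, the lists drift out of step. One way around this, sketched here on invented markup, is to select the <a> elements once and read both values from each element.

```python
from lxml import etree

# Invented markup; the second <a> deliberately has no text.
html = """
<div>
  <em class="f14 l24"><a href="https://example.com/1">Headline one</a></em>
  <em class="f14 l24"><a href="https://example.com/2"></a></em>
  <em class="f14 l24"><a href="https://example.com/3">Headline three</a></em>
</div>
"""
con = etree.HTML(html)

# Two separate queries: the empty <a> contributes a link but no title.
titles = con.xpath('//em[@class="f14 l24"]/a/text()')
links = con.xpath('//em[@class="f14 l24"]/a/@href')
print(len(titles), len(links))

# One query for the elements, then read both fields from the same node.
rows = []
for a in con.xpath('//em[@class="f14 l24"]/a'):
    rows.append({"title": (a.text or "").strip(), "link": a.get("href")})
print(rows)
```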
Ways to extract information
- CSS selectors: the select method
- XPath expressions
- Regular expressions
# select method
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
url = 'http://news.qq.com/'
Soup = BeautifulSoup(requests.get(url=url, headers=headers).text.encode("utf-8"), 'lxml')
# The CSS selector matches every <a> nested inside <em class="f14 l24">
em = Soup.select('em[class="f14 l24"] a')
for i in em:
    title = i.get_text()
    link = i['href']
    print({'title': title,
           'link': link
           })
# XPath expressions
import requests
import lxml.html as HTML
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
url = 'http://news.qq.com/'
con = HTML.fromstring(requests.get(url=url, headers=headers).text)
title = con.xpath('//em[@class="f14 l24"]/a/text()')
link = con.xpath('//em[@class="f14 l24"]/a/@href')
for i in zip(title, link):
    print({'title': i[0], 'link': i[1]
           })
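The list of extraction methods also names regular expressions, which the two snippets above do not show. A minimal sketch on invented markup follows. The pattern is an assumption tailored to this exact structure, and regular expressions are brittle against real-world HTML, so they are best kept for small, predictable fragments.

```python
import re

# Invented markup; the second <a> has an extra attribute before href.
html = """
<em class="f14 l24"><a href="https://example.com/1">Headline one</a></em>
<em class="f14 l24"><a target="_blank" href="https://example.com/2">Headline two</a></em>
"""

# Capture the href value and the link text of each <a> inside the target <em>.
pattern = re.compile(
    r'<em class="f14 l24">\s*<a[^>]*?href="([^"]+)"[^>]*>(.*?)</a>',
    re.S,
)

matches = pattern.findall(html)
for link, title in matches:
    print({"title": title.strip(), "link": link})
```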
Static data collection: Lagou
# Import the required libraries
import requests
from lxml import etree
import pandas as pd
from time import sleep
import random
# Cookie copied from your own browser session
cookie = 'your cookie'
# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
    'Cookie': cookie  # pass the variable, not the literal string 'cookie'
}
frames = []  # one DataFrame per listing page
# After inspecting the page structure, loop over the listing pages
for page in range(1, 6):
    sleep(random.randint(3, 10))
    url = 'https://www.lagou.com/zhaopin/jiqixuexi/{}/?filterOption=3'.format(page)
    print('Fetching page {}...'.format(page), url)
    # Request the page and parse it
    con = etree.HTML(requests.get(url=url, headers=headers).text)
    # Extract each target field with an XPath expression
    job_name = con.xpath("//a[@class='position_link']/h3/text()")
    job_address = con.xpath("//a[@class='position_link']/span/em/text()")
    job_company = con.xpath("//div[@class='company_name']/a/text()")
    job_salary = con.xpath("//span[@class='money']/text()")
    job_exp_edu = con.xpath("//div[@class='li_b_l']/text()")
    # Drop the whitespace-only text nodes
    job_exp_edu2 = [s.strip() for s in job_exp_edu if s.strip()]
    job_industry = con.xpath("//div[@class='industry']/text()")
    job_tempation = con.xpath("//div[@class='list_item_bot']/div[@class='li_b_r']/text()")
    job_links = con.xpath("//div[@class='p_top']/a/@href")
    # Follow each detail-page link and collect the job description
    job_des = []
    for link in job_links:
        sleep(random.randint(3, 10))
        #print(link)
        con2 = etree.HTML(requests.get(url=link, headers=headers).text)
        des = [p.xpath('string(.)') for p in con2.xpath("//dd[@class='job_bt']/div/p")]
        job_des.append(des)
    # Pack the fields into a dictionary
    dataset = {
        'job_title': job_name,
        'location': job_address,
        'company': job_company,
        'salary': job_salary,
        'experience_education': job_exp_edu2,
        'industry': job_industry,
        'benefits': job_tempation,
        'requirements': job_des
    }
    frames.append(pd.DataFrame(dataset))
    break  # demo: stop after the first page; remove this line to crawl all five
# Combine the pages into one DataFrame and save it as CSV
data = pd.concat(frames, ignore_index=True)
data.to_csv('machine_learning_hz_job2.csv')
# The same crawl wrapped in a function
import requests
from lxml import etree
import pandas as pd
from time import sleep
import random
def static_crawl():
    cookie = 'your cookie'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        'Cookie': cookie  # pass the variable, not the literal string 'cookie'
    }
    frames = []  # one DataFrame per listing page
    for page in range(1, 7):
        sleep(random.randint(3, 10))
        url = 'https://www.lagou.com/zhaopin/jiqixuexi/{}/?filterOption=3'.format(page)
        print('Fetching page {}...'.format(page), url)
        con = etree.HTML(requests.get(url=url, headers=headers).text)
        job_name = con.xpath("//a[@class='position_link']/h3/text()")
        job_address = con.xpath("//a[@class='position_link']/span/em/text()")
        job_company = con.xpath("//div[@class='company_name']/a/text()")
        job_salary = con.xpath("//span[@class='money']/text()")
        job_exp_edu = con.xpath("//div[@class='li_b_l']/text()")
        job_exp_edu2 = [s.strip() for s in job_exp_edu if s.strip()]
        job_industry = con.xpath("//div[@class='industry']/text()")
        job_tempation = con.xpath("//div[@class='list_item_bot']/div[@class='li_b_r']/text()")
        job_links = con.xpath("//div[@class='p_top']/a/@href")
        job_des = []
        for link in job_links:
            sleep(random.randint(3, 10))
            #print(link)
            con2 = etree.HTML(requests.get(url=link, headers=headers).text)
            des = [p.xpath('string(.)') for p in con2.xpath("//dd[@class='job_bt']/div/p")]
            job_des.append(des)
        lagou_dict = {
            'job_title': job_name,
            'location': job_address,
            'company': job_company,
            'salary': job_salary,
            'experience_education': job_exp_edu2,
            'industry': job_industry,
            'benefits': job_tempation,
            'requirements': job_des
        }
        # Keep every page instead of overwriting the previous one
        frames.append(pd.DataFrame(lagou_dict))
    crawl_data = pd.concat(frames, ignore_index=True)
    # The original called data.to_csv here, but data is not defined inside the function
    crawl_data.to_csv('machine_learning_hz_job2.csv')
    return crawl_data
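pd.DataFrame raises a ValueError when the lists in the dictionary differ in length, which happens as soon as one posting lacks a field. A hedged sketch of one workaround follows: pad the shorter columns with None before building rows. The helper name columns_to_rows and the sample values are invented for illustration. Padding only prevents the error; it cannot tell which posting a missing value belonged to, so extracting all fields from each listing element in turn is the more robust fix.

```python
from itertools import zip_longest


def columns_to_rows(columns):
    """Turn {name: list} into a list of row dicts, padding short lists with None."""
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip_longest(*columns.values(), fillvalue=None)
    ]


# Invented sample: the salary column is one item short.
columns = {
    "job_name": ["ML Engineer", "Data Scientist", "Algorithm Engineer"],
    "company": ["Company A", "Company B", "Company C"],
    "salary": ["20k-40k", "25k-50k"],
}

rows = columns_to_rows(columns)
for row in rows:
    print(row)
# pd.DataFrame(rows) would now build without a length error.
```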
Dynamic data collection: Lagou
import json
import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Main crawl function
def lagou_dynamic_crawl():
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        'Host':'www.lagou.com',
        'Referer':'https://www.lagou.com/jobs/list_%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0?px=default&city=%E5%85%A8%E5%9B%BD',
        'X-Anit-Forge-Code':'0',
        'X-Anit-Forge-Token':None,
        'X-Requested-With':'XMLHttpRequest',
        'Cookie': 'your cookie'
    }
    # Container for the position records
    positions = []
    # Loop over 30 result pages
    for page in range(1, 31):
        print('Fetching page {}...'.format(page))
        # Build the form parameters for the request
        params = {
            'first':'true',
            'pn':page,
            'kd':'数据挖掘'  # search keyword: "data mining"
        }
        # Send the request and keep the response
        result = requests.post('https://www.lagou.com/jobs/positionAjax.json?px=default&needAddtionalResult=false',
                               headers=headers, data=params)
        # Convert the response body to JSON
        json_result = result.json()
        # Walk the JSON structure down to the list of positions
        position_info = json_result['content']['positionResult']['result']
        # For each position on this page, also fetch its detail page
        for position in position_info:
            # Put the fields we want into a dictionary
            position_dict = {
                'position_name':position['positionName'],
                'work_year':position['workYear'],
                'education':position['education'],
                'salary':position['salary'],
                'city':position['city'],
                'company_name':position['companyFullName'],
                'address':position['businessZones'],
                'label':position['companyLabelList'],
                'stage':position['financeStage'],
                'size':position['companySize'],
                'advantage':position['positionAdvantage'],
                'industry':position['industryField'],
                'industryLables':position['industryLables']
            }
            # Look up the position ID
            position_id = position['positionId']
            # Use the ID to fetch the job description (JD)
            position_dict['position_detail'] = recruit_detail(position_id)
            positions.append(position_dict)
        time.sleep(4)
    print('All data has been collected.')
    return positions
# Function that fetches the job description for one position
def recruit_detail(position_id):
    headers = {
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
        'Host':'www.lagou.com',
        'Referer':'https://www.lagou.com/jobs/list_%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0?labelWords=&fromSearch=true&suginput=',
        'Upgrade-Insecure-Requests':'1',
        'Cookie': 'your cookie'
    }
    url = 'https://www.lagou.com/jobs/%s.html' % position_id
    result = requests.get(url, headers=headers)
    time.sleep(5)
    # Parse the job-description text
    soup = BeautifulSoup(result.text, 'html.parser')
    job_jd = soup.find(class_="job_bt")
    # Trial runs showed that some records have no description,
    # so that case has to be handled explicitly
    if job_jd is not None:
        job_jd = job_jd.text
    else:
        job_jd = 'null'
    return job_jd
if __name__ == '__main__':
    positions = lagou_dynamic_crawl()
Fetching page 1...
Fetching page 2...
Fetching page 3...
Fetching page 4...
Fetching page 5...
Fetching page 6...
Fetching page 7...
Fetching page 8...
Fetching page 9...
Fetching page 10...
Fetching page 11...
Fetching page 12...
Fetching page 13...
Fetching page 14...
Fetching page 15...
Fetching page 16...
Fetching page 17...
Fetching page 18...
Fetching page 19...
Fetching page 20...
Fetching page 21...
Fetching page 22...
Fetching page 23...
Fetching page 24...
Fetching page 25...
...
Fetching page 28...
Fetching page 29...
Fetching page 30...
All data has been collected.
df = pd.DataFrame(positions)
df.shape
df.to_csv('data_mining_hz.csv')
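The chained indexing json_result['content']['positionResult']['result'] raises a KeyError whenever a response lacks one of those keys, for instance if the site answers with an error message instead of results. A small sketch of a more defensive lookup follows. The helper name extract_positions is invented, and both payloads are made-up stand-ins shaped like the structure the crawl code expects, not real responses.

```python
import json


def extract_positions(payload):
    """Return the list of position dicts, or [] if the expected keys are absent."""
    content = payload.get("content") or {}
    position_result = content.get("positionResult") or {}
    return position_result.get("result") or []


# Made-up payloads shaped like the structure the crawl code expects.
ok_payload = json.loads(
    '{"content": {"positionResult": {"result": '
    '[{"positionId": 1, "positionName": "Data Mining Engineer", "salary": "20k-30k"}]}}}'
)
error_payload = json.loads('{"success": false, "msg": "placeholder error message"}')

for payload in (ok_payload, error_payload):
    positions = extract_positions(payload)
    print(len(positions), [p.get("positionName") for p in positions])
```

Returning an empty list lets the page loop continue or stop cleanly, and the caller can log the unexpected payload instead of crashing partway through a long crawl.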