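Scraping the 7-day temperature forecast
Weather page URL: "http://www.weather.com.cn/weather/101200101.shtml"
The error is unrelated to the Python version.
After 6 p.m. the page shows only one temperature for the current day, and the script fails with an index-out-of-range error.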
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    a = requests.get(url)
    b = a.content.decode()
    html = etree.HTML(b)
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    # print(li_list)
    weather_list = []
    for li in li_list:
        item = {}
        # Date
        dates = li.xpath("./h1/text()")[0]
        # Temperature. After 6 p.m. today's <li> has no <span>,
        # so the [0] on the next line raises IndexError.
        max_tem = li.xpath("./p[@class = 'tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class = 'tem']/i/text()")[0]
        # Strip the degree sign and convert to int
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem)
        item["dates"] = dates
        # print(item)
        weather_list.append(item)
    # print(weather_list)
    with open("weater.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")
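The failure can be reproduced offline. The sketch below runs the same XPath expressions against a hand-written HTML fragment. The fragment is an assumption about the page structure in the evening (today's entry has an `<i>` but no `<span>`), not a copy of the real page, which may differ.

```python
from lxml import etree

# Hand-written stand-in for the forecast list (assumed structure, not the real page).
SAMPLE = """
<ul class="t clearfix">
  <li><h1>1st (today)</h1><p class="tem"><i>12℃</i></p></li>
  <li><h1>2nd (tomorrow)</h1><p class="tem"><span>25</span>/<i>14℃</i></p></li>
</ul>
"""

html = etree.HTML(SAMPLE)
li_list = html.xpath(".//ul[@class = 't clearfix']/li")

# Count the <span> text nodes per day: the evening "today" entry has none.
span_counts = [len(li.xpath("./p[@class = 'tem']/span/text()")) for li in li_list]
print(span_counts)

# Indexing an empty XPath result is what raises the error.
try:
    li_list[0].xpath("./p[@class = 'tem']/span/text()")[0]
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__
print(error_name)
```

`xpath()` returns a plain Python list, so a missing element produces an empty list and `[0]` fails exactly like any other list index.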
First revision: add try/except to skip values that are out of range. The result has no entry for the current day.
import json

import requests
from lxml import etree

url = "http://www.weather.com.cn/weather/101200101.shtml"
# Adjacent string literals are joined without a separator, so the spaces are written explicitly.
ua = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
response = requests.get(url, headers=ua)
# print(response.content.decode())
content = response.content.decode()
html = etree.HTML(content)
# Select the <li> children, one per day, not the <ul> itself
li_list = html.xpath("//ul[@class='t clearfix']/li")
# print(li_list)
weather_list = []
for li in li_list:
    try:
        item = {}
        dates = li.xpath("./h1/text()")[0]
        max_tem = li.xpath("./p[@class='tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class='tem']/i/text()")[0]
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem.replace("℃", ""))
        item["dates"] = dates
    except IndexError:
        # A value is missing (today's entry after 6 p.m.): skip this day
        continue
    # print(item)
    weather_list.append(item)
print(weather_list)
with open("weather.json", "w", encoding="utf-8") as f:
    json.dump(weather_list, f, ensure_ascii=False, indent=2)
print("Weather file saved successfully")
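Why the current day disappears can be shown without any network access. In the sketch below, each tuple is a made-up stand-in for the three XPath `text()` results of one day (date, high, low); the sample values are assumptions for illustration only.

```python
# Each tuple mimics the XPath results for one day: (h1 texts, span texts, i texts).
# The first day has an empty "span" list, as in the evening case.
days = [
    (["1st (today)"], [], ["12℃"]),
    (["2nd (tomorrow)"], ["25"], ["14℃"]),
]

weather_list = []
for dates, spans, lows in days:
    try:
        item = {
            "dates": dates[0],
            "max_tem": int(spans[0]),  # IndexError when the list is empty
            "min_tem": int(lows[0].replace("℃", "")),
        }
    except IndexError:
        continue  # the whole day is dropped, not just the missing field
    weather_list.append(item)

print(weather_list)
```

Because the try block covers the whole item, one missing field discards the date and the low temperature as well, which is why the saved file starts from the following day.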
Second revision: add an if check so that a current-day entry with only one value is handled. This version runs correctly.
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    a = requests.get(url)
    b = a.content.decode()
    html = etree.HTML(b)
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    # print(li_list)
    weather_list = []
    for li in li_list:
        item = {}
        # Date (kept as the raw XPath result, i.e. a list)
        dates = li.xpath("./h1/text()")
        # Temperature
        if not (li.xpath("./p[@class = 'tem']/span/text()")):
            # No <span>: only the low temperature is available
            min_tem = li.xpath("./p[@class = 'tem']/i/text()")
            item["min_tem"] = min_tem
            item["dates"] = dates
        else:
            max_tem = li.xpath("./p[@class = 'tem']/span/text()")
            min_tem = li.xpath("./p[@class = 'tem']/i/text()")
            # Values are stored as lists, without stripping the degree sign
            item["min_tem"] = min_tem
            item["max_tem"] = max_tem
            item["dates"] = dates
        # print(item)
        weather_list.append(item)
    # print(weather_list)
    with open("weater2.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")
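Unlike the first script, this revision stores the raw XPath lists, so every field in the JSON file is a list such as `["14℃"]`. If integer temperatures are wanted again, one possible post-processing step is sketched below. The `flatten` helper and the sample data are hypothetical and only mirror the shape this revision produces.

```python
import json

# Hypothetical sample shaped like the second revision's output (values are lists).
raw = [
    {"dates": ["1st (today)"], "min_tem": ["12℃"]},
    {"dates": ["2nd (tomorrow)"], "max_tem": ["25"], "min_tem": ["14℃"]},
]


def flatten(item):
    """Turn list-valued fields into scalars; a missing temperature becomes None."""
    out = {"dates": item["dates"][0] if item.get("dates") else None}
    for key in ("max_tem", "min_tem"):
        values = item.get(key)
        out[key] = int(values[0].replace("℃", "")) if values else None
    return out


clean = [flatten(item) for item in raw]
# ensure_ascii=False keeps non-ASCII text readable in the output
text = json.dumps(clean, ensure_ascii=False, indent=2)
print(text)
```

Recording the missing high as `None` (written as `null` in JSON) keeps every day in the file with the same set of keys, which may be easier to consume than entries with varying fields.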
Mainly recorded for my own reference.
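This post presents a simple weather-forecast scraper that uses Python's requests and lxml libraries to fetch the next 7 days of weather data from a given site and handles the special case where data is missing.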




