Scraping 7-Day Temperatures

This content is licensed under CC 4.0 BY-SA.

Tags

#python #programming-language #web-scraping

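This post presents a simple weather-forecast scraper. It uses Python's requests and lxml libraries to fetch the forecast for the next 7 days from a given website, and it handles the special case where part of the data is missing.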

Scraping 7-day weather temperatures

Weather URL: "http://www.weather.com.cn/weather/101200101.shtml"

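The error has nothing to do with the Python version.

After 6 p.m., today's entry shows only one temperature (the high is no longer listed), so indexing the missing value raises an IndexError (list index out of range).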

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    response = requests.get(url)
    content = response.content.decode()
    html = etree.HTML(content)
    # One <li> per forecast day
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    weather_list = []

    for li in li_list:
        item = {}
        # Date
        dates = li.xpath("./h1/text()")[0]
        # Temperatures; [0] raises IndexError when the <span> is missing
        max_tem = li.xpath("./p[@class = 'tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class = 'tem']/i/text()")[0]
        # Strip the unit before converting to int
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem.replace("℃", ""))
        item["dates"] = dates
        # Append inside the loop so that every day is kept
        weather_list.append(item)

    with open("weather.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")
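
The evening failure can be reproduced offline. The sketch below is only an illustration: the two `<li>` fragments are hypothetical markup modelled on the XPath expressions used in this post, not copied from the site, and it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without a network connection or extra packages. In both libraries a query that matches nothing returns an empty list, and indexing that list with [0] is what raises the error.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: daytime shows high and low, evening shows only the low.
DAYTIME_LI = "<li><h1>Day 1 (today)</h1><p class='tem'><span>25</span>/<i>18℃</i></p></li>"
EVENING_LI = "<li><h1>Day 1 (today)</h1><p class='tem'><i>18℃</i></p></li>"


def read_max_tem(li_markup):
    """Mimic the script: take element [0] of the <span> query result."""
    li = ET.fromstring(li_markup)
    spans = li.findall("./p[@class='tem']/span")
    return int(spans[0].text)  # IndexError when spans is empty


def describe(li_markup):
    """Return the high temperature, or the error the script would run into."""
    try:
        return read_max_tem(li_markup)
    except IndexError as exc:
        return f"IndexError: {exc}"


if __name__ == "__main__":
    print(describe(DAYTIME_LI))
    print(describe(EVENING_LI))
```

The daytime fragment yields a number, while the evening fragment hits the same out-of-range error as the script.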

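First revision: wrap the parsing in try/except and skip values that are out of range. As a result, today's entry is missing from the output.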

import json

import requests
from lxml import etree

url = "http://www.weather.com.cn/weather/101200101.shtml"
ua = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
response = requests.get(url, headers=ua)
content = response.content.decode()
html = etree.HTML(content)
# Select the <li> children; iterating over the <ul> itself matches nothing
li_list = html.xpath("//ul[@class='t clearfix']/li")
weather_list = []
for li in li_list:
    try:
        item = {}
        dates = li.xpath("./h1/text()")[0]
        max_tem = li.xpath("./p[@class='tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class='tem']/i/text()")[0]
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem.replace("℃", ""))
        item["dates"] = dates
    except IndexError:
        # A value is missing (today's entry in the evening): skip this day
        continue
    weather_list.append(item)
print(weather_list)
with open("weather.json", "w", encoding="utf-8") as f:
    json.dump(weather_list, f, ensure_ascii=False, indent=2)
print("Weather file saved successfully")
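
To see what the idea behind this revision does to the output without hitting the site, here is a small sketch. The `rows` data is made up, with each value standing in for the list that `li.xpath(...)` would return, and `parse_row_strict` and `parse_rows_skipping` are hypothetical helpers, not part of the script. Any day with a missing value is skipped, so in the evening today's entry is dropped from the result.

```python
def parse_row_strict(row):
    """Build one item; raises IndexError if any XPath-like result list is empty."""
    return {
        "min_tem": int(row["min"][0].replace("℃", "")),
        "max_tem": int(row["max"][0].replace("℃", "")),
        "dates": row["date"][0],
    }


def parse_rows_skipping(rows):
    """Keep only the rows that parse cleanly; skip rows with a missing value."""
    items = []
    for row in rows:
        try:
            items.append(parse_row_strict(row))
        except IndexError:
            continue  # a value is missing: skip this day
    return items


# Made-up evening data: today's entry has no high temperature.
rows = [
    {"date": ["Day 1 (today)"], "max": [], "min": ["18℃"]},
    {"date": ["Day 2 (tomorrow)"], "max": ["26"], "min": ["17℃"]},
]

if __name__ == "__main__":
    print(parse_rows_skipping(rows))
```

Only the second row survives, which matches the observation that today's value is absent from the saved file.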

Second revision: add an if check. When today's entry has only one value, the script now handles it and runs normally.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    response = requests.get(url)
    content = response.content.decode()
    html = etree.HTML(content)
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    weather_list = []

    for li in li_list:
        item = {}
        # xpath() returns lists; they are stored as-is, so no index can go out of range
        # Date
        dates = li.xpath("./h1/text()")
        # Temperatures
        max_tem = li.xpath("./p[@class = 'tem']/span/text()")
        min_tem = li.xpath("./p[@class = 'tem']/i/text()")
        item["min_tem"] = min_tem
        # In the evening today's entry has no <span>, so max_tem is stored only when present
        if max_tem:
            item["max_tem"] = max_tem
        item["dates"] = dates
        weather_list.append(item)

    with open("weather2.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")
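
This version stores the raw lists returned by xpath(). If plain values are preferred, one possible variation is sketched below. `first_or_none`, `to_int` and `build_item` are hypothetical helpers, and the sample entry is made up. A missing high becomes None instead of raising, so today's entry is kept.

```python
def first_or_none(results):
    """Return the first element of an XPath-style result list, or None if empty."""
    return results[0] if results else None


def to_int(text):
    """Convert a string such as '18℃' to 18; pass None through unchanged."""
    if text is None:
        return None
    return int(text.replace("℃", "").strip())


def build_item(date_texts, max_texts, min_texts):
    """Assemble one forecast entry, tolerating a missing high or low."""
    return {
        "min_tem": to_int(first_or_none(min_texts)),
        "max_tem": to_int(first_or_none(max_texts)),
        "dates": first_or_none(date_texts),
    }


if __name__ == "__main__":
    # Made-up evening entry: the list for the high temperature is empty.
    print(build_item(["Day 1 (today)"], [], ["18℃"]))
```

When such an item is written with json.dump, the None value is saved as null.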

Mainly a note to myself.