Scraping 7-Day Temperatures

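This post presents a simple weather-forecast scraper. It uses Python's requests and lxml libraries to fetch the next 7 days of weather data from a given website, and it handles a special case in which part of the data is missing.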

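Scraping 7-day weather temperatures

Weather page URL: "http://www.weather.com.cn/weather/101200101.shtml"

The error is unrelated to the Python version.

After 6 p.m., today's entry shows only one temperature (the high is no longer listed), so indexing the empty XPath result with [0] raises "IndexError: list index out of range".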
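The failure can be reproduced without touching the network. The sketch below assumes only that an XPath query returns a list of matches and that the list is empty when nothing matches; the sample values and the helper name first_text are made up for illustration.

```python
# Stand-ins for what li.xpath(...) returns: a list of matching text nodes.
daytime_span = ["25"]  # before 6 p.m. (hypothetical value)
evening_span = []      # after 6 p.m. the high is gone, so nothing matches


def first_text(nodes):
    """Take the first match, the same way the script does with [0]."""
    return nodes[0]


print(first_text(daytime_span))
try:
    first_text(evening_span)
except IndexError as exc:
    # This is the "out of range" error described above
    print("IndexError:", exc)
```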

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    a = requests.get(url)
    b = a.content.decode()
    html = etree.HTML(b)
    # One <li> per forecast day
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    # print(li_list)
    weather_list = []

    for li in li_list:
        item = {}
        # Date
        dates = li.xpath("./h1/text()")[0]
        # Temperatures: the high is in <span>, the low is in <i>.
        # After 6 p.m. today's <span> is missing, so [0] raises IndexError here.
        max_tem = li.xpath("./p[@class = 'tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class = 'tem']/i/text()")[0]
        # Strip the unit and convert to int
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem)
        item["dates"] = dates
        # print(item)
        # Append inside the loop so every day is saved, not only the last one
        weather_list.append(item)
    # print(weather_list)

    with open("weather.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")

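First revision: wrap the parsing in try/except to skip entries that raise the out-of-range error. As a result, today's entry is missing from the output.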

import json

import requests
from lxml import etree

url = "http://www.weather.com.cn/weather/101200101.shtml"
# Adjacent string literals are concatenated, so each piece ends with a space
ua = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}
response = requests.get(url, headers=ua)
# print(response.content.decode())
content = response.content.decode()
html = etree.HTML(content)
# Select the <li> children, one per forecast day
li_list = html.xpath("//ul[@class='t clearfix']/li")
# print(li_list)
weather_list = []
for li in li_list:
    try:
        item = {}
        dates = li.xpath("./h1/text()")[0]
        max_tem = li.xpath("./p[@class='tem']/span/text()")[0]
        min_tem = li.xpath("./p[@class='tem']/i/text()")[0]
        item["min_tem"] = int(min_tem.replace("℃", ""))
        item["max_tem"] = int(max_tem.replace("℃", ""))
        item["dates"] = dates
    except IndexError:
        # A field is missing (today's high after 6 p.m.): skip this day
        continue
    # print(item)
    weather_list.append(item)
print(weather_list)
with open("weather.json", "w", encoding="utf-8") as f:
    json.dump(weather_list, f, ensure_ascii=False, indent=2)
print("Weather file saved successfully")
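The skip-on-error idea can be tried offline as well. In this sketch each tuple stands in for the three XPath result lists of one forecast entry; the sample values and the name parse_rows are assumptions for illustration, not data from the site.

```python
# Each tuple stands in for one <li>: (h1 text nodes, <span> text nodes, <i> text nodes).
# The values are made up; the first row mimics today's entry after 6 p.m.
rows = [
    (["day 1 (today)"], [], ["15℃"]),
    (["day 2 (tomorrow)"], ["26"], ["16℃"]),
]


def parse_rows(rows):
    """Return one dict per row, skipping rows where any field is missing."""
    parsed = []
    for dates_nodes, span_nodes, i_nodes in rows:
        try:
            item = {
                "min_tem": int(i_nodes[0].replace("℃", "")),
                "max_tem": int(span_nodes[0].replace("℃", "")),
                "dates": dates_nodes[0],
            }
        except IndexError:
            continue  # an empty list was indexed: drop this day
        parsed.append(item)
    return parsed


print(parse_rows(rows))
```

Only the second row survives, which matches the note that today's value is absent with this approach.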

Second revision: add an if check so that today's single value is handled. The script now runs normally.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import json

import requests
from lxml import etree

if __name__ == "__main__":
    url = "http://www.weather.com.cn/weather/101200101.shtml"
    a = requests.get(url)
    b = a.content.decode()
    html = etree.HTML(b)
    li_list = html.xpath(".//ul[@class = 't clearfix']/li")
    # print(li_list)
    weather_list = []

    for li in li_list:
        item = {}
        # Date (kept as the list of strings that xpath returns)
        dates = li.xpath("./h1/text()")
        # Temperatures (also kept as lists, so nothing is indexed with [0])
        max_tem = li.xpath("./p[@class = 'tem']/span/text()")
        min_tem = li.xpath("./p[@class = 'tem']/i/text()")
        item["min_tem"] = min_tem
        # Today's entry has no high after 6 p.m.; store it only when present
        if max_tem:
            item["max_tem"] = max_tem
        item["dates"] = dates
        # print(item)
        weather_list.append(item)
    # print(weather_list)

    with open("weather2.json", "w", encoding="utf-8") as f:
        json.dump(weather_list, f, ensure_ascii=False, indent=2)
    print("Saved successfully")
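This version stores the raw XPath lists, so the JSON holds values such as ["15℃"] rather than numbers. One possible follow-up, sketched here under the assumption that a missing high should be recorded as null, is to normalize each field before saving. The helper names first_or_none, to_int, and build_item are hypothetical, and the sample values are made up.

```python
import json


def first_or_none(nodes):
    """Return the first xpath match, or None when nothing matched."""
    return nodes[0] if nodes else None


def to_int(text):
    """Convert text such as '15℃' or '26' to int; pass None through."""
    if text is None:
        return None
    return int(text.replace("℃", "").strip())


def build_item(dates_nodes, span_nodes, i_nodes):
    """Build one record from the three xpath result lists of a single <li>."""
    return {
        "min_tem": to_int(first_or_none(i_nodes)),
        "max_tem": to_int(first_or_none(span_nodes)),
        "dates": first_or_none(dates_nodes),
    }


# Made-up sample values; the first call mimics today's entry after 6 p.m.
evening = build_item(["today"], [], ["15℃"])
normal = build_item(["tomorrow"], ["26"], ["16℃"])
print(json.dumps([evening, normal], ensure_ascii=False))
```

With this shape every record has the same keys, and today's entry is kept with max_tem written as null instead of being dropped.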

Mainly a note to myself.
