Python Crawler Notes 3: Weibo Login (died before the campaign was won, leaving heroes forever in tears)

Tutorial I followed: https://www.cnblogs.com/xiao-apple36/articles/8768270.html

Full write-up: https://www.cnblogs.com/xiao-apple36/p/8747351.html

My notes follow.

Captured request details:

https://login.sina.com.cn/sso/prelogin.php

# General
Request URL: https://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=d2hiZXN0c29mdCU0MDE2My5jb20%3D&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.19)&_=1629269267519
Request Method: GET
Status Code: 200 OK
Remote Address: 58.63.236.212:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
Cache-Control: no-cache, must-revalidate
Connection: keep-alive
Content-Type: application/javascript; charset=utf-8
Date: Wed, 18 Aug 2021 06:47:46 GMT
DPOOL_HEADER: dryad61
Expires: Sat, 26 Jul 1997 05:00:00 GMT
P3P: CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Pragma: no-cache
Pragma: no-cache
Server: nginx/1.6.1
Transfer-Encoding: chunked
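
The `Content-Type: application/javascript` above reflects that prelogin answers with a JSONP call, `sinaSSOController.preloginCallBack({...})`, rather than bare JSON. Below is a minimal sketch of unwrapping it. The `sample` body is made up for illustration: its field names match what the script further down reads, but the `pubkey` and `exectime` values are placeholders, not captured data.

```python
import json
import re


def parse_prelogin(body):
    """Return the JSON object wrapped in sinaSSOController.preloginCallBack(...)."""
    match = re.search(r'sinaSSOController\.preloginCallBack\((.*)\)', body, re.S)
    if match is None:
        raise ValueError('unexpected prelogin response')
    return json.loads(match.group(1))


# Hypothetical response body; pubkey and exectime are placeholder values.
sample = ('sinaSSOController.preloginCallBack({"retcode":0,"servertime":1629269272,'
          '"nonce":"XL9ZA7","pubkey":"00AB","rsakv":"1330428213","exectime":28})')

data = parse_prelogin(sample)
print(data['servertime'], data['nonce'], data['rsakv'])
```

Raising on a missing match keeps a changed response format from surfacing later as an `IndexError`.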

# Request Headers
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Host: login.sina.com.cn
Referer: https://weibo.com/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: script
Sec-Fetch-Mode: no-cors
Sec-Fetch-Site: cross-site
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
entry: weibo
callback: sinaSSOController.preloginCallBack
su: d2hiZXN0c29mdCU0MDE2My5jb20=
rsakt: mod
checkpin: 1
client: ssologin.js(v1.4.19)
_: 1629269267519
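
The `su` parameter is the username, URL-quoted and then base64-encoded, which is what `get_username` in the full code does. A small sketch of both directions follows; `user@example.com` is a placeholder account and the helper names `encode_su` and `decode_su` are my own.

```python
import base64
from urllib.parse import quote_plus, unquote_plus


def encode_su(username):
    """URL-quote the username, then base64-encode it (the form prelogin expects)."""
    quoted = quote_plus(username)
    return base64.b64encode(quoted.encode('utf-8')).decode('ascii')


def decode_su(su):
    """Reverse encode_su: base64-decode, then URL-unquote."""
    return unquote_plus(base64.b64decode(su).decode('utf-8'))


su = encode_su('user@example.com')  # placeholder account
print(su)
print(decode_su(su))
```

The trailing `=` of the base64 value shows up as `%3D` in the request URL because the query string is percent-encoded once more when it is sent.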

https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)

# General
Request URL: https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)
Request Method: POST
Status Code: 200 OK
Remote Address: 58.63.236.212:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: https://weibo.com
Connection: keep-alive
Content-Encoding: gzip
Content-Type: text/html
Date: Wed, 18 Aug 2021 06:47:50 GMT
DPOOL_HEADER: dryad52
P3P: CP="CURa ADMa DEVa PSAo PSDo OUR BUS UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Pragma: no-cache
Server: nginx/1.6.1
Transfer-Encoding: chunked
Vary: Accept-Encoding

# Request Headers
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 636
Content-Type: application/x-www-form-urlencoded
Host: login.sina.com.cn
Origin: https://weibo.com
Referer: https://weibo.com/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: iframe
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
client: ssologin.js(v1.4.19)

# Form Data
entry: weibo
gateway: 1
from: 
savestate: 7
qrcode_flag: false
useticket: 1
pagerefer: 
vsnf: 1
su: d2hiZXN0c29mdCU0MDE2My5jb20=
service: miniblog
servertime: 1629269272
nonce: XL9ZA7
pwencode: rsa2
rsakv: 1330428213
sp: c7aeb0d1a4212ec69daa2943c1eef5ecae9bf04c490657b18b7759a62cfb193667446933d75af0c96e69a4328e63b842256bd9ed6a2fff933caf5ac7bc6d2b91d4fa3d5ed8e609690d8cfb7074ddecd71ebcec8050797b037ac1ecd1f3ee64f52ed721df53434e2993da42fd66248af2137a3de26bbc956371e24e38a214dcb7
sr: 2048*1152
encoding: UTF-8
prelt: 28
url: https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack
returntype: META
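
The `sp` field is the password after RSA encryption (`pwencode: rsa2`): the plaintext is `servertime`, a tab, `nonce`, a newline, then the password, encrypted with the `pubkey` modulus from prelogin and exponent 65537. The full code below uses the third-party `rsa` package for this. The sketch here reproduces the same idea with the standard library only, assuming PKCS#1 v1.5 encryption padding; `build_sp` is a name I made up, and this is an illustration rather than a replacement for a vetted crypto library.

```python
import os


def build_sp(pubkey_hex, servertime, nonce, password, exponent=65537):
    """Encrypt 'servertime<TAB>nonce<LF>password' with RSA (PKCS#1 v1.5), return hex."""
    n = int(pubkey_hex, 16)                 # modulus arrives as a hex string
    k = (n.bit_length() + 7) // 8           # modulus length in bytes
    message = '{}\t{}\n{}'.format(servertime, nonce, password).encode('utf-8')
    if len(message) > k - 11:
        raise ValueError('message too long for this key')
    # Padding string: random non-zero bytes filling the rest of the block
    pad_len = k - len(message) - 3
    padding = bytearray()
    while len(padding) < pad_len:
        padding.extend(b for b in os.urandom(k) if b != 0)
    block = b'\x00\x02' + bytes(padding[:pad_len]) + b'\x00' + message
    cipher = pow(int.from_bytes(block, 'big'), exponent, n)
    return cipher.to_bytes(k, 'big').hex()
```

Because the padding is random, the same password yields a different `sp` on every call, and because `servertime` and `nonce` are part of the plaintext, a captured `sp` cannot simply be replayed with a later prelogin result.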

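This is the response that came back. I probably cannot get any further for now: Weibo has added SMS or QR-code verification. The good news is that every step up to this point has been verified to work.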

		<html>
		<head>
		<title>新浪通行证</title>
		<meta http-equiv="refresh" content="0; url=&#39;https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=2071&reason=%C7%EB%CA%B9%D3%C3%C9%A8%C2%EB%B5%C7%C2%BC&protection_url=https%3A%2F%2Fpassport.weibo.com%2Fprotection%2Findex%3Ftoken%3D2OTFhHLTcAFCwq_w1anCKzI5t8F1gYph1CnByb3RlY3Rpb24.&#39;"/>
		<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
		</head>
		<body bgcolor="#ffffff" text="#000000" link="#0000cc" vlink="#551a8b" alink="#ff0000">
		<script type="text/javascript" language="javascript">
		location.replace("https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=2071&reason=%C7%EB%CA%B9%D3%C3%C9%A8%C2%EB%B5%C7%C2%BC&protection_url=https%3A%2F%2Fpassport.weibo.com%2Fprotection%2Findex%3Ftoken%3D2OTFhHLTcAFCwq_w1anCKzI5t8F1gYph1CnByb3RlY3Rpb24.");
		</script>
		</body>
		</html>
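
The page above carries everything needed to tell success from failure: the redirect target inside `location.replace(...)` has a `retcode` query parameter and, when extra verification is demanded, a `protection_url`. Here is a rough sketch of pulling those two values out with `urllib.parse` instead of hand-written regexes. The `sample` string is a shortened stand-in for the real page, with `TOKEN` as a placeholder, and `parse_login_redirect` is a hypothetical helper name.

```python
import re
from urllib.parse import parse_qs, urlsplit


def parse_login_redirect(html):
    """Extract retcode and protection_url from the location.replace(...) redirect."""
    match = re.search(r'location\.replace\(\s*["\'](.*?)["\']\s*\)', html)
    if match is None:
        return None
    params = parse_qs(urlsplit(match.group(1)).query)
    return {
        'retcode': params.get('retcode', [None])[0],
        'protection_url': params.get('protection_url', [None])[0],
    }


# Shortened stand-in for the captured page; TOKEN is a placeholder.
sample = ('location.replace("https://weibo.com/ajaxlogin.php?framelogin=1'
          '&retcode=2071&protection_url=https%3A%2F%2Fpassport.weibo.com'
          '%2Fprotection%2Findex%3Ftoken%3DTOKEN");')

print(parse_login_redirect(sample))
```

`parse_qs` already percent-decodes the values, so `protection_url` comes back as a plain URL. Judging by the page above, a non-zero `retcode` such as 2071 means the login did not complete.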

https://weibo.com/ajaxlogin.php

# General
Request URL: https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3
Request Method: GET
Status Code: 200
Remote Address: 180.149.153.187:443
Referrer Policy: strict-origin-when-cross-origin

# Response Headers
cache-control: no-cache, must-revalidate
content-encoding: gzip
content-security-policy: block-all-mixed-content;
content-type: text/html; charset=utf-8
date: Wed, 18 Aug 2021 06:47:51 GMT
dpool_header: mapi-weibocom-ug-1-79db94d59-zv67v
expires: Mon, 26 Jul 1997 05:00:00 GMT
last-modified: Wed, 18 Aug 2021 06:47:51 GMT
lb: 180.149.153.187
pramga: no-cache
server: nginx
ssl_node: mweibo-172-16-138-207.yf.intra.weibo.cn
vary: Accept-Encoding

# Request Headers
:authority: weibo.com
:method: GET
:path: /ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
cookie: SUB=_2AkMWQCjGf8NxqwJRmf0XyG7ib4lwzw3EieKgHNkdJRMxHRl-yj8XqlBatRB6PcAGLUW3iSNyJZOD3zft19l_o65mAXlU
if-modified-since: Wed, 18 Aug 2021 06:43:50 GMT
referer: https://login.sina.com.cn/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"
sec-ch-ua-mobile: ?0
sec-fetch-dest: iframe
sec-fetch-mode: navigate
sec-fetch-site: cross-site
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73

# Query String Parameters
framelogin: 1
callback: parent.sinaSSOController.feedBackUrlCallBack
sudaref: weibo.com
display: 0
retcode: 101
reason: (unable to decode value)
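
The browser tools cannot show `reason` because the value is percent-encoded GBK, not UTF-8 (the login page itself declares `charset=GBK`). Decoding with the right charset recovers the message. A small sketch, using the `reason` value from the request above; `decode_reason` is my own helper name.

```python
from urllib.parse import quote, unquote


def decode_reason(raw):
    """Decode a percent-encoded GBK query value into text."""
    return unquote(raw, encoding='gbk')


raw = '%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3'
print(decode_reason(raw))
```

For this capture the text should read 登录名或密码错误 ("incorrect login name or password"), which fits `retcode: 101`. The same call applied to the `reason` in the retcode=2071 page earlier should give a message asking the user to log in by scanning a QR code.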

Full code

#!/usr/bin/env python
import re
import requests
import time
import urllib3
import base64
import json
import rsa
from binascii import b2a_hex
from urllib.parse import quote_plus,unquote_plus
from bs4 import BeautifulSoup


class Weibo_login():

    def __init__(self, user, pwd):
        urllib3.disable_warnings()  # silence the insecure-request warnings
        self.session = requests.Session()
        self.session.verify = False  # skip TLS certificate verification
        self.session.headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            'Connection': 'keep-alive',
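            # No 'Host' entry here: session headers are merged into every request (including those to weibo.com), and requests sets Host from the URL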
            'Referer': 'https://weibo.com/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'Sec-Fetch-Dest': 'script',
            'Sec-Fetch-Mode': 'no-cors',
            'Sec-Fetch-Site': 'cross-site',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',
        }
        self.session.headers2 = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            #'Content-Length': '636',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Host': 'login.sina.com.cn',
            'Origin': 'https://weibo.com',
            'Referer': 'https://weibo.com/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'Sec-Fetch-Dest': 'iframe',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'cross-site',
            'Sec-Fetch-User': '?1',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',
        }
        self.session.headers3 = {
            #':authority': 'weibo.com',
            #':method': 'GET',
            #':path': '/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&sudaref=weibo.com&display=0&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3',
            #':scheme': 'https',
            'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'accept-encoding': 'gzip, deflate, br',
            'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
            #'cookie': 'SUB=_2AkMWQCjGf8NxqwJRmf0XyG7ib4lwzw3EieKgHNkdJRMxHRl-yj8XqlBatRB6PcAGLUW3iSNyJZOD3zft19l_o65mAXlU',
            'if-modified-since': 'Wed, 18 Aug 2021 06:43:50 GMT',
            'referer': 'https://login.sina.com.cn/',
            'sec-ch-ua': '"Chromium";v="92", " Not A;Brand";v="99", "Microsoft Edge";v="92"',
            'sec-ch-ua-mobile': '?0',
            'sec-fetch-dest': 'iframe',
            'sec-fetch-mode': 'navigate',
            'sec-fetch-site': 'cross-site',
            'upgrade-insecure-requests': '1',
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36 Edg/92.0.902.73',

        }
        self.user = user
        self.pwd = pwd


    def get_Time(self):
        '''
        Return the current time as a millisecond timestamp string.
        '''
        return str(int(time.time() * 1000))

    def get_server_data(self):
        '''
        Call the prelogin endpoint and store servertime, nonce, rsakv,
        exectime and pubkey on the instance.
        '''
        data_dict = {
            'entry': 'weibo',
            'callback': 'sinaSSOController.preloginCallBack',
            'su': self.get_username(),
            'rsakt': 'mod',
            'checkpin': '1',
            'client': 'ssologin.js(v1.4.19)',
            '_': self.get_Time()   # e.g. 1629268542865
        }

        pre_login_url = 'https://login.sina.com.cn/sso/prelogin.php?'
        response = self.session.get(pre_login_url, headers=self.session.headers, params=data_dict,
                                    verify=self.session.verify)
        # print(response.text)
        if response.status_code == 200:
            html = response.text
            if html:
                # Strip the JSONP wrapper sinaSSOController.preloginCallBack(...)
                json_data = re.findall(r'sinaSSOController\.preloginCallBack\((.*)\)', html, re.S)
                json_dict = json.loads(json_data[0])  # parse the JSON payload into a dict
                # print(json_dict)
                self.servertime = json_dict['servertime']
                self.nonce = json_dict['nonce']
                self.rsakv = json_dict['rsakv']
                self.exectime = json_dict['exectime']
                self.pubkey = json_dict['pubkey']

                print('get_server_data servertime={} nonce={} rsakv={}'.format(self.servertime, self.nonce, self.rsakv))
            else:
                print('data is null')

        else:
            print('get_server_data response html error !!!')

    def login(self):
        """
        login weibo
        :return:
        """
        # preloginTimeStart = int(time.time()*1000)
        # temp_url = 'https://passport.weibo.com/visitor/visitor?entry=miniblog&a=enter&url=https%3A%2F%2Fweibo.com%2F&domain=.weibo.com&ua=php-sso_sdk_client-0.6.23&_rand=1523284754.9734'
        # parse_url = quote_plus(temp_url)  # URL-encode the address
        # print(parse_url)
        # preloginTime = abs((int(time.time()*1000) - preloginTimeStart - self.exectime))  # compute prelt

        login_url = 'https://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.19)'
        # login url
        username = self.get_username()  # get user name
        print('username base64=', username)

        pwd = self.get_pwd()
        print('pwd rsa =', pwd)

        data_dict = {
            'entry': 'weibo',
            'gateway': '1',
            'from': '',
            'savestate': '7',
            'qrcode_flag': 'false',
            'useticket': '1',
            # 'pagerefer':parse_url,
            'vsnf': '1',
            'su': username,
            'service': 'miniblog',
            'servertime': self.servertime,
            'nonce': self.nonce,
            'pwencode': 'rsa2',
            'rsakv': self.rsakv,
            'sp': pwd,
            'sr': '2048*1152',
            'encoding': 'UTF-8',
            'prelt': 28,
            'url': 'https://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
            'returntype': 'META'
        }
        logining_page = self.session.post(login_url, data=data_dict, headers=self.session.headers2)
        # The reply is a GBK-encoded page titled "新浪通行证" (Sina Passport) that
        # redirects through location.replace(...)
        login_loop = logining_page.content.decode('GBK')
        pa = r'location\.replace\([\'"](.*?)[\'"]\)'
        loc = re.findall(pa, login_loop)
        if not loc:
            print('login failed: no redirect URL in the response')
            return

        ret = re.findall(r'retcode=(\d+)', loc[0])
        if ret and ret[0] != '0':
            # reason is percent-encoded GBK, so decode it with that charset
            reason = re.findall(r'reason=([^&]+)', loc[0])
            print('login failed: retcode={} reason={}'.format(
                ret[0], unquote_plus(reason[0], encoding='gbk') if reason else ''))
            protection = re.findall(r'protection_url=([^&]+)', loc[0])
            if protection:
                print('verification URL: ' + unquote_plus(protection[0]))
            return
        # Weibo now demands SMS or QR-code verification, so the run normally stops above;
        # everything up to this point has been verified to work.

        login_html = self.session.get(loc[0], headers=self.session.headers3)
        login_content = login_html.content.decode('GBK')  # "正在登录 ..." (signing in)
        if '正在登录' in login_content or 'Signing in' in login_content:
            pa = r'location\.replace\([\'](.*?)[\']\)'
            print('signing in')
            cross_loc = re.findall(pa, login_content)
            cross_html = self.session.get(cross_loc[0], headers=self.session.headers3)
            cross_data = cross_html.content.decode('GBK')
            pa = r'parent\.sinaSSOController\.feedBackUrlCallBack\((.*?)\)'
            feedback_data = json.loads(re.findall(pa, cross_data)[0])
            print(feedback_data)
            if feedback_data['result']:
                print("return result True")
                uniqueid = feedback_data['userinfo']['uniqueid']
                main_html = self.session.get('https://weibo.com/u/{}/home'.format(uniqueid),
                                             verify=False).content.decode()
                soup = BeautifulSoup(main_html, 'lxml')
                main_title = soup.title.string
                print(main_title)  # expected: 我的首页 微博-随时随地发现新鲜事
        else:
            print('login failed')

    def get_username(self):
        """
        Return the username URL-quoted and then base64-encoded.
        The result must be a str.
        """
        username_quote = quote_plus(str(self.user))
        username_base64 = base64.b64encode(username_quote.encode('utf-8'))  # base64-encode
        return username_base64.decode('utf-8')

    def get_pwd(self):
        """
        Return the RSA-encrypted password as a hex string.
        The result must be a str.
        """
        rsa_publickey = int(self.pubkey, 16)  # parse the hex-encoded modulus into an int
        key = rsa.PublicKey(rsa_publickey, 65537)
        message = str(self.servertime) + '\t' + str(self.nonce) + '\n' + str(self.pwd)
        message = message.encode('utf-8')
        passwd = rsa.encrypt(message, key)
        passwd = b2a_hex(passwd).decode()  # hex-encode the ciphertext
        return passwd


if __name__ == '__main__':
    user_name = 'myuserid'  # replace with your own username and password
    pwd = 'mypassword'
    wo = Weibo_login(user_name, pwd)
    wo.get_server_data()
    wo.login()

The script gets no further than printing the verification URL:

https://passport.weibo.com/protection/index?token=2YzlhHMfQAFDPgDZJ289RL-k7NgViHxkjCnByb3RlY3Rpb24.
