Bilibili API Pitfall Guide: How to Obtain User Data Legally and Compliantly (with Python Code Examples)

Latest recommended article published 2026-03-06 05:25:56

Original


Tags

#Bilibili #API #UserData #Python

Compliant Use of the Bilibili API in Practice: A Complete Guide from Data Retrieval to Risk Avoidance

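Recently, while helping a content-analytics team build a data-monitoring system, I ran into a typical problem: they needed to track the growth metrics of several Bilibili uploaders (UP主) on a regular schedule, but recording them by hand was inefficient, and they worried that a straightforward scraper would trip the platform's restrictions. This made me revisit the compliant use cases of the Bilibili API. As long as you follow the right technical path and the platform's rules, you can obtain the data you need legitimately and build useful applications on top of it.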

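For developers, the Bilibili API exposes a wide range of endpoints, from basic user information to video interaction data to feed (dynamic) updates, covering nearly every dimension of the content ecosystem. The key question is how to turn these capabilities into real application value without violating platform rules. I have seen many projects abandoned midway because they ignored compliance details, and I have also seen carefully designed systems run smoothly for years.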

1. Understanding the Openness Boundaries and Compliance Framework of the Bilibili API

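As one of China's leading content communities, Bilibili designs its API system to balance data openness against user-privacy protection. Rather than being fully closed or fully open, it uses a tiered access model in which different endpoints carry different access requirements.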

1.1 Endpoint Permission Categories and Use Cases

From my hands-on experience, Bilibili API endpoints fall roughly into three categories:

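Endpoint type	Typical example	Access requirement	Data scope	Use case
Public endpoints	`/x/relation/stat`	No authentication	Basic statistics	Monitoring public figures such as follower and following counts
User-related endpoints	`/x/space/acc/info`	May require a Cookie	Basic user profile	Displaying uploader profiles, user-profile analysis
Interaction endpoints	`/x/web-interface/archive/like`	Login and authorization required	User behavior data	Interactive actions such as liking and giving coins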

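Note: even public endpoints should be called at a reasonable rate. Bilibili does not publish its rate-limit thresholds, but community reports suggest that more than about 5 requests per second from a single IP may trigger a temporary restriction.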
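If you want a client-side guard that stays under that reported threshold, a sliding-window limiter is one option. This is a sketch under stated assumptions: `SlidingWindowLimiter` is a hypothetical helper, and the default of 4 calls per 1-second window is a conservative assumption, not a published Bilibili limit. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` request starts in any `period`-second window.

    Hypothetical helper; the default of 4 calls/second is an assumption chosen
    to stay below the ~5 req/s that community reports associate with throttling.
    """

    def __init__(self, max_calls: int = 4, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self._starts: Deque[float] = deque()  # start times inside the window

    def acquire(self) -> None:
        """Block until a new request may start, then record its start time."""
        while True:
            now = self.clock()
            # Drop start times that have slid out of the window
            while self._starts and now - self._starts[0] >= self.period:
                self._starts.popleft()
            if len(self._starts) < self.max_calls:
                self._starts.append(now)
                return
            # Window is full: wait until the oldest start leaves it
            self.sleep(self.period - (now - self._starts[0]))


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_calls=4, period=1.0)
    t0 = time.monotonic()
    for i in range(6):
        limiter.acquire()  # call this right before each HTTP request
        print(f"call {i} at +{time.monotonic() - t0:.2f}s")
```

Unlike a fixed gap between requests, a window allows a short burst as long as no one-second window contains more than `max_calls` request starts.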

1.2 Core Principles for Compliant Data Collection

Before writing any code, keep a few basic principles in mind:

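Legitimate purpose: data collection should serve a clear, lawful purpose, such as data analysis, content-recommendation optimization, or academic research
Data minimization: collect only the minimum data the purpose requires, and avoid over-collection
Informed consent: if non-public data is involved, make sure you have the user's explicit authorization
Secure storage: store collected data securely to prevent leaks or misuse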
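Data minimization can be enforced in code rather than left as a policy. A minimal sketch, assuming a follower-growth use case like the one in the introduction: `ALLOWED_STAT_FIELDS` and `minimize` are hypothetical names, and the allowlist (including the assumption that a `mid` field is present) should come from your own documented purpose. Every field not on the list is dropped before anything is stored.

```python
from typing import Any, Dict, Iterable

# Hypothetical allowlist for a follower-growth dashboard; derive yours from
# the purpose you documented, not from whatever the API happens to return.
ALLOWED_STAT_FIELDS = ("mid", "follower", "following")


def minimize(record: Dict[str, Any],
             allowed: Iterable[str] = ALLOWED_STAT_FIELDS) -> Dict[str, Any]:
    """Return a new dict containing only the allowlisted keys of `record`."""
    allowed_set = set(allowed)
    return {key: value for key, value in record.items() if key in allowed_set}


if __name__ == "__main__":
    # Illustrative record, not real API output
    raw = {"mid": 1, "follower": 100, "following": 20, "extra_field": "x"}
    print(minimize(raw))  # {'mid': 1, 'follower': 100, 'following': 20}
```

Applying the filter at ingestion time means unneeded fields never reach your database, which also narrows what could leak later.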

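In one project, a client wanted to bulk-collect uploaders' contact details for business outreach. That clearly crossed the compliance line, so I recommended reaching them through Bilibili's official creator-services platform instead, which is both compliant and efficient.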

2. Basic API Endpoints in Practice: From Request to Parsing

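Let's start with the most basic public endpoints and build up a complete API-calling layer step by step. I use Python for the examples because of its rich libraries for HTTP requests and JSON handling.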
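It helps to know the response shape first. The client code in this article reads three keys from every response: a numeric `code` (0 meaning success), a `message`, and a `data` payload, and a request can succeed at the HTTP level while `code` still reports an error. A minimal sketch that unwraps that envelope, where `BilibiliAPIError` and `unwrap` are hypothetical names and the sample payloads are illustrative, not real responses:

```python
from typing import Any, Dict


class BilibiliAPIError(RuntimeError):
    """Raised when the envelope reports a non-zero business code."""

    def __init__(self, code: Any, message: str) -> None:
        super().__init__(f"Bilibili API error code={code}: {message}")
        self.code = code
        self.message = message


def unwrap(payload: Dict[str, Any]) -> Any:
    """Return payload['data'] when code == 0, otherwise raise BilibiliAPIError."""
    code = payload.get("code")
    if code != 0:
        raise BilibiliAPIError(code, str(payload.get("message", "")))
    return payload.get("data")


if __name__ == "__main__":
    # Illustrative payloads shaped like the envelope, not real responses
    print(unwrap({"code": 0, "message": "0", "data": {"follower": 100}}))
    try:
        unwrap({"code": -1, "message": "example error"})
    except BilibiliAPIError as err:
        print(err.code, err.message)
```

Raising instead of returning `None` forces callers to decide explicitly how to handle a business-level failure.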

2.1 Environment Setup and Dependency Installation

First, create a clean Python environment and install the required packages:

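# Create a virtual environment (optional but recommended)
python -m venv bilibili-api-env
source bilibili-api-env/bin/activate  # Linux/macOS
# or: bilibili-api-env\Scripts\activate  # Windows

# Install core dependencies; quote the version specifiers so the shell
# does not treat ">" as output redirection
pip install "requests>=2.28.0"
pip install "aiohttp>=3.8.0"         # async HTTP support
pip install "pandas>=1.5.0"          # data processing
pip install "python-dotenv>=0.21.0"  # environment-variable management
pip install tenacity                 # retry helpers used by the client below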

I keep API settings in environment variables, which is safer and makes switching between environments easy. Create a `.env` file:

# .env configuration file
BILIBILI_API_BASE=https://api.bilibili.com
REQUEST_TIMEOUT=10
MAX_RETRIES=3
USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
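The client below does not read these variables on its own. One way to wire them in, sketched under the assumption that you want typed values with the same defaults as the `.env` above, is to load the file with python-dotenv's `load_dotenv()` and then parse `os.environ`. `EnvSettings` and `settings_from_env` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical typed view of the .env values."""
    base_url: str
    timeout: int
    max_retries: int
    user_agent: str


def settings_from_env(env: Mapping[str, str]) -> EnvSettings:
    """Parse settings from any mapping, falling back to the .env defaults."""
    return EnvSettings(
        base_url=env.get("BILIBILI_API_BASE", "https://api.bilibili.com"),
        timeout=int(env.get("REQUEST_TIMEOUT", "10")),
        max_retries=int(env.get("MAX_RETRIES", "3")),
        user_agent=env.get(
            "USER_AGENT",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        ),
    )


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv  # provided by python-dotenv
        load_dotenv()  # copies .env entries into os.environ (existing vars win)
    except ImportError:
        pass  # fall back to the process environment only
    print(settings_from_env(os.environ))
```

Taking the mapping as a parameter keeps the parsing testable without touching the real environment; the resulting values can then be passed to `APIConfig(base_url=..., timeout=..., max_retries=..., user_agent=...)`.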

2.2 Building a Robust API Request Client

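Calling `requests.get()` directly is simple, but it is not robust enough for production. Below is a client implementation I have validated across several projects: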

import logging
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional

import requests
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

# Logging configuration
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@dataclass
class APIConfig:
    """API configuration."""
    base_url: str = "https://api.bilibili.com"
    timeout: int = 10
    max_retries: int = 3  # informational: the @retry decorator below hard-codes 3 attempts
    user_agent: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    default_headers: Optional[Dict[str, str]] = None

    def __post_init__(self):
        if self.default_headers is None:
            self.default_headers = {
                "User-Agent": self.user_agent,
                "Accept": "application/json",
                "Accept-Encoding": "gzip, deflate",
                "Connection": "keep-alive"
            }

class BilibiliAPIClient:
    """Bilibili API client."""

    def __init__(self, config: Optional[APIConfig] = None):
        self.config = config or APIConfig()
        self.session = requests.Session()  # reuses TCP connections
        self.session.headers.update(self.config.default_headers)

        # Request statistics
        self.request_count = 0
        self.last_request_time = 0.0

    def _rate_limit(self):
        """Simple rate limiting: at least 200 ms between requests (at most ~5 req/s)."""
        time_since_last = time.time() - self.last_request_time
        if time_since_last < 0.2:
            time.sleep(0.2 - time_since_last)
        self.last_request_time = time.time()

    # Retry only transport-level failures (timeouts, connection errors,
    # HTTP error statuses, undecodable JSON) with exponential backoff,
    # and re-raise the last error once the attempts are exhausted
    @retry(
        retry=retry_if_exception_type(requests.exceptions.RequestException),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        reraise=True,
    )
    def request(self, endpoint: str, params: Optional[Dict] = None,
                method: str = "GET", **kwargs) -> Dict[str, Any]:
        """Send an API request and return the decoded JSON body."""
        method = method.upper()
        if method not in ("GET", "POST"):
            # Programming error: fail immediately instead of retrying
            raise ValueError(f"Unsupported HTTP method: {method}")

        self._rate_limit()
        url = f"{self.config.base_url}{endpoint}"
        logger.debug(f"Request {method} {url}, params: {params}")

        # Extra options such as headers= stay in kwargs and are merged with
        # the session headers by requests
        try:
            if method == "GET":
                response = self.session.get(
                    url, params=params,
                    timeout=self.config.timeout, **kwargs
                )
            else:
                # Bilibili write endpoints are commonly form-encoded, so params
                # are sent as form data rather than a JSON body
                response = self.session.post(
                    url, data=params,
                    timeout=self.config.timeout, **kwargs
                )

            response.raise_for_status()
            self.request_count += 1
            result = response.json()
        except requests.exceptions.JSONDecodeError as e:
            # Must precede RequestException, of which it is a subclass
            logger.error(f"JSON parsing failed: {e}")
            raise
        except requests.exceptions.RequestException as e:
            logger.error(f"Request failed: {e}")
            raise

        # An HTTP 200 response can still carry a non-zero business code
        if result.get("code") != 0:
            logger.warning(f"API returned non-zero code: {result.get('code')}, message: {result.get('message')}")

        return result

    def get_user_stats(self, mid: int) -> Optional[Dict]:
        """Get a user's relation statistics (followers, following)."""
        endpoint = "/x/relation/stat"
        params = {"vmid": mid}

        try:
            result = self.request(endpoint, params=params)
            return result.get("data")
        except Exception as e:
            logger.error(f"Failed to get statistics for user {mid}: {e}")
            return None

    def get_up_stat(self, mid: int) -> Optional[Dict]:
        """Get an uploader's content statistics.

        This endpoint may require a logged-in Cookie; without one it may
        return a non-zero code and no data.
        """
        endpoint = "/x/space/upstat"
        params = {"mid": mid}

        try:
            result = self.request(endpoint, params=params)
            return result.get("data")
        except Exception as e:
            logger.error(f"Failed to get content statistics for uploader {mid}: {e}")
            return None

# Usage example
if __name__ == "__main__":
    client = BilibiliAPIClient()

    # Fetch a user's relation statistics
    user_data = client.get_user_stats(35199034)
    if user_data:
        print(f"Followers: {user_data.get('follower')}")
        print(f"Following: {user_data.get('following')}")

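This client provides several key features:

Automatic retries: exponential-backoff retries via the tenacity library
Rate limiting: at least 200 ms between requests (at most about 5 per second) to avoid triggering API throttling
Error handling: detailed logging and exception handling
Session reuse: a shared requests.Session reuses connections for better efficiency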
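To see what `wait_exponential(multiplier=1, min=2, max=10)` means in practice, the function below reproduces the formula tenacity documents for it: multiplier × 2^(attempt − 1), clamped to [min, max]. `exponential_wait` is a hypothetical helper for reasoning about delays, an approximation rather than tenacity's own code.

```python
from typing import List


def exponential_wait(attempt: int, multiplier: float = 1.0,
                     minimum: float = 2.0, maximum: float = 10.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based).

    Mirrors the documented wait_exponential formula: multiplier * 2**(attempt - 1),
    clamped to the [minimum, maximum] range.
    """
    raw = multiplier * 2 ** (attempt - 1)
    return max(minimum, min(raw, maximum))


def schedule(max_attempts: int) -> List[float]:
    """Waits that actually occur: one after each attempt except the last."""
    return [exponential_wait(a) for a in range(1, max_attempts)]


if __name__ == "__main__":
    print(schedule(3))  # [2.0, 2.0]
```

With `stop_after_attempt(3)` there are only two waits, both 2 seconds, because the raw values 1 and 2 are lifted to `min=2`; the `max=10` cap only takes effect from the fifth attempt onward. A persistently failing call therefore costs about 4 seconds of sleep plus up to three request timeouts.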

2.3 Optimizing with Asynchronous Requests

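For batch retrieval, synchronous requests are inefficient because each call waits for the previous one to finish. Here is an asynchronous version: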

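import asyncio
from typing import Dict, List, Optional

import aiohttp

# APIConfig and logger are reused from the synchronous client above

class AsyncBilibiliAPIClient:
    """Asynchronous Bilibili API client."""

    def __init__(self, config: Optional[APIConfig] = None, max_concurrency: int = 5):
        self.config = config or APIConfig()
        self.max_concurrency = max_concurrency
        # Created lazily inside the running event loop: on Python < 3.10 a
        # Semaphore built outside the loop can end up bound to the wrong loop
        self.semaphore: Optional[asyncio.Semaphore] = None

    async def fetch_user_stats(self, session: aiohttp.ClientSession,
                               mid: int) -> Optional[Dict]:
        """Fetch one user's statistics asynchronously."""
        if self.semaphore is None:
            self.semaphore = asyncio.Semaphore(self.max_concurrency)

        url = f"{self.config.base_url}/x/relation/stat"
        params = {"vmid": mid}

        async with self.semaphore:  # concurrency limit, not a per-second rate limit
            try:
                # The per-request timeout comes from the session's ClientTimeout
                async with session.get(url, params=params) as response:
                    response.raise_for_status()
                    result = await response.json()
            except Exception as e:
                logger.error(f"Async fetch for user {mid} failed: {e}")
                return None

        if result.get("code") != 0:
            logger.warning(f"User {mid}: API returned code {result.get('code')}, message: {result.get('message')}")
        return result.get("data")

    async def batch_fetch_users(self, mids: List[int]) -> List[Optional[Dict]]:
        """Fetch statistics for many users; failed lookups become None."""
        connector = aiohttp.TCPConnector(limit=10)  # connection-pool limit
        timeout = aiohttp.ClientTimeout(total=self.config.timeout)
        headers = self.config.default_headers.copy()

        async with aiohttp.ClientSession(
            connector=connector,
            headers=headers,
            timeout=timeout,
        ) as session:
            tasks = [self.fetch_user_stats(session, mid) for mid in mids]
            results = await asyncio.gather(*tasks, return_exceptions=True)

        # Turn any exception that escaped a task into None
        processed_results = []
        for result in results:
            if isinstance(result, Exception):
                logger.error(f"Task raised an exception: {result}")
                processed_results.append(None)
            else:
                processed_results.append(result)

        return processed_results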

# Usage example
async def main():
    client = AsyncBilibiliAPIClient()
    mids = [35199034, 546

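The semaphore in `AsyncBilibiliAPIClient` caps how many requests are in flight at once, not how many start per second: with fast responses, five concurrent slots can exceed the roughly 5 requests per second that community reports associate with throttling. A minimal sketch of a start-time limiter for asyncio follows; `AsyncMinIntervalLimiter` is a hypothetical helper, not part of aiohttp, and it is one possible approach rather than the article's own.

```python
import asyncio
import time


class AsyncMinIntervalLimiter:
    """Space out request start times by at least `interval` seconds.

    Hypothetical helper: waiters pass through a lock one at a time, so
    concurrent coroutines cannot start requests closer together than
    `interval`, however many are in flight. Create it inside a running
    event loop (required for asyncio.Lock on Python < 3.10).
    """

    def __init__(self, interval: float = 0.2) -> None:
        self.interval = interval
        self._lock = asyncio.Lock()
        self._next_start = 0.0  # earliest monotonic time the next request may start

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._next_start - now
            if delay > 0:
                await asyncio.sleep(delay)
            # Reserve the next slot one interval after this request's slot
            self._next_start = max(now, self._next_start) + self.interval


async def _demo() -> None:
    limiter = AsyncMinIntervalLimiter(interval=0.2)

    async def fake_request(i: int) -> None:
        await limiter.wait()  # in the client: just before session.get(...)
        print(f"request {i} started at {time.monotonic():.2f}")

    await asyncio.gather(*(fake_request(i) for i in range(5)))


if __name__ == "__main__":
    asyncio.run(_demo())
```

In `AsyncBilibiliAPIClient`, one limiter per event loop (for example created at the start of `batch_fetch_users`) awaited before each `session.get(...)` would reproduce the 200 ms spacing of the synchronous client while keeping the concurrency benefits for slow responses.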