An Amazon Listing Traffic Analysis System in Practice: A Complete Implementation from Data Collection to Anomaly Detection

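Preface

As a developer who has worked on e-commerce data analysis for years, I often run into this scenario: a seller's listing suddenly doubles its orders, and nobody knows where the traffic came from. Amazon's official seller dashboard reports only aggregate Sessions data, which cannot be attributed precisely to specific traffic channels.

This article explains, from a technical perspective, how to build a complete Amazon listing traffic analysis system, covering the full workflow of data collection, storage, analysis, and anomaly detection.

Tech Stack

  • Python 3.8+
  • Requests (HTTP requests)
  • Pandas (data processing)
  • PostgreSQL (data storage)
  • APScheduler (scheduled jobs)
  • Matplotlib/Plotly (data visualization)

Source code: a complete code example is provided at the end of the article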

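1. Requirements Analysis and System Architecture

1.1 Business Requirements

Amazon sellers need to solve the following core problems:

  1. Traffic source attribution: distinguish the shares of organic search, paid advertising, and off-Amazon traffic
  2. Keyword rank monitoring: track real-time rank changes for core keywords
  3. Competitor monitoring: watch competitors' price, stock, and rank fluctuations
  4. Anomaly alerts: alert automatically when traffic spikes or drops

1.2 System Architecture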

┌──────────────────────────────────────────────────────────────┐
│                    Data Collection Layer                     │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ Search results │  │ Product details│  │ Bestsellers    │  │
│  └────────────────┘  └────────────────┘  └────────────────┘  │
│            Pangolinfo Scrape API / Amazon SP-API             │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│                      Data Storage Layer                      │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ PostgreSQL     │  │ Redis Cache    │  │ Time Series DB │  │
│  │ (structured)   │  │ (live cache)   │  │ (time series)  │  │
│  └────────────────┘  └────────────────┘  └────────────────┘  │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│                     Data Analysis Layer                      │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ Rank changes   │  │ Attribution    │  │ Anomalies      │  │
│  └────────────────┘  └────────────────┘  └────────────────┘  │
└──────────────────────────────┬───────────────────────────────┘
                               │
┌──────────────────────────────▼───────────────────────────────┐
│                     Visualization Layer                      │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│  │ Web Dashboard  │  │ Reports        │  │ Alerts         │  │
│  │ (Flask/Django) │  │ (PDF/Excel)    │  │ (Email/Slack)  │  │
│  └────────────────┘  └────────────────┘  └────────────────┘  │
└──────────────────────────────────────────────────────────────┘

2. Data Collection Module

2.1 Core Data Collection Class

import requests
import json
from datetime import datetime
from typing import List, Dict, Optional
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class AmazonDataCollector:
    """
    Amazon data collector.
    Collects data through the Pangolinfo Scrape API.
    """

    def __init__(self, api_key: str, base_url: str = "https://api.pangolinfo.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        })

    def search_products(self,
                       keyword: str,
                       marketplace: str = "US",
                       page: int = 1) -> Dict:
        """
        Collect search results page data.

        Args:
            keyword: search keyword
            marketplace: marketplace code (US, UK, DE, etc.)
            page: page number

        Returns:
            Dictionary of search results; empty if the request fails
        """
        endpoint = f"{self.base_url}/scrape/amazon/search"

        payload = {
            "keyword": keyword,
            "marketplace": marketplace,
            "page": page,
            "include_sponsored": True,  # include sponsored placements
            "include_organic": True     # include organic results
        }

        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()

            data = response.json()
            # Count the same fields that KeywordRankTracker reads later
            result_count = (len(data.get('organic_results', []))
                            + len(data.get('sponsored_results', [])))
            logger.info(f"Collected {result_count} search results for keyword '{keyword}'")

            return data

        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect search results: {e}")
            return {}
    
    def get_product_details(self, asin: str, marketplace: str = "US") -> Dict:
        """
        Collect product detail page data.

        Args:
            asin: product ASIN
            marketplace: marketplace code

        Returns:
            Dictionary of product details; empty if the request fails
        """
        endpoint = f"{self.base_url}/scrape/amazon/product"

        payload = {
            "asin": asin,
            "marketplace": marketplace,
            "include_reviews": False,  # skip reviews (collected separately)
            "include_qa": False
        }

        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()

            data = response.json()
            logger.info(f"Collected detail data for product {asin}")

            return data

        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect product details: {e}")
            return {}
    
    def get_bestsellers_rank(self,
                            category: str,
                            marketplace: str = "US") -> List[Dict]:
        """
        Collect Best Sellers list data.

        Args:
            category: category ID or name
            marketplace: marketplace code

        Returns:
            List of products on the Best Sellers list; empty if the request fails
        """
        endpoint = f"{self.base_url}/scrape/amazon/bestsellers"

        payload = {
            "category": category,
            "marketplace": marketplace
        }

        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()

            data = response.json()
            logger.info(f"Collected Best Sellers data for category '{category}': {len(data.get('products', []))} products")

            return data.get('products', [])

        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect Best Sellers data: {e}")
            return []
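
The collector methods above return an empty result on any request error, so one transient network failure silently produces a gap in the data. A minimal retry sketch follows; `with_retries` and its parameters are hypothetical helpers introduced here for illustration, not part of Requests or the Pangolinfo API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T],
                 is_empty: Callable[[T], bool],
                 attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` up to `attempts` times, doubling the wait after each empty result."""
    result = call()
    for attempt in range(1, attempts):
        if not is_empty(result):
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        result = call()
    return result
```

With the collector above, a call might look like `with_retries(lambda: collector.search_products("yoga mat"), lambda r: not r)`. The `sleep` argument is injectable so the helper can be exercised in tests without actually waiting.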

2.2 Keyword Rank Tracking

class KeywordRankTracker:
    """
    Keyword rank tracker.
    """

    def __init__(self, collector: AmazonDataCollector):
        self.collector = collector

    def track_keyword_ranking(self,
                             target_asin: str,
                             keywords: List[str],
                             marketplace: str = "US") -> List[Dict]:
        """
        Track the target ASIN's rank across multiple keywords.

        Args:
            target_asin: target product ASIN
            keywords: list of keywords
            marketplace: marketplace code

        Returns:
            List of ranking records
        """
        ranking_data = []

        for keyword in keywords:
            # Collect search results (first page only, so a rank of None
            # means "not found on page 1")
            search_results = self.collector.search_products(
                keyword=keyword,
                marketplace=marketplace
            )

            # Look up the target ASIN's rank
            organic_rank = self._find_product_rank(
                search_results.get('organic_results', []),
                target_asin
            )

            sponsored_rank = self._find_product_rank(
                search_results.get('sponsored_results', []),
                target_asin
            )

            ranking_data.append({
                'keyword': keyword,
                'asin': target_asin,
                'organic_rank': organic_rank,
                'sponsored_rank': sponsored_rank,
                'timestamp': datetime.now().isoformat(),
                'marketplace': marketplace
            })

            logger.info(f"Keyword '{keyword}': organic rank={organic_rank}, sponsored rank={sponsored_rank}")

        return ranking_data

    def _find_product_rank(self, results: List[Dict], target_asin: str) -> Optional[int]:
        """
        Find a product's rank within search results.

        Args:
            results: list of search results
            target_asin: target ASIN

        Returns:
            Rank position (1-based), or None if not found
        """
        for idx, product in enumerate(results):
            if product.get('asin') == target_asin:
                return idx + 1
        return None
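
To make the rank lookup concrete, here is a small standalone sketch run against made-up data. The response shape (`organic_results` and `sponsored_results` lists of dicts with an `asin` key) is assumed from how the tracker reads it, and the ASINs are invented placeholders.

```python
from typing import Dict, List, Optional

def find_rank(results: List[Dict], target_asin: str) -> Optional[int]:
    """Return the 1-based position of target_asin in results, or None."""
    for position, product in enumerate(results, start=1):
        if product.get("asin") == target_asin:
            return position
    return None

# Hypothetical search response with placeholder ASINs
sample = {
    "organic_results": [
        {"asin": "B0AAA11111"},
        {"asin": "B0BBB22222"},
        {"asin": "B0CCC33333"},
    ],
    "sponsored_results": [{"asin": "B0CCC33333"}],
}

organic = find_rank(sample["organic_results"], "B0CCC33333")
sponsored = find_rank(sample["sponsored_results"], "B0CCC33333")
missing = find_rank(sample["organic_results"], "B0ZZZ99999")
print(organic, sponsored, missing)  # 3 1 None
```

Keeping organic and sponsored positions separate is what later lets the analysis tell a rank-driven traffic change apart from an ad-driven one.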

2.3 Competitor Monitoring

class CompetitorMonitor:
    """
    Competitor monitor.
    """

    def __init__(self, collector: AmazonDataCollector):
        self.collector = collector

    def monitor_competitors(self,
                           competitor_asins: List[str],
                           marketplace: str = "US") -> List[Dict]:
        """
        Monitor key metrics for competitor products.

        Args:
            competitor_asins: list of competitor ASINs
            marketplace: marketplace code

        Returns:
            List of competitor records
        """
        competitor_data = []

        for asin in competitor_asins:
            product_details = self.collector.get_product_details(
                asin=asin,
                marketplace=marketplace
            )

            # An empty dict means the request failed; skip that ASIN
            if product_details:
                in_stock = product_details.get('availability', {}).get('in_stock')
                competitor_data.append({
                    'asin': asin,
                    'title': product_details.get('title'),
                    'price': product_details.get('price'),
                    'currency': product_details.get('currency'),
                    'rating': product_details.get('rating'),
                    'review_count': product_details.get('review_count'),
                    'in_stock': in_stock,
                    'bsr': product_details.get('best_sellers_rank'),
                    'timestamp': datetime.now().isoformat(),
                    'marketplace': marketplace
                })

                logger.info(f"Competitor {asin}: price={product_details.get('price')}, in stock={in_stock}")

        return competitor_data

3. Data Storage Module

3.1 Database Design

-- Keyword ranking history
CREATE TABLE keyword_rankings (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    keyword VARCHAR(255) NOT NULL,
    organic_rank INTEGER,
    sponsored_rank INTEGER,
    marketplace VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- PostgreSQL does not accept inline INDEX clauses (that is MySQL syntax),
-- and index names must be unique within a schema.
CREATE INDEX idx_rankings_asin_keyword ON keyword_rankings (asin, keyword);
CREATE INDEX idx_rankings_created_at ON keyword_rankings (created_at);

-- Competitor monitoring history
CREATE TABLE competitor_data (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    title TEXT,
    price DECIMAL(10, 2),
    currency VARCHAR(10),
    rating DECIMAL(3, 2),
    review_count INTEGER,
    in_stock BOOLEAN,
    bsr INTEGER,
    marketplace VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_competitor_asin ON competitor_data (asin);
CREATE INDEX idx_competitor_created_at ON competitor_data (created_at);

-- Traffic anomaly alerts
CREATE TABLE traffic_alerts (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    alert_type VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,
    message TEXT,
    details JSONB,
    is_resolved BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    resolved_at TIMESTAMP
);
CREATE INDEX idx_alerts_asin ON traffic_alerts (asin);
CREATE INDEX idx_alerts_alert_type ON traffic_alerts (alert_type);
CREATE INDEX idx_alerts_is_resolved ON traffic_alerts (is_resolved);

3.2 Data Persistence

import psycopg2
from psycopg2.extras import execute_batch
from typing import List, Dict

class DataStorage:
    """
    Data storage manager.
    """

    def __init__(self, db_config: Dict):
        # db_config holds psycopg2.connect() keyword arguments
        # (host, dbname, user, password, ...)
        self.conn = psycopg2.connect(**db_config)
        self.cursor = self.conn.cursor()
    
    def save_keyword_rankings(self, rankings: List[Dict]) -> int:
        """
        Save keyword ranking records in a batch.

        Args:
            rankings: list of ranking records

        Returns:
            Number of records inserted (0 on failure)
        """
        sql = """
            INSERT INTO keyword_rankings 
            (asin, keyword, organic_rank, sponsored_rank, marketplace, created_at)
            VALUES (%(asin)s, %(keyword)s, %(organic_rank)s, %(sponsored_rank)s, 
                    %(marketplace)s, %(timestamp)s)
        """

        try:
            execute_batch(self.cursor, sql, rankings)
            self.conn.commit()
            logger.info(f"Saved {len(rankings)} ranking records")
            return len(rankings)
        except Exception as e:
            # Roll back so the connection stays usable for later statements
            self.conn.rollback()
            logger.error(f"Failed to save ranking data: {e}")
            return 0
    
    def save_competitor_data(self, competitors: List[Dict]) -> int:
        """
        Save competitor records in a batch.

        Args:
            competitors: list of competitor records

        Returns:
            Number of records inserted (0 on failure)
        """
        sql = """
            INSERT INTO competitor_data 
            (asin, title, price, currency, rating, review_count, 
             in_stock, bsr, marketplace, created_at)
            VALUES (%(asin)s, %(title)s, %(price)s, %(currency)s, %(rating)s,
                    %(review_count)s, %(in_stock)s, %(bsr)s, %(marketplace)s, %(timestamp)s)
        """

        try:
            execute_batch(self.cursor, sql, competitors)
            self.conn.commit()
            logger.info(f"Saved {len(competitors)} competitor records")
            return len(competitors)
        except Exception as e:
            # Roll back so the connection stays usable for later statements
            self.conn.rollback()
            logger.error(f"Failed to save competitor data: {e}")
            return 0
    
    def get_historical_rankings(self,
                               asin: str,
                               keyword: str,
                               days: int = 30) -> List[Dict]:
        """
        Fetch historical ranking data.

        Args:
            asin: product ASIN
            keyword: keyword
            days: number of days to look back

        Returns:
            List of historical rankings, oldest first
        """
        # Multiply a fixed one-day interval by the parameter so that the
        # placeholder stays outside the quoted SQL literal
        sql = """
            SELECT keyword, organic_rank, sponsored_rank, created_at
            FROM keyword_rankings
            WHERE asin = %s AND keyword = %s 
                  AND created_at >= NOW() - %s * INTERVAL '1 day'
            ORDER BY created_at ASC
        """

        self.cursor.execute(sql, (asin, keyword, days))
        results = self.cursor.fetchall()

        return [
            {
                'keyword': row[0],
                'organic_rank': row[1],
                'sponsored_rank': row[2],
                'timestamp': row[3]
            }
            for row in results
        ]
    
    def close(self):
        """Close the database connection."""
        self.cursor.close()
        self.conn.close()
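
The lookback window in `get_historical_rankings` can also be computed on the Python side and passed as an ordinary timestamp parameter, which keeps every placeholder outside SQL string literals. The sketch below only builds the query and its parameters; `lookback_query` is a hypothetical helper, and the `now` argument exists purely to make the function testable.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

def lookback_query(asin: str,
                   keyword: str,
                   days: int,
                   now: Optional[datetime] = None) -> Tuple[str, tuple]:
    """Build the historical-rankings query with a Python-computed cutoff."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    sql = (
        "SELECT keyword, organic_rank, sponsored_rank, created_at "
        "FROM keyword_rankings "
        "WHERE asin = %s AND keyword = %s AND created_at >= %s "
        "ORDER BY created_at ASC"
    )
    return sql, (asin, keyword, cutoff)

sql, params = lookback_query("B0AAA11111", "yoga mat", 30, now=datetime(2024, 1, 31))
print(params[2])  # 2024-01-01 00:00:00
```

The returned pair can be handed straight to `cursor.execute(sql, params)`. One trade-off: the cutoff then follows the application server's clock rather than the database's `NOW()`.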

4. Traffic Attribution Analysis

4.1 Traffic Source Attribution

import pandas as pd
import numpy as np
from typing import Dict, List, Tuple

class TrafficAttributionAnalyzer:
    """
    Traffic attribution analyzer.
    """

    def __init__(self, storage: DataStorage):
        self.storage = storage

    def analyze_traffic_source(self,
                               asin: str,
                               sales_change: float,
                               date_range: Tuple[str, str]) -> Dict:
        """
        Analyze changes in traffic sources.

        Args:
            asin: product ASIN
            sales_change: sales change as a percentage
            date_range: analysis date range (start_date, end_date)

        Returns:
            Attribution analysis result
        """
        # 1. Get ranking change data
        ranking_changes = self._analyze_ranking_changes(asin, date_range)

        # 2. Get competitor change data
        competitor_changes = self._analyze_competitor_changes(asin, date_range)

        # 3. Score each channel's contribution
        attribution = self._calculate_attribution(
            sales_change,
            ranking_changes,
            competitor_changes
        )

        return attribution
    
    def _analyze_ranking_changes(self,
                                 asin: str,
                                 date_range: Tuple[str, str]) -> Dict:
        """
        Analyze ranking changes per keyword.

        For each keyword, compares the average rank with the average of the
        preceding observations (via LAG) within the date range.
        """
        sql = """
            WITH ranked_data AS (
                SELECT 
                    keyword,
                    organic_rank,
                    created_at,
                    LAG(organic_rank) OVER (PARTITION BY keyword ORDER BY created_at) as prev_rank
                FROM keyword_rankings
                WHERE asin = %s 
                      AND created_at BETWEEN %s AND %s
            )
            SELECT 
                keyword,
                AVG(organic_rank) as avg_rank,
                AVG(prev_rank) as prev_avg_rank,
                COUNT(*) as data_points
            FROM ranked_data
            WHERE prev_rank IS NOT NULL
            GROUP BY keyword
        """

        self.storage.cursor.execute(sql, (asin, date_range[0], date_range[1]))
        results = self.storage.cursor.fetchall()

        ranking_changes = {}
        for row in results:
            keyword, avg_rank, prev_avg_rank, data_points = row
            if prev_avg_rank and avg_rank:
                # AVG() comes back as Decimal; convert so later arithmetic
                # and JSON encoding work with plain floats
                avg_rank, prev_avg_rank = float(avg_rank), float(prev_avg_rank)
                change = prev_avg_rank - avg_rank  # positive means the rank improved
                ranking_changes[keyword] = {
                    'current_rank': avg_rank,
                    'previous_rank': prev_avg_rank,
                    'change': change,
                    'improvement_pct': (change / prev_avg_rank * 100) if prev_avg_rank > 0 else 0
                }

        return ranking_changes
    
    def _analyze_competitor_changes(self,
                                    asin: str,
                                    date_range: Tuple[str, str]) -> List[Dict]:
        """
        Analyze competitor changes.

        Note: `asin` is not used in the query; every ASIN stored in
        competitor_data is treated as a competitor.
        """
        sql = """
            SELECT 
                asin,
                AVG(CASE WHEN in_stock THEN 1 ELSE 0 END) as stock_rate,
                AVG(price) as avg_price,
                MIN(price) as min_price,
                MAX(price) as max_price
            FROM competitor_data
            WHERE created_at BETWEEN %s AND %s
            GROUP BY asin
        """

        self.storage.cursor.execute(sql, (date_range[0], date_range[1]))
        results = self.storage.cursor.fetchall()

        competitor_changes = []
        for row in results:
            comp_asin, stock_rate, avg_price, min_price, max_price = row

            # Detect stockouts: out of stock for more than 50% of observations
            if stock_rate < 0.5:
                competitor_changes.append({
                    'asin': comp_asin,
                    'type': 'stockout',
                    'severity': 'high' if stock_rate < 0.2 else 'medium'
                })

            # Detect large price swings
            if avg_price and max_price and min_price:
                # Convert the Decimal result to float for downstream use
                price_volatility = float((max_price - min_price) / avg_price)
                if price_volatility > 0.2:  # price range exceeds 20% of the average
                    competitor_changes.append({
                        'asin': comp_asin,
                        'type': 'price_change',
                        'volatility': price_volatility,
                        'severity': 'high' if price_volatility > 0.3 else 'medium'
                    })

        return competitor_changes
    
    def _calculate_attribution(self,
                               sales_change: float,
                               ranking_changes: Dict,
                               competitor_changes: List[Dict]) -> Dict:
        """
        Compute the traffic attribution with a simple heuristic score.
        """
        attribution = {
            'primary_source': None,
            'confidence': 0.0,
            'contributing_factors': [],
            'recommendations': []
        }

        # Scoring system. Only the first two sources are scored from the
        # collected data; paid_advertising and external_traffic stay at 0 here.
        scores = {
            'organic_ranking': 0,
            'competitor_impact': 0,
            'paid_advertising': 0,
            'external_traffic': 0
        }

        # 1. Assess the impact of ranking changes
        significant_ranking_improvements = [
            k for k, v in ranking_changes.items() 
            if v['change'] > 5  # rank improved by more than 5 positions
        ]

        if significant_ranking_improvements:
            # The larger the improvement, the larger the organic contribution
            avg_improvement = np.mean([
                ranking_changes[k]['change'] 
                for k in significant_ranking_improvements
            ])
            scores['organic_ranking'] = min(avg_improvement * 10, 100)

            attribution['contributing_factors'].append({
                'factor': 'organic_ranking_improvement',
                'keywords': significant_ranking_improvements,
                'impact_score': scores['organic_ranking']
            })

        # 2. Assess competitor impact
        high_severity_competitor_issues = [
            c for c in competitor_changes 
            if c.get('severity') == 'high'
        ]

        if high_severity_competitor_issues:
            scores['competitor_impact'] = len(high_severity_competitor_issues) * 20

            attribution['contributing_factors'].append({
                'factor': 'competitor_issues',
                'details': high_severity_competitor_issues,
                'impact_score': scores['competitor_impact']
            })

        # 3. Determine the primary source. If every score is 0 there is no
        # evidence for any source, so primary_source stays None.
        max_score_source = max(scores, key=scores.get)
        if scores[max_score_source] > 0:
            attribution['primary_source'] = max_score_source
            attribution['confidence'] = min(scores[max_score_source] / 100, 0.95)

        # 4. Generate recommendations
        if attribution['primary_source'] == 'organic_ranking':
            attribution['recommendations'].append(
                "Ranking improvement is the main traffic source; consider raising the ad budget to hold the position"
            )
        elif attribution['primary_source'] == 'competitor_impact':
            attribution['recommendations'].append(
                "Competitor problems are shifting traffic to you; consider restocking faster and optimizing the listing to capture it"
            )

        return attribution
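
The scoring heuristic is easier to follow with numbers plugged in. The standalone sketch below reproduces the two scored rules (10 points per average position gained, capped at 100; 20 points per high-severity competitor issue) on invented inputs. `score_sources` is a hypothetical simplification for illustration, not the class method itself.

```python
from statistics import mean
from typing import Dict

def score_sources(ranking_changes: Dict[str, float],
                  high_severity_issues: int) -> Dict[str, float]:
    """Score organic ranking and competitor impact with the article's heuristic."""
    # Only keywords that gained more than 5 positions count as significant
    improved = [change for change in ranking_changes.values() if change > 5]
    organic = min(mean(improved) * 10, 100) if improved else 0
    competitor = high_severity_issues * 20
    return {"organic_ranking": organic, "competitor_impact": competitor}

# Invented example: two keywords improved significantly, two competitors in trouble
scores = score_sources(
    {"yoga mat": 8, "exercise mat": 6, "gym mat": 2},
    high_severity_issues=2,
)
primary = max(scores, key=scores.get)
confidence = min(scores[primary] / 100, 0.95)
print(scores, primary, confidence)
# {'organic_ranking': 70, 'competitor_impact': 40} organic_ranking 0.7
```

The weights (10 and 20) and the 0.95 cap are tuning knobs rather than calibrated probabilities, so the "confidence" value is best read as a relative ranking signal.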

5. Anomaly Detection and Alerting

5.1 Anomaly Detection Algorithm

import numpy as np
from typing import Dict, List, Optional

class AnomalyDetector:
    """
    Anomaly detector.
    Uses statistical methods to detect traffic anomalies.
    """

    def __init__(self, storage: DataStorage, threshold: float = 2.0):
        """
        Args:
            storage: data storage object
            threshold: Z-score threshold (default 2.0, i.e. 2 standard deviations)
        """
        self.storage = storage
        self.threshold = threshold

    def detect_ranking_anomaly(self,
                               asin: str,
                               keyword: str,
                               current_rank: int,
                               lookback_days: int = 30) -> Optional[Dict]:
        """
        Detect a ranking anomaly.

        Args:
            asin: product ASIN
            keyword: keyword
            current_rank: current rank
            lookback_days: number of days to look back

        Returns:
            Dictionary describing the anomaly, or None if there is none
        """
        # Fetch historical ranking data
        historical_data = self.storage.get_historical_rankings(
            asin, keyword, lookback_days
        )

        # Extract rank values, skipping observations where the ASIN was not ranked
        ranks = [d['organic_rank'] for d in historical_data if d['organic_rank']]

        if len(ranks) < 7:  # not enough data
            return None

        # Compute summary statistics
        mean_rank = float(np.mean(ranks))
        std_rank = float(np.std(ranks))

        if std_rank == 0:  # rank has been perfectly stable
            return None

        # Compute the Z-score
        z_score = abs((current_rank - mean_rank) / std_rank)

        if z_score > self.threshold:
            # Anomaly detected; a larger rank number means a worse position
            anomaly_type = 'ranking_drop' if current_rank > mean_rank else 'ranking_improvement'

            return {
                'type': anomaly_type,
                'asin': asin,  # stored with the alert record
                'keyword': keyword,
                'current_rank': current_rank,
                'historical_mean': mean_rank,
                'z_score': z_score,
                'severity': 'high' if z_score > 3.0 else 'medium',
                'message': f"Ranking anomaly for keyword '{keyword}': currently #{current_rank}, historical average #{mean_rank:.1f}"
            }

        return None
    
    def detect_competitor_anomaly(self,
                                  competitor_asin: str,
                                  current_data: Dict,
                                  lookback_days: int = 30) -> List[Dict]:
        """
        Detect competitor anomalies.

        Args:
            competitor_asin: competitor ASIN
            current_data: current competitor data
            lookback_days: number of days to look back

        Returns:
            List of anomalies
        """
        anomalies = []

        # Detect stockouts
        if not current_data.get('in_stock'):
            anomalies.append({
                'type': 'competitor_stockout',
                'asin': competitor_asin,
                'severity': 'high',
                'message': f"Competitor {competitor_asin} is out of stock"
            })

        # Detect price anomalies
        sql = """
            SELECT AVG(price) as avg_price, STDDEV(price) as std_price
            FROM competitor_data
            WHERE asin = %s 
                  AND created_at >= NOW() - %s * INTERVAL '1 day'
                  AND price IS NOT NULL
        """

        self.storage.cursor.execute(sql, (competitor_asin, lookback_days))
        result = self.storage.cursor.fetchone()

        if result and result[0] and result[1]:
            # AVG/STDDEV on a DECIMAL column return Decimal, which cannot be
            # mixed with float in arithmetic, so convert first
            avg_price, std_price = float(result[0]), float(result[1])
            current_price = current_data.get('price')

            if current_price and std_price > 0:
                current_price = float(current_price)
                price_z_score = abs((current_price - avg_price) / std_price)

                if price_z_score > self.threshold:
                    anomaly_type = 'price_drop' if current_price < avg_price else 'price_increase'

                    anomalies.append({
                        'type': anomaly_type,
                        'asin': competitor_asin,
                        'current_price': current_price,
                        'historical_avg': avg_price,
                        'z_score': price_z_score,
                        'severity': 'high' if price_z_score > 3.0 else 'medium',
                        'message': f"Price anomaly for competitor {competitor_asin}: currently ${current_price:.2f}, historical average ${avg_price:.2f}"
                    })

        return anomalies
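
Both detectors rely on the same Z-score rule: flag a value when it lies more than `threshold` standard deviations from the historical mean. The sketch below works through it on invented rank history using only the standard library. One detail worth knowing: `np.std` defaults to the population standard deviation, while PostgreSQL's `STDDEV` is the sample standard deviation, so the two detectors are slightly inconsistent on small samples.

```python
from statistics import mean, pstdev, stdev
from typing import List

def z_score(history: List[float], current: float) -> float:
    """Absolute Z-score of `current` against `history` (population std, like np.std)."""
    sigma = pstdev(history)
    if sigma == 0:
        raise ValueError("history has zero variance; Z-score is undefined")
    return abs(current - mean(history)) / sigma

# Invented history: a keyword that hovered around rank 12 for ten days
history = [12, 11, 13, 12, 14, 12, 11, 13, 12, 12]
current = 25  # today's rank

z = z_score(history, current)
severity = "high" if z > 3.0 else "medium" if z > 2.0 else None
print(round(mean(history), 1), severity)  # 12.2 high

# The sample standard deviation (PostgreSQL STDDEV) is always a bit larger
print(stdev(history) > pstdev(history))  # True
```

Because a very stable history has a small standard deviation, even a modest move produces a large Z-score; a minimum absolute change (for example, at least 3 positions) is a common extra guard against noisy alerts.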

5.2 Alert Notifications

import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from typing import Dict, List, Optional
import requests

class AlertNotifier:
    """
    Alert notifier.
    Supports Email and Slack notifications.
    """

    def __init__(self, email_config: Optional[Dict] = None, slack_webhook: Optional[str] = None):
        self.email_config = email_config
        self.slack_webhook = slack_webhook

    def send_email_alert(self, subject: str, body: str, recipients: List[str]):
        """
        Send an email alert.
        """
        if not self.email_config:
            logger.warning("Email configuration not set; skipping email")
            return

        msg = MIMEMultipart()
        msg['From'] = self.email_config['from']
        msg['To'] = ', '.join(recipients)
        msg['Subject'] = subject

        msg.attach(MIMEText(body, 'html'))

        try:
            with smtplib.SMTP(self.email_config['smtp_host'], self.email_config['smtp_port']) as server:
                server.starttls()  # upgrade the connection to TLS before logging in
                server.login(self.email_config['username'], self.email_config['password'])
                server.send_message(msg)

            logger.info(f"Email alert sent: {subject}")
        except Exception as e:
            logger.error(f"Failed to send email: {e}")
    
    def send_slack_alert(self, message: str, severity: str = 'medium'):
        """
        Send a Slack alert.
        """
        if not self.slack_webhook:
            logger.warning("Slack webhook not set; skipping Slack notification")
            return

        # Pick a color based on severity
        color_map = {
            'low': '#36a64f',      # green
            'medium': '#ff9900',   # orange
            'high': '#ff0000'      # red
        }

        payload = {
            'attachments': [{
                'color': color_map.get(severity, '#808080'),
                'title': 'Amazon traffic anomaly alert',
                'text': message,
                'footer': 'Amazon Traffic Monitor',
                'ts': int(datetime.now().timestamp())
            }]
        }

        try:
            # Set a timeout so a stalled webhook cannot block the scheduler
            response = requests.post(self.slack_webhook, json=payload, timeout=10)
            response.raise_for_status()
            logger.info("Slack alert sent")
        except Exception as e:
            logger.error(f"Failed to send Slack alert: {e}")
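
Separating payload construction from the HTTP call makes the alert format testable without a real webhook. The sketch below does that with a hypothetical `build_slack_payload` helper; the attachment fields mirror the ones used in `send_slack_alert`, and the English title string is an illustrative choice.

```python
from datetime import datetime, timezone
from typing import Dict

SEVERITY_COLORS = {
    "low": "#36a64f",     # green
    "medium": "#ff9900",  # orange
    "high": "#ff0000",    # red
}

def build_slack_payload(message: str, severity: str, now: datetime) -> Dict:
    """Build the webhook JSON body; unknown severities fall back to gray."""
    return {
        "attachments": [{
            "color": SEVERITY_COLORS.get(severity, "#808080"),
            "title": "Amazon traffic anomaly alert",
            "text": message,
            "footer": "Amazon Traffic Monitor",
            "ts": int(now.timestamp()),
        }]
    }

payload = build_slack_payload(
    "Competitor B0AAA11111 is out of stock",  # placeholder ASIN
    "high",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(payload["attachments"][0]["color"])  # #ff0000
```

Passing a timezone-aware `now` keeps the `ts` value independent of the server's local timezone setting.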

6. Scheduled Tasks

6.1 Scheduled Collection with APScheduler

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

class MonitoringScheduler:
    """
    Monitoring job scheduler.
    """

    def __init__(self,
                 collector: AmazonDataCollector,
                 storage: DataStorage,
                 detector: AnomalyDetector,
                 notifier: AlertNotifier):
        self.collector = collector
        self.storage = storage
        self.detector = detector
        self.notifier = notifier
        self.scheduler = BlockingScheduler()

    def setup_jobs(self, config: Dict):
        """
        Register the scheduled jobs.

        Args:
            config: configuration dictionary with the monitoring targets
                    (target_asin, keywords, competitor_asins)
        """
        # Collect keyword rankings every day at 02:00
        self.scheduler.add_job(
            func=self.daily_ranking_job,
            trigger=CronTrigger(hour=2, minute=0),
            args=[config['target_asin'], config['keywords']],
            id='daily_ranking',
            name='Daily keyword ranking collection'
        )

        # Monitor competitors every 6 hours
        self.scheduler.add_job(
            func=self.competitor_monitoring_job,
            trigger=CronTrigger(hour='*/6'),
            args=[config['competitor_asins']],
            id='competitor_monitoring',
            name='Competitor monitoring'
        )

        # Run anomaly detection every hour, on the hour
        self.scheduler.add_job(
            func=self.anomaly_detection_job,
            trigger=CronTrigger(minute=0),
            args=[config['target_asin'], config['keywords']],
            id='anomaly_detection',
            name='Anomaly detection'
        )
    
    def daily_ranking_job(self, target_asin: str, keywords: List[str]):
        """
        Daily ranking collection job.
        """
        logger.info("Starting daily ranking collection job")

        tracker = KeywordRankTracker(self.collector)
        rankings = tracker.track_keyword_ranking(target_asin, keywords)

        # Save to the database
        self.storage.save_keyword_rankings(rankings)

        logger.info(f"Daily ranking collection finished: {len(rankings)} records collected")

    def competitor_monitoring_job(self, competitor_asins: List[str]):
        """
        Competitor monitoring job.
        """
        logger.info("Starting competitor monitoring job")

        monitor = CompetitorMonitor(self.collector)
        competitor_data = monitor.monitor_competitors(competitor_asins)

        # Save to the database
        self.storage.save_competitor_data(competitor_data)

        # Check each competitor for anomalies
        for data in competitor_data:
            anomalies = self.detector.detect_competitor_anomaly(
                data['asin'], 
                data
            )

            for anomaly in anomalies:
                self._handle_anomaly(anomaly)

        logger.info(f"Competitor monitoring finished: {len(competitor_data)} competitors monitored")

    def anomaly_detection_job(self, target_asin: str, keywords: List[str]):
        """
        Anomaly detection job.
        """
        logger.info("Starting anomaly detection job")

        # Fetch the latest rankings (this issues a fresh scrape per keyword)
        tracker = KeywordRankTracker(self.collector)
        current_rankings = tracker.track_keyword_ranking(target_asin, keywords)

        # Check for ranking anomalies
        for ranking in current_rankings:
            if ranking['organic_rank']:
                anomaly = self.detector.detect_ranking_anomaly(
                    target_asin,
                    ranking['keyword'],
                    ranking['organic_rank']
                )

                if anomaly:
                    self._handle_anomaly(anomaly)

        logger.info("Anomaly detection finished")
    
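    def _handle_anomaly(self, anomaly: Dict):
        """
        Handle a detected anomaly: record it, then notify.
        """
        # Record in the database
        sql = """
            INSERT INTO traffic_alerts 
            (asin, alert_type, severity, message, details)
            VALUES (%s, %s, %s, %s, %s)
        """

        try:
            self.storage.cursor.execute(sql, (
                anomaly.get('asin', 'unknown'),
                anomaly['type'],
                anomaly['severity'],
                anomaly['message'],
                # default=str covers values json cannot encode natively
                json.dumps(anomaly, default=str)
            ))
            self.storage.conn.commit()
        except Exception as e:
            # Roll back so a failed insert does not poison later queries
            self.storage.conn.rollback()
            logger.error(f"Failed to record alert: {e}")

        # Send a notification for high-severity anomalies only
        if anomaly['severity'] == 'high':
            self.notifier.send_slack_alert(
                message=anomaly['message'],
                severity=anomaly['severity']
            )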
    
    def start(self):
        """
        Start the scheduler (blocks the calling thread).
        """
        logger.info("Monitoring scheduler started")
        self.scheduler.start()

七、数据可视化

7.1 使用Matplotlib生成趋势图

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime, timedelta

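class DataVisualizer:
    """
    Renders charts from the stored monitoring data.
    """
    
    def __init__(self, storage: DataStorage):
        self.storage = storage
        # SimHei is only needed when chart text contains Chinese characters;
        # DejaVu Sans (bundled with matplotlib) is listed as the fallback
        plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
        plt.rcParams['axes.unicode_minus'] = False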
    
    def plot_ranking_trend(self, 
                          asin: str, 
                          keyword: str, 
                          days: int = 30,
                          save_path: str = None):
        """
        Plot the organic ranking trend of one keyword for one ASIN.

        If save_path is given, the chart is written to that file;
        otherwise it is shown in an interactive window.
        """
        # Load historical data
        data = self.storage.get_historical_rankings(asin, keyword, days)
        
        if not data:
            logger.warning(f"No historical data found for {keyword}")
            return
        
        # Prepare the series; missing ranks become NaN so the line shows a gap
        dates = [d['timestamp'] for d in data]
        organic_ranks = [d['organic_rank'] or float('nan') for d in data]
        
        # Create the figure
        fig, ax = plt.subplots(figsize=(12, 6))
        
        ax.plot(dates, organic_ranks, marker='o', linestyle='-', linewidth=2, 
                markersize=4, label='Organic rank')
        
        # Title and axis labels
        ax.set_title(f'Ranking trend for keyword "{keyword}" (ASIN: {asin})', fontsize=14, fontweight='bold')
        ax.set_xlabel('Date', fontsize=12)
        ax.set_ylabel('Rank position', fontsize=12)
        
        # Invert the Y axis (a smaller rank is better, so it goes on top)
        ax.invert_yaxis()
        
        # Format the dates: one tick every 3 days, shown as month-day
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
        ax.xaxis.set_major_locator(mdates.DayLocator(interval=3))
        plt.xticks(rotation=45)
        
        # Add a light grid
        ax.grid(True, alpha=0.3, linestyle='--')
        
        # Add the legend
        ax.legend(loc='best')
        
        fig.tight_layout()
        
        if save_path:
            fig.savefig(save_path, dpi=300, bbox_inches='tight')
            logger.info(f"Ranking trend chart saved to {save_path}")
        else:
            plt.show()
        
        # Close this figure explicitly to free its memory
        plt.close(fig)

8. Complete Usage Example

8.1 Main Program

def main():
    """
    Main program entry point.
    """
    # Configuration (replace the placeholder values with your own)
    config = {
        'api_key': 'your_pangolinfo_api_key',
        'db_config': {
            'host': 'localhost',
            'port': 5432,
            'database': 'amazon_monitor',
            'user': 'postgres',
            'password': 'your_password'
        },
        'target_asin': 'B08XYZ1234',
        'keywords': [
            'silicone baking mat',
            'non-stick baking sheet',
            'reusable baking liner'
        ],
        'competitor_asins': [
            'B07ABC1234',
            'B09DEF5678',
            'B06GHI9012'
        ],
        'email_config': {
            'from': 'monitor@example.com',
            'smtp_host': 'smtp.gmail.com',
            'smtp_port': 587,
            'username': 'your_email@gmail.com',
            'password': 'your_app_password'
        },
        'slack_webhook': 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    }
    
    # Initialize the components
    collector = AmazonDataCollector(config['api_key'])
    storage = DataStorage(config['db_config'])
    detector = AnomalyDetector(storage, threshold=2.0)
    notifier = AlertNotifier(
        email_config=config['email_config'],
        slack_webhook=config['slack_webhook']
    )
    
    # Set up the scheduler
    scheduler = MonitoringScheduler(collector, storage, detector, notifier)
    scheduler.setup_jobs(config)
    
    # Start monitoring; start() is assumed to block until interrupted
    # (e.g. an APScheduler BlockingScheduler), so cleanup runs on exit
    try:
        scheduler.start()
    except (KeyboardInterrupt, SystemExit):
        logger.info("Monitoring system stopped")
    finally:
        storage.close()

if __name__ == '__main__':
    main()

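9. Performance Optimization Tips

9.1 Data Collection

  1. Concurrent requests: use asyncio with aiohttp for asynchronous, concurrent collection
  2. Rate limiting: throttle requests so they stay under the API's rate limits
  3. Caching: cache hot data in Redis
  4. Incremental updates: collect only data that has changed, to reduce API calls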
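
Points 1 and 2 can be combined in a few lines. The following is a minimal sketch, not the article's collector: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` only simulates network latency where a real implementation would issue an aiohttp request. A semaphore caps the number of requests in flight, which is one simple way to stay under a rate limit.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_all(
    keywords: List[str],
    fetch: Callable[[str], Awaitable[Dict]],
    max_concurrency: int = 5,
) -> List[Dict]:
    """Run `fetch` for every keyword with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(keyword: str) -> Dict:
        # Wait for a free slot before starting the request
        async with semaphore:
            return await fetch(keyword)

    # gather returns results in the same order as the input keywords
    return await asyncio.gather(*(bounded(k) for k in keywords))


async def fake_fetch(keyword: str) -> Dict:
    """Hypothetical stand-in for a real HTTP call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"keyword": keyword, "organic_rank": len(keyword)}  # dummy value


results = asyncio.run(fetch_all(["silicone baking mat", "baking liner"], fake_fetch))
```

A semaphore limits concurrency, not requests per second; if the API enforces a per-second quota, a token bucket or a fixed delay between requests would be needed on top of it.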

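9.2 Database

  1. Index optimization: create indexes on frequently queried columns
  2. Partitioned tables: partition historical data by time
  3. Regular archiving: move old data to cold storage
  4. Connection pooling: use a database connection pool to improve performance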
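
As an illustration of point 2, the sketch below generates the DDL for one monthly partition using PostgreSQL's declarative partitioning syntax. It assumes the parent table was created with `PARTITION BY RANGE` on its timestamp column; the table name `keyword_rankings` and the helper `monthly_partition_ddl` are hypothetical and not part of the schema shown earlier.

```python
from datetime import date


def monthly_partition_ddl(table: str, year: int, month: int) -> str:
    """Build the DDL for one monthly partition of a range-partitioned table.

    `table` is interpolated directly into the SQL, so pass trusted names only.
    """
    start = date(year, month, 1)
    # First day of the following month (the upper bound is exclusive)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    partition = f"{table}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {partition} "
        f"PARTITION OF {table} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


# Hypothetical table name, for illustration only
print(monthly_partition_ddl("keyword_rankings", 2024, 12))
```

A scheduled job could run this once a month to create the next partition ahead of time, which also makes archiving (point 3) a matter of detaching or dropping whole partitions.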

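9.3 Monitoring

  1. Sampling strategy: for large numbers of SKUs, sample them in batches
  2. Priority queue: monitor important products first
  3. Adaptive frequency: adjust the collection interval dynamically based on how often the data changes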
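
Points 2 and 3 can be sketched with the standard library alone. Everything here is an assumption for illustration: the names `MonitorQueue` and `next_interval`, the halve/double rule, and the 5-minute and 6-hour bounds are not taken from the scheduler above.

```python
import heapq
from typing import List, Tuple


def next_interval(current: float, changed: bool,
                  min_s: float = 300, max_s: float = 21600) -> float:
    """Halve the interval after a change, double it when nothing moved."""
    proposed = current / 2 if changed else current * 2
    # Clamp to the allowed range (assumed: 5 minutes to 6 hours)
    return max(min_s, min(max_s, proposed))


class MonitorQueue:
    """Min-heap of (due_time, priority, asin); a lower priority number is more important."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []

    def schedule(self, asin: str, due: float, priority: int) -> None:
        heapq.heappush(self._heap, (due, priority, asin))

    def pop_due(self, now: float) -> List[str]:
        """Return every ASIN whose due time has passed, most important first."""
        due_items = []
        while self._heap and self._heap[0][0] <= now:
            due_items.append(heapq.heappop(self._heap))
        # Among the due items, important products are processed first
        due_items.sort(key=lambda item: (item[1], item[0]))
        return [asin for _, _, asin in due_items]


queue = MonitorQueue()
queue.schedule("B08XYZ1234", due=0, priority=0)    # own listing
queue.schedule("B07ABC1234", due=0, priority=1)    # competitor
queue.schedule("B09DEF5678", due=900, priority=1)  # not due yet
due_now = queue.pop_due(now=60)
```

After each collection, the ASIN would be rescheduled at `now + next_interval(...)`, so volatile listings are polled more often and stable ones cost fewer API calls.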

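10. FAQ and Solutions

Q1: What should I do about API rate limits?

A: The Pangolinfo API supports fairly high concurrency, but it is still advisable to:

  • Use a request queue to control the number of concurrent requests
  • Implement retries with exponential backoff
  • Set a reasonable collection interval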
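
A minimal sketch of exponential backoff with jitter follows. The helper `retry_with_backoff` and its defaults are assumptions for illustration; it retries on any exception for brevity, whereas production code would normally retry only on errors known to be transient (timeouts, HTTP 429, and so on).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the maximum wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Upper bound grows as base, 2*base, 4*base, ... capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait in [0, delay] avoids synchronized retries
            sleep(random.uniform(0, delay))
    raise ValueError("max_attempts must be at least 1")
```

Usage would look like `retry_with_backoff(lambda: collector.some_request(...))`, where `some_request` stands for whichever collector method is being wrapped. The `sleep` parameter exists only so the waiting can be replaced in tests.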

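Q2: What if the database runs out of storage space?

A: Apply data lifecycle management:

  • Keep detailed data for the most recent 3 months
  • Aggregate data that is 3 to 12 months old by day
  • Archive or delete data older than 12 months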
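
The three tiers above can be expressed as a small classification function that a cleanup job could apply per record or per partition. This is a sketch: the function name, the returned labels, and the approximation of 3 and 12 months as 90 and 365 days are all assumptions.

```python
from datetime import date


def retention_action(record_date: date, today: date) -> str:
    """Map a record's age to one of the lifecycle tiers."""
    age_days = (today - record_date).days
    if age_days <= 90:       # roughly the most recent 3 months
        return "keep_detailed"
    if age_days <= 365:      # roughly 3 to 12 months old
        return "aggregate_daily"
    return "archive_or_delete"
```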

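Q3: How can I improve anomaly detection accuracy?

A:

  • Accumulate more historical data (at least 30 days)
  • Tune the Z-score threshold to fit your business
  • Combine several metrics before deciding
  • Introduce a machine learning model (such as Isolation Forest)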
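
The first two bullets can be made concrete with a short Z-score check. This is a sketch under stated assumptions, not the `AnomalyDetector` used elsewhere in this article: the function names are hypothetical, and the 30-point minimum and 2.0 threshold simply mirror the values mentioned in the text.

```python
from statistics import mean, stdev
from typing import List, Optional


def rank_zscore(history: List[float], current: float,
                min_points: int = 30) -> Optional[float]:
    """Z-score of `current` against `history`, or None if it cannot be computed."""
    if len(history) < min_points:
        return None  # not enough history for a stable baseline
    sigma = stdev(history)
    if sigma == 0:
        return None  # flat history: the Z-score is undefined
    return (current - mean(history)) / sigma


def is_anomaly(history: List[float], current: float,
               threshold: float = 2.0) -> bool:
    """Flag the value when it lies more than `threshold` standard deviations from the mean."""
    z = rank_zscore(history, current)
    return z is not None and abs(z) > threshold
```

Returning None instead of a score when the history is too short or flat keeps the job from raising alerts off an unreliable baseline. Raising the threshold reduces false alarms at the cost of missing smaller shifts.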

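11. Summary and Outlook

This article walked through building a complete Amazon listing traffic analysis system, covering the full pipeline of data collection, storage, analysis, anomaly detection, and visualization.

Key Takeaways

  1. Use the Pangolinfo Scrape API to obtain accurate, real-time data
  2. Store historical data in a time-series database
  3. Detect anomalies with statistical methods
  4. Pinpoint traffic sources with a traffic attribution algorithm
  5. Automate monitoring and send timely alerts

Next Steps

  • Introduce machine learning models to improve prediction accuracy
  • Add more data sources (such as Google Trends and social media)
  • Build a web dashboard for visual management
  • Integrate A/B testing to optimize listings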

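About the author: a senior e-commerce data engineer focused on Amazon data analysis and automation system development.

Original content notice: this is an original technical article; please credit the source when republishing.

#Amazon #DataAnalysis #Python #API #TrafficMonitoring #EcommerceTech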
