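Preface
As a developer who has worked in e-commerce data analysis for years, I keep running into the same scenario: a seller's listing suddenly doubles its orders, and nobody can tell where the traffic came from. Amazon's official seller dashboard reports only aggregate Sessions figures, which cannot be attributed to specific traffic channels.
This article takes a technical view and walks through building a complete Amazon listing traffic analysis system, covering the full pipeline of data collection, storage, analysis, and anomaly detection.
Tech stack:
- Python 3.8+
- Requests (HTTP requests)
- Pandas (data processing)
- PostgreSQL (data storage)
- APScheduler (scheduled jobs)
- Matplotlib/Plotly (data visualization)
Source code: a complete code example is provided at the end of the article.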

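1. Requirements Analysis and System Architecture
1.1 Business Requirements
Amazon sellers need answers to the following core questions:
- Traffic source attribution: split traffic into organic search, paid advertising, and off-Amazon referrals
- Keyword rank monitoring: track ranking changes for core keywords as they happen
- Competitor monitoring: watch competitors' price, stock, and ranking fluctuations
- Anomaly alerts: raise an alert automatically when traffic spikes or drops
1.2 System Architecture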
┌────────────────────────────────────────────────────────────────┐
│  Data Collection Layer                                         │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Search results   │ │ Product details  │ │ Bestseller lists │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
│  Pangolinfo Scrape API / Amazon SP-API                         │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│  Data Storage Layer                                            │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ PostgreSQL       │ │ Redis cache      │ │ Time-series DB   │ │
│ │ (structured)     │ │ (live cache)     │ │ (time series)    │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│  Data Analysis Layer                                           │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Rank analysis    │ │ Attribution      │ │ Anomaly detection│ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│  Visualization Layer                                           │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Web dashboard    │ │ Report export    │ │ Alerting         │ │
│ │ (Flask/Django)   │ │ (PDF/Excel)      │ │ (Email/Slack)    │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
└────────────────────────────────────────────────────────────────┘
2. Data Collection Module
2.1 Core Data Collector Class
import requests
import json
from datetime import datetime
from typing import List, Dict, Optional
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class AmazonDataCollector:
    """
    Amazon data collector.
    Collects data through the Pangolinfo Scrape API.
    """

    def __init__(self, api_key: str, base_url: str = "https://api.pangolinfo.com/v1"):
        self.api_key = api_key
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        })

    def search_products(self,
                        keyword: str,
                        marketplace: str = "US",
                        page: int = 1) -> Dict:
        """
        Collect data from a search results page.
        Args:
            keyword: search keyword
            marketplace: marketplace code (US, UK, DE, ...)
            page: page number
        Returns:
            Dict containing the search results ({} on failure)
        """
        endpoint = f"{self.base_url}/scrape/amazon/search"
        payload = {
            "keyword": keyword,
            "marketplace": marketplace,
            "page": page,
            "include_sponsored": True,  # include sponsored placements
            "include_organic": True     # include organic results
        }
        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()
            data = response.json()
            # Count the same keys that KeywordRankTracker reads later on;
            # verify these field names against the actual API response.
            total = (len(data.get('organic_results', []))
                     + len(data.get('sponsored_results', [])))
            logger.info(f"Collected search results for keyword '{keyword}': {total} items")
            return data
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect search results: {e}")
            return {}

    def get_product_details(self, asin: str, marketplace: str = "US") -> Dict:
        """
        Collect data from a product detail page.
        Args:
            asin: product ASIN
            marketplace: marketplace code
        Returns:
            Product detail dict ({} on failure)
        """
        endpoint = f"{self.base_url}/scrape/amazon/product"
        payload = {
            "asin": asin,
            "marketplace": marketplace,
            "include_reviews": False,  # reviews are collected separately
            "include_qa": False
        }
        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()
            data = response.json()
            logger.info(f"Collected detail data for product {asin}")
            return data
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect product details: {e}")
            return {}

    def get_bestsellers_rank(self,
                             category: str,
                             marketplace: str = "US") -> List[Dict]:
        """
        Collect Best Sellers list data.
        Args:
            category: category ID or name
            marketplace: marketplace code
        Returns:
            List of products on the list ([] on failure)
        """
        endpoint = f"{self.base_url}/scrape/amazon/bestsellers"
        payload = {
            "category": category,
            "marketplace": marketplace
        }
        try:
            response = self.session.post(endpoint, json=payload, timeout=30)
            response.raise_for_status()
            data = response.json()
            products = data.get('products', [])
            logger.info(f"Collected Best Sellers data for category '{category}': {len(products)} products")
            return products
        except requests.exceptions.RequestException as e:
            logger.error(f"Failed to collect Best Sellers data: {e}")
            return []
2.2 Keyword Rank Tracking
class KeywordRankTracker:
    """
    Keyword rank tracker.
    """

    def __init__(self, collector: AmazonDataCollector):
        self.collector = collector

    def track_keyword_ranking(self,
                              target_asin: str,
                              keywords: List[str],
                              marketplace: str = "US") -> List[Dict]:
        """
        Track the target ASIN's rank for several keywords.
        Only the first results page of each keyword is inspected.
        Args:
            target_asin: target product ASIN
            keywords: list of keywords
            marketplace: marketplace code
        Returns:
            List of ranking dicts
        """
        ranking_data = []
        for keyword in keywords:
            # Collect the search results
            search_results = self.collector.search_products(
                keyword=keyword,
                marketplace=marketplace
            )
            # Look up the target ASIN's position
            organic_rank = self._find_product_rank(
                search_results.get('organic_results', []),
                target_asin
            )
            sponsored_rank = self._find_product_rank(
                search_results.get('sponsored_results', []),
                target_asin
            )
            ranking_data.append({
                'keyword': keyword,
                'asin': target_asin,
                'organic_rank': organic_rank,
                'sponsored_rank': sponsored_rank,
                'timestamp': datetime.now().isoformat(),
                'marketplace': marketplace
            })
            logger.info(f"Keyword '{keyword}': organic rank={organic_rank}, sponsored rank={sponsored_rank}")
        return ranking_data

    def _find_product_rank(self, results: List[Dict], target_asin: str) -> Optional[int]:
        """
        Find a product's position in a list of search results.
        Args:
            results: list of search results
            target_asin: target ASIN
        Returns:
            1-based position, or None if the product is not found
        """
        for idx, product in enumerate(results):
            if product.get('asin') == target_asin:
                return idx + 1
        return None
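The tracker above inspects a single results page, so a product on page 2 is reported as not found. The following is a minimal sketch of how a position could be computed across several pages. The `find_rank_across_pages` helper and its `fetch_page` callable are hypothetical names introduced for illustration; in practice `fetch_page` might wrap `search_products(keyword, page=...)`, and the sample ASINs are made up.

```python
from typing import Callable, Dict, List, Optional


def find_rank_across_pages(fetch_page: Callable[[int], List[Dict]],
                           target_asin: str,
                           max_pages: int = 3) -> Optional[int]:
    """Return the 1-based position of target_asin across pages, or None."""
    offset = 0  # number of results seen on earlier pages
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            break  # no further pages to inspect
        for idx, product in enumerate(results):
            if product.get("asin") == target_asin:
                return offset + idx + 1
        offset += len(results)
    return None


# Made-up sample data standing in for two pages of organic results
sample_pages = {
    1: [{"asin": "B000000001"}, {"asin": "B000000002"}],
    2: [{"asin": "B000000003"}, {"asin": "B08XYZ1234"}],
}
rank = find_rank_across_pages(lambda p: sample_pages.get(p, []), "B08XYZ1234")
missing = find_rank_across_pages(lambda p: sample_pages.get(p, []), "B0NOTFOUND")
print(rank, missing)
```

Each extra page costs one more API call per keyword, so `max_pages` is a trade-off between coverage and quota.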
2.3 Competitor Monitoring
class CompetitorMonitor:
    """
    Competitor monitor.
    """

    def __init__(self, collector: AmazonDataCollector):
        self.collector = collector

    def monitor_competitors(self,
                            competitor_asins: List[str],
                            marketplace: str = "US") -> List[Dict]:
        """
        Collect key metrics for each competitor.
        Args:
            competitor_asins: list of competitor ASINs
            marketplace: marketplace code
        Returns:
            List of competitor data dicts
        """
        competitor_data = []
        for asin in competitor_asins:
            product_details = self.collector.get_product_details(
                asin=asin,
                marketplace=marketplace
            )
            if product_details:
                in_stock = product_details.get('availability', {}).get('in_stock')
                competitor_data.append({
                    'asin': asin,
                    'title': product_details.get('title'),
                    'price': product_details.get('price'),
                    'currency': product_details.get('currency'),
                    'rating': product_details.get('rating'),
                    'review_count': product_details.get('review_count'),
                    'in_stock': in_stock,
                    'bsr': product_details.get('best_sellers_rank'),
                    'timestamp': datetime.now().isoformat(),
                    'marketplace': marketplace
                })
                logger.info(f"Competitor {asin}: price={product_details.get('price')}, in stock={in_stock}")
        return competitor_data
3. Data Storage Module
3.1 Database Design
-- Keyword ranking history
CREATE TABLE keyword_rankings (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    keyword VARCHAR(255) NOT NULL,
    organic_rank INTEGER,
    sponsored_rank INTEGER,
    marketplace VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- PostgreSQL has no inline INDEX clause in CREATE TABLE, and index names
-- must be unique within a schema, so indexes are created separately.
CREATE INDEX idx_rankings_asin_keyword ON keyword_rankings (asin, keyword);
CREATE INDEX idx_rankings_created_at ON keyword_rankings (created_at);

-- Competitor monitoring history
CREATE TABLE competitor_data (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    title TEXT,
    price DECIMAL(10, 2),
    currency VARCHAR(10),
    rating DECIMAL(3, 2),
    review_count INTEGER,
    in_stock BOOLEAN,
    bsr INTEGER,
    marketplace VARCHAR(10) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_competitor_asin ON competitor_data (asin);
CREATE INDEX idx_competitor_created_at ON competitor_data (created_at);

-- Traffic anomaly alerts
CREATE TABLE traffic_alerts (
    id SERIAL PRIMARY KEY,
    asin VARCHAR(20) NOT NULL,
    alert_type VARCHAR(50) NOT NULL,
    severity VARCHAR(20) NOT NULL,
    message TEXT,
    details JSONB,
    is_resolved BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    resolved_at TIMESTAMP
);
CREATE INDEX idx_alerts_asin ON traffic_alerts (asin);
CREATE INDEX idx_alerts_alert_type ON traffic_alerts (alert_type);
CREATE INDEX idx_alerts_is_resolved ON traffic_alerts (is_resolved);
3.2 Data Persistence
import psycopg2
from psycopg2.extras import execute_batch
from typing import List, Dict


class DataStorage:
    """
    Data storage manager.
    """

    def __init__(self, db_config: Dict):
        self.conn = psycopg2.connect(**db_config)
        self.cursor = self.conn.cursor()

    def save_keyword_rankings(self, rankings: List[Dict]) -> int:
        """
        Save keyword ranking rows in a batch.
        Args:
            rankings: list of ranking dicts
        Returns:
            Number of rows inserted (0 on failure)
        """
        sql = """
            INSERT INTO keyword_rankings
            (asin, keyword, organic_rank, sponsored_rank, marketplace, created_at)
            VALUES (%(asin)s, %(keyword)s, %(organic_rank)s, %(sponsored_rank)s,
                    %(marketplace)s, %(timestamp)s)
        """
        try:
            execute_batch(self.cursor, sql, rankings)
            self.conn.commit()
            logger.info(f"Saved {len(rankings)} ranking rows")
            return len(rankings)
        except Exception as e:
            self.conn.rollback()
            logger.error(f"Failed to save ranking data: {e}")
            return 0

    def save_competitor_data(self, competitors: List[Dict]) -> int:
        """
        Save competitor rows in a batch.
        Args:
            competitors: list of competitor data dicts
        Returns:
            Number of rows inserted (0 on failure)
        """
        sql = """
            INSERT INTO competitor_data
            (asin, title, price, currency, rating, review_count,
             in_stock, bsr, marketplace, created_at)
            VALUES (%(asin)s, %(title)s, %(price)s, %(currency)s, %(rating)s,
                    %(review_count)s, %(in_stock)s, %(bsr)s, %(marketplace)s, %(timestamp)s)
        """
        try:
            execute_batch(self.cursor, sql, competitors)
            self.conn.commit()
            logger.info(f"Saved {len(competitors)} competitor rows")
            return len(competitors)
        except Exception as e:
            self.conn.rollback()
            logger.error(f"Failed to save competitor data: {e}")
            return 0

    def get_historical_rankings(self,
                                asin: str,
                                keyword: str,
                                days: int = 30) -> List[Dict]:
        """
        Fetch historical ranking data.
        Args:
            asin: product ASIN
            keyword: keyword
            days: number of days to look back
        Returns:
            List of historical ranking dicts, oldest first
        """
        # make_interval keeps the day count a real bound parameter instead of
        # splicing it into a quoted interval literal.
        sql = """
            SELECT keyword, organic_rank, sponsored_rank, created_at
            FROM keyword_rankings
            WHERE asin = %s AND keyword = %s
              AND created_at >= NOW() - make_interval(days => %s)
            ORDER BY created_at ASC
        """
        self.cursor.execute(sql, (asin, keyword, days))
        results = self.cursor.fetchall()
        return [
            {
                'keyword': row[0],
                'organic_rank': row[1],
                'sponsored_rank': row[2],
                'timestamp': row[3]
            }
            for row in results
        ]

    def close(self):
        """Close the database connection."""
        self.cursor.close()
        self.conn.close()
4. Traffic Attribution Algorithm
4.1 Traffic Source Attribution
import pandas as pd
import numpy as np
from typing import Dict, List, Tuple


class TrafficAttributionAnalyzer:
    """
    Traffic attribution analyzer.
    """

    def __init__(self, storage: DataStorage):
        self.storage = storage

    def analyze_traffic_source(self,
                               asin: str,
                               sales_change: float,
                               date_range: Tuple[str, str]) -> Dict:
        """
        Analyze what changed in the traffic sources.
        Args:
            asin: product ASIN
            sales_change: sales change in percent
            date_range: analysis window (start_date, end_date)
        Returns:
            Attribution result
        """
        # 1. Ranking changes
        ranking_changes = self._analyze_ranking_changes(asin, date_range)
        # 2. Competitor changes
        competitor_changes = self._analyze_competitor_changes(asin, date_range)
        # 3. Contribution of each channel
        attribution = self._calculate_attribution(
            sales_change,
            ranking_changes,
            competitor_changes
        )
        return attribution
    def _analyze_ranking_changes(self,
                                 asin: str,
                                 date_range: Tuple[str, str]) -> Dict:
        """
        Analyze ranking changes per keyword within the window.
        """
        sql = """
            WITH ranked_data AS (
                SELECT
                    keyword,
                    organic_rank,
                    created_at,
                    LAG(organic_rank) OVER (PARTITION BY keyword ORDER BY created_at) as prev_rank
                FROM keyword_rankings
                WHERE asin = %s
                  AND created_at BETWEEN %s AND %s
            )
            SELECT
                keyword,
                AVG(organic_rank) as avg_rank,
                AVG(prev_rank) as prev_avg_rank,
                COUNT(*) as data_points
            FROM ranked_data
            WHERE prev_rank IS NOT NULL
            GROUP BY keyword
        """
        self.storage.cursor.execute(sql, (asin, date_range[0], date_range[1]))
        results = self.storage.cursor.fetchall()
        ranking_changes = {}
        for row in results:
            keyword, avg_rank, prev_avg_rank, data_points = row
            if prev_avg_rank and avg_rank:
                # AVG() comes back as Decimal; convert before doing float math
                avg_rank = float(avg_rank)
                prev_avg_rank = float(prev_avg_rank)
                change = prev_avg_rank - avg_rank  # positive means the rank improved
                ranking_changes[keyword] = {
                    'current_rank': avg_rank,
                    'previous_rank': prev_avg_rank,
                    'change': change,
                    'improvement_pct': (change / prev_avg_rank * 100) if prev_avg_rank > 0 else 0
                }
        return ranking_changes
    def _analyze_competitor_changes(self,
                                    asin: str,
                                    date_range: Tuple[str, str]) -> List[Dict]:
        """
        Analyze competitor changes within the window.
        Every ASIN stored in competitor_data is considered.
        """
        sql = """
            SELECT
                asin,
                AVG(CASE WHEN in_stock THEN 1 ELSE 0 END) as stock_rate,
                AVG(price) as avg_price,
                MIN(price) as min_price,
                MAX(price) as max_price
            FROM competitor_data
            WHERE created_at BETWEEN %s AND %s
            GROUP BY asin
        """
        self.storage.cursor.execute(sql, (date_range[0], date_range[1]))
        results = self.storage.cursor.fetchall()
        competitor_changes = []
        for row in results:
            comp_asin, stock_rate, avg_price, min_price, max_price = row
            # Stockout: in stock for less than half of the observations
            if stock_rate < 0.5:
                competitor_changes.append({
                    'asin': comp_asin,
                    'type': 'stockout',
                    'severity': 'high' if stock_rate < 0.2 else 'medium'
                })
            # Large price swings
            if avg_price and max_price and min_price:
                # Convert from Decimal so the result is JSON-serializable
                price_volatility = float((max_price - min_price) / avg_price)
                if price_volatility > 0.2:  # price range exceeds 20% of the average
                    competitor_changes.append({
                        'asin': comp_asin,
                        'type': 'price_change',
                        'volatility': price_volatility,
                        'severity': 'high' if price_volatility > 0.3 else 'medium'
                    })
        return competitor_changes
    def _calculate_attribution(self,
                               sales_change: float,
                               ranking_changes: Dict,
                               competitor_changes: List[Dict]) -> Dict:
        """
        Score the candidate traffic sources and pick the most likely one.
        """
        attribution = {
            'primary_source': None,
            'confidence': 0.0,
            'contributing_factors': [],
            'recommendations': []
        }
        # Scoring system. Only organic_ranking and competitor_impact are scored
        # from the data collected here; the other two stay at 0.
        scores = {
            'organic_ranking': 0,
            'competitor_impact': 0,
            'paid_advertising': 0,
            'external_traffic': 0
        }
        # 1. Impact of ranking changes
        significant_ranking_improvements = [
            k for k, v in ranking_changes.items()
            if v['change'] > 5  # rank improved by more than 5 positions
        ]
        if significant_ranking_improvements:
            # The larger the improvement, the larger the organic contribution
            avg_improvement = float(np.mean([
                ranking_changes[k]['change']
                for k in significant_ranking_improvements
            ]))
            scores['organic_ranking'] = min(avg_improvement * 10, 100)
            attribution['contributing_factors'].append({
                'factor': 'organic_ranking_improvement',
                'keywords': significant_ranking_improvements,
                'impact_score': scores['organic_ranking']
            })
        # 2. Impact of competitors
        high_severity_competitor_issues = [
            c for c in competitor_changes
            if c.get('severity') == 'high'
        ]
        if high_severity_competitor_issues:
            scores['competitor_impact'] = len(high_severity_competitor_issues) * 20
            attribution['contributing_factors'].append({
                'factor': 'competitor_issues',
                'details': high_severity_competitor_issues,
                'impact_score': scores['competitor_impact']
            })
        # 3. Pick the primary source; leave it as None when nothing scored,
        #    otherwise max() would silently return the first key.
        max_score_source = max(scores, key=scores.get)
        if scores[max_score_source] > 0:
            attribution['primary_source'] = max_score_source
            attribution['confidence'] = min(scores[max_score_source] / 100, 0.95)
        # 4. Recommendations
        if attribution['primary_source'] == 'organic_ranking':
            attribution['recommendations'].append(
                "Improved rankings are the main traffic source; consider raising the ad budget to hold the positions"
            )
        elif attribution['primary_source'] == 'competitor_impact':
            attribution['recommendations'].append(
                "Competitor problems are shifting traffic to you; restock quickly and optimize the listing to capture it"
            )
        return attribution
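To make the scoring rules concrete, here is a small standalone sketch that applies the same thresholds (an improvement of more than 5 positions, 10 points per position capped at 100, 20 points per high-severity competitor issue, confidence capped at 0.95) to made-up inputs. The `score_attribution` function is a hypothetical simplification written for this example, not part of the class above, and returning no primary source when nothing scores is a choice made for the sketch.

```python
from statistics import mean
from typing import Dict


def score_attribution(ranking_changes: Dict[str, float],
                      high_severity_issues: int) -> Dict:
    """Apply the article's scoring thresholds to plain numbers."""
    # Keep only keywords whose rank improved by more than 5 positions
    improved = [change for change in ranking_changes.values() if change > 5]
    organic = min(mean(improved) * 10, 100) if improved else 0
    competitor = high_severity_issues * 20
    scores = {"organic_ranking": organic, "competitor_impact": competitor}
    primary = max(scores, key=scores.get)
    if scores[primary] == 0:
        # Nothing scored: do not claim a primary source
        return {"primary_source": None, "confidence": 0.0, "scores": scores}
    return {
        "primary_source": primary,
        "confidence": min(scores[primary] / 100, 0.95),
        "scores": scores,
    }


# Made-up inputs: one keyword climbed 8 positions, one competitor is out of stock
result = score_attribution(
    {"silicone baking mat": 8, "reusable baking liner": 2},
    high_severity_issues=1,
)
print(result)
```

With these inputs the organic score is 80 and the competitor score is 20, so organic ranking wins. The scores are heuristics rather than measured traffic shares, so the confidence value is best read as a relative indicator.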
5. Anomaly Detection and Alerting
5.1 Anomaly Detection Algorithm
import numpy as np
from typing import Dict, List, Optional


class AnomalyDetector:
    """
    Anomaly detector.
    Uses a Z-score test to flag unusual values.
    """

    def __init__(self, storage: DataStorage, threshold: float = 2.0):
        """
        Args:
            storage: data storage object
            threshold: Z-score threshold (default 2.0, i.e. 2 standard deviations)
        """
        self.storage = storage
        self.threshold = threshold

    def detect_ranking_anomaly(self,
                               asin: str,
                               keyword: str,
                               current_rank: int,
                               lookback_days: int = 30) -> Optional[Dict]:
        """
        Detect a ranking anomaly.
        Args:
            asin: product ASIN
            keyword: keyword
            current_rank: current rank
            lookback_days: number of days to look back
        Returns:
            Anomaly dict, or None when nothing unusual is found
        """
        # Fetch the historical rankings
        historical_data = self.storage.get_historical_rankings(
            asin, keyword, lookback_days
        )
        # Keep only observations where the product was actually ranked
        ranks = [d['organic_rank'] for d in historical_data if d['organic_rank']]
        if len(ranks) < 7:  # not enough data
            return None
        # Summary statistics
        mean_rank = float(np.mean(ranks))
        std_rank = float(np.std(ranks))
        if std_rank == 0:  # rank has been perfectly stable
            return None
        # Z-score
        z_score = abs((current_rank - mean_rank) / std_rank)
        if z_score > self.threshold:
            # Anomaly detected; a larger rank number means a worse position
            anomaly_type = 'ranking_drop' if current_rank > mean_rank else 'ranking_improvement'
            return {
                'type': anomaly_type,
                'asin': asin,
                'keyword': keyword,
                'current_rank': current_rank,
                'historical_mean': mean_rank,
                'z_score': z_score,
                'severity': 'high' if z_score > 3.0 else 'medium',
                'message': f"Ranking anomaly for keyword '{keyword}': currently #{current_rank}, historical average #{mean_rank:.1f}"
            }
        return None
    def detect_competitor_anomaly(self,
                                  competitor_asin: str,
                                  current_data: Dict,
                                  lookback_days: int = 30) -> List[Dict]:
        """
        Detect competitor anomalies.
        Args:
            competitor_asin: competitor ASIN
            current_data: current competitor data
            lookback_days: number of days to look back
        Returns:
            List of anomalies
        """
        anomalies = []
        # Stockout check
        if not current_data.get('in_stock'):
            anomalies.append({
                'type': 'competitor_stockout',
                'asin': competitor_asin,
                'severity': 'high',
                'message': f"Competitor {competitor_asin} is out of stock"
            })
        # Price anomaly check
        sql = """
            SELECT AVG(price) as avg_price, STDDEV(price) as std_price
            FROM competitor_data
            WHERE asin = %s
              AND created_at >= NOW() - make_interval(days => %s)
              AND price IS NOT NULL
        """
        self.storage.cursor.execute(sql, (competitor_asin, lookback_days))
        result = self.storage.cursor.fetchone()
        if result and result[0] and result[1]:
            # PostgreSQL returns Decimal here; mixing Decimal and float in
            # arithmetic raises TypeError, so convert first.
            avg_price, std_price = float(result[0]), float(result[1])
            current_price = current_data.get('price')
            if current_price and std_price > 0:
                current_price = float(current_price)
                price_z_score = abs((current_price - avg_price) / std_price)
                if price_z_score > self.threshold:
                    anomaly_type = 'price_drop' if current_price < avg_price else 'price_increase'
                    anomalies.append({
                        'type': anomaly_type,
                        'asin': competitor_asin,
                        'current_price': current_price,
                        'historical_avg': avg_price,
                        'z_score': price_z_score,
                        'severity': 'high' if price_z_score > 3.0 else 'medium',
                        'message': f"Price anomaly for competitor {competitor_asin}: currently ${current_price:.2f}, historical average ${avg_price:.2f}"
                    })
        return anomalies
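The Z-score test used by both detectors is z = |x - mean| / std, flagged when z exceeds the threshold. Below is a minimal worked example on made-up ranks, using only the standard library; `rank_z_score` is a hypothetical helper for illustration. It uses the population standard deviation (`statistics.pstdev`), which matches NumPy's default `np.std`.

```python
from statistics import mean, pstdev
from typing import List, Optional


def rank_z_score(history: List[int], current_rank: int) -> Optional[float]:
    """Return |current - mean| / std, or None when the test is not applicable."""
    if len(history) < 7:
        return None  # too few observations
    sigma = pstdev(history)  # population standard deviation
    if sigma == 0:
        return None  # perfectly stable history, Z-score undefined
    return abs(current_rank - mean(history)) / sigma


# Made-up history: the rank hovers around 12, then jumps to 20
history = [12, 11, 13, 12, 12, 11, 13]
z = rank_z_score(history, current_rank=20)
threshold = 2.0
is_anomaly = z is not None and z > threshold
print(f"z={z:.2f}, anomaly={is_anomaly}")
```

Here the mean is 12 and the variance is 4/7, so a jump to rank 20 sits more than ten standard deviations away. A very stable history makes the test sensitive: with a tiny standard deviation even a one-position move can cross the threshold, which is one reason to combine it with an absolute minimum change.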
5.2 Alert Notifications
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import requests


class AlertNotifier:
    """
    Alert notifier.
    Supports email and Slack notifications.
    """

    def __init__(self, email_config: Optional[Dict] = None, slack_webhook: Optional[str] = None):
        self.email_config = email_config
        self.slack_webhook = slack_webhook

    def send_email_alert(self, subject: str, body: str, recipients: List[str]):
        """
        Send an alert by email.
        """
        if not self.email_config:
            logger.warning("Email is not configured; skipping email alert")
            return
        msg = MIMEMultipart()
        msg['From'] = self.email_config['from']
        msg['To'] = ', '.join(recipients)
        msg['Subject'] = subject
        msg.attach(MIMEText(body, 'html'))
        try:
            with smtplib.SMTP(self.email_config['smtp_host'], self.email_config['smtp_port']) as server:
                server.starttls()
                server.login(self.email_config['username'], self.email_config['password'])
                server.send_message(msg)
            logger.info(f"Sent email alert: {subject}")
        except Exception as e:
            logger.error(f"Failed to send email: {e}")

    def send_slack_alert(self, message: str, severity: str = 'medium'):
        """
        Send an alert to Slack.
        """
        if not self.slack_webhook:
            logger.warning("Slack webhook is not configured; skipping Slack alert")
            return
        # Pick a color by severity
        color_map = {
            'low': '#36a64f',     # green
            'medium': '#ff9900',  # orange
            'high': '#ff0000'     # red
        }
        payload = {
            'attachments': [{
                'color': color_map.get(severity, '#808080'),
                'title': 'Amazon traffic anomaly alert',
                'text': message,
                'footer': 'Amazon Traffic Monitor',
                'ts': int(datetime.now().timestamp())
            }]
        }
        try:
            # A timeout keeps a slow webhook from blocking the scheduler job
            response = requests.post(self.slack_webhook, json=payload, timeout=10)
            response.raise_for_status()
            logger.info("Sent Slack alert")
        except Exception as e:
            logger.error(f"Failed to send Slack alert: {e}")
6. Job Scheduling
6.1 Scheduled Collection with APScheduler
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger


class MonitoringScheduler:
    """
    Scheduler for the monitoring jobs.
    """

    def __init__(self,
                 collector: AmazonDataCollector,
                 storage: DataStorage,
                 detector: AnomalyDetector,
                 notifier: AlertNotifier):
        self.collector = collector
        self.storage = storage
        self.detector = detector
        self.notifier = notifier
        self.scheduler = BlockingScheduler()

    def setup_jobs(self, config: Dict):
        """
        Register the scheduled jobs.
        Args:
            config: config dict with the monitoring targets
        """
        # Collect keyword rankings every day at 02:00
        self.scheduler.add_job(
            func=self.daily_ranking_job,
            trigger=CronTrigger(hour=2, minute=0),
            args=[config['target_asin'], config['keywords']],
            id='daily_ranking',
            name='Daily keyword ranking collection'
        )
        # Monitor competitors every 6 hours
        self.scheduler.add_job(
            func=self.competitor_monitoring_job,
            trigger=CronTrigger(hour='*/6'),
            args=[config['competitor_asins']],
            id='competitor_monitoring',
            name='Competitor monitoring'
        )
        # Run anomaly detection every hour
        self.scheduler.add_job(
            func=self.anomaly_detection_job,
            trigger=CronTrigger(minute=0),
            args=[config['target_asin'], config['keywords']],
            id='anomaly_detection',
            name='Anomaly detection'
        )
    def daily_ranking_job(self, target_asin: str, keywords: List[str]):
        """
        Daily ranking collection job.
        """
        logger.info("Starting daily ranking collection job")
        tracker = KeywordRankTracker(self.collector)
        rankings = tracker.track_keyword_ranking(target_asin, keywords)
        # Save to the database
        self.storage.save_keyword_rankings(rankings)
        logger.info(f"Daily ranking collection finished: {len(rankings)} rows collected")

    def competitor_monitoring_job(self, competitor_asins: List[str]):
        """
        Competitor monitoring job.
        """
        logger.info("Starting competitor monitoring job")
        monitor = CompetitorMonitor(self.collector)
        competitor_data = monitor.monitor_competitors(competitor_asins)
        # Save to the database
        self.storage.save_competitor_data(competitor_data)
        # Check each competitor for anomalies
        for data in competitor_data:
            anomalies = self.detector.detect_competitor_anomaly(
                data['asin'],
                data
            )
            for anomaly in anomalies:
                self._handle_anomaly(anomaly)
        logger.info(f"Competitor monitoring finished: {len(competitor_data)} competitors checked")

    def anomaly_detection_job(self, target_asin: str, keywords: List[str]):
        """
        Anomaly detection job.
        Note that this re-collects rankings, so it costs API calls every hour.
        """
        logger.info("Starting anomaly detection job")
        # Fetch the latest rankings
        tracker = KeywordRankTracker(self.collector)
        current_rankings = tracker.track_keyword_ranking(target_asin, keywords)
        # Check each keyword for ranking anomalies
        for ranking in current_rankings:
            if ranking['organic_rank']:
                anomaly = self.detector.detect_ranking_anomaly(
                    target_asin,
                    ranking['keyword'],
                    ranking['organic_rank']
                )
                if anomaly:
                    self._handle_anomaly(anomaly)
        logger.info("Anomaly detection finished")
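    def _handle_anomaly(self, anomaly: Dict):
        """
        Handle a detected anomaly: record it, then notify.
        """
        # Record it in the database
        sql = """
            INSERT INTO traffic_alerts
            (asin, alert_type, severity, message, details)
            VALUES (%s, %s, %s, %s, %s)
        """
        try:
            self.storage.cursor.execute(sql, (
                anomaly.get('asin', 'unknown'),
                anomaly['type'],
                anomaly['severity'],
                anomaly['message'],
                json.dumps(anomaly, default=str)  # default=str covers Decimal/datetime values
            ))
            self.storage.conn.commit()
        except Exception as e:
            # Roll back so a failed insert does not poison later queries
            self.storage.conn.rollback()
            logger.error(f"Failed to record alert: {e}")
        # Send a notification for high-severity anomalies only
        if anomaly['severity'] == 'high':
            self.notifier.send_slack_alert(
                message=anomaly['message'],
                severity=anomaly['severity']
            )

    def start(self):
        """
        Start the scheduler (blocks the current thread).
        """
        logger.info("Monitoring scheduler started")
        self.scheduler.start()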
7. Data Visualization
7.1 Trend Charts with Matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime


class DataVisualizer:
    """
    Data visualizer.
    """

    def __init__(self, storage: DataStorage):
        self.storage = storage
        # Only needed when labels contain Chinese text; SimHei must be installed
        plt.rcParams['font.sans-serif'] = ['SimHei']
        plt.rcParams['axes.unicode_minus'] = False

    def plot_ranking_trend(self,
                           asin: str,
                           keyword: str,
                           days: int = 30,
                           save_path: Optional[str] = None):
        """
        Plot the ranking trend for one keyword.
        """
        # Fetch the historical data
        data = self.storage.get_historical_rankings(asin, keyword, days)
        if not data:
            logger.warning(f"No historical data found for {keyword}")
            return
        # Prepare the series; NaN leaves a gap where the product was unranked
        dates = [d['timestamp'] for d in data]
        organic_ranks = [d['organic_rank'] if d['organic_rank'] else float('nan') for d in data]
        # Create the chart
        fig, ax = plt.subplots(figsize=(12, 6))
        ax.plot(dates, organic_ranks, marker='o', linestyle='-', linewidth=2,
                markersize=4, label='Organic rank')
        # Title and axis labels
        ax.set_title(f'Ranking trend for keyword "{keyword}" (ASIN: {asin})', fontsize=14, fontweight='bold')
        ax.set_xlabel('Date', fontsize=12)
        ax.set_ylabel('Rank position', fontsize=12)
        # Invert the Y axis (a smaller rank is better)
        ax.invert_yaxis()
        # Format the dates
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%m-%d'))
        ax.xaxis.set_major_locator(mdates.DayLocator(interval=3))
        plt.xticks(rotation=45)
        # Grid
        ax.grid(True, alpha=0.3, linestyle='--')
        # Legend
        ax.legend(loc='best')
        plt.tight_layout()
        if save_path:
            plt.savefig(save_path, dpi=300, bbox_inches='tight')
            logger.info(f"Ranking trend chart saved to {save_path}")
        else:
            plt.show()
        plt.close(fig)
8. Complete Usage Example
8.1 Main Program
def main():
    """
    Program entry point.
    """
    # Configuration
    config = {
        'api_key': 'your_pangolinfo_api_key',
        'db_config': {
            'host': 'localhost',
            'port': 5432,
            'database': 'amazon_monitor',
            'user': 'postgres',
            'password': 'your_password'
        },
        'target_asin': 'B08XYZ1234',
        'keywords': [
            'silicone baking mat',
            'non-stick baking sheet',
            'reusable baking liner'
        ],
        'competitor_asins': [
            'B07ABC1234',
            'B09DEF5678',
            'B06GHI9012'
        ],
        'email_config': {
            'from': 'monitor@example.com',
            'smtp_host': 'smtp.gmail.com',
            'smtp_port': 587,
            'username': 'your_email@gmail.com',
            'password': 'your_app_password'
        },
        'slack_webhook': 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
    }
    # Initialize the components
    collector = AmazonDataCollector(config['api_key'])
    storage = DataStorage(config['db_config'])
    detector = AnomalyDetector(storage, threshold=2.0)
    notifier = AlertNotifier(
        email_config=config['email_config'],
        slack_webhook=config['slack_webhook']
    )
    # Set up the scheduler
    scheduler = MonitoringScheduler(collector, storage, detector, notifier)
    scheduler.setup_jobs(config)
    # Start monitoring
    try:
        scheduler.start()
    except (KeyboardInterrupt, SystemExit):
        logger.info("Monitoring system stopped")
    finally:
        storage.close()


if __name__ == '__main__':
    main()
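9. Performance Optimization Tips
9.1 Data Collection
- Concurrent requests: use asyncio and aiohttp for asynchronous, concurrent collection
- Rate limiting: throttle requests to stay under the API's rate limits
- Caching: cache hot data in Redis
- Incremental updates: collect only data that has changed to reduce API calls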
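The first two points can be combined: run requests concurrently, but cap how many are in flight at once. The sketch below shows the pattern with `asyncio.Semaphore` from the standard library. `collect_concurrently` and `fake_fetch` are hypothetical names; `fake_fetch` only simulates network latency and would be replaced by a real asynchronous HTTP call (for example via aiohttp) in practice.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def collect_concurrently(keywords: List[str],
                               fetch: Callable[[str], Awaitable[Dict]],
                               max_concurrency: int = 3) -> List[Dict]:
    """Fetch all keywords concurrently with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(keyword: str) -> Dict:
        async with semaphore:  # wait here when the limit is reached
            return await fetch(keyword)

    # gather() returns results in the same order as the input keywords
    return await asyncio.gather(*(guarded(k) for k in keywords))


async def fake_fetch(keyword: str) -> Dict:
    """Stand-in for a real HTTP request; sleeps briefly to simulate latency."""
    await asyncio.sleep(0.01)
    return {"keyword": keyword, "organic_results": []}


keywords = ["silicone baking mat", "non-stick baking sheet", "reusable baking liner"]
results = asyncio.run(collect_concurrently(keywords, fake_fetch, max_concurrency=2))
print([r["keyword"] for r in results])
```

A semaphore limits concurrency, not requests per second. If the API enforces a per-second quota, a delay or token bucket would be needed on top of this.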
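9.2 Database
- Index tuning: create indexes on frequently queried columns
- Partitioned tables: partition historical data by time
- Regular archiving: move old data to cold storage
- Connection pooling: use a database connection pool to improve performance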
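9.3 Monitoring
- Sampling: for large numbers of SKUs, sample in batches
- Priority queue: monitor important products first
- Adaptive frequency: adjust the collection interval dynamically based on how often the data changes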
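As one way to read the adaptive-frequency idea, the sketch below shortens the collection interval when recent ranks have been volatile and falls back to a base interval when they are stable. The `next_interval_hours` function, its formula, and its default bounds are assumptions made for this example, not a prescribed algorithm.

```python
from typing import List


def next_interval_hours(recent_ranks: List[int],
                        base_hours: float = 24.0,
                        min_hours: float = 1.0,
                        max_hours: float = 48.0) -> float:
    """Suggest the next collection interval from recent rank volatility."""
    if len(recent_ranks) < 2:
        return base_hours  # nothing to compare yet
    # Mean absolute change between consecutive observations
    changes = [abs(b - a) for a, b in zip(recent_ranks, recent_ranks[1:])]
    volatility = sum(changes) / len(changes)
    # Higher volatility gives a shorter interval; zero volatility keeps the base
    interval = base_hours / (1.0 + volatility)
    return max(min_hours, min(max_hours, interval))


stable = next_interval_hours([10, 10, 10])
mild = next_interval_hours([10, 11, 10, 11])
volatile = next_interval_hours([10, 20, 5, 30])
print(stable, mild, round(volatile, 2))
```

The result could be fed into the scheduler by rescheduling a job with an interval trigger after each run, with the lower bound protecting the API quota.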
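10. FAQ and Solutions
Q1: What should I do about API rate limits?
A: The Pangolinfo API supports relatively high concurrency, but the following is still recommended:
- Use a request queue to cap concurrency
- Implement retries with exponential backoff
- Choose a sensible collection interval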
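A retry with exponential backoff can be sketched as follows. `retry_with_backoff` is a hypothetical helper, and the delay values are illustrative defaults. It catches every exception for brevity; real code would more likely retry only on transient errors such as timeouts or HTTP 429/5xx responses. The `sleep` parameter is injectable so that the example runs instantly.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T],
                       max_attempts: int = 4,
                       base_delay: float = 1.0,
                       max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, doubling the wait after each failure (1s, 2s, 4s, ...)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter so parallel clients do not retry in lockstep
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")


calls = {"count": 0}


def flaky_request() -> str:
    """Simulated request that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


recorded_delays: List[float] = []
result = retry_with_backoff(flaky_request, sleep=recorded_delays.append)
print(result, recorded_delays)
```

In the collector, the callable passed in could be a lambda wrapping `self.session.post(...)` followed by `raise_for_status()`.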
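Q2: What if the database runs out of storage space?
A: Apply data lifecycle management:
- Keep detailed data for the most recent 3 months
- Aggregate data between 3 and 12 months old by day
- Archive or delete data older than 12 months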
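The three tiers can be expressed as a small classification function that a cleanup job might call per row or per partition. The `retention_tier` name and the tier labels are hypothetical, and 3 and 12 months are approximated as 90 and 365 days for this sketch.

```python
from datetime import datetime, timedelta


def retention_tier(created_at: datetime, now: datetime) -> str:
    """Map a row's age to the lifecycle tier described above."""
    age_days = (now - created_at).days
    if age_days <= 90:      # roughly the last 3 months: keep full detail
        return "detail"
    if age_days <= 365:     # 3 to 12 months: keep one aggregate per day
        return "daily_aggregate"
    return "archive"        # older than 12 months: archive or delete


now = datetime(2025, 1, 1)
tiers = [
    retention_tier(now - timedelta(days=10), now),
    retention_tier(now - timedelta(days=200), now),
    retention_tier(now - timedelta(days=400), now),
]
print(tiers)
```

With time-partitioned tables the same rule can be applied to whole partitions, which makes dropping old data much cheaper than row-by-row deletes.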
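Q3: How can anomaly detection accuracy be improved?
A:
- Accumulate more history (at least 30 days)
- Tune the Z-score threshold to fit the business
- Combine several indicators before deciding
- Introduce a machine learning model (such as Isolation Forest)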
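One simple way to combine several indicators is to require agreement: raise an alert only when at least a minimum number of signals exceed the Z-score threshold. The sketch below assumes the per-indicator Z-scores have already been computed; `combined_anomaly`, the indicator names, and the default of two agreeing signals are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def combined_anomaly(z_scores: Dict[str, float],
                     threshold: float = 2.0,
                     min_signals: int = 2) -> Tuple[bool, List[str]]:
    """Flag an anomaly only when enough indicators exceed the threshold."""
    triggered = sorted(name for name, z in z_scores.items() if abs(z) > threshold)
    return len(triggered) >= min_signals, triggered


# Made-up Z-scores for three indicators of the same listing
flagged, signals = combined_anomaly(
    {"organic_rank": 2.8, "competitor_price": 0.4, "bsr": 3.1}
)
single, _ = combined_anomaly({"organic_rank": 2.8, "competitor_price": 0.4})
print(flagged, signals, single)
```

Requiring agreement reduces false alarms from a single noisy metric, at the cost of missing anomalies that show up in only one indicator.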
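11. Summary and Outlook
This article described how to build a complete Amazon listing traffic analysis system, covering data collection, storage, analysis, anomaly detection, and visualization end to end.
Key points:
- Use the Pangolinfo Scrape API to obtain accurate, real-time data
- Store history as time-stamped records so trends can be queried over time
- Apply statistical methods for anomaly detection
- Use the traffic attribution algorithm to locate where traffic comes from
- Automate monitoring and send timely alerts
Next steps:
- Introduce machine learning models to improve prediction accuracy
- Add more data sources (such as Google Trends and social media)
- Build a web dashboard for visual management
- Integrate A/B testing to optimize listings
About the author: senior e-commerce data engineer focused on Amazon data analysis and automation systems.
Original content notice: this is an original technical article; please credit the source when reposting.
#Amazon #DataAnalysis #Python #API #TrafficMonitoring #EcommerceTech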




