在dwd层关联维度表时经常会遇到数据倾斜,假如某app商店的曝光数据超百亿,在关联app维度表表时,排在前面的app肯定占了大部分数据,如抖音微信,肯定会遇到数据倾斜。这个时候该怎么解决呢?
mapjoin无疑是比较好的选择!
步骤1:先建个临时表,筛选3000万以上的数据
CREATE TABLE temp_store_exposure_dm
AS
SELECT
t1.app_id
,t2.app_name
,t2.app_category
FROM
(
SELECT
app_id
FROM dwd_store_exposure_dm
WHERE pt_d='$date'
group by app_id
having count(*)>30000000
) t1
inner join
(
SELECT
app_id
,app_name
,app_category
FROM dim_store_app_ds
WHERE pt_d='$date'
) t2
ON t1.app_id=t2.app_id
步骤2:事实表关联临时表 , 走mapjoin
SELECT
t1.app_id
,t2.app_name
,t2.app_category
FROM
(
SELECT
app_id
FROM dwd_store_exposure_dm
WHERE pt_d='$date'
) t1
LEFT JOIN
(
SELECT
app_id
,app_name
,app_category
FROM temp_store_exposure_dm
) t2
ON t1.app_id=t2.app_id
UNION ALL
SELECT
t1.app_id
,t2.app_name
,t2.app_category
FROM
(
SELECT
app_id
FROM dwd_store_exposure_dm
WHERE pt_d='$date'
) t1
LEFT JOIN
(
SELECT
app_id
,app_name
,app_category
FROM temp_store_exposure_dm
) t2
ON t1.app_id=t2.app_id
WHERE t2.app_id IS NULL

1500

被折叠的 条评论
为什么被折叠?



