原文地址:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView#
lateral view用于和split, explode等UDTF一起使用,它能够将一列数据拆成多行数据,在此基础上可以对拆分后的数据进行聚合。
一个简单的例子,假设我们有一张表pageAds,它有两列数据,第一列是pageid string,第二列是adid_list,即用逗号分隔的广告ID集合:
| string pageid | Array<int> adid_list |
| "front_page" | [1, 2, 3] |
| "contact_page" | [3, 4, 5] |
要统计所有广告ID在所有页面中出现的次数。
首先分拆广告ID:
- SELECT pageid, adid
- FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
执行结果如下:
| string pageid | int adid |
| "front_page" | 1 |
| "front_page" | 2 |
| "front_page" | 3 |
| "contact_page" | 3 |
| "contact_page" | 4 |
| "contact_page" | 5 |
接下来就是一个聚合的统计:
- SELECT adid, count(1)
- FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
- GROUP BY adid;
-
- 执行结果如下:
| int adid | count(1) |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
| 5 | 1 |
| Array<int> col1 | Array<string> col2 |
| [1, 2] | [a", "b", "c"] |
| [3, 4] | [d", "e", "f"] |
- SELECT myCol1, col2 FROM baseTable
- LATERAL VIEW explode(col1) myTable1 AS myCol1;
的执行结果为:| int mycol1 | Array<string> col2 |
| 1 | [a", "b", "c"] |
| 2 | [a", "b", "c"] |
| 3 | [d", "e", "f"] |
| 4 | [d", "e", "f"] |
加上一个lateral view:
- SELECT myCol1, myCol2 FROM baseTable
- LATERAL VIEW explode(col1) myTable1 AS myCol1
- LATERAL VIEW explode(col2) myTable2 AS myCol2;
它的执行结果为:
| int myCol1 | string myCol2 |
| 1 | "a" |
| 1 | "b" |
| 1 | "c" |
| 2 | "a" |
| 2 | "b" |
| 2 | "c" |
| 3 | "d" |
| 3 | "e" |
| 3 | "f" |
| 4 | "d" |
| 4 | "e" |
| 4 | "f" |
注意上面语句中,两个lateral view按照出现的次序被执行。
本文详细介绍了Hive SQL中的Lateral View功能,通过实例展示了如何使用Lateral View与UDTF如explode配合,将复杂数据结构如数组或映射转换为扁平化的表格,以便进行聚合操作。同时,文章还探讨了在同一个查询中使用多个Lateral View的方法。

995

被折叠的 条评论
为什么被折叠?



