1. Create the data and upload it to HDFS
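The original gives no commands for this step, so the following is only a minimal sketch. It assumes a local file named pigtestdata with hypothetical sample rows (year, temperature, quality) separated by tabs, since a LOAD without a USING clause reads tab-delimited fields by default. It also assumes the cluster's default file system is the NameNode at 127.0.0.1:8020, so that /pig/pigtestdata matches the path used in the LOAD statement.

```shell
# Hypothetical sample rows: year<TAB>temperature<TAB>quality.
# The values are made up for illustration; they are not from the original post.
printf '1950\t0\t2\n1950\t22\t2\n1950\t-11\t2\n1949\t111\t2\n1949\t78\t1\n' > pigtestdata

# Upload only when the hdfs client is available on this machine.
# Assumes the default file system points at hdfs://127.0.0.1:8020.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /pig                        # create the target directory
    hdfs dfs -put -f pigtestdata /pig/pigtestdata  # -f overwrites an existing copy
fi
```

With these sample rows, the filter in step 3 would drop the row with a negative temperature and the row whose quality is not 2.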
2. Load
records = LOAD 'hdfs://127.0.0.1:8020/pig/pigtestdata' AS (year:chararray,temperature:int,quality:int);
DUMP records;
DESCRIBE records;
3. Filter
filter_records = FILTER records BY temperature >= 0 AND quality == 2;
DUMP filter_records;
4. Group
group_records = GROUP records BY year;
DUMP group_records;
DESCRIBE group_records;
5. Transform the data
max_temperature = FOREACH group_records GENERATE group, MAX(records.temperature);
DUMP max_temperature;
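This article demonstrates how to use Pig to load, filter, group, and transform data stored in HDFS: creating the records, filtering on temperature and quality, grouping the data by year, and computing the maximum temperature for each group.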
