Pig Latin / SQL 對照表
很多人都是從 RDB 的程式開始寫起,在學習 Pig 的過程一定會想「SQL 裡的 xxx 功能用 Pig 應該要怎麼寫」
以下的對照表應該會對想學 Pig 的人有所幫助
1. Select
1.1 Select All
SQL
SELECT * FROM mytable;
Pig
A = LOAD 'data' USING PigStorage(); DUMP A;
1.2 Select by columns
SQL
SELECT col1, col2 FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = FOREACH A GENERATE col1, col2; DUMP B;
1.3 Set column alias
SQL
SELECT col1 AS new_col1, col2 AS new_col2 FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = FOREACH A GENERATE col1 AS new_col1, col2 AS new_col2; DUMP B;
1.4 Type Casting
SQL
SELECT col1::INTEGER, col2::VARCHAR FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = FOREACH A GENERATE (int)col1, (chararray)col2; DUMP B;
1.5 Limit
SQL
SELECT * FROM mytable LIMIT 10;
Pig
A = LOAD 'data' USING PigStorage(); B = LIMIT A 10; DUMP B;
1.6 Order By
SQL
SELECT * FROM mytable ORDER BY col1 ASC;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray); B = ORDER A BY col1 ASC; DUMP B;
1.7 Distinct
SQL
SELECT DISTINCT col1, col2 FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = FOREACH A GENERATE col1, col2; C = DISTINCT B; DUMP C;
2. Where
2.1 One Condition
SQL
SELECT * FROM mytable WHERE col1 > 20;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray); B = FILTER A BY col1 > 20; DUMP B;
2.2 AND/OR
SQL
SELECT * FROM mytable WHERE col1 > 20 AND col2 == 'red';
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray); B = FILTER A BY (col1 > 20 AND col2 == 'red'); DUMP B;
2.3 Like
SQL
SELECT * FROM mytable WHERE col1 like 'pig%';
Pig
A = LOAD 'data' USING PigStorage() AS (col1:chararray); B = FILTER A BY col1 matches 'pig.*'; DUMP B;
3. Join
3.1 Inner Join
SQL
SELECT * FROM mytable INNER JOIN othertable ON mytable.col1 = othertable.col1;
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray); B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray); C = JOIN A BY col1, B BY col1; DUMP C;
3.2 Left Outer Join
SQL
SELECT * FROM mytable LEFT OUTER JOIN othertable ON mytable.col1 = othertable.col1;
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray); B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray); C = JOIN A by col1 LEFT OUTER, B BY col1; DUMP C;
3.3 Right Outer Join
SQL
SELECT * FROM mytable RIGHT OUTER JOIN othertable ON mytable.col1 = othertable.col1;
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray); B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray); C = JOIN A by col1 RIGHT OUTER, B BY col1; DUMP C;
3.4 Full Outer Join
SQL
SELECT * FROM mytable FULL OUTER JOIN othertable ON mytable.col1 = othertable.col1;
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray); B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray); C = JOIN A by col1 FULL OUTER, B BY col1; DUMP C;
3.5 Cross
SQL
SELECT * FROM mytable, othertable;
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray); B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray); C = CROSS A, B; DUMP C;
4. Group By
4.1 Count all records
SQL
SELECT COUNT(*) FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = GROUP A ALL; C = FOREACH B GENERATE COUNT(A); DUMP C;
4.2 Distinct Count
SQL
SELECT COUNT(DISTINCT col1) AS COUNT FROM mytable;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = FOREACH A GENERATE col1; C = DISTINCT B; D = GROUP C BY col1; E = FOREACH D GENERATE COUNT(C) AS count; DUMP E;
4.3 Distinct Count by Column
SQL
SELECT col1, COUNT(DISTINCT col2) AS COUNT FROM mytable GROUP BY col1;
Pig
A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = GROUP A by col1; C = FOREACH B { D = DISTINCT A.col2; GENERATE group as col1, COUNT(D) as count;} DUMP C;
5. Union
SQL
SELECT mycolumn FROM mytable1 UNION SELECT mycolumn FROM mytable2
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int); B = LOAD 'data2' USING PigStorage() AS (col1:int); C = UNION A, B; DUMP C;
5.2 Union all
SQL
SELECT * FROM mytable1 UNION SELECT * FROM mytable2
Pig
A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray, col3:float); B = LOAD 'data2' USING PigStorage() AS (col4:int, col5:chararray); C = UNION A, B; DUMP C;
執行環境
CentOS 6.2
JDK 1.6.0_31
Cloudera CDH3U3
Pig 0.9.2
參考資料
Apache Pig Getting Started
Pig Latin Basics
Built In Functions
廣告
1 則迴響 »
發表迴響
廣告
[…] (转载)Pig Latin/SQL 对照 八 7 Cloud 原文地址:https://fenriswolf.me/2012/03/22/pig-latin-sql-%E5%B0%8D%E7%85%A7%E8%A1%A8/ […]
通告 由 (转载)Pig Latin/SQL 对照 - 尚岩 | 2012/08/07 |