Fenriswolf 程式筆記

奮利斯狼的地盤,小綿羊勿入

Pig Latin / SQL 對照表

很多人都是從 RDB 的程式開始寫起,在學習 Pig 的過程一定會想「SQL 裡的 xxx 功能用 Pig 應該要怎麼寫」
以下的對照表應該會對想學 Pig 的人有所幫助

1. Select
1.1 Select All
SQL

SELECT * FROM mytable;

Pig

A = LOAD 'data' USING PigStorage(); 
DUMP A;

1.2 Select by columns
SQL

SELECT col1, col2 FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); 
B = FOREACH A GENERATE col1, col2;
DUMP B;

1.3 Set column alias
SQL

SELECT col1 AS new_col1, col2 AS new_col2 FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); 
B = FOREACH A GENERATE col1 AS new_col1, col2 AS new_col2;
DUMP B;

1.4 Type Casting
SQL

SELECT col1::INTEGER, col2::VARCHAR FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float);
B = FOREACH A GENERATE (int)col1, (chararray)col2;
DUMP B;

1.5 Limit
SQL

SELECT * FROM mytable LIMIT 10;

Pig

A = LOAD 'data' USING PigStorage();
B = LIMIT A 10;
DUMP B;

1.6 Order By
SQL

SELECT * FROM mytable ORDER BY col1 ASC;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray);
B = ORDER A BY col1 ASC;
DUMP B;

1.7 Distinct
SQL

SELECT DISTINCT col1, col2 FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float);
B = FOREACH A GENERATE col1, col2;
C = DISTINCT B;
DUMP C;

 
 
2. Where
2.1 One Condition
SQL

SELECT * FROM mytable WHERE col1 > 20;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray);
B = FILTER A BY col1 > 20;
DUMP B;

2.2 AND/OR
SQL

SELECT * FROM mytable WHERE col1 > 20 AND col2 == 'red';

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray);
B = FILTER A BY (col1 > 20 AND col2 == 'red');
DUMP B;

2.3 Like
SQL

SELECT * FROM mytable WHERE col1 like 'pig%';

Pig

A = LOAD 'data' USING PigStorage() AS (col1:chararray);
B = FILTER A BY col1 matches 'pig.*';
DUMP B;

 
 
3. Join
3.1 Inner Join
SQL

SELECT * FROM mytable INNER JOIN othertable ON mytable.col1 = othertable.col1;

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray);
B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray);
C = JOIN A BY col1, B BY col1;
DUMP C;

3.2 Left Outer Join
SQL

SELECT * FROM mytable LEFT OUTER JOIN othertable ON mytable.col1 = othertable.col1;

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray);
B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray);
C = JOIN A by col1 LEFT OUTER, B BY col1;
DUMP C;

3.3 Right Outer Join
SQL

SELECT * FROM mytable RIGHT OUTER JOIN othertable ON mytable.col1 = othertable.col1;

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray);
B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray);
C = JOIN A by col1 RIGHT OUTER, B BY col1;
DUMP C;

3.4 Full Outer Join
SQL

SELECT * FROM mytable FULL OUTER JOIN othertable ON mytable.col1 = othertable.col1;

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray);
B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray);
C = JOIN A by col1 FULL OUTER, B BY col1;
DUMP C;

3.5 Cross
SQL

SELECT * FROM mytable, othertable;

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray);
B = LOAD 'data2' USING PigStorage() AS (col1:int, col2:chararray);
C = CROSS A, B;
DUMP C;

 
 
4. Group By
4.1 Count all records
SQL

SELECT COUNT(*) FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); 
B = GROUP A ALL;
C = FOREACH B GENERATE COUNT(A);
DUMP C;

4.2 Distinct Count
SQL

SELECT COUNT(DISTINCT col1) AS COUNT FROM mytable;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float);
B = FOREACH A GENERATE col1;
C = DISTINCT B;
D = GROUP C BY col1;
E = FOREACH D GENERATE COUNT(C) AS count;
DUMP E;

4.3 Distinct Count by Column
SQL

SELECT col1, COUNT(DISTINCT col2) AS COUNT FROM mytable GROUP BY col1;

Pig

A = LOAD 'data' USING PigStorage() AS (col1:int, col2:chararray, col3:float); 
B = GROUP A by col1;
C = FOREACH B {
        D = DISTINCT A.col2;
        GENERATE group as col1, COUNT(D) as count;}
DUMP C;

 
 
5. Union
SQL

SELECT mycolumn FROM mytable1 UNION SELECT mycolumn FROM mytable2

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int);
B = LOAD 'data2' USING PigStorage() AS (col1:int);
C = UNION A, B;
DUMP C;

5.2 Union all
SQL

SELECT * FROM mytable1 UNION SELECT * FROM mytable2

Pig

A = LOAD 'data1' USING PigStorage() AS (col1:int, col2:chararray, col3:float);
B = LOAD 'data2' USING PigStorage() AS (col4:int, col5:chararray);
C = UNION A, B;
DUMP C;

 
 
執行環境
CentOS 6.2
JDK 1.6.0_31
Cloudera CDH3U3
Pig 0.9.2

參考資料
Apache Pig Getting Started
Pig Latin Basics
Built In Functions

廣告

2012/03/22 - Posted by | Pig | ,

1 則迴響 »

  1. […] (转载)Pig Latin/SQL 对照 八 7 Cloud 原文地址:https://fenriswolf.me/2012/03/22/pig-latin-sql-%E5%B0%8D%E7%85%A7%E8%A1%A8/ […]

    通告 由 (转载)Pig Latin/SQL 对照 - 尚岩 | 2012/08/07 | 回應


發表迴響

在下方填入你的資料或按右方圖示以社群網站登入:

WordPress.com Logo

您的留言將使用 WordPress.com 帳號。 登出 /  變更 )

Google+ photo

您的留言將使用 Google+ 帳號。 登出 /  變更 )

Twitter picture

您的留言將使用 Twitter 帳號。 登出 /  變更 )

Facebook照片

您的留言將使用 Facebook 帳號。 登出 /  變更 )

w

連結到 %s

%d 位部落客按了讚: