SQL語句通常不是很容易理解,特別是你閱讀別人已經寫好的語句。因此,很多人指出我們應該遵循在其他語言中遵循的原則,像加上註釋和功能模組化。我最新注意到一個很多人都沒有使用的Postgres關鍵特性,也就是 @timonk在AWS Re:Invent 大會關於資料倉儲服務Redshift主題演講時指出的一個特性。這個特性實際上使得SQL兼具了可讀性和模組性。在以前,我回頭閱讀自己的幾個月前的SQL語句,通常很難理解,而現在我可以做到這一點。
這個特性就是CTEs,也就是公用表表示式,你有可能稱做它為WITH
語句。和資料庫中檢視一樣,它的主要好處就是,它允許你在當前事務中建立臨時表。你可以大量使用它,因為它允許你思路清晰的構建模組,別人很容易就理解你在做什麼。
讓我們舉個簡單的例子
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
WITH users_tasks AS ( SELECT users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ) |
通過這樣定義臨時表users_tasks,我就可以在後面加上對users_tasks基本查詢語句,像:
1 2 |
SELECT * FROM users_tasks; |
有趣的是你可以將它們連在一起。當我知道分配給每個使用者的任務量時,也許我想知道在一個指定的任務上,誰因為對這個任務負責超過了50%而因此造成瓶頸。為了簡化,我們可以使用多種方式,先計算每個任務的總量,然後是每人針對每個任務的負責總量。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), |
現在我們將組合一下然後發現超過50%的使用者
1 2 3 4 5 6 7 |
overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) |
最終目標,我想獲得超負荷工作這的使用者和任務的逗號分隔列表。我們只要簡單地對overloaded_users和 users_tasks的初始列表進行join操作。放在一起可能有點長,但是可讀性強。作為額外幫助,我又在每一層加了註釋。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
--- Created by Craig Kerstiens 11/18/2013 --- Query highlights users that have over 50% of tasks on a given project --- Gives comma separated list of their tasks and the project --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), --- Calculates the projects per each user tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id |
CTEs通常不如經過精簡優化過的SQL語句效能高。大多數差距小於一倍差距。對我而言,這種為了可讀性作出的折中是毋庸置疑的。Postgres優化器以後肯定會針對這點變的更好。
多說一句,是的我可以用大約10-15行簡短的SQL語句做同樣的事情,但是你也許不能很快的理解它。當你碰到需要保證SQL做正確的事情時,可讀性的優勢就出來了。SQL語句總是有個結果,你對此毫無疑問。確保你SQL語句容易推理是保證正確性的關鍵。