django REST fromework 序列化時多次查詢資料庫的解決方案

行行出bug發表於2018-10-18

django REST framework 在django程式中不只是幫助提供REST風格的介面;還有介面的系列化輸出;許可權管理等; 不過在序列化過程中,如果什麼都使用預設的方式,可能導致序列化時多次查詢資料庫,造成效能問題(示例來源):

  • model:
from django.db import models


class Category(models.Model):
    name = models.CharField(max_length=100)


class Blogpost(models.Model):
    title = models.CharField(max_length=100)
    categories = models.ManyToManyField(Category)
複製程式碼
  • url + view:
# urls.py
from rest_framework import routers
from . import views


router = routers.DefaultRouter()
router.register(r'blogposts', views.BlogpostViewSet)
---------------------------------------------------------
# views.py
from rest_framework import viewsets

class BlogpostViewSet(viewsets.ModelViewSet):
    queryset = Blogpost.objects.all().order_by('date')
    serializer_class = serializers.BlogpostSerializer
複製程式碼

等這些做完,呼叫介面,我們在控制檯將sql輸出的話,就會看到像這樣的東西:

# sql輸出設定
LOGGING = {
    ...
    "loggers": {
        ...
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": False,
        }
    }
}
複製程式碼
-- sql 輸出結果
SELECT "app_blogpost"."id", "app_blogpost"."title" FROM "app_blogpost";

SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1025;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 193;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 757;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 853;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1116;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1126;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 964;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 591;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1112;
SELECT "app_category"."id", "app_category"."name" FROM "app_category" INNER JOIN "app_blogpost_categories" ON ("app_category"."id" = "app_blogpost_categories"."category_id") WHERE "app_blogpost_categories"."blogpost_id" = 1034;
...
複製程式碼

是不是嚇了一跳?在序列化時,原生的方式是逐條拿出來,再去查詢對應的關聯物件;筆者的業務表中有四五個外來鍵關聯物件,點完那個介面後好幾屏的sql鋪面而來;如果資料量大的話,這對資料庫絕對是個很大的壓力!後面去網上找了下,找到了比較適合的解決方案,下面分享下: 1. 序列化中需要使用到一對多的物件中的欄位的:

# views.py
queryset = BusinessTable.objects.all().select_related(
        'one_to_one_field_1',
        'one_to_one_field_2',
        ...
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        ...
        return obj.one_to_one_field_1.target_field
複製程式碼

2. 序列化中需要使用到一對多,多對多的物件中的欄位的

queryset = BusinessTable.objects.all().prefetch_related(
        'many_to_many_field_1',
        'many_to_many_field_2',
        ...
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field_set = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        ...
        return [it.arget_field for it in obj.many_to_many_field_1.all()]
複製程式碼

3. 序列化中需要對多對多的欄位聚合的

queryset = BusinessTable.objects.all().annotate(
        cnt_xxx=Count("many_to_many_field")
    )
---------------------
# serializer
class xxxSerializer(serializers.ModelSerializer):
    target_field = serializers.SerializerMethodField()

    def get_target_field(self, obj):
        return obj.cnt_xxx
複製程式碼

4. 總結

queryset = BlessingBlog.objects.all().annotate(
        cnt_xxx=Count("xxx") # 需要聚合的新命名欄位,因annotate對語句的解釋,django其實會在外部語句執行聯合查詢,再分組,而不是直接走子查詢。所以這裡其實是取巧,所以要這樣用只能加一個,否則要出錯,要用在子查詢來聚合的可以用RowSql或extra!
    ).select_related( # 一對一關聯表資料
        'yyy1',
        'yyy2',
    ).prefetch_related( # 一對多,多對多關聯表資料
        'xxx1',
        'xxx3',
    ).order_by('zzz1', 'zzz2')
複製程式碼

最後,如果使用了annotate,系統給出的查詢語句對聚合欄位的處理不是通過子查詢實現的,是通過聯合查詢再分組來實現的,這對效能也有很大的影響,而且聚合多個欄位的話還要注意去重,如果對效能要求高的,可以考慮自己手寫查詢語句。

參考: 1 stack overflow 2 部落格 3 django官網 4 部落格

相關文章