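Implementing a High-Performance Unread Message Counter in Django

A counter is a very common building block. Using an unread message count as the example, this blog post walks through the essentials of implementing a high-performance counter in Django.

Where the story begins: .count()

Suppose you have a Notification model class that mainly stores all in-site notifications: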

from django.db import models

class Notification(models.Model):
    """A simplified Notification class with two fields:

    - `user_id`: ID of the user who owns the message
    - `has_readed`: whether the message has been read
    """

    user_id = models.IntegerField(db_index=True)
    has_readed = models.BooleanField(default=False)

Naturally, at first you would fetch a user's unread message count with a query like this:

# Get the unread message count for the user with ID 3074
Notification.objects.filter(user_id=3074, has_readed=False).count()

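While your Notification table is small, this approach is perfectly fine. But as the business grows, the message table gradually reaches more than a hundred million rows, and many lazy users pile up thousands of unread messages. Every call to .count() then has to walk all of that user's matching rows.

At this point you need a counter that tracks each user's unread message count. Compared with the earlier count(), a single simple primary-key lookup (or something even cheaper) is then enough to get the real-time unread count.

A better approach: build a counter

First, we need to create a new table that stores each user's unread message count.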

class UserNotificationsCount(models.Model):
    """This model stores the unread message count of every user"""

    user_id = models.IntegerField(primary_key=True)
    unread_count = models.IntegerField(default=0)

    def __str__(self):
        return '<UserNotificationsCount %s: %s>' % (self.user_id, self.unread_count)

We give every registered user a corresponding UserNotificationsCount record that stores their unread message count. Whenever we need the count, UserNotificationsCount.objects.get(pk=user_id).unread_count is all it takes.
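
To make the difference between the two read paths concrete, here is a small sketch that uses Python's built-in sqlite3 module instead of Django. The table and column names are assumptions chosen to mirror the two models; the point is only that the first query has to visit every matching notification row, while the second reads a single row by primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE notification (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        has_readed INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX idx_notification_user ON notification (user_id);
    CREATE TABLE user_notifications_count (
        user_id INTEGER PRIMARY KEY,
        unread_count INTEGER NOT NULL DEFAULT 0
    );
""")

# User 3074 has 1000 unread and 500 read notifications.
rows = [(3074, 0)] * 1000 + [(3074, 1)] * 500
conn.executemany(
    "INSERT INTO notification (user_id, has_readed) VALUES (?, ?)", rows
)
conn.execute("INSERT INTO user_notifications_count VALUES (3074, 1000)")

# Approach 1: COUNT has to visit every matching row for the user.
counted = conn.execute(
    "SELECT COUNT(*) FROM notification WHERE user_id = ? AND has_readed = 0",
    (3074,),
).fetchone()[0]

# Approach 2: one primary-key lookup, no matter how many notifications exist.
looked_up = conn.execute(
    "SELECT unread_count FROM user_notifications_count WHERE user_id = ?",
    (3074,),
).fetchone()[0]

print(counted, looked_up)  # 1000 1000
```

Both queries return the same number here only because the counter row was written by hand. Keeping it in sync automatically is the hard part.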

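Now comes the key question: how do we know when to update our counter? Does Django offer any shortcuts here?

The challenge: keeping your counter up to date in real time

For the counter to work correctly, we have to update it in real time. That includes:

When a new unread message arrives, increment the counter by 1
When a message is deleted unexpectedly and it was still unread, decrement the counter by 1
When a new message has been read, decrement the counter by 1

Let's tackle these cases one at a time.

Before presenting the solutions, we need to introduce a Django feature: Signals. Signals are an event notification mechanism provided by Django. They let you listen for custom or built-in events and call a predefined function when those events occur.

For example, django.db.models.signals.pre_save and django.db.models.signals.post_save are fired before and after a model's save method is called. Functionally they are somewhat similar to database triggers.

For more on Signals, see the official documentation. Now let's see what Signals can do for our counter.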
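
If the signal mechanism feels abstract, the toy dispatcher below may help. It is not Django's implementation, just a plain-Python sketch of the same idea: receivers are registered with connect(), and the model calls send() after saving. The Signal class, the in-memory unread dict and the simplified Notification class are all made up for illustration.

```python
class Signal:
    """A toy stand-in for a signal: a list of receivers called on send()."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver, sender=None):
        self._receivers.append((receiver, sender))

    def send(self, sender, **kwargs):
        # Call every receiver registered for this sender (or for any sender).
        return [
            receiver(sender=sender, **kwargs)
            for receiver, wanted in self._receivers
            if wanted is None or wanted is sender
        ]


post_save = Signal()
unread = {}  # user_id -> unread count (hypothetical in-memory counter)


class Notification:
    def __init__(self, user_id, has_readed=False):
        self.user_id = user_id
        self.has_readed = has_readed

    def save(self, created=True):
        # A real ORM would write to the database here, then fire the signal.
        post_save.send(sender=Notification, instance=self, created=created)


def incr_counter(sender, instance, created, **kwargs):
    # Only newly created, still unread notifications bump the counter.
    if created and not instance.has_readed:
        unread[instance.user_id] = unread.get(instance.user_id, 0) + 1


post_save.connect(incr_counter, sender=Notification)

Notification(user_id=3074).save()
Notification(user_id=3074, has_readed=True).save()
print(unread)  # {3074: 1}
```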

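1. When a new message arrives, increment the counter by 1

This is probably the easiest case to handle. With Django's Signals, just a few lines of code are enough to update the counter in this situation: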

from django.db.models.signals import post_save, post_delete

def incr_notifications_counter(sender, instance, created, **kwargs):
    # Only update when the instance is newly created and has_readed is still the default False
    if not (created and not instance.has_readed):
        return

    # Call update_unread_count to increment the counter by 1
    NotificationController(instance.user_id).update_unread_count(1)

# Listen for the post_save signal of the Notification model
post_save.connect(incr_notifications_counter, sender=Notification)

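This way, whenever you create a new notification with Notification.objects.create or .save(), our NotificationController is notified and increments the counter by 1.

Note, however, that because our counter relies on Django's signals, it will not be notified if some part of your code adds notifications with raw SQL instead of going through the Django ORM. The same applies to ORM calls that skip save(), such as bulk_create. It is therefore best to standardize how new notifications are created, for example by routing everything through a single API.

2. When a message is deleted unexpectedly and it was still unread, decrement the counter by 1

With the first case behind us, this one is also fairly simple: we only need to listen for the post_delete signal of Notification. Here is some sample code: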

def decr_notifications_counter(sender, instance, **kwargs):
    # If the deleted message had not been read yet, decrement the counter by 1
    if not instance.has_readed:
        NotificationController(instance.user_id).update_unread_count(-1)

post_delete.connect(decr_notifications_counter, sender=Notification)

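With that, deleting a Notification also updates our counter correctly.

3. When a new message is read, decrement the counter by 1

Next, when a user reads an unread message, we also need to update the unread counter. You might say: what's hard about that? I'll just update the counter by hand inside my mark-as-read method.

For example: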

class NotificationController(object):

    ... ...

    def mark_as_readed(self, notification_id):
        notification = Notification.objects.get(pk=notification_id)
        # No need to mark an already-read notification again
        if notification.has_readed:
            return

        notification.has_readed = True
        notification.save()
        # Update our counter here. Mmm, feels great
        self.update_unread_count(-1)

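After a few simple tests you may feel that your counter works very well. However, this implementation has a fatal flaw: it cannot handle concurrent requests correctly.

For example, say you have an unread notification with id 100, and two requests arrive at the same time, both wanting to mark it as read: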

# With two concurrent requests, assume these two calls run almost simultaneously
NotificationController(user_id).mark_as_readed(100)
NotificationController(user_id).mark_as_readed(100)

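Both calls will mark the notification as read successfully, because under concurrency the if notification.has_readed check cannot do its job: each request can load the row before the other has saved it. Our counter is therefore wrongly decremented twice, even though only one message was actually read.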
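
The interleaving is easier to see when it is written out by hand. The sketch below involves no database and no threads. It replays the steps of two requests, A and B, in the unlucky order, using a made-up FakeRow object and a plain integer as the counter.

```python
class FakeRow:
    """Hypothetical stand-in for one Notification row in the database."""

    def __init__(self):
        self.has_readed = False


row = FakeRow()
unread_count = 1  # the user has exactly one unread notification


def read_flag():
    # Step 1 of mark_as_readed: load the row and look at has_readed.
    return row.has_readed


def write_flag_and_decrement(seen_flag):
    # Steps 2 and 3: skip if already read, otherwise save and decrement.
    global unread_count
    if seen_flag:
        return
    row.has_readed = True
    unread_count -= 1


# Interleaving of two concurrent requests, A and B:
seen_by_a = read_flag()  # A sees has_readed == False
seen_by_b = read_flag()  # B also sees False, because A has not saved yet
write_flag_and_decrement(seen_by_a)
write_flag_and_decrement(seen_by_b)

print(unread_count)  # -1: one notification was read, but the counter dropped twice
```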

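So how should this problem be solved?

Basically, there is only one way to resolve data conflicts caused by concurrent requests: locking. Here are two fairly simple solutions:

Using a `select for update` database query

select ... for update is a database-level feature designed precisely for the read-then-modify scenario under concurrency. Mainstream relational databases such as MySQL and PostgreSQL support it, and newer versions of the Django ORM even provide a shortcut for it directly (select_for_update()). For more details, consult the documentation of the database you use.

With select for update, our code might look like this: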

from django.db import transaction

class NotificationController(object):

    ... ...

    def mark_as_readed(self, notification_id):
        # Run the select for update and the update statement inside one transaction
        with transaction.atomic():
            # select_for_update ensures only one of the concurrent requests is
            # processed at a time; the others wait for the lock to be released
            notification = Notification.objects.select_for_update().get(pk=notification_id)
            # No need to mark an already-read notification again
            if notification.has_readed:
                return

            notification.has_readed = True
            notification.save()
            # Update our counter here. Mmm, feels great
            self.update_unread_count(-1)
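
As a rough analogy only, the same "hold a lock across the check and the write" idea can be tried in plain Python with threading.Lock. This is not how the database implements row locks, and the state dict below is a made-up stand-in for the notification row plus the counter. It merely shows that once the check and the write are serialized, the second caller sees the updated flag and returns early.

```python
import threading

row_lock = threading.Lock()  # plays the role of the row lock
state = {"has_readed": False, "unread_count": 1}


def mark_as_readed():
    # Holding the lock across check + write is the analogue of running
    # SELECT ... FOR UPDATE and the UPDATE inside one transaction.
    with row_lock:
        if state["has_readed"]:
            return
        state["has_readed"] = True
        state["unread_count"] -= 1


threads = [threading.Thread(target=mark_as_readed) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["unread_count"])  # 0
```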

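Besides select for update, there is another fairly simple way to solve this problem.

Using update for an atomic modification

In fact there is an even simpler way: collapsing our database operation into a single UPDATE statement is enough to solve the concurrency problem: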

def mark_as_readed(self, notification_id):
    affected_rows = Notification.objects.filter(pk=notification_id, has_readed=False)\
                                        .update(has_readed=True)
    # update() returns the number of rows the UPDATE statement changed,
    # so the counter goes down only if this call flipped the row to read
    self.update_unread_count(-affected_rows)

This way, concurrent mark-as-read operations also affect our counter correctly.
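
You can check the "affected rows" behaviour without Django. The sketch below uses the built-in sqlite3 module and a made-up single-table schema. The conditional UPDATE changes the row on the first call and matches nothing on the second, so only one caller would go on to decrement the counter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notification ("
    "id INTEGER PRIMARY KEY, has_readed INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO notification (id, has_readed) VALUES (100, 0)")


def mark_as_readed(notification_id):
    """Return how many rows actually flipped from unread to read."""
    cursor = conn.execute(
        "UPDATE notification SET has_readed = 1 WHERE id = ? AND has_readed = 0",
        (notification_id,),
    )
    return cursor.rowcount


first = mark_as_readed(100)
second = mark_as_readed(100)
print(first, second)  # 1 0
```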

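High performance?

So far we have covered how to implement an unread message counter that updates correctly. We might simply use an UPDATE statement to modify the counter, like this: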

from django.db.models import F

def update_unread_count(self, count):
    # Use an UPDATE statement to update our counter
    UserNotificationsCount.objects.filter(pk=self.user_id)\
                                  .update(unread_count=F('unread_count') + count)

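In production, however, this approach is quite likely to cause serious performance problems: if the counter is updated frequently, the flood of UPDATE statements puts considerable pressure on the database. So to build a high-performance counter, we need to buffer the changes and write them to the database in batches.

With redis's sorted set, this is very easy to do.

Using a sorted set to buffer counter changes

redis is a very handy in-memory database, and the sorted set is one of the data types it provides. With it, we can easily buffer all counter changes and then write them back to the database in batches.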

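RK_NOTIFICATIONS_COUNTER = 'ss_pending_counter_changes'

def update_unread_count(self, count):
    """Modified update_unread_count method"""
    # redis-py 3.0+ argument order: zincrby(name, amount, value)
    redisdb.zincrby(RK_NOTIFICATIONS_COUNTER, count, str(self.user_id))

# We also need to change the method that reads a user's unread count so that it
# includes the buffered data in redis not yet written back to the database.
# That code is omitted here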
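
The read side is left out above, so here is one possible shape for it, purely as a sketch. Everything in it is an assumption: FakeSortedSet is a dict-backed stand-in for the two redis sorted-set operations involved, db_unread_count stands in for the UserNotificationsCount table, and clamping at zero is a defensive choice rather than something the original design prescribes. The idea is simply that the value shown to the user is the persisted count plus the pending delta.

```python
class FakeSortedSet:
    """Dict-backed stand-in for the two sorted-set operations used here."""

    def __init__(self):
        self._scores = {}

    def zincrby(self, member, amount):
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zscore(self, member):
        # Like redis, return None when the member does not exist.
        return self._scores.get(member)


pending = FakeSortedSet()     # buffered changes that are not flushed yet
db_unread_count = {3074: 10}  # hypothetical persisted counters


def update_unread_count(user_id, count):
    pending.zincrby(str(user_id), count)


def get_unread_count(user_id):
    persisted = db_unread_count.get(user_id, 0)
    delta = pending.zscore(str(user_id)) or 0
    # Clamp at zero in case the buffer and the table are briefly out of step.
    return max(persisted + int(delta), 0)


update_unread_count(3074, 2)   # two new notifications arrive
update_unread_count(3074, -1)  # one of them is read
print(get_unread_count(3074))  # 11
```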

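With the code above, counter updates are now buffered in redis. We still need a script that periodically writes the buffered data back to the database.

With a custom Django command, this is very easy to do: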

# File: management/commands/notification_update_counter.py

# -*- coding: utf-8 -*-
from django.core.management.base import BaseCommand
from django.db.models import F

# Fix import prob
from notification.models import UserNotificationsCount
from notification.utils import RK_NOTIFICATIONS_COUNTER
from base_redis import redisdb

import logging
logger = logging.getLogger('stdout')

class Command(BaseCommand):
    help = 'Update UserNotificationsCount objects, write changes from redis to database'

    def handle(self, *args, **options):
        # First, use zrange to fetch every user ID with buffered changes
        for user_id in redisdb.zrange(RK_NOTIFICATIONS_COUNTER, 0, -1):
            # Note: to keep the read-and-remove atomic, we use a redis pipeline
            # (redis-py pipelines are transactional by default)
            pipe = redisdb.pipeline()
            pipe.zscore(RK_NOTIFICATIONS_COUNTER, user_id)
            pipe.zrem(RK_NOTIFICATIONS_COUNTER, user_id)
            count, _ = pipe.execute()
            # zscore returns None if the member is already gone
            count = int(count or 0)
            if not count:
                continue

            logger.info('Updating unread count user %s: count %s' % (user_id, count))
            # user_id comes back from redis as a string/bytes member
            UserNotificationsCount.objects.filter(pk=int(user_id))\
                                          .update(unread_count=F('unread_count') + count)

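After that, running python manage.py notification_update_counter writes the buffered changes back to the database in one batch. We can also add this command to crontab so that it runs on a schedule.

Summary

That wraps up a simple "high-performance" unread message counter. After all that, the main takeaways are just these:

Use Django's signals to catch model create/delete operations
Use the database's select for update to handle concurrent database operations correctly
Use redis's sorted set to buffer counter modifications

I hope this helps. :)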
