Improving Elasticsearch performance with event sourcing - luis-sena

Posted by banq on 2021-11-29

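Elasticsearch does not really support updates. In Elasticsearch, an update always means delete + create, and a constant stream of document updates can bring an Elasticsearch cluster to its knees. Fortunately, there is a way to avoid this.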

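The final solution uses the event sourcing design pattern to turn every change that needs to be stored into an event. In this case, the application state is stored in Elasticsearch.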

The second part of the solution is how we store that state.

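We need to combine all changes to each document into a single operation. In some cases that means keeping a list of values; in others it means keeping only the last change (such as a username change). To keep the state consistent, we must guarantee the order of events that target the same document; otherwise the state may end up out of sync with our source of truth.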

The solution is based on the concept of streams. I am using Kafka, but Redis Streams or AWS Kinesis would work just as well.

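The idea is to store all new changes (new followers, username changes, and so on) in a partitioned topic. Make sure the partition key is derived from the document id so that ordering is guaranteed, but avoid having one partition per user id, or you will kill your Kafka cluster.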
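To make the partition-key advice concrete, here is a minimal sketch of key-based partitioning. It assumes a small, fixed partition count and uses `zlib.crc32` as a stand-in hash; the default partitioner in Kafka's Java client uses a different hash (murmur2). `NUM_PARTITIONS` and `partition_for` are hypothetical names used only for illustration.

```python
import zlib

# Hypothetical partition count: sized for throughput, not one per user.
NUM_PARTITIONS = 6


def partition_for(doc_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a document id to a partition.

    The same id always maps to the same partition, so all events for
    one document are consumed in the order they were produced.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_partitions


for doc_id in ["er56kmn", "oiuh76n", "df47kj"]:
    print(doc_id, "->", partition_for(doc_id))
```

Many users share each partition, which keeps the partition count small while still preserving per-document ordering.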

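Ordering matters for events that overwrite the last value of a field (such as a username change). We want to be sure we persist the latest version, not an intermediate one.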
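A small sketch of why order matters for overwrite-style events. The event shape (a `(user_id, username)` pair) and the `apply_events` helper are assumptions made for this illustration, not part of the original code.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (user_id, new_username)
Event = Tuple[str, str]


def apply_events(events: Iterable[Event]) -> Dict[str, str]:
    """Replay username-change events; the last event per user wins."""
    usernames: Dict[str, str] = {}
    for user_id, username in events:
        usernames[user_id] = username
    return usernames


in_order = [("er56kmn", "alice"), ("er56kmn", "alice_v2")]
out_of_order = list(reversed(in_order))

print(apply_events(in_order))      # {'er56kmn': 'alice_v2'}
print(apply_events(out_of_order))  # {'er56kmn': 'alice'}
```

Replaying the same two events in the wrong order leaves the stale username in place, which is exactly the mismatch with the source of truth that per-document ordering prevents.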

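To process these messages we need a stream processing solution. For this example I will use Faust, but the pattern is so simple that I suggest you use whichever tool suits you best.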

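import base64
import os
import random
from typing import Optional

import faust


# Models describe how messages are serialized:
# {"user_id": "3fae...", "username": "my_new_username"}
class UserEvent(faust.Record):
    user_id: str
    # An event usually carries only some of the possible changes,
    # so every change field is optional and defaults to None.
    username: Optional[str] = None
    follower_id: Optional[str] = None
    following_id: Optional[str] = None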


app = faust.App('es_event_processor', broker='kafka://broker')
topic = app.topic(
    'user_events',
    key_type=str,
    value_type=UserEvent
)


def add_value(user_docs, user_id, key, value, op):
    """Fold one change into the pending document for user_id."""
    doc = user_docs.setdefault(user_id, {})
    if op == 'set':
        # Last write wins (e.g. username).
        doc[key] = value
    elif op == 'append':
        # Accumulate every value seen in this batch (e.g. followers).
        doc.setdefault(key, []).append(value)


@app.agent(topic)
async def user_event_consumer(user_events):
    """
    Very simple way to aggregate user events and store them in ES.
    Other options, like aggregating into another Kafka topic
    or using a RocksDB table, are also valid.
    :param user_events:
    :return:
    """
    print("starting agent")
    # take up to 1000 messages, or whatever arrived within 30s
    async for lst in user_events.take(1000, within=30.0):
        user_docs = {}
        for user_event in lst:
            if user_event.username:
                add_value(user_docs, user_event.user_id, 'username', user_event.username, 'set')
            if user_event.follower_id:
                add_value(user_docs, user_event.user_id, 'follower_id', user_event.follower_id, 'append')
            if user_event.following_id:
                add_value(user_docs, user_event.user_id, 'following_id', user_event.following_id, 'append')

        # ES Load logic goes here
        # user_docs already has the aggregated data ready to bulk load into the index
        # or even multiple indexes
        print(len(user_docs))


@app.timer(interval=0.5)
async def example_sender(app):
    """
    Used to simulate user interactions with the system like username change, add follower, etc
    :param app:
    :return:
    """
    print("preparing msg")
    user_id_lst = ['er56kmn', 'oiuh76n', 'df47kj']

    user_id = random.choice(user_id_lst)
    extra_kwargs = {}
    if random.randint(0, 10) >= 5:
        extra_kwargs['username'] = f'c{base64.urlsafe_b64encode(os.urandom(6)).decode()}'

    user_event = UserEvent(
        user_id=user_id,
        follower_id=f'c{base64.urlsafe_b64encode(os.urandom(6)).decode()}',
        following_id=f'c{base64.urlsafe_b64encode(os.urandom(6)).decode()}',
        **extra_kwargs
    )
    await topic.send(
        key=user_id,
        value=user_event,
    )
    print("sent msg")


if __name__ == '__main__':
    app.main()
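
The `# ES Load logic goes here` placeholder can be filled in many ways. The sketch below shows one option under stated assumptions: it turns an aggregated `user_docs` dict into one update action per document, in the dict format accepted by `elasticsearch.helpers.bulk` from the official Python client. The index name `users` and the helper `build_bulk_actions` are hypothetical, and nothing is sent to a cluster here. Note that a partial `doc` update replaces array fields rather than appending to them, so list fields such as `follower_id` would need to carry the full list or use a scripted update instead.

```python
from typing import Any, Dict, Iterator


def build_bulk_actions(
    user_docs: Dict[str, Dict[str, Any]],
    index: str = "users",  # hypothetical index name
) -> Iterator[Dict[str, Any]]:
    """Yield one update action per document, however many events it had."""
    for user_id, changes in user_docs.items():
        yield {
            "_op_type": "update",
            "_index": index,
            "_id": user_id,
            "doc": changes,
            # Create the document if it does not exist yet.
            "doc_as_upsert": True,
        }


# Example batch, shaped like the dict built by the agent above.
batch = {
    "er56kmn": {"username": "alice_v2"},
    "df47kj": {"username": "bob"},
}
for action in build_bulk_actions(batch):
    print(action)
```

With a client available, the generator could be passed along as `helpers.bulk(es_client, build_bulk_actions(user_docs))`, where `es_client` is assumed to be an `Elasticsearch` instance. Each document is then written once per batch, regardless of how many events touched it.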
