[Design Pattern] Upload big file - 3. Code Design - part 1

Posted by Zhentiw on 2024-12-05

Code design

The SDK has three layers:

  • Upload Protocol: Defines the communication format between the frontend and backend.
  • upload-core: Protocol-based API that provides core functionality, such as creating and reading protocol fields, plus common utility functions shared by the frontend and backend.
  • upload-client: SDK for the client.
  • upload-server: SDK for the BFF (backend for frontend).

upload-core: reusable functions

EventEmitter

To unify event handling on the frontend and backend, upload-core provides a standard EventEmitter class based on the publish-subscribe pattern.

  • Frontend Events:

    • Upload Progress Changed Event: Triggered when the upload progress updates.
    • Upload Paused/Resumed Event: Triggered when the upload is paused or resumed.
  • Backend Events:

    • Chunk Write Completed Event: Triggered when a chunk is successfully written to storage.
    • Chunk Merge Completed Event: Triggered when all chunks are successfully merged into the final file.

// EventEmitter.ts
export class EventEmitter<T extends string> {
  private events: Map<T, Set<Function>>;
  constructor() {
    this.events = new Map();
  }

  on(event: T, listener: Function) {
    if (!this.events.has(event)) {
      this.events.set(event, new Set());
    }
    this.events.get(event)!.add(listener);
  }

  off(event: T, listener: Function) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.delete(listener);
  }

  emit(event: T, ...args: any[]) {
    if (!this.events.has(event)) return;
    this.events.get(event)!.forEach((listener) => listener(...args));
  }
}
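
As a usage illustration, the sketch below subscribes to a progress event, emits it, and then unsubscribes. It repeats a trimmed copy of the class so it can run on its own. The event names ("progress", "paused", "resumed") and the `uploader` variable are assumptions made for the example; the article does not fix the exact event strings.

```typescript
type Listener = (...args: any[]) => void;

// Trimmed copy of the EventEmitter class so this snippet is self-contained
class EventEmitter<T extends string> {
  private events = new Map<T, Set<Listener>>();

  on(event: T, listener: Listener) {
    if (!this.events.has(event)) this.events.set(event, new Set());
    this.events.get(event)!.add(listener);
  }

  off(event: T, listener: Listener) {
    this.events.get(event)?.delete(listener);
  }

  emit(event: T, ...args: any[]) {
    this.events.get(event)?.forEach((listener) => listener(...args));
  }
}

// Hypothetical frontend event names (assumption, not defined by the article)
type UploadEvents = "progress" | "paused" | "resumed";

const uploader = new EventEmitter<UploadEvents>();
const seen: number[] = [];
const onProgress = (percent: number) => {
  seen.push(percent);
};

uploader.on("progress", onProgress);
uploader.emit("progress", 25);
uploader.emit("progress", 50);

// After off(), the listener no longer receives events
uploader.off("progress", onProgress);
uploader.emit("progress", 75);

console.log(seen.join(",")); // 25,50
```

Because the event name is a type parameter, a call such as `uploader.emit("done")` is rejected at compile time, which is the main benefit of typing the emitter with a string union.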

TaskQueue

To support concurrent execution of multiple tasks for both frontend and backend, a TaskQueue class can be implemented.

1. Potential concurrent execution on the frontend: concurrent requests
2. Potential concurrent execution on the backend: concurrent chunk hash verification

import { EventEmitter } from "./EventEmitter";

export class Task {
  fn: Function;
  payload?: any;
  constructor(fn: Function, payload?: any) {
    this.fn = fn;
    this.payload = payload;
  }

  run() {
    return this.fn(this.payload);
  }
}

export class TaskQueue extends EventEmitter<"start" | "pause" | "drain"> {
  private tasks: Set<Task> = new Set();
  private currentCount = 0;
  private status: "running" | "paused" = "paused";
  // max concurrency allowed
  private concurrency: number = 4;

  constructor(concurrency: number = 4) {
    super();
    this.concurrency = concurrency;
  }

  add(...tasks: Task[]) {
    tasks.forEach((task) => this.tasks.add(task));
  }

  addAndStart(...tasks: Task[]) {
    this.add(...tasks);
    this.start();
  }

  start() {
    if (this.status === "running") return;

    if (this.tasks.size === 0) {
      this.emit("drain");
      return;
    }

    this.status = "running";
    this.emit("start");
    this.runNext();
  }

  private takeHeadTask() {
    const task = this.tasks.values().next().value;
    if (task) {
      this.tasks.delete(task);
    }
    return task;
  }

  private runNext() {
    if (this.status !== "running") return;

    // Fill every free slot, up to the concurrency limit
    while (this.currentCount < this.concurrency) {
      const task = this.takeHeadTask();
      if (!task) break;
      this.currentCount++;

      Promise.resolve(task.run()).finally(() => {
        this.currentCount--;
        this.runNext();
      });
    }

    // Nothing queued and nothing in flight: all tasks are done
    if (this.tasks.size === 0 && this.currentCount === 0) {
      this.status = "paused";
      this.emit("drain");
    }
  }

  pause() {
    this.status = "paused";
    this.emit("pause");
  }
}
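
To make the concurrency limit concrete, here is a minimal standalone sketch of the same idea: keep at most a fixed number of tasks in flight and start the next one whenever a slot frees up. It is not the TaskQueue class itself. `runLimited` and `uploadChunk` are hypothetical names introduced for the example, the "upload" is simulated with a timer, and the helper assumes `limit` is at least 1.

```typescript
type AsyncTask<T> = () => Promise<T>;

// Hypothetical helper (not part of the SDK): run tasks with at most `limit` in flight
function runLimited<T>(tasks: AsyncTask<T>[], limit: number): Promise<T[]> {
  return new Promise((resolve, reject) => {
    const results: T[] = new Array(tasks.length);
    let nextIndex = 0;
    let inFlight = 0;
    let finished = 0;

    const launch = () => {
      // Fill every free slot, as a task queue does after each completion
      while (inFlight < limit && nextIndex < tasks.length) {
        const index = nextIndex++;
        inFlight++;
        tasks[index]()
          .then((value) => {
            results[index] = value;
            inFlight--;
            finished++;
            if (finished === tasks.length) resolve(results);
            else launch();
          })
          .catch(reject);
      }
    };

    if (tasks.length === 0) resolve(results);
    else launch();
  });
}

// Usage: simulate uploading 6 chunks, 2 at a time, and record the peak concurrency
let active = 0;
let peak = 0;
const uploadChunk = (index: number): Promise<number> => {
  active++;
  peak = Math.max(peak, active);
  return new Promise((done) =>
    setTimeout(() => {
      active--;
      done(index);
    }, 5)
  );
};

const uploads = [0, 1, 2, 3, 4, 5].map((i) => () => uploadChunk(i));
const allUploaded = runLimited(uploads, 2);
allUploaded.then((indexes) => console.log(indexes.join(","), "peak:", peak));
```

Results are stored by task index, so the output order matches the input order even when tasks finish out of order.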

Complex issues in frontend code

The frontend involves two core issues:

1. How to split files into chunks
2. How to control requests

How to split files into chunks

First, define the chunk object and the helpers that create and hash it:

// chunk.ts
import SparkMD5 from "spark-md5";

export interface Chunk {
  blob: Blob;
  start: number;
  end: number;
  hash: string;
  index: number;
}

// Create a chunk with empty hash
export function createChunk(
  file: File,
  index: number,
  chunkSize: number
): Chunk {
  const start = index * chunkSize;
  const end = Math.min((index + 1) * chunkSize, file.size);
  const blob = file.slice(start, end);
  return {
    blob,
    start,
    end,
    hash: "",
    index,
  };
}

export function calcChunkHash(chunk: Chunk): Promise<string> {
  return new Promise((resolve, reject) => {
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();
    fileReader.onload = (e) => {
      spark.append(e.target?.result as ArrayBuffer);
      resolve(spark.end());
    };
    fileReader.onerror = reject;
    fileReader.readAsArrayBuffer(chunk.blob);
  });
}
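
The boundary arithmetic in createChunk can be checked without a real File. The sketch below uses a hypothetical helper, `chunkRanges`, that applies the same start/end formulas to a plain file size; the 12 MB file and 5 MB chunk size are example values only.

```typescript
interface ChunkRange {
  index: number;
  start: number;
  end: number;
}

// Hypothetical helper mirroring the arithmetic in createChunk
function chunkRanges(fileSize: number, chunkSize: number): ChunkRange[] {
  const count = Math.ceil(fileSize / chunkSize);
  return Array.from({ length: count }, (_, index) => ({
    index,
    start: index * chunkSize,
    // The last chunk is clamped to the file size, so it may be shorter
    end: Math.min((index + 1) * chunkSize, fileSize),
  }));
}

const MB = 1024 * 1024;
const ranges = chunkRanges(12 * MB, 5 * MB);

// Three ranges: [0, 5) MB, [5, 10) MB, and a shorter final one, [10, 12) MB
ranges.forEach((r) => console.log(`${r.index}: [${r.start / MB}, ${r.end / MB}) MB`));
```

Each range is half-open, matching `Blob.slice(start, end)`, which excludes the byte at `end`.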

Next, the entire file needs to be chunked. There are various chunking methods, such as:

  • Standard chunking
  • Multithreaded chunking: navigator.hardwareConcurrency
  • Main thread time-sliced chunking: requestIdleCallback
  • Other chunking patterns

To ensure versatility, the implementation must provide different chunking modes to the upper layer while also allowing for custom chunking patterns. For this reason, the design employs a template pattern based on an abstract class to handle the process.

Template pattern

The template pattern is a behavioral design pattern that defines the skeleton of an algorithm in a base class (abstract or concrete) and allows derived classes to override specific steps without changing the overall algorithm structure.

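The key is to define the common steps in the base class and leave only the varying step to subclasses. For example:

abstract class Chess {
  move(x: number, y: number) {
    // Common steps shared by every piece:
    // 1. boundary checking
    // 2. determining whether the move is valid in the current game
    // 3. checking the piece-specific rule
    if (this.rule(x, y)) {
      // finish the movement
    }
  }

  // The only step each piece has to implement
  abstract rule(x: number, y: number): boolean;
}

class Horse extends Chess {
  rule(x: number, y: number): boolean {
    // Horse-specific rule, elided in the original; placeholder so the example compiles
    return true;
  }
}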
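
Applied to chunking, the base class fixes the overall splitting flow and leaves the hash calculation to subclasses:

// chunkSplitor.ts
import SparkMD5 from "spark-md5";
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk, createChunk } from "./chunk";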

export type ChunkSplitorEvents = "chunks" | "wholeHash" | "drain";

export abstract class ChunkSplitor extends EventEmitter<ChunkSplitorEvents> {
  protected file: File;
  protected chunkSize: number;
  protected chunks: Chunk[] = [];
  protected hash?: string; // hash of the whole file

  private handleChunkCount = 0; // the chunks that have been handled
  private spark = new SparkMD5();
  private hasSplitted = false; // chunked or not

  constructor(file: File, chunkSize: number = 1024 * 1024 * 5) {
    super();
    this.file = file;
    this.chunkSize = chunkSize;
    const chunkCount = Math.ceil(file.size / chunkSize);
    this.chunks = new Array(chunkCount)
      .fill(0)
      .map((_, index) => createChunk(this.file, index, this.chunkSize));
  }

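  split() {
    if (this.hasSplitted) return;
    this.hasSplitted = true;
    const emitter = new EventEmitter<"chunks">();
    const chunksHandler = (chunks: Chunk[]) => {
      this.emit("chunks", chunks);
      // Hashed chunks may arrive in any order (and as copies, e.g. from a worker),
      // so record each hash on the chunk at its original index
      chunks.forEach((chunk) => {
        this.chunks[chunk.index].hash = chunk.hash;
      });
      this.handleChunkCount += chunks.length;
      if (this.handleChunkCount === this.chunks.length) {
        emitter.off("chunks", chunksHandler);
        // Append hashes in index order so the whole-file hash is deterministic
        this.chunks.forEach((chunk) => this.spark.append(chunk.hash));
        this.hash = this.spark.end();
        this.spark.destroy();
        this.emit("wholeHash", this.hash);
        this.emit("drain");
      }
    };
    emitter.on("chunks", chunksHandler);
    this.calcHash(this.chunks, emitter);
  }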

  abstract calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void;
}

Based on this abstract class, various chunking modes can be implemented. Each mode only needs to inherit from ChunkSplitor and implement the calculation of the chunk hash.

For example, a multithreaded chunking class can be implemented very simply.

// multiThreadSplitor.ts
import { EventEmitter } from "../upload-core/EventEmitter";
import { Chunk } from "./chunk";
import { ChunkSplitor } from "./chunkSplitor";

export class MultiThreadSplitor extends ChunkSplitor {
  private workers: Worker[] = new Array(navigator.hardwareConcurrency || 4)
    .fill(0)
    .map(
      () =>
        new Worker(new URL("./splitWorker.ts", import.meta.url), {
          type: "module",
        })
    );

  calcHash(chunks: Chunk[], emitter: EventEmitter<"chunks">): void {
    const workerSize = Math.ceil(chunks.length / this.workers.length);
    for (let i = 0; i < this.workers.length; i++) {
      const worker = this.workers[i];
      const start = i * workerSize;
      const end = Math.min((i + 1) * workerSize, chunks.length);
      const workerChunks = chunks.slice(start, end);
      worker.postMessage(workerChunks);
      worker.onmessage = (e) => {
        emitter.emit("chunks", e.data);
      };
    }
  }

  dispose() {
    this.workers.forEach((worker) => worker.terminate());
  }
}

// splitWorker.ts

import { calcChunkHash, Chunk } from "./chunk";

onmessage = (e) => {
  const chunks = e.data as Chunk[];
  chunks.forEach((chunk) => {
    calcChunkHash(chunk).then((hash) => {
      chunk.hash = hash;
      postMessage([chunk]);
    });
  });
};
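
The main-thread time-sliced mode listed earlier follows the same shape: do a little work, yield, and continue in the next idle period. The sketch below shows only that scheduling idea and is not a ChunkSplitor subclass. `processInIdleSlices`, `defaultScheduler`, and the 10 ms fallback budget are assumptions made for the example, and the work function is synchronous, whereas real chunk hashing with FileReader is asynchronous and would need to be awaited per chunk.

```typescript
interface IdleDeadlineLike {
  timeRemaining(): number;
}
type IdleScheduler = (cb: (deadline: IdleDeadlineLike) => void) => void;

// Prefer requestIdleCallback where it exists; otherwise fall back to a timer
// that grants a fixed 10 ms budget per slice (an arbitrary illustrative value)
const defaultScheduler: IdleScheduler = (cb) => {
  if (typeof (globalThis as any).requestIdleCallback === "function") {
    (globalThis as any).requestIdleCallback(cb);
    return;
  }
  setTimeout(() => {
    const sliceStart = Date.now();
    cb({ timeRemaining: () => Math.max(0, 10 - (Date.now() - sliceStart)) });
  }, 0);
};

// Hypothetical helper: process items in slices, yielding when the budget runs out
function processInIdleSlices<T>(
  items: T[],
  work: (item: T) => void,
  schedule: IdleScheduler = defaultScheduler
): Promise<void> {
  return new Promise((resolve) => {
    let next = 0;
    const runSlice = (deadline: IdleDeadlineLike) => {
      // Handle at least one item per slice so progress is guaranteed
      do {
        work(items[next++]);
      } while (next < items.length && deadline.timeRemaining() > 0);

      if (next < items.length) schedule(runSlice);
      else resolve();
    };

    if (items.length === 0) resolve();
    else schedule(runSlice);
  });
}

// Usage with a deterministic stand-in scheduler that allows 3 items per slice
let slices = 0;
const handled: number[] = [];
const threePerSlice: IdleScheduler = (cb) => {
  slices++;
  let budget = 3;
  cb({ timeRemaining: () => --budget });
};

processInIdleSlices([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], (n) => handled.push(n), threePerSlice);
console.log(handled.length, "items in", slices, "slices");
```

Passing the scheduler as a parameter keeps the slicing logic testable outside the browser, since `requestIdleCallback` is not available in every environment.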
