Learning Flume: Flume's Architecture

Posted by devos on 2013-11-23

Flume has three components: Source, Channel, and Sink. In the source code they correspond to three interfaces of the same names.

When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it’s consumed by a Flume sink. 

public interface Source extends LifecycleAware, NamedComponent {

  /**
   * Specifies which channel processor will handle this source's events.
   *
   * @param channelProcessor
   */
  public void setChannelProcessor(ChannelProcessor channelProcessor);

  /**
   * Returns the channel processor that will handle this source's events.
   */
  public ChannelProcessor getChannelProcessor();

}

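The Source interface does not define any methods that deal with Event directly; all it declares is a getter and a setter for ChannelProcessor. A Source delivers events by obtaining its ChannelProcessor and handing the events to it.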
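To make that delegation concrete, here is a minimal sketch. SketchEvent, SketchChannelProcessor and LineSource are hypothetical stand-ins written for this post, not Flume classes; the real Event and ChannelProcessor live in org.apache.flume and do considerably more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Flume's Event, reduced to a body string.
final class SketchEvent {
    final String body;
    SketchEvent(String body) { this.body = body; }
}

// Hypothetical stand-in for ChannelProcessor: it only records what it receives.
final class SketchChannelProcessor {
    final List<SketchEvent> delivered = new ArrayList<>();
    void processEvent(SketchEvent event) { delivered.add(event); }
}

// A toy source. Like the Source interface, it exposes only a getter and a
// setter for the processor, and forwards everything it reads to that processor.
final class LineSource {
    private SketchChannelProcessor channelProcessor;

    void setChannelProcessor(SketchChannelProcessor channelProcessor) {
        this.channelProcessor = channelProcessor;
    }

    SketchChannelProcessor getChannelProcessor() {
        return channelProcessor;
    }

    // Called once per input line (an assumption of this sketch).
    void onLine(String line) {
        getChannelProcessor().processEvent(new SketchEvent(line));
    }
}
```

The source never touches a channel itself; which channels end up with the event is entirely the processor's decision.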

public interface Sink extends LifecycleAware, NamedComponent {
  public void setChannel(Channel channel);
  public Channel getChannel();
  public Status process() throws EventDeliveryException;
  public static enum Status {
    READY, BACKOFF
  }
}

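Sink also has set and get methods, but they refer to a Channel rather than a ChannelProcessor. The ChannelProcessor plays the role of an event dispatcher sitting between a Source and its Channels. A Source's events can therefore be sent through the ChannelProcessor to several Channels (either simply replicated to all of them, or selectively routed to specific channels), whereas a Sink is bound directly to a Channel, so each Sink takes events from exactly one Channel.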

The ChannelProcessor API is:

 void close()
 void configure(Context context)
          The Context of the associated Source is passed.
 ChannelSelector getSelector()
 void initialize()
 void processEvent(Event event)
          Attempts to put the given event into each configured channel.
 void processEventBatch(List<Event> events)
          Attempts to put the given events into each configured channel.

Through the ChannelProcessor, Flume can implement event flows in which a single Source fans out to several Channels.
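The two dispatch styles (send to every channel, or pick a channel per event) can be sketched roughly as below. Flume calls its built-in selectors "replicating" and "multiplexing"; everything else here (RoutedEvent, QueueChannel, Selector, DispatcherSketch, the header-based routing rule) is a hypothetical simplification that ignores transactions and optional channels.

```java
import java.util.Collections;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event with headers and a body.
final class RoutedEvent {
    final Map<String, String> headers = new HashMap<>();
    final String body;
    RoutedEvent(String body) { this.body = body; }
}

// Hypothetical channel: just an in-memory list.
final class QueueChannel {
    final String name;
    final List<RoutedEvent> events = new ArrayList<>();
    QueueChannel(String name) { this.name = name; }
    void put(RoutedEvent event) { events.add(event); }
}

// Decides which channels should receive a given event.
interface Selector {
    List<QueueChannel> channelsFor(RoutedEvent event);
}

// Replicating style: every event goes to every configured channel.
final class ReplicatingSelectorSketch implements Selector {
    private final List<QueueChannel> all;
    ReplicatingSelectorSketch(List<QueueChannel> all) { this.all = all; }
    public List<QueueChannel> channelsFor(RoutedEvent event) { return all; }
}

// Multiplexing style: a header value picks the channel; events without a
// matching value go to the fallback channel.
final class MultiplexingSelectorSketch implements Selector {
    private final String header;
    private final Map<String, QueueChannel> mapping;
    private final QueueChannel fallback;

    MultiplexingSelectorSketch(String header, Map<String, QueueChannel> mapping,
                               QueueChannel fallback) {
        this.header = header;
        this.mapping = mapping;
        this.fallback = fallback;
    }

    public List<QueueChannel> channelsFor(RoutedEvent event) {
        String value = event.headers.get(header);
        QueueChannel target = (value == null) ? fallback : mapping.getOrDefault(value, fallback);
        return Collections.singletonList(target);
    }
}

// Plays the ChannelProcessor role: asks the selector, then puts the event
// into each selected channel.
final class DispatcherSketch {
    private final Selector selector;
    DispatcherSketch(Selector selector) { this.selector = selector; }

    void processEvent(RoutedEvent event) {
        for (QueueChannel channel : selector.channelsFor(event)) {
            channel.put(event);
        }
    }
}
```

A sink needs none of this machinery: it holds a single channel reference and only ever reads from that one.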

public interface Channel extends LifecycleAware, NamedComponent {
  public void put(Event event) throws ChannelException;
  public Event take() throws ChannelException;
  public Transaction getTransaction();
}

A channel connects a Source to a Sink. The source acts as producer while the sink acts as a consumer of events. The channel itself is the buffer between the two.

A channel exposes a Transaction interface that can be used by its clients to ensure atomic put and take semantics. This is necessary to guarantee single hop reliability between agents. For instance, a source will successfully produce an event if and only if that event can be committed to the source's associated channel. Similarly, a sink will consume an event if and only if its respective endpoint can accept the event. The extent of transaction support varies for different channel implementations ranging from strong to best-effort semantics.

Channels are associated with unique names that can be used for separating configuration and working namespaces.

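In other words, a Channel connects a Source to a Sink. The Source acts as the producer of events, the Sink as the consumer, and the Channel as the buffer layer between the two.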

More importantly, a Channel provides a Transaction mechanism that ensures events are delivered reliably.
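As an illustration of the "if and only if it can be committed" rule on the source side, here is a small sketch of an all-or-nothing batch put. BoundedTxChannel and BatchPutSketch are hypothetical; the channel only imitates the begin / commit / rollback / close shape of Flume's Transaction, and it folds the transaction into the channel object for brevity, whereas in Flume the transaction is a separate object obtained from getTransaction().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory channel with a fixed capacity. Events put inside a
// transaction are staged and only become visible on commit.
final class BoundedTxChannel {
    private final int capacity;
    private final Deque<String> committed = new ArrayDeque<>();
    private List<String> staged; // non-null only while a transaction is open

    BoundedTxChannel(int capacity) { this.capacity = capacity; }

    void begin() { staged = new ArrayList<>(); }

    void put(String event) {
        if (staged == null) {
            throw new IllegalStateException("no open transaction");
        }
        if (committed.size() + staged.size() >= capacity) {
            throw new IllegalStateException("channel full");
        }
        staged.add(event);
    }

    void commit() { committed.addAll(staged); staged.clear(); }

    void rollback() { staged.clear(); }

    void close() { staged = null; }

    int size() { return committed.size(); }
}

final class BatchPutSketch {
    // Returns true only if the whole batch was committed. On any failure the
    // staged events are discarded, so the channel looks untouched.
    static boolean putBatch(BoundedTxChannel channel, List<String> batch) {
        channel.begin();
        try {
            for (String event : batch) {
                channel.put(event);
            }
            channel.commit();
            return true;
        } catch (RuntimeException e) {
            channel.rollback();
            return false;
        } finally {
            channel.close();
        }
    }
}
```

With a capacity of 3, a batch of two events commits, a second batch of two is rejected as a whole, and a later single event still fits. The try / commit / catch-rollback / finally-close layout is the part that carries over to real Flume code; the capacity check is just a convenient way to provoke a failure.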

Flume uses a transactional approach to guarantee the reliable delivery of the events. The sources and sinks encapsulate in a transaction the storage/retrieval, respectively, of the events placed in or provided by a transaction provided by the channel. This ensures that the set of events are reliably passed from point to point in the flow. In the case of a multi-hop flow, the sink from the previous hop and the source from the next hop both have their transactions running to ensure that the data is safely stored in the channel of the next hop.

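Flume uses transactions to guarantee reliable delivery of events. The storing of events by a Source and the retrieving of events by a Sink are each wrapped in a transaction provided by the Channel. An event is removed from the current channel only after it has been successfully stored in the next agent's Channel (or in the final storage destination). This is what lets events pass reliably from one point to the next in a flow. In a flow that spans several agents, the sink of the previous agent and the source of the next agent each run their own transaction, which ensures that the event is safely stored in the next agent's channel.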
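The sink side of that guarantee can be sketched as follows: the take from the current channel is committed only after the next hop has accepted the event, and rolled back otherwise. HopChannel, NextHop and ForwardingSinkSketch are hypothetical names for this post. The Status values mirror the Sink interface shown earlier, but returning BACKOFF on a delivery failure is a simplification of this sketch (the real process() is declared to throw EventDeliveryException).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical channel for one hop. A take only becomes permanent on commit;
// rollback puts the event back at the head of the queue.
final class HopChannel {
    private final Deque<String> queue = new ArrayDeque<>();
    private String taken; // event removed by the open take, if any

    void add(String event) { queue.addLast(event); }

    String take() {
        taken = queue.pollFirst();
        return taken;
    }

    void commit() { taken = null; }

    void rollback() {
        if (taken != null) {
            queue.addFirst(taken);
            taken = null;
        }
    }

    int size() { return queue.size(); }
}

// Stand-in for "the next agent's source" or the final store.
interface NextHop {
    void accept(String event) throws Exception;
}

final class ForwardingSinkSketch {
    enum Status { READY, BACKOFF }

    static Status process(HopChannel channel, NextHop next) {
        String event = channel.take();
        if (event == null) {
            channel.commit();
            return Status.BACKOFF; // channel is empty, nothing to forward
        }
        try {
            next.accept(event); // must succeed before the take is committed
            channel.commit();
            return Status.READY;
        } catch (Exception e) {
            channel.rollback(); // the event stays in this hop's channel
            return Status.BACKOFF;
        }
    }
}
```

If the next hop is down, the event is still in the channel after process() returns, so a later call can retry it; once the next hop accepts it, the channel is drained.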
