Understanding Elasticsearch-Filebeat in Depth: config and mechanism

Posted by quantLearner on 2020-12-25
  • How Filebeat Works

    Filebeat consists of two main components: inputs and harvesters.

  • Harvester

    A harvester is responsible for reading the content of a single file.

    The harvester reads each file, line by line, and sends the content to the output.

    One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running.
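    Every harvester holds an open file descriptor, so the log input's harvester_limit option can cap how many harvesters one input runs in parallel. The snippet below is a minimal sketch of such a log input in filebeat.yml. The path /var/log/app/*.log and the limit of 100 are placeholder values assumed for this example.

    ```yaml
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/app/*.log   # placeholder path, adjust to your logs
        # Start at most 100 harvesters (open files) in parallel for this input.
        # 0, the default, means no limit.
        harvester_limit: 100
    ```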

    If a file is removed or renamed while it’s being harvested, Filebeat continues to read the file. This has the side effect that the space on your disk is reserved until the harvester closes.

    By default, Filebeat keeps the file open until close_inactive is reached.
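    The following is a minimal sketch of tuning this behaviour, assuming the classic log input. close_inactive is measured from the last line the harvester read, so a shorter value releases file handles sooner. That also frees the disk space of deleted files sooner. The 2m value and the path are illustrative assumptions. The documented default is 5m.

    ```yaml
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/app/*.log   # placeholder path
        # Close the harvester if no new line has been read for 2 minutes.
        # If the file changes afterwards, a new harvester is started on a later scan.
        close_inactive: 2m
    ```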

  • Input

    An input is responsible for managing the harvesters and finding all sources to read from.

    Each input runs in its own goroutine.

    New lines are only picked up if the size of the file has changed since the harvester was closed.
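    For the log input, scan_frequency controls how often the input looks for new or changed files (10s by default). When a closed file has grown, a new harvester picks up the new lines at the next scan. This is a sketch with placeholder paths and an illustrative interval.

    ```yaml
    filebeat.inputs:
      - type: log
        # One input can watch several glob patterns; each matching file gets its own harvester.
        paths:
          - /var/log/app/*.log       # placeholder paths
          - /var/log/worker/*.log
        # Check the paths for new files, and for closed files that changed, every 5 seconds.
        scan_frequency: 5s
    ```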

  • How does Filebeat keep the state of files

    Filebeat keeps the state of each file and frequently flushes the state to disk in the registry file.

    The state is used to remember the last offset a harvester was reading from and to ensure all log lines are sent.
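    The location of the registry and how often it is flushed are configurable. This sketch assumes Filebeat 7.x option names. The values are illustrative, not recommendations. A relative registry path is resolved against path.data.

    ```yaml
    # Root directory of the registry (relative paths are relative to path.data).
    filebeat.registry.path: registry
    # Write unwritten registry updates to disk once they are older than 1 second.
    # 0s writes the registry after every successfully published batch.
    filebeat.registry.flush: 1s
    ```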

    If the output, such as Elasticsearch or Logstash, is not reachable, Filebeat keeps track of the last lines sent and continues reading the files as soon as the output becomes available again.

  • Configure

    Filebeat modules provide the fastest getting-started experience for common log formats. You can configure modules in the modules.d directory (recommended) or in the Filebeat configuration file.
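    With the modules.d approach, the reference filebeat.yml loads every enabled module file from that directory, as sketched below. Running filebeat modules enable nginx renames modules.d/nginx.yml.disabled to nginx.yml, so this glob then picks it up. Here nginx is only an example module.

    ```yaml
    filebeat.config.modules:
      # Load one config file per enabled module.
      path: ${path.config}/modules.d/*.yml
      # Set to true to reload module configs when the files change.
      reload.enabled: false
    ```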

    Filebeat modules contain default configurations, Elasticsearch ingest node pipeline definitions, and Kibana dashboards to help you implement and deploy a log monitoring solution. Because the dashboards are part of the module, make sure you also set up the environment to use Kibana dashboards before running Filebeat with modules enabled.
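    Setting up the environment typically involves pointing Filebeat at Kibana so that the dashboards can be loaded. This is a minimal sketch, assuming Kibana runs on localhost:5601.

    ```yaml
    setup.kibana:
      host: "localhost:5601"     # assumed Kibana address
    # Load the bundled sample Kibana dashboards when Filebeat starts.
    setup.dashboards.enabled: true
    ```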

    See the "Regular expression support" page of the Filebeat documentation for a list of supported regexp patterns.

    Filebeat regular expression support is based on RE2.
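    The log input's include_lines and exclude_lines options both use these regular expressions. They keep or drop lines that match any of the listed RE2 patterns. The patterns and path below are placeholders.

    ```yaml
    filebeat.inputs:
      - type: log
        paths:
          - /var/log/app/*.log            # placeholder path
        # Export only lines that start with ERR or WARN.
        include_lines: ['^ERR', '^WARN']
        # Then drop any of those that mention healthcheck.
        # include_lines is applied before exclude_lines.
        exclude_lines: ['healthcheck']
    ```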

    By default, Filebeat identifies files based on their inodes and device IDs.

    The path section of the filebeat.yml config file contains configuration options that define where Filebeat looks for its files.
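    The paths below mirror the layout used by the DEB and RPM packages. Treat them as an assumption and check them against your own installation.

    ```yaml
    path.home: /usr/share/filebeat   # binaries, modules, Kibana assets
    path.config: /etc/filebeat       # filebeat.yml and modules.d
    path.data: /var/lib/filebeat     # persistent data, including the registry
    path.logs: /var/log/filebeat     # Filebeat's own log files
    ```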

  • Autodiscover

    When you run applications on containers, they become moving targets for the monitoring system.

    Autodiscover allows you to track them and adapt settings as changes happen.

    Autodiscover providers work by watching for events on the system and translating those events into internal autodiscover events with a common format.

    The Docker autodiscover provider watches for Docker containers to start and stop.
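    This sketch adapts the template pattern from the Filebeat documentation for the Docker provider. When a container whose image name contains redis starts, Filebeat launches a container input for that container's log files. When the container stops, that input is stopped. The redis condition is only an example.

    ```yaml
    filebeat.autodiscover:
      providers:
        - type: docker
          templates:
            - condition:
                contains:
                  docker.container.image: redis   # example condition
              config:
                - type: container
                  paths:
                    # Filled in from the autodiscover event for each matching container.
                    - /var/lib/docker/containers/${data.docker.container.id}/*.log
    ```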

  • Internal Queue

    Filebeat uses an internal queue to store events before publishing them.
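    The queue is configurable. This is a minimal sketch of the in-memory queue with illustrative values. The defaults differ between Filebeat versions, so check the documentation for yours.

    ```yaml
    queue.mem:
      # Maximum number of events the queue can buffer.
      events: 4096
      # Forward events to the output once at least 512 are buffered,
      flush.min_events: 512
      # or after waiting at most 5 seconds for that minimum.
      flush.timeout: 5s
    ```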

  • Modules

    Filebeat modules simplify the collection, parsing, and visualization of common log formats.

    A typical module is composed of one or more filesets.

    A fileset contains the following:

    • Filebeat input configurations, which contain the default paths in which to look for the log files.
    • Elasticsearch Ingest Node pipeline definition, which is used to parse the log lines.
    • Fields definitions, which are used to configure Elasticsearch with the correct types for each field.
    • Sample Kibana dashboards.
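    For instance, the nginx module includes an access fileset and an error fileset. Each fileset can be enabled and pointed at custom paths separately. This sketch uses the inline filebeat.modules form, and the log paths are assumptions.

    ```yaml
    filebeat.modules:
      - module: nginx
        # Fileset for access logs.
        access:
          enabled: true
          var.paths: ["/var/log/nginx/access.log*"]   # assumed path
        # Fileset for error logs.
        error:
          enabled: true
          var.paths: ["/var/log/nginx/error.log*"]    # assumed path
    ```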
  • Processor

    You can use processors to filter and enhance data before sending it to the configured output.

    To define a processor, you specify the processor name, an optional condition, and a set of parameters.

    processors:
      - <processor_name>:
          when:
            <condition>
          <parameters>
      - <processor_name>:
          when:
            <condition>
          <parameters>
    

    <processor_name> specifies a processor that performs some kind of action, such as selecting the fields that are exported or adding metadata to the event.

    <condition> specifies an optional condition. If the condition is present, then the action is executed only if the condition is fulfilled. If no condition is set, then the action is always executed.

    <parameters> is the list of parameters to pass to the processor.
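    This concrete sketch follows the syntax above using two processors that ship with Filebeat. drop_event has a condition and no parameters. drop_fields has parameters and no condition, so it runs for every event. The regular expression and field names are placeholders.

    ```yaml
    processors:
      # Drop debug lines entirely (runs only when the condition matches).
      - drop_event:
          when:
            regexp:
              message: "^DBG:"
      # Always remove these fields from the exported event.
      - drop_fields:
          fields: ["agent.ephemeral_id", "log.offset"]
    ```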

    For the full list of supported processors, see the processors section of the Filebeat reference documentation.