Kubernetes容器生命週期 —— 鉤子函式詳解(postStart、preStop)

人艰不拆_zmc發表於2024-06-24

1、概述

  容器生命週期鉤子(Container Lifecycle Hooks)監聽容器生命週期的特定事件,並在事件發生時執行已註冊的回撥函式。

  鉤子函式能夠感知自身生命週期中的事件,並在相應的時刻到來時執行使用者指定的程式程式碼。

  kubernetes在主容器的啟動之後和停止之前提供了兩個鉤子函式:

  1. postStart:一種容器鉤子。該鉤子在容器被建立後立刻觸發,通知容器它已經被建立。該鉤子不需要向其所對應的Hook Handler傳入任何引數。如果該鉤子對應的Hook Handler執行失敗,則該容器會終止執行,並根據該容器的重啟策略決定是否要重啟該容器。更多資訊,請參見Container Lifecycle Hooks
  2. preStop :一種容器鉤子。該鉤子在容器被刪除前觸發,其所對應的Hook Handler必須在刪除該容器的請求傳送給Docker Daemon之前完成。在該鉤子對應的Hook Handler完成後不論執行的結果如何,Docker Daemon會傳送一個SIGTERN訊號給Docker Daemon來刪除該容器。更多資訊,請參見Container Lifecycle Hooks 以及《詳解Kubernetes Pod優雅退出》這篇博文。

2、鉤子函式定義方式

  鉤子的回撥函式支援兩種方式定義動作,下面以 postStart 為例講解,preStop 定義方式與其一致:

2.1 exec模式

在容器中執行指定的命令。如果命令退出時返回碼為0,判定為執行成功。

示例:

lifecycle:
    postStart:
      exec:
        command:
        - cat
        - /var/lib/redis.conf

2.2 httpGet模式

對容器中的指定端點執行HTTP GET請求,如果響應的狀態碼大於等於200且小於400,判定為執行成功。

httpGet模式支援以下配置引數:

  • Host:HTTP請求主機地址,不設定時預設為Pod的IP地址。

  • Path:HTTP請求路徑,預設值是/。

  • Port:HTTP請求埠號。

  • Scheme:協議型別,支援HTTP和HTTPS協議,預設是HTTP。

  • HttpHeaders:自定義HTTP請求頭。

示例:訪問 http://192.168.2.150:80/users

 lifecycle:
    postStart:
      httpGet:
        path: /users
        port: 80
        host: 192.168.2.150
        scheme: HTTP  # 或者HTTPS

3、原始碼講解

3.1 PostStart

Kubernetes原始碼(1.21.5,已在原始碼中移除dockershim模組):

pkg/kubelet/kuberuntime/kuberuntime_container.go:

開始容器方法主要包括拉取映象、建立容器、start容器、執行容器postStart鉤子函式。

// startContainer starts a container and returns a message indicates why it is failed on error.
// It starts the container through the following steps:
// * pull the image
// * create the container
// * start the container
// * run the post start lifecycle hooks (if applicable)
func (m *kubeGenericRuntimeManager) startContainer(podSandboxID string, podSandboxConfig *runtimeapi.PodSandboxConfig, spec *startSpec, pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, podIP string, podIPs []string) (string, error) {
	container := spec.container

	// Step 1: pull the image.
	imageRef, msg, err := m.imagePuller.EnsureImageExists(pod, container, pullSecrets, podSandboxConfig)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, "", v1.EventTypeWarning, events.FailedToCreateContainer, "Error: %v", s.Message())
		return msg, err
	}

	// Step 2: create the container.
	// For a new container, the RestartCount should be 0
	restartCount := 0
	containerStatus := podStatus.FindContainerStatusByName(container.Name)
	if containerStatus != nil {
		restartCount = containerStatus.RestartCount + 1
	}

	target, err := spec.getTargetID(podStatus)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, "", v1.EventTypeWarning, events.FailedToCreateContainer, "Error: %v", s.Message())
		return s.Message(), ErrCreateContainerConfig
	}

	containerConfig, cleanupAction, err := m.generateContainerConfig(container, pod, restartCount, podIP, imageRef, podIPs, target)
	if cleanupAction != nil {
		defer cleanupAction()
	}
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, "", v1.EventTypeWarning, events.FailedToCreateContainer, "Error: %v", s.Message())
		return s.Message(), ErrCreateContainerConfig
	}

	err = m.internalLifecycle.PreCreateContainer(pod, container, containerConfig)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, "", v1.EventTypeWarning, events.FailedToCreateContainer, "Internal PreCreateContainer hook failed: %v", s.Message())
		return s.Message(), ErrPreCreateHook
	}

	containerID, err := m.runtimeService.CreateContainer(podSandboxID, containerConfig, podSandboxConfig)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, containerID, v1.EventTypeWarning, events.FailedToCreateContainer, "Error: %v", s.Message())
		return s.Message(), ErrCreateContainer
	}
	err = m.internalLifecycle.PreStartContainer(pod, container, containerID)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, containerID, v1.EventTypeWarning, events.FailedToStartContainer, "Internal PreStartContainer hook failed: %v", s.Message())
		return s.Message(), ErrPreStartHook
	}
	m.recordContainerEvent(pod, container, containerID, v1.EventTypeNormal, events.CreatedContainer, fmt.Sprintf("Created container %s", container.Name))

	// Step 3: start the container.
	err = m.runtimeService.StartContainer(containerID)
	if err != nil {
		s, _ := grpcstatus.FromError(err)
		m.recordContainerEvent(pod, container, containerID, v1.EventTypeWarning, events.FailedToStartContainer, "Error: %v", s.Message())
		return s.Message(), kubecontainer.ErrRunContainer
	}
	m.recordContainerEvent(pod, container, containerID, v1.EventTypeNormal, events.StartedContainer, fmt.Sprintf("Started container %s", container.Name))

	// Symlink container logs to the legacy container log location for cluster logging
	// support.
	// TODO(random-liu): Remove this after cluster logging supports CRI container log path.
	containerMeta := containerConfig.GetMetadata()
	sandboxMeta := podSandboxConfig.GetMetadata()
	legacySymlink := legacyLogSymlink(containerID, containerMeta.Name, sandboxMeta.Name,
		sandboxMeta.Namespace)
	containerLog := filepath.Join(podSandboxConfig.LogDirectory, containerConfig.LogPath)
	// only create legacy symlink if containerLog path exists (or the error is not IsNotExist).
	// Because if containerLog path does not exist, only dangling legacySymlink is created.
	// This dangling legacySymlink is later removed by container gc, so it does not make sense
	// to create it in the first place. it happens when journald logging driver is used with docker.
	if _, err := m.osInterface.Stat(containerLog); !os.IsNotExist(err) {
		if err := m.osInterface.Symlink(containerLog, legacySymlink); err != nil {
			klog.ErrorS(err, "Failed to create legacy symbolic link", "path", legacySymlink,
				"containerID", containerID, "containerLogPath", containerLog)
		}
	}

	// Step 4: execute the post start hook.
	if container.Lifecycle != nil && container.Lifecycle.PostStart != nil {
		kubeContainerID := kubecontainer.ContainerID{
			Type: m.runtimeName,
			ID:   containerID,
		}
		msg, handlerErr := m.runner.Run(kubeContainerID, pod, container, container.Lifecycle.PostStart)
		if handlerErr != nil {
			m.recordContainerEvent(pod, container, kubeContainerID.ID, v1.EventTypeWarning, events.FailedPostStartHook, msg)
			if err := m.killContainer(pod, kubeContainerID, container.Name, "FailedPostStartHook", reasonFailedPostStartHook, nil); err != nil {
				klog.ErrorS(fmt.Errorf("%s: %v", ErrPostStartHook, handlerErr), "Failed to kill container", "pod", klog.KObj(pod),
					"podUID", pod.UID, "containerName", container.Name, "containerID", kubeContainerID.String())
			}
			return msg, fmt.Errorf("%s: %v", ErrPostStartHook, handlerErr)
		}
	}

	return "", nil
}

(1)關注下建立容器,kubelet呼叫docker csi,相當於執行 docker create 命令:

pkg/kubelet/cri/remote/remote_runtime.go:

// CreateContainer creates a new container in the specified PodSandbox.
func (r *remoteRuntimeService) CreateContainer(podSandBoxID string, config *runtimeapi.ContainerConfig, sandboxConfig *runtimeapi.PodSandboxConfig) (string, error) {
	klog.V(10).InfoS("[RemoteRuntimeService] CreateContainer", "podSandboxID", podSandBoxID, "timeout", r.timeout)
	ctx, cancel := getContextWithTimeout(r.timeout)
	defer cancel()

	resp, err := r.runtimeClient.CreateContainer(ctx, &runtimeapi.CreateContainerRequest{
		PodSandboxId:  podSandBoxID,
		Config:        config,
		SandboxConfig: sandboxConfig,
	})
	if err != nil {
		klog.ErrorS(err, "CreateContainer in sandbox from runtime service failed", "podSandboxID", podSandBoxID)
		return "", err
	}

	klog.V(10).InfoS("[RemoteRuntimeService] CreateContainer", "podSandboxID", podSandBoxID, "containerID", resp.ContainerId)
	if resp.ContainerId == "" {
		errorMessage := fmt.Sprintf("ContainerId is not set for container %q", config.GetMetadata())
		err := errors.New(errorMessage)
		klog.ErrorS(err, "CreateContainer failed")
		return "", err
	}

	return resp.ContainerId, nil
}

(2)start容器,kubelet呼叫docker csi,相當於執行 docker start 命令:

pkg/kubelet/cri/remote/remote_runtime.go:

// StartContainer starts the container.
func (r *remoteRuntimeService) StartContainer(containerID string) error {
	klog.V(10).InfoS("[RemoteRuntimeService] StartContainer", "containerID", containerID, "timeout", r.timeout)
	ctx, cancel := getContextWithTimeout(r.timeout)
	defer cancel()

	_, err := r.runtimeClient.StartContainer(ctx, &runtimeapi.StartContainerRequest{
		ContainerId: containerID,
	})
	if err != nil {
		klog.ErrorS(err, "StartContainer from runtime service failed", "containerID", containerID)
		return err
	}
	klog.V(10).InfoS("[RemoteRuntimeService] StartContainer Response", "containerID", containerID)

	return nil
}

(3)執行容器postStart鉤子函式(kubelet):

pkg/kubelet/lifecycle/handlers.go:

func (hr *handlerRunner) Run(containerID kubecontainer.ContainerID, pod *v1.Pod, container *v1.Container, handler *v1.Handler) (string, error) {
	switch {
	case handler.Exec != nil:
		var msg string
		// TODO(tallclair): Pass a proper timeout value.
		output, err := hr.commandRunner.RunInContainer(containerID, handler.Exec.Command, 0)
		if err != nil {
			msg = fmt.Sprintf("Exec lifecycle hook (%v) for Container %q in Pod %q failed - error: %v, message: %q", handler.Exec.Command, container.Name, format.Pod(pod), err, string(output))
			klog.V(1).ErrorS(err, "Exec lifecycle hook for Container in Pod failed", "execCommand", handler.Exec.Command, "containerName", container.Name, "pod", klog.KObj(pod), "message", string(output))
		}
		return msg, err
	case handler.HTTPGet != nil:
		msg, err := hr.runHTTPHandler(pod, container, handler)
		if err != nil {
			msg = fmt.Sprintf("HTTP lifecycle hook (%s) for Container %q in Pod %q failed - error: %v, message: %q", handler.HTTPGet.Path, container.Name, format.Pod(pod), err, msg)
			klog.V(1).ErrorS(err, "HTTP lifecycle hook for Container in Pod failed", "path", handler.HTTPGet.Path, "containerName", container.Name, "pod", klog.KObj(pod))
		}
		return msg, err
	default:
		err := fmt.Errorf("invalid handler: %v", handler)
		msg := fmt.Sprintf("Cannot run handler: %v", err)
		klog.ErrorS(err, "Cannot run handler")
		return msg, err
	}
}

cri-dockerd原始碼:

(1)建立容器

core/container_create.go:

// CreateContainer creates a new container in the given PodSandbox
// Docker cannot store the log to an arbitrary location (yet), so we create an
// symlink at LogPath, linking to the actual path of the log.
func (ds *dockerService) CreateContainer(
	_ context.Context,
	r *v1.CreateContainerRequest,
) (*v1.CreateContainerResponse, error) {
	podSandboxID := r.PodSandboxId
	config := r.GetConfig()
	sandboxConfig := r.GetSandboxConfig()

	if config == nil {
		return nil, fmt.Errorf("container config is nil")
	}
	if sandboxConfig == nil {
		return nil, fmt.Errorf("sandbox config is nil for container %q", config.Metadata.Name)
	}

	labels := makeLabels(config.GetLabels(), config.GetAnnotations())
	// Apply a the container type label.
	labels[containerTypeLabelKey] = containerTypeLabelContainer
	// Write the container log path in the labels.
	labels[containerLogPathLabelKey] = filepath.Join(sandboxConfig.LogDirectory, config.LogPath)
	// Write the sandbox ID in the labels.
	labels[sandboxIDLabelKey] = podSandboxID

	apiVersion, err := ds.getDockerAPIVersion()
	if err != nil {
		return nil, fmt.Errorf("unable to get the docker API version: %v", err)
	}

	image := ""
	if iSpec := config.GetImage(); iSpec != nil {
		image = iSpec.Image
	}
	containerName := makeContainerName(sandboxConfig, config)
	mounts := config.GetMounts()
	terminationMessagePath, _ := config.Annotations["io.kubernetes.container.terminationMessagePath"]

	sandboxInfo, err := ds.client.InspectContainer(r.GetPodSandboxId())
	if err != nil {
		return nil, fmt.Errorf("unable to get container's sandbox ID: %v", err)
	}
	createConfig := dockerbackend.ContainerCreateConfig{
		Name: containerName,
		Config: &container.Config{
			Entrypoint: strslice.StrSlice(config.Command),
			Cmd:        strslice.StrSlice(config.Args),
			Env:        libdocker.GenerateEnvList(config.GetEnvs()),
			Image:      image,
			WorkingDir: config.WorkingDir,
			Labels:     labels,
			// Interactive containers:
			OpenStdin: config.Stdin,
			StdinOnce: config.StdinOnce,
			Tty:       config.Tty,
			// Disable Docker's health check until we officially support it
			// (https://github.com/kubernetes/kubernetes/issues/25829).
			Healthcheck: &container.HealthConfig{
				Test: []string{"NONE"},
			},
		},
		HostConfig: &container.HostConfig{
			Mounts: libdocker.GenerateMountBindings(mounts, terminationMessagePath),
			RestartPolicy: container.RestartPolicy{
				Name: "no",
			},
			Runtime: sandboxInfo.HostConfig.Runtime,
		},
	}

	// Only request relabeling if the pod provides an SELinux context. If the pod
	// does not provide an SELinux context relabeling will label the volume with
	// the container's randomly allocated MCS label. This would restrict access
	// to the volume to the container which mounts it first.
	if selinuxOpts := config.GetLinux().GetSecurityContext().GetSelinuxOptions(); selinuxOpts != nil {
		mountLabel, err := selinuxMountLabel(selinuxOpts)
		if err != nil {
			return nil, fmt.Errorf("unable to generate SELinux mount label: %v", err)
		}
		if mountLabel != "" {
			// Equates to "Z" in the old bind API
			const shared = false
			for _, m := range mounts {
				if m.SelinuxRelabel {
					if err := label.Relabel(m.HostPath, mountLabel, shared); err != nil {
						return nil, fmt.Errorf("unable to relabel %q with %q: %v", m.HostPath, mountLabel, err)
					}
				}
			}
		}
	}

	hc := createConfig.HostConfig
	err = ds.updateCreateConfig(
		&createConfig,
		config,
		sandboxConfig,
		podSandboxID,
		securityOptSeparator,
		apiVersion,
	)
	if err != nil {
		return nil, fmt.Errorf("failed to update container create config: %v", err)
	}
	// Set devices for container.
	devices := make([]container.DeviceMapping, len(config.Devices))
	for i, device := range config.Devices {
		devices[i] = container.DeviceMapping{
			PathOnHost:        device.HostPath,
			PathInContainer:   device.ContainerPath,
			CgroupPermissions: device.Permissions,
		}
	}
	hc.Resources.Devices = devices

	securityOpts, err := ds.getSecurityOpts(
		config.GetLinux().GetSecurityContext().GetSeccomp(),
		securityOptSeparator,
	)
	if err != nil {
		return nil, fmt.Errorf(
			"failed to generate security options for container %q: %v",
			config.Metadata.Name,
			err,
		)
	}

	hc.SecurityOpt = append(hc.SecurityOpt, securityOpts...)

	cleanupInfo, err := ds.applyPlatformSpecificDockerConfig(r, &createConfig)
	if err != nil {
		return nil, err
	}

	createResp, createErr := ds.client.CreateContainer(createConfig)
	if createErr != nil {
		createResp, createErr = recoverFromCreationConflictIfNeeded(
			ds.client,
			createConfig,
			createErr,
		)
	}

	if createResp != nil {
		containerID := createResp.ID

		if cleanupInfo != nil {
			// we don't perform the clean up just yet at that could destroy information
			// needed for the container to start (e.g. Windows credentials stored in
			// registry keys); instead, we'll clean up when the container gets removed
			ds.setContainerCleanupInfo(containerID, cleanupInfo)
		}
		return &v1.CreateContainerResponse{ContainerId: containerID}, nil
	}

	// the creation failed, let's clean up right away - we ignore any errors though,
	// this is best effort
	ds.performPlatformSpecificContainerCleanupAndLogErrors(containerName, cleanupInfo)

	return nil, createErr
}

呼叫CreateContainer方法(libdocker/kube_docker_client.go),然後作為 docker 客戶端呼叫 docker 服務端 ContainerCreate 方法。

func (d *kubeDockerClient) CreateContainer(
	opts dockerbackend.ContainerCreateConfig,
) (*dockercontainer.CreateResponse, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
	defer cancel()
	// we provide an explicit default shm size as to not depend on docker daemon.
	if opts.HostConfig != nil && opts.HostConfig.ShmSize <= 0 {
		opts.HostConfig.ShmSize = defaultShmSize
	}
	createResp, err := d.client.ContainerCreate(
		ctx,
		opts.Config,
		opts.HostConfig,
		opts.NetworkingConfig,
		nil,
		opts.Name,
	)
	if ctxErr := contextError(ctx); ctxErr != nil {
		return nil, ctxErr
	}
	if err != nil {
		return nil, err
	}
	return &createResp, nil
}

vendor/github.com/docker/docker/client/container_create.go:

// ContainerCreate creates a new container based on the given configuration.
// It can be associated with a name, but it's not mandatory.
func (cli *Client) ContainerCreate(ctx context.Context, config *container.Config, hostConfig *container.HostConfig, networkingConfig *network.NetworkingConfig, platform *ocispec.Platform, containerName string) (container.CreateResponse, error) {
	var response container.CreateResponse

	// Make sure we negotiated (if the client is configured to do so),
	// as code below contains API-version specific handling of options.
	//
	// Normally, version-negotiation (if enabled) would not happen until
	// the API request is made.
	if err := cli.checkVersion(ctx); err != nil {
		return response, err
	}

	if err := cli.NewVersionError(ctx, "1.25", "stop timeout"); config != nil && config.StopTimeout != nil && err != nil {
		return response, err
	}
	if err := cli.NewVersionError(ctx, "1.41", "specify container image platform"); platform != nil && err != nil {
		return response, err
	}
	if err := cli.NewVersionError(ctx, "1.44", "specify health-check start interval"); config != nil && config.Healthcheck != nil && config.Healthcheck.StartInterval != 0 && err != nil {
		return response, err
	}
	if err := cli.NewVersionError(ctx, "1.44", "specify mac-address per network"); hasEndpointSpecificMacAddress(networkingConfig) && err != nil {
		return response, err
	}

	if hostConfig != nil {
		if versions.LessThan(cli.ClientVersion(), "1.25") {
			// When using API 1.24 and under, the client is responsible for removing the container
			hostConfig.AutoRemove = false
		}
		if versions.GreaterThanOrEqualTo(cli.ClientVersion(), "1.42") || versions.LessThan(cli.ClientVersion(), "1.40") {
			// KernelMemory was added in API 1.40, and deprecated in API 1.42
			hostConfig.KernelMemory = 0
		}
		if platform != nil && platform.OS == "linux" && versions.LessThan(cli.ClientVersion(), "1.42") {
			// When using API under 1.42, the Linux daemon doesn't respect the ConsoleSize
			hostConfig.ConsoleSize = [2]uint{0, 0}
		}
	}

	// Since API 1.44, the container-wide MacAddress is deprecated and will trigger a WARNING if it's specified.
	if versions.GreaterThanOrEqualTo(cli.ClientVersion(), "1.44") {
		config.MacAddress = "" //nolint:staticcheck // ignore SA1019: field is deprecated, but still used on API < v1.44.
	}

	query := url.Values{}
	if p := formatPlatform(platform); p != "" {
		query.Set("platform", p)
	}

	if containerName != "" {
		query.Set("name", containerName)
	}

	body := configWrapper{
		Config:           config,
		HostConfig:       hostConfig,
		NetworkingConfig: networkingConfig,
	}

	serverResp, err := cli.post(ctx, "/containers/create", query, body, nil)
	defer ensureReaderClosed(serverResp)
	if err != nil {
		return response, err
	}

	err = json.NewDecoder(serverResp.body).Decode(&response)
	return response, err
}

(2)start容器

core/container_start.go

// StartContainer starts the container.
func (ds *dockerService) StartContainer(
	_ context.Context,
	r *v1.StartContainerRequest,
) (*v1.StartContainerResponse, error) {
	err := ds.client.StartContainer(r.ContainerId)

	// Create container log symlink for all containers (including failed ones).
	if linkError := ds.createContainerLogSymlink(r.ContainerId); linkError != nil {
		// Do not stop the container if we failed to create symlink because:
		//   1. This is not a critical failure.
		//   2. We don't have enough information to properly stop container here.
		// Kubelet will surface this error to user via an event.
		return nil, linkError
	}

	if err != nil {
		err = transformStartContainerError(err)
		return nil, fmt.Errorf("failed to start container %q: %v", r.ContainerId, err)
	}

	return &v1.StartContainerResponse{}, nil
}

libdocker/kube_docker_client.go

func (d *kubeDockerClient) StartContainer(id string) error {
	ctx, cancel := context.WithTimeout(context.Background(), d.timeout)
	defer cancel()
	err := d.client.ContainerStart(ctx, id, dockercontainer.StartOptions{})
	if ctxErr := contextError(ctx); ctxErr != nil {
		return ctxErr
	}
	return err
}

github.com/docker/docker/client/container_start.go

// ContainerStart sends a request to the docker daemon to start a container.
func (cli *Client) ContainerStart(ctx context.Context, containerID string, options container.StartOptions) error {
	query := url.Values{}
	if len(options.CheckpointID) != 0 {
		query.Set("checkpoint", options.CheckpointID)
	}
	if len(options.CheckpointDir) != 0 {
		query.Set("checkpoint-dir", options.CheckpointDir)
	}

	resp, err := cli.post(ctx, "/containers/"+containerID+"/start", query, nil, nil)
	ensureReaderClosed(resp)
	return err
}

moby原始碼(docker改名為moby):

(1)建立容器

api/server/router/container/container.go:

router.NewPostRoute("/containers/create", r.postContainersCreate),

api/server/router/container/container_routes.go:

func (s *containerRouter) postContainersCreate(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
	if err := httputils.ParseForm(r); err != nil {
		return err
	}
	if err := httputils.CheckForJSON(r); err != nil {
		return err
	}

	name := r.Form.Get("name")

	config, hostConfig, networkingConfig, err := s.decoder.DecodeConfig(r.Body)
	if err != nil {
		return err
	}
	version := httputils.VersionFromContext(ctx)
	adjustCPUShares := versions.LessThan(version, "1.19")

	// When using API 1.24 and under, the client is responsible for removing the container
	if hostConfig != nil && versions.LessThan(version, "1.25") {
		hostConfig.AutoRemove = false
	}

	if hostConfig != nil && versions.LessThan(version, "1.40") {
		// Ignore BindOptions.NonRecursive because it was added in API 1.40.
		for _, m := range hostConfig.Mounts {
			if bo := m.BindOptions; bo != nil {
				bo.NonRecursive = false
			}
		}
		// Ignore KernelMemoryTCP because it was added in API 1.40.
		hostConfig.KernelMemoryTCP = 0

		// Older clients (API < 1.40) expects the default to be shareable, make them happy
		if hostConfig.IpcMode.IsEmpty() {
			hostConfig.IpcMode = container.IPCModeShareable
		}
	}
	if hostConfig != nil && versions.LessThan(version, "1.41") && !s.cgroup2 {
		// Older clients expect the default to be "host" on cgroup v1 hosts
		if hostConfig.CgroupnsMode.IsEmpty() {
			hostConfig.CgroupnsMode = container.CgroupnsModeHost
		}
	}

	if hostConfig != nil && versions.LessThan(version, "1.42") {
		for _, m := range hostConfig.Mounts {
			// Ignore BindOptions.CreateMountpoint because it was added in API 1.42.
			if bo := m.BindOptions; bo != nil {
				bo.CreateMountpoint = false
			}

			// These combinations are invalid, but weren't validated in API < 1.42.
			// We reset them here, so that validation doesn't produce an error.
			if o := m.VolumeOptions; o != nil && m.Type != mount.TypeVolume {
				m.VolumeOptions = nil
			}
			if o := m.TmpfsOptions; o != nil && m.Type != mount.TypeTmpfs {
				m.TmpfsOptions = nil
			}
			if bo := m.BindOptions; bo != nil {
				// Ignore BindOptions.CreateMountpoint because it was added in API 1.42.
				bo.CreateMountpoint = false
			}
		}
	}

	if hostConfig != nil && versions.GreaterThanOrEqualTo(version, "1.42") {
		// Ignore KernelMemory removed in API 1.42.
		hostConfig.KernelMemory = 0
		for _, m := range hostConfig.Mounts {
			if o := m.VolumeOptions; o != nil && m.Type != mount.TypeVolume {
				return errdefs.InvalidParameter(fmt.Errorf("VolumeOptions must not be specified on mount type %q", m.Type))
			}
			if o := m.BindOptions; o != nil && m.Type != mount.TypeBind {
				return errdefs.InvalidParameter(fmt.Errorf("BindOptions must not be specified on mount type %q", m.Type))
			}
			if o := m.TmpfsOptions; o != nil && m.Type != mount.TypeTmpfs {
				return errdefs.InvalidParameter(fmt.Errorf("TmpfsOptions must not be specified on mount type %q", m.Type))
			}
		}
	}

	if hostConfig != nil && runtime.GOOS == "linux" && versions.LessThan(version, "1.42") {
		// ConsoleSize is not respected by Linux daemon before API 1.42
		hostConfig.ConsoleSize = [2]uint{0, 0}
	}

	var platform *specs.Platform
	if versions.GreaterThanOrEqualTo(version, "1.41") {
		if v := r.Form.Get("platform"); v != "" {
			p, err := platforms.Parse(v)
			if err != nil {
				return errdefs.InvalidParameter(err)
			}
			platform = &p
		}
	}

	if hostConfig != nil && hostConfig.PidsLimit != nil && *hostConfig.PidsLimit <= 0 {
		// Don't set a limit if either no limit was specified, or "unlimited" was
		// explicitly set.
		// Both `0` and `-1` are accepted as "unlimited", and historically any
		// negative value was accepted, so treat those as "unlimited" as well.
		hostConfig.PidsLimit = nil
	}

	ccr, err := s.backend.ContainerCreate(types.ContainerCreateConfig{
		Name:             name,
		Config:           config,
		HostConfig:       hostConfig,
		NetworkingConfig: networkingConfig,
		AdjustCPUShares:  adjustCPUShares,
		Platform:         platform,
	})
	if err != nil {
		return err
	}

	return httputils.WriteJSON(w, http.StatusCreated, ccr)
}

(2)start容器

api/server/router/container/container.go:

router.NewPostRoute("/containers/{name:.*}/start", r.postContainersStart),

daemon/start.go:

// containerStart prepares the container to run by setting up everything the
// container needs, such as storage and networking, as well as links
// between containers. The container is left waiting for a signal to
// begin running.
func (daemon *Daemon) containerStart(container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (err error) {
	start := time.Now()
	container.Lock()
	defer container.Unlock()

	if resetRestartManager && container.Running { // skip this check if already in restarting step and resetRestartManager==false
		return nil
	}

	if container.RemovalInProgress || container.Dead {
		return errdefs.Conflict(errors.New("container is marked for removal and cannot be started"))
	}

	if checkpointDir != "" {
		// TODO(mlaventure): how would we support that?
		return errdefs.Forbidden(errors.New("custom checkpointdir is not supported"))
	}

	// if we encounter an error during start we need to ensure that any other
	// setup has been cleaned up properly
	defer func() {
		if err != nil {
			container.SetError(err)
			// if no one else has set it, make sure we don't leave it at zero
			if container.ExitCode() == 0 {
				container.SetExitCode(128)
			}
			if err := container.CheckpointTo(daemon.containersReplica); err != nil {
				logrus.Errorf("%s: failed saving state on start failure: %v", container.ID, err)
			}
			container.Reset(false)

			daemon.Cleanup(container)
			// if containers AutoRemove flag is set, remove it after clean up
			if container.HostConfig.AutoRemove {
				container.Unlock()
				if err := daemon.ContainerRm(container.ID, &types.ContainerRmConfig{ForceRemove: true, RemoveVolume: true}); err != nil {
					logrus.Errorf("can't remove container %s: %v", container.ID, err)
				}
				container.Lock()
			}
		}
	}()

	if err := daemon.conditionalMountOnStart(container); err != nil {
		return err
	}

	if err := daemon.initializeNetworking(container); err != nil {
		return err
	}

	spec, err := daemon.createSpec(container)
	if err != nil {
		return errdefs.System(err)
	}

	if resetRestartManager {
		container.ResetRestartManager(true)
		container.HasBeenManuallyStopped = false
	}

	if err := daemon.saveAppArmorConfig(container); err != nil {
		return err
	}

	if checkpoint != "" {
		checkpointDir, err = getCheckpointDir(checkpointDir, checkpoint, container.Name, container.ID, container.CheckpointDir(), false)
		if err != nil {
			return err
		}
	}

	shim, createOptions, err := daemon.getLibcontainerdCreateOptions(container)
	if err != nil {
		return err
	}

	ctx := context.TODO()

	err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
	if err != nil {
		if errdefs.IsConflict(err) {
			logrus.WithError(err).WithField("container", container.ID).Error("Container not cleaned up from containerd from previous run")
			// best effort to clean up old container object
			daemon.containerd.DeleteTask(ctx, container.ID)
			if err := daemon.containerd.Delete(ctx, container.ID); err != nil && !errdefs.IsNotFound(err) {
				logrus.WithError(err).WithField("container", container.ID).Error("Error cleaning up stale containerd container object")
			}
			err = daemon.containerd.Create(ctx, container.ID, spec, shim, createOptions)
		}
		if err != nil {
			return translateContainerdStartErr(container.Path, container.SetExitCode, err)
		}
	}

	// TODO(mlaventure): we need to specify checkpoint options here
	pid, err := daemon.containerd.Start(context.Background(), container.ID, checkpointDir,
		container.StreamConfig.Stdin() != nil || container.Config.Tty,
		container.InitializeStdio)
	if err != nil {
		if err := daemon.containerd.Delete(context.Background(), container.ID); err != nil {
			logrus.WithError(err).WithField("container", container.ID).
				Error("failed to delete failed start container")
		}
		return translateContainerdStartErr(container.Path, container.SetExitCode, err)
	}

	container.HasBeenManuallyRestarted = false
	container.SetRunning(pid, true)
	container.HasBeenStartedBefore = true
	daemon.setStateCounter(container)

	daemon.initHealthMonitor(container)

	if err := container.CheckpointTo(daemon.containersReplica); err != nil {
		logrus.WithError(err).WithField("container", container.ID).
			Errorf("failed to store container")
	}

	daemon.LogContainerEvent(container, "start")
	containerActions.WithValues("start").UpdateSince(start)

	return nil
}

3.2 PreStop

  原始碼分析見《詳解Kubernetes Pod優雅退出》這篇博文,本文不再贅餘。

4、postStart 非同步執行

  在第三章節講述開始容器方法時,我們已經很清晰知道,開始容器方法先執行容器,再執行執行容器 postStart 鉤子函式

  需要注意的事,docker start 命令在啟動容器時,並不會等待 ENTRYPOINT 命令執行完畢後才返回。它只會等待容器的主程序啟動並確認其正常執行,然後立即返回。換句話說,docker start 命令返回時,ENTRYPOINT 命令已經開始執行,但不一定已經執行完畢。

  所以PostStart的執行相對於容器主程序的執行是非同步的,它會在容器start後立刻觸發,並不能保證PostStart鉤子在容器ENTRYPOINT之前執行。

4.1 關於postStart非同步執行測試

apiVersion: v1
kind: Pod
metadata:
  name: test-post-start
spec:
  containers:
  - name: test-post-start-container
    image: busybox
    command: ["/bin/sh", "-c", "sleep 5 && echo $(date) 'written by entrypoint' >> log.log && sleep 600"]
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 10 && echo $(date) 'written by post start' >> log.log"]

建立上面的pod,透過進入pod,檢視log.log列印日誌,證明:

  • PostStart是否會擋住主程序的啟動
  • PostStart是否是非同步執行

如果 PostStart 會阻擋 ENTRYPOINT 的啟動,則日誌檔案內容應該是:

(時間點 T)written by post start
(時間點 T + 約 10 秒)written by entrypoint

否則內容應該是:

(時間點 T)written by entrypoint
(時間點 T + 約 5 秒)written by post start
```log

實驗結果:

/ # cat log.log
Thu Jun 4 06:14:50 UTC 2020 written by entrypoint
Thu Jun 4 06:14:55 UTC 2020 written by post start

修改YAML檔案:

apiVersion: v1
kind: Pod
metadata:
  name: test-post-start
spec:
  containers:
  - name: test-post-start-container
    image: busybox
    command: ["/bin/sh", "-c", "sleep 15 && echo $(date) 'written by entrypoint' >> log.log && sleep 600"]
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 10 && echo $(date) 'written by post start' >> log.log"]

如果 PostStart 不是非同步執行,則日誌檔案內容應該是:

(時間點 T)written by entrypoint
(時間點 T + 約 5 秒)written by post start
```log

否則內容應該是:
```log
(時間點 T)written by post start
(時間點 T + 約 5 秒)written by entrypoint

實驗結果:

[root@master k8s]# kubectl exec -it test-post-start sh
/ # cat log.log
Thu Jun 4 06:17:54 UTC 2020 written by post start
Thu Jun 4 06:17:59 UTC 2020 written by entrypoint
/ # 

4.2 結論

  • PostStart不會擋住主程序的啟動
  • PostStart是非同步執行

5、使用場景

5.1 postStart

  • 資料庫連線:在容器啟動後,可以使用 postStart 鉤子來建立資料庫連線,並確保應用程式在啟動時可以正常訪問資料庫;
  • 檔案下載:容器啟動後,可以使用 postStart 鉤子從外部下載檔案,並將其放置在容器內部以供應用程式使用;
  • 啟動後臺任務:在容器啟動後,可以使用 postStart 鉤子來啟動或觸發後臺任務,例如資料同步、日誌收集等。

5.2 preStop

  • 優雅終止:在容器終止之前,可以使用 preStop 鉤子傳送訊號或通知給應用程式,讓應用程式優雅地處理未完成的請求或任務,並進行清理操作;
  • 資料持久化:在容器終止之前,可以使用 preStop 鉤子將容器中的資料持久化到外部儲存,以確保資料不會丟失;
  • 日誌上傳:在容器終止之前,可以使用 preStop 鉤子將容器中的日誌上傳到中央日誌系統,以便進一步分析和存檔。

6、總結

  容器生命週期鉤子(PostStart和PreStop)提供在容器啟動後和停止前執行自定義操作的能力,適用於資料庫連線、檔案下載、優雅終止等場景。postStart平時用的場景不是很多,重點關注preStop。

參考: https://blog.51cto.com/happywzy/2855934

相關文章