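Kubernetes/k8s CNI Analysis: A Look at the Container Network Interface

Posted by 良凱爾 on 2021-08-22

Related posts: kubernetes/k8s CSI analysis (Container Storage Interface)
kubernetes/k8s CRI analysis (Container Runtime Interface)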

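Overview

Kubernetes was designed around a pluggable architecture so that its functionality is easy to extend. Following this idea, Kubernetes defines three special-purpose interfaces: the Container Network Interface (CNI), the Container Runtime Interface (CRI), and the Container Storage Interface (CSI). Kubernetes carries out the corresponding functions by calling these interfaces.

This post introduces and analyzes the Container Network Interface, CNI.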

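What is CNI

CNI stands for Container Network Interface.

CNI is the standard interface through which Kubernetes invokes network implementations. The kubelet uses this interface to call different network plugins, each of which provides a different way of configuring the network.

A CNI network plugin is an executable that conforms to the Container Network Interface (CNI) specification. Common CNI plugins include Calico, flannel, Terway, and Weave Net.

When the kubelet is configured to use a CNI-type network plugin (through kubelet startup flags), it calls the CNI plugin whenever it creates or deletes a pod, to set up or tear down the pod's network.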

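kubelet network plugins

The kubelet supports the following three types of network plugin:
(1) CNI;
(2) kubenet;
(3) Noop, meaning no network plugin is configured.

This post focuses on the CNI-related source code in the kubelet.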

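CNI architecture

When the kubelet creates or deletes a pod, it calls the CRI, and the CRI then calls CNI to build or tear down the pod network.

How the kubelet builds a pod network (rough outline)

(1) The kubelet first creates the pause container (pod sandbox) through the CRI, which produces a network namespace;
(2) Based on its startup flags, the kubelet calls the configured network plugin, for example a CNI plugin;
(3) The network plugin configures networking for the pause container (pod sandbox);
(4) All other containers in the pod share the network of the pause container (pod sandbox).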

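CNI-related source code in the kubelet

The analysis of the kubelet's CNI source code is split into the following parts:
(1) CNI-related startup flags;
(2) key structs and interfaces;
(3) CNI initialization;
(4) building the pod network through CNI;
(5) tearing down the pod network through CNI.

Based on tag v1.17.4

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

1. CNI-related startup flags of the kubelet

The code that defines the kubelet's CNI-related startup flags is as follows: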

// pkg/kubelet/config/flags.go
func (s *ContainerRuntimeOptions) AddFlags(fs *pflag.FlagSet) {
    ...
    // Network plugin settings for Docker.
	fs.StringVar(&s.NetworkPluginName, "network-plugin", s.NetworkPluginName, fmt.Sprintf("<Warning: Alpha feature> The name of the network plugin to be invoked for various events in kubelet/pod lifecycle. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNIConfDir, "cni-conf-dir", s.CNIConfDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which to search for CNI config files. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNIBinDir, "cni-bin-dir", s.CNIBinDir, fmt.Sprintf("<Warning: Alpha feature> A comma-separated list of full paths of directories in which to search for CNI plugin binaries. %s", dockerOnlyWarning))
	fs.StringVar(&s.CNICacheDir, "cni-cache-dir", s.CNICacheDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which CNI should store cache files. %s", dockerOnlyWarning))
	fs.Int32Var(&s.NetworkPluginMTU, "network-plugin-mtu", s.NetworkPluginMTU, fmt.Sprintf("<Warning: Alpha feature> The MTU to be passed to the network plugin, to override the default. Set to 0 to use the default 1460 MTU. %s", dockerOnlyWarning))
    ...
}

The default values of the CNI-related startup flags are set in the NewContainerRuntimeOptions function.

// cmd/kubelet/app/options/container_runtime.go
// NewContainerRuntimeOptions will create a new ContainerRuntimeOptions with
// default values.
func NewContainerRuntimeOptions() *config.ContainerRuntimeOptions {
	dockerEndpoint := ""
	if runtime.GOOS != "windows" {
		dockerEndpoint = "unix:///var/run/docker.sock"
	}

	return &config.ContainerRuntimeOptions{
		ContainerRuntime:           kubetypes.DockerContainerRuntime,
		RedirectContainerStreaming: false,
		DockerEndpoint:             dockerEndpoint,
		DockershimRootDirectory:    "/var/lib/dockershim",
		PodSandboxImage:            defaultPodSandboxImage,
		ImagePullProgressDeadline:  metav1.Duration{Duration: 1 * time.Minute},
		ExperimentalDockershim:     false,

		//Alpha feature
		CNIBinDir:   "/opt/cni/bin",
		CNIConfDir:  "/etc/cni/net.d",
		CNICacheDir: "/var/lib/cni/cache",
	}
}

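A few of the more important CNI-related startup flags:
(1) --network-plugin: the type of network plugin to use. Valid values are cni, kubenet, and "". The default is the empty string, which means Noop, i.e. no network plugin is configured (no pod network is built). Setting it to cni tells the kubelet to use the CNI network plugin type.

(2) --cni-conf-dir: the directory containing the CNI configuration files. Default: /etc/cni/net.d

(3) --cni-bin-dir: the directory (or comma-separated list of directories) containing the CNI plugin executables. The kubelet looks in these paths for the CNI plugin executables that perform pod network operations. Default: /opt/cni/bin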
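
To make the three flags concrete, here is a minimal sketch that parses them with the defaults quoted above and splits the comma-separated bin directory list. It uses Go's standard flag package rather than the pflag library the kubelet uses, and the names cniFlags, parseCNIFlags, and binDirs are made up for this illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// cniFlags holds the three CNI-related settings discussed above.
// The struct and helper names are illustrative, not kubelet code.
type cniFlags struct {
	NetworkPlugin string
	ConfDir       string
	BinDir        string
}

// parseCNIFlags parses kubelet-style flags, using the defaults quoted above.
func parseCNIFlags(args []string) (cniFlags, error) {
	var f cniFlags
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	fs.StringVar(&f.NetworkPlugin, "network-plugin", "", "cni, kubenet, or empty for noop")
	fs.StringVar(&f.ConfDir, "cni-conf-dir", "/etc/cni/net.d", "directory with CNI config files")
	fs.StringVar(&f.BinDir, "cni-bin-dir", "/opt/cni/bin", "comma-separated CNI binary directories")
	err := fs.Parse(args)
	return f, err
}

// binDirs splits the comma-separated --cni-bin-dir value and drops empty entries.
func binDirs(raw string) []string {
	var dirs []string
	for _, d := range strings.Split(raw, ",") {
		if d != "" {
			dirs = append(dirs, d)
		}
	}
	return dirs
}

func main() {
	f, err := parseCNIFlags([]string{"--network-plugin=cni", "--cni-bin-dir=/opt/cni/bin,/usr/libexec/cni"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(f.NetworkPlugin, f.ConfDir, binDirs(f.BinDir))
}
```

Leaving --network-plugin unset keeps the empty default, which corresponds to the Noop case.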

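2. Key structs and interfaces

interface NetworkPlugin

First, the key interface: NetworkPlugin.

The NetworkPlugin interface declares the operations of a kubelet network plugin; each type of network plugin only needs to implement these methods. The most important ones are SetUpPod and TearDownPod, which build and tear down the pod network respectively. cniNetworkPlugin implements this interface.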

// pkg/kubelet/dockershim/network/plugins.go
// NetworkPlugin is an interface to network plugins for the kubelet
type NetworkPlugin interface {
	// Init initializes the plugin.  This will be called exactly once
	// before any other methods are called.
	Init(host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error

	// Called on various events like:
	// NET_PLUGIN_EVENT_POD_CIDR_CHANGE
	Event(name string, details map[string]interface{})

	// Name returns the plugin's name. This will be used when searching
	// for a plugin by name, e.g.
	Name() string

	// Returns a set of NET_PLUGIN_CAPABILITY_*
	Capabilities() utilsets.Int

	// SetUpPod is the method called after the infra container of
	// the pod has been created but before the other containers of the
	// pod are launched.
	SetUpPod(namespace string, name string, podSandboxID kubecontainer.ContainerID, annotations, options map[string]string) error

	// TearDownPod is the method called before a pod's infra container will be deleted
	TearDownPod(namespace string, name string, podSandboxID kubecontainer.ContainerID) error

	// GetPodNetworkStatus is the method called to obtain the ipv4 or ipv6 addresses of the container
	GetPodNetworkStatus(namespace string, name string, podSandboxID kubecontainer.ContainerID) (*PodNetworkStatus, error)

	// Status returns error if the network plugin is in error state
	Status() error
}
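
The value of such an interface is that the caller does not care which plugin it holds. The following sketch shows the idea with a heavily reduced, hypothetical interface (podNetworker) and two toy implementations; none of these types exist in the kubelet.

```go
package main

import "fmt"

// podNetworker is a cut-down, hypothetical stand-in for NetworkPlugin:
// only the name and the two pod-level operations are kept.
type podNetworker interface {
	Name() string
	SetUpPod(namespace, name, sandboxID string) error
	TearDownPod(namespace, name, sandboxID string) error
}

// noopPlugin does nothing, like the "no plugin configured" case.
type noopPlugin struct{}

func (noopPlugin) Name() string                     { return "noop" }
func (noopPlugin) SetUpPod(_, _, _ string) error    { return nil }
func (noopPlugin) TearDownPod(_, _, _ string) error { return nil }

// recordingPlugin remembers which sandboxes currently have a network.
type recordingPlugin struct {
	active map[string]bool
}

func (p *recordingPlugin) Name() string { return "recording" }

func (p *recordingPlugin) SetUpPod(_, _, sandboxID string) error {
	p.active[sandboxID] = true
	return nil
}

func (p *recordingPlugin) TearDownPod(_, _, sandboxID string) error {
	delete(p.active, sandboxID)
	return nil
}

// lifecycle drives any podNetworker through set-up and tear-down.
func lifecycle(p podNetworker, sandboxID string) error {
	if err := p.SetUpPod("default", "web", sandboxID); err != nil {
		return err
	}
	return p.TearDownPod("default", "web", sandboxID)
}

func main() {
	plugins := []podNetworker{noopPlugin{}, &recordingPlugin{active: map[string]bool{}}}
	for _, p := range plugins {
		fmt.Println(p.Name(), lifecycle(p, "sandbox-1"))
	}
}
```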

struct cniNetworkPlugin

The cniNetworkPlugin struct implements the NetworkPlugin interface, including methods such as SetUpPod and TearDownPod.

// pkg/kubelet/dockershim/network/cni/cni.go
type cniNetworkPlugin struct {
	network.NoopNetworkPlugin

	loNetwork *cniNetwork

	sync.RWMutex
	defaultNetwork *cniNetwork

	host        network.Host
	execer      utilexec.Interface
	nsenterPath string
	confDir     string
	binDirs     []string
	cacheDir    string
	podCidr     string
}

struct PluginManager

The plugin field of struct PluginManager has the interface type NetworkPlugin, so it can hold any concrete network plugin implementation, such as the cniNetworkPlugin struct.

// pkg/kubelet/dockershim/network/plugins.go
// The PluginManager wraps a kubelet network plugin and provides synchronization
// for a given pod's network operations.  Each pod's setup/teardown/status operations
// are synchronized against each other, but network operations of other pods can
// proceed in parallel.
type PluginManager struct {
	// Network plugin being wrapped
	plugin NetworkPlugin

	// Pod list and lock
	podsLock sync.Mutex
	pods     map[string]*podLock
}
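
The comment on PluginManager says that operations on one pod are serialized while different pods proceed in parallel. A minimal sketch of one way to get that behavior is a map of reference-counted per-pod mutexes guarded by a single outer lock. The podLocker type and its methods are assumptions made for this illustration; only the field names podsLock and pods come from the struct above.

```go
package main

import (
	"fmt"
	"sync"
)

// podLock pairs a mutex with a count of goroutines holding or awaiting it.
type podLock struct {
	refcount uint
	mu       sync.Mutex
}

// podLocker is a hypothetical miniature of the locking part of PluginManager.
type podLocker struct {
	podsLock sync.Mutex
	pods     map[string]*podLock
}

func newPodLocker() *podLocker { return &podLocker{pods: map[string]*podLock{}} }

// lock blocks until the caller holds the lock for the named pod.
func (l *podLocker) lock(pod string) {
	l.podsLock.Lock()
	pl, ok := l.pods[pod]
	if !ok {
		pl = &podLock{}
		l.pods[pod] = pl
	}
	pl.refcount++
	l.podsLock.Unlock()
	pl.mu.Lock() // taken outside podsLock so other pods are not blocked
}

// unlock releases the pod's lock and drops the map entry when unused.
func (l *podLocker) unlock(pod string) {
	l.podsLock.Lock()
	defer l.podsLock.Unlock()
	pl, ok := l.pods[pod]
	if !ok {
		return
	}
	pl.mu.Unlock()
	pl.refcount--
	if pl.refcount == 0 {
		delete(l.pods, pod)
	}
}

// withPod runs fn while holding the lock for pod.
func (l *podLocker) withPod(pod string, fn func()) {
	l.lock(pod)
	defer l.unlock(pod)
	fn()
}

func main() {
	l := newPodLocker()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// All goroutines target the same pod, so the increments are serialized.
			l.withPod("default/web", func() { counter++ })
		}()
	}
	wg.Wait()
	fmt.Println(counter, len(l.pods))
}
```

The reference count lets the map entry be deleted once nobody holds or waits for the pod's lock, so the map does not grow with every pod ever seen.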

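struct dockerService

struct dockerService was analyzed in detail in the CRI post, which is worth revisiting; only a brief summary follows.

struct dockerService implements the container runtime interface and the container image interface of the CRI shim server, so it represents the server side of dockershim (the CRI shim built into the kubelet).

The network field of struct dockerService is a pointer to struct PluginManager. When the struct is initialized, the concrete network plugin struct, such as struct cniNetworkPlugin, is stored in this field.

When a pod is created or deleted, the concrete network plugin stored in the network field of dockerService (for example cniNetworkPlugin) has its SetUpPod or TearDownPod method called to build or tear down the pod's network.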

// pkg/kubelet/dockershim/docker_service.go
type dockerService struct {
	client           libdocker.Interface
	os               kubecontainer.OSInterface
	podSandboxImage  string
	streamingRuntime *streamingRuntime
	streamingServer  streaming.Server

	network *network.PluginManager
	// Map of podSandboxID :: network-is-ready
	networkReady     map[string]bool
	networkReadyLock sync.Mutex

	containerManager cm.ContainerManager
	// cgroup driver used by Docker runtime.
	cgroupDriver      string
	checkpointManager checkpointmanager.CheckpointManager
	// caches the version of the runtime.
	// To be compatible with multiple docker versions, we need to perform
	// version checking for some operations. Use this cache to avoid querying
	// the docker daemon every time we need to do such checks.
	versionCache *cache.ObjectCache
	// startLocalStreamingServer indicates whether dockershim should start a
	// streaming server on localhost.
	startLocalStreamingServer bool

	// containerCleanupInfos maps container IDs to the `containerCleanupInfo` structs
	// needed to clean up after containers have been removed.
	// (see `applyPlatformSpecificDockerConfig` and `performPlatformSpecificContainerCleanup`
	// methods for more info).
	containerCleanupInfos map[string]*containerCleanupInfo
}

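3. CNI initialization

During startup, the kubelet does two network-related things: it probes the current environment for network plugins, and it initializes the selected network plugin. CNI initialization happens only when the container runtime is the built-in dockershim; once initialized, CNI is handed to dockershim for use.

Call chain for CNI initialization:
main (cmd/kubelet/kubelet.go)
-> NewKubeletCommand (cmd/kubelet/app/server.go)
-> Run (cmd/kubelet/app/server.go)
-> run (cmd/kubelet/app/server.go)
-> RunKubelet (cmd/kubelet/app/server.go)
-> CreateAndInitKubelet (cmd/kubelet/app/server.go)
-> kubelet.NewMainKubelet (pkg/kubelet/kubelet.go)
-> dockershim.NewDockerService (pkg/kubelet/dockershim/docker_service.go)
-> cni.ProbeNetworkPlugins (pkg/kubelet/dockershim/network/cni/cni.go) & network.InitNetworkPlugin (pkg/kubelet/dockershim/network/plugins.go)

The call chain is long, so we go straight to the key function, NewMainKubelet.

NewMainKubelet

In NewMainKubelet, the call to look at is dockershim.NewDockerService.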

// pkg/kubelet/kubelet.go
// NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
// No initialization of Kubelet and its modules should happen here.
func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration,...) {
    ...
    switch containerRuntime {
	case kubetypes.DockerContainerRuntime:
		// Create and start the CRI shim running as a grpc server.
		streamingConfig := getStreamingConfig(kubeCfg, kubeDeps, crOptions)
		ds, err := dockershim.NewDockerService(kubeDeps.DockerClientConfig, crOptions.PodSandboxImage, streamingConfig,
			&pluginSettings, runtimeCgroups, kubeCfg.CgroupDriver, crOptions.DockershimRootDirectory, !crOptions.RedirectContainerStreaming)
    ...
}

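This analysis covers the case where containerRuntime equals docker, i.e. the kubelet startup flag --container-runtime is set to docker. In that case the kubelet uses its built-in CRI shim, dockershim, as the container runtime, and initializes and starts dockershim.

The call to dockershim.NewDockerService creates and initializes the dockershim server, which includes initializing the docker client, initializing the CNI network configuration, and so on.

The main logic of the CNI part is:
(1) Call cni.ProbeNetworkPlugins: based on the kubelet's CNI-related startup flags, collect the CNI configuration file, the CNI plugin executables, and related information, then use it to initialize and return a cniNetworkPlugin struct;
(2) Call network.InitNetworkPlugin: based on the value of networkPluginName (the kubelet flag --network-plugin), select the matching network plugin and call its Init() method to initialize it (initialization mainly starts a goroutine that periodically re-reads the CNI configuration files and re-checks the executables, so that they can be hot-reloaded);
(3) Wrap the cniNetworkPlugin struct obtained above in a PluginManager and assign it to the network field of the dockerService struct, so that later pod creation and deletion can call cniNetworkPlugin's SetUpPod and TearDownPod methods to build and tear down the pod network.

The kubelet's main CNI implementation code: SetUpPod/TearDownPod in pkg/kubelet/dockershim/network/cni/cni.go (building and tearing down the pod network).

The function argument pluginSettings *NetworkPluginSettings is populated from the kubelet startup flags; the CNI-related flags were covered earlier.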

// pkg/kubelet/dockershim/docker_service.go
// NewDockerService creates a new `DockerService` struct.
// NOTE: Anything passed to DockerService should be eventually handled in another way when we switch to running the shim as a different process.
func NewDockerService(config *ClientConfig, podSandboxImage string, streamingConfig *streaming.Config, pluginSettings *NetworkPluginSettings,
	cgroupsName string, kubeCgroupDriver string, dockershimRootDir string, startLocalStreamingServer bool, noJsonLogPath string) (DockerService, error) {
    ...
    ds := &dockerService{
		client:          c,
		os:              kubecontainer.RealOS{},
		podSandboxImage: podSandboxImage,
		streamingRuntime: &streamingRuntime{
			client:      client,
			execHandler: &NativeExecHandler{},
		},
		containerManager:          cm.NewContainerManager(cgroupsName, client),
		checkpointManager:         checkpointManager,
		startLocalStreamingServer: startLocalStreamingServer,
		networkReady:              make(map[string]bool),
		containerCleanupInfos:     make(map[string]*containerCleanupInfo),
		noJsonLogPath:             noJsonLogPath,
	}
	...
    // dockershim currently only supports CNI plugins.
	pluginSettings.PluginBinDirs = cni.SplitDirs(pluginSettings.PluginBinDirString)
	// (1) Based on the kubelet's CNI-related startup flags, collect the CNI config file, plugin executables, etc., and use them to initialize and return a cniNetworkPlugin struct.
	cniPlugins := cni.ProbeNetworkPlugins(pluginSettings.PluginConfDir, pluginSettings.PluginCacheDir, pluginSettings.PluginBinDirs)
	cniPlugins = append(cniPlugins, kubenet.NewPlugin(pluginSettings.PluginBinDirs, pluginSettings.PluginCacheDir))
	netHost := &dockerNetworkHost{
		&namespaceGetter{ds},
		&portMappingGetter{ds},
	}
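	// (2) Based on networkPluginName (the kubelet flag --network-plugin), select the matching network plugin and call its Init() method (which mainly starts a goroutine that periodically re-checks the CNI config files and executables so they can be hot-reloaded).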
	plug, err := network.InitNetworkPlugin(cniPlugins, pluginSettings.PluginName, netHost, pluginSettings.HairpinMode, pluginSettings.NonMasqueradeCIDR, pluginSettings.MTU)
	if err != nil {
		return nil, fmt.Errorf("didn't find compatible CNI plugin with given settings %+v: %v", pluginSettings, err)
	}
	// (3) Wrap the cniNetworkPlugin obtained above in a PluginManager and assign it to the network field of dockerService, so that later pod creation and deletion can call its SetUpPod and TearDownPod methods.
	ds.network = network.NewPluginManager(plug)
	klog.Infof("Docker cri networking managed by %v", plug.Name())
    ...
}

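First, a look at pluginSettings. It is a struct NetworkPluginSettings, with fields for the network plugin name, the directories containing the plugin executables, the directory containing the plugin configuration files, and so on: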

// pkg/kubelet/dockershim/docker_service.go
type NetworkPluginSettings struct {
	// HairpinMode is best described by comments surrounding the kubelet arg
	HairpinMode kubeletconfig.HairpinMode
	// NonMasqueradeCIDR is the range of ips which should *not* be included
	// in any MASQUERADE rules applied by the plugin
	NonMasqueradeCIDR string
	// PluginName is the name of the plugin, runtime shim probes for
	PluginName string
	// PluginBinDirString is a list of directiores delimited by commas, in
	// which the binaries for the plugin with PluginName may be found.
	PluginBinDirString string
	// PluginBinDirs is an array of directories in which the binaries for
	// the plugin with PluginName may be found. The admin is responsible for
	// provisioning these binaries before-hand.
	PluginBinDirs []string
	// PluginConfDir is the directory in which the admin places a CNI conf.
	// Depending on the plugin, this may be an optional field, eg: kubenet
	// generates its own plugin conf.
	PluginConfDir string
	// PluginCacheDir is the directory in which CNI should store cache files.
	PluginCacheDir string
	// MTU is the desired MTU for network devices created by the plugin.
	MTU int
}

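3.1 cni.ProbeNetworkPlugins

cni.ProbeNetworkPlugins collects the CNI configuration file, the CNI plugin executables, and related information according to the kubelet's CNI-related startup flags, and uses them to initialize and return a cniNetworkPlugin struct.

Note the call to plugin.syncNetworkConfig(), whose main job is to set the defaultNetwork field of the cniNetworkPlugin struct.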

// pkg/kubelet/dockershim/network/cni/cni.go
// ProbeNetworkPlugins : get the network plugin based on cni conf file and bin file
func ProbeNetworkPlugins(confDir, cacheDir string, binDirs []string) []network.NetworkPlugin {
	old := binDirs
	binDirs = make([]string, 0, len(binDirs))
	for _, dir := range old {
		if dir != "" {
			binDirs = append(binDirs, dir)
		}
	}

	plugin := &cniNetworkPlugin{
		defaultNetwork: nil,
		loNetwork:      getLoNetwork(binDirs),
		execer:         utilexec.New(),
		confDir:        confDir,
		binDirs:        binDirs,
		cacheDir:       cacheDir,
	}

	// sync NetworkConfig in best effort during probing.
	plugin.syncNetworkConfig()
	return []network.NetworkPlugin{plugin}
}
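
plugin.syncNetworkConfig()

Main logic:
(1) getDefaultCNINetwork(): looks for CNI configuration files in the CNI conf directory given by the kubelet startup flags, and returns a cniNetwork struct holding the CNI information;
(2) plugin.setDefaultNetwork(): assigns the cniNetwork struct from the previous step to the defaultNetwork field of the cniNetworkPlugin struct.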

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) syncNetworkConfig() {
	network, err := getDefaultCNINetwork(plugin.confDir, plugin.binDirs)
	if err != nil {
		klog.Warningf("Unable to update cni config: %s", err)
		return
	}
	plugin.setDefaultNetwork(network)
}
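
getDefaultCNINetwork()

Main logic:
(1) In the CNI configuration directory, three kinds of configuration file are recognized: .conf, .conflist, and .json.

(2) sort.Strings() sorts all CNI configuration files in the directory in ascending lexicographic order.

(3) The files are tried in that order, and the function returns as soon as one loads and validates successfully. So even if several CNI configuration files exist in the directory, only one of them takes effect.

(4) Before a file is accepted, cniConfig.ValidateNetworkList() checks that the executables for all of its plugins exist in the CNI bin directories.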
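
The selection rules above can be sketched without touching the file system. In this simplified illustration, pickConfig and confFile are hypothetical names, a map of file name to content stands in for the conf directory, and the binary validation of step (4) is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"path/filepath"
	"sort"
)

// confFile is the small subset of a CNI config this sketch inspects.
type confFile struct {
	Name    string            `json:"name"`
	Type    string            `json:"type"`
	Plugins []json.RawMessage `json:"plugins"`
}

// pickConfig returns the first usable file name in lexicographic order.
// files maps file name to content; it stands in for reading the conf dir.
func pickConfig(files map[string][]byte) (string, error) {
	var names []string
	for name := range files {
		switch filepath.Ext(name) {
		case ".conf", ".conflist", ".json":
			names = append(names, name)
		}
	}
	if len(names) == 0 {
		return "", errors.New("no networks found")
	}
	sort.Strings(names)
	for _, name := range names {
		var c confFile
		if err := json.Unmarshal(files[name], &c); err != nil {
			continue // unreadable file: try the next one
		}
		if filepath.Ext(name) == ".conflist" {
			if len(c.Plugins) == 0 {
				continue // a list with no plugins is skipped
			}
		} else if c.Type == "" {
			continue // a single config must name its plugin type
		}
		return name, nil
	}
	return "", errors.New("no valid networks found")
}

func main() {
	files := map[string][]byte{
		"10-flannel.conflist": []byte(`{"name":"cbr0","plugins":[{"type":"flannel"}]}`),
		"05-broken.conf":      []byte(`{not json`),
		"20-bridge.conf":      []byte(`{"name":"br","type":"bridge"}`),
		"README.txt":          []byte("ignored"),
	}
	name, err := pickConfig(files)
	fmt.Println(name, err)
}
```

This is why configuration files are commonly given numeric prefixes: the lowest-sorting valid file wins, and a broken file earlier in the order is skipped rather than treated as fatal.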

// pkg/kubelet/dockershim/network/cni/cni.go
func getDefaultCNINetwork(confDir string, binDirs []string) (*cniNetwork, error) {
	files, err := libcni.ConfFiles(confDir, []string{".conf", ".conflist", ".json"})
	switch {
	case err != nil:
		return nil, err
	case len(files) == 0:
		return nil, fmt.Errorf("no networks found in %s", confDir)
	}

	cniConfig := &libcni.CNIConfig{Path: binDirs}

	sort.Strings(files)
	for _, confFile := range files {
		var confList *libcni.NetworkConfigList
		if strings.HasSuffix(confFile, ".conflist") {
			confList, err = libcni.ConfListFromFile(confFile)
			if err != nil {
				klog.Warningf("Error loading CNI config list file %s: %v", confFile, err)
				continue
			}
		} else {
			conf, err := libcni.ConfFromFile(confFile)
			if err != nil {
				klog.Warningf("Error loading CNI config file %s: %v", confFile, err)
				continue
			}
			// Ensure the config has a "type" so we know what plugin to run.
			// Also catches the case where somebody put a conflist into a conf file.
			if conf.Network.Type == "" {
				klog.Warningf("Error loading CNI config file %s: no 'type'; perhaps this is a .conflist?", confFile)
				continue
			}

			confList, err = libcni.ConfListFromConf(conf)
			if err != nil {
				klog.Warningf("Error converting CNI config file %s to list: %v", confFile, err)
				continue
			}
		}
		if len(confList.Plugins) == 0 {
			klog.Warningf("CNI config list %s has no networks, skipping", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]))
			continue
		}

		// Before using this CNI config, we have to validate it to make sure that
		// all plugins of this config exist on disk
		caps, err := cniConfig.ValidateNetworkList(context.TODO(), confList)
		if err != nil {
			klog.Warningf("Error validating CNI config list %s: %v", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]), err)
			continue
		}

		klog.V(4).Infof("Using CNI configuration file %s", confFile)

		return &cniNetwork{
			name:          confList.Name,
			NetworkConfig: confList,
			CNIConfig:     cniConfig,
			Capabilities:  caps,
		}, nil
	}
	return nil, fmt.Errorf("no valid networks found in %s", confDir)
}

plugin.setDefaultNetwork

Assigns the cniNetwork struct obtained above to the defaultNetwork field of the cniNetworkPlugin struct.

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) setDefaultNetwork(n *cniNetwork) {
	plugin.Lock()
	defer plugin.Unlock()
	plugin.defaultNetwork = n
}

3.2 network.InitNetworkPlugin

network.InitNetworkPlugin() selects the network plugin matching networkPluginName (the kubelet flag --network-plugin) and calls its Init() method to initialize it.
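
As a rough sketch of this select-then-initialize pattern: an empty name falls back to a no-op plugin, otherwise the plugin registered under that name is looked up and initialized. The plugin interface, the named type, and selectPlugin are hypothetical, and unlike the real function this version returns on the first error instead of aggregating them.

```go
package main

import (
	"errors"
	"fmt"
)

// plugin is a hypothetical, reduced form of NetworkPlugin.
type plugin interface {
	Name() string
	Init() error
}

// named is a toy plugin that records whether Init was called.
type named struct {
	name    string
	initErr error
	inited  bool
}

func (n *named) Name() string { return n.name }
func (n *named) Init() error  { n.inited = true; return n.initErr }

// selectPlugin applies the selection rules: an empty name means no-op,
// otherwise the plugin registered under that name is initialized.
func selectPlugin(plugins []plugin, want string) (plugin, error) {
	if want == "" {
		p := &named{name: "noop"} // illustrative name for the fallback
		return p, p.Init()
	}
	byName := map[string]plugin{}
	for _, p := range plugins {
		if _, dup := byName[p.Name()]; dup {
			return nil, fmt.Errorf("network plugin %q was registered more than once", p.Name())
		}
		byName[p.Name()] = p
	}
	chosen, ok := byName[want]
	if !ok {
		return nil, fmt.Errorf("network plugin %q not found", want)
	}
	if err := chosen.Init(); err != nil {
		return nil, fmt.Errorf("network plugin %q failed init: %w", want, err)
	}
	return chosen, nil
}

func main() {
	plugins := []plugin{
		&named{name: "cni"},
		&named{name: "kubenet", initErr: errors.New("bridge unavailable")},
	}
	for _, want := range []string{"cni", "", "kubenet", "calico"} {
		p, err := selectPlugin(plugins, want)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("selected:", p.Name())
	}
}
```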

// pkg/kubelet/dockershim/network/plugins.go
// InitNetworkPlugin inits the plugin that matches networkPluginName. Plugins must have unique names.
func InitNetworkPlugin(plugins []NetworkPlugin, networkPluginName string, host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) (NetworkPlugin, error) {
	if networkPluginName == "" {
		// default to the no_op plugin
		plug := &NoopNetworkPlugin{}
		plug.Sysctl = utilsysctl.New()
		if err := plug.Init(host, hairpinMode, nonMasqueradeCIDR, mtu); err != nil {
			return nil, err
		}
		return plug, nil
	}

	pluginMap := map[string]NetworkPlugin{}

	allErrs := []error{}
	for _, plugin := range plugins {
		name := plugin.Name()
		if errs := validation.IsQualifiedName(name); len(errs) != 0 {
			allErrs = append(allErrs, fmt.Errorf("network plugin has invalid name: %q: %s", name, strings.Join(errs, ";")))
			continue
		}

		if _, found := pluginMap[name]; found {
			allErrs = append(allErrs, fmt.Errorf("network plugin %q was registered more than once", name))
			continue
		}
		pluginMap[name] = plugin
	}

	chosenPlugin := pluginMap[networkPluginName]
	if chosenPlugin != nil {
		err := chosenPlugin.Init(host, hairpinMode, nonMasqueradeCIDR, mtu)
		if err != nil {
			allErrs = append(allErrs, fmt.Errorf("network plugin %q failed init: %v", networkPluginName, err))
		} else {
			klog.V(1).Infof("Loaded network plugin %q", networkPluginName)
		}
	} else {
		allErrs = append(allErrs, fmt.Errorf("network plugin %q not found", networkPluginName))
	}

	return chosenPlugin, utilerrors.NewAggregate(allErrs)
}
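
chosenPlugin.Init()

When the kubelet flag --network-plugin is set to cni, the Init() method of cniNetworkPlugin is called; the code is shown below.

Init starts a goroutine that calls plugin.syncNetworkConfig every 5 seconds. Recall what plugin.syncNetworkConfig() does: it looks for a CNI configuration file in the CNI conf directory given by the kubelet startup flags, builds a cniNetwork struct holding the CNI information, and assigns it to the defaultNetwork field of the cniNetworkPlugin struct. As a result, when the CNI conf or bin files change, the kubelet notices and updates the cniNetworkPlugin struct.

This is the point of the goroutine: CNI configuration files and executables can be hot-reloaded without restarting the kubelet.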
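
The hot-reload pattern can be sketched with a ticker and a value guarded by an RWMutex. This is an illustration only: reloader, netConfig, and the load callback are hypothetical, and it uses time.Ticker with a stop channel where the kubelet uses wait.Forever.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errNoConfig = errors.New("no valid networks found")

// netConfig stands in for cniNetwork; only the name is kept.
type netConfig struct{ name string }

// reloader caches the most recently loaded config behind an RWMutex.
type reloader struct {
	mu      sync.RWMutex
	current *netConfig
	load    func() (*netConfig, error) // hypothetical loader, e.g. reads the conf dir
}

// syncOnce reloads the config; on failure the previous value is kept.
func (r *reloader) syncOnce() {
	cfg, err := r.load()
	if err != nil {
		return
	}
	r.mu.Lock()
	r.current = cfg
	r.mu.Unlock()
}

// get returns the cached config (nil before the first successful load).
func (r *reloader) get() *netConfig {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.current
}

// run calls syncOnce every period until stop is closed.
func (r *reloader) run(period time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			r.syncOnce()
		case <-stop:
			return
		}
	}
}

func main() {
	var mu sync.Mutex
	onDisk := "flannel"
	r := &reloader{load: func() (*netConfig, error) {
		mu.Lock()
		defer mu.Unlock()
		if onDisk == "" {
			return nil, errNoConfig
		}
		return &netConfig{name: onDisk}, nil
	}}
	r.syncOnce() // initial sync before the background loop starts
	stop := make(chan struct{})
	go r.run(10*time.Millisecond, stop)

	mu.Lock()
	onDisk = "calico" // simulate an admin replacing the config file
	mu.Unlock()
	time.Sleep(100 * time.Millisecond)
	close(stop)
	fmt.Println("active network:", r.get().name)
}
```

Keeping the previous value when a reload fails mirrors syncNetworkConfig, which logs a warning and returns without touching defaultNetwork.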

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) Init(host network.Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error {
	err := plugin.platformInit()
	if err != nil {
		return err
	}

	plugin.host = host

	plugin.syncNetworkConfig()

	// start a goroutine to sync network config from confDir periodically to detect network config updates in every 5 seconds
	go wait.Forever(plugin.syncNetworkConfig, defaultSyncConfigPeriod)

	return nil
}

plugin.platformInit() only checks that nsenter is available (recording its path) and does nothing else.

// pkg/kubelet/dockershim/network/cni/cni_others.go
func (plugin *cniNetworkPlugin) platformInit() error {
	var err error
	plugin.nsenterPath, err = plugin.execer.LookPath("nsenter")
	if err != nil {
		return err
	}
	return nil
}

4. Building the pod network through CNI

When the kubelet creates a pod, it creates and starts the pod sandbox through the CRI, and the CRI then calls the CNI plugin to build the pod network.

The kubelet method that builds the pod network through CNI is SetUpPod in pkg/kubelet/dockershim/network/cni/cni.go.

The call chain leading to SetUpPod (key parts only):
main (cmd/kubelet/kubelet.go)
...
-> klet.syncPod (pkg/kubelet/kubelet.go)
-> kl.containerRuntime.SyncPod (pkg/kubelet/kubelet.go)
-> m.createPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_manager.go)
-> m.runtimeService.RunPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)
-> ds.network.SetUpPod (pkg/kubelet/dockershim/docker_sandbox.go)
-> pm.plugin.SetUpPod (pkg/kubelet/dockershim/network/plugins.go)
-> SetUpPod (pkg/kubelet/dockershim/network/cni/cni.go)

The code below is listed only to show the call chain leading to cniNetworkPlugin.SetUpPod(); it is not analyzed in detail.

// pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult) {
	...
	podSandboxID, msg, err = m.createPodSandbox(pod, podContainerChanges.Attempt)
	...
}
// pkg/kubelet/kuberuntime/kuberuntime_sandbox.go
// createPodSandbox creates a pod sandbox and returns (podSandBoxID, message, error).
func (m *kubeGenericRuntimeManager) createPodSandbox(pod *v1.Pod, attempt uint32) (string, string, error) {
    ...
    podSandBoxID, err := m.runtimeService.RunPodSandbox(podSandboxConfig, runtimeHandler)
    ...
}

RunPodSandbox first creates the pod sandbox, then starts it, and only then builds the network for that pod sandbox.

// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) RunPodSandbox(ctx context.Context, r *runtimeapi.RunPodSandboxRequest) (*runtimeapi.RunPodSandboxResponse, error) {
    ...
    createResp, err := ds.client.CreateContainer(*createConfig)
    ...
    err = ds.client.StartContainer(createResp.ID)
    ...
    err = ds.network.SetUpPod(config.GetMetadata().Namespace, config.GetMetadata().Name, cID, config.Annotations, networkOptions)
    ...
}

PluginManager.SetUpPod calls pm.plugin.SetUpPod. As covered in the CNI initialization section, plugin was set to the cniNetworkPlugin, so this ends up calling cniNetworkPlugin's SetUpPod method.

// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) SetUpPod(podNamespace, podName string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
	defer recordOperation("set_up_pod", time.Now())
	fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
	pm.podLock(fullPodName).Lock()
	defer pm.podUnlock(fullPodName)

	klog.V(3).Infof("Calling network plugin %s to set up pod %q", pm.plugin.Name(), fullPodName)
	if err := pm.plugin.SetUpPod(podNamespace, podName, id, annotations, options); err != nil {
		return fmt.Errorf("networkPlugin %s failed to set up pod %q network: %v", pm.plugin.Name(), fullPodName, err)
	}

	return nil
}

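cniNetworkPlugin.SetUpPod

The cniNetworkPlugin.SetUpPod method is the entry point through which the CNI network plugin builds the pod network. Its main logic is:
(1) Call plugin.checkInitialized(): check that the network plugin has finished initializing;
(2) Call plugin.host.GetNetNS(): get the path of the container's network namespace, in the form /proc/${container PID}/ns/net;
(3) Call context.WithTimeout(): set the timeout for calling the CNI plugin;
(4) Call plugin.addToNetwork(): on Linux, call the CNI plugin to set up the pod's loopback network;
(5) Call plugin.addToNetwork(): call the CNI plugin to set up the pod's default network.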
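
The ordering and the shared timeout in these steps can be sketched as follows. The names addFunc, netnsPath, setUpPod, and trace are hypothetical, the initialization check is omitted, and the PID and timeout values are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// addFunc is a hypothetical stand-in for "run the CNI ADD for one network".
type addFunc func(ctx context.Context, network, netns string) error

// netnsPath builds the network namespace path for a sandbox process.
func netnsPath(pid int) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}

// setUpPod attaches loopback first (when present), then the default network,
// all under one shared deadline.
func setUpPod(pid int, hasLoopback bool, timeout time.Duration, add addFunc) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	netns := netnsPath(pid)
	if hasLoopback {
		if err := add(ctx, "lo", netns); err != nil {
			return err // a loopback failure aborts setup
		}
	}
	return add(ctx, "default", netns)
}

// trace runs setUpPod with a recording addFunc and returns the calls made.
// failOn names a network whose ADD should fail ("" for none).
func trace(hasLoopback bool, failOn string) ([]string, error) {
	var calls []string
	err := setUpPod(4242, hasLoopback, time.Second, func(ctx context.Context, network, netns string) error {
		if err := ctx.Err(); err != nil {
			return err // deadline already exceeded
		}
		calls = append(calls, network+"@"+netns)
		if network == failOn {
			return errors.New("ADD failed for " + network)
		}
		return nil
	})
	return calls, err
}

func main() {
	fmt.Println(trace(true, ""))
	fmt.Println(trace(true, "lo"))
}
```

Note that during setup a loopback failure is fatal, which differs from the more forgiving behavior on teardown.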

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) SetUpPod(namespace string, name string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
	if err := plugin.checkInitialized(); err != nil {
		return err
	}
	netnsPath, err := plugin.host.GetNetNS(id.ID)
	if err != nil {
		return fmt.Errorf("CNI failed to retrieve network namespace path: %v", err)
	}

	// Todo get the timeout from parent ctx
	cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
	defer cancelFunc()
	// Windows doesn't have loNetwork. It comes only with Linux
	if plugin.loNetwork != nil {
		if _, err = plugin.addToNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, annotations, options); err != nil {
			return err
		}
	}

	_, err = plugin.addToNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, annotations, options)
	return err
}
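
plugin.addToNetwork

plugin.addToNetwork calls the CNI plugin to set up one given network for the pod. Its main logic is:
(1) Call plugin.buildCNIRuntimeConf(): build the runtime configuration used to invoke the CNI plugin;
(2) Call cniNet.AddNetworkList(): invoke the CNI plugin to build the network.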

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) addToNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations, options map[string]string) (cnitypes.Result, error) {
	rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, options)
	if err != nil {
		klog.Errorf("Error adding network when building cni runtime conf: %v", err)
		return nil, err
	}

	pdesc := podDesc(podNamespace, podName, podSandboxID)
	netConf, cniNet := network.NetworkConfig, network.CNIConfig
	klog.V(4).Infof("Adding %s to network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
	res, err := cniNet.AddNetworkList(ctx, netConf, rt)
	if err != nil {
		klog.Errorf("Error adding %s to network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
		return nil, err
	}
	klog.V(4).Infof("Added %s to network %s: %v", pdesc, netConf.Name, res)
	return res, nil
}
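
cniNet.AddNetworkList

AddNetworkList mainly calls addNetwork once for each plugin in the list, so the logic of addNetwork is what matters:
(1) Call c.exec.FindInPath(): resolve the absolute path of the CNI plugin executable;
(2) Call buildOneConfig(): build the configuration;
(3) Call c.args(): build the arguments for invoking the CNI plugin;
(4) Call invoke.ExecPluginWithResult(): invoke the CNI plugin to build the pod network.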
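
One detail worth illustrating is how a plugin list is chained: each plugin receives the previous plugin's result, and the first failure stops the chain. The sketch below models plugins as plain functions; result, pluginFn, step, and failing are all hypothetical and no real plugin is executed.

```go
package main

import (
	"errors"
	"fmt"
)

// result is a hypothetical stand-in for the CNI result passed between plugins.
type result struct{ steps []string }

// pluginFn represents one plugin invocation; prev is the previous plugin's result.
type pluginFn func(prev *result) (*result, error)

// addNetworkList runs plugins in order, feeding each result to the next,
// and stops at the first error.
func addNetworkList(plugins []pluginFn) (*result, error) {
	var res *result
	for _, p := range plugins {
		var err error
		res, err = p(res)
		if err != nil {
			return nil, err
		}
	}
	return res, nil
}

// step returns a plugin that appends its name to the running result.
func step(name string) pluginFn {
	return func(prev *result) (*result, error) {
		out := &result{}
		if prev != nil {
			out.steps = append(out.steps, prev.steps...)
		}
		out.steps = append(out.steps, name)
		return out, nil
	}
}

// failing returns a plugin that always errors.
func failing(name string) pluginFn {
	return func(*result) (*result, error) {
		return nil, errors.New(name + ": plugin failed")
	}
}

func main() {
	res, err := addNetworkList([]pluginFn{step("flannel"), step("portmap")})
	fmt.Println(res.steps, err)

	_, err = addNetworkList([]pluginFn{step("flannel"), failing("portmap"), step("tuning")})
	fmt.Println(err)
}
```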

// vendor/github.com/containernetworking/cni/libcni/api.go 
func (c *CNIConfig) AddNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) (types.Result, error) {
	var err error
	var result types.Result
	for _, net := range list.Plugins {
		result, err = c.addNetwork(ctx, list.Name, list.CNIVersion, net, result, rt)
		if err != nil {
			return nil, err
		}
	}

	if err = setCachedResult(result, list.Name, rt); err != nil {
		return nil, fmt.Errorf("failed to set network %q cached result: %v", list.Name, err)
	}

	return result, nil
}

func (c *CNIConfig) addNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) (types.Result, error) {
	c.ensureExec()
	pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
	if err != nil {
		return nil, err
	}

	newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
	if err != nil {
		return nil, err
	}

	return invoke.ExecPluginWithResult(ctx, pluginPath, newConf.Bytes, c.args("ADD", rt), c.exec)
}
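
c.args

The c.args method builds the arguments passed to the CNI plugin executable.

As the code shows, the arguments include Command (ADD to build the network, DEL to tear it down), ContainerID (the container ID), NetNS (the path of the container's network namespace), IfName (the network interface name), PluginArgs (other arguments such as the pod name and pod namespace), and so on.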

// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
	return &invoke.Args{
		Command:     action,
		ContainerID: rt.ContainerID,
		NetNS:       rt.NetNS,
		PluginArgs:  rt.Args,
		IfName:      rt.IfName,
		Path:        strings.Join(c.Path, string(os.PathListSeparator)),
	}
}

invoke.ExecPluginWithResult

invoke.ExecPluginWithResult converts the call arguments into environment variables, runs the CNI plugin executable, and collects its result.
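
To show what "arguments as environment variables" looks like in practice, here is a sketch that renders the fields listed for invoke.Args as the CNI_* variables defined by the CNI specification, and passes the network config on stdin. The cniArgs type and both functions are written for this illustration, not copied from libcni; the sketch passes only the CNI_* variables, and the example values (container ID, PID, K8S_POD_* pairs) are made up.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// cniArgs mirrors the fields listed for invoke.Args; the type name is illustrative.
type cniArgs struct {
	Command     string
	ContainerID string
	NetNS       string
	PluginArgs  [][2]string
	IfName      string
	Path        string
}

// asEnv renders the arguments as the CNI_* environment variables
// defined by the CNI specification.
func (a cniArgs) asEnv() []string {
	pairs := make([]string, 0, len(a.PluginArgs))
	for _, kv := range a.PluginArgs {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return []string{
		"CNI_COMMAND=" + a.Command,
		"CNI_CONTAINERID=" + a.ContainerID,
		"CNI_NETNS=" + a.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + a.IfName,
		"CNI_PATH=" + a.Path,
	}
}

// execPlugin runs the plugin binary with the network config on stdin
// and returns whatever it writes to stdout.
func execPlugin(ctx context.Context, pluginPath string, netconf []byte, env []string) ([]byte, error) {
	cmd := exec.CommandContext(ctx, pluginPath)
	cmd.Env = env
	cmd.Stdin = bytes.NewReader(netconf)
	return cmd.Output()
}

func main() {
	args := cniArgs{
		Command:     "ADD",
		ContainerID: "0123abcd",
		NetNS:       "/proc/4242/ns/net",
		PluginArgs:  [][2]string{{"K8S_POD_NAMESPACE", "default"}, {"K8S_POD_NAME", "web"}},
		IfName:      "eth0",
		Path:        "/opt/cni/bin",
	}
	for _, kv := range args.asEnv() {
		fmt.Println(kv)
	}
	// execPlugin is not called here because it needs a real plugin binary on disk.
}
```

Because the whole contract is environment variables plus JSON on stdin and stdout, a CNI plugin can be written in any language and debugged by hand from a shell.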

func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
	if exec == nil {
		exec = defaultExec
	}

	stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
	if err != nil {
		return nil, err
	}

	// Plugin must return result in same version as specified in netconf
	versionDecoder := &version.ConfigDecoder{}
	confVersion, err := versionDecoder.Decode(netconf)
	if err != nil {
		return nil, err
	}

	return version.NewResult(confVersion, stdoutBytes)
}

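5. Tearing down the pod network through CNI

When the kubelet deletes a pod, the CRI calls the CNI plugin to tear down the pod network.

The kubelet method that tears down the pod network through CNI is TearDownPod in pkg/kubelet/dockershim/network/cni/cni.go.

The call chain leading to TearDownPod (key parts only):
main (cmd/kubelet/kubelet.go)
...
-> m.runtimeService.StopPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)
-> ds.network.TearDownPod (pkg/kubelet/dockershim/docker_sandbox.go)
-> pm.plugin.TearDownPod (pkg/kubelet/dockershim/network/plugins.go)
-> TearDownPod (pkg/kubelet/dockershim/network/cni/cni.go)

The code below is listed only to show the call chain leading to cniNetworkPlugin.TearDownPod(); it is not analyzed in detail.

StopPodSandbox first tears down the pod network and then stops the pod sandbox. If either operation fails, the kubelet retries until it succeeds, so there is no strict requirement on the order in which the two operations succeed (removal of the pod sandbox itself is handled by kubelet GC).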

// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) StopPodSandbox(ctx context.Context, r *runtimeapi.StopPodSandboxRequest) (*runtimeapi.StopPodSandboxResponse, error) {
    ...
    // WARNING: The following operations made the following assumption:
	// 1. kubelet will retry on any error returned by StopPodSandbox.
	// 2. tearing down network and stopping sandbox container can succeed in any sequence.
	// This depends on the implementation detail of network plugin and proper error handling.
	// For kubenet, if tearing down network failed and sandbox container is stopped, kubelet
	// will retry. On retry, kubenet will not be able to retrieve network namespace of the sandbox
	// since it is stopped. With empty network namespcae, CNI bridge plugin will conduct best
	// effort clean up and will not return error.
	errList := []error{}
	ready, ok := ds.getNetworkReady(podSandboxID)
	if !hostNetwork && (ready || !ok) {
		// Only tear down the pod network if we haven't done so already
		cID := kubecontainer.BuildContainerID(runtimeName, podSandboxID)
		err := ds.network.TearDownPod(namespace, name, cID)
		if err == nil {
			ds.setNetworkReady(podSandboxID, false)
		} else {
			errList = append(errList, err)
		}
	}
	if err := ds.client.StopContainer(podSandboxID, defaultSandboxGracePeriod); err != nil {
		// Do not return error if the container does not exist
		if !libdocker.IsContainerNotFoundError(err) {
			klog.Errorf("Failed to stop sandbox %q: %v", podSandboxID, err)
			errList = append(errList, err)
		} else {
			// remove the checkpoint for any sandbox that is not found in the runtime
			ds.checkpointManager.RemoveCheckpoint(podSandboxID)
		}
	}
    ...
}

PluginManager.TearDownPod calls pm.plugin.TearDownPod. As covered in the CNI initialization section, plugin was set to the cniNetworkPlugin, so this ends up calling cniNetworkPlugin's TearDownPod method.

// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) TearDownPod(podNamespace, podName string, id kubecontainer.ContainerID) error {
	defer recordOperation("tear_down_pod", time.Now())
	fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
	pm.podLock(fullPodName).Lock()
	defer pm.podUnlock(fullPodName)

	klog.V(3).Infof("Calling network plugin %s to tear down pod %q", pm.plugin.Name(), fullPodName)
	if err := pm.plugin.TearDownPod(podNamespace, podName, id); err != nil {
		return fmt.Errorf("networkPlugin %s failed to teardown pod %q network: %v", pm.plugin.Name(), fullPodName, err)
	}

	return nil
}
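
PluginManager.TearDownPod serializes operations on the same pod through pm.podLock and pm.podUnlock. The following is a minimal sketch of such a keyed lock, assuming a reference-counted mutex per pod name. The keyedLocker type and its methods are hypothetical names for illustration and are not the kubelet implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// refLock is a mutex plus a count of goroutines holding or waiting for it.
type refLock struct {
	mu   sync.Mutex
	refs int
}

// keyedLocker hands out one mutex per key (hypothetical type, illustration only).
type keyedLocker struct {
	mu    sync.Mutex
	locks map[string]*refLock
}

func newKeyedLocker() *keyedLocker {
	return &keyedLocker{locks: make(map[string]*refLock)}
}

// Lock blocks until the caller holds the lock for key.
func (k *keyedLocker) Lock(key string) {
	k.mu.Lock()
	l, ok := k.locks[key]
	if !ok {
		l = &refLock{}
		k.locks[key] = l
	}
	l.refs++
	k.mu.Unlock()
	l.mu.Lock() // taken outside k.mu so that other keys are not blocked
}

// Unlock releases the lock for key and drops the map entry once nobody uses it.
func (k *keyedLocker) Unlock(key string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	l, ok := k.locks[key]
	if !ok {
		return
	}
	l.refs--
	if l.refs == 0 {
		delete(k.locks, key)
	}
	l.mu.Unlock()
}

// Len reports how many keys currently have a live lock entry.
func (k *keyedLocker) Len() int {
	k.mu.Lock()
	defer k.mu.Unlock()
	return len(k.locks)
}

func main() {
	locker := newKeyedLocker()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locker.Lock("default/nginx")
			defer locker.Unlock("default/nginx")
			counter++ // serialized by the per-pod lock
		}()
	}
	wg.Wait()
	fmt.Println(counter, locker.Len())
}
```

Keying the lock by pod name lets setup and teardown of different pods run in parallel, while two operations on the same pod never overlap.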

cniNetworkPlugin.TearDownPod

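cniNetworkPlugin.TearDownPod is the entry point through which the CNI network plugin tears down a pod's network. Its main logic is:
(1) call plugin.checkInitialized(): check that the network plugin has finished initializing;
(2) call plugin.host.GetNetNS(): get the path of the container's network namespace, in the form /proc/${container PID}/ns/net; a failure here is only logged as a warning, because a missing namespace should not be fatal on teardown;
(3) call context.WithTimeout(): set the timeout for invoking the CNI network plugin;
(4) call plugin.deleteFromNetwork(): on Linux, invoke the CNI plugin to tear down the pod's loopback network; a failure is logged but not returned;
(5) call plugin.deleteFromNetwork(): invoke the CNI plugin to tear down the pod's default network and return its error, if any.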

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) TearDownPod(namespace string, name string, id kubecontainer.ContainerID) error {
	if err := plugin.checkInitialized(); err != nil {
		return err
	}

	// Lack of namespace should not be fatal on teardown
	netnsPath, err := plugin.host.GetNetNS(id.ID)
	if err != nil {
		klog.Warningf("CNI failed to retrieve network namespace path: %v", err)
	}

	// Todo get the timeout from parent ctx
	cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
	defer cancelFunc()
	// Windows doesn't have loNetwork. It comes only with Linux
	if plugin.loNetwork != nil {
		// Loopback network deletion failure should not be fatal on teardown
		if err := plugin.deleteFromNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, nil); err != nil {
			klog.Warningf("CNI failed to delete loopback network: %v", err)
		}
	}

	return plugin.deleteFromNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, nil)
}
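
plugin.deleteFromNetwork

plugin.deleteFromNetwork invokes the CNI network plugin to tear down one specific network of the pod (the loopback network or the default network). Its main logic is:
(1) call plugin.buildCNIRuntimeConf(): build the runtime configuration passed to the CNI plugin;
(2) call cniNet.DelNetworkList(): invoke the CNI plugin to tear down the pod network. An error containing "no such file or directory" is ignored, because the network may already have been deleted by an earlier attempt.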

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) deleteFromNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations map[string]string) error {
	rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, nil)
	if err != nil {
		klog.Errorf("Error deleting network when building cni runtime conf: %v", err)
		return err
	}

	pdesc := podDesc(podNamespace, podName, podSandboxID)
	netConf, cniNet := network.NetworkConfig, network.CNIConfig
	klog.V(4).Infof("Deleting %s from network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
	err = cniNet.DelNetworkList(ctx, netConf, rt)
	// The pod may not get deleted successfully at the first time.
	// Ignore "no such file or directory" error in case the network has already been deleted in previous attempts.
	if err != nil && !strings.Contains(err.Error(), "no such file or directory") {
		klog.Errorf("Error deleting %s from network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
		return err
	}
	klog.V(4).Infof("Deleted %s from network %s/%s", pdesc, netConf.Plugins[0].Network.Type, netConf.Name)
	return nil
}
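
cniNet.DelNetworkList

DelNetworkList mainly calls delNetwork once for each plugin in the list, in reverse order, so let's look at the logic of delNetwork:
(1) call c.exec.FindInPath(): resolve the absolute path of the CNI plugin executable;
(2) call buildOneConfig(): build the configuration;
(3) call c.args(): build the arguments for invoking the CNI plugin;
(4) call invoke.ExecPluginWithoutResult(): invoke the CNI plugin to tear down the pod network.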

// vendor/github.com/containernetworking/cni/libcni/api.go 
// DelNetworkList executes a sequence of plugins with the DEL command
func (c *CNIConfig) DelNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) error {
	var cachedResult types.Result

	// Cached result on DEL was added in CNI spec version 0.4.0 and higher
	if gtet, err := version.GreaterThanOrEqualTo(list.CNIVersion, "0.4.0"); err != nil {
		return err
	} else if gtet {
		cachedResult, err = getCachedResult(list.Name, list.CNIVersion, rt)
		if err != nil {
			return fmt.Errorf("failed to get network %q cached result: %v", list.Name, err)
		}
	}

	for i := len(list.Plugins) - 1; i >= 0; i-- {
		net := list.Plugins[i]
		if err := c.delNetwork(ctx, list.Name, list.CNIVersion, net, cachedResult, rt); err != nil {
			return err
		}
	}
	_ = delCachedResult(list.Name, rt)

	return nil
}

func (c *CNIConfig) delNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) error {
	c.ensureExec()
	pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
	if err != nil {
		return err
	}

	newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
	if err != nil {
		return err
	}

	return invoke.ExecPluginWithoutResult(ctx, pluginPath, newConf.Bytes, c.args("DEL", rt), c.exec)
}
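
c.args

c.args builds the arguments used when invoking the CNI plugin executable.

As the code shows, the arguments include Command (the command: ADD sets up the network, DEL tears it down), ContainerID (the container ID), NetNS (the path of the container's network namespace), IfName (the network interface name), PluginArgs (additional arguments such as the pod name and pod namespace) and Path (the list of directories searched for plugin executables).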

// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
	return &invoke.Args{
		Command:     action,
		ContainerID: rt.ContainerID,
		NetNS:       rt.NetNS,
		PluginArgs:  rt.Args,
		IfName:      rt.IfName,
		Path:        strings.Join(c.Path, string(os.PathListSeparator)),
	}
}
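
These arguments reach the plugin as CNI_* environment variables, as defined by the CNI specification. The sketch below shows the mapping on a stand-in type. pluginCall and asEnv are hypothetical names, the sample values are made up, and the real conversion in libcni also carries over the parent process environment, which is omitted here.

```go
package main

import (
	"fmt"
	"strings"
)

// pluginCall mirrors the fields of invoke.Args shown above (hypothetical type).
type pluginCall struct {
	Command     string
	ContainerID string
	NetNS       string
	PluginArgs  [][2]string
	IfName      string
	Path        string
}

// asEnv renders the call as CNI_* environment variables. PluginArgs are
// joined into a single "K=V;K=V" string.
func (c pluginCall) asEnv() []string {
	pairs := make([]string, 0, len(c.PluginArgs))
	for _, kv := range c.PluginArgs {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return []string{
		"CNI_COMMAND=" + c.Command,
		"CNI_CONTAINERID=" + c.ContainerID,
		"CNI_NETNS=" + c.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + c.IfName,
		"CNI_PATH=" + c.Path,
	}
}

func main() {
	call := pluginCall{
		Command:     "DEL",
		ContainerID: "0123abcd", // made-up sandbox ID
		NetNS:       "/proc/1234/ns/net",
		PluginArgs: [][2]string{
			{"K8S_POD_NAMESPACE", "default"},
			{"K8S_POD_NAME", "nginx"},
		},
		IfName: "eth0",
		Path:   "/opt/cni/bin",
	}
	for _, kv := range call.asEnv() {
		fmt.Println(kv)
	}
}
```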
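
invoke.ExecPluginWithResult

invoke.ExecPluginWithResult converts the invocation arguments into environment variables, runs the CNI plugin executable, and parses the result it returns. Note that delNetwork above calls the sibling function invoke.ExecPluginWithoutResult; ExecPluginWithResult, shown below, is the result-returning counterpart.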

func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
	if exec == nil {
		exec = defaultExec
	}

	stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
	if err != nil {
		return nil, err
	}

	// Plugin must return result in same version as specified in netconf
	versionDecoder := &version.ConfigDecoder{}
	confVersion, err := versionDecoder.Decode(netconf)
	if err != nil {
		return nil, err
	}

	return version.NewResult(confVersion, stdoutBytes)
}

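Summary

CNI

CNI stands for Container Network Interface.

CNI is the standard interface in K8s for invoking network implementations. The kubelet uses this standard interface to call different network plugins and thereby support different network configurations.

A CNI network plugin is an executable that conforms to the Container Network Interface (CNI) specification. Common CNI network plugins include Calico, flannel, Terway and Weave Net.

When the kubelet is configured to use a CNI-type network plugin (through kubelet startup flags), it calls the CNI network plugin through the CRI when creating and deleting pods, to set up and tear down the pod network.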

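How the kubelet sets up a pod network, in outline

(1) the kubelet first creates the pause container (pod sandbox) through the CRI, which creates the network namespace;
(2) the kubelet invokes the network plugin selected by its startup flags, for example a CNI network plugin;
(3) the network plugin configures the network of the pause container (pod sandbox);
(4) all other containers in the pod share the network of the pause container (pod sandbox).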

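kubelet startup flags related to CNI

(1) --network-plugin: the type of network plugin to use. Valid values are cni, kubenet and "". The default is the empty string, which means Noop: no network plugin is configured and no pod network is set up. Setting the flag to cni makes the kubelet use the CNI network plugin type.

(2) --cni-conf-dir: the directory that contains the CNI configuration files. Default: /etc/cni/net.d

(3) --cni-bin-dir: the directory (or comma-separated list of directories) that contains the CNI plugin executables. The kubelet looks here for the executables that perform pod network operations. Default: /opt/cni/bin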

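CNI initialization in the kubelet

After startup, the kubelet reads the CNI configuration files according to its CNI-related startup flags and initializes the CNI network plugin. Later, when pods are created or deleted, it calls the SetUpPod and TearDownPod methods to set up and tear down the pod network. Initialization also starts a goroutine that periodically re-checks the CNI configuration files and executables, so that changes are picked up without restarting the kubelet.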

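Setting up the pod network with CNI

When the kubelet creates a pod, it creates and starts the pod sandbox through the CRI, and the CRI then calls the CNI network plugin to set up the pod network.

The kubelet method that sets up the pod network through CNI is SetUpPod in pkg/kubelet/dockershim/network/cni/cni.go.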

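Tearing down the pod network with CNI

When the kubelet deletes a pod, the CRI calls the CNI network plugin to tear down the pod network.

The kubelet method that tears down the pod network through CNI is TearDownPod in pkg/kubelet/dockershim/network/cni/cni.go.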
