What makes Alibaba Cloud ECS unique

Background: When creating a Kubernetes cluster, the container network typically uses an independent private subnet to build the pod network and the service network inside the cluster. In real business scenarios, however, no company migrates all of its internal services into the Kubernetes cluster at once (for reasons of business architecture and overall reliability). This leads to scenarios where services inside the Kubernetes cluster and services outside the cluster need to call each other. For HTTP services we can use proxies such as LVS, Nginx, or HAProxy to route traffic between the inside and outside of the cluster. However, for a TCP-based RPC framework like Dubbo, providers and consumers need to be directly connected; if the provider and consumer are not on interoperable networks, things become much more problematic. So the first problem large companies need to solve when using Kubernetes at scale is the network. For example, some use Contiv with the BGP model to connect the container network to the external network, which normally requires a professional SDN team to build and maintain. As a startup, we run our business on a public cloud; the advantage of this asset-light model is that a professional team maintains the underlying infrastructure. Hence, we use Alibaba Cloud's Terway network plug-in to build the internal network of our Kubernetes cluster.

Existing network plug-ins

  • Flannel: the earliest open-source network plug-in from the CoreOS team. It gives containers created on different nodes a cluster-wide unique virtual address (not routable outside the cluster). It is also one of the more mature of the current open-source Kubernetes solutions and supports HostGW and VXLAN modes.
  • Calico: a pure layer-3 network solution for data centers that supports IPIP and BGP modes. The latter can integrate seamlessly with an IaaS cloud architecture such as OpenStack and enable controllable IP communication between VMs, containers, and bare-metal machines, but it requires the network devices to support BGP (Alibaba Cloud VPC subnets do not appear to support BGP). It also provides iptables-based network policy control.
  • Contiv: Cisco's open-source container network architecture for heterogeneous container deployments on virtual machines, bare metal, public clouds, or private clouds. It supports both layer-2 and layer-3 networks (and usually requires BGP support).
  • Terway: Alibaba Cloud's open-source CNI plug-in based on the VPC network. It supports VPC and ENI modes; the latter lets the container network use the VPC subnet directly.

These are the network solutions more widely used in current open-source Kubernetes clusters. Our business also requires the networks inside and outside the containers to be interconnected, so for reasons of cost, efficiency, and stability we chose Alibaba Cloud's Terway network solution for our Kubernetes cluster.

Create a Kubernetes cluster with the Terway network on Alibaba Cloud ECS

Alibaba Cloud Container Service for Kubernetes (ACK) also supports two networks by default: Flannel and Terway. The former is essentially the same as the open-source plug-in, while the latter supports VPC mode and ENI mode. In VPC mode, the container network can use a vSwitch subnet address within the VPC, but by default it cannot communicate with ECS hosts under other vSwitches. ENI mode assigns an elastic network interface (ENI) to the pod so that it can connect to the network outside the cluster. However, ENI mode under the Terway network requires specific instance types.

Following ACK's instance type requirements for Terway's ENI mode, we purchased ECS instances ourselves, created a single-node cluster, and tested container connectivity under the Terway network.

  • The VPC subnet has already been created
  • Create two vSwitches under the VPC (one simulating the Kubernetes cluster network, the other the ECS network)
  • Purchase two ECS hosts, one in each of the two subnets (to simulate the connection between ECS and containers)

Note that the Terway network plug-in has only been verified against specific official OS images, so you need to be careful which image you choose when purchasing ECS instances.

1. Use kubeadm to install a single-node k8s cluster

Since we want to use the Terway network to connect the pod and ECS networks, the relevant kernel parameters all need to be set so that the source address of incoming packets is not checked (i.e. reverse-path filtering is disabled).
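
The exact parameter name is not preserved above, but the description (source addresses of packets are not checked) matches disabling reverse-path filtering. A minimal sketch of this step, assuming rp_filter is the parameter in question and using placeholder CIDRs:

```bash
# Disable reverse-path source-address checking so that the asymmetric traffic
# introduced by the Terway network is not dropped (rp_filter is an assumption
# based on the description above).
cat <<EOF >/etc/sysctl.d/99-terway.conf
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.rp_filter = 0
EOF
sysctl --system

# Bootstrap the single-node v1.16 cluster with kubeadm.
# The CIDRs below are placeholders, not values from this article.
kubeadm init \
  --kubernetes-version v1.16.3 \
  --pod-network-cidr 172.20.0.0/16 \
  --service-cidr 10.21.0.0/20

# Make kubectl usable for the current user.
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
```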

2. Create a Terway network for the k8s cluster

The k8s cluster created with kubeadm here is v1.16, so the DaemonSet-related parts of the official Terway yaml file need to be adjusted slightly.
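
For reference, the main adjustment is the DaemonSet apiVersion: v1.16 no longer serves the extensions/v1beta1 DaemonSet API, so the manifest must use apps/v1 and declare an explicit selector. An illustrative fragment follows (the names and labels are assumptions, not the actual official manifest, and the fragment is meant to be merged into the official yaml rather than applied on its own):

```bash
# Illustrative only: the official Terway yaml keeps its own container spec;
# what changes for v1.16 is the apiVersion and the required spec.selector.
cat <<'EOF' > terway-daemonset-fragment.yaml
apiVersion: apps/v1          # was extensions/v1beta1 in older manifests
kind: DaemonSet
metadata:
  name: terway
  namespace: kube-system
spec:
  selector:                  # apps/v1 requires an explicit selector
    matchLabels:
      app: terway
  template:
    metadata:
      labels:
        app: terway
    # ... the rest of the pod template is unchanged from the official yaml ...
EOF
```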

At this point we have completed a single-node Kubernetes cluster running the Terway network. Next we can try to have the pods in the k8s cluster use the VPC network, so that the container network in the cluster sits at the same level as the networks of the other ECS hosts.

3. Test the Terway network

We are using a single-node k8s cluster created by kubeadm, and kubeadm taints the master node by default, so the taint needs to be removed before testing.
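
A sketch of this test, assuming a throwaway nginx deployment (the names and pod IP placeholders are mine, not from the article):

```bash
# Allow workloads on the single (master) node by removing the default taint.
kubectl taint nodes --all node-role.kubernetes.io/master-

# Run a small test workload and check the pod IPs handed out by Terway.
kubectl create deployment nginx-test --image=nginx:alpine
kubectl scale deployment nginx-test --replicas=2
kubectl get pods -o wide

# In-cluster check: fetch one pod's page from another pod using its pod IP.
kubectl exec <pod-name> -- wget -qO- http://<another-pod-ip> | head -n 4
```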

It can be seen that there is no problem using the Terway network within the cluster. However, we found that the pod network is still unreachable from other ECS hosts (because the default is Terway's VPC mode, which is actually similar to Calico's mode). At this point you have to use ENI mode, i.e. attach elastic network interfaces to the k8s node; pod network traffic is then all carried through the node's ENI and can thus connect directly to the entire VPC intranet.

4. Test the ENI mode

Add the ENI-related setting to the Terway configuration above. Note that N represents the number of elastic network interfaces on the node, and that number depends on the ENI limit of the particular Alibaba Cloud ECS instance type.

Alibaba Cloud ECS instance type details
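
To look up the ENI limit of a given instance type, one option is the ECS DescribeInstanceTypes API; the sketch below assumes a configured aliyun CLI and the EniQuantity field of that response, with an illustrative instance type:

```bash
# Query instance-type metadata and pull out the per-type ENI limit
# (EniQuantity); ecs.g5.large is only an example type.
aliyun ecs DescribeInstanceTypes --RegionId cn-hangzhou \
  | jq '.InstanceTypes.InstanceType[]
        | select(.InstanceTypeId == "ecs.g5.large")
        | {InstanceTypeId, EniQuantity}'
```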

5. Test the connection of internal and external networks in the cluster

The k8s cluster uses the VPC network, so by default there is no problem with pods in the cluster accessing the external ECS network. The main purpose here is to test that external ECS hosts can connect directly to the pod network.
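
A minimal way to run this test (the pod name and addresses are placeholders):

```bash
# On the cluster: start a pod and note its pod IP, which is a VPC vSwitch address.
kubectl run nginx-eni --image=nginx:alpine --restart=Never
kubectl get pod nginx-eni -o wide          # note the IP column

# On an ECS host outside the cluster (in the other vSwitch): reach the pod IP directly.
ping -c 3 <pod-ip>
curl -I http://<pod-ip>
```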

At this point, if we check the NIC information on the node, we can see that two additional NICs have been attached.
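
For example, the extra interfaces can be listed on the node like this:

```bash
# The ENIs attached for pod traffic appear as additional interfaces
# (e.g. eth1, eth2) next to the primary eth0.
ip -brief link show
ip -brief addr show
```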

6. Other problems

As mentioned earlier, when using ENI mode, the number of elastic network interfaces that can be bound is limited and varies with the ECS instance type, which means the number of containers that can connect this way is also limited. Let's verify this here.

It can be seen that when using the Terway network in ENI mode, once the number of ENIs the ECS instance supports reaches its limit, Kubernetes can no longer schedule new pods.
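
One way to observe this is to scale the test workload beyond the node's ENI limit and look at the pods that stay in Pending (the pod name below is a placeholder):

```bash
# Pods that cannot get an ENI remain Pending; their Events explain why the
# scheduler could not place them.
kubectl get pods -o wide | grep Pending
kubectl describe pod <pending-pod-name> | tail -n 20
```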

So here comes the problem: we usually want to use k8s for elastic scaling and run more pods on a k8s node, but after adopting the Terway network we found that the number of pods that can communicate with ECS hosts outside the k8s cluster is limited by the ENI quota. Is there a way around this?

Don't worry, the answer from Alibaba Cloud is that in this case setting static routes on the VPC can achieve intercommunication between multiple pods on the node and ECS hosts outside the cluster. In that setup, the ENI on the ECS host effectively serves as the network egress for all containers on the node. In fact, ENI mode is not strictly required here, because with Terway the pod network is already globally unique; by adding the corresponding static routes, the k8s container network and the ECS host network outside the cluster can be interconnected across the entire VPC, which solves our original problem very well.
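
A sketch of such a static route, assuming the aliyun CLI and the VPC CreateRouteEntry action (the route table ID, CIDR, and instance ID are placeholders): the destination is the pod CIDR that Terway assigned to the node, and the next hop is that node's ECS instance.

```bash
# Add a custom route entry in the VPC route table so that traffic destined for
# the node's pod CIDR is forwarded to that node's ECS instance.
aliyun vpc CreateRouteEntry \
  --RegionId cn-hangzhou \
  --RouteTableId vtb-xxxxxxxx \
  --DestinationCidrBlock 172.20.1.0/24 \
  --NextHopType Instance \
  --NextHopId i-xxxxxxxx
```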

Alibaba Cloud Container Service (ACK) clusters in Terway network mode create these routing rules by default. So if you use an ACK cluster, you only need to purchase nodes of instance types that support ENIs, and by default the containers created can all connect and communicate with external ECS hosts. In this case, the elastic network interface created on the ECS instance serves as the network egress for the k8s containers on that node, while the ECS host's own NIC exists only as a management network. Interested readers can try Alibaba Cloud ACK's Terway network mode.

That's the full text. Welcome to follow my official account: BGBiao, and let's make progress together ~