This blog post describes how to deploy a TiDB cluster on Amazon Elastic Kubernetes Service (EKS). TiDB on Kubernetes is the standard way to deploy TiDB on public clouds.
Install the AWS, kubectl & eksctl CLIs
Install AWS CLI
MAC – Install and configure AWS CLI
# Download the binary
curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"

# Install the binary
sudo installer -pkg ./AWSCLIV2.pkg -target /

# Verify the installation
aws --version
Reference: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-mac.html
Windows 10 – Install and configure AWS CLI
- The AWS CLI version 2 is supported on Windows XP or later.
- The AWS CLI version 2 supports only 64-bit versions of Windows.
- Download Binary: https://awscli.amazonaws.com/AWSCLIV2.msi
- Install the downloaded binary (standard windows install)
# Verify the installation
aws --version
Reference: https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-windows.html
Configure AWS Command Line using Security Credentials
- Go to AWS Management Console –> Services –> IAM
- Select the IAM User: <user>
- **Important Note:** Generate **Security Credentials** only for an IAM user. Never use the Root User for this; it is strongly discouraged.
- Click on **Security credentials** tab
- Click on **Create access key**
- Copy the Access key ID and Secret access key
- Go to the command line and provide the required details:
aws configure
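When you run aws configure, it prompts for four values. The entries below are placeholders for illustration only, not real credentials; the region and output format are assumptions you should adjust to your own setup:

AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: ap-northeast-1
Default output format [None]: json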
Test if AWS CLI is working after configuring the above:
aws ec2 describe-vpcs
Install kubectl CLI
- For EKS, prefer the kubectl binaries provided by Amazon.
- This ensures the kubectl client version matches your EKS cluster version. You can use the documentation link below to download the binary.
- Reference: https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html
MAC – Install and configure kubectl
# Download the Package
mkdir kubectlbinary
cd kubectlbinary
curl -o kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/darwin/amd64/kubectl

# Provide execute permissions
chmod +x ./kubectl

# Set the Path by copying to user Home Directory
mkdir -p $HOME/bin && cp ./kubectl $HOME/bin/kubectl && export PATH=$PATH:$HOME/bin
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bash_profile

# Verify the kubectl version
kubectl version --short --client

Output:
Client Version: v1.16.8-eks-e16311
Windows 10 – Install and configure kubectl

# Download the Package
mkdir kubectlbinary
cd kubectlbinary
curl -o kubectl.exe https://amazon-eks.s3.us-west-2.amazonaws.com/1.16.8/2020-04-16/bin/windows/amd64/kubectl.exe

- Update the system **Path** environment variable

# Verify the kubectl client version
kubectl version --short --client
Install eksctl CLI
eksctl on Mac
# Install Homebrew on MacOS
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

# Install the Weaveworks Homebrew tap
brew tap weaveworks/tap

# Install eksctl
brew install weaveworks/tap/eksctl

# Verify eksctl version
eksctl version
eksctl on Windows or Linux
- For Windows and Linux, refer to the documentation link below.
- Reference: https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl
Getting started with Amazon EKS – eksctl
https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html
TiDB Architecture
TiDB is designed to consist of multiple components. These components communicate with each other and form a complete TiDB system. The architecture is as follows:
TiDB server
The TiDB server is a stateless SQL layer that exposes the connection endpoint of the MySQL protocol to the outside. The TiDB server receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan. It is horizontally scalable and provides a unified interface to the outside through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It does not store data; it only performs computing and SQL analysis, transmitting actual data read requests to TiKV nodes (or TiFlash nodes).
Placement Driver (PD) server
The PD server is the metadata management component of the entire cluster. It stores the metadata of the real-time data distribution of every single TiKV node and the topology of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions. The PD server is “the brain” of the entire TiDB cluster because it not only stores the cluster metadata, but also sends data scheduling commands to specific TiKV nodes according to the data distribution state that TiKV nodes report in real time. In addition, the PD server consists of at least three nodes and has high availability. It is recommended to deploy an odd number of PD nodes.
Storage servers
TiKV server
The TiKV server is responsible for storing data. TiKV is a distributed transactional key-value storage engine. Region is the basic unit for storing data: each Region stores the data for a particular key range, a left-closed, right-open interval from StartKey to EndKey, and multiple Regions exist on each TiKV node. The TiKV APIs provide native support for distributed transactions at the key-value pair level and support snapshot isolation by default. This is the core of how TiDB supports distributed transactions at the SQL level. After processing SQL statements, the TiDB server converts the SQL execution plan into actual calls to the TiKV API; therefore, data is stored in TiKV. All the data in TiKV is automatically maintained in multiple replicas (three replicas by default), so TiKV has native high availability and supports automatic failover.
TiFlash server
The TiFlash Server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column, mainly designed to accelerate analytical processing.
Create an EKS cluster and a node pool
It is recommended to create a node pool in each availability zone (at least 3 in total) for each component when creating an EKS cluster.
References:
https://aws.amazon.com/blogs/containers/amazon-eks-cluster-multi-zone-auto-scaling-groups/
Save the following configuration as the cluster.yaml file. Replace ${clusterName} with your preferred cluster name, and specify your preferred region.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${clusterName}
  region: ap-northeast-1

nodeGroups:
  - name: admin
    desiredCapacity: 1
    privateNetworking: true
    labels:
      dedicated: admin
  - name: tidb-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
  - name: tidb-1d
    desiredCapacity: 0
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
  - name: tidb-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: tidb
    taints:
      dedicated: tidb:NoSchedule
  - name: pd-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
  - name: pd-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
  - name: pd-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: pd
    taints:
      dedicated: pd:NoSchedule
  - name: tikv-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
  - name: tikv-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
  - name: tikv-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: tikv
    taints:
      dedicated: tikv:NoSchedule
Create Cluster
eksctl create cluster -f cluster.yaml
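Cluster creation typically takes several minutes. Once it finishes, a quick sanity check (a hedged sketch; the cluster name and region must match your cluster.yaml):

# Confirm kubectl is pointed at the new cluster and the labelled nodes are present
kubectl get nodes --show-labels | grep dedicated

# List the node groups that eksctl created
eksctl get nodegroup --cluster ${clusterName} -r ap-northeast-1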
Deploy TiDB Operator
This section describes how to deploy TiDB Operator on AWS EKS.
Install Helm (Prerequisite)
MAC – Install Helm
brew install helm
Windows 10 – Install Helm
choco install kubernetes-helm
Create CRD
TiDB Operator uses Custom Resource Definition (CRD) to extend Kubernetes. Therefore, to use TiDB Operator, you must first create the TidbCluster CRD, which is a one-time job in your Kubernetes cluster.
Create a file called crd.yaml. Copy the configuration from the link below.
https://raw.githubusercontent.com/pingcap/tidb-operator/master/manifests/crd.yaml
Create the TidbCluster CRD by executing the command below:
kubectl apply -f crd.yaml
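To confirm that the CRDs were registered, you can list them (a minimal check; the grep pattern assumes the pingcap.com CRD naming used by tidb-operator):

# The TidbCluster CRD and related CRDs should appear in the list
kubectl get crd | grep pingcap.com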
Add the PingCAP repository
helm repo add pingcap https://charts.pingcap.org/
Expected output:
“pingcap” has been added to your repositories
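Optionally, refresh the repository index and confirm the chart is available before installing (standard Helm commands):

# Refresh the local chart index
helm repo update

# List the tidb-operator chart versions published in the pingcap repo
helm search repo pingcap/tidb-operator -l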
Create a namespace for TiDB Operator
kubectl create namespace tidb-admin
Expected output:
namespace/tidb-admin created
Install TiDB Operator
helm install --namespace tidb-admin tidb-operator pingcap/tidb-operator --version v1.2.1
To confirm that the TiDB Operator components are running, execute the following command:
kubectl get pods --namespace tidb-admin -l app.kubernetes.io/instance=tidb-operator
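If you prefer to block until the operator is up instead of polling, kubectl wait can be used (a sketch; the timeout value is arbitrary):

# Wait until all tidb-operator pods report Ready (up to 5 minutes)
kubectl -n tidb-admin wait --for=condition=Ready pods -l app.kubernetes.io/instance=tidb-operator --timeout=300s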
Deploy a TiDB cluster and the Monitoring Component
This section describes how to deploy a TiDB cluster and its monitoring component in AWS EKS.
Create namespace
kubectl create namespace tidb-cluster
Note: A namespace is a virtual cluster backed by the same physical cluster. This document takes tidb-cluster as an example. If you want to use a different namespace, modify the corresponding arguments of -n or --namespace.
Deploy
Download the sample TidbCluster and TidbMonitor configuration files:
curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aws/tidb-cluster.yaml && \
curl -O https://raw.githubusercontent.com/pingcap/tidb-operator/master/examples/aws/tidb-monitor.yaml
Execute the commands below to deploy the TiDB cluster and its monitoring component:
kubectl apply -f tidb-cluster.yaml -n tidb-cluster
kubectl apply -f tidb-monitor.yaml -n tidb-cluster
After the yaml file above is applied to the Kubernetes cluster, TiDB Operator creates the desired TiDB cluster and its monitoring component according to the yaml file.
Verify Cluster & Nodes
View cluster status
kubectl get pods -n tidb-cluster
When all the Pods are in the Running or Ready state, the TiDB cluster is successfully started.
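You can also inspect the TidbCluster object itself, which summarizes the status of each component (a hedged check using the custom resource created above; tc is the short name registered by the CRD):

# Inspect the TidbCluster custom resource
kubectl get tc -n tidb-cluster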
List worker nodes
List the nodes in the current Kubernetes cluster:
kubectl get nodes -o wide
Verify Cluster, NodeGroup in EKS Management Console
Go to Services -> Elastic Kubernetes Service -> ${clusterName}
Verify Worker Node IAM Role and list of Policies
Go to Services -> EC2 -> Worker Nodes
Verify CloudFormation Stacks
Verify Control Plane Stack & Events
Verify NodeGroup Stack & Events
Below are the associated NodeGroup Events
Access the Database
You can access the TiDB database to test or develop your application after you have deployed a TiDB cluster.
Prepare a bastion host
The LoadBalancer created for your TiDB cluster is an intranet LoadBalancer. You can create a bastion host in the cluster VPC to access the database.
Select the cluster’s VPC and Subnet and verify whether the cluster name is correct in the dropdown box.
You can view the cluster’s VPC and Subnet by running the following command:
eksctl get cluster -n tidbcluster -r ap-northeast-1
Allow the bastion host to access the Internet. Select the correct key pair so that you can log in to the host via SSH.
Install the MySQL client and connect
sudo yum install mysql -y
Connect the client to the TiDB cluster
mysql -h ${tidb-nlb-dnsname} -P 4000 -u root
${tidb-nlb-dnsname} is the LoadBalancer domain name of the TiDB service. You can view the domain name in the EXTERNAL-IP field by executing kubectl get svc basic-tidb -n tidb-cluster.
kubectl get svc basic-tidb -n tidb-cluster
Check TiDB version
select tidb_version()\G
Create test table
use test;

create table test_table (id int unsigned not null auto_increment primary key, v varchar(32));

select * from information_schema.tikv_region_status where db_name=database() and table_name='test_table'\G
select * from information_schema.tikv_store_status\G
Query the TiDB cluster information
select * from information_schema.cluster_info\G
Access the Grafana Monitoring Dashboard
Obtain the LoadBalancer domain name of Grafana
kubectl -n tidb-cluster get svc basic-grafana
In the output below, the EXTERNAL-IP column is the LoadBalancer domain name.
You can access the ${grafana-lb}:3000 address using your web browser to view monitoring metrics. Replace ${grafana-lb} with the LoadBalancer domain name.
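If you prefer not to expose Grafana through the LoadBalancer, a local port-forward works as well (a sketch; stop it with Ctrl+C when finished):

# Forward local port 3000 to the Grafana service, then browse http://localhost:3000
kubectl -n tidb-cluster port-forward svc/basic-grafana 3000:3000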
Upgrade
To upgrade the TiDB cluster, edit the spec.version by executing the command below.
kubectl edit tc basic -n tidb-cluster
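Alternatively, the same field can be changed non-interactively with kubectl patch; the version string below is only an example, so substitute the TiDB version you actually want:

# Set spec.version on the TidbCluster named "basic" (example version)
kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"version":"v5.2.1"}}'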
Scale out
Before scaling out the cluster, you need to scale out the corresponding node group so that the new instances have enough resources for operation.
This section describes how to scale out the EKS node group and TiDB components.
Scale out EKS node group
When scaling out TiKV, the node groups must be scaled out evenly among the different availability zones. The following example shows how to scale out the tikv-1a, tikv-1c, and tikv-1d groups of the ${clusterName} cluster to 2 nodes.
eksctl scale nodegroup --cluster ${clusterName} --name tikv-1a --nodes 2 --nodes-min 2 --nodes-max 2
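The same command applies to the other two availability zones, assuming the node group names from cluster.yaml above:

eksctl scale nodegroup --cluster ${clusterName} --name tikv-1c --nodes 2 --nodes-min 2 --nodes-max 2
eksctl scale nodegroup --cluster ${clusterName} --name tikv-1d --nodes 2 --nodes-min 2 --nodes-max 2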
Scale out TiDB components
After scaling out the EKS node group, execute kubectl edit tc basic -n tidb-cluster, and modify each component’s replicas to the desired number of replicas. The scaling-out process is then completed.
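As with the upgrade step, a component's replica count can also be patched directly instead of editing interactively; the number below is illustrative only:

# Scale TiKV to 6 replicas (2 per availability zone in this example)
kubectl patch tc basic -n tidb-cluster --type merge -p '{"spec":{"tikv":{"replicas":6}}}'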
Deploy TiFlash/TiCDC
TiFlash is the columnar storage extension of TiKV.
TiCDC is a tool for replicating the incremental data of TiDB by pulling TiKV change logs.
In the eksctl configuration file (cluster.yaml), add the following node groups for TiFlash and TiCDC. desiredCapacity is the number of nodes you desire.
  - name: tiflash-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: tiflash-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: tiflash-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: tiflash
    taints:
      dedicated: tiflash:NoSchedule
  - name: ticdc-1a
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1a"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
  - name: ticdc-1d
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1d"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
  - name: ticdc-1c
    desiredCapacity: 1
    privateNetworking: true
    availabilityZones: ["ap-northeast-1c"]
    labels:
      dedicated: ticdc
    taints:
      dedicated: ticdc:NoSchedule
Depending on the EKS cluster status, use different commands:
- If the cluster is not created, execute eksctl create cluster -f cluster.yaml to create the cluster and node groups.
- If the cluster is already created, execute eksctl create nodegroup -f cluster.yaml to create the node groups. The existing node groups are ignored and will not be created again.
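In either case, you can verify the resulting node groups afterwards (the cluster name and region below are assumptions matching the earlier examples):

# List all node groups, including the new tiflash-* and ticdc-* groups
eksctl get nodegroup --cluster ${clusterName} -r ap-northeast-1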
Deploy TiFlash/TiCDC
To deploy TiFlash, configure spec.tiflash in tidb-cluster.yaml:
spec:
  ...
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 1
    storageClaims:
    - resources:
        requests:
          storage: 100Gi
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: tiflash
To deploy TiCDC, configure spec.ticdc in tidb-cluster.yaml, as sketched below:
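This is a minimal spec.ticdc sketch that mirrors the TiFlash block above; the image, replica count, and tolerations are assumptions you should adjust to your environment:

spec:
  ...
  ticdc:
    baseImage: pingcap/ticdc
    replicas: 1          # illustrative replica count
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: ticdc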
Finally, execute the command below to update the TiDB cluster configuration:

kubectl -n tidb-cluster apply -f tidb-cluster.yaml
View Cluster Status
kubectl get pods -n tidb-cluster
Delete EKS Cluster & Node Groups
This section describes how to delete the EKS cluster and node groups.
List EKS Clusters
eksctl get clusters -r ap-northeast-1
Delete Clusters
eksctl delete cluster tidbcluster -r ap-northeast-1
Or, equivalently:
eksctl delete cluster --region=ap-northeast-1 --name=tidbcluster
Cheers!