【Produced by 星海】The Kubernetes Scheduler and Leader Election
My Kubernetes notes keep growing, so this is a standalone write-up on the kube-scheduler and its leader election.
The scheduler uses the watch mechanism to discover Pods in the cluster that have been newly created and are not yet scheduled (unscheduled) to any node.
Because the containers in a Pod, and the Pod itself, may have different requirements, the scheduler filters out any node that does not satisfy the Pod's specific scheduling needs.
It finds all feasible nodes for the Pod in the cluster, scores those feasible nodes with a series of scoring functions, and picks the highest-scoring node to run the Pod.
The scheduler then notifies the kube-apiserver of this scheduling decision; this step is called binding.
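As a rough illustration of the score-and-pick step, here is a small Go sketch. This is not the real kube-scheduler code; the scoring rule, favouring the node with the most allocatable CPU, is purely an assumption for the example.

package sketch

import v1 "k8s.io/api/core/v1"

// pickNode returns the "best" node among the feasible ones, or nil if none fit.
// The score here is simply the node's allocatable CPU in millicores.
func pickNode(feasible []v1.Node) *v1.Node {
    var best *v1.Node
    bestScore := int64(-1)
    for i := range feasible {
        score := feasible[i].Status.Allocatable.Cpu().MilliValue() // scoring function
        if score > bestScore {
            bestScore = score
            best = &feasible[i]
        }
    }
    return best // nil means no feasible node, i.e. the Pod stays Pending
}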
Checking whether the scheduler is healthy
kubectl get pods -n kube-system | grep kube-scheduler
If scheduling fails, you can inspect the reason with kubectl describe pod.
The Pod object is already persisted in etcd, so to retry scheduling the Pod has to be deleted:
kubectl delete pod <pod-name>
Running as a static Pod
On kubeadm-installed clusters the kube-scheduler runs as a static Pod whose manifest lives under /etc/kubernetes/manifests, e.g. kube-scheduler.yaml. The scheduler's own behavior is configured with a KubeSchedulerConfiguration file passed via --config, for example:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/kubeconfig
  qps: 100
  burst: 150
profiles:
- schedulerName: default-scheduler
  plugins:
    postFilter:
      disabled:
      - name: DefaultPreemption
    preFilter:
      enabled:
      - name: CheckCSIStorageCapacity
    filter:
      enabled:
      - name: CheckPodCountLimit
      - name: CheckPodLimitResources
      - name: CheckCSIStorageCapacity
      - name: LvmVolumeCapacity
  pluginConfig:
  - name: CheckPodCountLimit
    args:
      podCountLimit: 2
  - name: CheckPodLimitResources
    args:
      limitRatio:
        cpu: 0.7
        memory: 0.7
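Plugins such as CheckPodCountLimit, CheckPodLimitResources, and LvmVolumeCapacity in the profile above are not shipped with Kubernetes; they are out-of-tree plugins compiled into a custom scheduler binary through the scheduling framework. As a hedged sketch of what such a Filter plugin might look like (only the plugin name and the podCountLimit argument come from the config above; the rest is assumed, and exact signatures vary between Kubernetes versions):

package podcountlimit

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

// CheckPodCountLimit rejects nodes that already run podCountLimit or more Pods.
type CheckPodCountLimit struct {
    podCountLimit int
}

var _ framework.FilterPlugin = &CheckPodCountLimit{}

func (p *CheckPodCountLimit) Name() string { return "CheckPodCountLimit" }

func (p *CheckPodCountLimit) Filter(ctx context.Context, state *framework.CycleState,
    pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    if len(nodeInfo.Pods) >= p.podCountLimit {
        return framework.NewStatus(framework.Unschedulable,
            fmt.Sprintf("node already has %d pods", len(nodeInfo.Pods)))
    }
    return nil // nil status means the node passes this filter
}

Such a plugin still has to be registered with the scheduler binary (typically via app.NewSchedulerCommand(app.WithPlugin(...)) from k8s.io/kubernetes/cmd/kube-scheduler/app) before a profile can enable it.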
Running the scheduler in a highly available setup requires leader election to be configured explicitly.
# Example startup command
kube-scheduler \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s
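Under the hood these flags drive a Lease-based lock from client-go's leaderelection package, and a custom scheduler can reuse the same mechanism. A minimal sketch follows; the lock name custom-scheduler and the kube-system namespace are assumptions that match the manifests later in this article.

package main

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/leaderelection"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
    "k8s.io/klog/v2"
)

func main() {
    config, err := rest.InClusterConfig()
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)

    id, _ := os.Hostname() // each replica needs a unique identity

    // The lock is a coordination.k8s.io/v1 Lease, the same kind kube-scheduler uses.
    lock := &resourcelock.LeaseLock{
        LeaseMeta:  metav1.ObjectMeta{Name: "custom-scheduler", Namespace: "kube-system"},
        Client:     clientset.CoordinationV1(),
        LockConfig: resourcelock.ResourceLockConfig{Identity: id},
    }

    leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
        Lock:            lock,
        LeaseDuration:   15 * time.Second, // mirrors --leader-elect-lease-duration
        RenewDeadline:   10 * time.Second, // mirrors --leader-elect-renew-deadline
        RetryPeriod:     2 * time.Second,  // mirrors --leader-elect-retry-period
        ReleaseOnCancel: true,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: func(ctx context.Context) {
                klog.Info("became leader, starting the scheduling loop")
                // run the scheduling loop here
            },
            OnStoppedLeading: func() {
                klog.Info("lost leadership, exiting")
            },
        },
    })
}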
Writing your own Kubernetes extension scheduler
https://github.com/kubernetes/kubernetes/tree/master/plugin/pkg/scheduler
package main

import (
    "context"

    "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/klog/v2"
)

func main() {
    // 1. Configure the Kubernetes client.
    config, err := rest.InClusterConfig() // in-cluster mode
    // config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // out-of-cluster mode (import k8s.io/client-go/tools/clientcmd)
    if err != nil {
        klog.Fatal(err)
    }
    clientset := kubernetes.NewForConfigOrDie(config)

    // 2. Define the scheduling logic (example: schedule by CPU resources).
    scheduler := NewCustomScheduler(clientset)
    scheduler.Schedule()
}

type CustomScheduler struct {
    clientset *kubernetes.Clientset
}

func NewCustomScheduler(clientset *kubernetes.Clientset) *CustomScheduler {
    return &CustomScheduler{clientset: clientset}
}

func (s *CustomScheduler) Schedule() {
    // 3. List Pods that are waiting to be scheduled.
    podList, err := s.clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=", // Pods not yet bound to a node
    })
    if err != nil {
        klog.Fatal(err)
    }
    for _, pod := range podList.Items {
        // 4. Filter candidate nodes.
        nodes, err := s.clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            klog.Fatal(err)
        }
        var suitableNodes []v1.Node
        for _, node := range nodes.Items {
            if s.isNodeSuitable(pod, node) {
                suitableNodes = append(suitableNodes, node)
            }
        }
        // 5. Pick the "best" node (example: simply take the first one).
        if len(suitableNodes) > 0 {
            selectedNode := suitableNodes[0] // a real scheduler needs a proper selection algorithm
            s.bindPodToNode(pod, selectedNode)
        }
    }
}

func (s *CustomScheduler) isNodeSuitable(pod v1.Pod, node v1.Node) bool {
    // Example: check whether the node's allocatable CPU covers the Pod's CPU request.
    cpuRequest := pod.Spec.Containers[0].Resources.Requests.Cpu()
    if cpuRequest.IsZero() {
        return true // no CPU request, allow by default
    }
    nodeCPU := node.Status.Allocatable.Cpu()
    return nodeCPU.Cmp(*cpuRequest) >= 0
}

func (s *CustomScheduler) bindPodToNode(pod v1.Pod, node v1.Node) {
    // 6. Bind the Pod to the selected node.
    binding := &v1.Binding{
        ObjectMeta: metav1.ObjectMeta{
            Name:      pod.Name,
            Namespace: pod.Namespace,
        },
        Target: v1.ObjectReference{
            APIVersion: "v1",
            Kind:       "Node",
            Name:       node.Name,
        },
    }
    err := s.clientset.CoreV1().Pods(pod.Namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
    if err != nil {
        klog.Errorf("Failed to bind Pod %s to Node %s: %v", pod.Name, node.Name, err)
    } else {
        klog.Infof("Pod %s bound to Node %s", pod.Name, node.Name)
    }
}
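One caveat about the loop above: it lists every unscheduled Pod, so it would compete with the default scheduler for Pods that never asked for it. Since the Pod example at the end of this article selects the scheduler via spec.schedulerName: custom-scheduler, a reasonable refinement (a sketch reusing the loop variables above) is to skip other Pods:

for _, pod := range podList.Items {
    // Only handle Pods that explicitly requested this scheduler.
    if pod.Spec.SchedulerName != "custom-scheduler" {
        continue
    }
    // ... filter nodes and bind as in Schedule() above ...
}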
# Dockerfile
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
RUN go mod tidy
RUN CGO_ENABLED=0 GOOS=linux go build -o custom-scheduler

FROM alpine:latest
COPY --from=builder /app/custom-scheduler /usr/local/bin/
ENTRYPOINT ["custom-scheduler"]
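The RUN go mod tidy step assumes the source tree already contains a go.mod. A minimal one could look like the following (the module path is a placeholder; go mod tidy resolves the k8s.io dependencies to concrete versions during the build):

module example.com/custom-scheduler // hypothetical module path

go 1.20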
https://www.qikqiak.com/k8strain/scheduler/overview/
Build the image and push it to an image registry:
docker build -t your-dockerhub-id/custom-scheduler:v1 .
docker push your-dockerhub-id/custom-scheduler:v1
Deploying multiple replicas
# custom-scheduler-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
spec:
  replicas: 3 # run 3 instances
  selector:
    matchLabels:
      app: custom-scheduler
  template:
    metadata:
      labels:
        app: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler # ServiceAccount created in the RBAC step below
      containers:
      - name: scheduler
        image: your-dockerhub-id/custom-scheduler:v1
        args:
        - --leader-elect=true # enable leader election
        - --leader-elect-lease-duration=15s
        - --leader-elect-renew-deadline=10s
        - --leader-elect-retry-period=2s
        - --v=2 # log verbosity
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: "0.5"
            memory: 256Mi
Create a ServiceAccount for the scheduler and bind it to a ClusterRole:
# custom-scheduler-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-scheduler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-scheduler-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: custom-scheduler
  namespace: kube-system
Register the scheduler as a Kubernetes component
# custom-scheduler-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta3
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: true
      resourceLock: leases
      resourceName: custom-scheduler
      resourceNamespace: kube-system
Verify that the custom scheduler Pods are running:
kubectl get pods -n kube-system | grep custom-scheduler
Create a Pod that uses the custom scheduler
# custom-scheduled-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: custom-scheduled-pod
spec:
  schedulerName: custom-scheduler # use the custom scheduler
  containers:
  - name: nginx
    image: nginx
    resources:
      requests:
        cpu: "500m"
Check the result:
kubectl get pod custom-scheduled-pod -o wide