[K8S] Recreating a failed K8S Job so that it runs again
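Sometimes a Kubernetes Job turns out to have failed,
but a Job normally cannot be re-run.
So how do you run the Job again once the error has been fixed?
After some experimenting, the only options seem to be creating a new Job or replacing the original one…
For example, first export the Job's definition to a file: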
kubectl -n my-app get jobs backend-sync-27892800 -o yaml > job.yaml
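Open the file and delete everything in it that is related to uid
(these fields interfere with registering the Job, probably because two Jobs cannot share the same UID)~
Taking this YAML as an example, the fields to delete are:
- metadata.labels.controller-uid
- metadata.ownerReferences[0].uid
- metadata.uid
- spec.selector.matchLabels.controller-uid
- spec.template.metadata.labels.controller-uid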
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2023-01-13T00:00:00Z"
  labels:
    controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
    job-name: backend-sync-27892800
  name: backend-sync-27892800
  namespace: my-app
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: backend-sync
    uid: 65786d95-15da-4e8e-8c64-c2e8de8d367f
  resourceVersion: "3374934"
  uid: db71ae05-224e-47ed-8e69-a931de5209e0
spec:
  backoffLimit: 0
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
        job-name: backend-sync-27892800
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        - node server_sync.js
      ......
status:
  conditions:
  - lastProbeTime: "2023-01-13T00:00:26Z"
    lastTransitionTime: "2023-01-13T00:00:26Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 1
  startTime: "2023-01-13T00:00:01Z"
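The manual deletions listed above could probably also be done with a simple line filter instead of an editor. The following is only a rough sketch, not something from the original steps: it assumes every uid-related field sits on its own line with the key `uid` or `controller-uid`, as in the YAML above. The file names `sample-job.yaml` and `sample-job-clean.yaml` are made up for illustration, and the sample is a trimmed copy of the exported Job so the snippet can run without a cluster.

```shell
# Hypothetical trimmed copy of the exported Job, used only for this demo.
cat > sample-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
    job-name: backend-sync-27892800
  name: backend-sync-27892800
  namespace: my-app
  ownerReferences:
  - apiVersion: batch/v1
    kind: CronJob
    name: backend-sync
    uid: 65786d95-15da-4e8e-8c64-c2e8de8d367f
  uid: db71ae05-224e-47ed-8e69-a931de5209e0
spec:
  selector:
    matchLabels:
      controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
  template:
    metadata:
      labels:
        controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
        job-name: backend-sync-27892800
EOF

# Drop every line whose key is exactly "uid" or "controller-uid".
# This is a plain text filter: it mirrors deleting those lines by hand
# and does not understand YAML structure.
grep -vE '^[[:space:]]*(controller-)?uid:' sample-job.yaml > sample-job-clean.yaml

# Show what is left.
cat sample-job-clean.yaml
```

Because the filter works on text rather than on parsed YAML, it is worth reading through the result before feeding it to kubectl, just as you would after editing the file by hand.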
Save the edited file, then run the command below
to force-replace the existing Job with the one defined in the file:
kubectl -n my-app replace --force -f job.yaml
You should then see the Job start a new Pod and run once more~
Reference: docker – Is it possible to rerun kubernetes job?