[K8S] Recreate a Failed K8S Job So It Runs Again

Sometimes you'll find that a Kubernetes Job has failed,

but normally you can't just rerun a Job.

So after fixing the underlying problem, how do you get the Job to run again?

After experimenting a bit, it seems the only options are to create a brand-new Job, or to replace the original one…
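
Before doing anything, you can double-check that the Job really did fail (my-app and backend-sync-27892800 are simply the names used throughout this example):

kubectl -n my-app get job backend-sync-27892800
# a failed one-shot Job shows COMPLETIONS 0/1 (exact columns vary by kubectl version)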

For example, start by dumping the Job's definition to a file:

kubectl -n my-app get jobs backend-sync-27892800 -o yaml > job.yaml

Open the file and delete everything related to uid

(these affect how the Job gets registered, presumably because two Jobs can't share the same UID)~

Taking the yaml below as an example, the fields to delete are:

  • metadata.labels.controller-uid
  • metadata.ownerReferences[0].uid
  • metadata.uid
  • spec.selector.matchLabels.controller-uid
  • spec.template.metadata.labels.controller-uid
apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2023-01-13T00:00:00Z"
  labels:
    controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
    job-name: backend-sync-27892800
  name: backend-sync-27892800
  namespace: my-app
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: backend-sync
    uid: 65786d95-15da-4e8e-8c64-c2e8de8d367f
  resourceVersion: "3374934"
  uid: db71ae05-224e-47ed-8e69-a931de5209e0
spec:
  backoffLimit: 0
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
  template:
    metadata:
      creationTimestamp: null
      labels:
        controller-uid: db71ae05-224e-47ed-8e69-a931de5209e0
        job-name: backend-sync-27892800
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        - node server_sync.js
......
status:
  conditions:
  - lastProbeTime: "2023-01-13T00:00:26Z"
    lastTransitionTime: "2023-01-13T00:00:26Z"
    message: Job has reached the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
  failed: 1
  startTime: "2023-01-13T00:00:01Z"
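
By the way, if you have jq on hand, this cleanup can be scripted instead of edited by hand. A rough sketch of the idea (the paths mirror the list above; .status is also dropped, since it is ignored when the Job is recreated anyway):

kubectl -n my-app get job backend-sync-27892800 -o json \
  | jq 'del(.metadata.uid,
            .metadata.resourceVersion,
            .metadata.ownerReferences,
            .metadata.labels["controller-uid"],
            .spec.selector.matchLabels["controller-uid"],
            .spec.template.metadata.labels["controller-uid"],
            .status)' \
  > job.json

The resulting job.json can then be fed to the same replace command shown below (kubectl replace --force -f job.json).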

 

Save the modified file, then run the command below,

which forcibly replaces the existing Job with this new definition:

kubectl -n my-app replace --force -f job.yaml

 

And with that, you'll see the Job spin up a new Pod and run once more~
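
If you want to watch the rerun happen, the Pods can be followed via the job-name label that Kubernetes adds to them:

kubectl -n my-app get pods -l job-name=backend-sync-27892800 --watch

One more note: since this Job was originally spawned by a CronJob (see the ownerReferences above), an alternative on recent kubectl versions is to create a fresh Job straight from the CronJob's template, skipping the yaml surgery entirely (manual-rerun-1 is just a made-up name here):

kubectl -n my-app create job manual-rerun-1 --from=cronjob/backend-sync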

 

Reference: docker – Is it possible to rerun kubernetes job?
