Batch Framework

New in version MinIO: RELEASE.2022-10-08T20-11-00Z

The Batch Framework was introduced with the replicate job type in mc RELEASE.2022-10-08T20-11-00Z.

Overview

The MinIO Batch Framework allows you to create, manage, monitor, and execute jobs using a YAML-formatted job definition file (a “batch file”). Batch jobs run directly on the MinIO deployment, taking advantage of server-side processing power without the constraints of the local machine where you run the MinIO Client.

A batch file defines one job task.

Once a job starts, MinIO begins processing it. Time to completion depends on the resources available to the deployment.

If any portion of the job fails, MinIO retries the job up to the number of times defined in the job definition.
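The retry behavior can be pictured with a small sketch. It assumes the semantics suggested by the retry block of the sample job definition later on this page: attempts is read as the maximum number of retries after the first try, and delay as the minimum pause before each retry. run_with_retry and parse_delay are hypothetical helpers, not MinIO code, and MinIO's own retry logic may differ (for example, in backoff or in what counts as a failure).

```python
import re
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Delay units expressed in milliseconds (assumed set: ms, s, m, h).
_UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000}


def parse_delay(text: str) -> float:
    """Convert a delay string such as "500ms" or "2s" into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported delay: {text!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)] / 1000


def run_with_retry(task: Callable[[], T], attempts: int, delay: str,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task once, then retry it up to `attempts` more times on failure,
    pausing at least `delay` before each retry (hypothetical helper)."""
    wait = parse_delay(delay)
    last_error: Optional[BaseException] = None
    for try_number in range(attempts + 1):
        if try_number > 0:
            sleep(wait)  # minimum pause before each retry
        try:
            return task()
        except Exception as err:  # any failure counts against the retry budget
            last_error = err
    raise RuntimeError(f"gave up after {attempts} retries") from last_error
```

For example, run_with_retry(copy_batch, attempts=10, delay="500ms"), where copy_batch is a hypothetical unit of work, mirrors the retry values in the sample file. The sleep parameter is injectable so the loop can be exercised without real waiting.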

The MinIO Batch Framework supports the following job types:

Job Type     Description
replicate    Perform a one-time replication procedure from one MinIO location to another MinIO location.

MinIO Batch CLI

The mc batch commands include:

mc batch generate

The mc batch generate command creates a basic YAML-formatted template file for the specified job type.

mc batch start

The mc batch start command launches a batch job from a YAML batch file.

mc batch list

The mc batch list command outputs a list of the batch jobs currently in progress on a deployment.

mc batch status

The mc batch status command outputs real-time summaries of job events on a MinIO server.

mc batch describe

The mc batch describe command outputs the job definition for a specified job ID.
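These commands are typically chained into one workflow: generate a template, edit it, start the job, then monitor it. The sketch below only assembles the argument lists for that workflow so they can be inspected or passed to subprocess.run. The alias myminio, the file name replicate.yaml, and the JOB-ID placeholder are hypothetical, and the positional order (mc batch SUBCOMMAND ALIAS [ARG]) is an assumption to confirm with mc batch SUBCOMMAND --help for your mc release.

```python
import shlex
import subprocess
from typing import List

# Subcommands described in this section.
_SUBCOMMANDS = {"generate", "start", "list", "status", "describe"}


def mc_batch(subcommand: str, alias: str, *args: str) -> List[str]:
    """Build an `mc batch` argument list (assumed syntax: mc batch SUBCOMMAND ALIAS [ARG...])."""
    if subcommand not in _SUBCOMMANDS:
        raise ValueError(f"unknown mc batch subcommand: {subcommand}")
    return ["mc", "batch", subcommand, alias, *args]


ALIAS = "myminio"            # hypothetical alias created with `mc alias set`
JOB_FILE = "replicate.yaml"  # hypothetical job definition file
JOB_ID = "JOB-ID"            # placeholder: use the ID reported when the job starts

workflow = [
    mc_batch("generate", ALIAS, "replicate"),  # outputs a template; save it to JOB_FILE and edit it
    mc_batch("start", ALIAS, JOB_FILE),        # launch the job on the deployment
    mc_batch("list", ALIAS),                   # jobs currently in progress
    mc_batch("status", ALIAS, JOB_ID),         # real-time summary of job events
    mc_batch("describe", ALIAS, JOB_ID),       # job definition for that ID
]


def run(cmd: List[str]) -> str:
    """Execute one command with the real mc binary (must be on PATH)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    for cmd in workflow:
        print(shlex.join(cmd))  # dry run: print each command instead of executing it
```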

Access to mc batch

You can use MinIO’s policy-based access control (PBAC) and the administrative policy actions to restrict who can start a batch job, retrieve a list of running jobs, or describe a running job.
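A sketch of such a restriction, generated with Python's json module. The action names admin:StartBatchJob, admin:ListBatchJobs, and admin:DescribeBatchJob are assumptions based on MinIO's admin action naming; confirm them in the MinIO policy actions reference for your server release before use. batch_policy is a hypothetical helper.

```python
import json

# Assumed MinIO admin action names for batch jobs; verify against the
# policy actions reference for your MinIO release.
BATCH_ACTIONS = {
    "start": "admin:StartBatchJob",
    "list": "admin:ListBatchJobs",
    "describe": "admin:DescribeBatchJob",
}


def batch_policy(*allowed: str) -> dict:
    """Build a policy document that allows only the named batch operations."""
    actions = [BATCH_ACTIONS[name] for name in allowed]
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions}],
    }


if __name__ == "__main__":
    # Example: a user who may start and inspect batch jobs, and nothing else.
    print(json.dumps(batch_policy("start", "list", "describe"), indent=2))
```

The printed JSON can be saved to a file and registered as a policy with the mc admin policy commands, then attached to the users or groups that should run batch jobs.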

Job Types

Replicate

Use the replicate job type to create a batch job that replicates objects from the local MinIO deployment to another MinIO location. The definition file can limit replication by bucket, prefix, and filters so that only matching objects replicate.

For example, you can use a batch job to perform a one-time replication sync of objects from minio-alpha/invoices/ to minio-baker/invoices/.
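A sketch of a job definition for this invoices example, built as a Python dictionary and written out as YAML by a tiny hypothetical emitter (to_yaml handles only nested mappings of scalars, with no lists or escaping). It assumes the job is started against minio-alpha, so the source omits endpoint and credentials as in the commented-out lines of the template, and that minio-baker is reachable at the hypothetical endpoint https://minio-baker.example.net:9000. The credentials are placeholders.

```python
from typing import Any, Dict, List


def to_yaml(data: Dict[str, Any], indent: int = 0) -> List[str]:
    """Emit nested dicts of scalars as block-style YAML lines (minimal sketch)."""
    pad = "  " * indent
    lines: List[str] = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        elif isinstance(value, str):
            lines.append(f'{pad}{key}: "{value}"')  # quote strings
        else:
            lines.append(f"{pad}{key}: {value}")    # numbers as-is
    return lines


job = {
    "replicate": {
        "apiVersion": "v1",
        # Local deployment (minio-alpha): no endpoint or credentials assumed.
        "source": {"type": "s3", "bucket": "invoices", "prefix": ""},
        "target": {
            "type": "s3",
            "bucket": "invoices",
            "prefix": "",
            "endpoint": "https://minio-baker.example.net:9000",  # hypothetical
            "credentials": {"accessKey": "ACCESS-KEY", "secretKey": "SECRET-KEY"},
        },
        "retry": {"attempts": 10, "delay": "500ms"},
    }
}

if __name__ == "__main__":
    print("\n".join(to_yaml(job)))
```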

The advantages of Batch Replication over mc mirror include:

  • Removes the client-to-cluster network as a potential bottleneck

  • A user needs only permission to start a batch job and no other permissions, because the job runs entirely server-side on the cluster

  • The job retries in the event that objects fail to replicate

  • Batch jobs are one-time, curated processes that allow fine-grained control over replication

Changed in version RELEASE.2023-02-17T17-52-43Z: Run batch replication with multiple workers in parallel by specifying the MINIO_BATCH_REPLICATION_WORKERS environment variable.
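The effect of this setting can be pictured as a pool of workers copying objects concurrently. The sketch below is not MinIO's implementation: copying between two in-memory dictionaries stands in for server-side object replication, and the fallback of 4 workers is a hypothetical default, not MinIO's. In a real deployment the variable belongs in the MinIO server's environment, not in the environment where you run mc.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable


def worker_count(default: int = 4) -> int:
    """Read MINIO_BATCH_REPLICATION_WORKERS; the fallback value is hypothetical."""
    raw = os.environ.get("MINIO_BATCH_REPLICATION_WORKERS", "")
    return int(raw) if raw.isdigit() and int(raw) > 0 else default


def replicate_all(keys: Iterable[str], source: Dict[str, bytes],
                  target: Dict[str, bytes], workers: int) -> int:
    """Copy each key from source to target with `workers` parallel threads."""

    def copy_one(key: str) -> str:
        target[key] = source[key]  # stand-in for copying one object
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(1 for _ in pool.map(copy_one, keys))


if __name__ == "__main__":
    src = {f"invoices/{i}.pdf": bytes([i]) for i in range(20)}
    dst: Dict[str, bytes] = {}
    copied = replicate_all(src, src, dst, workers=worker_count())
    print(f"copied {copied} objects with {worker_count()} workers")
```

More workers raise throughput only while the source, target, and network have spare capacity, which is why the value is left to the operator.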

Sample YAML Description File for a replicate Job Type

Use mc batch generate to create a basic replicate job definition file that you can then edit.

replicate:
  apiVersion: v1
  # source of the objects to be replicated
  source:
    type: TYPE # valid values are "s3"
    bucket: BUCKET
    prefix: PREFIX
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # target where the objects must be replicated
  target:
    type: TYPE # valid values are "s3"
    bucket: BUCKET
    prefix: PREFIX
    # endpoint: ENDPOINT
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # optional flags based filtering criteria
  # for all source objects
  flags:
    filter:
      newerThan: "7d" # match objects newer than this value (e.g. 7d10h31s)
      olderThan: "7d" # match objects older than this value (e.g. 7d10h31s)
      createdAfter: "date" # match objects created after "date"
      createdBefore: "date" # match objects created before "date"

      # tags:
      #   - key: "name"
      #     value: "pick*" # match objects with tag 'name', with all values starting with 'pick'

      ## NOTE: metadata filter not supported when "source" is non MinIO.
      # metadata:
      #   - key: "content-type"
      #     value: "image/*" # match objects with 'content-type', with all values starting with 'image/'

  notify:
    endpoint: "https://notify.endpoint" # notification endpoint to receive job status events
    token: "Bearer xxxxx" # optional authentication token for the notification endpoint

  retry:
    attempts: 10 # number of retries for the job before giving up
    delay: "500ms" # least amount of delay between each retry
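The filter fields in the template can be read as a predicate applied to each source object. The sketch below encodes one plausible reading of the template comments, and all of it is assumed: newerThan and olderThan compare an object's age against a duration such as 7d10h31s, createdAfter and createdBefore compare against the object's modification time (used here as a stand-in for creation time), and a tag value such as pick* is a shell-style wildcard. parse_age and matches are hypothetical helpers; the metadata filter is not modeled.

```python
import re
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import Dict, Optional

_PART = re.compile(r"(\d+)([dhms])")
_SECONDS = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}  # assumed units


def parse_age(text: str) -> timedelta:
    """Parse durations like "7d" or "7d10h31s" into a timedelta."""
    if not re.fullmatch(r"(?:\d+[dhms])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _SECONDS[u] for n, u in _PART.findall(text)))


def matches(mod_time: datetime, tags: Dict[str, str], now: datetime,
            newer_than: Optional[str] = None, older_than: Optional[str] = None,
            created_after: Optional[datetime] = None,
            created_before: Optional[datetime] = None,
            tag_filters: Optional[Dict[str, str]] = None) -> bool:
    """Return True if one object passes every configured filter."""
    age = now - mod_time
    if newer_than is not None and age >= parse_age(newer_than):
        return False  # too old for newerThan
    if older_than is not None and age <= parse_age(older_than):
        return False  # too new for olderThan
    if created_after is not None and mod_time <= created_after:
        return False
    if created_before is not None and mod_time >= created_before:
        return False
    for key, pattern in (tag_filters or {}).items():
        # The tag must exist and its value must match the wildcard pattern.
        if key not in tags or not fnmatchcase(tags[key], pattern):
            return False
    return True
```

Passing now explicitly keeps the predicate deterministic, so one job run evaluates every object against the same reference time.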