Spring Batch Architecture | JavaInUse

Spring Batch Architecture

In the previous tutorial we looked at what Spring Batch is. In this tutorial we look at the Spring Batch architecture. The main building blocks of Spring Batch are -
  1. Job
  2. Job Instance
  3. Job Execution
  4. Job Launcher
  5. Job Repository

  • Job - A Job is the unit of work to be executed using Spring Batch. The work may be a simple task or a complex one.
    Simple tasks are usually implemented as a Spring Batch Tasklet. For example, to delete a file or execute a database query, a Tasklet is implemented to carry out the job. In the next tutorial we implement a Spring Batch Tasklet Hello World example that deletes a file at a particular location.
    [Figure: Spring Batch Tasklet job]
    Alternatively, a job can involve multiple steps executed in sequence, for example reading from a database, processing the data and writing the output to a particular location. For such complex jobs Spring Batch chunk processing is used: items are read and processed one at a time, then written together in chunks of a configured size. In a later tutorial we implement a Spring Batch chunk processing example. Chunk-based processing offers advantages such as scope for parallel processing and less network overhead, since writes are batched.
    [Figure: Spring Batch Chunk job]
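To make the chunk idea above concrete, here is a minimal plain-Java sketch. It is not the Spring Batch API: the class ChunkSketch and the method runInChunks are hypothetical names used only for illustration, and the "writer" simply collects each chunk into a list so that the grouping is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Plain-Java sketch of chunk-oriented processing; not the Spring Batch API.
class ChunkSketch {

    // Reads items one at a time, processes each, and hands them to the
    // "writer" in groups of at most chunkSize. Returns the chunks written.
    static <I, O> List<List<O>> runInChunks(List<I> source,
                                            Function<I, O> processor,
                                            int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        for (I item : source) {                 // "read" one item
            chunk.add(processor.apply(item));   // "process" it
            if (chunk.size() == chunkSize) {    // chunk is full: "write" it
                written.add(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {                 // write the final partial chunk
            written.add(chunk);
        }
        return written;
    }

    public static void main(String[] args) {
        List<List<String>> chunks =
            runInChunks(List.of("a", "b", "c", "d", "e"), String::toUpperCase, 2);
        System.out.println(chunks); // prints [[A, B], [C, D], [E]]
    }
}
```

With five items and a chunk size of 2, the writer is called three times instead of five, which is where the reduced overhead comes from.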
  • Job Instance - When a Spring Batch job is run, a JobInstance is created. The JobInstance is a logical run of the job, identified by the job name and the identifying parameters passed to the job. For example, if we schedule a batch job to run every day with the run date as a parameter, a new JobInstance is created for each day's run.
    [Figure: Spring Batch Job Instance]
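The identity rule described above (job name plus parameters) can be modelled in a few lines of plain Java. This is only a sketch: InstanceKey and InstanceRegistry are hypothetical names, the parameter name run.date is an assumption, and Spring Batch itself stores this information through the JobRepository rather than in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Plain-Java model of job instance identity; the class names are hypothetical.
final class InstanceKey {
    final String jobName;
    final Map<String, String> params;

    InstanceKey(String jobName, Map<String, String> params) {
        this.jobName = jobName;
        this.params = Map.copyOf(params);   // defensive, immutable copy
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InstanceKey)) return false;
        InstanceKey other = (InstanceKey) o;
        return jobName.equals(other.jobName) && params.equals(other.params);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, params);
    }
}

class InstanceRegistry {
    private final Map<InstanceKey, Long> ids = new HashMap<>();
    private long nextId = 1;

    // Returns the existing id for this name + parameters, or assigns a new one.
    long instanceIdFor(String jobName, Map<String, String> params) {
        return ids.computeIfAbsent(new InstanceKey(jobName, params), k -> nextId++);
    }

    public static void main(String[] args) {
        InstanceRegistry registry = new InstanceRegistry();
        long first = registry.instanceIdFor("dailyReport", Map.of("run.date", "2024-01-01"));
        long again = registry.instanceIdFor("dailyReport", Map.of("run.date", "2024-01-01"));
        long next  = registry.instanceIdFor("dailyReport", Map.of("run.date", "2024-01-02"));
        System.out.println(first + " " + again + " " + next); // prints 1 1 2
    }
}
```

Running the same job with the same parameters maps back to the same instance, while a new run date yields a new one.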
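  • Job Execution - A JobExecution is a single attempt to run a JobInstance. A JobInstance is considered complete when one of its JobExecutions has completed successfully; until then, a failed attempt can be followed by another JobExecution of the same JobInstance.
    [Figure: Spring Batch JobInstance]
    A job execution can have one of the following statuses -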
    1. ABANDONED
    2. COMPLETED
    3. FAILED
    4. STARTED
    5. STARTING
    6. STOPPED
    7. STOPPING
    8. UNKNOWN
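The relationship between an instance and its executions can be sketched as follows. The enum below simply repeats the eight statuses listed above in a stand-alone type (Spring Batch's own enum is BatchStatus), and InstanceHistory is a hypothetical helper, not a Spring Batch class.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone copy of the eight statuses listed above.
enum SketchStatus {
    ABANDONED, COMPLETED, FAILED, STARTED, STARTING, STOPPED, STOPPING, UNKNOWN
}

// Tracks the final status of every execution (attempt) of one job instance.
class InstanceHistory {
    private final List<SketchStatus> executions = new ArrayList<>();

    void recordExecution(SketchStatus finalStatus) {
        executions.add(finalStatus);
    }

    // The instance is complete once any attempt has completed successfully.
    boolean isComplete() {
        return executions.contains(SketchStatus.COMPLETED);
    }

    int attempts() {
        return executions.size();
    }

    public static void main(String[] args) {
        InstanceHistory history = new InstanceHistory();
        history.recordExecution(SketchStatus.FAILED);     // first attempt fails
        System.out.println(history.isComplete());         // prints false
        history.recordExecution(SketchStatus.COMPLETED);  // retry succeeds
        System.out.println(history.isComplete() + " after " + history.attempts() + " attempts");
    }
}
```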
  • JobLauncher - The Spring Batch JobLauncher is the interface used to launch Spring Batch jobs. It has a single method, run(Job job, JobParameters jobParameters), which returns the resulting JobExecution. It takes two parameters -
    • Job to be executed
    • JobParameters to be passed to the Job

    [Figure: Spring Batch JobLauncher]
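The shape of that run method can be imitated in plain Java to show what a launcher does with its two arguments. Everything here is an assumption made for illustration: SketchJob, SketchExecution and SketchLauncher are hypothetical types, a Map stands in for JobParameters, and the file path is made up. The real JobLauncher works together with the job repository and declares several checked exceptions that are left out.

```java
import java.util.Map;

// Hypothetical stand-in for a job: some work that receives parameters.
@FunctionalInterface
interface SketchJob {
    void execute(Map<String, String> params) throws Exception;
}

// Hypothetical stand-in for the result of one launch.
final class SketchExecution {
    final Map<String, String> params;
    final String status;

    SketchExecution(Map<String, String> params, String status) {
        this.params = params;
        this.status = status;
    }
}

class SketchLauncher {
    // Mirrors the shape of run(Job, JobParameters): runs the job with the
    // given parameters and reports the outcome as an execution object.
    SketchExecution run(SketchJob job, Map<String, String> params) {
        try {
            job.execute(params);
            return new SketchExecution(params, "COMPLETED");
        } catch (Exception e) {
            return new SketchExecution(params, "FAILED");
        }
    }

    public static void main(String[] args) {
        SketchLauncher launcher = new SketchLauncher();
        SketchExecution ok = launcher.run(
            p -> System.out.println("would delete " + p.get("file")),
            Map.of("file", "/tmp/report.csv"));
        SketchExecution bad = launcher.run(
            p -> { throw new IllegalStateException("boom"); },
            Map.of());
        System.out.println(ok.status + " " + bad.status); // prints COMPLETED FAILED
    }
}
```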
  • Job Repository - One of the important features of Spring Batch is state management, which is achieved using the Job Repository. Suppose a Spring Batch job is running and an error occurs. How does Spring Batch know that an error has occurred and that the job needs to be rerun? An important aspect of Spring Batch is that this should need no external interference, so the state of the job is saved and future executions take it into consideration. State management matters most when processing large volumes of data, where restarting from scratch is expensive.
    [Figure: Spring Batch Job Repository]
    Spring Batch JobRepository supports two different types of data stores -
    • In-Memory Job Repository - When developing a Spring Batch job or running unit tests, configuring an external database may be more trouble than it's worth. For that reason, Spring Batch has provided an implementation of the JobRepository that uses java.util.Map instances as the datastore. Note that this Map-based implementation was deprecated in Spring Batch 4.3 and removed in 5.0, where an embedded database such as H2 is the usual substitute.
    • Relational Database - A relational database is the default option for the job repository in Spring Batch. The Spring Batch metadata is persisted in database tables.
      [Figure: Spring Batch Datatables]
      • BATCH_JOB_INSTANCE Table - Persists the logical run of a job. A job instance row is created when the job is executed for the first time with a unique set of identifying job parameters.
      • BATCH_JOB_EXECUTION Table - Represents every physical run of a batch job. Each time a job is launched, a new record is created here and updated as the job progresses, including its start time, end time and status.
      • BATCH_JOB_EXECUTION_CONTEXT Table - Batch processes are stateful by nature, for example they need to know which step they are on. This table stores the execution context of each job execution, which holds the state needed to resume after a failure.
      • BATCH_JOB_EXECUTION_PARAMS Table - Stores the job parameters for each execution.
      • BATCH_STEP_EXECUTION Table - Stores the metadata for each step execution, such as its start time, end time, status and read, write and commit counts.
      • BATCH_STEP_EXECUTION_CONTEXT Table - Stores the execution context of each step execution, for example how many records have been processed within that step, so that a restart can continue from that point.
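The state management described above can be illustrated with a small Map-backed stand-in for a repository. This is a sketch under stated assumptions, not Spring Batch code: MapRepositorySketch is a hypothetical class, a plain string key stands in for the job instance, statuses are plain strings, and the failAt argument exists only to simulate an error. One map plays the role of the execution table and another the role of the step execution context.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Map-backed stand-in for a job repository; not a Spring Batch class.
class MapRepositorySketch {
    // instance key -> final status of each execution (cf. BATCH_JOB_EXECUTION)
    private final Map<String, List<String>> executions = new HashMap<>();
    // instance key -> number of items already written (cf. BATCH_STEP_EXECUTION_CONTEXT)
    private final Map<String, Integer> checkpoints = new HashMap<>();

    boolean isComplete(String key) {
        return executions.getOrDefault(key, List.of()).contains("COMPLETED");
    }

    int checkpoint(String key) {
        return checkpoints.getOrDefault(key, 0);
    }

    int attempts(String key) {
        return executions.getOrDefault(key, List.of()).size();
    }

    // Processes items starting at the saved checkpoint. failAt simulates an
    // error when that index is reached (-1 means no failure).
    String run(String key, List<String> items, List<String> sink, int failAt) {
        if (isComplete(key)) {
            throw new IllegalStateException("instance already complete: " + key);
        }
        List<String> history = executions.computeIfAbsent(key, k -> new ArrayList<>());
        String status = "COMPLETED";
        for (int i = checkpoint(key); i < items.size(); i++) {
            if (i == failAt) {
                status = "FAILED";
                break;
            }
            sink.add(items.get(i));
            checkpoints.put(key, i + 1);   // save progress after each item
        }
        history.add(status);
        return status;
    }

    public static void main(String[] args) {
        MapRepositorySketch repo = new MapRepositorySketch();
        List<String> items = List.of("a", "b", "c", "d");
        List<String> sink = new ArrayList<>();
        System.out.println(repo.run("report|2024-01-01", items, sink, 2));  // prints FAILED
        System.out.println(repo.run("report|2024-01-01", items, sink, -1)); // prints COMPLETED
        System.out.println(sink); // prints [a, b, c, d]
    }
}
```

The second run picks up at the third item instead of starting over, and a third run of the same instance would be rejected because the instance is already complete.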