Introduction
Spring Batch is a framework for processing large volumes of data: it reads a large data set, applies some processing, and writes the result out in a specified form. For example, it is a good fit for batch-inserting the records of a CSV file (several million or even tens of millions of rows are no problem) into a database. Detailed walkthroughs of this are hard to find, whether in books or online, so this post explains the framework while working through a practical example (all of the code here has been run in practice). We start with Spring Boot's support for the Batch framework, then build up the code step by step.
I. Spring Boot Support for the Batch Framework
1. Components of the Spring Batch framework
1) JobRepository: the container in which jobs are registered; database-related properties are configured here.
2) JobLauncher: the interface used to launch a job.
3) Job: the task we actually execute; it contains one or more Steps.
4) Step: a step, consisting of ItemReader -> ItemProcessor -> ItemWriter.
5) ItemReader: reads the data and maps data fields to entity fields, e.g. reading person records from a CSV file and mapping the columns onto the fields of a Person entity.
6) ItemProcessor: the interface for processing the data; it can also validate it (by attaching a validator that uses JSR-303 (hibernate-validator) annotations), e.g. converting the Chinese gender values 男/女 to M/F, or checking that the age field is within range.
7) ItemWriter: the interface for writing the data out; the data source is set here and the parameterized SQL insert statement is defined.
All seven components simply need to be registered, one by one, in a configuration class, and the configuration class must be annotated with @EnableBatchProcessing:

@Configuration
@EnableBatchProcessing // enable batch-processing support
@Import(DruidDBConfig.class) // import the DataSource configuration
public class CsvBatchConfig {
}
2. Batch-processing flow
The batch-processing flow (JobLauncher runs a Job, the Job runs a Step, and each Step executes ItemReader -> ItemProcessor -> ItemWriter, with the JobRepository recording execution state) explains why the configuration class is defined the way it is; see the code in the hands-on part for details.
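The flow just described can be sketched in plain Java, with no Spring dependencies. This is a simplified illustration only (the method names `read`, `process`, and `write` stand in for the real Spring Batch interfaces, and the sample data mirrors the gender conversion used later in this post):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BatchFlowSketch {

    static List<String> output = new ArrayList<>();

    // "ItemReader": returns the next item, or null when the input is exhausted
    static Iterator<String> source = Arrays.asList("男", "女", "男").iterator();
    static String read() { return source.hasNext() ? source.next() : null; }

    // "ItemProcessor": transform (and optionally validate) one item
    static String process(String gender) { return "男".equals(gender) ? "M" : "F"; }

    // "ItemWriter": receives the processed items
    static void write(List<String> items) { output.addAll(items); }

    public static void main(String[] args) {
        List<String> buffer = new ArrayList<>();
        String item;
        while ((item = read()) != null) {   // Step: read -> process -> collect
            buffer.add(process(item));
        }
        write(buffer);                      // hand the results to the writer
        System.out.println(output);         // [M, F, M]
    }
}
```

The real framework adds transactions, restartability, and chunked writes on top of this loop, but the reader/processor/writer contract is the same.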
II. Hands-On
1. Add dependencies
1) Spring Batch dependency
<!-- spring batch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
2) Validator dependency
<!-- hibernate validator -->
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-validator</artifactId>
    <version>6.0.7.Final</version>
</dependency>
3) MySQL + Druid dependencies
<!-- mysql connector -->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.35</version>
</dependency>
<!-- alibaba dataSource -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.1.12</version>
</dependency>
4) Test dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
</dependency>
2. application.yml configuration
When a job is launched and starts running, Spring Batch automatically creates its metadata tables (their names all start with batch). These tables do not exist at first, so the relevant settings must be made in the application configuration file.
# batch (these keys live under the spring: root)
spring:
  batch:
    job:
      # jobs defined in the context run automatically on startup by default (true);
      # set to false so they only run when triggered explicitly via jobLauncher.run
      enabled: false
    # let Spring Batch create its metadata tables in the database;
    # anything other than "always" leads to "table does not exist" errors on first run
    initialize-schema: always
    # prefix for the batch table names
    # table-prefix: csv-batch
3. Data source configuration
spring:
  datasource:
    username: root
    password: 1234
    url: jdbc:mysql://127.0.0.1:3306/db_base?useSSL=false&serverTimezone=UTC&characterEncoding=utf8
    driver-class-name: com.mysql.jdbc.Driver
Register the DruidDBConfig configuration class; it is later pulled into the batch configuration class via @Import.
/**
 * @author jian
 * @date 2019/4/20
 * @description Custom DataSource configuration
 */
@Configuration
public class DruidDBConfig {

    private Logger logger = LoggerFactory.getLogger(DruidDBConfig.class);

    @Value("${spring.datasource.url}")
    private String dbUrl;
    @Value("${spring.datasource.username}")
    private String username;
    @Value("${spring.datasource.password}")
    private String password;
    @Value("${spring.datasource.driver-class-name}")
    private String driverClassName;

    // Further Druid pool properties (initialSize, minIdle, maxActive, maxWait,
    // timeBetweenEvictionRunsMillis, validationQuery, testWhileIdle, testOnBorrow,
    // testOnReturn, poolPreparedStatements, filters, connectionProperties, ...)
    // can be injected the same way and applied to the DruidDataSource below.

    @Bean
    @Primary // this DataSource takes precedence when multiple candidates exist
    public DataSource dataSource() {
        DruidDataSource dataSource = new DruidDataSource();
        logger.info("-------->dataSource[url=" + dbUrl + " ,username=" + username + "]");
        dataSource.setUrl(dbUrl);
        dataSource.setUsername(username);
        dataSource.setPassword(password);
        dataSource.setDriverClassName(driverClassName);
        return dataSource;
    }

    @Bean
    public ServletRegistrationBean druidServletRegistrationBean() {
        ServletRegistrationBean servletRegistrationBean = new ServletRegistrationBean();
        servletRegistrationBean.setServlet(new StatViewServlet());
        servletRegistrationBean.addUrlMappings("/druid/*");
        return servletRegistrationBean;
    }

    /**
     * Register the Druid web-stat filter.
     *
     * @return
     */
    @Bean
    public FilterRegistrationBean druidFilterRegistrationBean() {
        FilterRegistrationBean filterRegistrationBean = new FilterRegistrationBean();
        filterRegistrationBean.setFilter(new WebStatFilter());
        Map<String, String> initParams = new HashMap<String, String>();
        // requests to exclude from the statistics
        initParams.put("exclusions", "*.js,*.gif,*.jpg,*.bmp,*.png,*.css,*.ico,/druid/*");
        filterRegistrationBean.setInitParameters(initParams);
        filterRegistrationBean.addUrlPatterns("/*");
        return filterRegistrationBean;
    }
}
4. Write the batch configuration class
In this configuration class, each Spring Batch component is registered in turn; some details are explained in the code comments.
/**
 * @author jian
 * @date 2019/4/28
 * @description Spring Batch configuration for CSV batch processing. The following
 * Spring Batch components are registered here:
 * 1) JobRepository  - the container in which jobs are registered
 * 2) JobLauncher    - the interface used to launch a job
 * 3) Job            - the task actually executed, made up of one or more Steps
 * 4) Step           - a step consists of ItemReader, ItemProcessor and ItemWriter
 * 5) ItemReader     - the interface for reading data
 * 6) ItemProcessor  - the interface for processing data
 * 7) ItemWriter     - the interface for writing data out
 */
@Configuration
@EnableBatchProcessing // enable batch-processing support
@Import(DruidDBConfig.class) // import the DataSource configuration
public class CsvBatchConfig {

    private Logger logger = LoggerFactory.getLogger(CsvBatchConfig.class);

    /**
     * ItemReader definition: read the file and map each line to an entity.
     *
     * @return
     */
    @Bean
    public ItemReader<Person> reader() {
        // FlatFileItemReader reads the CSV file, one line per record
        FlatFileItemReader<Person> reader = new FlatFileItemReader<>();
        // location of the file
        reader.setResource(new ClassPathResource("person.csv"));
        // map the CSV columns onto the entity's fields
        reader.setLineMapper(new DefaultLineMapper<Person>() {
            {
                setLineTokenizer(new DelimitedLineTokenizer() {
                    {
                        setNames(new String[]{"id", "name", "age", "gender"});
                    }
                });
                setFieldSetMapper(new BeanWrapperFieldSetMapper<Person>() {
                    {
                        setTargetType(Person.class);
                    }
                });
            }
        });
        return reader;
    }

    /**
     * ItemProcessor registration: process and validate the data.
     *
     * @return
     */
    @Bean
    public ItemProcessor<Person, Person> processor() {
        CvsItemProcessor cvsItemProcessor = new CvsItemProcessor();
        // attach the validator
        cvsItemProcessor.setValidator(csvBeanValidator());
        return cvsItemProcessor;
    }

    /**
     * Validator registration.
     *
     * @return
     */
    @Bean
    public CsvBeanValidator csvBeanValidator() {
        return new CsvBeanValidator<Person>();
    }

    /**
     * ItemWriter definition: set the DataSource and the parameterized batch insert SQL.
     *
     * @param dataSource
     * @return
     */
    @Bean
    public ItemWriter<Person> writer(DataSource dataSource) {
        // JdbcBatchItemWriter writes the data to the database
        JdbcBatchItemWriter<Person> writer = new JdbcBatchItemWriter<>();
        // named parameters in the SQL are filled from the item's bean properties
        writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Person>());
        String sql = "insert into person values(:id,:name,:age,:gender)";
        writer.setSql(sql);
        writer.setDataSource(dataSource);
        return writer;
    }

    /**
     * JobRepository definition: set the database and register the job container.
     *
     * @param dataSource
     * @param transactionManager
     * @return
     * @throws Exception
     */
    @Bean
    public JobRepository csvJobRepository(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception {
        JobRepositoryFactoryBean jobRepositoryFactoryBean = new JobRepositoryFactoryBean();
        jobRepositoryFactoryBean.setDatabaseType("mysql");
        jobRepositoryFactoryBean.setTransactionManager(transactionManager);
        jobRepositoryFactoryBean.setDataSource(dataSource);
        return jobRepositoryFactoryBean.getObject();
    }

    /**
     * JobLauncher definition.
     *
     * @param dataSource
     * @param transactionManager
     * @return
     * @throws Exception
     */
    @Bean
    public SimpleJobLauncher csvJobLauncher(DataSource dataSource, PlatformTransactionManager transactionManager) throws Exception {
        SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
        // set the JobRepository
        jobLauncher.setJobRepository(csvJobRepository(dataSource, transactionManager));
        return jobLauncher;
    }

    /**
     * Job definition.
     *
     * @param jobs
     * @param step
     * @return
     */
    @Bean
    public Job importJob(JobBuilderFactory jobs, Step step) {
        return jobs.get("importCsvJob")
                .incrementer(new RunIdIncrementer())
                .flow(step)
                .end()
                .listener(csvJobListener())
                .build();
    }

    /**
     * Job listener registration.
     *
     * @return
     */
    @Bean
    public CsvJobListener csvJobListener() {
        return new CsvJobListener();
    }

    /**
     * Step definition: ItemReader -> ItemProcessor -> ItemWriter,
     * i.e. read the data, process and validate it, then write it out.
     *
     * @param stepBuilderFactory
     * @param reader
     * @param writer
     * @param processor
     * @return
     */
    @Bean
    public Step step(StepBuilderFactory stepBuilderFactory, ItemReader<Person> reader,
                     ItemWriter<Person> writer, ItemProcessor<Person, Person> processor) {
        return stepBuilderFactory
                .get("step")
                // chunk mechanism: items are read and processed one at a time and accumulated;
                // once 65000 have been collected they are handed to the writer in one call
                .<Person, Person>chunk(65000)
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .build();
    }
}
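The chunk mechanism configured above can be illustrated without Spring at all: items are read and processed one by one into a buffer, and each time the buffer reaches the chunk size it is flushed to the writer in a single call. The class and counter below are purely illustrative, and a small chunk size of 3 stands in for the 65000 used in the real configuration:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {

    static int writeCalls = 0; // how many times the "writer" was invoked

    // writer: receives one whole chunk per call (e.g. one batched SQL insert)
    static void write(List<Integer> chunk) { writeCalls++; }

    public static void main(String[] args) {
        int chunkSize = 3;                      // the post uses 65000; 3 keeps the demo small
        List<Integer> buffer = new ArrayList<>();

        for (int item = 1; item <= 7; item++) { // "read" 7 items
            buffer.add(item * 10);              // "process" each item
            if (buffer.size() == chunkSize) {   // chunk full -> flush to the writer
                write(buffer);
                buffer = new ArrayList<>();
            }
        }
        if (!buffer.isEmpty()) {                // flush the final partial chunk
            write(buffer);
        }
        System.out.println(writeCalls);         // 3 (two full chunks + one partial)
    }
}
```

Batching the writes this way is what makes inserting millions of rows feasible: the database sees one batched statement per chunk instead of one round trip per row.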
5. Define the processor
Implement the ItemProcessor interface (here by extending ValidatingItemProcessor) and override the process method: the input parameter is the item read by the ItemReader, and the return value is handed to the ItemWriter.
/**
 * @author jian
 * @date 2019/4/28
 * @description CSV data processing and validation. Implement the ItemProcessor
 * interface (here via ValidatingItemProcessor) and override process(): the input
 * is the item read by the ItemReader, and the return value goes to the ItemWriter.
 */
public class CvsItemProcessor extends ValidatingItemProcessor<Person> {

    private Logger logger = LoggerFactory.getLogger(CvsItemProcessor.class);

    @Override
    public Person process(Person item) throws ValidationException {
        // super.process() must be called for the custom validator to run
        logger.info("processor start validating...");
        super.process(item);
        // data processing: convert the Chinese gender values 男/女 to M/F
        if ("男".equals(item.getGender())) {
            item.setGender("M");
        } else {
            item.setGender("F");
        }
        logger.info("processor end validating...");
        return item;
    }
}
6. Define the validator
The validator uses JSR-303 (hibernate-validator) annotations to check whether the data read by the ItemReader meets the requirements. If it does not, the rest of the batch job is not carried out.
/**
 * @author jian
 * @date 2019/4/28
 * @param <T>
 * @description Validator that uses JSR-303 (hibernate-validator) annotations to
 * check whether the data read by the ItemReader meets the requirements.
 */
public class CsvBeanValidator<T> implements Validator<T>, InitializingBean {

    private javax.validation.Validator validator;

    /**
     * Initialize the JSR-303 Validator.
     *
     * @throws Exception
     */
    @Override
    public void afterPropertiesSet() throws Exception {
        ValidatorFactory validatorFactory = Validation.buildDefaultValidatorFactory();
        validator = validatorFactory.usingContext().getValidator();
    }

    /**
     * Validate one item; throw if any constraint is violated.
     *
     * @param value
     * @throws ValidationException
     */
    @Override
    public void validate(T value) throws ValidationException {
        Set<ConstraintViolation<T>> constraintViolations = validator.validate(value);
        if (constraintViolations.size() > 0) {
            StringBuilder message = new StringBuilder();
            for (ConstraintViolation<T> constraintViolation : constraintViolations) {
                message.append(constraintViolation.getMessage()).append("\n");
            }
            throw new ValidationException(message.toString());
        }
    }
}
7. Define the listener
To monitor how the job runs, define a class that implements JobExecutionListener and bind it to the Job bean.
/**
 * @author jian
 * @date 2019/4/28
 * @description Monitors job execution: implement JobExecutionListener and bind
 * the listener to the Job bean.
 */
public class CsvJobListener implements JobExecutionListener {

    private Logger logger = LoggerFactory.getLogger(CsvJobListener.class);

    private long startTime;
    private long endTime;

    @Override
    public void beforeJob(JobExecution jobExecution) {
        startTime = System.currentTimeMillis();
        logger.info("job process start...");
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        endTime = System.currentTimeMillis();
        logger.info("job process end...");
        logger.info("elapsed time: " + (endTime - startTime) + "ms");
    }
}
III. Testing
1. The person.csv file
A CSV file uses commas to separate the fields of a record, and a line break ends each record.
1,Zhangsan,21,男
2,Lisi,22,女
3,Wangwu,23,男
4,Zhaoliu,24,男
5,Zhouqi,25,女

Place the file under resources; that is the classpath location the ItemReader reads from.
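The comma-separated line format can be mapped to fields with plain Java; the sketch below is a simplified stand-in for what DelimitedLineTokenizer does (a naive split with no quoting or escaping support, for illustration only):

```java
public class CsvLineSketch {

    // split one CSV record into the four columns the reader expects
    static String[] tokenize(String line) {
        return line.split(","); // naive split; real CSV may need quote handling
    }

    public static void main(String[] args) {
        String[] fields = tokenize("1,Zhangsan,21,男");
        // the columns map to Person in order: id, name, age, gender
        System.out.println(fields[1]);     // Zhangsan
        System.out.println(fields.length); // 4
    }
}
```

In the real configuration this tokenizing, plus the column-name-to-property mapping, is handled by DelimitedLineTokenizer and BeanWrapperFieldSetMapper.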
2. The Person entity
Its fields correspond to the columns of person.csv. Validation annotations can be added to the entity: @Size, for instance, constrains the length of a field, and if the constraint is violated the validator detects it and the batch job is not carried out.
public class Person implements Serializable {

    private static final long serialVersionUID = 1L;

    private String id;

    @Size(min = 2, max = 8)
    private String name;

    private int age;

    private String gender;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
    public String getGender() { return gender; }
    public void setGender(String gender) { this.gender = gender; }

    @Override
    public String toString() {
        return "Person{" +
                "id='" + id + '\'' +
                ", name='" + name + '\'' +
                ", age=" + age +
                ", gender='" + gender + '\'' +
                '}';
    }
}
3. The database table

CREATE TABLE `person` (
  `id` int(11) NOT NULL,
  `name` varchar(10) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `gender` varchar(2) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The table is empty to begin with.
4. The test class
Inject the launcher and the job. Runtime parameters can be passed flexibly via JobParameters; finally, call jobLauncher.run to execute the batch job.
@RunWith(SpringRunner.class)
@SpringBootTest
public class BatchTest {

    @Autowired
    SimpleJobLauncher jobLauncher;

    @Autowired
    Job importJob;

    @Test
    public void test() throws Exception {
        // runtime parameters bound via JobParameters (the timestamp makes each run unique)
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
        jobLauncher.run(importJob, jobParameters);
    }
}
5. Test results
....
2019-05-09 15:23:39.576 INFO 18296 --- [ main] com.lijian.test.BatchTest : Started BatchTest in 6.214 seconds (JVM running for 7.185)
2019-05-09 15:23:39.939 INFO 18296 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importCsvJob]] launched with the following parameters: [{time=1557386619763}]
2019-05-09 15:23:39.982 INFO 18296 --- [ main] com.lijian.config.batch.CsvJobListener : job process start...
2019-05-09 15:23:40.048 INFO 18296 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step]
2019-05-09 15:23:40.214 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:23:40.282 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor end validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor end validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor end validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor end validating...
2019-05-09 15:23:40.283 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:23:40.284 INFO 18296 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor end validating...
2019-05-09 15:23:40.525 INFO 18296 --- [ main] com.lijian.config.batch.CsvJobListener : job process end...
2019-05-09 15:23:40.526 INFO 18296 --- [ main] com.lijian.config.batch.CsvJobListener : elapsed time: 543ms
2019-05-09 15:23:40.548 INFO 18296 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importCsvJob]] completed with the following parameters: [{time=1557386619763}] and the following status: [COMPLETED]
2019-05-09 15:23:40.564 INFO 18296 --- [ Thread-5] com.alibaba.druid.pool.DruidDataSource : {dataSource-1} closed
Check the data in the table: select * from person;
To insert more data and test whether the validator takes effect, change person.csv to the following:
6,springbatch,24,男
7,springboot,23,女
Because the JSR-303 annotation on the entity constrains the length of name (the @Size(min = 2, max = 8) annotation added above), validation fails with an error and the batch job is not carried out:
... Started BatchTest in 5.494 seconds (JVM running for 6.41)
2019-05-09 15:30:02.147 INFO 20368 --- [ main] o.s.b.c.l.support.SimpleJobLauncher : Job: [FlowJob: [name=importCsvJob]] launched with the following parameters: [{time=1557387001499}]
2019-05-09 15:30:02.247 INFO 20368 --- [ main] com.lijian.config.batch.CsvJobListener : job process start...
2019-05-09 15:30:02.503 INFO 20368 --- [ main] o.s.batch.core.job.SimpleStepHandler : Executing step: [step]
2019-05-09 15:30:02.683 INFO 20368 --- [ main] c.lijian.config.batch.CvsItemProcessor : processor start validating...
2019-05-09 15:30:02.761 ERROR 20368 --- [ main] o.s.batch.core.step.AbstractStep : Encountered an error executing step step in job importCsvJob
org.springframework.batch.item.validator.ValidationException: size must be between 2 and 8
...