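Bulk Index Writes with Spring Boot + Elasticsearch
When maintaining indexes in Elasticsearch, if your application frequently writes large volumes of documents, the one-document-at-a-time approach from the previous article is clearly inefficient. In that case, use bulkIndex to improve throughput. The right batch size depends on your data set and your cluster configuration.
Below we build a sample project with Spring Boot and Elasticsearch, starting from the basic pom configuration: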
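To make the efficiency argument concrete, here is a minimal sketch of what a bulk request body looks like at the REST level: one action line followed by one source line per document, each terminated by a newline, all sent in a single request. The class and method names (`BulkPayloadSketch`, `buildBulkBody`) and the sample documents are hypothetical, and the `_type` field assumes an older Elasticsearch version such as the 2.x line this article targets.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkPayloadSketch {

    // Builds the newline-delimited body of a _bulk request:
    // an "index" action line plus the JSON source line for each document.
    // The source strings are assumed to be single-line JSON already.
    static String buildBulkBody(String index, String type, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type)
                .append("\",\"_id\":\"").append(doc.getKey())
                .append("\"}}\n");
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, keyed by document ID
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"brand\":\"A\",\"model\":\"X\"}");
        docs.put("2", "{\"brand\":\"B\",\"model\":\"Y\"}");
        System.out.print(buildBulkBody("car_index", "car_type", docs));
    }
}
```

Sending N documents this way costs one round trip instead of N, which is where most of the gain comes from.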

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>1.4</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
application.properties configuration:

# elasticsearch config
spring.data.elasticsearch.cluster-name=elasticsearch
spring.data.elasticsearch.cluster-nodes=192.168.1.105:9300

# application config
server.port=8080
spring.application.name=esp-app
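Next we need a domain entity and the basic CRUD support that Spring Data provides. Annotate the identifier field with @Id; without an ID field, Spring Data Elasticsearch cannot map the entity to an indexed document. The entity must also declare its index name and type, and the @Document annotation lets us set the number of shards and replicas as well.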

import java.io.Serializable;
import java.math.BigDecimal;

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Index names must be lowercase in Elasticsearch; these values match the
// constants used by IndexerService.
@Data
@Document(indexName = "car_index", type = "car_type", shards = 1, replicas = 0)
public class Car implements Serializable {
    private static final long serialVersionUID = 1L;

    @Id
    private Long id;
    private String brand;
    private String model;
    private BigDecimal amount;

    public Car(Long id, String brand, String model, BigDecimal amount) {
        this.id = id;
        this.brand = brand;
        this.model = model;
        this.amount = amount;
    }
}
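Next, define an IndexerService that uses bulk requests to do the indexing. Before writing, check whether the index exists, to avoid exceptions. To get more familiar with the Java API, this example uses ElasticsearchTemplate rather than the ElasticsearchRepository from the previous article; it offers a richer set of operations.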

import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import org.apache.commons.lang3.RandomStringUtils;
import org.apache.commons.lang3.RandomUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.stereotype.Service;

@Service
public class IndexerService {
    private static final String CAR_INDEX_NAME = "car_index";
    private static final String CAR_INDEX_TYPE = "car_type";
    private static final int BATCH_SIZE = 500;

    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    /** Indexes the test data in batches and returns the number of documents submitted. */
    public long bulkIndex() throws Exception {
        int counter = 0;
        try {
            // Make sure the index exists before writing to it
            if (!elasticsearchTemplate.indexExists(CAR_INDEX_NAME)) {
                elasticsearchTemplate.createIndex(CAR_INDEX_NAME);
            }
            Gson gson = new Gson();
            List<IndexQuery> queries = new ArrayList<IndexQuery>();
            List<Car> cars = assembleTestData();
            for (Car car : cars) {
                IndexQuery indexQuery = new IndexQuery();
                indexQuery.setId(car.getId().toString());
                indexQuery.setSource(gson.toJson(car));
                indexQuery.setIndexName(CAR_INDEX_NAME);
                indexQuery.setType(CAR_INDEX_TYPE);
                queries.add(indexQuery);
                counter++;
                // Submit each full batch; counting before the check keeps the
                // first batch from being flushed with a single document
                if (counter % BATCH_SIZE == 0) {
                    elasticsearchTemplate.bulkIndex(queries);
                    queries.clear();
                    System.out.println("bulkIndex counter : " + counter);
                }
            }
            // Do not forget to submit the final, partially filled batch
            if (queries.size() > 0) {
                elasticsearchTemplate.bulkIndex(queries);
            }
            elasticsearchTemplate.refresh(CAR_INDEX_NAME);
            System.out.println("bulkIndex completed.");
        } catch (Exception e) {
            System.out.println("IndexerService.bulkIndex e;" + e.getMessage());
            throw e;
        }

        return counter;
    }

    private List<Car> assembleTestData() {
        List<Car> cars = new ArrayList<Car>();
        // Generate 10000 random cars for the bulk write. IDs are random too,
        // so cars that share an ID overwrite each other in the index.
        for (int i = 0; i < 10000; i++) {
            cars.add(new Car(RandomUtils.nextLong(1, 11111), RandomStringUtils.randomAscii(20), RandomStringUtils.randomAlphabetic(15), BigDecimal.valueOf(78000)));
        }
        return cars;
    }
}
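The batching step in bulkIndex() can also be looked at in isolation from Elasticsearch. The following is a minimal sketch, not part of the original project: `BatchSketch` and `partition` are hypothetical names, and the helper simply splits a list into consecutive batches of at most a given size, including the final partial batch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    // The last batch holds whatever is left over and may be smaller.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            ids.add(i);
        }
        for (List<Integer> batch : partition(ids, 500)) {
            // In the service, this is where each batch would be handed
            // to elasticsearchTemplate.bulkIndex(...)
            System.out.println("batch size: " + batch.size());
        }
    }
}
```

With 1201 items and a batch size of 500 this yields batches of 500, 500 and 201, so no separate "flush the remainder" branch is needed in the calling code.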
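The rest is straightforward. To test, you can either write a RestController that accepts a request, or use a CommandLineRunner that calls the method above when the application starts.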
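
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class ESPApplication {

    public static void main(String[] args) {
        SpringApplication.run(ESPApplication.class, args);
    }

    @Autowired
    IndexerService indexService;

    // POST /bulkIndex triggers one bulk indexing run
    @RequestMapping(value = "bulkIndex", method = RequestMethod.POST)
    public void bulkIndex() {
        try {
            indexService.bulkIndex();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}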
The CommandLineRunner class:
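
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class AppLoader implements CommandLineRunner {
    @Autowired
    IndexerService indexerService;

    // Runs once after the application context has started
    @Override
    public void run(String... strings) throws Exception {
        indexerService.bulkIndex();
    }
}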
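Once it finishes, you can check whether the documents were actually indexed at http://localhost:9200/car_index/_search/. Note: pay particular attention to version compatibility. As the table below shows, the Spring Data Elasticsearch approach used here does not work with Elasticsearch 5 or later.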
| Spring Boot Version (x) | Spring Data Elasticsearch Version (y) | Elasticsearch Version (z) |
|---|---|---|
| x <= 1.3.5 | y <= 1.3.4 | z <= 1.7.2* |
| x >= 1.4.x | 2.0.0 <=y < 5.0.0** | 2.0.0 <= z < 5.0.0** |
(*) - requires a manual change in your project's pom file (solution 2.)
(**) - Next big ES release with breaking changes
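A note on reading the _search result: the test data draws its IDs at random from the range [1, 11111), and documents with the same ID overwrite each other, so the hit count will be lower than the 10000 cars generated. The sketch below estimates how many distinct IDs to expect. It is an illustration only: `DuplicateIdSketch`, `distinctIds` and the fixed seed are hypothetical, and it uses java.util.Random rather than the RandomUtils call from the service.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class DuplicateIdSketch {

    // Draws `count` IDs uniformly from [1, upperExclusive) and returns
    // how many of them are distinct.
    static int distinctIds(int count, int upperExclusive, long seed) {
        Random random = new Random(seed);
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < count; i++) {
            // nextInt(n) returns a value in [0, n), so shift by 1
            seen.add(1L + random.nextInt(upperExclusive - 1));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int distinct = distinctIds(10000, 11111, 42L);
        System.out.println("distinct IDs out of 10000 draws: " + distinct);
    }
}
```

If an exact count of 10000 documents matters for your test, sequential IDs are the simpler choice.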




