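Background
The trigger was a feature that bulk-updates a specified field by ES document id. After it went live, users reported that some documents intermittently failed to update.
Investigation and fix
I located the relevant code and wrote a small demo to run it in bulk. The original code looked roughly like this: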
public String len10() throws Exception {
    Random random = new Random();
    String[] ls = new String[500];
    for (int i = 0; i < 500; i++) {
        int finalI = i;
        Runnable callable = new Runnable() {
            @Override
            public void run() {
                String str = String.valueOf(random.nextInt(40) + 10);
                String str2 = String.valueOf(random.nextInt(40) + 10);
                String str3 = String.valueOf(random.nextInt(40) + 10);
                String action = "/XXXXXXX_index/XXXXXXX_type/_update_by_query";
                String dateTime = "20" + str + "-01-12 23:" + str2 + ":" + str3;
                String script = "{\n" +
                        " \"script\": {\n" +
                        " \"inline\": \"ctx._source.modify_time='" + dateTime + "'\"\n" +
                        " },\n" +
                        " \"query\": {\n" +
                        " \"bool\": {\n" +
                        " \"filter\": [{\n" +
                        " \"term\": {\n" +
                        " \"id\": \"2\"\n" +
                        " }\n" +
                        " }]\n" +
                        " }\n" +
                        " }\n" +
                        "}";
                try {
                    String post = esRestClient.performRequest("POST", action, script);
                    System.out.println(post);
                    ls[finalI] = post;
                } catch (IOException e) {
                    e.printStackTrace();
                    ls[finalI] = e.getMessage();
                }
            }
        };
        Thread thread = new Thread(callable);
        thread.start();
    }
    Thread.sleep(50000);
    return "";
}
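In short, it starts 500 threads, each of which uses a script to update the modify_time field of the document whose id is 2 in the index.
Running it immediately produces a large number of exceptions: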
HTTP/1.1 500 Internal Server Error
{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_per_minute] setting",
        "bytes_wanted": 0,
        "bytes_limit": 0
      }
    ],
    "type": "general_script_exception",
    "reason": "Failed to compile inline script [ctx._source.modify_time='2024-03-25 09:44:48';] using lang [painless]",
    "caused_by": {
      "type": "circuit_breaking_exception",
      "reason": "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_per_minute] setting",
      "bytes_wanted": 0,
      "bytes_limit": 0
    }
  },
  "status": 500
}
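The response says: Too many dynamic script compilations within one minute, max: [15/min].
From this we learn that ES has to compile a script before it can run it, and that the script.max_compilations_per_minute setting caps how many scripts may be compiled per minute, so that excessive compilation work does not put load on the ES server.
The code above effectively submits a script with a different modify_time on every request, so a large batch is bound to exceed the limit and fail.
The most direct fix is to raise the compilation limit with the following request: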
PUT /_cluster/settings
{
  "transient": {
    "script.max_compilations_per_minute": 100
  }
}
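However, this only treats the symptom, not the cause. It is like adding flour when the dough is too wet and then water when it is too dry: the value of script.max_compilations_per_minute cannot be raised indefinitely.
What we actually need is simple: pull the variable out of the script and pass it in through params. ES compiles the content of script.inline/script.source, so if the variable's value is written directly into the script text, every request is treated as a different script and has to be compiled again. If the value is moved into params instead, the script text stays the same across requests and the problem goes away.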
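To make the difference concrete, here is a minimal sketch that imitates the behaviour described above. It is not how ES is implemented: the `Compiler` class, its cache keyed by script text, and the fixed budget of 15 compilations are simplifying assumptions made only for illustration, and the timestamps are generated deterministically so that every inlined script is unique.

```java
import java.util.HashSet;
import java.util.Set;

public class ScriptCompileSketch {

    /** Hypothetical stand-in for the server side: caches scripts by source text. */
    public static final class Compiler {
        private final Set<String> cache = new HashSet<>();
        private final int maxCompilations;
        public int compiled = 0;
        public int rejected = 0;

        public Compiler(int maxCompilations) {
            this.maxCompilations = maxCompilations;
        }

        public void submit(String source) {
            if (cache.contains(source)) {
                return; // already compiled: cache hit, costs nothing
            }
            if (compiled >= maxCompilations) {
                rejected++; // budget for this window is used up
                return;
            }
            cache.add(source);
            compiled++;
        }
    }

    /** Returns {compiled, rejected} for the given number of requests. */
    public static int[] run(boolean useParams, int requests) {
        Compiler compiler = new Compiler(15); // assumed budget, mirrors 15/min
        for (int i = 0; i < requests; i++) {
            // A distinct timestamp per request (assumes requests <= 3600).
            String time = String.format("2024-01-12 23:%02d:%02d", i / 60, i % 60);
            String source = useParams
                    ? "ctx._source.modify_time=params.time"      // value travels in params
                    : "ctx._source.modify_time='" + time + "'"; // value baked into the text
            compiler.submit(source);
        }
        return new int[] { compiler.compiled, compiler.rejected };
    }

    public static void main(String[] args) {
        int[] inlined = run(false, 500);
        int[] parameterized = run(true, 500);
        System.out.println("inlined: compiled=" + inlined[0] + ", rejected=" + inlined[1]);
        System.out.println("params:  compiled=" + parameterized[0] + ", rejected=" + parameterized[1]);
    }
}
```

In this simplified model, the inlined variant uses up the whole budget and has the remaining 485 requests rejected, while the parameterized variant compiles a single script and rejects nothing.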
The revised demo code is as follows:
public String len9() throws Exception {
    Random random = new Random();
    String[] ls = new String[500];
    for (int i = 0; i < 500; i++) {
        int finalI = i;
        Runnable callable = new Runnable() {
            @Override
            public void run() {
                String str = String.valueOf(random.nextInt(40) + 10);
                String str2 = String.valueOf(random.nextInt(40) + 10);
                String str3 = String.valueOf(random.nextInt(40) + 10);
                String dateTime = "20" + str + "-01-12 23:" + str2 + ":" + str3;
                String action = "/error_handle_index/error_handle_index/_update_by_query";
                String script = "{\n" +
                        " \"script\": {\n" +
                        " \"inline\": \"ctx._source.modify_time=params.time\",\n" +
                        "\"params\" : {\n" +
                        " \"time\":\"" + dateTime + "\"\n" +
                        " }\n" +
                        " },\n" +
                        " \"query\": {\n" +
                        " \"bool\": {\n" +
                        " \"filter\": [{\n" +
                        " \"term\": {\n" +
                        " \"id\": \"2\"\n" +
                        " }\n" +
                        " }]\n" +
                        " }\n" +
                        " }\n" +
                        "}";
                try {
                    String post = esRestClient.performRequest("POST", action, script);
                    System.out.println(post);
                    ls[finalI] = post;
                } catch (IOException e) {
                    e.printStackTrace();
                    ls[finalI] = e.getMessage();
                }
            }
        };
        Thread thread = new Thread(callable);
        thread.start();
    }
    Thread.sleep(50000);
    return "";
}
Running it again confirms that the Too many dynamic script compilations within one minute exception no longer appears, so this problem is solved.
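That leaves a second problem, the one most often seen in bulk execution: 409 Conflict. This demo updates the document with id 2 from many threads at the same time, so 409 Conflict responses are inevitable. They have to be handled by other means; see the article 《ElasticSearch使用_delete_by_query删除大批数据,及409 Conflict版本冲突问题处理》 ("Using _delete_by_query in ElasticSearch to delete large amounts of data, and handling 409 Conflict version conflicts").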
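As one possible direction, and not necessarily the approach taken in the article referenced above, the sketch below retries a request a limited number of times when it fails with a conflict. `ConflictException`, `retryOnConflict`, and the linear backoff are hypothetical names and choices introduced only for illustration; a real client wrapper would need to detect the 409 status in its own way. Separately, the _update_by_query API accepts a conflicts=proceed URL parameter, which counts version conflicts instead of aborting the request.

```java
import java.util.concurrent.Callable;

public class ConflictRetrySketch {

    /** Hypothetical exception a client wrapper might throw on HTTP 409. */
    public static class ConflictException extends Exception {
        public ConflictException(String message) {
            super(message);
        }
    }

    /**
     * Runs the request, retrying only when it fails with a conflict.
     * Waits backoffMillis * attempt between attempts (linear backoff).
     */
    public static <T> T retryOnConflict(Callable<T> request, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConflictException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (ConflictException e) {
                last = e; // remember the conflict and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // every attempt hit a conflict
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: conflicts twice, then succeeds.
        String result = retryOnConflict(() -> {
            if (++calls[0] < 3) {
                throw new ConflictException("409 Conflict");
            }
            return "updated";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Retrying suits cases where the update must eventually land. When many writers target the same document, as in the demo, serializing the writes on the client side may be the simpler fix.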
The official documentation has a section dedicated to the script compilation issue: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-scripting-using.html
Prefer parameters
The first time Elasticsearch sees a new script, it compiles it and stores the compiled version in a cache. Compilation can be a heavy process.
If you need to pass variables into the script, you should pass them in as named params instead of hard-coding values into the script itself. For example, if you want to be able to multiply a field value by different multipliers, don’t hard-code the multiplier into the script:
"inline": "doc['my_field'] * 2"
Instead, pass it in as a named parameter:
"inline": "doc['my_field'] * multiplier", "params": { "multiplier": 2 }
The first version has to be recompiled every time the multiplier changes. The second version is only compiled once.
If you compile too many unique scripts within a small amount of time, Elasticsearch will reject the new dynamic scripts with a circuit_breaking_exception error. By default, up to 15 inline scripts per minute will be compiled. You can change this setting dynamically by setting script.max_compilations_per_minute.
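In other words, the first version of the script has to be recompiled every time the multiplier changes, while the second version, which passes the value through params, is compiled only once.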