Background

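It started with a feature that batch-updates a specified field on Elasticsearch documents by document id. After it went live, users reported that some documents intermittently, but fairly often, failed to update.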

Troubleshooting

I located the relevant code and wrote a small demo to run it in bulk. The original code looked roughly like this:

public String len10() throws Exception{
    Random random = new Random();
    String[] ls = new String[500];
    for (int i = 0; i < 500; i++) {
        int finalI = i;
        Runnable callable = new Runnable() {
            @Override
            public void run() {
                String str = String.valueOf(random.nextInt(40)+10);
                String str2 = String.valueOf(random.nextInt(40)+10);
                String str3 = String.valueOf(random.nextInt(40)+10);
                String action = "/XXXXXXX_index/XXXXXXX_type/_update_by_query";
                String dateTime = "20"+str+"-01-12 23:"+str2+":"+str3;
                String script = "{\n" +
                        "  \"script\": {\n" +
                        "    \"inline\": \"ctx._source.modify_time='"+dateTime+"'\"\n" +
                        "  },\n" +
                        "  \"query\": {\n" +
                        "    \"bool\": {\n" +
                        "      \"filter\": [{\n" +
                        "        \"term\": {\n" +
                        "          \"id\": \"2\"\n" +
                        "        }\n" +
                        "      }]\n" +
                        "    }\n" +
                        "  }\n" +
                        "}"
                        ;
                try {
                    String post = esRestClient.performRequest("POST", action, script);
                    System.out.println(post);
                    ls[finalI] = post;
                } catch (IOException e) {
                    e.printStackTrace();
                    ls[finalI] = e.getMessage();
                }
            }
        };
        Thread thread = new Thread( callable);
        thread.start();
    }
    Thread.sleep(50000);
    return "";
}

In short, it starts 500 threads, each of which uses a script to update the modify_time field of the document whose id is 2 in the index.
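The demo waits for its threads with a fixed Thread.sleep(50000). As a side note, here is a minimal sketch of the same fan-out using an ExecutorService that returns as soon as every task has finished. The class name ConcurrentBatchRunner, the pool size, and the task function are hypothetical stand-ins (the task would wrap the esRestClient call), and a bounded pool does not reproduce the exact concurrency of 500 raw threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

// Hypothetical helper, not part of the original demo.
class ConcurrentBatchRunner {

    // Runs `count` tasks on a fixed-size pool and returns their results in submission order.
    // A task that throws contributes its exception message instead of a result.
    static List<String> runAll(int count, int poolSize, IntFunction<String> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> jobs = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int index = i;
                jobs.add(() -> task.apply(index));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every job has completed.
            for (Future<String> future : pool.invokeAll(jobs)) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    results.add(String.valueOf(e.getCause().getMessage()));
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stub task; in the demo this would send the _update_by_query request.
        List<String> results = runAll(500, 50, i -> "response-" + i);
        System.out.println(results.size());
    }
}
```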

Running it immediately produces a large number of error responses:

HTTP/1.1 500 Internal Server Error
{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_per_minute] setting",
        "bytes_wanted": 0,
        "bytes_limit": 0
      }
    ],
    "type": "general_script_exception",
    "reason": "Failed to compile inline script [ctx._source.modify_time='2024-03-25 09:44:48';] using lang [painless]",
    "caused_by": {
      "type": "circuit_breaking_exception",
      "reason": "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead; this limit can be changed by the [script.max_compilations_per_minute] setting",
      "bytes_wanted": 0,
      "bytes_limit": 0
    }
  },
  "status": 500
}

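The message Too many dynamic script compilations within one minute, max: [15/min] tells us that Elasticsearch has to compile a script before it can run it, and that compilation is limited by the script.max_compilations_per_minute setting. This setting caps the number of scripts that may be compiled per minute, so that excessive compilation work does not overload the ES server.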

The code above effectively submits a script with a different modify_time on every request, so a large batch inevitably exceeds the limit and fails.
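To make the scale of the problem concrete, the following sketch counts how many distinct script sources the demo's string concatenation produces over 500 requests and compares that with the limit of 15. It is a toy model that contacts no Elasticsearch server; the class name HardCodedScriptCount and the fixed seed are hypothetical, and it assumes each distinct source string costs one compilation.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Toy model only: no Elasticsearch involved.
class HardCodedScriptCount {

    // Limit reported in the error message above.
    static final int MAX_COMPILATIONS_PER_MINUTE = 15;

    // Builds script sources the same way the demo does and counts the distinct ones.
    static int distinctSources(int requests, long seed) {
        Random random = new Random(seed);
        Set<String> sources = new HashSet<>();
        for (int i = 0; i < requests; i++) {
            String dateTime = "20" + (random.nextInt(40) + 10) + "-01-12 23:"
                    + (random.nextInt(40) + 10) + ":" + (random.nextInt(40) + 10);
            // The value is baked into the source, so nearly every request yields a new script.
            sources.add("ctx._source.modify_time='" + dateTime + "'");
        }
        return sources.size();
    }

    public static void main(String[] args) {
        int distinct = distinctSources(500, 42L);
        System.out.println("distinct script sources: " + distinct);
        System.out.println("exceeds limit: " + (distinct > MAX_COMPILATIONS_PER_MINUTE));
    }
}
```

With three random two-digit components there are 40 * 40 * 40 possible timestamps, so 500 requests produce very few repeats and almost every request asks for a fresh compilation.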

The most direct workaround is to raise the compilation limit with the following request:

PUT /_cluster/settings
{
  "transient": {
    "script.max_compilations_per_minute": 100
  }
}

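However, this only treats the symptom, not the cause. It is like adding flour when there is too much water and then adding water when there is too much flour: the value of script.max_compilations_per_minute cannot be raised indefinitely.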

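The real fix is simple: pull the variable out of the script and pass it in through params. Elasticsearch compiles the content of script.inline/script.source, so if the value is written directly into the script text, every request is treated as a different script and has to be compiled again. Moving the value into params keeps the script text identical across requests, which solves the problem.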

The revised demo code is as follows:

public String len9() throws Exception{
    Random random = new Random();
    String[] ls = new String[500];
    for (int i = 0; i < 500; i++) {
        int finalI = i;
        Runnable callable = new Runnable() {
            @Override
            public void run() {
                String str = String.valueOf(random.nextInt(40)+10);
                String str2 = String.valueOf(random.nextInt(40)+10);
                String str3 = String.valueOf(random.nextInt(40)+10);
                String dateTime = "20"+str+"-01-12 23:"+str2+":"+str3;
                String action = "/error_handle_index/error_handle_index/_update_by_query";
                String script = "{\n" +
                        "  \"script\": {\n" +
                        "    \"inline\": \"ctx._source.modify_time=params.time\",\n" +
                        "\"params\" : {\n" +
                        "            \"time\":\""+dateTime+"\"\n" +
                        "        }\n"+
                        "  },\n" +
                        "  \"query\": {\n" +
                        "    \"bool\": {\n" +
                        "      \"filter\": [{\n" +
                        "        \"term\": {\n" +
                        "          \"id\": \"2\"\n" +
                        "        }\n" +
                        "      }]\n" +
                        "    }\n" +
                        "  }\n" +
                        "}"
                        ;
                try {
                    String post = esRestClient.performRequest("POST", action, script);
                    System.out.println(post);
                    ls[finalI] = post;
                } catch (IOException e) {
                    e.printStackTrace();
                    ls[finalI] = e.getMessage();
                }
            }
        };
        Thread thread = new Thread( callable);
        thread.start();
    }
    Thread.sleep(50000);
    return "";
}

Running it again confirms that the Too many dynamic script compilations within one minute error no longer appears, so this problem is solved.

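What comes next, however, is a different problem: 409 Conflict, the most common error in batch execution. This demo updates the document with id 2 from many threads at once, so 409 Conflict responses are bound to occur. That has to be handled by other means; see the article "ElasticSearch: deleting large volumes of data with _delete_by_query, and handling 409 Conflict version conflicts".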

The official documentation has a section dedicated to this issue: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-scripting-using.html


Prefer parameters

The first time Elasticsearch sees a new script, it compiles it and stores the compiled version in a cache. Compilation can be a heavy process.

If you need to pass variables into the script, you should pass them in as named params instead of hard-coding values into the script itself. For example, if you want to be able to multiply a field value by different multipliers, don’t hard-code the multiplier into the script:

  "inline": "doc['my_field'] * 2"

Instead, pass it in as a named parameter:

  "inline": "doc['my_field'] * multiplier",
  "params": {
    "multiplier": 2
  }

The first version has to be recompiled every time the multiplier changes. The second version is only compiled once.

If you compile too many unique scripts within a small amount of time, Elasticsearch will reject the new dynamic scripts with a circuit_breaking_exception error. By default, up to 15 inline scripts per minute will be compiled. You can change this setting dynamically by setting script.max_compilations_per_minute.

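In other words, the first version of the script has to be recompiled every time the multiplier changes, while the second version, which passes the value through params, is compiled only once.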
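The caching behaviour described in the quoted documentation can be illustrated with a small model. This is only a sketch under the assumption that the cache is keyed on the script source text; the class ScriptCacheModel is hypothetical and is not how Elasticsearch implements its script cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a compile cache keyed on script source text.
class ScriptCacheModel {
    private final Map<String, String> cache = new HashMap<>();
    private int compilations = 0;

    // "Compiles" a script only the first time its exact source text is seen.
    String compile(String source) {
        return cache.computeIfAbsent(source, s -> {
            compilations++;
            return "compiled:" + s;
        });
    }

    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        ScriptCacheModel hardCoded = new ScriptCacheModel();
        ScriptCacheModel withParams = new ScriptCacheModel();
        for (int multiplier = 1; multiplier <= 20; multiplier++) {
            // Value embedded in the source: a new cache key on every change.
            hardCoded.compile("doc['my_field'] * " + multiplier);
            // Value passed separately as a param: the source never changes.
            withParams.compile("doc['my_field'] * multiplier");
        }
        System.out.println(hardCoded.compilations());  // prints 20
        System.out.println(withParams.compilations()); // prints 1
    }
}
```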