月度归档: 2022年10月

Elasticsearch 设置字段默认值

本期新开发需求,其中给ES索引新增了一个字段,对应的,历史数据中这个字段的值就是null了,那么就需要给历史数据写上默认值。写了个执行语句如下

POST /your_es_index/your_es_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "if (ctx._source.store_id == null) {ctx._source.store_id = 'default_store_id'}"
  }
}

用更新把null修改为目标默认值,查看下更新操作的变化

GET /your_es_index/your_es_index/_search
{
  "query": {
    "exists": {
      "field": "store_id"
    }
  }
}

执行前结果83条,执行后478条,ES索引中总记录条数478条,说明确实都刷成默认值了,测试环境数据比较少

不过这里还踩了坑,一开始不是这么写的,如下

POST /your_es_index/your_es_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.store_id == null) {ctx._source.store_id = 'default_store_id'}"
  }
}

一开始写的source,执行之后得到报错异常信息

{
 "error":{
  "root_cause":[{
   "type":"parse_exception",
   "reason":"expected one of [inline], [file] or [stored] fields, but found none"
  }],
  "type":"parse_exception",
  "reason":"expected one of [inline], [file] or [stored] fields, but found none"
 },
 "status":400
}

根据给出的信息“expected one of [inline], [file] or [stored] fields, but found none”,ES系统期望能接收到的字段为inline、file或者stored,查阅了相关资料后,判定为ES版本区别的原因,改为inline字段后就可以正常执行了。

相关信息

https://stackoverflow.com/questions/67488446/elasticsearch-bulk-update-geo-location-of-all-documents-with-curl

it looks like you're running an older version of ES. Try the command below which simply replaces source by inline as it was the norm in older versions

https://elasticsearch.cn/question/6458

其他的一些操作方法 https://blog.csdn.net/laoyang360/article/details/119012322


10月26,更新下,在生产环境执行的结果

执行后报了个异常

2022-10-25 17:10:48  java.io.IOException: listener timeout after waiting for [300000] ms
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:912)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:327)
......

超时了

由于变更执行语句需要重新走审批流程,所以先尝试重新执行了下,报了另外一个版本冲突的异常,其中一些信息我手动做了脱敏处理

2022-10-25 17:53:55  org.elasticsearch.client.ResponseException: method [POST], host [http://10.***.***.31:9***], URI [/the_es_*******_index/the_es_*******_index/_update_by_query], status line [HTTP/1.1 409 Conflict]
{"took":29772,"timed_out":false,"total":20462314,"updated":33984,"deleted":0,"batches":34,"version_conflicts":16,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"the_es_*******_index","type":"the_es_*******_index","id":"103141993495926871","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][103141993495926871]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"167146670228584497","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][167146670228584497]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"100618429327498328","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][100618429327498328]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"168276494396132973","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][168276494396132973]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"164741654402007284","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][164741654402007284]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"166332349807514318","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][166332349807514318]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"101164299438739744","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][101164299438739744]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"101444916306189041","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][101444916306189041]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"165709803817590009","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][165709803817590009]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"172236348446552109","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][172236348446552109]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"171252183551964717","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][171252183551964717]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"94363645778869762","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][94363645778869762]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"169670413328115540","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][169670413328115540]: version conflict, current version [6] is different than the one provided [5]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"94091161628589223","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][94091161628589223]: version conflict, current version [7] is different than the one provided [6]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"169393862032975651","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][169393862032975651]: version conflict, current version [5] is different than the one provided [4]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409},{"index":"the_es_*******_index","type":"the_es_*******_index","id":"168775992414826034","cause":{"type":"version_conflict_engine_exception","reason":"[the_es_*******_index][168775992414826034]: version conflict, current version [4] is different than the one provided [3]","index_uuid":"GRt4moPETEeYEdxhhkPOxw","shard":"1","index":"the_es_*******_index"},"status":409}]}
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936)

咨询过平台服务的同事,超时只是客户端超时了,实际语句在系统中仍在继续运行。

通过执行查询语句,使用Elasticsearch的must_not来查询所修改字段仍无默认值的数据量

get /your_es_index/your_es_index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "store_id"
        }
      }
    }
  }
}

确认到查询结果的数量是一直在变小的,那么确实是仍在运行的。

最终索引中总共2000多万条数据,执行消耗时间从下午5点到晚上8点半多执行完成,耗时3个半小时

最近写HIVE SQL的一点笔记【1】

几个点,一个个来

字段内容为JSON字符串,ETL过程中需要解析JSON字符串,并取出解析后的对象的某个值

处理方法,使用get_json_object()函数、get_json_object(string json_string, string path)

函数可以接收两个参数,第一个参数为需要解析的字段或者字符串,第二个字段为解析后取值的表达式,表达式固定以`$`符号开头。

举个栗子

直接查对象:

select get_json_object('{"abc":123}','$.abc') as result;

返回结果:

查询数组:

select get_json_object('["abc","bcd"]','$[1]') as result

返回结果:

也可以组合嵌套使用:

select get_json_object('["abc",{"bcd":123}]','$[1].bcd') as result
select get_json_object('{"def":["abc",{"bcd":123}]}','$.def[1].bcd') as result

如果查询一个不存在的key,则返回 null,比如

select get_json_object('{"aaa":["abc",{"bcd":123}]}','$.def[1].bcd') as result

或者解析的字符串格式不对,无法被解析成JSON对象,比如

select get_json_object('{"aaa}]}','$.def[1].bcd') as result