ElasticSearch: Troubleshooting unassigned_shards on a Single Node


2023-11-12


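In a single-host ELK deployment, connecting to Kibana returned the error below. Even after restarting all the services, Kibana kept reporting "Kibana server is not ready."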

{"message":"all shards failed: [search_phase_execution_exception] all shards failed","statusCode":503,"error":"Service Unavailable"}
Troubleshooting process
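Until recently the ELK services had been working normally. Pinging the IPs from inside the containers worked, and every service was in the Up state. ElasticSearch was reachable at http://localhost:9200/, yet Kibana could not connect to ElasticSearch.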


The Kibana log contained the following entry. It includes no_shard_available_action_exception, which points to a shard problem.

{
  "type": "error",
  "@timestamp": "2020-09-15T00:41:09Z",
  "tags": [
    "warning",
    "stats-collection"
  ],
  "pid": 1,
  "level": "error",
  "error": {
    "message": "[no_shard_available_action_exception] No shard available for [get [.kibana][doc][config:6.8.11]: routing [null]]",
    "name": "Error",
    "stack": "[no_shard_available_action_exception] No shard available for [get [.kibana][doc][config:6.8.11]: routing [null]] :: {\"path\":\"/.kibana/doc/config%3A6.8.11\",\"query\":{},\"statusCode\":503,\"response\":\"{\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"no_shard_available_action_exception\\\",\\\"reason\\\":\\\"No shard available for [get [.kibana][doc][config:6.8.11]: routing [null]]\\\"}],routing [null]]"
  }
}
Checking with cerebro, an Elasticsearch visualization tool:

(Screenshot: cerebro cluster overview)

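At the time the cluster was actually red, not the yellow shown in the screenshot, and heap/disk/cpu/load were almost all red, possibly because I had manually deleted a few indices.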

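With a yellow status Kibana can reach ES again, but yellow still means ES is not fully healthy: every primary shard is assigned, while at least one replica shard is not.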

Checking the health of the single-node Elasticsearch
curl -XGET 'http://localhost:9200/_cluster/health?pretty'

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 677,
  "active_shards" : 677,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 948,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 5,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 599,
  "active_shards_percent_as_number" : 41.559238796807854
}
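The unassigned_shards field above shows that a large number of shards were never allocated; when I first looked, the count was over 1,000. active_shards_percent_as_number tells the same story: only 677 of 1,629 shard copies (677 active + 4 initializing + 948 unassigned) are active, about 41.56%.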
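As a small addition of my own (not part of the original troubleshooting), the same check can be reduced to one line with the _cat/health API and its h=status,unassign columns, which print something like `red 948`. The sketch below assumes a bash- or dash-compatible shell, and the function name health_verdict is hypothetical.

```shell
# health_verdict (hypothetical helper): reads one line of
#   _cat/health?h=status,unassign
# output from stdin, e.g. "red 948", prints a summary, and returns 0
# only when the cluster status is green.
health_verdict() {
  local cluster_status unassigned
  # Split the line into the two requested columns
  read -r cluster_status unassigned || return 2
  echo "status=${cluster_status} unassigned_shards=${unassigned}"
  # A non-zero exit status lets a cron job or CI step flag yellow/red
  [ "$cluster_status" = "green" ]
}
```

Usage, assuming Elasticsearch on localhost:9200: `curl -s 'http://localhost:9200/_cat/health?h=status,unassign' | health_verdict`. Yellow returns non-zero as well as red, because yellow still means some shards are unassigned.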

Finding the indices with UNASSIGNED shards
curl -XGET http://localhost:9200/_cat/shards

(Screenshot: _cat/shards output listing UNASSIGNED shards)
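With hundreds of indices the raw _cat/shards listing gets long. A hedged sketch for narrowing it down: request only the index, shard, prirep, state and unassigned.reason columns and filter with awk. Both function names are hypothetical, and the awk field positions assume exactly that column order.

```shell
# list_unassigned (hypothetical helper): filters the output of
#   _cat/shards?h=index,shard,prirep,state,unassigned.reason
# down to UNASSIGNED rows and prints: index shard p|r reason
list_unassigned() {
  awk '$4 == "UNASSIGNED" { print $1, $2, $3, $5 }'
}

# count_unassigned_by_type (hypothetical helper): splits the unassigned
# shards into primaries ("p") and replicas ("r"). Unassigned replicas alone
# give a yellow cluster; any unassigned primary makes it red.
count_unassigned_by_type() {
  awk '$4 == "UNASSIGNED" { n[$3]++ }
       END { printf "primaries=%d replicas=%d\n", n["p"], n["r"] }'
}
```

For example: `curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | count_unassigned_by_type`. To see why a shard is unassigned, `curl -s 'http://localhost:9200/_cluster/allocation/explain?pretty'` with no request body asks Elasticsearch to explain the first unassigned shard it finds.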

The cause is now fairly clear: the problem comes from the unassigned shards. Next, how to fix it.

Solution
In a multi-node cluster, you could use POST /_cluster/reroute to force the problem shards onto one of the nodes.
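For reference, a sketch of what such a reroute request can look like in a multi-node cluster. The allocate_replica command places one replica copy of a shard on a named node; the index name logs and node name node-2 in the usage below are hypothetical. Separately, POST /_cluster/reroute?retry_failed=true asks Elasticsearch to retry shards whose allocation has failed too many times. Neither helps on a single node, because a replica may never share a node with its primary.

```shell
# reroute_allocate_replica_body (hypothetical helper): prints a
# _cluster/reroute request body that asks Elasticsearch to allocate one
# replica of <index>/<shard> on <node>. Only meaningful with 2+ nodes.
reroute_allocate_replica_body() {
  # $1 = index name, $2 = shard number, $3 = target node name
  printf '{"commands":[{"allocate_replica":{"index":"%s","shard":%d,"node":"%s"}}]}\n' \
    "$1" "$2" "$3"
}
```

Usage: `reroute_allocate_replica_body logs 0 node-2 | curl -s -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -H 'Content-Type: application/json' -d @-` (curl's -d @- reads the request body from stdin).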

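In this single-node environment, however, the screenshot above shows 5 unassigned shards. Each index was created with 5 primary shards and 1 replica (the Elasticsearch defaults before 7.0), and after creation the cluster status became yellow. The root cause is that the cluster contains replica shards that cannot be activated.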

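On a single-node Elasticsearch cluster, the fix is to delete the indices that have replica shards, recreate them with the number of replicas set to 0, and then check the cluster status again.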

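Alternatively, set number_of_replicas=0 on the existing indices with the following command. Because number_of_replicas is a dynamic index setting, the indices do not have to be deleted first. As the screenshot below shows, ES then turned green.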

curl -XPUT 'http://localhost:9200/_settings' -H 'Content-Type: application/json' -d '
{
  "number_of_replicas": 0
}'
(Screenshot: Fix-UNASSIGNED, cluster status green after setting replicas to 0)
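One caveat worth adding: PUT /_settings only changes indices that already exist. Indices created afterwards (for example a new daily log index) still get the default of one replica unless a template says otherwise, and the single-node cluster goes yellow again. A minimal sketch, assuming the legacy _template API of Elasticsearch 6.x (the Kibana log above shows version 6.8.11), prints a template body that defaults new indices to zero replicas; the template name zero_replicas and the helper name are hypothetical.

```shell
# zero_replica_template_body (hypothetical helper): prints a legacy
# (_template API, Elasticsearch 6.x style) index template body that gives
# every new index matching the pattern zero replicas.
zero_replica_template_body() {
  # $1 = index pattern, defaulting to "*" (all new indices)
  printf '{"index_patterns":["%s"],"order":0,"settings":{"number_of_replicas":0}}\n' \
    "${1:-*}"
}
```

Usage: `zero_replica_template_body | curl -s -XPUT 'http://localhost:9200/_template/zero_replicas' -H 'Content-Type: application/json' -d @-`. The low order of 0 means any other matching template with a higher order can still override the replica count.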

Background
The main purpose of replica shards is failover: if the node holding a primary shard fails, one of its replica shards is promoted to primary.

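That is why a replica shard and its primary can never be placed on the same node. In a cluster with only one node there is no other node for the replicas, so every replica shard stays unassigned. And with only one node, if that node fails the whole cluster is down anyway, so there is no replica left to promote to primary.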
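The rule can be stated as simple arithmetic: every shard has one primary plus number_of_replicas replica copies, and no two copies of the same shard may live on the same node, so a cluster with N data nodes can place at most N - 1 replicas per shard. A minimal sketch of that rule (hypothetical helper, assuming all primaries are assigned and ignoring other allocation limits such as disk watermarks):

```shell
# expected_status (hypothetical helper): predicts the cluster color from
# the replica rule alone. Each shard has 1 primary + <replicas> copies and
# no two copies of one shard may share a node, so at most <nodes> - 1
# replicas per shard can be assigned. Assumes every primary is assigned
# and ignores other limits such as disk watermarks or allocation filters.
expected_status() {
  # $1 = number of data nodes, $2 = number_of_replicas
  if [ "$2" -le $(( $1 - 1 )) ]; then
    echo green
  else
    echo yellow
  fi
}
```

`expected_status 1 1` prints yellow, which is the situation described in this post, while `expected_status 1 0` prints green.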

Author: IT胖

Source: https://www.cnblogs.com/FLY_DREAM/p/14269859.html

License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.


Original link: https://808629.com/173479.html