# Ceph Maintenance
This MOP covers maintenance activities related to Ceph.
## Table of Contents ##
<!-- TOC depthFrom:1 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->
- [1. Generic Commands](#1-generic-commands)
- [2. Replace failed OSD](#2-replace-failed-osd)
## 1. Generic Commands ##
### Check OSD Status
To check the current status of OSDs, execute the following:
```
utilscli osd-maintenance check_osd_status
```
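If you want the native Ceph view of the same information, the utility container also wraps the `ceph` CLI (as used in section 2 below); a minimal sketch, assuming `utilscli` passes arbitrary `ceph` subcommands through:
```
# Cluster-wide health summary via the wrapped ceph CLI
utilscli ceph -s
# Per-OSD up/down status and placement in the CRUSH tree
utilscli ceph osd tree
```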
### OSD Removal
To purge OSDs that are in the `down` state, execute the following:
```
utilscli osd-maintenance osd_remove
```
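Before purging, it can help to confirm which OSDs are actually reported `down`; for example:
```
# Confirm current OSD state first
utilscli osd-maintenance check_osd_status
# Then purge all OSDs in the down state
utilscli osd-maintenance osd_remove
```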
### OSD Removal By OSD ID
To purge an OSD in the `down` state by its OSD ID, execute the following:
```
utilscli osd-maintenance remove_osd_by_id --osd-id <OSDID>
```
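For example, to purge the OSD with ID `7` (a hypothetical ID; substitute the ID reported as `down` in your cluster):
```
# Purge a single down OSD by its numeric ID (7 is only an example)
utilscli osd-maintenance remove_osd_by_id --osd-id 7
```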
### Reweight OSDs
To adjust an OSD's CRUSH weight in the CRUSH map of a running cluster, execute the following:
```
utilscli osd-maintenance reweight_osds
```
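To confirm the result, per-OSD CRUSH weights and utilization can be checked afterwards with the wrapped `ceph` CLI (assuming, as above, that native subcommands pass through `utilscli`):
```
# Show CRUSH weight, reweight value and utilization per OSD
utilscli ceph osd df tree
```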
## 2. Replace failed OSD ##
In the context of a failed drive, please follow the procedure below. The following commands should be run from the utility container.
Capture the ID of the failed OSD and check that its status is `down`:
```
utilscli ceph osd tree
```
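If the tree is large, the down OSDs can be filtered out directly; a small sketch, assuming standard `grep` is available in the utility container:
```
# List only the OSDs currently reported down
utilscli ceph osd tree | grep -w down
```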
Remove the OSD from the cluster, replacing `<OSD_ID>` with the failed OSD ID captured above:
```
utilscli osd-maintenance osd_remove_by_id --osd-id <OSD_ID>
```
Remove the failed drive and replace it with a new one without bringing down the node.
Once the new drive is in place, delete the affected OSD pod that is in the `Error` or `CrashLoopBackOff` state, replacing `<pod_name>` with the failed OSD pod name:
```
kubectl delete pod <pod_name> -n ceph
```
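If the failed pod's name is not known, it can usually be located by its status first, assuming the OSD pods run in the `ceph` namespace used above:
```
# Find OSD pods stuck in Error or CrashLoopBackOff
kubectl get pods -n ceph | grep -E 'Error|CrashLoopBackOff'
```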
Once the pod is deleted, Kubernetes will re-spin a new pod for the OSD. Once the pod is up, the OSD is added to the Ceph cluster with a weight of `0`, so it needs to be re-weighted:
```
utilscli osd-maintenance reweight_osds
```
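After reweighting, it is worth confirming that the replaced OSD is `up` and no longer carries a CRUSH weight of `0`; for example:
```
# Verify the new OSD is up and has a non-zero weight
utilscli ceph osd tree
utilscli osd-maintenance check_osd_status
```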