Ceph Troubleshooting: Debugging CRUSH maps

For the last few days I’ve been busy trying to find an “error” in my CRUSH map.
I found that some of my OSDs were underused or not used at all… I didn’t know why, because I had built the CRUSH map from scratch with the common architecture based on datacenter, rack & cluster, and it was correct from Ceph’s point of view (it was running on the cluster).
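(By the way, a handy way to debug this kind of problem is to pull the compiled map out of the cluster and test it offline with crushtool; the rule number and replica count below are just examples.)

    # Extract the CRUSH map from the cluster and decompile it to text
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # Simulate placements offline: per-OSD utilization and bad mappings
    crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-utilization
    crushtool -i crush.bin --test --rule 0 --num-rep 2 --show-bad-mappings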

I decided to replace it with a much simpler map.
Something like this:
[Image: Simple CRUSH map]
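(The full post has the real map; as a rough sketch, a minimal decompiled CRUSH map with a single root looks something like this. The host name, bucket ids and weights are made up for illustration.)

    # devices
    device 0 osd.0
    device 1 osd.1

    # types
    type 0 osd
    type 1 host
    type 2 root

    # buckets
    host node1 {
            id -2
            alg straw
            hash 0  # rjenkins1
            item osd.0 weight 1.000
            item osd.1 weight 1.000
    }

    root default {
            id -1
            alg straw
            hash 0  # rjenkins1
            item node1 weight 2.000
    }

    # rules
    rule replicated_ruleset {
            ruleset 0
            type replicated
            min_size 1
            max_size 10
            step take default
            step chooseleaf firstn 0 type host
            step emit
    }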

Continue reading “Ceph Troubleshooting: Debugging CRUSH maps”

Ceph Troubleshooting: Adding new monitors to cluster

In my ceph-to-production sprint I found that 4 monitors are not enough… It is an even number, and everybody knows what happens when a cluster has an even number of members and one of the quorum fails :-)
So I decided to add 2 more, for a total of 6. We have 2 datacenters (CPDs); if one of them goes down, there will be 3 monitors online to elect the new leader (I hope that works, and not what the official documentation says will happen).
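(For context: monitors form a quorum by strict majority, floor(n/2) + 1, so with 6 monitors 4 must be up; 3 survivors on their own cannot elect a leader, which is exactly what the documentation warns about. Adding a monitor by hand goes roughly like this; the monitor id and address are made up.)

    # Fetch the current monitor map and the mon. keyring from the cluster
    ceph mon getmap -o /tmp/monmap
    ceph auth get mon. -o /tmp/mon.keyring

    # Prepare the new monitor's data directory (id "mon4" is just an example)
    ceph-mon -i mon4 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring

    # Start it and check that it joins the quorum
    ceph-mon -i mon4 --public-addr 192.168.0.14:6789
    ceph quorum_status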

That was the easy part…

Continue reading “Ceph Troubleshooting: Adding new monitors to cluster”

Ceph (I): Basic Architecture

Recently I put together a basic architecture document (and training) for Ceph.

This document will give you the basics to understand the roles of the main pieces of the Ceph architecture, such as:

    • Monitor nodes
    • Disk (aka OSD) nodes
    • Metadata nodes
    • Gateway nodes

And what the words “RADOS” and “CRUSH” mean (again, a very basic definition).
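(If you already have a cluster at hand, a quick way to see most of these pieces is the status commands below; they are read-only. The MDS one only reports something if you use CephFS.)

    ceph mon stat    # monitor membership and quorum
    ceph osd tree    # the CRUSH tree of disk (OSD) nodes
    ceph mds stat    # metadata server state (CephFS only)
    ceph -s          # overall cluster health, via RADOS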

Enjoy the document/presentation!