Ceph Troubleshooting: Adding new monitors to a cluster

During my Ceph-to-production sprint I found that 4 monitors are not enough… It is an even number, and everybody knows what happens on a cluster with an even number of quorum members when some of them fail :-)
So I decided to add 2 more, for a total of 6. We have 2 data centers (CPDs); if one of them goes down, there will be 3 monitors online to elect the new leader (I hope that works, and not as the official documentation says).
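For reference, the arithmetic behind that worry can be sketched in shell, assuming (as the official documentation describes) that monitors need a strict majority of the monitors in the monmap to form a quorum. The function name `quorum_needed` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: smallest strict majority of N monitors.
# Assumes quorum requires more than half of the monitors in the monmap.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Print the majority needed for a few cluster sizes.
for n in 4 5 6; do
    echo "monitors=$n quorum_needed=$(quorum_needed "$n")"
done
```

Under that assumption, 6 monitors need 4 in agreement, so the 3 survivors of a 3/3 split across two sites would fall one short on paper.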

That was the easy part…

I followed the official Nautilus documentation for adding new monitors to a running cluster (after bootstrapping the node):

export NEWSERVERS=
ceph-deploy mon create ${NEWSERVERS}

Super Easy!!!

The result was not as expected:

2019-07-22 18:06:17.280 7f9e38817700  0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 2866690901346661405 != 18366658746669136663

I couldn’t find a solution anywhere on the web, just some hints pointing to the keyring. So I tried the manual process.
In that procedure you take the keyring and the monitor map from the running cluster of monitors:

export TMPDIR=~/joining_ceph
mkdir ${TMPDIR}
cd  ${TMPDIR}
sudo ceph auth get mon. -o keyfile.txt
sudo ceph mon getmap -o mapfile.bin
for i in ${NEWSERVERS} ; do scp -r "${TMPDIR}" ${i}:${TMPDIR} ; done

On the new node:

export TMPDIR=~/joining_ceph
sudo ceph-mon -i $(hostname) --mkfs --monmap ${TMPDIR}/mapfile.bin --keyring ${TMPDIR}/keyfile.txt
sudo chown -R ceph: /var/lib/ceph/mon
# yes, this is a manual run :-)
/usr/bin/ceph-mon -f --cluster ceph --id $(hostname) --debug_mon 10 -c /etc/ceph/ceph.conf

Again, the result was not what I expected:

2019-07-22 18:14:51.841 7fea1eceb700  0 mon.pro-cephm-001@0(leader) e3 handle_probe ignoring fsid 666365e0-6661-6667-6669-6667b0be423d != aefcf666-f666-4666-a666-666b432e4666

So… there’s something wrong with the fsid…
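One way to confirm such a mismatch, as a rough sketch: `monmaptool --print` lists the fsid stored in a monmap file, which can then be compared with the fsid the cluster reports. The helper name `monmap_fsid` and the sample text are made up for illustration, and the parsing assumes the output contains a line of the form `fsid <uuid>`.

```shell
#!/bin/sh
# Hypothetical helper: read `monmaptool --print` style output on stdin
# and print the value of the first line whose first field is "fsid".
monmap_fsid() {
    awk '$1 == "fsid" { print $2; exit }'
}

# Illustrative sample only (placeholder uuid, not real output).
# On a real node one would pipe in:
#   monmaptool --print "${TMPDIR}/mapfile.bin"
sample='epoch 3
fsid 00000000-0000-0000-0000-000000000000'

map_fsid=$(printf '%s\n' "$sample" | monmap_fsid)
echo "monmap fsid: ${map_fsid}"
```

If the value printed for the copied `mapfile.bin` differs from the running cluster's fsid, the new monitor was created with the wrong identity and the existing monitors will ignore its probes.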

ceph-mon accepts either --monmap ${TMPDIR}/mapfile.bin or --fsid ${CLUSTERFSID}, so I just changed the mkfs command to:

sudo ceph-mon -i $(hostname) --mkfs --fsid ${CLUSTERFSID} --keyring ${TMPDIR}/keyfile.txt
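`${CLUSTERFSID}` is not set anywhere above. A minimal sketch for filling it in, assuming the cluster fsid is recorded as an `fsid = <uuid>` line in `/etc/ceph/ceph.conf` (running `ceph fsid` on a working node is another way to get it). The helper name `conf_fsid` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the fsid value from a ceph.conf-style file.
# Assumes a line of the form "fsid = <uuid>" (spaces around "=" optional).
conf_fsid() {
    awk -F= '/^[[:space:]]*fsid[[:space:]]*=/ {
        gsub(/[[:space:]]/, "", $2)   # strip whitespace around the value
        print $2
        exit
    }' "$1"
}

# Only export when the config file is actually readable on this node.
if [ -r /etc/ceph/ceph.conf ]; then
    CLUSTERFSID=$(conf_fsid /etc/ceph/ceph.conf)
    export CLUSTERFSID
    echo "CLUSTERFSID=${CLUSTERFSID}"
fi
```

Run this before the mkfs command so that `--fsid ${CLUSTERFSID}` expands to the running cluster's id rather than an empty string.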

And that finally worked!!!

The whole process is described here.
