In my ceph-to-production sprint I found that 4 monitors are not enough… It is an even number, and everybody knows what happens when a cluster has an even number of quorum members and some of them fail :-)
So I decided to add 2 more for a total of 6. We have 2 data centers (CPDs). If one of them goes down, there will be 3 monitors online to elect the new master (I hope that works, and not as the official documentation says).
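To put numbers on that hope: Ceph monitors need a strict majority of the monitors in the map to form a quorum. The sketch below only does the arithmetic; `quorum_size` is a name I made up for illustration, not a Ceph command.

```shell
#!/bin/sh
# Smallest number of monitors that is a strict majority of $1 monitors.
# quorum_size is a hypothetical helper, not part of Ceph.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Print the majority needed and the failures tolerated for a few sizes.
for n in 3 4 5 6; do
    need=$(quorum_size "$n")
    echo "monitors=$n majority=$need tolerated_failures=$(( n - need ))"
done
```

For 6 monitors the majority is 4, so the 3 monitors left after losing a whole data center are one short of it, which is why the documentation is pessimistic about an even split.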
That was the easy part…
I followed the official Nautilus documentation for adding new monitors to a running cluster (after the node bootstrap):
export NEWSERVERS=
ceph-deploy mon create ${NEWSERVERS}
Super Easy!!!
The result was not as expected:
2019-07-22 18:06:17.280 7f9e38817700 0 cephx: verify_authorizer could not decrypt ticket info: error: bad magic in decode_decrypt, 2866690901346661405 != 18366658746669136663
I couldn’t find a solution anywhere on the web, just some evidence pointing to the keyring. So I tried the manual process.
In that procedure you take the keyring and the monitor map from the running cluster of monitors:
export TMPDIR=~/joining_ceph
mkdir ${TMPDIR}
cd ${TMPDIR}
sudo ceph auth get mon. -o keyfile.txt
sudo ceph mon getmap -o mapfile.bin
for i in ${NEWSERVERS} ; do scp -r "${TMPDIR}" ${i}:${TMPDIR} ; done
On the new node:
export TMPDIR=~/joining_ceph
sudo ceph-mon -i $(hostname) --mkfs --monmap ${TMPDIR}/mapfile.bin --keyring ${TMPDIR}/keyfile.txt
sudo chown ceph. -R /var/lib/ceph/mon
# yes, this is manual run :-)
/usr/bin/ceph-mon -f --cluster ceph --id $(hostname) --debug_mon 10 -c /etc/ceph/ceph.conf
Again, the result was not as expected:
2019-07-22 18:14:51.841 7fea1eceb700 0 mon.pro-cephm-001@0(leader) e3 handle_probe ignoring fsid 666365e0-6661-6667-6669-6667b0be423d != aefcf666-f666-4666-a666-666b432e4666
So… there’s something wrong with fsid…
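A quick way to see which side disagrees is to compare the fsid stored in the copied monmap with the one the running cluster reports. This is only a sketch: `monmap_fsid` and `same_fsid` are hypothetical helpers, and it assumes that `monmaptool --print` shows a line of the form `fsid <uuid>`.

```shell
#!/bin/sh
# Read `monmaptool --print` output on stdin and print the fsid it reports.
# Assumption: the output contains a line "fsid <uuid>".
monmap_fsid() {
    awk '$1 == "fsid" { print $2; exit }'
}

# Succeed only when both fsids are non-empty and identical.
same_fsid() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example usage on a cluster node (commented out: needs the Ceph tools):
# map_fsid=$(monmaptool --print "${TMPDIR}/mapfile.bin" | monmap_fsid)
# same_fsid "$map_fsid" "$(sudo ceph fsid)" || echo "fsid mismatch" >&2
```

If the two values differ, the new monitor was created with an fsid the rest of the cluster does not recognise, which matches the handle_probe message above.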
ceph-mon can use either --monmap ${TMPDIR}/mapfile.bin or --fsid ${CLUSTERFSID}, so I just changed the mkfs command to:
sudo ceph-mon -i $(hostname) --mkfs --fsid ${CLUSTERFSID} --keyring ${TMPDIR}/keyfile.txt
And that finally worked!!!
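The commands above never say where ${CLUSTERFSID} comes from. One option, sketched here under the assumption that the new node already has the cluster's /etc/ceph/ceph.conf with an `fsid = <uuid>` line, is to read it from that file; `conf_fsid` is a hypothetical helper name. Running `ceph fsid` on a node with admin access should give the same value.

```shell
#!/bin/sh
# Print the fsid value from a ceph.conf-style file.
# Defaults to /etc/ceph/ceph.conf when no path is given.
# conf_fsid is a hypothetical helper, not part of Ceph.
conf_fsid() {
    awk -F= '$1 ~ /^[ \t]*fsid[ \t]*$/ {
        gsub(/[ \t\r]/, "", $2)   # strip whitespace around the value
        print $2
        exit
    }' "${1:-/etc/ceph/ceph.conf}"
}

# Example usage on the new monitor node (commented out: needs a real ceph.conf):
# export CLUSTERFSID=$(conf_fsid)
# sudo ceph-mon -i $(hostname) --mkfs --fsid ${CLUSTERFSID} --keyring ${TMPDIR}/keyfile.txt
```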
The whole process is described here.