Ceph Troubleshooting: Debugging CRUSH maps

For the last few days I’ve been busy trying to find an “error” in my CRUSH map.
I found that some of my OSDs were underused or not used at all… I didn’t know why, because I had built the CRUSH map from scratch with the common architecture based on datacenter, rack & cluster. And it was correct from the Ceph point of view (it was running on the cluster).

I decided to reduce the map to a much simpler one.
Something like this:
[Figure: Simple CRUSH map]


Ceph Troubleshooting: Adding new monitors to cluster

In my ceph-to-production sprint I found that 4 monitors are not enough… It is an even number, and everybody knows what happens when a cluster has an even number of members and some of the quorum members fail :-)
So I decided to add 2 more for a total of 6. We have 2 CPDs (datacenters). If one of the CPDs goes down, there will be 3 monitors online to elect the new master (I hope that works, and not as the official documentation says)
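
A quick arithmetic check may help here. Ceph monitors only form a quorum when a strict majority of the monitors in the monitor map can reach each other. The sketch below is plain Python, not a Ceph tool: the helper names `majority` and `survives_site_loss` are hypothetical, and it assumes the monitors are split as evenly as possible between the 2 CPDs.

```python
def majority(total_monitors: int) -> int:
    """Smallest number of monitors that is a strict majority."""
    return total_monitors // 2 + 1


def survives_site_loss(total_monitors: int, monitors_lost: int) -> bool:
    """True if the monitors that remain can still form a quorum."""
    remaining = total_monitors - monitors_lost
    return remaining >= majority(total_monitors)


# Columns: total monitors, majority needed, monitors left, quorum kept?
for total in (4, 5, 6):
    lost = total // 2  # assumption: the CPD that fails holds half (rounded down)
    print(total, majority(total), total - lost, survives_site_loss(total, lost))
```

Under these assumptions, 6 monitors split 3/3 leave 3 survivors when 4 are needed, which would match what the official documentation says. An odd count such as 5 only keeps quorum if the CPD holding the smaller share is the one that goes down.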

That was the easy part…


Cloud-init: The great forgotten

A while ago (more than a year) I began investigating cloud-init. I saw a Red Hat paper about cloud-init, and it seemed really powerful for simplifying massive VM deployments.
Someone close to me told me: “don’t waste your time, we’ll use terraform/docker/k8s/whatever”

But the idea had already been planted: I read the documentation and started testing the technology.

What I’ve seen is that cloud-init is everywhere: I think that all Linux “cloud” VMs are using it. It’s really sturdy and simple; it does what it is supposed to do… That is part of its greatness and of its weakness.

The good part is well known: the cloud-init service starts when the VM starts and does what you tell it to do through a YAML script: install software, create users, perform basic configuration…
Its weakness is that cloud-init is very simple software designed for the cloud; if your cloud architecture is not standard, you will have to use some tricks to work around its limitations.

For example, I was not using DHCP for VM networking, and booting a VM with cloud-init without DHCP is really tricky… You can see a YAML script for static network here:

All of that has given me the background and a global vision to understand the inner technology used on any “cloud” platform… it seems that time has proved me right :-)

Ceph (I): Basic Architecture

Recently I wrote a basic architecture document (and training) on Ceph.

This document gives you the basics to understand the role of the pieces of the Ceph architecture, such as:

    • Monitor Nodes
    • Disk (aka OSD) nodes
    • Metadata Nodes
    • Gateway Nodes

And what the words “RADOS” and “CRUSH” mean (again, it is a very basic definition)

Enjoy the document/presentation!

Problems after upgrade mariadb

As you should know, there’s an important bug in the TLS libs on every Linux OS.
So upgrading is mandatory.
There’s also another small update for mariadb if you’re using it instead of mysql.
That update is where the problem is, but not where the symptom shows up.

In my case the problem showed up in the SMTP server, where I got the error:

535 5.7.8 Error: authentication failed: authentication failure

While trying to send an email (only sending; receiving works fine).
Looking in mail.log:

Mar  6 10:49:06 ciberterminal postfix/smtpd[29949]: warning: SASL authentication failure: Password verification failed
Mar  6 10:49:06 ciberterminal postfix/smtpd[29949]: warning: localhost.localdomain[127.0.0.1]: SASL authentication failed: authentication failure

Auth failed from SASL? WTF!
So I began debugging it:

/usr/sbin/saslauthd -a pam -c -m /var/spool/postfix/saslauthd -r -n 1 -d -VVVVVVVVVV

But it only shows:

do_auth: auth failure: [user=dodger@ciberterminal.net] [service=smtp] [realm=ciberterminal.net] [mech=pam] [reason=PAM auth error]

Nothing else.
Searching the internet, I found a post and realized I had forgotten to look for errors in auth.log!
Here it is:

Mar  7 10:48:41 ciberterminal saslauthd[4552]: PAM unable to dlopen(pam_mysql.so): /lib/security/pam_mysql.so: symbol make_scrambled_password, version libmysqlclient_18 not defined in file libmysqlclient.so.18 with link time reference
Mar  7 10:48:41 ciberterminal saslauthd[4552]: PAM adding faulty module: pam_mysql.so
Mar  7 10:48:41 ciberterminal saslauthd[4552]: DEBUG: auth_pam: pam_authenticate failed: Module is unknown
Mar  7 10:48:41 ciberterminal saslauthd[4552]: do_auth         : auth failure: [user=trian@ciberterminal.net] [service=smtp] [realm=ciberterminal.net] [mech=pam] [reason=PAM auth error]
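
Since the interesting line was hiding in auth.log, a small filter can save some time on the next occasion. The following is a minimal sketch in Python. The function name `find_pam_dlopen_errors` and the regular expression are my own assumptions, written against the log format shown above, and are not part of any PAM or saslauthd tool. It extracts the module that failed to load and the symbol it could not find:

```python
import re

# Pattern written against the auth.log line shown above; other PAM
# versions may word the message differently (assumption).
DLOPEN_ERROR = re.compile(
    r"PAM unable to dlopen\((?P<module>[^)]+)\): "
    r"(?P<path>\S+): symbol (?P<symbol>\S+?),"
)


def find_pam_dlopen_errors(lines):
    """Yield (module, path, symbol) for every PAM dlopen failure found."""
    for line in lines:
        match = DLOPEN_ERROR.search(line)
        if match:
            yield match.group("module"), match.group("path"), match.group("symbol")


# Two of the log lines from above, used as sample input.
sample = [
    "Mar  7 10:48:41 ciberterminal saslauthd[4552]: PAM unable to dlopen(pam_mysql.so): "
    "/lib/security/pam_mysql.so: symbol make_scrambled_password, version "
    "libmysqlclient_18 not defined in file libmysqlclient.so.18 with link time reference",
    "Mar  7 10:48:41 ciberterminal saslauthd[4552]: PAM adding faulty module: pam_mysql.so",
]

for module, path, symbol in find_pam_dlopen_errors(sample):
    print(f"{module} ({path}) is missing symbol {symbol}")
```

To run it against a real log, pass an open file instead of `sample`, for example `open("/var/log/auth.log")`. That path is an assumption and may differ on your distribution.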


Script to check Oracle remotely (nagios or whatever)

If you are not the lucky owner of an Enterprise Manager license, or you simply want to use nagios or another monitoring engine to get status and graphs from Oracle, you’ll probably be using the “check_remote_oracle” plugin.

If you’re a scripter/developer maybe you’ll understand me: you open a script and see COMPLETELY UNREADABLE code, with no functions, no indentation… etc.
Then you have to change some silly thing inside the script and, f**k, it stops working!

This is the story of that script: I installed it, then I had to make some changes… and then I decided to rewrite it completely…
I’m not the best scripter in the world, but I know how to use functions, indentation and that useless shit :-P

You can read the whole documentation on its wiki page.

Enjoy!

PS: updated the instructions; I had forgotten the information about the unprivileged user for connecting to the Oracle instance.
PS2: Updated again, new control!