Twice now in the last four days, at some point during the night, the websites go down because the server is unable to connect to the database. At that point everything else is still running (Apache, etc.); just the database is dead.
When I log in over SSH as root to investigate, I have read-only permissions everywhere, which I suspect is the cause of the MySQL server dying.
I've checked the MySQL logs, the system logs, basically every log file I can find, and nothing indicates an error anywhere around when the problem starts (or even during the entire day). It's like a switch is just flipped. I then restart the system and things are fine again... until a few days later.
There was 2G of free RAM the last time this happened, and 1.5G free the first time. Minimal CPU usage (< 30%).
Any ideas?
Answer
Disk errors are one possible cause of the "I have read only permissions everywhere" condition. Some types of hardware or kernel-level disk faults can lead to an inconsistent and corrupted filesystem, so the kernel will protectively force the filesystem into "read-only" mode when it detects such a fault. If the disk containing your root filesystem has a hiccup, anything trying to write to the disk would start to see permission errors. Programs that don't need to write to the disk (like Apache or SSHd) will probably continue to work just fine.
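If you want to confirm that this is what's happening, the next time the fault occurs you can check whether the root filesystem has actually been remounted read-only. As a rough sketch (the grep pattern is just one way to spot it):

    # See whether any mounted filesystem is currently read-only
    grep ' ro[ ,]' /proc/mounts

    # Or simply try to write a file; a "Read-only file system" error confirms it
    touch /tmp/write-test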
The next time the database fault happens, check your kernel log message buffer BEFORE YOU REBOOT for any indications of a disk error. You'll probably have to use the 'dmesg' command, because if your '/var/log' directory is part of the root filesystem, the syslog daemon won't be able to write the error messages to the '/var/log/messages' file on the disk. Also, the contents of the kernel log buffer will be lost when you reboot, so you might want to use 'ssh' or 'scp' to copy that data elsewhere.
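As a rough example (the remote hostname and filename below are placeholders), you could scan the buffer for disk-related messages and copy the whole thing off the machine in one step:

    # Look for likely disk/filesystem errors in the kernel ring buffer
    dmesg | grep -iE 'error|i/o|ext[34]|read-only'

    # Save the full buffer on another host, since rebooting will clear it
    dmesg | ssh user@otherhost 'cat > server-dmesg.txt'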