Linux System Administration
Resolving LVM Error: Device Excluded by a Filter During pvcreate
The Problem: The issue arose while provisioning a 2TB SAN LUN on a customer's bare-metal Oracle Linux 8 server. The drive appeared as /dev/sdb, but running pvcreate /dev/sdb returned the error "Device /dev/sdb excluded by a filter." Nobody had changed the filter rules, yet something on this particular drive was causing the LVM filter to block access…
Fixing Intermittent SSH "Connection Closed by Remote Host" Dropouts
The Problem: I would SSH into a staging environment, start a long-running database migration, switch over to another terminal, and later find that the server had dropped me with "Connection Closed by Remote Host". There was no warning and no way to recover the migration that was halfway through…
Debugging Out of Memory (OOM) Killer Terminations via dmesg Logs
The first time I lost my MariaDB instance was around 3 AM. I woke up to a completely flat monitoring dashboard. The server logs showed no sign of unusual access attempts; there was essentially no activity at all. The only thing recorded was that the database server had been "panting" right before it was killed by the OOM (out-of-memory)…
Diagnosing High CPU Usage and File Corruption in systemd-journald
I distinctly recall arriving at work at 7 AM to find a production server with all 16 cores pinned at 100 percent. Its monitoring dashboard was entirely red, and my on-call phone had already gone off twice before I got there. Logging in to this…
Resolving Kernel Panic: VFS Unable to Mount Root File System After Update
You have just run apt upgrade. The server reboots and stops hard at "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)". By now your Slack is ablaze, the console is full of messages, and none of them make sense. You know the problem lies somewhere between the bootloader and the initramfs, and time is of the essence…
Deploying Auditd Rules for System Call Monitoring and Compliance
I realized we had been "flying blind" right after a junior admin pinged me about unusual CPU usage on one of our production web app servers. There was no other sign of trouble, such as anomalous network traffic or a suspicious process in top; all we had were…
Implementing Secure Chroot Jails for SFTP Users via sshd_config
I'm haunted by the memory of a frantic phone call from a small business owner who needed help. His contract with a freelance developer was ending in two weeks. The developer had requested SFTP access to the shared staging environment to upload files, but the client wanted to restrict that access and prevent the developer from snooping…
Automating Logical Volume Management (LVM) Disk Expansion via Ansible Playbooks
I remember looking at a dashboard at midnight and seeing a critical application down because /var/lib/mysql had run out of disk space (100% used). Months earlier I had put together a Bash script to resize the LVM volume, and it had failed silently because I had mistakenly left out the pvresize command for the newly added disk. The result of that night was…
Architecting a High-Availability Load Balancer with HAProxy and Keepalived
A few months ago, my self-hosted Nextcloud instance went offline. It wasn't a failing hard drive or a badly designed network loop; the cause was an HAProxy VM that had become unresponsive. I couldn't reach any of my services until I went to the server and hard-rebooted it. That experience led me to implement redundancy…
Configuring Secure Centralized Logging Using Rsyslog and TLS Certificates
I still clearly remember juggling three separate SSH sessions to three different virtual machines (VMs), digging through each one's /var/log/auth.log to work out what was behind a failed SSH brute-force attack. As…