Droplet Out of Disk Space

I had been running a site on a DigitalOcean droplet for a little over two months when, suddenly, everything stopped working. After some quick digging, we discovered that the droplet was out of disk space. The site was barely being used, so how had it run out of disk space so quickly?

The server had 80 GB of space. The site on it was a decent size, but only 1 or 2 GB. It had a pretty big database, but no bigger than another 2 GB or so. Add in the OS files and such, and we should have been using only around 6 to 8 GB, well under the 80 GB limit. We pulled a couple of big files off so we could work on the server again. The first step was to find out where all the space was being used:

$ df -h 
Filesystem Size Used Avail Use% Mounted on 
udev       2.0G 0    2.0G  0%   /dev 
tmpfs      395M 704K 394M  1%   /run 
/dev/vda1  78G  76G  2G    97%  / 
tmpfs      2.0G 0    2.0G  0%   /dev/shm 
tmpfs      5.0M 0    5.0M  0%   /run/lock 
tmpfs      2.0G 0    2.0G  0%   /sys/fs/cgroup 
/dev/vda15 105M 3.6M 101M  4%   /boot/efi 
tmpfs      395M 0    395M  0%   /run/user/3828

It was clear that we were somehow using almost all of the space on our main drive: 76 GB out of 78 GB! I started by looking at the server logs. They were all pretty small, but I took backups and then cleared them off of the server just in case. That gave us back barely 0.1 GB.
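A check like the df one above could be scripted so that a nearly full disk is noticed before the site goes down. This is only a minimal sketch: the function names and the default 90% threshold are illustrative assumptions, not something that was running on the droplet.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) of the filesystem holding a path.
# -P asks df for the portable format: one line per filesystem, usage in column 5.
disk_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning and return 1 when usage is at or above the threshold.
# The default threshold of 90 is an arbitrary, illustrative choice.
check_disk() {
    path=$1
    limit=${2:-90}
    used=$(disk_use_percent "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

Calling `check_disk / 90` from a daily cron job, for example, would have flagged this droplet well before it reached 97%.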

Next, I took a look at the folder sizes to see if I could identify where the space was being used, so I went to the root directory and ran:

du -sh *

This gives me the total size of each file and folder in that top-level directory. You can also run it on a single folder by replacing the asterisk with a directory name, like this:

du -sh /var

When I ran that, I got:

4.6 GiB [##########] /home
2.4 GiB [##### ] /var
1.7 GiB [### ] /usr
1.0 GiB [## ] swapfile
180.5 MiB [ ] /lib
76.4 MiB [ ] /boot
31.3 MiB [ ] /tmp
15.8 MiB [ ] /bin
14.9 MiB [ ] /sbin
11.4 MiB [ ] /opt
7.9 MiB [ ] /etc
700.0 KiB [ ] /run

Nothing out of the ordinary here, and everything is about what we expected. Next, I checked the size of the database:

SELECT table_schema "DB Name",
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) "DB Size in MB" 
FROM information_schema.tables 
GROUP BY table_schema;

Note: If you want more information on this query, check this entry on the MySQL forums.

This gave me:

+--------------------+---------------+
| DB Name            | DB Size in MB |
+--------------------+---------------+
| mysql              | 2.6           |
| information_schema | 0.0           |
| performance_schema | 0.0           |
| sys                | 0.0           |
| main_db            | 1711.6        |
| side_db            | 261.4         |
| staging_main_db    | 714.3         |
| staging_side_db    | 254.7         |
+--------------------+---------------+
8 rows in set (0.35 sec)

Again, this is all pretty expected. Nothing really out of the ordinary here. Some big tables, but nothing gigantic that would be taking more than a GB or two. So what was using all of the space?
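To see which individual tables account for a database's size, the same information_schema.tables view can be queried per table. The helper below is a sketch that only prints the SQL; the function name, the `size_mb` alias, and the limit of ten rows are assumptions made for illustration.

```shell
# Print a query listing the ten largest tables in one schema.
largest_tables_sql() {
    schema=$1
    # Only accept plain identifiers, since the name is pasted into the SQL text.
    case $schema in
        ''|*[!A-Za-z0-9_]*) echo "invalid schema name: $schema" >&2; return 1 ;;
    esac
    cat <<EOF
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = '$schema'
ORDER BY (data_length + index_length) DESC
LIMIT 10;
EOF
}
```

The output can be piped straight into the client, for example `largest_tables_sql staging_main_db | mysql -u root -p`.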

After doing some more research, I discovered that, depending on how MySQL is configured on the server, logging could be an issue. I had already checked the server logs, but MySQL can keep a separate set of logs called binary logs, which record all of the transactions and data modifications that happen within your MySQL server. Our databases were of reasonable size, but we had lots and lots of changes happening all the time, especially on our staging servers, where we were testing data imports. We were creating hundreds of thousands of records, deleting them, and creating them again, over and over, not to mention the normal heavy usage of our live database by the customers. I checked the status of the binary logs by logging into MySQL and running:

mysql> show binary logs;
+---------------+------------+-----------+
| Log_name      | File_size  | Encrypted |
+---------------+------------+-----------+
| binlog.000185 |  616772951 | No        |
| binlog.000186 |  881295737 | No        |
| binlog.000187 |  906990972 | No        |
| binlog.000188 |  977310172 | No        |
| binlog.000189 |  670360359 | No        |
| binlog.000190 |  940248170 | No        |
| binlog.000191 |  686950433 | No        |
| binlog.000192 |  339544067 | No        |
| binlog.000193 |  496748286 | No        |
| binlog.000194 |  333139054 | No        |
| binlog.000195 |  261974523 | No        |
| binlog.000196 |  281051743 | No        |
+---------------+------------+-----------+
218 rows in set (0.12 sec)

This was a lot of logs. I knew it was safe to delete any logs on the staging server as that was just a total playground area for us and would be completely deleted once we were done with the current phase of this project. I logged into MySQL and ran a query to purge all of the logs from before today:

mysql> PURGE BINARY LOGS BEFORE '2020-04-14 22:46:26'; 
Query OK, 0 rows affected (0.15 sec)
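Before purging, it can help to know how much space the binary logs actually occupy. The sketch below sums the File_size column of SHOW BINARY LOGS. It assumes the tab-separated output that the mysql client produces with `-N -B`, and the function names are made up for this example.

```shell
# Sum the File_size column (second field) of SHOW BINARY LOGS output read from stdin.
# Expects the tab-separated batch format that `mysql -N -B` produces.
sum_binlog_bytes() {
    awk -F '\t' '{ total += $2 } END { printf "%.0f\n", total }'
}

# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}
```

A typical call would be `mysql -N -B -e 'SHOW BINARY LOGS' | sum_binlog_bytes`. The twelve files listed earlier add up to 7392386467 bytes, or about 6.9 GiB, and the full list had 218 entries.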

I ran df -h again to see how we were doing, and we had freed up 32 GB of space! I then did some more research and found out that binary logs are kept for 30 days by default. After consulting with the team and the client, we determined that we only needed to keep 7 days (one week) of logs on the live server and no more than 2 days on the staging server. To update the setting on the live server, I logged into MySQL and ran:

mysql> SET GLOBAL binlog_expire_logs_seconds = 604800;
Query OK, 0 rows affected (0.00 sec)

This set the global variable for expiring the binary logs to 604,800 seconds, which is 7 days. I then repeated the command for the staging server and set it to 172,800 or 2 days.
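One caveat: a value set with SET GLOBAL lasts only until MySQL is next restarted. In MySQL 8, SET PERSIST applies the value and also saves it so it survives a restart. The small helper below does the days-to-seconds arithmetic and prints such a statement; the function names are illustrative, not from the original setup.

```shell
# Convert whole days to seconds (1 day = 86400 seconds).
days_to_seconds() {
    echo $(( $1 * 86400 ))
}

# Print a statement that sets the binary log retention to the given number of days.
# SET PERSIST (MySQL 8.0+) also saves the value so it survives a restart.
binlog_retention_sql() {
    echo "SET PERSIST binlog_expire_logs_seconds = $(days_to_seconds "$1");"
}
```

For example, `binlog_retention_sql 7` prints the 604800-second statement for the live server, and `binlog_retention_sql 2` prints the 172800-second one for staging.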

But then we started having issues again. It turned out that our server was under heavy enough use that the logs could still pile up too high, even with the shorter expiration. So we decided to disable binary logging completely on the servers that didn’t need it. We did this by editing the MySQL cnf file (this could be someplace like /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf). We were using MySQL 8, so we added:

[mysqld]
disable_log_bin

This disabled binary logging (the "master logs" are the same binary logs under an older name). For the change to take effect, you need to restart MySQL or reboot the server. After that, we could confirm it was working by logging into MySQL and running:

mysql> show binary logs;
ERROR 1381 (HY000): You are not using binary logging

SHOW MASTER LOGS is an older alias for the same statement, so it reports the same error:

mysql> show master logs;
ERROR 1381 (HY000): You are not using binary logging

The logging was disabled now, so we did not need to worry about the binary logs crashing our server any longer.

However, our server was still filling up. We did some more investigating (running du -sh * as root in the top directory and then drilling down into the largest directories) and found a directory called /var/lib/automysqlbackup, which was taking up about half of our server space. automysqlbackup is an automatic backup utility for MySQL, but it is not recommended for use on servers with high database usage. We also already had a MySQL backup running through DigitalOcean and stored off of our server, so this one was completely unnecessary. I traced things back and discovered that there was a cron file being run daily:

/etc/cron.daily/automysqlbackup

Since we already had an off-server daily backup of our database, we did not need a second one happening on our server. I opened the automysqlbackup cron and commented out all of the lines.
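An alternative to commenting out every line, assuming a Debian or Ubuntu system where /etc/cron.daily is executed by run-parts, is to remove the script's execute permission, since run-parts skips files that are not executable. This is a sketch of that alternative, not what was done on this server, and the function name is hypothetical.

```shell
# Remove the execute bits from a cron script so run-parts skips it.
disable_cron_script() {
    script=$1
    [ -f "$script" ] || { echo "no such file: $script" >&2; return 1; }
    chmod a-x "$script"
}
```

Run as root, `disable_cron_script /etc/cron.daily/automysqlbackup` leaves the file contents untouched, and `chmod +x` on the same file turns the job back on.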

I then went into the directory where the backups were being stored:

/var/lib/automysqlbackup

and deleted all of the backup files out of the daily, weekly, and monthly directories. Within each of these three directories, there was a directory for each of our databases with the backup files within each one.
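The drill-down that found this directory (running du as root and following the largest entries) can be made quicker by sorting the output. The function below is a sketch that assumes GNU du and sort; its name and the default of ten entries are illustrative.

```shell
# List the largest immediate subdirectories of a directory, biggest first.
# -x keeps du on one filesystem; sort -h understands du's human-readable sizes.
# --max-depth and sort -h assume GNU coreutils.
largest_dirs() {
    dir=${1:-/}
    count=${2:-10}
    du -xh --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$count"
}
```

Starting with `largest_dirs /` as root and repeating on whichever directory comes out on top (for example `largest_dirs /var`, then `largest_dirs /var/lib`) narrows things down in a few steps. The first line of the output is usually the total for the directory itself.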

I kept looking around and found that /var/log/journal was almost 2 GB. We definitely don’t need a log store that large on a dev server (or possibly even on a live server, depending on the site). So I ran:

journalctl --vacuum-size=500M

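This removed archived journal files until the journal was down to roughly 500M. Note that --vacuum-size is a one-time cleanup and does not change any limit. To keep the journal from growing back, set SystemMaxUse=500M in /etc/systemd/journald.conf and restart systemd-journald.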
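For a cap that lasts, journald's own SystemMaxUse setting is the documented mechanism, and it can live in a drop-in file rather than in the main journald.conf. The helper below is a sketch that writes such a drop-in; the function name and the file name size-limit.conf are arbitrary choices for illustration.

```shell
# Write a journald drop-in that caps the persistent journal's disk usage.
# conf_dir would normally be /etc/systemd/journald.conf.d (root required);
# the file name size-limit.conf is an arbitrary, illustrative choice.
write_journal_limit() {
    conf_dir=$1
    max_use=${2:-500M}
    mkdir -p "$conf_dir" || return 1
    printf '[Journal]\nSystemMaxUse=%s\n' "$max_use" > "$conf_dir/size-limit.conf"
}
```

After running `write_journal_limit /etc/systemd/journald.conf.d 500M` as root, `systemctl restart systemd-journald` makes journald pick up the new limit.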

One last thing I noticed was that even though I had purged the binary logs and turned logging off, there were still about 15 binary log files left on the server. Running the PURGE query did not get rid of them. One likely reason is that PURGE BINARY LOGS has no effect once the server is running with binary logging disabled, so any purge has to happen before logging is switched off. I ended up editing binlog.index manually and then deleting the files by hand. I could not find any other recommendations online for how to get rid of them when PURGE doesn’t work, but I would not recommend deleting them by hand on an important server, as I have no idea what kinds of consequences this may have.
