#whenuserstellonthemselves
Last night a rather inexperienced user of our cluster (#HPC #SLURM) said she was getting messages that she was violating her QOS limits. OK, one quick look and I found that she was actually DoS'ing our controller:
# grep -c 'server_thread_count over limit' /etc/slurm/logs/slurmctldlog
105014
All 105,014 messages were logged from July 4, 17:32 until last night at 21:51.
The user had put `squeue -u` commands inside for loops of at least 20 iterations each, along with multiple `sbatch` commands.
The best part? The account she was querying belonged to another user. The even better part? A group of 5 students were all using the same submission script(s). In total, 94,594 jobs were submitted in ~5 days.
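
For anyone tempted to watch their jobs with `squeue` in a loop, here is a rough sketch of a gentler way to do it: one query per polling interval, with a `sleep` in between, instead of hammering the controller. The names `wait_for_jobs`, `SQUEUE_CMD` and `POLL_INTERVAL` are made up for this example, and the 60-second default is just an assumption; ask your admins what interval is acceptable on your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: poll the scheduler once per interval instead of in a tight loop.
# wait_for_jobs, SQUEUE_CMD and POLL_INTERVAL are hypothetical names for this example.

SQUEUE_CMD="${SQUEUE_CMD:-squeue}"      # command used to query the queue
POLL_INTERVAL="${POLL_INTERVAL:-60}"    # seconds between queries (assumed default)

wait_for_jobs() {
    local user="$1"
    local remaining
    while true; do
        # One query per interval: -h drops the header, -o %i prints job IDs only.
        remaining=$("$SQUEUE_CMD" -h -u "$user" -o %i | wc -l)
        # Stop as soon as the user has no jobs left in the queue.
        [ "$remaining" -eq 0 ] && break
        sleep "$POLL_INTERVAL"
    done
    echo "all jobs finished for $user"
}
```

Call it once after submitting, e.g. `wait_for_jobs "$USER"`, and keep the `sbatch` calls outside the polling loop so each job is submitted exactly once.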