To actually answer your question, you need some kind of job scheduling service that manages the whole operation. Whether that’s SSM or Ansible or something else. With Ansible, you can set a parallel parameter that will say that you only update 3 or so at a time until they are all done. If one of those upgrades fails, then it will abort the process. There’s a parameter to make it die if any host fails, but I don’t recall it right now.
Join us on c/buttcoin@awful.systems