One of the things you will have to deal with when working on a web application is software downtime. It somehow just happens, and in most cases it happens when you are asleep or away for the weekend, whenever you have no access to a laptop and never when you are online. That’s just the way it is.
Once you have access to a computer, the priority is always to get the application back up. Customers have been waiting to access their data.
After the application is back up, it is essential to find out why it went down. I hope you collect as many logs as possible: application logs for the metrics you want to track, plus error logs from both the application and the server. I hope those logs are rotated and cleared out regularly. I usually clear out logs after a month.
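As a rough illustration of rotating logs and keeping about a month of them, here is a minimal sketch that assumes the application is written in Python and uses the standard library's logging module. The function name `build_logger`, the logger name `webapp`, and the 30-day default are my own placeholders, not a recommendation for every setup.

```python
import logging
from logging.handlers import TimedRotatingFileHandler


def build_logger(path, name="webapp", keep_days=30):
    """Return a logger that rotates its file daily and keeps keep_days old files.

    `name` and `keep_days` are illustrative defaults (assumptions).
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Start a new file at midnight; rotated files beyond backupCount are deleted.
    handler = TimedRotatingFileHandler(
        path,
        when="midnight",
        backupCount=keep_days,
        delay=True,  # do not create the file until the first message is logged
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling `build_logger("app.log")` once at startup and then using `logger.error(...)` or `logger.exception(...)` wherever something fails is usually enough to leave a trail you can read after an outage. If the logs are written by something other than Python, a system tool such as logrotate can do the same job.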
If you are running a web application, Nginx or Apache will usually leave traces of what they were doing in their access and error logs. Your server monitoring software will watch for CPU and memory spikes and leave traces of why a spike happened. Your application performance monitoring software will have data about the transactions that took place just before the software went down. I hope everyone running a web application used by many people has this kind of monitoring set up.
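To look at what the web server saw just before the outage, something like the following can help. It is only a sketch: it assumes the access log uses the bracketed timestamp found in the default Nginx and Apache access log formats (for example `[10/Oct/2023:13:55:36 +0000]`), and the function name `lines_before` and the ten-minute window are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches the first [...] group in a line, where the default access log
# formats put the request time. A custom log format would need a new pattern.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def lines_before(lines, outage_at, minutes=10):
    """Return the log lines stamped within `minutes` before `outage_at`.

    `outage_at` must be a timezone-aware datetime. Lines with no parseable
    timestamp are skipped.
    """
    start = outage_at - timedelta(minutes=minutes)
    hits = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # bracketed text that is not a timestamp
        if start <= stamp <= outage_at:
            hits.append(line)
    return hits
```

You would open the access log, pass the file object as `lines`, and give the time your monitoring first reported the application as down. The requests that come back are a good place to start reading.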
It’s good to know why the application went down, because you can then build measures to prevent it from happening again. Maybe it goes down again in the future, but not for the reason it went down this time. Slowly, over time, downtime becomes less frequent. A few of my applications are now at the stage where they rarely go down. I still have all the monitoring applications running. It’s always good to know the reason why.