#alerts-general are an important source of information about the health of the environment and should be monitored during working hours.
production tracker. See production queue usage for more details.
The Situation Room Permanent Zoom. The Zoom link is in the
The Situation Room Permanent Zoom as soon as possible.
#production. If the alert is flappy, create an issue and post a link in the thread. This issue might become part of the RCA or lead to a change in the alert rule.
/pd trigger in the
@advocates handle at the start of an incident.
@sre-oncall - at-mention this user group in Slack and it will ping the current on-call.
#production Slack channel will tell you this with
/chatops run oncall prod.
/incident report in Slack (e.g.
#production) and follow the prompts. Please ensure that the severity of the incident warrants paging the EOC and that you, as the reporter, stay online until the EOC has had a chance to come online and get up to speed.
/incident declare in Slack (e.g.
#production) and follow the prompts. The incident declaration is orchestrated through IMA (incident management automation) and has the following capabilities:
Degraded as any sustained 5-minute period where a service is below its documented Apdex SLO or above its documented error ratio SLO.
Outage (Status = Disruption) as a 5-minute sustained error rate above the Outage line on the error ratio graph.
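The two definitions above can be sketched as a small classifier. This is only an illustration and not how the monitoring stack evaluates these conditions: the `Sample` and `Thresholds` types, the one-sample-per-minute resolution, the `Operational` label, and the reading of "below Apdex SLO or above error ratio SLO" as a per-minute check are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence

WINDOW_MINUTES = 5  # "sustained 5-minute period" from the definitions


@dataclass(frozen=True)
class Sample:
    """One per-minute measurement for a service (hypothetical shape)."""
    apdex: float
    error_ratio: float


@dataclass(frozen=True)
class Thresholds:
    """Documented limits for a service (values are supplied by the caller)."""
    apdex_slo: float           # Degraded when Apdex falls below this
    error_ratio_slo: float     # Degraded when error ratio rises above this
    outage_error_ratio: float  # the "Outage line" on the error ratio graph


def _sustained(flags: Sequence[bool], window: int = WINDOW_MINUTES) -> bool:
    """Return True if `flags` contains `window` consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= window:
            return True
    return False


def classify(samples: Sequence[Sample], t: Thresholds) -> str:
    """Classify a series of per-minute samples as Outage, Degraded, or Operational."""
    # Outage is checked first because it is the more severe condition.
    if _sustained([s.error_ratio > t.outage_error_ratio for s in samples]):
        return "Outage"
    # Assumption: a minute counts toward Degraded if either SLO is violated.
    if _sustained(
        [s.apdex < t.apdex_slo or s.error_ratio > t.error_ratio_slo for s in samples]
    ):
        return "Degraded"
    return "Operational"
```

For example, with hypothetical thresholds `Thresholds(apdex_slo=0.95, error_ratio_slo=0.01, outage_error_ratio=0.5)`, five consecutive samples with an Apdex of 0.90 classify as `Degraded`, while four such samples followed by a healthy one do not, because the violation was not sustained for the full window.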
#incident-management room in Slack.
#incident-management channel for internal updates
Near misses are like a vaccine. They help the company better defend against more serious errors in the future, without harming anyone or anything in the process.