Good article: https://www.infoq.com/articles/staff-engineers-impact-incidents/?utm_source=notification_email&utm_campaign=notifications&utm_medium=link&utm_content=&utm_term=weekly

  • Staff engineers can provide examples of – and coach teammates in – productive behaviors like transparency, admitting knowledge gaps, and questioning assumptions to help prevent incidents.
  • Bolstering a supportive, inclusive engineering culture provides another layer of defense against incidents. As culture stewards, staff engineers should continually invest in psychological safety.
  • Staff engineers have the skills to excel as incident commanders during outages, including coordination across workstreams, communicating with stakeholders, and preventing responder burnout.
  • Staff engineers should get involved in post-mortems to raise the quality of root cause analysis and push for pragmatic action items tied to culture gaps.
  • Improving the underlying cultural issues prevents more incidents than adding procedural gates, each of which can be skipped or rubber-stamped:
    • Testing – the change wasn’t tested in a pre-production environment first to verify it worked as intended.
    • Code Review – the change was approved in code review without any questions or discussion.
    • Deployment Verification – the change wasn’t verified after deployment to production to make sure it was working as expected.