Senior Software Engineer, Azure CXP AzRel, Microsoft
BlueSky: @chris-ayers.com
LinkedIn: chris-l-ayers
Blog: https://chris-ayers.com/
GitHub: Codebytes
Mastodon: @Chrisayers@hachyderm.io
Twitter: @Chris_L_Ayers
A system is considered "reliable" if it can consistently serve users under normal or abnormal conditions.
Source: New Relic 2024 Observability Forecast
https://uptime.is/five-nines
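To make an availability target concrete, the following minimal sketch converts a percentage into an allowed-downtime budget. The function name and the 365-day year are assumptions chosen for illustration; calculators such as the one linked above may use a slightly different year length.

```python
def allowed_downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Return the downtime allowed within a period at a given availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_seconds * (1.0 - availability_percent / 100.0)


# Assumption: a non-leap, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_seconds(target, SECONDS_PER_YEAR) / 60
        print(f"{target}% availability allows {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves roughly five minutes of downtime per year, which is why each additional nine is progressively more expensive to achieve.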
Establish resilience expectations before selecting technology:
Focus on failure domains, redundancy strategy, and dependency design before implementation.
Failure Examples
Active-Active: Multiple instances process requests simultaneously.
Active-Passive: Primary instance processes traffic; secondary is on standby.
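The two topologies above can be contrasted with a small routing sketch. The `Instance` type, the `healthy` flag, and both selection helpers are hypothetical names for illustration only; in practice a load balancer or traffic manager with real health probes would make this decision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Instance:
    name: str
    healthy: bool = True  # assumption: set by some external health probe


def pick_active_passive(primary: Instance, standby: Instance) -> Instance:
    """Active-Passive: use the primary; fail over to the standby only if it is down."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy instance available")


class ActiveActiveRouter:
    """Active-Active: spread requests round-robin across all healthy instances."""

    def __init__(self, instances: Sequence[Instance]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._next = 0

    def pick(self) -> Instance:
        # Try each instance at most once, skipping unhealthy ones.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instance available")
```

In the active-active case a failed instance only reduces capacity, while in the active-passive case the standby carries no traffic until a failover happens, so its recovery path needs to be exercised regularly.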
Automated, policy-driven environments (Azure Landing Zones, Azure Verified Modules (AVM), Azure Proactive Resiliency Library (APRL)) reduce configuration variance and misconfiguration risk.
Customer Responsibilities:
Embed failure-aware logic: timeouts, retries, backoff, bulkheads, circuit breakers, idempotency, hedging.
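As one illustration of the patterns listed above, here is a minimal sketch of bounded retries with exponential backoff and full jitter. `TransientError`, `retry_with_backoff`, and the default values are assumptions made for this example, not part of any Azure SDK; production code would more likely rely on the retry policy built into the client library in use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures a bounded number of times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure to the caller
            # Exponential backoff capped at max_delay, with full jitter so that
            # many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Retrying is only safe when the operation is idempotent, which is why idempotency appears in the same list. The `sleep` parameter is injectable so the behavior can be tested without real delays.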
Select patterns based on observed failure modes; measure impact via SLIs & error budget consumption.
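Error budget consumption can be computed directly from a request-based SLI. The sketch below assumes an availability SLO expressed as a fraction of successful requests; the function name and the sample numbers are illustrative only.

```python
def error_budget_consumed(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget used; 1.0 means fully spent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return failed_requests / allowed_failures


if __name__ == "__main__":
    # Illustrative numbers: a 99.9% SLO over 1,000,000 requests tolerates
    # about 1,000 failures, so 250 failures use about a quarter of the budget.
    consumed = error_budget_consumed(0.999, 1_000_000, 250)
    print(f"Error budget consumed: {consumed:.0%}")
```

Comparing this figure before and after introducing a resilience pattern gives a measurable indication of whether the pattern helped.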
Engineer for transient failure as the norm: fast fail, bounded retries, measurable outcomes.
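Fast fail is commonly implemented with a circuit breaker. The following is a deliberately simple, single-threaded sketch; the class name, thresholds, and injectable clock are assumptions for illustration, and a production system would normally use a maintained resilience library instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting on a dependency known to be failing.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this call through as a trial.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Counting how often `CircuitOpenError` is raised is one way to turn this pattern into the measurable outcome the slide calls for.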
Emphasize fast detection, validated recovery paths, and continuous improvement through data & drills.
Error Budget Response Guide:
Tools & frameworks that reduce variance, enforce policy, and speed safe delivery.
Utilize established Azure reference architectures for reliability.
Leverage available documentation and best practices for consistency and effectiveness.
Learn about designing mission-critical applications on Azure for high availability, reliability, and performance:
Guidance for web apps on Azure, offering prescriptive architecture, code, and configuration aligned with the Well-Architected Framework: