Which of the following BEST describes the most important rationale for NOT seeking an SLO of 100% availability?
Comprehensive and Detailed Explanation From Exact Extract:
The SRE Book clearly states: "A target of 100% availability is neither realistic nor economically viable at scale." Complex distributed systems inherently experience failures, network issues, hardware faults, and dependency outages. SRE emphasizes embracing this reality through error budgets, which assume some failure and allow engineering resources to be used efficiently.
The primary reason not to set 100% availability is that it is impossible to achieve reliably and leads to wasted engineering effort. SRE states: "Chasing perfect reliability leads to dramatically increasing costs with diminishing returns."
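To make the cost argument concrete: an availability SLO leaves an error budget of 1 - SLO, so each extra "nine" cuts the tolerable downtime by a factor of ten, and a 100% target leaves no budget at all. The following is a minimal Python sketch, assuming a 30-day window; the window length and the `error_budget` helper are illustrative assumptions, not something taken from the SRE books.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Return the downtime an availability SLO permits over `window`.

    Hypothetical helper: `slo` is a fraction in [0, 1], e.g. 0.999 for "three nines".
    """
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    # The error budget is the unreliability the SLO tolerates: 1 - SLO.
    return window * (1.0 - slo)


for target in (0.99, 0.999, 0.9999, 1.0):
    print(f"{target:.2%} -> {error_budget(target)}")
```

For a 99.9% target over 30 days this works out to 43 minutes 12 seconds of allowed downtime; at 100% the budget is zero, so any failure at all is an SLO violation.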
Option A captures this rationale precisely.
Options B, C, and D are secondary or incorrect interpretations and do not come directly from SRE principles.
Thus, A is the correct SRE-aligned answer.
Site Reliability Engineering, Chapter: "Service Level Objectives."
The Site Reliability Workbook, sections on Error Budgets and realistic SLOs.
What is the benefit of strategically burning the Error Budget to zero every month?
Comprehensive and Detailed Explanation From Exact Extract:
Burning the error budget to zero, strategically rather than accidentally, helps strike the right balance between release velocity and system stability, which is the fundamental purpose of error budgets. Error budgets exist to encourage a healthy level of risk-taking, up to the point where user experience would start to suffer.
From the Site Reliability Engineering Book, SLO chapter:
"Error budgets provide a mechanism for balancing innovation and reliability by allowing measured risk-taking while ensuring user expectations are met."
The SRE Workbook adds:
"Teams should aim to use their full error budget. Not using it implies missed opportunities to deliver features or improvements."
This means that strategically burning the error budget to zero ensures:
Teams are shipping value at maximum safe velocity
Reliability goals are still respected
Risk is managed and intentional
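The idea of spending the budget deliberately can be sketched as a request-based budget plus a release gate. This is a minimal Python illustration, not an implementation from the SRE books: the `ErrorBudget` class, the `can_release` policy (keep shipping while any budget remains, freeze once it is spent), and the request counts are all hypothetical. It uses `fractions.Fraction` so that the comparison at exactly zero remaining budget is not distorted by floating-point rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical helper)."""

    slo: str              # target as a decimal string, e.g. "0.999" for 99.9%
    total_requests: int   # requests served so far in the window
    failed_requests: int  # requests that missed the SLI so far

    def allowed_failures(self) -> Fraction:
        # Exact arithmetic keeps the "budget exactly spent" case exact.
        return (1 - Fraction(self.slo)) * self.total_requests

    def remaining_fraction(self) -> Fraction:
        """1 = budget untouched, 0 = fully spent, negative = overspent."""
        if self.total_requests == 0:
            return Fraction(1)  # nothing served yet, nothing spent
        allowed = self.allowed_failures()
        if allowed == 0:
            return Fraction(0)  # a 100% SLO leaves no budget to spend
        return 1 - Fraction(self.failed_requests) / allowed


def can_release(budget: ErrorBudget) -> bool:
    """Assumed policy: keep shipping while budget remains, freeze at zero."""
    return budget.remaining_fraction() > 0


month = ErrorBudget(slo="0.999", total_requests=1_000_000, failed_requests=400)
print(float(month.remaining_fraction()), can_release(month))
```

Here 400 failures out of 1,000,000 requests against a 99.9% target consume 40% of the 1,000 allowed failures, leaving 60% of the budget to spend on further releases before the window ends.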
Why other options are incorrect:
B: Capacity measurement is unrelated to error budget consumption.
C: Error budgets should not be continually revised unless business needs change.
D: Conversations with partners may occur, but this is not the primary benefit.
Thus, the correct answer is A.
Site Reliability Engineering Book, "Service Level Objectives"
SRE Workbook, "SLO Engineering"
Microservices are independent services that are developed, deployed, and maintained separately.
Which of the following BEST justifies the use of this application architecture?
Comprehensive and Detailed Explanation From Exact Extract:
SRE supports microservices architecture because it improves reliability by reducing blast radius, allowing independent deployments, and enabling scalable autonomous teams. The SRE Book notes: "Microservices enable teams to independently iterate and improve reliability without the constraints of large monolithic systems." (SRE Book, Distributed Systems). One of the strongest reasons to adopt microservices is modernizing and refactoring large legacy monoliths, allowing them to be broken into independently deployable, maintainable components.
Option A is therefore the best justification.
Options B, C, and D may involve architectural choices, but they do not explain why microservices are the preferred architecture for reliability and scalability.
Thus, A is correct.
Site Reliability Engineering, Chapters on Distributed Systems and Microservice Reliability Patterns.
In a blameless post-mortem, those involved report
Comprehensive and Detailed Explanation From Exact Extract:
A blameless post-mortem is a foundational SRE practice that encourages truthful, detailed reporting after an incident. The purpose is to learn, not punish. Google SRE emphasizes that engineers must feel psychologically safe to report what they did, what they assumed, and why they made those decisions.
From the Site Reliability Engineering Book, Chapter ''Postmortem Culture'':
"Blameless postmortems encourage engineers to share the full details of their actions and assumptions without fear of punishment, enabling learning and preventing repeated failures."
The book further states:
"Understanding the assumptions made during an incident is critical to uncovering systemic issues."
Thus:
Engineers must report without fear of retribution
They must report assumptions and decisions made during the incident
Therefore, the correct answer is C (Both A and B).
Why the other options are insufficient:
A: Only partially correct.
B: Only partially correct.
D: Testing data may be included, but it is not the defining feature of blameless postmortems.
Site Reliability Engineering Book, "Postmortem Culture"
SRE Workbook, "Learning from Incidents"
Which of the following BEST describes a business continuity plan?
Comprehensive and Detailed Explanation From Exact Extract:
A Business Continuity Plan (BCP) is a critical component of organizational resilience. While not unique to SRE, SRE strongly intersects with continuity planning because reliable systems must continue functioning during disruptions. According to Google's SRE principles, reliability extends beyond typical outages and includes "ensuring services continue to operate even under exceptional conditions." (SRE Book, Chapter: Addressing Risks). A business continuity plan specifically outlines how essential operations are maintained during major disruptions such as natural disasters, data center outages, or large-scale system failures.
Option A ("The way the organization maintains operations during a disaster") matches the formal definition of BCP.
Option B refers to disaster recovery (DR), which is separate; DR focuses on restoring systems, not maintaining ongoing operations.
Option C refers to configuration management activities, not continuity.
Option D refers to risk management, which informs BCP but does not define it.
Therefore, A is the correct answer because it directly reflects the purpose of continuity planning as supported by reliability-focused guidance.
Site Reliability Engineering: How Google Runs Production Systems, Chapters: "Addressing Risks," "Managing Critical State."
The Site Reliability Workbook, Sections discussing resilience and continuity in distributed systems.