What is a Service Level Indicator (SLI)?
A Service Level Indicator (SLI) is a quantitative measure of a service’s quality or reliability. SLIs reflect the user experience, focusing on aspects such as request success, latency, or data delivery correctness. Common SLI categories include availability, latency, throughput, error rate, durability, freshness, and correctness. Think of an SLI as a raw measurement, like a speedometer, that informs SLOs and, subsequently, SLAs.
How do SLIs relate to SLOs and SLAs?
Although related, SLIs, SLOs, and SLAs serve different purposes and sit at different levels of a hierarchy.
- Hierarchy: SLIs inform SLOs, and SLOs in turn inform SLAs. This layering keeps engineering efforts aligned with both customer expectations and overall business risk tolerance.
- Purpose: SLIs themselves are not goals; they provide the evidence to determine whether SLO goals are being met.
- Bridge: Error budgets provide a practical bridge between SLOs and release velocity.
| Concept | Definition |
|---|---|
| SLI | The raw measurements of service performance. |
| SLO | The targets for those measurements (acceptable thresholds) over a specific time window. |
| SLA | The contractual commitments to customers, often based on those targets and including penalties. |
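To make the link between an SLI, an SLO, and the error budget concrete, here is a minimal sketch. The function name `error_budget`, the 99.9% target, and the event counts are illustrative assumptions, not values from any particular service.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Compare a measured SLI against an SLO and report error budget usage.

    slo_target is a fraction such as 0.999 (hypothetical 99.9% SLO).
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # The budget is the number of bad events the SLO tolerates in the window.
    allowed_bad = (1.0 - slo_target) * total_events
    consumed = bad_events / allowed_bad if allowed_bad > 0 else float("inf")
    return {
        "sli": 1.0 - bad_events / total_events,   # measured success ratio
        "allowed_bad_events": allowed_bad,
        "budget_consumed": consumed,              # 1.0 means the budget is spent
        "slo_met": bad_events <= allowed_bad,
    }


# Assumed example: 1,000,000 requests in the window, 400 of them failed.
report = error_budget(slo_target=0.999, total_events=1_000_000, bad_events=400)
print(report)  # roughly 40% of the budget is used, so the SLO is still met
```

A team might slow down releases as `budget_consumed` approaches 1.0, which is one way the budget ties SLOs to release velocity.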
How do you choose relevant SLIs?
Here are some points to keep in mind when selecting SLIs relevant to you:
- Think about the most important journey or capability of your user or service.
- Pick the indicators that best capture success from the user’s perspective and have a clear impact on user happiness.
- Get input from engineering, product, and support, which will likely have different viewpoints.
- Determine 2-3 essential SLIs to begin with; update them periodically as your product, architecture, or user expectations change.
How do SLIs define performance thresholds?
On its own, an SLI is only a measurement; it defines a performance threshold once a target value is attached to it. The supporting parameters are:
- Values: Should align with the perceived quality of service provided.
- Compliance: Over a specified period, these metrics can be used to work out service level agreement compliance.
- Baseline: A threshold could be “p95 login latency under 200 ms,” which gives an exact level of performance to evaluate against.
- Alerting: These thresholds are the basis of SLO alerting when they are violated. An SLO miss is recorded specifically at that point in time.
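The “p95 login latency under 200 ms” threshold can be checked with a few lines of code. This is a sketch under assumptions: it uses the nearest-rank percentile definition (monitoring systems often interpolate instead), and the helper names and sample latencies are made up for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_threshold(latencies_ms, pct=95, threshold_ms=200.0):
    """True when the chosen percentile is under the threshold."""
    return percentile(latencies_ms, pct) < threshold_ms


# Hypothetical login latencies in milliseconds; one slow outlier.
login_latencies_ms = [120, 135, 150, 160, 170, 180, 185, 190, 195, 450]

print(percentile(login_latencies_ms, 95))   # 450
print(meets_threshold(login_latencies_ms))  # False: the threshold is violated
```

With only ten samples a single outlier decides the p95, which is one reason to evaluate thresholds over windows that contain enough traffic.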
How do SLIs measure availability, latency, and errors?
When thinking about measuring SLIs, keep in mind the following:
- Availability: The share of time or requests for which the service is operational (e.g., uptime divided by total time).
- Latency: How long the service takes to respond, usually reported as a percentile such as the p95 response time.
- Errors: The frequency of failed requests, measured as the ratio of failed requests to total requests.
- Clearly define the criteria for what is considered a “good” or “valid” event.
- Measure from the user’s perspective when possible.
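The points above can be sketched as a request-based calculation. The `Request` type and the rules for “valid” and “good” events are assumptions chosen for illustration: here client errors (4xx) are excluded from valid events and server errors (5xx) count as failures, but each service has to define these criteria itself.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int  # HTTP status code observed for the request


def is_valid(req: Request) -> bool:
    """Assumption: 4xx responses are the caller's fault and are not counted."""
    return not (400 <= req.status < 500)


def is_good(req: Request) -> bool:
    """Assumption: any valid request that is not a 5xx counts as good."""
    return is_valid(req) and req.status < 500


def availability_and_error_rate(requests):
    valid = [r for r in requests if is_valid(r)]
    if not valid:
        raise ValueError("no valid events in the window")
    good = sum(1 for r in valid if is_good(r))
    availability = good / len(valid)
    return availability, 1.0 - availability


# Hypothetical window: nine successes, one server error, one client error.
window = [Request(200)] * 9 + [Request(500), Request(404)]
availability, error_rate = availability_and_error_rate(window)
print(availability, error_rate)
```

Time-based availability (uptime divided by total time) follows the same good-over-valid pattern, with probe intervals taking the place of requests.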
How do SLIs reflect data quality?
SLIs in data systems primarily assess whether data is fresh, accurate, complete, and durable. Together, they reflect the overall quality of the data. Data quality SLIs lie at the core of effective monitoring, ensuring that data pipelines, datasets, and other data assets are suitable for analytics, reporting, and AI/ML workflows.
Consider including lineage and validation checks to support data-quality SLIs and provide more detailed insights.
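As a rough illustration, freshness and completeness can both be expressed as ratios. The dataset names, the one-hour freshness limit, and the required fields below are hypothetical; a real pipeline would read these from its own metadata and validation rules.

```python
from datetime import datetime, timedelta, timezone


def freshness_sli(last_updated, now, max_age):
    """Fraction of datasets whose most recent update is no older than max_age."""
    if not last_updated:
        raise ValueError("no datasets to evaluate")
    fresh = sum(1 for ts in last_updated.values() if now - ts <= max_age)
    return fresh / len(last_updated)


def completeness_sli(rows, required_fields):
    """Fraction of rows in which every required field is present and not None."""
    if not rows:
        raise ValueError("no rows to evaluate")
    complete = sum(
        1 for row in rows if all(row.get(f) is not None for f in required_fields)
    )
    return complete / len(rows)


# Fixed clock so the example is reproducible.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = {
    "orders": now - timedelta(minutes=10),  # fresh
    "users": now - timedelta(hours=3),      # stale under a 1-hour limit
}
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},               # incomplete
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

freshness = freshness_sli(last_updated, now, max_age=timedelta(hours=1))
completeness = completeness_sli(rows, required_fields=["id", "email"])
print(freshness, completeness)  # 0.5 0.75
```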
How frequently should SLIs be measured?
SLIs should be measured frequently enough to detect significant changes in service performance, but not so frequently that noise drowns out the signal. The proper measurement frequency depends on the specific service, the error budget that has been defined, and the degree to which the user experience is affected.
Common Intervals:
- Continuous or near-real-time collection
- Aggregated windows of 1 minute, 5 minutes, or longer.
- Reasonable collection intervals range from 10 seconds to several minutes.
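One way to picture aggregated windows is to bucket raw events by time and compute the SLI per bucket. The sketch below assumes events arrive as `(unix_timestamp, is_good)` pairs; that shape and the function name `windowed_sli` are illustrative choices.

```python
from collections import defaultdict


def windowed_sli(events, window_seconds=60):
    """Group (timestamp, is_good) events into fixed windows and return
    {window_start: fraction_of_good_events}, ordered by window start."""
    counts = defaultdict(lambda: [0, 0])  # window_start -> [good, total]
    for ts, ok in events:
        start = int(ts // window_seconds) * window_seconds
        counts[start][0] += int(ok)
        counts[start][1] += 1
    return {start: good / total for start, (good, total) in sorted(counts.items())}


# Hypothetical events spread over three one-minute windows.
events = [(0, True), (10, True), (59, False), (60, True), (61, True), (125, False)]
per_minute = windowed_sli(events, window_seconds=60)
print(per_minute)
```

Widening `window_seconds` smooths out noise at the cost of slower detection, which is the trade-off described above.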
What are common SLI metrics?
Consider observing the following:
- Golden Signals: Latency, traffic/throughput, errors, and saturation are frequently used as SLIs.
- User-Facing Services: Usually, availability, latency, error rate, and throughput are enough to gauge performance.
- Data Systems: Such systems require metrics that measure freshness, correctness, and durability.
- Strategy: Generally, it is best to choose a handful of important SLIs for each service to reduce noise and limit the overhead.
For payments specifically, consider the following ones:
- Payment Authorization Rate: The ratio of successful authorizations to total attempts (Correctness/Success).
- Checkout Page Load Time: The time it takes for the hosted checkout page to become interactive (Latency).
- Webhook Delivery Freshness: The delay between a successful payment and the vendor’s server receiving the notification (Freshness/Latency).
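Two of the payment indicators above reduce to simple ratios. The record fields (`authorized`, `paid_at`, `received_at`) and the 30-second delivery limit are hypothetical and only serve to make the sketch runnable.

```python
def authorization_rate(attempts):
    """Successful authorizations divided by total attempts."""
    if not attempts:
        raise ValueError("no authorization attempts in the window")
    return sum(1 for a in attempts if a["authorized"]) / len(attempts)


def webhook_freshness(deliveries, max_delay_s=30.0):
    """Fraction of webhooks received within max_delay_s of the payment succeeding."""
    if not deliveries:
        raise ValueError("no webhook deliveries in the window")
    on_time = sum(
        1 for d in deliveries if d["received_at"] - d["paid_at"] <= max_delay_s
    )
    return on_time / len(deliveries)


# Hypothetical records; timestamps are Unix seconds.
attempts = [
    {"authorized": True},
    {"authorized": True},
    {"authorized": False},
    {"authorized": True},
]
deliveries = [
    {"paid_at": 0.0, "received_at": 5.0},      # 5 s delay: on time
    {"paid_at": 100.0, "received_at": 190.0},  # 90 s delay: late
]

print(authorization_rate(attempts))   # 0.75
print(webhook_freshness(deliveries))  # 0.5
```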
Conclusion
Service Level Indicators (SLIs) represent key quantitative indicators of quality and reliability from the user’s perspective. By wisely choosing and measuring SLIs such as availability, latency, and error rate levels, the team can delineate performance thresholds, plan support, and maintain data quality.