What is a Service Level Indicator (SLI)?
A Service Level Indicator (SLI) is a quantitative measure of a service’s quality or reliability. SLIs reflect the user experience, focusing on aspects such as request success, latency, or data delivery correctness. Common SLI categories include availability, latency, throughput, error rate, durability, freshness, and correctness. Think of an SLI as a raw measurement, like a speedometer, that informs SLOs and, subsequently, SLAs.
How do SLIs relate to SLOs and SLAs?
Although related, SLIs, SLOs, and SLAs serve different purposes and sit at different levels of a hierarchy.
- Hierarchy: SLIs inform SLOs, and SLOs in turn inform SLAs. This ordering keeps engineering efforts aligned with both customer expectations and overall business risk tolerance.
- Purpose: SLIs themselves are not goals; they provide the evidence to determine whether SLO goals are being met.
- Bridge: Error budgets provide a practical bridge between SLOs and release velocity.
| Concept | Definition |
|---|---|
| SLI | The raw measurements of service performance. |
| SLO | The targets for those measurements (acceptable thresholds) over a specific time window. |
| SLA | The contractual commitments to customers, often based on those targets and including penalties. |
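The relationship between an SLI, an SLO target, and the error budget can be illustrated with a little arithmetic. The following is a minimal sketch, not a standard implementation: the function names and the choice to treat an empty window as fully good are assumptions made for illustration.

```python
def sli_ratio(good_events: int, total_events: int) -> float:
    """SLI as the fraction of good events.

    Assumption: a window with no events counts as fully good (1.0).
    """
    if total_events == 0:
        return 1.0
    return good_events / total_events


def slo_met(sli: float, slo_target: float) -> bool:
    """An SLO is a target applied to an SLI, e.g. 0.999 for 99.9%."""
    return sli >= slo_target


def error_budget_remaining(good_events: int, total_events: int,
                           slo_target: float) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # No failures are allowed (100% target or empty window).
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    good, total, target = 9_950, 10_000, 0.99
    sli = sli_ratio(good, total)
    print("SLI:", sli)
    print("SLO met:", slo_met(sli, target))
    print("Budget remaining:", error_budget_remaining(good, total, target))
```

A team might slow down releases when the remaining budget approaches zero, which is how the error budget links SLOs to release velocity.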
How do you choose relevant SLIs?
Here are some points to keep in mind when selecting SLIs relevant to you:
- Think about the most important journey or capability of your user or service.
- Pick the ones that best show the user’s perspective of success. Select the ones with a very clear impact on user happiness.
- Get input from engineering, product, and support, which will likely have different viewpoints.
- Determine 2-3 essential SLIs to begin with; update them periodically as your product, architecture, or user expectations change.
How do SLIs define performance thresholds?
SLIs do not set thresholds by themselves; a performance threshold is a target value applied to an SLI. Keep the following aspects in mind:
- Values: Threshold values should align with the quality of service that users perceive.
- Compliance: Measured over a specified period, these metrics can be used to determine compliance with a service level agreement.
- Benchmark: A threshold could be “p95 login latency under 200 ms,” which gives an exact level of performance to evaluate against.
- Alerts: These thresholds are the basis of SLO alerting. When a threshold is violated, an SLO miss is recorded for that point in time.
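A threshold such as “p95 login latency under 200 ms” can be checked with a short script. This is a sketch under stated assumptions: it uses the nearest-rank percentile definition (monitoring systems often interpolate or estimate from histograms instead), and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("cannot compute a percentile of no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(values)
    # Nearest rank: the smallest value such that at least pct% of
    # samples are less than or equal to it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_threshold(latencies_ms: Sequence[float],
                            pct: int = 95,
                            threshold_ms: float = 200.0) -> bool:
    """True when the chosen percentile is strictly under the threshold."""
    return percentile_nearest_rank(latencies_ms, pct) < threshold_ms


if __name__ == "__main__":
    # Illustrative samples: 19 fast logins and one slow outlier.
    samples = [150.0] * 19 + [900.0]
    print("p95 (ms):", percentile_nearest_rank(samples, 95))
    print("threshold met:", meets_latency_threshold(samples))
```

Using a percentile rather than the mean keeps a few slow outliers from hiding behind many fast requests, while still tolerating a small tail.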
How do SLIs measure availability, latency, and errors?
When thinking about measuring SLIs, keep in mind the following:
- Availability: Whether the service is operational or not, typically expressed as a ratio (e.g., uptime divided by total time).
- Latency: The response time of the service, for example the p95 response time.
- Errors: The frequency of failed requests, measured as the ratio of failed requests to total requests.
- Clearly define the criteria for what is considered a “good” or “valid” event.
- Measure from the user’s perspective when possible.
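The points about defining “good” and “valid” events can be made concrete with a small classifier. This is an illustrative sketch only: the `Request` type is hypothetical, and the rules used here (4xx responses are excluded as invalid, any non-5xx response counts as good) are assumptions that each team would need to decide for itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Request:
    status: int  # HTTP status code returned to the user


def is_valid(req: Request) -> bool:
    # Assumption: client errors (4xx) say nothing about service health,
    # so they are excluded from the denominator.
    return not (400 <= req.status < 500)


def is_good(req: Request) -> bool:
    # Assumption: any valid response that is not a server error is good.
    return req.status < 500


def availability_sli(requests: Iterable[Request]) -> Optional[float]:
    """Good valid events divided by all valid events; None if no valid events."""
    valid = [r for r in requests if is_valid(r)]
    if not valid:
        return None
    good = sum(1 for r in valid if is_good(r))
    return good / len(valid)


def error_rate(requests: Iterable[Request]) -> Optional[float]:
    """Failed valid events divided by all valid events."""
    sli = availability_sli(requests)
    return None if sli is None else 1.0 - sli


if __name__ == "__main__":
    traffic = [Request(200), Request(200), Request(404), Request(500)]
    print("availability:", availability_sli(traffic))
    print("error rate:", error_rate(traffic))
```

Writing the two predicates down explicitly makes the SLI definition reviewable, since changing what counts as valid changes the number without any change in the service.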
How do SLIs reflect data quality?
SLIs in data systems primarily assess whether data is fresh, accurate, complete, and durable. Together they reflect the overall quality of the data. Data quality SLIs lie at the core of effective monitoring, ensuring that data pipelines, datasets, and other data assets are suitable for analytics, reporting, and AI/ML workflows.
Consider including lineage and validation checks to support data-quality SLIs and provide more detailed insights.
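As one example of a data-quality SLI, freshness can be expressed as the share of checks in which a dataset was updated recently enough. The sketch below assumes a hypothetical list of (check time, last update time) pairs and a 30-minute freshness limit chosen purely for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Each observation pairs the time of a check with the dataset's
# last-updated timestamp as seen at that check.
Observation = Tuple[datetime, datetime]


def freshness_sli(observations: List[Observation],
                  max_age: timedelta) -> Optional[float]:
    """Fraction of checks where the data was no older than max_age.

    Returns None when there are no observations to evaluate.
    """
    if not observations:
        return None
    fresh = sum(
        1 for checked_at, last_updated in observations
        if checked_at - last_updated <= max_age
    )
    return fresh / len(observations)


if __name__ == "__main__":
    noon = datetime(2024, 1, 1, 12, 0)
    checks = [
        (noon, noon - timedelta(minutes=5)),                       # fresh
        (noon + timedelta(hours=1), noon + timedelta(minutes=15)),  # 45 min old
    ]
    print("freshness:", freshness_sli(checks, timedelta(minutes=30)))
```

Completeness and correctness SLIs can follow the same pattern, with the predicate replaced by a row-count or validation check.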
How frequently should SLIs be measured?
SLIs should be measured frequently enough to detect significant changes in service performance, but not so frequently that noise drowns out the signal. The proper measurement frequency depends on the specific service, the error budget that has been defined, and the degree to which the user experience is affected.
Common Intervals:
- Continuous or near-real-time collection
- Aggregated windows of 1 minute, 5 minutes, or longer.
- Reasonable intervals typically range from every 10 seconds to several minutes.
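Aggregating raw events into fixed windows is one way to trade sensitivity against noise. The following sketch groups events into tumbling windows and computes one SLI value per window; the event format (a timestamp in seconds plus a good/bad flag) and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# An event is (timestamp in seconds, whether the event was good).
Event = Tuple[float, bool]


def windowed_sli(events: Iterable[Event],
                 window_seconds: int = 60) -> Dict[int, float]:
    """Return {window start: good/total} for fixed, non-overlapping windows."""
    counts = defaultdict(lambda: [0, 0])  # window start -> [good, total]
    for timestamp, good in events:
        start = int(timestamp // window_seconds) * window_seconds
        counts[start][1] += 1
        if good:
            counts[start][0] += 1
    return {start: good / total
            for start, (good, total) in sorted(counts.items())}


if __name__ == "__main__":
    events = [(0, True), (30, False), (61, True), (119, True)]
    for start, value in windowed_sli(events, window_seconds=60).items():
        print(f"window starting at {start}s: {value}")
```

A longer window smooths out brief blips at the cost of slower detection, so the window size is usually chosen together with the alerting policy.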
What are common SLI metrics?
Consider observing the following:
- Golden Signals: Latency, traffic/throughput, errors, and saturation are frequently used as SLIs.
- User-Facing Services: Usually, availability, latency, error rate, and throughput are enough to gauge performance.
- Data Systems: Such systems require metrics that measure freshness, correctness, and durability.
- Strategy: Generally, it is best to choose a handful of important SLIs for each service to reduce noise and limit the overhead.
For payments specifically, consider the following:
- Payment Authorization Rate: The ratio of successful authorizations to total attempts (Correctness/Success).
- Checkout Load Time: The time it takes for the hosted checkout page to become interactive (Latency).
- Webhook Delivery Freshness: The delay between a successful payment and the vendor’s server receiving the notification (Freshness/Latency).
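Two of the payment SLIs above reduce to simple ratios. The sketch below is hypothetical throughout: the function names, the input shapes, and the 5-second webhook delay limit are illustrative assumptions, not values taken from any payment provider.

```python
from typing import Optional, Sequence


def authorization_rate(authorized: int, attempts: int) -> Optional[float]:
    """Successful authorizations divided by total attempts; None if no attempts."""
    if attempts == 0:
        return None
    return authorized / attempts


def webhook_freshness(delays_seconds: Sequence[float],
                      max_delay_seconds: float = 5.0) -> Optional[float]:
    """Fraction of webhooks received within max_delay_seconds of the payment."""
    if not delays_seconds:
        return None
    on_time = sum(1 for delay in delays_seconds if delay <= max_delay_seconds)
    return on_time / len(delays_seconds)


if __name__ == "__main__":
    print("authorization rate:", authorization_rate(90, 100))
    print("webhook freshness:", webhook_freshness([1.0, 2.0, 10.0, 3.0]))
```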
Conclusion
Service Level Indicators (SLIs) are key quantitative indicators of quality and reliability from the user’s perspective. By carefully choosing and measuring SLIs such as availability, latency, and error rate, teams can set performance thresholds, plan support, and maintain data quality.