Lecture 12: Rollback Strategies

Name: Lecture 12: Rollback Strategies
Uploaded: 2025-05-20T09:51:44.000Z
Duration: 2 h 36 min 30 s

How to Handle Software Release Failures?

Introduction to the Topic

The session begins with a light-hearted note, indicating it is the last opportunity for discussion today.

The speaker introduces the final topic: discussing not just successful software development but also failures and how to manage them effectively.

Continuous Integration and Delivery

The typical workflow involves writing code, creating unit tests, and integration tests, followed by an automated pipeline that produces results.

Continuous integration ends when pipelines run successfully, producing binary files; however, these are not yet released into production.

Transitioning to continuous delivery means that if everything goes well, updates automatically reach the client. This process requires careful planning and execution.

Importance of Reliability in Software

Until a product is released to clients, developers focus on integrating and testing code thoroughly; however, issues may still arise post-release.

Monitoring systems are crucial for identifying problems after deployment. If errors occur in production, they must be addressed swiftly.

Rollback Strategies

A common initial response to failure is rolling back to a previous version. However, this can be complex once a new version has been deployed.

The discussion will cover methodologies for managing rollbacks effectively and tools available for this purpose.

Key Considerations for Reliability

Quality Assurance (QA) aims at ensuring reliability; however, merely conducting internal tests does not guarantee long-term reliability from the user's perspective.

Users perceive reliability based on their experience with new releases. Delays in updates due to manual testing do not reflect positively on system reliability.

Real-world Implications of Software Failures

Once a new version is deployed, any bugs discovered by users initiate a countdown regarding system performance and user satisfaction.

For instance, if an e-commerce platform releases an update that disrupts transactions significantly within minutes of launch, it highlights the urgency of having effective monitoring and rollback strategies in place.

Deployment Strategies

Effective deployment isn't instantaneous; there are strategies involved in releasing updates publicly while minimizing risks associated with potential failures.

Monitoring Application Performance

Importance of Monitoring

The speaker emphasizes that the mechanics of rolling out and rolling back matter less than having effective monitoring in place.

Monitoring is deemed essential, having been touched upon in previous discussions regarding logging and observability.

Types of Monitoring

The focus should be on monitoring the application's operational health rather than tracking user actions or specific login events.

Infrastructure resources such as CPU, memory, disk, and bandwidth must be monitored continuously for optimal performance.

Application Performance Monitoring (APM)

APM stands for Application Performance Monitoring; it tracks how well the running application does its job, beyond whether the underlying infrastructure is healthy.

Key metrics include latency, error rates, and throughput. For example, a payment system's transaction processing capacity can indicate performance changes post-software updates.
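The three metrics named above can be computed from raw request records. The following is a minimal sketch, not the output format of any particular APM tool; `RequestRecord`, `summarize`, and the choice of the 95th percentile are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request ended in an error


def summarize(records: list[RequestRecord], window_seconds: float) -> dict[str, float]:
    """Return throughput, error rate and p95 latency for one time window."""
    if not records:
        return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_ms": 0.0}
    durations = sorted(r.duration_ms for r in records)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * len(durations) + 99) // 100
    errors = sum(1 for r in records if not r.ok)
    return {
        "throughput_rps": len(records) / window_seconds,
        "error_rate": errors / len(records),
        "p95_ms": durations[rank - 1],
    }
```

Comparing these numbers for the window before and after a release is one way to notice the kind of post-update performance change the payment example describes.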

Real User Monitoring (RUM)

RUM allows observation of actual user behavior when users consent to being monitored. This differs from aggregated logs by focusing on individual user activities.

An example provided involves banking services where unusual patterns in user transactions could signal issues with new software releases.

Automation in Monitoring

Logging alone is ineffective without active monitoring; automation tools are necessary to alert teams about anomalies.

Threshold settings can trigger alerts if error rates exceed predefined limits, while AI-based detection systems analyze data for significant changes.
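A threshold-based alert can be sketched in a few lines. The function name, the 5% default, and the requirement of three consecutive windows are assumptions for illustration; the lecture only states that alerts fire when error rates exceed a predefined limit.

```python
def should_alert(error_rates: list[float], threshold: float = 0.05,
                 consecutive: int = 3) -> bool:
    """Alert only when the most recent `consecutive` windows all exceed the threshold."""
    if len(error_rates) < consecutive:
        return False  # not enough data yet to judge
    return all(rate > threshold for rate in error_rates[-consecutive:])
```

Requiring several consecutive windows is one possible design choice: a single noisy window does not page anyone, which keeps the number of alerts manageable.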

Tools for APM

Notable tools mentioned include New Relic and Datadog for application performance monitoring and log collection. These tools help manage infrastructure effectively.

Understanding Rollback Strategies in Software Development

Importance of Rollback in Development Workflows

The speaker discusses tools like Pingdom and SpeedCurve, emphasizing the need for awareness about various performance measurement tools even if they haven't been used directly.

A well-defined workflow is crucial; it allows teams to respond effectively when performance metrics indicate issues, leading to potential rollbacks.

The term "rollback" is introduced as a strategy that may not always mean reverting to a previous state but serves as a conceptual framework for managing software releases.

Benefits of Having a Rollback Strategy

Developers often face stress when releasing software without backup options. Knowing there’s a rollback plan can alleviate this pressure.

The speaker compares the stress of software deployment to launching a Mars rover, highlighting the anxiety involved when no alternative plans exist.

A solid rollback strategy provides developers with peace of mind, allowing them to work more freely and innovate without fear of catastrophic failures.

Externalities of Calm Development Teams

A relaxed development environment fosters creativity and agility, encouraging teams to experiment with new technologies without fear.

When developers know there's a safety net (a rollback option), they are more likely to take risks that could lead to innovation.

Key Considerations for Implementing Rollbacks

Before discussing specific rollback strategies, it's essential to identify what needs rolling back—primarily focusing on software code and persistent data.

The simplest form of rollback involves reverting buggy code versions; however, this alone is often insufficient due to dependencies on other system components.

Challenges with Persistent Data During Rollbacks

Persistent data (like databases or configuration files) must be considered during rollbacks since simply reverting code may not resolve underlying issues.

Developers should ensure compatibility between old and new versions during migrations; failing this can complicate rollbacks significantly.

Migration Issues in ORM Frameworks

Migration and Infrastructure Management in Software Development

Importance of Correct Migration

If a migration is not executed correctly, it can lead to significant issues, such as losing user passwords when changing database schemas (e.g., deleting a column for passwords).

Infrastructure Considerations

The infrastructure running applications must be managed carefully during migrations. Containerized applications simplify this process by allowing easy replacement of old containers with new versions.

In cloud-native solutions, new software versions often introduce additional configuration requirements or resources that need to be accounted for during rollbacks.

Rollback Strategies

Tools designed for rollback and roll-forward strategies aim to address the complexities involved in managing infrastructure changes alongside software updates.

Deployment Strategies Overview

Different deployment strategies exist, some of which may not apply to smaller companies. For instance, large companies like Netflix have more complex rollout processes compared to smaller teams.

The discussion emphasizes starting with simpler strategies before moving on to more advanced ones that can still be applicable in smaller projects.

Immutable Infrastructure Concept

Immutable infrastructure refers to creating environments that cannot be modified after their initial setup. This approach prevents issues arising from version changes affecting existing infrastructures.

When deploying a new software version, instead of modifying the existing infrastructure, a completely new environment is created where the updated version is deployed without impacting the old one.

This strategy allows for quick rollbacks; if problems arise with the new version, reverting back involves simply switching back to the previous environment without complications.

Blue-Green Deployment Strategy

Blue-Green Deployment and Canary Release Strategies

Understanding Blue-Green Deployment

Blue-green deployment involves maintaining two separate infrastructures: an old one and a new one. Traffic is gradually redirected from the old to the new infrastructure, allowing both to coexist.

This method is not a strict cutover; it requires having two infrastructures ready. A sudden switch would not qualify as blue-green deployment.

A more strategic approach involves directing a small percentage of users (e.g., 5%) to the new infrastructure while monitoring performance before increasing traffic.
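The gradual traffic shift described above can be sketched as a weighted choice per request. This is only an illustration: in practice the split is usually configured in a load balancer or service mesh, and `pick_environment` and the environment labels are hypothetical names.

```python
import random


def pick_environment(new_share: float, rng: random.Random) -> str:
    """Route roughly `new_share` of requests to the new ("green") environment."""
    return "green" if rng.random() < new_share else "blue"


# Start with 5% of traffic on the new infrastructure.
rng = random.Random(42)
sample = [pick_environment(0.05, rng) for _ in range(10_000)]
green_requests = sample.count("green")
```

Raising `new_share` step by step moves more users over, and setting it back to `0.0` acts as the rollback, since the old environment is still running.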

Exploring Canary Releases

The term "canary release" derives from canaries used in mines to detect toxic gases. If the canary dies, miners know it's unsafe, similar to how this strategy tests new features on a small user group first.

In this approach, functionality is rolled out to a small percentage of users without needing separate infrastructures. Users may see different features based on their access rights.

The essence of canary releases lies in monitoring feedback from early adopters (the "canaries") to prevent widespread issues for the majority of users.

Ringed Release Strategy

Ringed release is closely related to canary releases but focuses on user groups rather than percentages. It often includes alpha and beta testers who receive updates sequentially based on their testing phase.

For example, alpha testers get early versions first; if successful, beta testers follow suit before wider public release.

Unlike canary releases where everyone uses the same software version with feature toggles, ringed releases involve distinct versions that can be rolled back if necessary.

User Engagement in Testing

Engaging trusted users for feedback during development allows for pre-release testing without public backlash. These users are informed about potential issues with early versions they test.

This collaborative approach fosters trust and ensures that any problems identified by these users do not escalate into larger public scandals once released widely.

Canary Releases and Feature Flags in Software Development

Understanding Canary Releases

Canary releases are built on infrastructure, meaning they rely on existing hardware and software systems to function effectively.

Unlike traditional software updates, canary releases allow for testing new versions without significant infrastructural changes; users simply receive the updated version on their devices.

The integration of different versions must be managed intelligently, as users may not experience a clear separation between versions A, B, or C.

Mobile Application Rollouts

Mobile applications rarely operate independently; they typically require backend support to function correctly during rollouts.

Rollout and rollback strategies depend heavily on the technology stack and components involved in the application.

Introduction to Feature Flags

Feature flags (or toggles) allow developers to enable or disable features without requiring a full software update.

This approach facilitates releases that do not necessitate any changes in software or hardware infrastructure.

Benefits of Feature Flags

Feature flags help separate development readiness from marketing release timelines, allowing features to be tested while hidden from users until an appropriate time.

They enable teams to deploy fully developed features that remain inactive until activated by project managers at strategic moments (e.g., during sales events).

Managing Risks with Feature Flags

The use of feature flags allows for quick deactivation if a newly released feature does not perform well after deployment.

This mechanism ensures that development and release processes are decoupled, enhancing flexibility in managing user experiences.

Tools for Implementing Feature Flags

Tools like ConfigCat facilitate the management of feature flags by providing simple queries that determine whether specific features should be visible to users.

Feature Flags and Their Importance

Understanding Feature Flags

Feature flag services are deliberately minimal: they only answer feature flag queries, which makes it easier to run them with high availability and redundancy.

Relying solely on a database for feature flags can lead to software failures if the API fails, causing issues with code execution due to unresponsive feature flags.

The value of these features lies in their quick integration into applications while eliminating potential query-related problems.
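One way to keep an unresponsive flag backend from breaking the application is to fall back to a safe default. The sketch below assumes a hypothetical `fetch` callable standing in for whatever client the flag service provides; it is not the API of any specific tool.

```python
from typing import Callable


def is_enabled(flag: str, fetch: Callable[[str], bool], default: bool = False) -> bool:
    """Ask the flag backend, but never let its failure break the caller."""
    try:
        return fetch(flag)
    except Exception:
        # Backend unreachable or misbehaving: use the safe default instead.
        return default


def broken_backend(flag: str) -> bool:
    raise ConnectionError("flag service unreachable")  # simulated outage


checkout_v2 = is_enabled("checkout_v2", broken_backend)
```

Defaulting to `False` means that a new feature simply stays hidden while the flag service is down, which is usually the less risky outcome.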

Tracking and User Identification

Many software solutions incorporate tracking alongside feature flags, allowing user identification during feature flag queries.

The distinction between minimal information feature flags and those requiring user identification is crucial; both can perform some level of user identification.

A/B Testing with Feature Flags

Product managers often use feature flags for A/B testing different versions of functionalities (e.g., button placement), analyzing which version performs better based on user interaction data.

Consistency in user experience is vital; users should not see conflicting versions of a feature when interacting with the application.

Use Cases for Minimal Information vs. Detailed Tracking

Minimal information approaches do not track specific users, suitable for industries where privacy is paramount or required by law.

In contrast, detailed tracking may be necessary in corporate environments where management wants insights into employee interactions with new features.

Technical Implementation Considerations

Implementations often utilize hashing algorithms to categorize users consistently without revealing personal data, ensuring fair distribution across test groups.

The mathematical operations applied to hash values determine how users are assigned to different functionality groups while maintaining consistency over time.
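The hashing idea can be sketched as follows. SHA-256, the 100 buckets, and the function names are assumptions chosen for illustration; the lecture does not name a specific algorithm.

```python
import hashlib


def bucket(user_id: str, experiment: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in the range 0..buckets-1."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it with modulo.
    return int.from_bytes(digest[:8], "big") % buckets


def variant(user_id: str, experiment: str, rollout_percent: int) -> str:
    """Users whose bucket falls below the rollout percentage see variant B."""
    return "B" if bucket(user_id, experiment) < rollout_percent else "A"
```

Because the bucket depends only on the user identifier and the experiment name, the same user always lands in the same group, and no personal data has to be stored to remember the assignment. Including the experiment name in the hash keeps group membership independent across experiments.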

Shadow Deployment: Running Multiple Versions

Concept of Shadow Deployment

Shadow Deployment Strategy Explained

Overview of Shadow Deployment

Shadow deployment involves running two versions of software simultaneously, allowing for comparison and testing without impacting users.

This strategy is particularly useful when transitioning from an old version to a new one, ensuring both can operate concurrently for validation purposes.

Use Cases and Sensitivity

Shadow deployment is ideal in sensitive environments where immediate failure detection is crucial; it helps prevent significant issues before they affect users.

For example, minor UI changes (like moving a button) do not require shadow deployment since their impact can be quickly assessed.

Pricing Strategies and Risks

Complex pricing strategies necessitate careful monitoring during deployments; errors can lead to substantial financial losses if not managed properly.

A case study illustrates how a pricing error at eMAG resulted in massive losses due to incorrect listings, highlighting the importance of robust testing.

Implementation Mechanics

In practice, shadow deployment allows for real-time comparisons between old and new logic. If discrepancies arise in outputs (e.g., pricing), alerts are generated for immediate action.

The system logs any parameter combinations that yield different results, facilitating troubleshooting while maintaining user experience.
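The comparison mechanics can be sketched like this. The pricing functions are hypothetical, and `new_price` contains a deliberate bug so that the example has something to report; amounts are integer cents to keep the comparison exact.

```python
from typing import Callable


def old_price(unit_cents: int, quantity: int) -> int:
    return unit_cents * quantity


def new_price(unit_cents: int, quantity: int) -> int:
    # Deliberately buggy rewrite: large orders lose the multiplication.
    return unit_cents * quantity if quantity <= 100 else unit_cents


def shadow_call(old: Callable[..., int], new: Callable[..., int],
                mismatches: list, *args: int) -> int:
    """Serve the old result; run the new logic in the shadow and record differences."""
    result = old(*args)
    try:
        candidate = new(*args)
        if candidate != result:
            mismatches.append((args, result, candidate))
    except Exception as exc:
        # A crash in the shadow version must never reach the user.
        mismatches.append((args, result, repr(exc)))
    return result


mismatches: list = []
small_order = shadow_call(old_price, new_price, mismatches, 250, 2)
large_order = shadow_call(old_price, new_price, mismatches, 250, 200)
```

The user always receives the old result, while `mismatches` collects the parameter combinations that need investigation. In a real system that list would feed the alerting described above.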

Benefits of Shadow Deployment

Real-Time Testing Advantages

Running tests in production provides immediate feedback on system performance; discrepancies trigger notifications for quick resolution.

While this method incurs additional computational costs (running two systems), its value lies in preventing major failures during critical operations.

Feature Flags: An Accessible Alternative

Introduction to Feature Flags

Feature flags are simpler to implement than shadow deployments and allow developers to toggle features on or off easily across various platforms.

They provide flexibility by enabling controlled feature releases without extensive development time or resources.

Time Bound Release Concept

Understanding Time Bound Releases

Contrary to its name, a time bound release functions as an automatic rollback mechanism rather than a traditional release process.

Rollback Strategies in Software Development

Understanding Rollback Mechanisms

When faced with high call volumes (e.g., 1500 clients), a system may automatically trigger a rollback at the end of the day if no contrary decision is made.
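The decision rule behind a time bound release can be sketched as a small function. The names and the three outcomes are assumptions for illustration; tools that support this pattern have their own configuration.

```python
from datetime import datetime


def decide(now: datetime, deadline: datetime, confirmed: bool) -> str:
    """Keep the release only if someone explicitly confirmed it before the deadline."""
    if confirmed:
        return "keep"
    return "rollback" if now >= deadline else "wait"


end_of_day = datetime(2025, 5, 20, 18, 0)
afternoon = decide(datetime(2025, 5, 20, 15, 0), end_of_day, confirmed=False)
evening = decide(datetime(2025, 5, 20, 18, 30), end_of_day, confirmed=False)
```

The default outcome is the rollback, so a forgotten release cannot silently stay in production.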

Tools like Spinnaker and Harness support rollout and rollback frameworks, making it easier to manage deployment strategies effectively.

Database Considerations for Rollbacks

The discussion shifts to how rollbacks can be managed from a database perspective, emphasizing that code rollbacks are relatively straightforward.

Code versioning systems simplify the process of reverting changes; however, infrastructure management requires different strategies.

Forward Compatibility in Database Migrations

Database migrations should be forward compatible from the point of view of the deployed software: after the schema is updated, older software versions must still be able to work with the database.

A common example includes renaming columns; this practice is not forward compatible as it disrupts older software's ability to read new data structures.

Key Principles of Forward Compatible Migrations

The goal is for older software versions to read and write data in updated databases without losing access or functionality.

Typically, developers do not prepare for scenarios where future unknown software will write to the database; instead, they ensure that new versions accommodate legacy systems.

Strategies for Schema Changes

To keep the schema usable by older software, new columns are added alongside the old ones instead of renaming them. Both old and new software versions can then work correctly with the database.
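Adding a column instead of renaming one can be tried out with an in-memory SQLite database. The table and column names below are hypothetical, and the `COALESCE` fallback in the new reader is one possible way to cope with rows written by the old version after the migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# Migration: add the new column next to the old one and copy the data over.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
conn.execute("UPDATE users SET full_name = name")

# The old software version is still running and writes only the old column.
conn.execute("INSERT INTO users (name) VALUES ('Grace')")

# Old reader: unchanged query, still works.
old_view = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
# New reader: prefers the new column, falls back to the old one.
new_view = [row[0] for row in conn.execute(
    "SELECT COALESCE(full_name, name) FROM users ORDER BY id")]
```

Both readers see the same two users, so rolling the code back to the old version does not require rolling back the schema.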

Shadow Tables as an Alternative Approach

Instead of modifying existing tables directly, shadow tables can be created. These tables mirror existing structures but include additional features or options.

Shadow tables operate in parallel with the original tables, requiring careful programming or framework support (e.g., Change Data Capture tools like Debezium).

Database Versioning and Migration Strategies

Database Versioning Logic

The speaker stresses that ORM frameworks should support versioned database logic, and that this support belongs in the database layer.

ORM frameworks maintain their own tables to track the current schema version and the applied migrations, but they do not check how the database version and the code version line up during ordinary data operations.

When code runs against a specific database version, developers must make sure it can handle differences in available functionality between versions.

Transactionality in Migrations

Migrations may or may not be transactional; not every system or database supports this.

This raises the question of how to handle a partially successful migration, for example when one operation fails because of a constraint in the database schema.

Some operations, such as creating new tables, are not transactional in every database engine, which can leave the schema inconsistent if a rollback is necessary.
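The difference a transactional migration makes can be demonstrated with SQLite, which does support transactional schema changes. Other engines behave differently (MySQL, for example, commits implicitly on most schema statements), so this sketch shows the favourable case only. The `migrate` helper and the table names are hypothetical.

```python
import sqlite3


def migrate(conn: sqlite3.Connection, statements: list[str]) -> bool:
    """Apply all statements in one transaction; undo everything if any of them fails."""
    conn.execute("BEGIN")
    try:
        for statement in statements:
            conn.execute(statement)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True


# isolation_level=None lets the code control transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
succeeded = migrate(conn, [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY)",
    "ALTER TABLE missing_table ADD COLUMN note TEXT",  # fails: no such table
])
leftover_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'orders'").fetchall()
```

Because the second statement fails, the `orders` table created by the first statement is rolled back as well, so the database is not left half migrated.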

GitOps and Release Management

The speaker suggests GitOps for release management: deployment scripts are integrated into pipelines so that deployments run automatically.

Rollbacks can then be handled through Git commits: if a new feature turns out to be undesirable, reverting the commit lets the existing pipeline redeploy the previous state.

GitOps works at the code level: every change is tracked in Git, so rollbacks go through the same automated pipelines as releases.

Automation and Monitoring Pipelines

Pipelines can be triggered by various events, such as time-based triggers, which supports nightly builds as well as periodic checks of performance metrics.

Such a pipeline can check out the repository, perform a revert when needed, and trigger a redeployment automatically based on defined conditions.
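The revert step of such a pipeline can be sketched as a list of Git commands. The helper names, the default branch `main`, and the remote `origin` are assumptions; the sketch presumes that pushing to the branch is what triggers the redeployment pipeline.

```python
import subprocess


def revert_commands(bad_commit: str, branch: str = "main",
                    remote: str = "origin") -> list[list[str]]:
    """Build the Git commands that undo one commit and publish the revert."""
    return [
        ["git", "checkout", branch],
        ["git", "revert", "--no-edit", bad_commit],  # new commit that undoes the change
        ["git", "push", remote, branch],             # the push triggers the pipeline
    ]


def run_revert(bad_commit: str, dry_run: bool = True) -> list[list[str]]:
    """Run the commands, or only return them when dry_run is set."""
    commands = revert_commands(bad_commit)
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands
```

A revert adds a new commit instead of rewriting history, so the rollback itself stays visible in the Git log.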

Summary of Strategies

The speaker presents an overview table comparing the strategies discussed in the session by the effort they require and their impact on infrastructure.

Canary Releases and Rollback Strategies

Overview of Deployment Strategies

The discussion begins with the challenges faced during deployment, highlighting that issues can arise, necessitating a switch to another infrastructure for resolution.

Canary releases are introduced as a method where applications are rolled out gradually. This approach allows for quick adjustments but may complicate rollbacks if users have already received the new version.

Feature flags enable instant configuration changes, allowing product managers to toggle features on or off without extensive downtime or user disruption.

Importance of Monitoring and Observability

Emphasis is placed on the need for monitoring during deployments to ensure no critical functionalities fail, particularly in pricing features.

The speaker notes that rollback strategies are often not utilized; however, they must be reliable when needed. A proactive approach is essential to maintain confidence in development processes.

Best Practices for Rollback Implementation

It’s crucial to configure alert systems effectively to avoid notification fatigue, which can lead teams to ignore important alerts about potential issues.

Regular practice of rollback scenarios is recommended. Teams should simulate rollbacks periodically to ensure preparedness and familiarity with procedures.

Training and Preparedness

The importance of conducting drills or mock scenarios is highlighted. These exercises help teams understand their roles during an actual rollback situation.

A typical workflow involves randomly selecting releases in development environments for rollback practice, ensuring team members know how to respond effectively.

Final Thoughts on Rollback Strategy Planning

Understanding Service Integration and Rollback Strategies

Importance of Blast Radius in Service Deployment

The concept of "blast radius" is crucial when planning service deployments, as services do not operate in isolation.

In user-facing applications, new features can be released without prior communication; however, most cases involve integrated services that require coordination during rollouts.

If a new version fails and requires rollback, it's essential to inform integration partners about the issue promptly.

Communication with Stakeholders

When releasing updates, knowing whom to notify is vital. For client-facing updates, less communication may be needed compared to integrations where partners must be informed.

Establishing contact lists for stakeholders helps track who needs to be notified about releases and potential issues.

Tools and Technologies for Rollback Strategies

A table was mentioned that outlines various technologies used in rollback strategies based on the deployment method.

GitOps-based rollback strategies utilize automation pipelines effectively; tools like Helm are commonly used with Kubernetes for managing rollbacks.