Distributed workflow in microservices (Orchestration vs Choreography)
Orchestration and choreography are two fundamental approaches to managing interactions and workflows in a distributed system, like microservices. Understanding their benefits and drawbacks is essential for designing an efficient and scalable architecture.
Orchestration
In orchestration, there is a central coordinator (often called an orchestrator) that controls the interaction between services.
Benefits:
- Centralized Control: Simplifies understanding and managing the workflow, as the orchestrator provides a single point of control.
- Easier Error Handling: Error handling is a major part of many domain workflows, and a central orchestrator makes it easier. For example, the orchestrator can retry a step if one or more domain services suffer a short-term outage.
- Visibility and Monitoring: It’s easier to monitor and log the process as it’s controlled from a central point.
- State Management: Having an orchestrator makes the state of the workflow queryable, providing one place to track this workflow, other workflows, and other transient states.
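The retry and state-management benefits above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production orchestrator: the `Orchestrator` class, the `TransientError` exception, and the `flaky_inventory` and `always_down` steps are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, List, Tuple


class TransientError(Exception):
    """Hypothetical error type standing in for a short-term service outage."""


class Orchestrator:
    """Runs named steps in order, retrying transient failures and recording state."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        # workflow_id -> current state; this map is what makes workflows queryable
        self.state: Dict[str, str] = {}

    def run(self, workflow_id: str, steps: List[Tuple[str, Callable[[], None]]]) -> str:
        for name, step in steps:
            self.state[workflow_id] = f"running:{name}"
            for attempt in range(1, self.max_attempts + 1):
                try:
                    step()
                    break  # step succeeded, move on to the next one
                except TransientError:
                    if attempt == self.max_attempts:
                        # Retries exhausted: record the failure and stop
                        self.state[workflow_id] = f"failed:{name}"
                        return self.state[workflow_id]
        self.state[workflow_id] = "completed"
        return self.state[workflow_id]


calls = {"inventory": 0}


def flaky_inventory() -> None:
    # Fails on the first call only, simulating a brief outage.
    calls["inventory"] += 1
    if calls["inventory"] == 1:
        raise TransientError("inventory service temporarily unavailable")


def always_down() -> None:
    # Simulates an outage that outlasts every retry.
    raise TransientError("payment service unavailable")


orchestrator = Orchestrator(max_attempts=3)
result_ok = orchestrator.run("order-1", [("reserve_inventory", flaky_inventory)])
result_failed = orchestrator.run("order-2", [("process_payment", always_down)])
```

Here `orchestrator.state` can be inspected at any time to see where each workflow stands. A real orchestrator would typically also wait between attempts (for example with exponential backoff) and persist its state, both of which are omitted to keep the sketch short.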
Drawbacks:
- Single Point of Failure: The orchestrator can become a bottleneck and point of failure, impacting system resilience. (Redundancy can be added but with additional complexity)
- Scalability Issues: The central orchestrator might become overwhelmed with traffic, limiting scalability. All communication must go through the orchestrator, creating a potential throughput bottleneck that can harm responsiveness.
- Complexity and Coupling: Orchestration can increase coupling between the services and the orchestrator, hindering independent service scaling and evolution.
- Flexibility Limitations: Changes in workflow require changes in the orchestrator, potentially impacting all services involved.
Choreography
In choreography, each service in the system knows when to execute its operations and with whom to interact. There’s no central point of control; instead, each service works independently, often based on events.
Benefits:
- Decentralization: Reduces the risk of a single point of failure and avoids bottlenecks.
- Scalability: Each service can scale independently, improving the system’s overall scalability.
- Flexibility: Services can be updated, added, or removed with minimal impact on the overall system.
- Resilience: The failure of a single service has less impact on the entire process.
Drawbacks:
- Complexity in Monitoring and Debugging: Tracing and understanding the workflow can be challenging due to the lack of a central control point.
- Error Handling: Without central coordination, there’s a risk of redundant or conflicting actions.
- State Management: Without a centralized state holder, tracking the ongoing state of a workflow is harder.
- Recoverability: Recovery becomes more difficult without an orchestrator to attempt retries and other remediation efforts.
Use case discussion
Orchestration: Online E-commerce Order Processing
Scenario: An online store processes customer orders, involving various microservices like order, inventory, payment, and notification.
Order Service: Starts the process when a customer places an order.
Inventory Service: Checks and reserves the ordered items.
Payment Service: Processes the payment.
Notification Service: Sends order confirmation and shipping updates to the customer.
The Order Service orchestrates the entire process. It first communicates with the Inventory Service to reserve items. Upon successful reservation, it instructs the Payment Service to process payment. Once payment is confirmed, the order is finalized. Throughout this process, the Notification Service sends relevant updates to the customer. The orchestration ensures that each step is completed in sequence and manages any failures (like payment failure or inventory issues) by coordinating appropriate compensating transactions or notifications.
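The flow described above, including a compensating transaction, can be sketched as follows. This is a simplified illustration: the service classes are in-memory stand-ins, and names such as `StepFailed`, `place_order`, `reserve`, `release`, and `charge` are assumptions made for this example rather than the API of any real system.

```python
from typing import List


class StepFailed(Exception):
    """Hypothetical error raised when a downstream service rejects a request."""


class InventoryService:
    def __init__(self) -> None:
        self.reserved: List[str] = []

    def reserve(self, order_id: str) -> None:
        self.reserved.append(order_id)

    def release(self, order_id: str) -> None:
        # Compensating action that undoes reserve()
        self.reserved.remove(order_id)


class PaymentService:
    def __init__(self, decline: bool = False) -> None:
        self.decline = decline
        self.charged: List[str] = []

    def charge(self, order_id: str) -> None:
        if self.decline:
            raise StepFailed("payment declined")
        self.charged.append(order_id)


class NotificationService:
    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, order_id: str, message: str) -> None:
        self.sent.append(f"{order_id}: {message}")


class OrderService:
    """Acts as the orchestrator: it calls each service in sequence."""

    def __init__(self, inventory: InventoryService, payment: PaymentService,
                 notifications: NotificationService) -> None:
        self.inventory = inventory
        self.payment = payment
        self.notifications = notifications

    def place_order(self, order_id: str) -> str:
        self.inventory.reserve(order_id)
        try:
            self.payment.charge(order_id)
        except StepFailed:
            # Payment failed after the reservation succeeded: undo the reservation
            self.inventory.release(order_id)
            self.notifications.send(order_id, "payment failed, order cancelled")
            return "CANCELLED"
        self.notifications.send(order_id, "order confirmed")
        return "CONFIRMED"


inventory = InventoryService()
notifications = NotificationService()

happy_status = OrderService(inventory, PaymentService(), notifications).place_order("order-1")
declined_status = OrderService(
    inventory, PaymentService(decline=True), notifications
).place_order("order-2")
```

After both calls, only `order-1` still holds an inventory reservation, because the reservation for `order-2` was released when its payment was declined. Handling a failed reservation would follow the same pattern and is left out for brevity.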
Choreography: Real-Time Data Processing in IoT (Internet of Things)
Scenario: An IoT system in a smart home, where various devices like sensors, lights, and thermostats send data to the cloud for analytics and alarm processing.
1) Temperature/Humidity Sensor (IoT device): Detects room temperature/humidity and publishes an event.
2) Telemetry gateway: Receives the telemetry event and routes it to the rule engine and the time-series service.
3-a) Rule engine: Validates the temperature or humidity against a predefined threshold or a machine learning algorithm and publishes an event.
4-a) Alarm service: Reacts to the event by generating an alarm and sends a command back to the device to adjust its configuration.
5-a) Notification service: Sends alerts about the alarm to users and administrators.
3-b) Time-series service: Validates, enriches, and transforms the data to stream it.
4-b) Analytics: Streams the data to a dashboard for real-time analytics.
In an IoT system, a temperature/humidity sensor detects environmental conditions and sends this data to a telemetry gateway. This gateway forwards the information to two paths. One path involves a rule engine, which checks if the temperature or humidity crosses certain thresholds, possibly using machine learning. If thresholds are exceeded, it triggers an alarm service to generate an alert and instruct the device to adjust its settings. Simultaneously, a notification service informs users and administrators about the alarm. The second path sends data to a time-series service, which processes and streams it for real-time analytics, displayed on a dashboard. If a message is malformed or processing fails, the event is sent to a dead-letter queue for further review and analysis.
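The pipeline above can be sketched with an in-memory publish/subscribe bus. This is only an illustration of the choreography style: the `EventBus` class, the topic names, the 30-degree threshold, and the event fields are all assumptions made for this example, and a real deployment would use a message broker rather than synchronous in-process calls.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]

THRESHOLD_C = 30.0  # assumed alarm threshold in degrees Celsius


class EventBus:
    """In-memory stand-in for a broker; delivers events synchronously."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.dead_letters: List[Event] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        for handler in self.subscribers[topic]:
            try:
                handler(event)
            except (KeyError, TypeError, ValueError):
                # Malformed events are parked for later review
                self.dead_letters.append(event)


bus = EventBus()
alarms: List[str] = []
commands: List[str] = []
notifications: List[str] = []
dashboard: List[float] = []


def telemetry_gateway(event: Event) -> None:
    # Basic validation; raises on malformed input so the bus dead-letters it
    reading = {"device_id": str(event["device_id"]),
               "temperature": float(event["temperature"])}
    bus.publish("telemetry.rules", reading)
    bus.publish("telemetry.timeseries", reading)


def rule_engine(event: Event) -> None:
    if event["temperature"] > THRESHOLD_C:
        bus.publish("threshold.exceeded", event)


def alarm_service(event: Event) -> None:
    alarms.append(f"alarm:{event['device_id']}")
    commands.append(f"adjust:{event['device_id']}")  # command back to the device
    bus.publish("alarm.created", event)


def notification_service(event: Event) -> None:
    notifications.append(f"alert users about {event['device_id']}")


def time_series_service(event: Event) -> None:
    # Enrich the reading before streaming it onward
    bus.publish("timeseries.point", {**event, "unit": "celsius"})


def analytics(event: Event) -> None:
    dashboard.append(event["temperature"])


bus.subscribe("telemetry.raw", telemetry_gateway)
bus.subscribe("telemetry.rules", rule_engine)
bus.subscribe("threshold.exceeded", alarm_service)
bus.subscribe("alarm.created", notification_service)
bus.subscribe("telemetry.timeseries", time_series_service)
bus.subscribe("timeseries.point", analytics)

bus.publish("telemetry.raw", {"device_id": "sensor-1", "temperature": 35.0})
bus.publish("telemetry.raw", {"device_id": "sensor-1", "temperature": 22.0})
bus.publish("telemetry.raw", {"device_id": "sensor-2"})  # malformed: no temperature
```

No component calls another directly. Each one only knows which topics it listens to and which it publishes to, so a new consumer can be added by subscribing to an existing topic without touching the others.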
What are the main challenges in implementation?
Orchestration
Overcentralization: If the order processing service becomes too centralized, handling all aspects of inventory, payment, and shipping, it becomes a single point of failure. For instance, if this service goes down, the entire order process grinds to a halt, affecting all customers and operations.
Inadequate Error Handling: Suppose the orchestrator doesn’t handle a payment failure properly. This could lead to an order being marked as complete without successful payment, causing financial discrepancies.
Ignoring Performance Impacts: If the orchestrator isn’t designed to efficiently handle high volumes of orders, especially during peak sale times, it could slow down the entire process, resulting in delayed orders and poor customer experience.
Complex Transaction Management: In a scenario where an item runs out of stock after the payment is processed but before the shipping is arranged, the system needs to handle this by either offering a substitution or a refund. Poor management here can lead to customer dissatisfaction and inventory inaccuracies.
Choreography
Event Chaos: If the temperature sensor constantly publishes minor temperature fluctuations as events and each one generates an alarm, the result is alert fatigue, increased cost, and a flood of device commands.
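One common way to reduce this kind of event noise, offered here as an assumption rather than something prescribed above, is a deadband filter at the publisher: a reading is published only if it differs from the last published value by at least some minimum amount. The `DeadbandFilter` class and the `min_delta` value are hypothetical names and numbers chosen for the sketch.

```python
from typing import List, Optional


class DeadbandFilter:
    """Suppresses readings that are too close to the last published value."""

    def __init__(self, min_delta: float) -> None:
        self.min_delta = min_delta
        self.last_published: Optional[float] = None

    def should_publish(self, reading: float) -> bool:
        if self.last_published is None or abs(reading - self.last_published) >= self.min_delta:
            self.last_published = reading
            return True
        return False


readings: List[float] = [21.0, 21.1, 21.2, 22.0, 21.9, 23.5]
sensor_filter = DeadbandFilter(min_delta=0.5)
published = [r for r in readings if sensor_filter.should_publish(r)]
# published == [21.0, 22.0, 23.5]
```

Half of the readings are dropped before they ever become events, which reduces the load on every downstream consumer. The right `min_delta` depends on the sensor and the use case.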
Lack of Monitoring and Observability: Without proper monitoring, if a motion sensor fails to trigger the alarm in case of a break-in, it might be challenging to diagnose whether the issue was with the sensor not detecting motion or the alarm service not responding to the event.
Inconsistent Communication Patterns: If the time-series service expects a certain data format from the temperature sensor but receives a different format, it may fail to react appropriately, affecting analytics and the machine learning process for an organization.
Real-world example
Uber uses orchestration for its ride-hailing services, where multiple services need to work in a coordinated manner to manage ride requests, driver location updates, and fare calculations.
- How It Works: An orchestration service coordinates these tasks, ensuring that when a ride request is made, the driver’s location is updated, the ride is matched, and the fare is calculated in a sequential and controlled process.
Netflix uses a choreographed approach in its microservices architecture for various operations like video streaming, recommendations, and user account management.
- How It Works: Each microservice operates independently, communicating via events. For example, when a new movie is added, an event is published by the content service, which is consumed by the recommendation service to update user recommendations without direct coordination.
Tools and technologies
Orchestration
Spring Cloud: A set of tools for building cloud-native apps. Within Spring Cloud, several modules like Spring Cloud Netflix and Spring Cloud Kubernetes are used for microservices orchestration.
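Kubernetes: Kubernetes can orchestrate containerized Java applications. It automates the deployment, scaling, and management of containerized applications. Note that this is orchestration at the infrastructure level (where and how containers run), which complements rather than replaces orchestration of the business workflow itself.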
Choreography
Apache Kafka: A distributed streaming platform that is often used for building real-time data pipelines and streaming apps. It’s robust and scalable, making it suitable for event-driven architectures in choreography.
RabbitMQ: A messaging broker that enables applications to communicate with each other through a message queuing system. It’s used for asynchronous processing and is effective for decoupled systems in a choreographed setup.
Summary
If a workflow needs higher scale and typically has few error conditions, it leans towards choreography. However, as workflow complexity goes up, the need for an orchestrator rises proportionally.
Ultimately, the sweet spot for choreography lies with workflows that need responsiveness and scalability, and whose error scenarios are either simple or infrequent.
On the other hand, orchestration is best suited for complex workflows that include boundary and error conditions. While this style doesn’t provide as much scale as choreography, it greatly reduces complexity in most cases.