3. Existing Solutions

Several enterprise solutions implement CDC systems for microservices, and a number of open-source tools can be combined to create a comprehensive solution. Teams need to weigh scalability, cost, and development effort when deciding which approach best fits their purpose.

3.1 Enterprise Solutions

Although CDC can be used to implement the outbox pattern, most enterprise CDC solutions focus on traditional source-to-sink replication between data stores and are rarely designed for the specific use case of syncing data between microservices.

What is a sink?

In CDC terminology, a “sink” is the destination to which data is transferred from a given source. Sinks vary, but in CDC pipelines they are typically data warehouses (e.g. Snowflake), data lakes (e.g. Amazon S3), caches (e.g. Redis), or other databases that should be updated to reflect the source data. For example, a common CDC pattern is to capture changes from a relational database such as PostgreSQL and transform them for ingestion into a large data store such as Snowflake for use in analytics.
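The capture-and-transform step in that pattern can be sketched in a few lines. This is only an illustration under stated assumptions: the event shape (`op`, `ts_ms`, `before`, `after`) is loosely modeled on the envelope that log-based CDC tools commonly emit, and `to_sink_row` and the `_op`, `_deleted`, and `_captured_at` columns are hypothetical names chosen for this example, not part of any particular product.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Hypothetical change event shape (an assumption for illustration):
#   op     - "c" (create), "u" (update), or "d" (delete)
#   ts_ms  - capture time in milliseconds since the Unix epoch
#   before - row state before the change (None for creates)
#   after  - row state after the change (None for deletes)
ChangeEvent = Dict[str, Any]


def to_sink_row(event: ChangeEvent) -> Optional[Dict[str, Any]]:
    """Flatten one change event into a row for an append-only sink table."""
    op = event["op"]
    # Deletes only carry the last known state in "before".
    state = event["before"] if op == "d" else event["after"]
    if state is None:
        return None  # nothing usable to write to the sink
    captured_at = datetime.fromtimestamp(event["ts_ms"] / 1000, tz=timezone.utc)
    return {
        **state,
        "_op": op,
        "_deleted": op == "d",
        "_captured_at": captured_at.isoformat(),
    }


sample_event: ChangeEvent = {
    "op": "u",
    "ts_ms": 1700000000000,
    "before": {"id": 7, "status": "pending"},
    "after": {"id": 7, "status": "shipped"},
}

row = to_sink_row(sample_event)
print(row)
```

A real pipeline would batch rows like this and load them into the warehouse; the point here is only that the sink receives a transformed view of each source change rather than the raw database row.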

However, several solutions, such as Confluent 1 and Striim 2, do provide managed CDC for the microservices use case. Such solutions typically function well, but they come with tradeoffs. They are expensive and incur recurring costs. Additionally, hosting the pipeline on a managed service reduces data privacy and control over infrastructure.

Striim and Confluent Logos
Figure 1: Striim and Confluent.

3.2 DIY Solutions

An alternative to enterprise solutions is to build a DIY framework from open-source tools such as Debezium 3 and Apache Kafka 4, which offer extensive flexibility for customization. Custom options include, but are not limited to, schema evolution, data transformation, and topic customization. A DIY solution may be a good fit for teams that want more control over their infrastructure and extensive customization of their CDC pipeline while avoiding the recurring costs of an enterprise solution. These benefits come at the cost of managing the complex configurations of these tools, which may hinder a team's ability to deploy a CDC pipeline quickly. Without extensive research or prior experience with the problem domain and these technologies, even experienced developers will need considerable time to build a production-ready DIY system.
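To give a sense of the configuration involved, the sketch below builds the JSON body that would register a Debezium PostgreSQL connector with Kafka Connect. It is a minimal illustration, not a production setup: the property names follow the Debezium 2.x documentation as best understood here and should be checked against the version in use, while the connector name, host, credentials, topic prefix, and the `public.outbox` table are placeholders invented for this example. `build_connector_request` is likewise a hypothetical helper.

```python
import json
from typing import Any, Dict


def build_connector_request(name: str, db_host: str, db_name: str) -> Dict[str, Any]:
    """Build a request body for registering a connector with Kafka Connect."""
    return {
        "name": name,
        "config": {
            # Debezium's PostgreSQL connector, reading via logical decoding.
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": db_host,
            "database.port": "5432",
            "database.user": "cdc_user",        # placeholder credential
            "database.password": "change-me",   # placeholder credential
            "database.dbname": db_name,
            # Prefix for the Kafka topics this connector writes to
            # (property name assumed from Debezium 2.x).
            "topic.prefix": "orders_service",
            # Capture only the (hypothetical) outbox table.
            "table.include.list": "public.outbox",
            # Route outbox rows to topics with Debezium's outbox event router.
            "transforms": "outbox",
            "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
        },
    }


request_body = build_connector_request("orders-outbox", "localhost", "orders")
print(json.dumps(request_body, indent=2))
```

In a running deployment this body would typically be sent as a POST to the Kafka Connect REST API's `/connectors` endpoint. Even this short example leaves out replication slots, publications, serialization formats, and error handling, which is where much of the configuration effort described above tends to go.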

Kafka and Debezium Logos
Figure 2: Apache Kafka and Debezium.

Footnotes

  1. Confluent Developer: Your Apache Kafka® Journey begins here. (n.d.). Confluent.

  2. Striim. (2024, October 28). Real-time data integration and streaming platform.

  3. Debezium. (n.d.). Debezium.

  4. Apache Kafka. (n.d.). Apache Kafka.