In this post I will show how to build simple data-enriching pipeline
using Kotlin coroutines.
I will use Channels and Actors abstractions provided
by kotlinx-coroutines.
In Actors model “actors” are the universal primitives of concurrent computation.
In response to a message that it receives, an actor can: make local decisions,
create more actors, send more messages,
and determine how to respond to the next message received.
Actors may modify their own private state,
but can only affect each other through messages (avoiding the need for any locks).
Let’s start with high-level definition of the pipeline:
(👤Producer) -✉️→ 📬(👤Enricher) -✉️→ 📬(👤Updater)
RawData RichData
- Pipeline will have Producer Actor, which will get some raw data
from database or some mock data
and will send it to pipeline for enrichment.
- Then Enricher Actor will handle raw data object and add some attributes to it.
- Finally, Updater Actor will store enriched data to the database.
For the sake of simplicity, let’s implement squaring function:
we will take integers as raw data and will enrich them by squaring.
Let’s define our data model:
data class RawData(val value: Int)
Enriched data will be represented by data class RichData
:
data class RichData(val value: Int, val square: Int)
Following Actors model,
we will use Kotlin Actors to represent processing units in a pipeline
and Channels to communicate with Actors.
In Kotlin actors are implemented as part of kotlinx-coroutines library:
An actor is an entity made up of a combination of a coroutine,
the state that is confined and encapsulated into this coroutine,
and a channel to communicate with other coroutines.
A simple actor can be written as a function, but an actor with a complex state
is better suited for a class.