
New - Streaming realtime enrichment in Matano


Matano now supports realtime streaming enrichment for log sources, so you can enrich your data as it is ingested into Matano. This new feature adds contextual information directly to your data, without the need for a join or lookup later on.

Enrichment overview

Enrichment in Matano refers to adding contextual information to your data, making it easier to understand and analyze. This can be anything from a user's name to geolocation information based on an IP address. Matano already supports enrichment in the form of enrichment tables.

Previously available enrichment mechanisms

Previously, these enrichment tables were only ingested into Apache Iceberg tables, where you could use them in SQL joins, and made available inside Python detections through a lookup helper method. While these are powerful features, they come with limitations: each requires a separate lookup, which can cause performance issues and ergonomic challenges, especially in SQL, where even a simple lookup means writing a join.
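To make the limitation concrete, here is a minimal plain-Python sketch of the query-time pattern. It is not Matano's lookup helper or its SQL interface; the `USERS` table, the sample events, and the `events_for_email` function are hypothetical names used only to show that every query has to repeat the lookup.

```python
from typing import Any

# Hypothetical enrichment table, keyed by user_id.
USERS: dict[str, dict[str, str]] = {
    "u-1": {"user_name": "Ada", "user_email": "ada@example.com"},
    "u-2": {"user_name": "Grace", "user_email": "grace@example.com"},
}

# Events are stored without any user context.
EVENTS: list[dict[str, Any]] = [
    {"user_id": "u-1", "ip_address": "203.0.113.7", "timestamp": "2023-01-01T00:00:00Z"},
    {"user_id": "u-2", "ip_address": "203.0.113.9", "timestamp": "2023-01-01T00:05:00Z"},
]


def events_for_email(email: str) -> list[dict[str, Any]]:
    """Answer a simple question by joining events to USERS at query time."""
    matches = []
    for event in EVENTS:
        user = USERS.get(event["user_id"])  # the separate lookup, repeated per query
        if user is not None and user["user_email"] == email:
            matches.append(event)
    return matches
```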

How realtime enrichment works

Realtime enrichment works by allowing you to add enrichment data into your data during the transformation step. As a recap, Matano contains an embedded transformation engine that allows you to write transformation scripts using the Vector Remap Language (VRL). This transformation is run in realtime as data is ingested into Matano.

Realtime streaming enrichment

The new realtime enrichment feature adds a function, get_enrichment_table_record, that you can call inside your VRL transformation scripts. This function lets you look up a record in an enrichment table by its key.

The realtime enrichment feature is designed to be highly performant. The enrichment data is stored in a highly optimized custom format, and is cached in memory for fast lookups.

Because this lookup happens during the transformation step, the enrichment data is added directly into your data, allowing you to access it directly without having to perform any lookups or joins.
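The caching idea can be sketched in plain Python. This is only an illustrative model, not Matano's implementation: the post does not describe the storage format or the cache, so the `TableCache` class, its loader callback, and the `None` result for a missing key are all assumptions.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Loader = Callable[[str], dict[Any, Record]]


class TableCache:
    """Hypothetical cache: load each table once, then serve lookups from memory."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._tables: dict[str, dict[Any, Record]] = {}
        self.loads = 0  # number of times the loader was invoked

    def get_record(self, table: str, key: Any) -> Optional[Record]:
        if table not in self._tables:
            self._tables[table] = self._loader(table)
            self.loads += 1
        # Assumed behavior: a missing key yields None.
        return self._tables[table].get(key)


def load_table(name: str) -> dict[Any, Record]:
    """Stand-in loader; a real system would read the stored table instead."""
    return {"u-1": {"user_id": "u-1", "user_name": "Ada"}} if name == "users" else {}


cache = TableCache(load_table)
first = cache.get_record("users", "u-1")
second = cache.get_record("users", "u-1")  # served from memory, no reload
```

In this model the table is loaded on first use only, so every later lookup during transformation is a dictionary access.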

How to use realtime enrichment

To use realtime enrichment, call the get_enrichment_table_record function inside your VRL transformation scripts. The function takes two arguments: the name of the enrichment table and the value to look up. It returns a record, which you can use to access the fields of the matching enrichment table entry.

For example, let's say we have an enrichment table called users, which contains the following fields (the enrichment table has a single lookup key on user_id):

  • user_id
  • user_name
  • user_email

We can use the get_enrichment_table_record function to look up a user's record, given their user ID, like so:

user_info = get_enrichment_table_record("users", .user_id)

An example of using realtime enrichment

Let's look at a concrete example. Say we have a log source that contains the following fields:

  • user_id
  • ip_address
  • timestamp

We want to enrich this data with the user's name, as well as the user's email address. We have an enrichment table called users, with a single lookup key on user_id, which contains the following fields:

  • user_id
  • user_name
  • user_email

We can use the get_enrichment_table_record function to look up the user's record and add two new fields to our data, user.name and user.email, like so:

user_info = get_enrichment_table_record("users", .user_id)
.user.name = user_info.user_name
.user.email = user_info.user_email
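To see the effect on a single event, the following plain-Python sketch mirrors the three VRL lines above. The sample values and the `enrich` helper are hypothetical, and leaving the event unchanged when no record is found is an assumption; the post does not say what the function returns for a missing key.

```python
from typing import Any

# Hypothetical contents of the users enrichment table, keyed by user_id.
USERS = {
    "u-1": {"user_id": "u-1", "user_name": "Ada", "user_email": "ada@example.com"},
}


def enrich(event: dict[str, Any]) -> dict[str, Any]:
    """Mirror the VRL script: look up the user and set user.name and user.email."""
    enriched = dict(event)  # shallow copy so the input event is left untouched
    user_info = USERS.get(event.get("user_id"))
    if user_info is not None:  # assumed handling of a missing key
        enriched["user"] = {
            "name": user_info["user_name"],
            "email": user_info["user_email"],
        }
    return enriched


raw = {"user_id": "u-1", "ip_address": "203.0.113.7", "timestamp": "2023-01-01T00:00:00Z"}
enriched = enrich(raw)
# enriched keeps user_id, ip_address, and timestamp, and gains a nested "user" object
```

Because the name and email are written into the event itself, later queries and detections can read user.name and user.email directly.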

Get started

You can start using the realtime enrichment feature today; the complete reference documentation covers the details. We'll also be further expanding our enrichment capabilities in the future, including dedicated support for geolocation and IP address enrichment. Stay tuned for more updates!