S3 Source Ingestion

When you use S3 ingestion, Matano ingests data from your log source using an S3 bucket. You can either use a default Matano provided bucket or bring your own bucket if you have existing data that you would like to forward to your Matano data lake.

Using the Matano provided sources bucket

Matano creates a managed S3 bucket for you to use for S3 ingestion. You can use this bucket to ingest data into Matano.

To retrieve the name of the Matano sources bucket, use the matano info command. See Retrieving resource values.

When sending data to the Matano provided sources bucket, upload files to the <log_source_name> prefix where log_source_name is the name of your log source that you specified in your log_source.yml file.

For example, to upload data for a log source named serverlogs, you would upload objects with keys like the following:

serverlogs/d007cb0d-7c00-43aa-b9a9-f7cc37e780dc.json
serverlogs/80cd10db-7760-4e34-830d-b98342dd180a.json
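
As a rough sketch of how an uploader might build such keys, the snippet below uses Python with a hypothetical helper named build_object_key and a random UUID as the file name, mirroring the keys above. The upload itself is omitted because the bucket name depends on your deployment; any S3 client can be used with the resulting key.

```python
import uuid


def build_object_key(log_source_name: str, extension: str = "json") -> str:
    """Return an object key under the log source's prefix (hypothetical helper)."""
    # The prefix is the log source name from log_source.yml, followed by "/".
    return f"{log_source_name}/{uuid.uuid4()}.{extension}"


key = build_object_key("serverlogs")
print(key)  # a key of the form serverlogs/<random-uuid>.json
```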

If you are getting started, don't have existing data, or don't need your raw data to be in a specific location, prefer using the Matano provided sources bucket.

Bringing your own bucket

If you have existing data, or need to have your raw data in a specific location, you can configure Matano to ingest data from an S3 location that you provide.

In your log_source.yml, specify the S3 bucket, the object key prefix under which your data is located, and the ARN of the access role that Matano can use to subscribe to and read data from your bucket:

# log_source.yml

ingest:
  s3_source:
    bucket_name: "my-org-logs-bucket"
    key_prefix: "my_key/mypath"
    access_role_arn: "arn:aws:iam::123456789101:role/my-access-role-arn"

Configuring a Matano S3 Access Role

If you are bringing your own bucket, you need to ensure that you have correctly set up permissions by creating an access role that Matano can assume and configuring bucket notifications to an SNS topic in your account that Matano will subscribe to.

To simplify the process of deploying the necessary resources for an access role in a satellite account (an account other than the one in which Matano is deployed), we recommend using one of the following Matano-distributed template files to deploy via IaC (CloudFormation/Terraform) or the AWS Console UI (CloudFormation):

Note: If you would like to set up the role and accompanying resources manually, you can follow the Manual Setup tab.

CloudFormation: Set up a Matano S3 Cross Account Access Role

1) Click Download to save the CloudFormation template file. This will be used to set up the necessary resources and the access role in the account that contains the bucket you are onboarding to Matano.

2) Fill out the required input parameters: MatanoMasterAccountId (the account in which Matano is deployed) and BucketName01 to BucketName20 (up to 20 buckets), filling in the name of each bucket that you would like to onboard from the account at this time.

3) Deploy the CloudFormation stack in your AWS account, and note the value for the output parameter RoleArn to use as the ingest.s3_source.access_role_arn in your log source configuration.

When you have finished setting up the Matano S3 Access Role in your account, use the output role ARN to set up your log source (ingest.s3_source.access_role_arn key of the log source configuration) in Matano 🎉!

Note: You can only deploy the Matano log source after you have finished setting up the role, as Matano needs to assume this role to set up a subscription the first time the log source is created.

Advanced options

Using a key pattern to match non-consecutive key patterns

The key_prefix configuration lets you specify a key prefix to match to a log source. However, you may want to use the same bucket source for multiple log sources and find that there is no simple consecutive prefix that matches a log source. In this case, you can specify a regex pattern in the ingest.s3_source.key_pattern configuration option. Matano will use this pattern to match an incoming key to a log source.

For example, in the following configuration:

# log_source.yml
ingest:
  s3_source:
    bucket_name: "my-org-logs-bucket"
    key_prefix: "AWSLogs"
    key_pattern: "AWSLogs/.*/CloudTrail"

Here, the .* wildcard in the key pattern matches the account ID portion of the key, so keys for any account ID are matched to the log source.
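
To illustrate how such a pattern separates keys, here is a minimal sketch using Python's re module. It is not Matano's implementation: the regex engine Matano uses and whether the pattern is anchored are not stated here, so the sketch assumes the pattern may match anywhere in the key. The object keys are hypothetical.

```python
import re

# Pattern copied from the key_pattern value in the configuration above.
KEY_PATTERN = re.compile(r"AWSLogs/.*/CloudTrail")


def matches_log_source(key: str) -> bool:
    """Assumption: the pattern is searched for anywhere in the key."""
    return KEY_PATTERN.search(key) is not None


# Hypothetical object keys, for illustration only.
print(matches_log_source("AWSLogs/123456789101/CloudTrail/us-east-1/a.json.gz"))  # True
print(matches_log_source("AWSLogs/123456789101/Config/us-east-1/b.json.gz"))      # False
```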

To specify minimum IAM identity permissions, Matano will continue to use the key_prefix configuration. If no key_prefix is provided, permission to read all objects in the source bucket will be added to the identity policy.

This will allow the Matano system identity based policy to be able to decrypt ingestion data.

Expanding records (Custom, JSON Array, etc.)

When you send data to Matano, you need to communicate how Matano should split the data into individual records. Matano assumes your data is line delimited by default, so if you are using a line delimited format like JSON Lines or CSV, Matano will automatically split your data and you do not need to provide any additional configuration.

If your data is not line delimited, you must tell Matano how to expand your data into records using a VRL expression. Provide the VRL expression under the ingest.expand_records_from_payload key in your log_source.yml, as follows:
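
Conceptually, the default line delimited behavior amounts to treating each line of the payload as one record. The Python sketch below illustrates that idea for JSON Lines input; it is only an illustration, and details such as skipping blank lines are assumptions rather than documented Matano behavior.

```python
import json
from typing import List


def split_json_lines(raw: str) -> List[dict]:
    """Treat each non-empty line of the payload as one JSON record."""
    # Assumption: blank lines are ignored.
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


raw_payload = '{"name": "john"}\n{"name": "jane"}\n'
print(split_json_lines(raw_payload))  # [{'name': 'john'}, {'name': 'jane'}]
```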

ingest:
  expand_records_from_payload: "parse_json!(.__raw).Records"

Your VRL expression will receive the raw payload as __raw and must return an array of records.

For example, if your data is a JSON document with the following format:

{
  "Data": [{ "name": "john" }]
}

You would use the following VRL expression to expand your data:

parse_json!(.__raw).Data
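
For readers less familiar with VRL, the following Python sketch shows roughly what this expression does to the example document: parse the raw payload as JSON and return the array under Data. It is an analogy under that assumption, not how Matano evaluates VRL, and the function name expand_records is hypothetical.

```python
import json


def expand_records(raw: str) -> list:
    """Roughly mirror parse_json!(.__raw).Data: parse the payload, return the Data array."""
    records = json.loads(raw)["Data"]
    # The expression must yield an array of records.
    if not isinstance(records, list):
        raise ValueError("expected an array of records")
    return records


raw_payload = '{"Data": [{"name": "john"}]}'
print(expand_records(raw_payload))  # [{'name': 'john'}]
```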