Dataset Samples

Both the Nexla API and the Nexla CLI support methods to fetch a sample set of records from any dataset. Depending on the status of that dataset, the returned samples come from either a live sample of the dataset or a cached copy.

By changing the request parameters, you can also choose which kinds of information are returned about the sampled data.

Fetch Input and Output Samples

Each Nexla dataset processes data by applying transforms to input records to create corresponding output records. You can fetch input samples together with their corresponding output samples to see how the dataset processed them. The input item is a sample data object matching the source schema or the parent dataset's output schema. The output item is the same sample after the dataset's transforms (if any) have been applied.

Fetch Input & Output Samples: Request
GET /data_sets/{data_set_id}/samples
Fetch Input & Output Samples: Response
[
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
}
},
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
}
},
{
"input": {
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
},
"output": {
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
}
}
]
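
To see at a glance what a dataset's transforms did, the input and output halves of each sample can be compared field by field. The sketch below is one minimal way to do this in Python, assuming the response body has the shape shown above. `diff_sample` is a hypothetical helper name, not part of the Nexla API or CLI.

```python
import json


def diff_sample(sample):
    """Compare the "input" and "output" halves of one sample."""
    inp = sample.get("input") or {}
    out = sample.get("output") or {}
    return {
        # Fields present only in the output record.
        "added": sorted(k for k in out if k not in inp),
        # Fields present only in the input record.
        "removed": sorted(k for k in inp if k not in out),
        # Fields present in both whose values differ.
        "modified": sorted(k for k in inp if k in out and inp[k] != out[k]),
    }


# Abbreviated copy of the response shown above (first sample only).
body = """
[
  {
    "input":  {"Date": "2016-12-11", "ShortVolume": "32.0",
               "ShortExemptVolume": "0.0", "TotalVolume": "32.0"},
    "output": {"Date": "2016-12-11", "ShortVolume": "32.0",
               "ShortExemptVolume": "0.0", "TotalVolume": "32.0"}
  }
]
"""

for index, sample in enumerate(json.loads(body)):
    print(index, diff_sample(sample))
```

For the samples above, input and output are identical, so all three lists come back empty.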

Fetch Output Samples

Call the method below to receive only output samples from the dataset, i.e. records to which the dataset's transforms have been applied.

Fetch Output Samples: Request
GET /data_sets/{data_set_id}/samples?output_only=true
Fetch Output Samples: Response
[
{
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
{
"Date": "2016-12-11",
"ShortVolume": "32.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "32.0"
},
{
"Date": "2016-12-11",
"ShortVolume": "42.0",
"ShortExemptVolume": "0.0",
"TotalVolume": "42.0"
}
]
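
Because the output-only response is a flat list of records with no input/output wrapper, it maps directly onto tabular tooling. As a minimal sketch, assuming the response has already been parsed into a list of dictionaries, the hypothetical helper `samples_to_csv` below renders the records as CSV using only the Python standard library.

```python
import csv
import io


def samples_to_csv(samples):
    """Render a list of flat sample records as CSV text."""
    # Collect column names in first-seen order so that records
    # with differing keys still fit into one table.
    columns = []
    for record in samples:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(samples)
    return buffer.getvalue()


# Two of the records from the response shown above.
samples = [
    {"Date": "2016-12-11", "ShortVolume": "32.0",
     "ShortExemptVolume": "0.0", "TotalVolume": "32.0"},
    {"Date": "2016-12-11", "ShortVolume": "42.0",
     "ShortExemptVolume": "0.0", "TotalVolume": "42.0"},
]
print(samples_to_csv(samples))
```

Nested values, if a dataset produces them, would be written using their Python string form, so this sketch suits flat records like the ones shown here.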

Fetch Samples with Metadata

Nexla associates metadata with each record that flows through it. To fetch samples with their associated metadata, set the include_metadata=1 request parameter in the Nexla API or the --metadata option in the Nexla CLI. Each sample is then wrapped in two attributes: rawMessage, which contains the actual record data, and nexlaMetaData, which contains the corresponding metadata for that record.

Fetch Samples With Metadata: Request
GET /data_sets/{data_set_id}/samples?include_metadata=1
...
Required parameter: include_metadata=1
Optional parameters:
count: Total number of samples to fetch
output_only=1: Fetch only output records from a dataset
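
The parameters above combine into an ordinary query string. The sketch below assembles the request URL with Python's standard library. The helper name `samples_url`, the base URL `https://api.example.com`, and the dataset ID 5081 are placeholders for illustration, not values taken from Nexla's documentation.

```python
from urllib.parse import urlencode


def samples_url(base_url, data_set_id, include_metadata=False,
                count=None, output_only=False):
    """Build the URL for GET /data_sets/{data_set_id}/samples."""
    params = {}
    if include_metadata:
        params["include_metadata"] = 1
    if count is not None:
        params["count"] = count
    if output_only:
        params["output_only"] = 1

    url = f"{base_url.rstrip('/')}/data_sets/{data_set_id}/samples"
    # Only append "?" when there is at least one parameter to send.
    return f"{url}?{urlencode(params)}" if params else url


# Placeholder base URL and dataset ID, for illustration only.
print(samples_url("https://api.example.com", 5081,
                  include_metadata=True, count=10))
```

Authentication is left out here; the resulting URL can be passed to whichever HTTP client you already use with your Nexla credentials.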

Fetch Samples With Metadata: Response
[
{
"input": {
"rawMessage": {
"product_id": 1234,
"price": 5.0
},
"nexlaMetaData": {
"trackerId": "u9704::test/test_201808.csv:1:1:1:1528366194380;NA",
"sourceType": "S3",
"ingestTime": 1546916322079,
"sourceOffset": 43808,
"sourceKey": "test/test_201808.csv",
"bucket": "test",
"topic": "dataset-5081-source-9704",
"resourceType": "SOURCE",
"resourceId": 9704,
"nexlaUUID": null,
"eof": false,
"runId": 1555400123,
"tags": [{}],
"transformTime": 1555400123895,
"transformTimeISO8601": "2019-04-16T07:35:23.895Z"
},
"error": null
},
"output": {
"rawMessage": {
"product_id": 1234,
"price": 5.0,
"modified_price": 6.0
},
"nexlaMetaData": {
"trackerId": "u9704::test/test_201808.csv:1:1:1:1528366194380;NA",
"sourceType": "S3",
"ingestTime": 1546916322079,
"sourceOffset": 43808,
"sourceKey": "test/test_201808.csv",
"bucket": "test",
"topic": "dataset-5081-source-9704",
"resourceType": "SOURCE",
"resourceId": 9704,
"nexlaUUID": null,
"eof": false,
"runId": 1555400123,
"tags": [{}],
"transformTime": 1555400123904,
"transformTimeISO8601": "2019-04-16T07:35:23.904Z"
},
"error": null
}
}
]
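
With metadata enabled, each side of a sample has to be unwrapped before the record itself can be used. The following is a minimal sketch, assuming a parsed response shaped like the one above; `unpack_side` and `epoch_ms_to_iso` are hypothetical helper names. It treats transformTime as milliseconds since the Unix epoch, which is consistent with the transformTimeISO8601 value in the same example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_iso(ms):
    """Convert epoch milliseconds to an ISO 8601 UTC string ending in Z."""
    stamp = EPOCH + timedelta(milliseconds=ms)
    return stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def unpack_side(side):
    """Split one wrapped side ("input" or "output") into
    (record, metadata, error)."""
    return (
        side.get("rawMessage") or {},
        side.get("nexlaMetaData") or {},
        side.get("error"),
    )


# Trimmed copy of the sample above; most metadata fields are omitted.
sample = {
    "input": {
        "rawMessage": {"product_id": 1234, "price": 5.0},
        "nexlaMetaData": {"sourceKey": "test/test_201808.csv",
                          "transformTime": 1555400123895},
        "error": None,
    },
    "output": {
        "rawMessage": {"product_id": 1234, "price": 5.0,
                       "modified_price": 6.0},
        "nexlaMetaData": {"sourceKey": "test/test_201808.csv",
                          "transformTime": 1555400123904},
        "error": None,
    },
}

in_record, in_meta, in_error = unpack_side(sample["input"])
out_record, out_meta, out_error = unpack_side(sample["output"])

print("source file:", out_meta.get("sourceKey"))
print("fields added by transforms:", sorted(set(out_record) - set(in_record)))
print("transformed at:", epoch_ms_to_iso(out_meta["transformTime"]))
```

Keeping the record, metadata, and error separate makes it straightforward to skip samples whose error attribute is not null before comparing input and output records.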