Writing Data

Data Destinations

Data destination (also called data sink) resources describe external destinations for output data from specific datasets. Like data sources, they usually require client credentials to connect and write to the external system. Wherever the data has to be written, all information about where, when, and how to write it is contained in these Nexla resources.

List All Destinations

Both the Nexla API and the Nexla CLI support methods to list all destinations in the authenticated user's account. A successful call returns detailed information about each destination, such as its id, owner, type, credentials, activation status, and output configuration.

List All Destinations: Request
GET /data_sinks
Example:
curl https://api.nexla.io/data_sinks \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"
List All Destinations: Response
[
{
"id": 5854,
"owner": {
"id": 2,
"full_name": "Jeff Williams"
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": ["owner"],
"name": "Amazon S3 test",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": ["item_id"],
"item_name": ["item_name"],
"store_code": ["store_code"],
"city_code": ["city_code"],
"item_price": ["item_price"],
"discount": ["discount"],
"discounted_price": ["discounted_price"]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
},
{
"id": 5752,
"owner": {
"id": 2,
"full_name": "Jeff Williams"
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": ["owner"],
"name": "test",
"description": null,
"status": null,
"data_set_id": 7728,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/test",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 7728,
"name": "test"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-04-26T06:40:02.000Z",
"created_at": "2019-04-26T06:40:02.000Z",
"tags": []
}
]
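
As a minimal sketch, the list call can be prepared from Python with only the standard library. The host and headers are copied from the curl example; the helper names build_list_request and summarize are hypothetical, and the request is built but not sent.

```python
import urllib.request

API_BASE = "https://api.nexla.io"  # host taken from the curl example


def build_list_request(access_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET /data_sinks request."""
    return urllib.request.Request(
        f"{API_BASE}/data_sinks",
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.nexla.api.v1+json",
        },
    )


def summarize(destinations: list) -> list:
    """Reduce each destination object to (id, name, sink_type, data_set_id)."""
    return [
        (d["id"], d["name"], d["sink_type"], d["data_set_id"])
        for d in destinations
    ]


# To call the API for real (needs a valid token):
# import json
# with urllib.request.urlopen(build_list_request(token)) as resp:
#     print(summarize(json.load(resp)))
```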

Show One Destination

Fetch a specific destination accessible by the authenticated user. A successful call returns detailed information about that destination, such as its id, owner, type, credentials, activation status, and output configuration.

When using the Nexla API, add an expand query parameter with a truthy value to get more details about the destination. With this parameter, full details about the related resources (the destination's dataset, credentials, etc.) are also returned.

Show One Destination: Request
GET /data_sinks/{data_sink_id}
Example
curl https://api.nexla.io/data_sinks/5854 \
-H "Authorization: Bearer <Access-Token>" \
-H "Accept: application/vnd.nexla.api.v1+json"
Show One Destination: Response
{
"id": 5854,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": [
"owner"
],
"name": "Amazon S3 test",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": [
"item_id"
],
"item_name": [
"item_name"
],
"store_code": [
"store_code"
],
"city_code": [
"city_code"
],
"item_price": [
"item_price"
],
"discount": [
"discount"
],
"discounted_price": [
"discounted_price"
]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
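
The expand parameter can be attached as an ordinary query string. The sketch below assumes that expand=1 counts as a truthy value (the text only says "truthy"); show_destination_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_BASE = "https://api.nexla.io"  # host taken from the curl example


def show_destination_url(data_sink_id: int, expand: bool = False) -> str:
    """Return the URL for GET /data_sinks/{data_sink_id}."""
    url = f"{API_BASE}/data_sinks/{data_sink_id}"
    if expand:
        # Assumption: "1" is accepted as a truthy value for expand.
        url += "?" + urlencode({"expand": 1})
    return url
```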

Create A Destination

Both the Nexla API and the Nexla CLI support methods to create a new data destination in the authenticated user's account. The only required attribute in the input object is the data destination name; all other attributes are set to default values. Specify data_set_id to select the dataset whose data should be written to the destination, data_credentials to authorize access to the destination location, and sink_config to control how the data is written out.

Create Destination: Request
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_credentials": 8342,
"data_set_id": 22194
}
Create Destination: Response
{
"id": 5855,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
...
},
"access_roles": [
"owner"
],
"name": "Test Destination",
"description": null,
"status": null,
"data_set_id": 22194,
"data_map_id": null,
"sink_type": "dropbox",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 22194,
...
},
"data_credentials": {
"id": 8342,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
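
A minimal Python sketch of the create call follows. The payload mirrors the example request body; the Content-Type header is an assumption (it does not appear in this document), and build_create_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request


def build_create_request(access_token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /data_sinks request with a JSON body."""
    return urllib.request.Request(
        "https://api.nexla.io/data_sinks",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.nexla.api.v1+json",
            # Assumption: the API expects JSON bodies to be labeled as such.
            "Content-Type": "application/json",
        },
    )


payload = {
    "name": "Test Destination",
    "sink_type": "dropbox",
    "sink_config": {
        "mapping": {"mode": "auto", "tracker_mode": "NONE"},
        "data_format": "json",
        "sink_type": "dropbox",
        "path": "/nexlatests/dataout/rel22",
        "output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}",
    },
    "data_credentials": 8342,
    "data_set_id": 22194,
}
request = build_create_request("<Access-Token>", payload)
```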

Create with Credentials

Data destinations usually require credentials for connecting to the external system and writing data. You can refer to an existing data_credentials resource or create a new one within the request that creates the data destination. In this example, an existing credentials object is used:

Create with Credentials: Request
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_credentials": 8342,
"data_set_id": 22194
}

Here, the required attributes for creating a new data_credentials resource are included in the request:

Create with Credentials: Request
POST /data_sinks
Example Request Body
...
{
"name": "Test Destination",
"description": null,
"sink_type": "dropbox",
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"data_set_id": 22194,
"data_credentials": {
"name": "FTP CREDS",
"credentials_type": "ftp",
"credentials_version": "1",
"credentials": {
"credentials_type": "ftp",
"account_id": "XYZ",
"password": "123"
}
}
}

In either case, a successful POST on /data_sinks with credential information will return a response including the full data destination and the encrypted form of its associated data credentials resource:

Create with Credentials: Response
{
"id": 5855,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
...
},
"access_roles": [
"owner"
],
"name": "Test Destination",
"description": null,
"status": null,
"data_set_id": 22194,
"data_map_id": null,
"sink_type": "dropbox",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "auto",
"tracker_mode": "NONE"
},
"data_format": "json",
"sink_type": "dropbox",
"path": "/nexlatests/dataout/rel22",
"output.dir.name.pattern": "demo/{yyyy}/{MM}/{dd}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 22194,
...
},
"data_credentials": {
"id": 8342,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
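
The two request forms differ only in the type of the data_credentials value: an integer id for an existing resource, or an object describing a new one. A sketch of a payload builder that accepts either, with destination_payload as a hypothetical helper name:

```python
def destination_payload(name, sink_type, sink_config, data_set_id, credentials):
    """Build a create-destination body.

    credentials is either the integer id of an existing data_credentials
    resource or a dict describing a new one to create with the destination.
    """
    if not isinstance(credentials, (int, dict)):
        raise TypeError("credentials must be an int id or a dict")
    return {
        "name": name,
        "sink_type": sink_type,
        "sink_config": sink_config,
        "data_set_id": data_set_id,
        "data_credentials": credentials,
    }


config = {"data_format": "json", "sink_type": "dropbox"}

# Form 1: refer to an existing credentials resource by id.
existing = destination_payload("Test Destination", "dropbox", config, 22194, 8342)

# Form 2: create the credentials inline (fields as in the second example).
inline = destination_payload(
    "Test Destination", "dropbox", config, 22194,
    {
        "name": "FTP CREDS",
        "credentials_type": "ftp",
        "credentials_version": "1",
        "credentials": {"credentials_type": "ftp", "account_id": "XYZ", "password": "123"},
    },
)
```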

Update A Destination

Nexla API supports methods to update any property of an existing destination the authenticated user has access to.

Update Destination: Request
PUT /data_sinks/{data_sink_id}
Example Request Body
...
{
"name": "Updated S3 Data Sink"
}
Update Destination: Response
{
"id": 5023,
"owner": {
"id": 82,
...
},
"org": {
"id": 1,
"name": "Nexla",
"email_domain": "nexla.com",
"email": null
},
"access_roles": [
"owner"
],
"name": "Updated S3 Data Sink",
"description": null,
"status": null,
"data_set_id": 8092,
"data_map_id": null,
"sink_type": "s3",
"sink_format": null,
"sink_config": {
"mapping": {
"mode": "manual",
"mapping": {
"item_id": [
"item_id"
],
"item_name": [
"item_name"
],
"store_code": [
"store_code"
],
"city_code": [
"city_code"
],
"item_price": [
"item_price"
],
"discount": [
"discount"
],
"discounted_price": [
"discounted_price"
]
},
"fields_order": [
"item_id",
"item_name",
"store_code",
"city_code",
"item_price",
"discount",
"discounted_price"
],
"tracker_mode": "NONE"
},
"data_format": "csv",
"sink_type": "s3",
"path": "customer-solutions.nexla.com/echo/nexla_outputs",
"output.dir.name.pattern": "{yyyy}-{MM}-{dd}/{HH}"
},
"sink_schedule": null,
"managed": false,
"data_set": {
"id": 8092,
"name": "echo"
},
"data_credentials": {
"id": 5216,
...
},
"updated_at": "2019-07-17T11:56:40.000Z",
"created_at": "2019-07-17T11:56:40.000Z",
"tags": []
}
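
As a sketch, an update only needs to carry the properties being changed. Serializing the body with json.dumps also guarantees valid JSON (no trailing commas). The Content-Type header is an assumption, and build_update_request is a hypothetical helper that does not send the request.

```python
import json
import urllib.request


def build_update_request(access_token: str, data_sink_id: int, changes: dict) -> urllib.request.Request:
    """Build (but do not send) PUT /data_sinks/{data_sink_id} with the changed fields."""
    return urllib.request.Request(
        f"https://api.nexla.io/data_sinks/{data_sink_id}",
        data=json.dumps(changes).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.nexla.api.v1+json",
            "Content-Type": "application/json",  # assumption, not from this document
        },
    )


request = build_update_request("<Access-Token>", 5023, {"name": "Updated S3 Data Sink"})
```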

Delete A Destination

The Nexla API supports methods to delete any destination that the authenticated user has administrative or ownership rights to. A successful request to delete a data destination returns OK (200) with no response body.

Delete Destination: Request
DELETE /data_sinks/{data_sink_id}
Delete Destination: Response
Empty response with status 200 for success
Error response with reason if destination could not be deleted

Control Data Output

Activate and Pause Destination

Associate a dataset with a destination to control what data is written out to it. Each destination can have only one dataset that writes data to it. You associate a dataset with a destination by setting the data_set_id property of the destination.

You can control whether data is written out in one of two ways:

  1. Control the status of the associated dataset (see the relevant methods on the dataset page). This prevents the dataset from even processing data to be written out.
  2. Use the methods below to activate or pause the destination. This approach is better suited to scenarios where the same dataset is configured to write to multiple destinations.
Activate Destination: Request
PUT /data_sinks/{data_sink_id}/activate

Conversely, call the pause method to immediately stop writing data to that destination.

Pause Destination: Request
PUT /data_sinks/{data_sink_id}/pause
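
Both calls share one shape, so a single helper can cover them. This is a sketch only: state_change_request is a hypothetical name, and the request is built without being sent.

```python
import urllib.request


def state_change_request(access_token: str, data_sink_id: int, action: str) -> urllib.request.Request:
    """Build PUT /data_sinks/{data_sink_id}/activate or .../pause."""
    if action not in ("activate", "pause"):
        raise ValueError("action must be 'activate' or 'pause'")
    return urllib.request.Request(
        f"https://api.nexla.io/data_sinks/{data_sink_id}/{action}",
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/vnd.nexla.api.v1+json",
        },
    )
```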

Validate Destination Configuration

All configuration about where and when to write data is contained within the sink_config property of a data destination.

Because Nexla provides many options to fine-tune exactly how and where your data is written out, it is important to ensure that sink_config contains all parameters required to write data successfully. To validate the configuration of a given data destination, send a POST request to the endpoint /data_sinks/{data_sink_id}/config/validate.

You can optionally send a JSON config as the request body. If the request contains no input config, the stored sink_config is used for validation.

Validate Destination Configuration: Request
POST /data_sinks/{data_sink_id}/config/validate
Validate Destination Configuration: Response
{
"status": "ok",
"output": [
{
"name": "credsEnc",
"value": null,
"errors": [
"Missing required configuration \"credsEnc\" which has no default value."
],
"visible": true,
"recommendedValues": []
},
{
"name": "credsEncIv",
"value": null,
"errors": [
"Missing required configuration \"credsEncIv\" which has no default value."
],
"visible": true,
"recommendedValues": []
},
{
"name": "sink_type",
"value": null,
"errors": [
"Missing required configuration \"sink_type\" which has no default value.",
"Invalid value null for configuration sink_type: Invalid enumerator"
],
"visible": true,
"recommendedValues": []
}
]
}
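
Note that in the sample response the top-level status is "ok" even though individual entries carry errors, so a client should inspect the errors list of each entry. A sketch of that check, with collect_config_errors as a hypothetical helper and the "path" entry invented for illustration:

```python
def collect_config_errors(validation: dict) -> dict:
    """Map each configuration name to its non-empty list of errors."""
    return {
        item["name"]: item["errors"]
        for item in validation.get("output", [])
        if item.get("errors")
    }


sample = {
    "status": "ok",
    "output": [
        {
            "name": "credsEnc",
            "value": None,
            "errors": ['Missing required configuration "credsEnc" which has no default value.'],
            "visible": True,
            "recommendedValues": [],
        },
        # Hypothetical entry with no errors, to show that it is filtered out.
        {"name": "path", "value": "/out", "errors": [], "visible": True, "recommendedValues": []},
    ],
}
errors = collect_config_errors(sample)
is_valid = not errors
```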

Monitor Destination

Use the methods listed in this section to monitor all data write history for a destination.

Lifetime Write Metrics

Lifetime write metrics methods return information about the total data written out through a destination since its creation. Metrics contain the number of records written out as well as the estimated volume of data.

Lifetime Write Metrics: Request
GET /data_sinks/5001/metrics
Lifetime Write Metrics: Response
{
"status": 200,
"metrics": {
"records": 4,
"size": 582
}
}
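
One simple use of the lifetime totals is an average size per record. The sketch below works on the response shape shown; the unit of size is not stated in this section, so treat the result as being in whatever unit the API reports. average_record_size is a hypothetical helper.

```python
def average_record_size(response: dict) -> float:
    """Return size divided by records, or 0.0 when nothing was written."""
    metrics = response["metrics"]
    if metrics["records"] == 0:
        return 0.0
    return metrics["size"] / metrics["records"]


sample = {"status": 200, "metrics": {"records": 4, "size": 582}}
average = average_record_size(sample)
```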

Aggregated Write Metrics

Aggregated write metrics methods return information about the total data written out each day from a destination. Metrics contain the number of records written out as well as the estimated volume of data.

Aggregations can be fetched in different aggregation units. Use the method below to fetch reports aggregated daily:

Daily Write Metrics: Request
GET /data_sinks/5001/metrics?aggregate=1
...
Optional Payload Parameters:
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"page": <integer page number>,
"size": <number of entries in page>
}
Daily Write Metrics: Response
{
"status": 200,
"metrics": [
{
"time": "2017-02-08",
"record": 53054,
"size": 12476341
},
{
"time": "2017-02-09",
"record": 66618,
"size": 15829589
},
{
"time": "2017-02-10",
"record": 25832,
"size": 6645994
}
]
}
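
The sketch below builds the daily metrics URL and totals a response. It assumes the optional parameters are sent as query string parameters (the document calls them "payload parameters" without saying how they are transmitted) and uses the plural data_sinks path that the other endpoints use. Both helper names are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode

API_BASE = "https://api.nexla.io"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # datetime format given for "from" and "to"


def daily_metrics_url(data_sink_id, start=None, end=None):
    """Return the URL for daily aggregated metrics, with optional UTC bounds."""
    params = {"aggregate": 1}
    if start is not None:
        params["from"] = start.strftime(TIME_FORMAT)
    if end is not None:
        params["to"] = end.strftime(TIME_FORMAT)
    return f"{API_BASE}/data_sinks/{data_sink_id}/metrics?{urlencode(params)}"


def totals(response: dict) -> tuple:
    """Sum records and size over all days (the daily key is "record")."""
    days = response["metrics"]
    return sum(d["record"] for d in days), sum(d["size"] for d in days)


sample = {
    "status": 200,
    "metrics": [
        {"time": "2017-02-08", "record": 53054, "size": 12476341},
        {"time": "2017-02-09", "record": 66618, "size": 15829589},
        {"time": "2017-02-10", "record": 25832, "size": 6645994},
    ],
}
```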

Destination metrics can also be batched by the ingestion frequency of the originating data. Use the methods below to view destination metrics per ingestion cycle.

Aggregated By Ingestion Frequency: Request
GET /data_sinks/5001/metrics/run_summary
...
Optional Payload Parameters:
{
"runId": <starting from unix epoch time of ingestion events>,
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"page": <integer page number>,
"size": <number of entries in page>
}
Aggregated By Ingestion Frequency: Response
{
"status": 200,
"metrics": {
"1539970426049": {
"records": 1364,
"size": 971330,
"errors": 0
},
"1539990426049": {
"records": 330,
"size": 235029,
"errors": 0
}
}
}
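
The keys of the metrics object look like Unix epoch timestamps in milliseconds; that reading is an assumption based on the sample values. Under it, a run summary can be turned into time-ordered rows as sketched here (runs_by_time is a hypothetical helper):

```python
from datetime import datetime, timezone


def runs_by_time(metrics: dict) -> list:
    """Return (start_time_utc, records, size, errors) rows, oldest first."""
    rows = []
    for run_id, stats in metrics.items():
        # Assumption: run ids are epoch milliseconds.
        started = datetime.fromtimestamp(int(run_id) / 1000, tz=timezone.utc)
        rows.append((started, stats["records"], stats["size"], stats["errors"]))
    return sorted(rows)


sample = {
    "1539990426049": {"records": 330, "size": 235029, "errors": 0},
    "1539970426049": {"records": 1364, "size": 971330, "errors": 0},
}
rows = runs_by_time(sample)
total_records = sum(row[1] for row in rows)
```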

Granular Write Status Metrics

In addition to the aggregated write metrics methods above, which provide visibility into the total number of records and total volume of data written out over a period of time, Nexla also provides methods to view granular details about data write events.

You can retrieve the data write status of a file destination to find information such as how many files have been written out fully or are queued to be written out.

File Destination Write Status: Request
GET /data_sinks/6745/metrics/files_stats
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL"
}
File Destination Write Status: Response
{
"status": 200,
"metrics": {
"data": {
"COMPLETE": 17
},
"meta": {
"currentPage": 1,
"totalCount": 1,
"pageCount": 1
}
}
}
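
To answer "how many files are not finished yet", the per-status counts can be summed. The sketch assumes that statuses with no files are simply omitted from data, as in the sample where only COMPLETE appears; unfinished_file_count is a hypothetical helper.

```python
# Statuses treated as "not fully written"; the grouping is an illustrative choice.
UNFINISHED = ("NOT_STARTED", "IN_PROGRESS", "PARTIAL")


def unfinished_file_count(response: dict) -> int:
    """Count files whose write has not completed."""
    counts = response["metrics"]["data"]
    return sum(counts.get(status, 0) for status in UNFINISHED)


sample = {
    "status": 200,
    "metrics": {
        "data": {"COMPLETE": 17},
        "meta": {"currentPage": 1, "totalCount": 1, "pageCount": 1},
    },
}
```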

You can view write status and history per file of a file destination. The file destination write history methods below return one entry per file by aggregating all write events for each file.

Write History Per File: Request
GET /data_sinks/{data_sink_id}/metrics/files
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL",
"page": <integer page number>,
"size": <number of entries in page>
}
Write History Per File: Response
{
"status": 200,
"metrics": {
"data": [
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429-000000000000.json",
"id": null,
"lastWritten": "2019-08-16T20:57:49Z",
"runId": 1565912396852,
"error": null
}
],
"meta": {
"currentPage": 1,
"totalCount": 1,
"pageCount": 1
}
}
}
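
Because the response is paginated, a client usually walks the pages until meta.pageCount is reached. The sketch below takes the page-fetching function as a parameter so that it stays independent of any HTTP library; fetch_page is hypothetical and is expected to return the parsed response for a given page number.

```python
def iter_file_entries(fetch_page):
    """Yield every per-file entry across all pages.

    fetch_page(page_number) must return a parsed response dict shaped like
    the example: {"metrics": {"data": [...], "meta": {"pageCount": N, ...}}}.
    """
    page = 1
    while True:
        metrics = fetch_page(page)["metrics"]
        yield from metrics["data"]
        if page >= metrics["meta"]["pageCount"]:
            break
        page += 1


def failed_files(fetch_page):
    """Return the names of files whose writeStatus is ERROR."""
    return [e["name"] for e in iter_file_entries(fetch_page) if e["writeStatus"] == "ERROR"]
```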

You can also bypass per-file aggregation and fetch the full write history of each file, even if it was written out multiple times. This is relevant for scenarios where the destination has been configured to write out the same file name in every ingestion cycle.

Raw File Write Status: Request
GET /data_sinks/{data_sink_id}/metrics/files_raw
...
Optional Parameters
{
"from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
"status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL",
"page": <integer page number>,
"size": <number of entries in page>
}
Raw File Write Status: Response
{
"status": 200,
"metrics": [
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
"id": null,
"lastWritten": "2019-08-16T20:57:49Z",
"runId": 1565912396852,
"error": null
},
{
"dataSetId": 11429,
"size": 7750996,
"writeStatus": "COMPLETE",
"sinkId": 6745,
"recordCount": 285,
"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
"id": null,
"lastWritten": "2019-08-15T20:57:49Z",
"runId": 1565912396852,
"error": null
}
]
}
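
Since the raw listing returns one entry per write event, grouping by file name shows which files were overwritten. A minimal sketch over the response shape shown, with writes_per_file as a hypothetical helper:

```python
from collections import Counter


def writes_per_file(raw_metrics: list) -> Counter:
    """Count write events per file name in a files_raw response."""
    return Counter(entry["name"] for entry in raw_metrics)


sample = [
    {"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
     "writeStatus": "COMPLETE", "lastWritten": "2019-08-16T20:57:49Z"},
    {"name": "/nexlatests/dataout/rel22/anyof/1/dataset-11429.json",
     "writeStatus": "COMPLETE", "lastWritten": "2019-08-15T20:57:49Z"},
]
counts = writes_per_file(sample)
rewritten = [name for name, n in counts.items() if n > 1]
```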

Other Monitoring Events

See the section on Monitoring resources for methods to view destination errors, notifications, quarantine samples, and audit logs.