Data Flows

Data flows describe the path of data through the Nexla platform, from source to destination. The primary resources in any flow are its data sets, which are chained together in a tree structure and associated with resources describing the source, sharing, and destinations of the data.

Flow resources are nested JSON objects. The root object contains a flows array holding one or more complete data flows; each flow normally begins at a data set associated with a data source and terminates in a data set or data sink.

Each data set object in a data flow contains a resource object, which may be null, and a children array, which may be empty. It also contains attributes describing the data set itself, such as name and description.

The resource object describes an associated data source, sharers, or destinations, if any exist at that point in the flow.

The children array contains all downstream data sets connected to the current one, unless the data set is upstream from the data set for which the data flow request was made, in which case only the branch leading to that data set is included.

If a flow terminates in a data set associated with one or more destinations to which the outgoing data is written, those data destination objects are contained in a data_sinks array within the resource.

The following example shows the basic tree structure of a flow, with node level details omitted:

{
  "flows": [
    {
      "id": 1,
      "parent_data_set_id": null,
      "data_source": {
        "id": 10
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 2,
          "parent_data_set_id": 1,
          "data_sinks": [ ... ],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 3,
              "parent_data_set_id": 2,
              "data_sinks": [ ... ],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [ ... ],
  "data_sets": [ ... ],
  "data_sinks": [ ... ],
  "data_credentials": [ ... ],
  "orgs": [ ... ],
  "users": [ ... ]
}

The response object also contains arrays of expanded resource objects for each resource included in the returned flows.
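Because every node in the tree has the same shape (a children array plus sink and sharer references), a flow can be walked with a short recursion. The following Python sketch collects every sink ID reachable from each flow root and resolves root IDs against the expanded data_sets array; flow_doc is assumed to be a parsed response of the form shown above.

# Minimal sketch, assuming `flow_doc` is the parsed flows response shown above.
# Field names (flows, children, data_sinks, data_sets) follow the example.

def collect_sink_ids(node):
    # Gather sink IDs at this node, then recurse into its children.
    sinks = list(node.get("data_sinks", []))
    for child in node.get("children", []):
        sinks.extend(collect_sink_ids(child))
    return sinks

def summarize(flow_doc):
    # The expanded data_sets array resolves node IDs to full data set objects.
    sets_by_id = {ds["id"]: ds for ds in flow_doc.get("data_sets", [])}
    for flow in flow_doc["flows"]:
        name = sets_by_id.get(flow["id"], {}).get("name")
        print(f"flow rooted at data set {flow['id']} ({name}): sinks {collect_sink_ids(flow)}")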

List All Flows

Use the endpoint below to list all of the user's data flows.

List All Flows: Request
GET /data_flows
List All Flows: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [
                5029,
                5030
              ],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    },
    {
      "id": 5060,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": []
    },
    {
      "id": 5063,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5024
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5065,
          "parent_data_set_id": 5063,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5066,
              "parent_data_set_id": 5065,
              "data_sinks": [
                5031,
                5032
              ],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "PAUSED",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [
        5028
      ]
    },
    ...
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "PAUSED",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    ...
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [
        5028
      ]
    },
    ...
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      ...
    },
    ...
  ]
}
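A minimal client call for this endpoint might look like the Python sketch below; the base URL and token environment variable are assumptions to be replaced with values for your environment.

import os
import requests

API_BASE = os.environ.get("NEXLA_API_BASE", "https://api.example.com")  # assumed base URL
TOKEN = os.environ["NEXLA_ACCESS_TOKEN"]  # assumed variable holding an access token

resp = requests.get(
    f"{API_BASE}/data_flows",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
doc = resp.json()
print(f"{len(doc['flows'])} flow(s) returned")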

Show Flows for a Data Source

Use the endpoint below to retrieve only the flows connected to a particular data source.

Show Flows For A Source: Request
GET /data_flows/data_source/{data_source_id}
Show Flows For A Source: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [
                5029,
                5030
              ],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "PAUSED",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [
        5028
      ]
    }
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "PAUSED",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    ...
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [
        5028
      ]
    },
    ...
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      "description": null,
      "credentials_type": "s3",
      "verified_status": "200 Ok",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ]
}
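The same pattern works for the source-scoped endpoint. A sketch, reusing the hypothetical source ID 5023 from the example:

import requests

def flows_for_source(api_base, token, data_source_id):
    # GET /data_flows/data_source/{data_source_id}
    resp = requests.get(
        f"{api_base}/data_flows/data_source/{data_source_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

# doc = flows_for_source("https://api.example.com", token, 5023)
# print([flow["id"] for flow in doc["flows"]])  # root data set IDs, e.g. [5059]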

Show Flows for a Data Set

Use the endpoint below with any data set ID to get the full description of the flow to which the data set belongs. Note that the response can be the same for two different data set IDs if both data sets are part of the same flow.

Flow For A Data Set: Request
GET /data_flows/{data_set_id}
Flow For A Data Set: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [5029, 5030],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "PAUSED",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "PAUSED",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5061,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5059,
      "data_sinks": [],
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5062,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5061,
      "data_sinks": [5029, 5030],
      "status": "PAUSED",
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    },
    {
      "id": 5030,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 2",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      "description": null,
      "credentials_type": "s3",
      "verified_status": "200 Ok",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ]
}
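That note can be verified directly: requesting the flow for two data sets in the same tree yields the same flows structure. A sketch using the example IDs 5061 and 5062 (api_base and token as in the earlier sketches):

import requests

def flow_for_data_set(api_base, token, data_set_id):
    # GET /data_flows/{data_set_id}
    resp = requests.get(
        f"{api_base}/data_flows/{data_set_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.json()

# a = flow_for_data_set(api_base, token, 5061)
# b = flow_for_data_set(api_base, token, 5062)
# assert a["flows"] == b["flows"]  # both data sets belong to the flow rooted at 5059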

Show Flows to a Destination

Use the endpoint below to retrieve only the flows connected to a particular data destination. Note that the response for a flow connected to a data sink includes only the data sets on the branch from the data source that leads directly to the destination.

Flows To A Destination: Request
GET /data_flows/data_sink/{data_sink_id}
Flows To A Destination: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [5029, 5030],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "PAUSED",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "PAUSED",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5061,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5059,
      "data_sinks": [],
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5062,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5061,
      "data_sinks": [5029, 5030],
      "status": "PAUSED",
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    },
    {
      "id": 5030,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 2",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      "description": null,
      "credentials_type": "s3",
      "verified_status": "200 Ok",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ]
}
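Because the returned tree is already pruned to the branch leading to the requested sink, the chain of data set IDs down to the sink can be read off with a short recursive search. A sketch using the field names from the example:

def path_to_sink(node, sink_id, trail=()):
    # Return the data set IDs from the flow root down to the node that
    # writes to sink_id, or None if the sink is not on this branch.
    trail = trail + (node["id"],)
    if sink_id in node.get("data_sinks", []):
        return trail
    for child in node.get("children", []):
        found = path_to_sink(child, sink_id, trail)
        if found:
            return found
    return None

# With the response above: path_to_sink(doc["flows"][0], 5030) -> (5059, 5061, 5062)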

Update Flow

Most updates to data flow configurations must be made directly with PUT requests on the component resources, such as data_sources, data_sets, and data_sinks. However, /data_flows does support a few composite updates: activate, pause, and delete. These are cascaded across all components of the flow when applicable.
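As a sketch, the three composite operations map onto the endpoints documented in the rest of this section (the helper, base URL, and token handling are assumptions; the paths mirror the request examples below):

import requests

def flow_op(api_base, token, method, path, **params):
    # Issue one of the composite flow operations and fail on any error status.
    resp = requests.request(
        method,
        f"{api_base}{path}",
        params=params,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp

# flow_op(api_base, token, "PUT", "/data_flows/data_source/5023/activate")
# flow_op(api_base, token, "PUT", "/data_flows/data_source/5023/pause")
# flow_op(api_base, token, "DELETE", "/data_flows/data_source/5023", all=1)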

Additionally, the Nexla CLI supports methods to export and import full flow specifications.

Control Data Flow

Activate Full Flow

Use the endpoint below to activate all the component resources of a flow. If the root data source is not already active, it will be activated.

Activate Full Flow: Request
PUT /data_flows/data_source/{data_source_id}/activate
Activate Full Flow: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [5029, 5030],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "ACTIVE",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "ACTIVE",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5061,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5059,
      "data_sinks": [],
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    {
      "id": 5062,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": 5061,
      "data_sinks": [5029, 5030],
      "status": "ACTIVE",
      "name": null,
      "description": null,
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "ACTIVE",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    },
    {
      "id": 5030,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 2",
      "status": "ACTIVE",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      "description": null,
      "credentials_type": "s3",
      "verified_status": "200 Ok",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ]
}

Pause Full Flow

Use the endpoint below to pause all the component resources of a flow. /data_flows/{data_set_id}/pause and /data_flows/data_sink/{data_sink_id}/pause are also supported, but they pause the flow only from the requested resource downward. To pause the entire flow from a downstream resource, include the ?all=1 query parameter, as in the sketch after the response below.

Pause Full Flow: Request
PUT /data_flows/data_source/{data_source_id}/pause
Pause Full Flow: Response
{
  "flows": [
    {
      "id": 5059,
      "parent_data_set_id": null,
      "data_source": {
        "id": 5023
      },
      "data_sinks": [],
      "sharers": {
        "sharers": [],
        "external_sharers": []
      },
      "children": [
        {
          "id": 5061,
          "parent_data_set_id": 5059,
          "data_sinks": [],
          "sharers": {
            "sharers": [],
            "external_sharers": []
          },
          "children": [
            {
              "id": 5062,
              "parent_data_set_id": 5061,
              "data_sinks": [5029, 5030],
              "sharers": {
                "sharers": [],
                "external_sharers": []
              },
              "children": []
            }
          ]
        }
      ]
    }
  ],
  "data_sources": [
    {
      "id": 5023,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Source 1",
      "status": "PAUSED",
      "description": "Simple reference data source. Uses default settings and does not require ingestion.",
      "source_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    }
  ],
  "data_sets": [
    {
      "id": 5059,
      "owner_id": 2,
      "org_id": 1,
      "parent_data_set_id": null,
      "data_source_id": 5023,
      "name": "Reference Data Set 1",
      "description": "Pre-canned data set for reference data source.",
      "status": "PAUSED",
      "data_sinks": [],
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    },
    ...
  ],
  "data_sinks": [
    {
      "id": 5029,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Data Sink 1",
      "status": "PAUSED",
      "description": null,
      "sink_type": "s3",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z",
      "data_credentials": [5028]
    },
    ...
  ],
  "data_credentials": [
    {
      "id": 5028,
      "owner_id": 2,
      "org_id": 1,
      "name": "Reference Flow Credentials 1",
      "description": null,
      "credentials_type": "s3",
      "verified_status": "200 Ok",
      "tags": [],
      "created_at": "2018-05-16T17:56:17.000Z",
      "updated_at": "2018-05-16T17:56:17.000Z"
    }
  ]
}
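As noted above, pausing from a downstream resource cascades upstream only when the all=1 parameter is present. A sketch using the example data set 5062 (base URL and token are placeholders):

import requests

api_base = "https://api.example.com"  # assumed base URL
token = "..."  # your access token

resp = requests.put(
    f"{api_base}/data_flows/5062/pause",  # data set 5062 from the example flow
    params={"all": 1},  # cascade the pause to upstream resources as well
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()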

Delete Flows

Issue a DELETE request to any of the /data_flows endpoints with a specific data source, data set, or destination ID to delete the resource at that ID and its downstream resources. Include the all=1 query parameter to delete the entire flow, including upstream resources.

The presence of any ACTIVE resources in the data flow to be deleted will cause the request to fail with a Method-Not-Allowed (405) error, and the JSON response will list the resources that must be paused.

A successful request to delete a data flow returns OK (200) with no response body.

Delete Flow: Request
DELETE /data_flows/data_source/{data_source_id}
...
Alternate:
/data_flows/{data_set_id}
/data_flows/data_sink/{data_sink_id}
Delete Flow: Error Response (405)
{
  "data_sources": [5023],
  "data_sets": [5059, 5061, 5062],
  "message": "Active flow resources must be paused before flow deletion!"
}
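A deletion sketch that surfaces the 405 payload shown above (base URL, token, and the resource ID are assumptions drawn from the examples):

import requests

api_base = "https://api.example.com"  # assumed base URL
token = "..."  # your access token

resp = requests.delete(
    f"{api_base}/data_flows/data_source/5023",
    params={"all": 1},  # also delete upstream resources
    headers={"Authorization": f"Bearer {token}"},
)
if resp.status_code == 405:
    print("Pause these resources before deleting:", resp.json())
else:
    resp.raise_for_status()  # success is OK (200) with an empty body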

Import & Export Flows

The Nexla CLI supports commands to export a flow specification to a JSON file and subsequently import that specification as a new flow in the same or a different user account.

Export a Flow

Use this method to export one or more pipelines originating from a data source. It lists the pipelines originating from that source and lets you export one or all of them to a local JSON file.

For example, such a call can export the pipelines for source 6862 to a local file, ~/Desktop/export_6862.json.

Import a Flow

Use this method to import a flow from a previously exported JSON file. This is a quick way to spin up replicas of a data flow, with modifications as needed.

For example, such a call imports the previously exported pipeline as a new pipeline.