Data source resources describe sources of data to be ingested, including details about source type, ingestion schedule, associated datasets, and credentials for accessing the source. No matter where the data to be ingested resides, all information about where, when, and how to ingest it is contained in these Nexla resources.
A data source may have one or more datasets associated with it. These correspond to distinct schemas detected by Nexla in the source.
Both the Nexla API and the Nexla CLI support methods to list all sources in the authenticated user's account. A successful call returns detailed information about each source, including its id, owner, type, credentials, activation status, and ingestion configuration.
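For illustration, a minimal Python sketch of the list call. The host, bearer-token auth, and the GET /data_sources route are assumptions inferred from the REST conventions used elsewhere in this section, not part of the documented contract:

import requests

BASE = "https://api.nexla.example"             # hypothetical host -- substitute your own
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# List all data sources in the authenticated user's account.
resp = requests.get(f"{BASE}/data_sources", headers=HEADERS)
resp.raise_for_status()
for source in resp.json():
    print(source.get("id"), source.get("name"), source.get("source_type"))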
You can also fetch a specific source accessible to the authenticated user. A successful call returns the same detailed information (id, owner, type, credentials, activation status, and ingestion configuration) for that source.
With the Nexla API, add an expand query parameter with a truthy value to get more details about the source. With this parameter, full details about related resources (detected datasets, credentials, etc.) are also returned.
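A sketch of fetching one source with expand, under the same assumed host and auth; the GET /data_sources/{id} route is inferred from the other endpoints in this section:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# expand=1 also returns full details of related resources
# (detected datasets, credentials, etc).
resp = requests.get(f"{BASE}/data_sources/5001", params={"expand": 1}, headers=HEADERS)
resp.raise_for_status()
print(resp.json())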
Both the Nexla API and the Nexla CLI support methods to create a new data source in the authenticated user's account.
The only required attribute in the input object is the data source name; all other attributes are set to default values.
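A minimal create call might therefore look like this sketch (same assumed host and auth); only name is supplied, so every other attribute falls back to its default:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# Name is the only required attribute; everything else is defaulted.
resp = requests.post(f"{BASE}/data_sources",
                     json={"name": "Minimal Example Source"},
                     headers=HEADERS)
resp.raise_for_status()
print(resp.json().get("id"))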
Data sources usually require credentials for making a connection and ingesting data. You can refer to an existing data_credentials resource or create a new one in the POST call to /data_sources. In this example, an existing credentials object is referenced:
Nexla API
Create with Existing Credentials: Request
POST /data_sources
Example Request Body
...
{
  "name": "Example S3 Data Source",
  "source_type": "s3",
  "data_credentials": 5001
}
Here, the required attributes for creating a new data_credentials resource are included in the request:
Nexla API
Create with New Credentials: Request
POST /data_sources
Example Request Body
...
{
  "name": "Example FTP Data Source",
  "source_type": "ftp",
  "data_credentials": {
    "name": "FTP CREDS",
    "credentials_type": "ftp",
    "credentials_version": "1",
    "credentials": {
      "credentials_type": "ftp",
      "account_id": "XYZ",
      "password": "123"
    }
  }
}
In either case, a successful POST on /data_sources with credential information returns a response that includes the full data source along with the encrypted form of its associated data_credentials resource.
The Nexla API supports methods to delete any source that the authenticated user has administrative or ownership rights to.
If the source is paused and none of its detected datasets have associated downstream resources, Nexla can delete the source safely. A successful request to delete a data source returns Ok (200) with no response body.
If the source is active, or there are downstream resources that would be impacted, Nexla will not delete the source and instead returns a failure message explaining why deletion was denied.
Nexla API
Delete Source: Request
DELETE /data_sources/{data_source_id}
Nexla API
Delete Source: Response
Empty response with status 200 for success
Error response with reason if source could not be deleted
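A sketch of the delete call and its two documented outcomes, under the same assumed host and auth:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

resp = requests.delete(f"{BASE}/data_sources/5001", headers=HEADERS)
if resp.status_code == 200:
    print("Source deleted")               # success: empty body
else:
    print("Deletion denied:", resp.text)  # failure: reason, e.g. active source or downstream resources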
Trigger Nexla to start ingesting data immediately by calling the activation method on a source. Note that a Nexla source usually contains parameters that schedule automatic ingestion based on cron intervals or on the completion of other jobs; this activation method triggers an ingestion in addition to those scheduled runs.
Nexla API
Nexla CLI
Activate Source: Request
PUT /data_sources/{data_source_id}/activate
Conversely, call the pause method to immediately stop ingestion on a source. Any subsequent scheduled ingestion intervals are ignored for as long as the source is paused.
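A sketch of both calls under the same assumed host and auth. Only the activate endpoint is spelled out above; the pause route is an assumption that it mirrors activate:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# Trigger an immediate ingestion run (documented above).
requests.put(f"{BASE}/data_sources/5001/activate", headers=HEADERS).raise_for_status()

# Stop ingestion; assumed to mirror activate (route not shown in the docs above).
requests.put(f"{BASE}/data_sources/5001/pause", headers=HEADERS).raise_for_status()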
For file-type sources, Nexla can be configured to re-ingest an already scanned file. This is useful if the file originally failed ingestion due to file errors and has since been modified.
To re-ingest files for a data source, issue a POST request to /data_sources/<data_source_id>/file/ingest with the file path as the body. The file path must start from the root of the location that the source points to.
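A sketch under the same assumed host and auth; the body shape ({"file": "<path>"}) is an assumption, since the text only says the body carries the file path:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# The path must start from the root of the location the source points to.
resp = requests.post(f"{BASE}/data_sources/5001/file/ingest",
                     json={"file": "finance/data/2017-02-08.csv"},  # assumed body key
                     headers=HEADERS)
resp.raise_for_status()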
All configuration about where and when to scan data is contained within the source_config property of a data source.
Because Nexla provides quite a few options to fine-tune and control exactly which slice of your data location is ingested and how, it is important to ensure that source_config contains all the parameters required to successfully scan data. To validate the configuration of a given data source, send a POST request to /data_sources/<data_source_id>/config/validate.
You can send an optional JSON config as the input body; if the request contains no input config, the stored source_config is used for validation.
Nexla API
Validate Source Configuration: Request
POST /data_sources/{data_source_id}/config/validate
Nexla API
Validate Source Configuration: Response
{
  "status": "ok",
  "output": [
    {
      "name": "credsEnc",
      "value": null,
      "errors": [
        "Missing required configuration \"credsEnc\" which has no default value."
      ],
      "visible": true,
      "recommendedValues": []
    },
    {
      "name": "credsEncIv",
      "value": null,
      "errors": [
        "Missing required configuration \"credsEncIv\" which has no default value."
      ],
      "visible": true,
      "recommendedValues": []
    },
    {
      "name": "source_type",
      "value": null,
      "errors": [
        "Missing required configuration \"source_type\" which has no default value.",
        "Invalid value null for configuration source_type: Invalid enumerator"
      ]
    },
    ...
  ]
}
You can inspect the data that a source points to. These methods can be handy when trying to figure out the exact source_config properties to be set in the data source.
You can inspect the tree structure of file and database sources to a particular depth. Note that not all data source types have a natural tree structure.
The sketch below shows a plausible request body for a /probe/tree call on an S3 data source; the bucket and prefix keys mirror the /probe/schemas example further down, while depth is an assumed parameter for limiting traversal.
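import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# bucket/prefix mirror the /probe/schemas body below; "depth" is an assumed
# parameter limiting how deep the tree is walked.
body = {"bucket": "ftp-nexla.com", "prefix": "finance/data", "depth": 2}
resp = requests.post(f"{BASE}/data_sources/5001/probe/tree", json=body, headers=HEADERS)
print(resp.json())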
You can also get metadata and sample content for a file within a source. Note that the request payload must contain the path of the file starting from the root of the location that the data source points to.
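A sketch of such a probe. The /probe/files route is hypothetical (the text names the capability but not the endpoint); only the path-from-source-root requirement comes from the docs:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# Hypothetical route -- consult the API reference for the actual probe endpoint.
resp = requests.post(f"{BASE}/data_sources/5001/probe/files",
                     json={"file": "finance/data/2017-02-08.csv"},  # path from source root
                     headers=HEADERS)
print(resp.json())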
When a data source is activated, it scans all data to detect unique schemas and creates a dataset for each schema.
You can also test what schemas might be detected from just part of a data source, for example a specific file. The format of the request body object depends on the data source type: S3 data sources require a bucket attribute and accept an optional prefix, while FTP data sources require only a file attribute containing the full path to an FTP-based file.
Nexla API
Test Potential Detected Schemas: Request
POST /data_sources/<data_source_id>/probe/schemas
{
  "bucket": "ftp-nexla.com",
  "prefix": "finance/data"
}
The response to a successful /probe/schemas call contains an array of objects representing potential datasets. Each object contains source_schema and data_samples attributes along with other metadata.
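Putting the pieces together, a sketch that probes an S3 source and prints each candidate dataset (same assumed host and auth):

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

body = {"bucket": "ftp-nexla.com", "prefix": "finance/data"}  # S3-style body from above
resp = requests.post(f"{BASE}/data_sources/5001/probe/schemas", json=body, headers=HEADERS)
for candidate in resp.json():
    print(candidate.get("source_schema"))
    print(candidate.get("data_samples"))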
Lifetime ingestion metrics methods return information about the total data ingested by a source since its creation. Metrics include the number of records ingested as well as the estimated volume of data.
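A sketch, assuming lifetime totals are served by the bare /metrics endpoint (shown below with aggregate=1 for daily reports) when the aggregate parameter is omitted:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

# Assumed: omitting the aggregate param returns lifetime record and volume totals.
resp = requests.get(f"{BASE}/data_sources/5001/metrics", headers=HEADERS)
print(resp.json())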
Aggregated ingestion metrics methods return information about the total data ingested each day by a source. Metrics include the number of records ingested as well as the estimated volume of data.
Aggregations can be fetched at different granularities. Use the method below to fetch reports aggregated daily:
Nexla API
Nexla CLI
Daily Ingestion Metrics: Request
GET /data_sources/5001/metrics?aggregate=1
...
Optional Payload Parameters:
{
  "from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "page": <integer page number>,
  "size": <number of entries in page>
}
Nexla API
Nexla CLI
Daily Ingestion Metrics: Response
{
  "status": 200,
  "metrics": [
    {
      "time": "2017-02-08",
      "record": 53054,
      "size": 12476341
    },
    {
      "time": "2017-02-09",
      "record": 66618,
      "size": 15829589
    },
    {
      "time": "2017-02-10",
      "record": 25832,
      "size": 6645994
    }
  ]
}
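A sketch of the daily-metrics call (same assumed host and auth); sending the optional parameters as query parameters on the GET is also an assumption:

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

params = {
    "aggregate": 1,
    "from": "2017-02-08T00:00:00",  # UTC, %Y-%m-%dT%H:%M:%S
    "to": "2017-02-11T00:00:00",
    "page": 1,
    "size": 100,
}
resp = requests.get(f"{BASE}/data_sources/5001/metrics", params=params, headers=HEADERS)
for day in resp.json().get("metrics", []):
    print(day["time"], day["record"], day["size"])  # size is the estimated data volume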
Sources can be configured to scan for data at a specific ingestion frequency. Use the methods below to view ingestion metrics per ingestion cycle.
Nexla API
Nexla CLI
Aggregated By Ingestion Frequency: Request
GET /data_sources/5001/metrics/run_summary
...
Optional Payload Parameters:
{
  "runId": <unix epoch time marking the start of an ingestion run>,
  "from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  ...
}
Apart from the aggregated ingestion metrics methods above, which provide visibility into the total number of records and total volume of data ingested over a period of time, Nexla also provides methods to view granular details about ingestion events.
You can retrieve the ingestion status of a file source to find information such as how many files have been read fully, have failed ingestion, or are queued for ingestion in the next ingestion cycle.
Nexla API
File Source Ingestion Status: Request
GET /data_sources/5001/metrics/files_stats
...
Optional Parameters
{
  "from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL"
}
Nexla API
File Source Ingestion Status: Response
{
  "status": 200,
  "metrics": {
    "data": {
      "COMPLETE": 17
    },
    "meta": {
      "currentPage": 1,
      "totalCount": 1,
      "pageCount": 1
    }
  }
}
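A sketch that counts files by ingestion status within a window (same assumed host and auth):

import requests

BASE = "https://api.nexla.example"             # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}  # assumed bearer-token auth

params = {"from": "2017-02-08T00:00:00", "to": "2017-02-11T00:00:00", "status": "ERROR"}
resp = requests.get(f"{BASE}/data_sources/5001/metrics/files_stats", params=params, headers=HEADERS)
print(resp.json().get("metrics", {}).get("data"))  # e.g. {"ERROR": <count>}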
You can view per-file ingestion status and history for a file source. The file source ingestion history methods below return one entry per file by aggregating all ingestion events for each file.
Nexla API
Nexla CLI
Ingestion History Per File: Request
GET /data_sources/5001/metrics/files
...
Optional Parameters
{
  "from": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "to": <UTC datetime in '%Y-%m-%dT%H:%M:%S' format>,
  "status": "one of NOT_STARTED/IN_PROGRESS/COMPLETE/ERROR/PARTIAL",
  ...
}