
Migrating Flows from Dev To Production

Tutorial Goal

In this tutorial we will use the Nexla CLI to clone a development/QA Data Flow into a production Data Flow. We will also look at modifying the flow specification (e.g., the exact path to the production data) as part of this migration workflow.

While migration from QA to production is one common use case of this recipe, you might also find these steps handy for bootstrapping new flows that are almost identical to existing Data Flows. Note that even though we are using the Nexla CLI for this tutorial, we could just as well have used the Nexla UI or Nexla API to achieve the same goal.

tip

The Philosophy

When working with large volumes of data, we recommend starting with a development/QA Data Flow that acts on a small slice of data. This facilitates quick iterations until you are fully satisfied with the flow specifications. We also recommend maintaining this development/QA Data Flow throughout the lifecycle of your production Data Flow so that you can quickly and safely test new requirements before turning them live.

And so we have tools to facilitate easy migration and management of QA and production Data Flows!

Step 1: Export Development Flow Specification

Let's start by exporting the specification of the Data Flow we want to migrate. We will use the Nexla CLI's flows export method, which exports one or more pipelines originating from a data source. This command generates two output files: a flow configuration file and a properties file. The configuration file contains all the rules for the flow; the properties file contains the credentials for the flow's resources and the flow's metadata.

For this tutorial we want to migrate the flow from Source ID 9505 to Destination ID 8102. So we will call nexla flows export -s 9505 --output_file ~/Desktop/export_9505.json, which will save all the rules for this flow into a new local file, ~/Desktop/export_9505.json.

Export QA Pipeline
➜  nexla flows export -s 9505 -o ~/Desktop/export_9505.json
+-------------+--------+-------------------------------+-------------------------------+-------------+
| pipeline_id | source | detected_dataset              | dataset_1                     | destination |
+-------------+--------+-------------------------------+-------------------------------+-------------+
| 1           | 9505   | 14325 (1 - nexla_test,PAUSED) |                               | 8102 (1234) |
+-------------+--------+-------------------------------+-------------------------------+-------------+
| 2           | 9505   | 14325 (1 - nexla_test,PAUSED) | 14450 (1 - nexla_test,PAUSED) |             |
+-------------+--------+-------------------------------+-------------------------------+-------------+
Enter pipeline ids : 1
[2022-06-16 08:14:04 UTC] Creating template for dataset, sink and datamap
[2022-06-16 08:14:04 UTC] Scanning pipeline 1
[2022-06-16 08:14:08 UTC] Fetching source details
[2022-06-16 08:14:10 UTC] exporting json..

Note that if we wanted to export all branches from source 9505, we could have done that by adding the -a option.

Export QA Pipeline
➜  nexla flows export -s 9505 -o ~/Desktop/export_9505.json -a
[2022-06-17 11:10:57 UTC] Getting all pipeline ids...
[2022-06-17 11:10:57 UTC] Found 2 pipelines, exporting them
[2022-06-17 11:10:58 UTC] Creating template for dataset, sink and datamap
[2022-06-17 11:10:58 UTC] Scanning pipeline 1
[2022-06-17 11:11:01 UTC] Scanning pipeline 2
[2022-06-17 11:11:05 UTC] Fetching source details
[2022-06-17 11:11:06 UTC] exporting json..
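
After the export, we should see two files side by side: the flow configuration file we named above and a companion properties file. On our machine the properties file ends up next to the configuration file as ~/Desktop/export_9505_properties.json, the same file we will pass to the import in Step 3 (the exact naming may differ depending on your CLI version).

Check Exported Files
➜  ls ~/Desktop | grep export_9505
export_9505.json
export_9505_properties.json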

Step 2: Modify Necessary Source Or Destination Properties

Since we are migrating a development Data Flow to production, it is quite likely that we will need to modify the actual paths of input and output data in those connectors. Additionally, we might need to change the scheduling frequency of the production pipeline.

We could always edit the source and destination after the production flows are created, but we can just as easily do this in the exported JSON specification. We'll open the specification file in our preferred text editor and modify the source_config and sink_config properties as needed. These two properties contain all the information used to trigger the source and destination.
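
If we prefer a scripted tweak over a manual edit, here is a minimal sketch using jq. It assumes jq is installed, that source_config and sink_config sit at the top level of the exported JSON, and that the connector stores its location under a path field; those key names vary by connector type, so check your exported file for the actual fields before running anything like this.

Edit Exported Specification (sketch)
➜  # "path" is an assumed field name; adjust it to match your connector's config keys.
➜  jq '.source_config.path = "prod/input/" | .sink_config.path = "prod/output/"' \
      ~/Desktop/export_9505.json > ~/Desktop/export_9505_prod.json

If we go this route, we would pass ~/Desktop/export_9505_prod.json to the import in Step 3 instead of the original file.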

We might also need different credentials. We can either update them in the exported properties file, or assign new credentials during the import step.

Step 3: Create Production Flows

Now that we have the production specification ready, we will again use the Nexla CLI to import this specification and trigger the production flows.

Now, if the production pipeline needs to be created in a different Nexla environment or user account, we just need to switch the CLI context; refer to Nexla CLI Authentication for details. For this tutorial we will create the production pipeline in the same environment and user account as the QA pipeline.

All we need to do is call the Nexla CLI's flows import method with our specification file. The CLI will create a new source, datasets, and destination based on this specification. Depending on whether we attach the properties file to the import command, it will either automatically use the properties file to assign credentials or prompt us to pick credentials interactively.

Import Pipeline Specification: With Properties
->  nexla flows import -i ~/Desktop/export_9505.json -p ~/Desktop/export_9505_properties.json

[2022-06-17 12:33:50 UTC] Using credential 6952 from properties file
[2022-06-17 12:33:50 UTC] Creating source : nexla_test
[2022-06-17 12:33:52 UTC] Data Source created with ID: 11204
[2022-06-17 12:33:53 UTC] Creating Dataset : 1 - nexla_test
[2022-06-17 12:33:56 UTC] ID: 17284, Name: 1 - nexla_test
[2022-06-17 12:33:59 UTC] Created Dataset with ID, 17284 from dataset 17283
[2022-06-17 12:33:59 UTC] Created Dataset id ====> 17284
[2022-06-17 12:33:59 UTC] Parent dataset id for sink is ===> 17283
[2022-06-17 12:33:59 UTC] Creating Sink : 1234
[2022-06-17 12:34:02 UTC] Sink created with ID: 9535, and associated with dataset 17283

Import Pipeline Specification: Without Properties
->  nexla flows import -i ~/Desktop/export_9505.json

[2022-06-18 04:24:59 UTC] Credential Name given on Exported Pipeline : sk21
[2022-06-18 04:24:59 UTC] Available gdrive credentials
[2022-06-18 04:24:59 UTC] credential_id credential_name
[2022-06-18 04:24:59 UTC] 7041 sk21 (Copy) (Copy)
[2022-06-18 04:24:59 UTC] 7039 sk21 (Copy)
[2022-06-18 04:24:59 UTC] 6952 sk21
Enter credential_id : 6952
[2022-06-18 04:25:09 UTC] Creating source : nexla_test
[2022-06-18 04:25:11 UTC] Data Source created with ID: 11213
[2022-06-18 04:25:13 UTC] Creating Dataset : 1 - nexla_test
[2022-06-18 04:25:15 UTC] ID: 17290, Name: 1 - nexla_test
[2022-06-18 04:25:18 UTC] Created Dataset with ID, 17290 from dataset 17289
[2022-06-18 04:25:18 UTC] Created Dataset id ====> 17290
[2022-06-18 04:25:18 UTC] Parent dataset id for sink is ===> 17289
[2022-06-18 04:25:20 UTC] Credential Name given on Exported Pipeline : Abs_test
[2022-06-18 04:25:20 UTC] credential_id credential_name
[2022-06-18 04:25:20 UTC] 7040 Abs_test (Copy) (Copy)
[2022-06-18 04:25:20 UTC] 7038 Abs_test (Copy)
[2022-06-18 04:25:20 UTC] 6954 Abs_test
[2022-06-18 04:25:20 UTC] 6953 Azure Blob Storage_test
Enter credential_id : 6954
[2022-06-18 04:25:30 UTC] Creating Sink : 1234
[2022-06-18 04:25:32 UTC] Sink created with ID: 9546, and associated with dataset 17289
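
As a final sanity check, we can confirm that the newly created production flow shows up with the IDs printed above (source 11204/11213 and sink 9535/9546). A minimal sketch, assuming your version of the Nexla CLI provides a flows list command:

Verify Production Flow (sketch)
➜  nexla flows list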