Migrating Flows from Dev To Production

Tutorial Goal

In this tutorial we will use the Nexla CLI to clone a development/QA Data Flow into a production Data Flow. We will also look at modifying the flow specification (e.g., the exact path to production data) as part of this migration workflow.

While migrating from QA to production is one common use case of this recipe, you might also find these steps handy for bootstrapping new flows that are almost identical to some of your existing Data Flows. Note that even though we are using the Nexla CLI for this tutorial, we could just as well have used the Nexla UI or Nexla API to achieve the same goal.


The Philosophy

When working with large volumes of data, we recommend starting with a development/QA Data Flow that acts on a small slice of data. This facilitates quick iterations until you are fully satisfied with the flow specifications. We also recommend maintaining this development/QA Data Flow throughout the lifecycle of your production Data Flow so that you can quickly and safely test new requirements before turning them live.

And so we have tools to facilitate easy migration and management of QA and production Data Flows!

Step 1: Export Development Flow Specification

Let's start by exporting the specification of the Data Flow we want to migrate. We will use the Nexla CLI's pipeline export method, which exports one or more pipelines originating from a data source. Calling this method lists the pipelines originating from that source and allows us to export one or all of the pipeline recipes into a local JSON file.

For this tutorial we want to migrate the flow from Source ID 6862 to Destination ID 6520. So we will call nexla pipeline export -s=6862 --output_file ~/Desktop/export_6862.json, which will save all the rules for this flow into a new local file, ~/Desktop/export_6862.json.

Note that we could have chosen both pipelines below if we wanted to migrate both chains from source 6862.

Export QA Pipeline
➜ nexla pipeline export -s=6862 --output_file ~/Desktop/export_6862.json
| pipeline_id | source | detected_dataset | dataset_1 | sink |
| 1 | 6862 | 10393 (1 - small-load,PAUSED) | 10398 (1 - small-load,PAUSED) | 6513 (API Load: Small) |
| 2 | 6862 | 10393 (1 - small-load,PAUSED) | 10410 (Copy of 1 - small-load,PAUSED) | 6520 (API-Add Users) |
Enter pipeline ids : 2
[2020-06-25 19:09:28 UTC] Creating template for dataset, sink and datamap
[2020-06-25 19:09:28 UTC] Scanning pipeline 2
[2020-06-25 19:09:30 UTC] Plugin Script Sink Found
[2020-06-25 19:09:30 UTC] Fetching source details
[2020-06-25 19:09:30 UTC] exporting json..

Step 2: Modify Necessary Source Or Destination Properties

Since we are migrating from a development to a production Data Flow, it is quite likely that we will need to modify the actual input and output data paths in those connectors. Additionally, we might need to change the frequency of the production pipeline.

We could always edit the source and destination after the production flows are created, but it is just as easy to do so in the exported JSON specification. We'll open the specification file in our preferred text editor and modify the source_config and sink_config properties as needed. These two properties contain all the information needed to trigger the source/destination.

We might also need different credentials, but we don't need to modify those in the specification. The Nexla CLI will ask us to choose credentials when importing the specification.
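If you prefer to script this step rather than edit the file by hand, the edit can be done programmatically. The Python sketch below illustrates the idea on a minimal stand-in specification; the real exported file contains many more fields, and the "path" key and bucket names used here are hypothetical examples rather than guaranteed Nexla property names, so adapt them to what you see in your own export.

```python
import json

# Minimal stand-in for an exported flow specification. A real export
# from Step 1 has many more fields; "path" and the bucket names below
# are hypothetical examples, not guaranteed Nexla property names.
spec = {
    "source_config": {"path": "gs://qa-bucket/input/"},
    "sink_config": {"path": "gs://qa-bucket/output/"},
}

# Point the connectors at production data instead of the small QA slice.
spec["source_config"]["path"] = "gs://prod-bucket/input/"
spec["sink_config"]["path"] = "gs://prod-bucket/output/"

# Write the edited specification back out, ready for Step 3's
# `nexla pipeline import`.
with open("export_6862_prod.json", "w") as f:
    json.dump(spec, f, indent=2)
```

The same approach works for any other property you need to change between environments (for example, a scheduling/frequency setting) as long as you locate the corresponding key in your exported file.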

Step 3: Create Production Flows

Now that we have the production specification ready, we will again use the Nexla CLI to import this specification and trigger the production flows.

If the production pipeline needs to be created in a different Nexla environment or user account, we just need to switch the CLI context; refer to Nexla CLI Authentication for details. For this tutorial we will create the production pipeline in the same environment and user account as the QA pipeline.

All we need to do is call Nexla CLI's pipeline import method with our specification file. The CLI will create a new source, datasets, and destination based on this specification. It will also ask us for credentials that are relevant for this source and destination.

Import Pipeline Specification
➜ nexla pipeline import --input_file ~/Desktop/export_6862.json
[2020-06-25 19:17:58 UTC] Credential Name given on Exported Pipeline : GCS System User
[2020-06-25 19:17:58 UTC] Available gcs credentials
[2020-06-25 19:17:58 UTC] credential_id credential_name
[2020-06-25 19:17:58 UTC] 6285 GCS System User
[2020-06-25 19:17:58 UTC] 6027 Nexla GCS
Enter credential_id : 6027
[2020-06-25 19:18:16 UTC] Creating source : Copy of small-load
[2020-06-25 19:18:17 UTC] Data Source created with ID: 7804
[2020-06-25 19:18:17 UTC] Activating Source now ....
[2020-06-25 19:18:18 UTC] Successfully activated.
[2020-06-25 19:18:18 UTC] Waiting for Dataset to be detected
[2020-06-25 19:18:29 UTC] Detected Datset id is [12040]
[2020-06-25 19:18:31 UTC] detected_dataset_1 is matching with 12040
[2020-06-25 19:18:31 UTC] Creating Dataset : 1 - Copy of small-load
[2020-06-25 19:18:34 UTC] ID: 12041, Name: 1 - Copy of small-load
[2020-06-25 19:18:34 UTC] Created Dataset with ID, 12041 from dataset 12040
[2020-06-25 19:18:34 UTC] Created Dataset id ====> 12041
[2020-06-25 19:18:34 UTC] Parent dataset id for sink is ===> 12041
*** Create Sink "API-Add Users" from dataset 12041 in UI ***