Connect to Databricks (Legacy UI)

Nexla's bi-directional connectors can both send data to and receive data from any data system. Once a user has created or gained access to a credential for a data system, building a data flow that ingests data from or sends data to a location within that system requires only a few simple steps.

1. Credentials

This section provides information about and step-by-step instructions for creating a new Databricks credential in Nexla.

1.1 Add a New Databricks Credential

  1. After selecting the data source/destination type, in the Authenticate.png screen, click AddANewCredential.png. This will open the Add New Credential window.

      AddNewCredential.png

  2. Enter a name for the credential in the Credential Name field.

      CredName.png

  3. Optional: Enter a description for the credential in the Credential Description field.

      CredDescription.png

  4. Select how the Databricks authentication information will be entered from the URL Format pulldown menu.

    • JDBC URL: Select this option to enter the authentication information as a single JDBC URL.
    • HTTP Path Parts: Select this option to enter the authentication information as separate parts that Nexla will combine to create the connection string.

      URL_Format.png

    • To use the JDBC URL format, continue to Section 1.2.
    • To use the HTTP Path Parts format, continue to Section 1.3.

1.2 JDBC URL Format

  1. Enter the JDBC URL of the Databricks location in the JDBC URL field.

    The JDBC URL should be in the form jdbc:spark://....

      JDBC_URL.png

  2. Continue to Section 1.4.

1.3 HTTP Path Parts Format

  1. Enter the hostname of the Databricks database in the Host field.

    The hostname is typically an IP address or text in the format company.domain.com.

    Do not include the connection protocol.

      Host.png

  2. Enter the cluster port number to which the Databricks source connects in the Port field.

      Port.png

  3. Enter the HTTP path of the Databricks SQL endpoint in the HTTP Path field.

    The HTTP path is typically in the form sql/protocolv1/o/<id>/0916-102516-naves603.

    The HTTP path can be found under the JDBC settings in the Databricks console.

      HTTP_Path.png

  4. Enter the username associated with the Databricks account in the Username field.

      Username.png

  5. Enter the password associated with the Databricks account in the Password field.

      Password.png

  6. Continue to Section 1.4.
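The parts entered above carry the same information that a JDBC URL encodes. As a rough illustration only (not Nexla's actual internals), the parts might be combined into a connection string like this; the hostname, port, and HTTP path values below are hypothetical placeholders:

```python
def build_jdbc_url(host: str, port: int, http_path: str) -> str:
    """Combine HTTP Path Parts into a jdbc:spark:// style URL (illustrative)."""
    # Per step 1 above, the host must not include a connection protocol.
    if host.startswith(("http://", "https://", "jdbc:")):
        raise ValueError("Host field must not include a connection protocol")
    return (
        f"jdbc:spark://{host}:{port}/default;"
        f"transportMode=http;ssl=1;httpPath={http_path}"
    )

# Hypothetical values, matching the formats described in steps 1-3.
url = build_jdbc_url(
    host="company.cloud.databricks.com",
    port=443,
    http_path="sql/protocolv1/o/12345/0916-102516-naves603",
)
print(url)
```

The exact parameters Nexla appends may differ; the point is that both credential formats describe the same connection.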

1.4 Configure the Databricks Environment Settings

  1. Optional: Enter the name of the Databricks database to which Nexla should connect in the Database Name field.

    In Databricks, the terms "database" and "schema" are used interchangeably. For more information about databases/schemas and other data objects in Databricks, see this Databricks article.

      DatabaseName.png

  2. Optional: Enter the name of the Databricks schema to which Nexla should connect in the Schema Name field.

      SchemaName.png

  3. Select the type of cloud environment used by the Databricks instance.

    Typically, the Databricks cloud environment is used, but Nexla also supports connecting to Databricks instances that run in other cloud environments.

      CloudType.png

  4. Section 1.5 provides information about advanced settings available for Databricks credentials along with step-by-step instructions for configuring each setting.

    • To configure any desired additional advanced settings for this credential, continue to Section 1.5, and complete the relevant steps.

    • To create this credential without configuring any advanced settings, continue to Section 1.6.

1.5 Advanced Settings

This section covers optional advanced credential settings. To create the Databricks credential without configuring advanced settings, skip to Section 1.6.

  1. Click AdvSettings.png at the bottom of the Add New Credential window to access additional available settings for the Databricks credential.
  • Access the Databricks database via an SSH Tunnel

    1. If the Databricks database from which data should be read is not publicly accessible, check the box next to RequiresSSH.png. This will append additional related fields to be populated in the Add New Credential window.

      Selecting this option allows Nexla to connect to a bastion host via SSH, and the database connection will then be provided through the SSH host.

        SSH_Fields.png

    2. Enter the SSH tunnel hostname or IP address of the bastion host running the SSH tunnel server that has access to the database in the SSH Tunnel Host field.

        SSH_TunnelHost.png

    3. Enter the port on the bastion host to which Nexla will connect in the SSH Tunnel Port field.

        SSH_TunnelPort.png

    4. Create an SSH username for Nexla in the bastion host, and enter that username in the Username for Tunnel field.

      Usually, the username is set as "nexla".

        TunnelUsername.png

1.6 Save and Create the Databricks Credential

  1. Once all of the relevant steps in the above sections have been completed, click Save.png at the bottom of the Add New Credential screen to save the credential and all entered information.

      Save2.png

  2. The newly added credential will now appear in a tile on the Authenticate.png screen and can be selected for use with a new data source or destination.

      CredentialsList.png

2. Add a Databricks Data Source

  1. Log into Nexla with your provided credentials to view the Nexla Dashboard.

    If you need credentials, contact support@nexla.com.

  2. Select Sources.png from the menu on the left.

  3. Click Create_New_Source.png in the upper right corner to begin adding a new data source.

  4. Select Databricks.png, and click Next.png in the upper right corner of the screen to begin adding the Databricks data source.

2.1 Configure the Databricks Data Source

In Nexla, the Databricks database source can be configured using either Table Mode or Query Mode.

Table Mode allows users to specify the database source through a simple selection method. This mode is equivalent to running a simple, optimized SELECT operation on any database table, while providing additional customization options to filter rows. To use this mode for configuration, see Section 2.2.

Query Mode allows users to perform a complex query to specify the database source. This mode provides a free-form query editor that can be used to perform any complex query written using the syntax and convention supported by the underlying database and/or warehouse. To use this mode for configuration, see Section 2.3.

2.2 Table Mode

  1. To configure the Databricks source using Table Mode, ensure that the Table_Mode2.png tab is selected.

    Table_Mode2.png
  2. Find the database location from which Nexla should read data. Expand files as necessary by clicking the Expand.png icon next to each.

  3. To select a file, hover over it, and click the Select.png button that appears.

    Once a file is selected, the button will display Selected.png, and the path of the selected location will be shown at the top of the list.

    Selected.png
  4. Optional: Click the Test.png button to the right of the mode-selection tabs to generate preview samples of data from the selected source at the bottom of the screen.

    Test2.png

2.3 Query Mode

  1. To configure the Databricks source using Query Mode, select the Query_Mode.png tab.

    Query_Mode2.png
  2. Enter the query specifying the database location from which Nexla should read data in the Custom Query to Fetch Data field, adhering to Databricks SQL syntax and conventions.

    In this mode, Nexla supports any query that can be written following the Databricks syntax and convention, regardless of complexity.

    For more information about Databricks SQL syntax, see this Databricks SQL Query reference page.

    EnterQuery.png
  3. Optional: Click the Test.png button to the right of the mode-selection tabs to generate preview samples of the data selected according to the entered query at the bottom of the screen.

    QueryTest2.png
  4. Continue to Section 2.4.

2.4 Data Ingestion Scheduling

Nexla can be configured to scan the data source for data at a variety of frequencies, with options ranging from a one-time scan to scanning every 15 minutes. Optionally, users can also specify the time at which Nexla should scan the data source.

  1. In the Advanced Settings menu on the right, use the Scheduling pulldown menu to specify how often Nexla should fetch data from the source.

    The default setting configures Nexla to fetch data from the source once every day.

      FetchData.png

    • For options such as "Every N Hours" and "Every N Days", use the additional pulldown menu that appears when these options are selected to specify the value of N defining the fetching frequency.

        N_Value.png

  2. Optional: To set a specific time at which Nexla should fetch any new data from the source, check the Set box, and use the pulldown menus to select the desired time.

      SelectTime.png
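The scheduling options above can be reasoned about with simple datetime arithmetic. A minimal sketch (not Nexla's actual scheduler) of how an "Every N Hours" frequency combined with a fixed start time yields scan times:

```python
from datetime import datetime, timedelta

def next_scans(start: datetime, every_hours: int, count: int) -> list:
    """Return the next `count` scan times for an 'Every N Hours' schedule."""
    return [start + timedelta(hours=every_hours * i) for i in range(count)]

# Hypothetical example: Nexla scans every 6 hours, starting at 02:00.
times = next_scans(datetime(2023, 5, 1, 2, 0), every_hours=6, count=4)
print([t.strftime("%H:%M") for t in times])  # → ['02:00', '08:00', '14:00', '20:00']
```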

2.5 Optional Advanced Settings

  • When the data source location is selected using Table Mode:
  1. Optional: Use the Table Scan Mode pulldown menu under the Data Selection heading to configure how Nexla should scan the table selected in Section 2.2 during each ingestion cycle.

    This option is useful when working with a source containing historical data that should not be scanned.

    By default, Nexla is configured to scan the entire selected table during each ingestion cycle, which is equivalent to running a SELECT statement on the table.

      TableScanMode.png

    • Read the whole table: This option configures Nexla to scan the entire table, which is equivalent to running a SELECT statement on the table.
    • Start reading from a specific ID: This option configures Nexla to begin scanning the table at a specific ID, which is stored in a numeric column.
    • Start reading from a specific ID and timestamp: This option configures Nexla to begin scanning the table at a specific ID and timestamp.
    • Start reading from a specific timestamp: This option configures Nexla to begin scanning the table at a specific timestamp, which is stored in a datetime column.
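Each Table Scan Mode above amounts to a filter on the SELECT that Nexla effectively runs. A minimal sketch of that equivalence, assuming hypothetical `id` and `updated_at` columns (the actual columns are chosen in the Nexla UI):

```python
def scan_query(table, from_id=None, from_ts=None):
    """Build the SELECT statement equivalent to each Table Scan Mode."""
    conditions = []
    if from_id is not None:
        conditions.append(f"id >= {from_id}")            # numeric ID column
    if from_ts is not None:
        conditions.append(f"updated_at >= '{from_ts}'")  # datetime column
    where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
    return f"SELECT * FROM {table}{where}"

print(scan_query("orders"))                                 # read the whole table
print(scan_query("orders", from_id=1000))                   # from a specific ID
print(scan_query("orders", from_ts="2023-05-01 00:00:00"))  # from a timestamp
```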
  • When the data source location is selected using Query Mode:
  1. Optional: If the query entered in Section 2.3 includes statements that should also be committed to the database after ingestion, select "True" from the Perform Database Commit After Read pulldown menu under Post Read Settings.

    Typically, a database commit does not need to be performed, and this setting can be left as "False".

      DatabaseCommit.png
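Whether a commit is needed depends on whether the custom query from Section 2.3 contains statements that modify data. A rough heuristic sketch (not Nexla's actual logic) for spotting such statements:

```python
# Keywords that indicate a data-modifying statement requiring a commit.
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}

def needs_commit(query: str) -> bool:
    """Return True if the query likely modifies data and needs a commit."""
    tokens = {word.upper().strip("();,") for word in query.split()}
    return not tokens.isdisjoint(DML_KEYWORDS)

print(needs_commit("SELECT * FROM sales WHERE region = 'EU'"))  # → False
print(needs_commit("UPDATE sales SET exported = true"))         # → True
```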

2.6 Save and Create the Databricks Data Source

  1. Once all of the above steps have been completed, click Create.png in the upper right corner of the screen to save and create the new Databricks data source.

  2. The confirmation page indicates that the Databricks database has been successfully created as a data source.

      Success.png

  3. Optional: Edit the name of the newly added data source by clicking on the name field and entering the desired text.

      EditName.png

  4. Optional: Add a description of the data source by clicking on the Description.png field below the data source name and entering the desired text.

      EditDescription.png

  • To return to My Data Sources, click Done.png in the upper right corner of the screen.

  • To view the newly created data source, click View_Source.png.

  • To view Nexsets detected in the newly added source, click View_Detected_Nexsets.png.