The connector.databricks package provides a convenient interface for accessing and interacting with Databricks volumes and tables directly from R. This vignette will guide you through the process of connecting to Databricks, retrieving data, and performing various operations using this package.
This package is meant to be used with the connector package, which provides a common interface for interacting with various data sources. The connector.databricks package extends connector to support Databricks volumes and tables.
Installation
You can install connector.databricks from CRAN using the following command:
# Install from CRAN
install.packages("connector.databricks")
Development version
To get a bug fix or to use a feature from the development version, you can install the development version of connector.databricks from GitHub.
pak::pak("novonordisk-opensource/connector.databricks")
Usage
Here is an example of how to connect to Databricks tables and volumes:
library(connector.databricks)
# Connect to Databricks tables using DBI
con <- connector_databricks_table(
  http_path = "path-to-cluster",
  catalog = "my_catalog",
  schema = "my_schema"
)
# Connect to a Databricks volume
con <- connector_databricks_volume(
  catalog = "my_catalog",
  schema = "my_schema",
  path = "path-to-file-storage"
)
When connecting to Databricks tables, authentication is handled by the odbc::databricks() driver, which supports personal access tokens and credentials provided through Posit Workbench. See odbc::databricks() for more information on how the connection to Databricks is established. Currently, most package functions rely on the brickster package.
When connecting to Databricks volumes, authentication is handled by the brickster package. See also this vignette for more information on how authentication is handled.
The intention is for the whole backend to rely solely on the brickster package in the future.
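Before connecting, it can help to check that credentials are visible to the R session. The following is a minimal sketch, not part of connector.databricks: the helper `check_databricks_env()` is hypothetical, and the variable names `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are an assumption based on common Databricks tooling conventions. Consult the odbc::databricks() and brickster documentation for the variables they actually read in your setup.

```r
# Hypothetical helper (not part of connector.databricks): report which of the
# assumed Databricks environment variables are set in the current session.
# The variable names are an assumption; verify them against the odbc and
# brickster documentation.
check_databricks_env <- function(vars = c("DATABRICKS_HOST", "DATABRICKS_TOKEN")) {
  values <- Sys.getenv(vars, unset = "")
  # Named logical vector: TRUE when the variable has a non-empty value
  stats::setNames(nzchar(values), vars)
}

status <- check_databricks_env()
print(status)

if (!all(status)) {
  message("Not set: ", paste(names(status)[!status], collapse = ", "))
}
```

The check only reports whether the variables are non-empty. It does not validate the token or contact Databricks.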
Both types of connections share a similar interface for reading and writing data. Tables should be used for tabular data, while volumes should be used for unstructured data.
Example of how to use the connector object:
# List content
con$list_content_cnt()
# Write a file
con$write_cnt(iris, "iris.rds")
# Read a file
con$read_cnt("iris.rds") |>
  head()
# Remove a file
con$remove_cnt("file_name.csv")
Usage with connector package
Here is an example of how it can be used with the connector package and a YAML configuration file (for more information, see the connector package documentation):
# Connect using configuration file
connector <- connector::connect(
  config = system.file(
    "config",
    "example_yaml.yaml",
    package = "connector.databricks"
  )
)
# List contents in Volume
connector$volumes$list_content_cnt()
# Get the Databricks connection object from tables
connector$tables$get_conn()
# Write a file
connector$volumes$write_cnt(iris, "Test/iris.csv")
# Read a table
connector$tables$read_cnt("example_data")
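To see how the configuration used above is defined, the example YAML file can be printed from the installed package. This is a minimal sketch using only base R and the file location given in the example. The names `volumes` and `tables` used above presumably come from that file, but the sketch makes no assumption about its contents.

```r
# Locate the example configuration referenced above. system.file() returns ""
# when the package or the file is not available in the current library.
config_path <- system.file(
  "config",
  "example_yaml.yaml",
  package = "connector.databricks"
)

if (nzchar(config_path)) {
  # Print the raw YAML so the connector definitions can be reviewed
  writeLines(readLines(config_path))
} else {
  message("Example configuration not found; is connector.databricks installed?")
}
```

Reviewing the file first is a convenient starting point when writing a configuration for your own catalog and schema.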