Package 'wubik'

Title: Helpful R Functions for Databricks at WashU
Description: This package provides helpful functions for using R on Databricks at WashU.
Authors: Matthew Schuelke [aut, cre]
Maintainer: Matthew Schuelke <[email protected]>
License: GPL (>= 3)
Version: 0.1.0
Built: 2024-11-02 04:27:34 UTC
Source: https://github.com/the-mad-statter/wubik


As csv

Description

As csv

Usage

as_csv(x, ...)

Arguments

x

object to convert to csv

...

other arguments passed to write.csv()

Value

object x as a csv string

Examples

as_csv(mtcars)
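
A minimal sketch of the idea, not the package's implementation: assuming as_csv() amounts to capturing what write.csv() would print, a similar string can be produced in base R. The helper name as_csv_sketch is hypothetical.

```r
# Hypothetical helper (not part of wubik): render an object as one CSV string.
as_csv_sketch <- function(x, ...) {
  # write.csv() prints to the console when no file is given;
  # capture.output() collects those lines as a character vector.
  lines <- utils::capture.output(utils::write.csv(x, ...))
  paste(lines, collapse = "\n")
}

# Extra arguments such as row.names are forwarded to write.csv().
csv <- as_csv_sketch(head(mtcars, 2), row.names = FALSE)
cat(csv, "\n")
```

The result is a single character string with one header line and one line per row.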

As Datetime mm/dd/yyyy hh:mm:ss mi

Description

As Datetime mm/dd/yyyy hh:mm:ss mi

Usage

as_datetime_mdy_hms_mi(x, ...)

Arguments

x

character representation of datetime

...

additional arguments to lubridate::as_datetime

Value

datetime representation of the datetime

Examples

as_datetime_mdy_hms_mi("1/1/2001 8:00:32 PM")
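
For readers without lubridate at hand, the format named in the title can be approximated in base R. This is a sketch under assumptions: the helper name parse_mdy_hms_sketch and the UTC default are hypothetical, and the AM/PM marker (%p) is locale dependent.

```r
# Hypothetical base-R approximation; the documented function forwards
# its extra arguments to lubridate::as_datetime instead.
parse_mdy_hms_sketch <- function(x, tz = "UTC") {
  # %I is the hour on a 12-hour clock, %p the AM/PM marker.
  as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = tz)
}

dt <- parse_mdy_hms_sketch("1/1/2001 8:00:32 PM")
format(dt, "%Y-%m-%d %H:%M:%S")
```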

collect: Collects all the elements of a SparkDataFrame and coerces...

Description

collect: Collects all the elements of a SparkDataFrame and coerces...

Usage

collect(x, ...)

Arguments

x

object to collect

...

additional arguments

See SparkR::collect for details.


Box FTPS Download

Description

Box FTPS Download

Usage

dbutils.box.ftps_download(
  remote,
  local = basename(remote),
  home = Sys.getenv("WUSTL_BOX_HOME"),
  user = Sys.getenv("WUSTL_BOX_USER"),
  pass = Sys.getenv("WUSTL_BOX_PASS"),
  verbose = FALSE,
  connecttimeout = 10000,
  ...
)

Arguments

remote

the path from which the content is to be downloaded.

local

file name or path of the local file to write.

home

prepended to remote to form the full remote path.

user

Box username (i.e., WashU email)

pass

unique password for external applications. Created at https://wustl.app.box.com/account

verbose

emit some progress output

connecttimeout

desired connection timeout in milliseconds

...

other arguments passed to curl::handle_setopt()

Examples

## Not run: 
dbutils.box.ftps_download(
  "my_img.png",
  "/dbfs/home/my_user/my_img.png"
)

## End(Not run)
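
The defaults for home, user, and pass are read from environment variables, so they must be set before the call. The sketch below uses placeholder values only (all three values are hypothetical) and shows the lookups the documented defaults perform.

```r
# Placeholder values only; real credentials belong in .Renviron or a
# secret store, never in a script.
Sys.setenv(
  WUSTL_BOX_HOME = "example_folder",      # hypothetical Box folder
  WUSTL_BOX_USER = "user@example.edu",    # hypothetical Box username
  WUSTL_BOX_PASS = "not-a-real-password"  # hypothetical app password
)

# These are the same Sys.getenv() lookups used as argument defaults.
box_defaults <- list(
  home = Sys.getenv("WUSTL_BOX_HOME"),
  user = Sys.getenv("WUSTL_BOX_USER"),
  pass = Sys.getenv("WUSTL_BOX_PASS")
)

# An empty string here would mean the variable was not set.
nzchar(unlist(box_defaults))
```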

Box FTPS Upload

Description

Box FTPS Upload

Usage

dbutils.box.ftps_upload(
  local,
  remote = basename(local),
  home = Sys.getenv("WUSTL_BOX_HOME"),
  user = Sys.getenv("WUSTL_BOX_USER"),
  pass = Sys.getenv("WUSTL_BOX_PASS"),
  verbose = FALSE,
  connecttimeout = 10000,
  ...
)

Arguments

local

file name or path of the local file to be uploaded.

remote

the path to which the content is to be uploaded.

home

prepended to remote to form the full remote path.

user

Box username (i.e., WashU email)

pass

unique password for external applications. Created at https://wustl.app.box.com/account

verbose

emit some progress output

connecttimeout

desired connection timeout in milliseconds

...

other arguments passed to curl::handle_setopt()

Examples

## Not run: 
dbutils.box.ftps_upload("/dbfs/home/my_user/my_img.png")

## End(Not run)

Box Read

Description

Box Read

Usage

dbutils.box.read(remote, ...)

Arguments

remote

the path from which the contents are to be read.

...

other arguments passed to curl::handle_setopt()

Value

file contents as character string

Examples

## Not run: 
dbutils.box.read("my_sql_file.sql")

## End(Not run)

Box Write

Description

Box Write

Usage

dbutils.box.write(x, remote, ...)

Arguments

x

file contents to write

remote

the path to which the content is to be uploaded.

...

other arguments passed to curl::handle_setopt()

Examples

## Not run: 
dbutils.box.write("hello world!", "hello world.txt")

## End(Not run)

Cluster persist app

Description

Cluster persist app

Usage

dbutils.cluster.persist_app(seconds = 3600)

Arguments

seconds

seconds to persist


Terminate Cluster

Description

Terminate Cluster

Usage

dbutils.cluster.terminate(
  cluster_id = Sys.getenv("DATABRICKS_CLUSTER_ID"),
  host = Sys.getenv("DATABRICKS_HOST"),
  token = Sys.getenv("DATABRICKS_TOKEN")
)

Arguments

cluster_id

id of the cluster

host

databricks host

token

databricks personal access token

Value

httr2 response object

Examples

## Not run: 
dbutils.cluster.terminate(
  cluster_id = "0123-456789-xxxxxxxx",
  host = "adb-1234567812345678.12.azuredatabricks.net"
)

## End(Not run)

Current user

Description

Current user

Usage

dbutils.credentials.current_user(domain = FALSE)

Arguments

domain

logical indicating whether to return username with or without the domain name

Value

username of the current user

Examples

## Not run: 
dbutils.credentials.current_user()

## End(Not run)

FileStore URL

Description

FileStore URL

Usage

dbutils.filestore_url(
  path,
  display_html = TRUE,
  user = dbutils.credentials.current_user(),
  host = Sys.getenv("DATABRICKS_HOST"),
  org_id = Sys.getenv("DATABRICKS_ORG_ID")
)

Arguments

path

to the desired file

display_html

print as clickable link or just the url as text

user

name of the user

host

the databricks host

org_id

the organization id

Examples

dbutils.filestore_url(
  "out.csv",
  FALSE,
  "dborker",
  "adb-1234567812345678.12.azuredatabricks.net",
  "1234567890123456"
)

Copy content

Description

Copy content

Usage

dbutils.fs.cp2(from, to)

Arguments

from

from path

to

to path

Value

TRUE if successful


Path exists

Description

Path exists

Usage

dbutils.fs.exists(x)

Arguments

x

path to test

Value

TRUE if path exists

Examples

## Not run: 
dbutils.fs.exists("~")

## End(Not run)

List directory

Description

List directory

Usage

dbutils.fs.ls2(x)

Arguments

x

directory to list

Value

a dplyr::tibble() of directory contents

Examples

## Not run: 
dbutils.fs.ls2("~")

## End(Not run)

File system home

Description

File system home

Usage

dbutils.home.path(
  sys = c("file", "dbfs", "filestore", "abfs"),
  fmt = c("file", "spark"),
  user = dbutils.credentials.current_user(),
  abfs_group = "data-brokers",
  abfs_host = Sys.getenv("DATABRICKS_ABFSS_HOST")
)

Arguments

sys

which file system

fmt

how to format the address

user

name of the user

abfs_group

group to which the user belongs

abfs_host

host name

Value

path to home directory in desired file system

Examples

dbutils.home.path(user = "dborker")
dbutils.home.path(fmt = "spark", user = "dborker")
dbutils.home.path("dbfs", user = "dborker")
dbutils.home.path("dbfs", "spark", "dborker")
dbutils.home.path("filestore", user = "dborker")
dbutils.home.path("filestore", "spark", "dborker")
dbutils.home.path(
  "abfs",
  user = "dborker",
  abfs_host = "file-share-acmeincprodadls.dfs.core.windows.net"
)
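
The sys and fmt arguments list their allowed values as a character vector. Assuming the package resolves them with match.arg(), which is the usual R convention but is not confirmed by this manual, the first value acts as the default. The function pick_sys_sketch is hypothetical.

```r
# Hypothetical function mirroring the documented signature style.
pick_sys_sketch <- function(sys = c("file", "dbfs", "filestore", "abfs")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the listed choices.
  match.arg(sys)
}

pick_sys_sketch()        # falls back to the first choice
pick_sys_sketch("dbfs")  # explicit choice
```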

Persist home directory

Description

Persist home directory

Usage

dbutils.home.persist(
  ephemeral_path = dbutils.home.path("file", "spark"),
  persistent_path = dbutils.home.path("dbfs", "spark")
)

Arguments

ephemeral_path

path to ephemeral home location

persistent_path

path to persistent home location

Value

TRUE if successful

Examples

## Not run: 
dbutils.home.persist()

## End(Not run)

Restore home directory

Description

Restore home directory

Usage

dbutils.home.restore(
  persistent_path = dbutils.home.path("dbfs", "spark"),
  ephemeral_path = dbutils.home.path("file", "spark")
)

Arguments

persistent_path

path to persistent home location

ephemeral_path

path to ephemeral home location

Value

TRUE if successful

Examples

## Not run: 
dbutils.home.restore()

## End(Not run)

Cluster-scoped init script add-sudo-.sh

Description

Cluster-scoped init script add-sudo-.sh

Usage

dbutils.ini.add_sudo_user_sh(pass, user = dbutils.credentials.current_user())

Arguments

pass

password

user

username

Examples

## Not run: 
dbutils.ini.add_sudo_user_sh("rosebud", "dborker")

## End(Not run)

Cluster-scoped init script install-ctakes.sh

Description

Cluster-scoped init script install-ctakes.sh

Usage

dbutils.ini.install_ctakes_sh()

Examples

## Not run: 
dbutils.ini.install_ctakes_sh()

## End(Not run)

Cluster-scoped init script install-databricks-cli-for-root.sh

Description

Cluster-scoped init script install-databricks-cli-for-root.sh

Usage

dbutils.ini.install_databricks_cli_for_root_sh()

Examples

## Not run: 
dbutils.ini.install_databricks_cli_for_root_sh()

## End(Not run)

Cluster-scoped init script install-databricks-cli-for-.sh

Description

Cluster-scoped init script install-databricks-cli-for-.sh

Usage

dbutils.ini.install_databricks_cli_for_user_sh(
  user = dbutils.credentials.current_user()
)

Arguments

user

user account for which to write .databrickscfg

Examples

## Not run: 
dbutils.ini.install_databricks_cli_for_user_sh()

## End(Not run)

Cluster-scoped init script install-odbc-driver.sh

Description

Cluster-scoped init script install-odbc-driver.sh

Usage

dbutils.ini.install_odbc_driver_sh()

Note

This script pulls values from the Spark environment variables DATABRICKS_HOST, DATABRICKS_HTTP_PATH, and DATABRICKS_TOKEN, so ensure these variables are present during cluster initialization. It is recommended that the values be stored in an Azure Key Vault and set in the cluster configuration UI under Spark -> Environment variables. For example: DATABRICKS_TOKEN={{secrets/wusm-prod-biostats-kv/DATABRICKS-TOKEN}}. The connection can be tested in a terminal with isql -v Databricks.

Examples

## Not run: 
## Generate init script
dbutils.ini.install_odbc_driver_sh()

## R usage of driver
conn <- DBI::dbConnect(
  drv = odbc::odbc(),
  dsn = "Databricks"
)

print(DBI::dbGetQuery(
  conn,
  "SELECT * FROM cleansed.epic_clarity.clarity_orgfilter_pat_enc_hsp LIMIT 2"
))

## End(Not run)

Cluster-scoped init script install-openai.sh

Description

Cluster-scoped init script install-openai.sh

Usage

dbutils.ini.install_openai_sh(version = "0.28.1")

Arguments

version

desired version

Examples

## Not run: 
dbutils.ini.install_openai_sh()

## End(Not run)

Cluster-scoped init script install-rstudio-server.sh

Description

Cluster-scoped init script install-rstudio-server.sh

Usage

dbutils.ini.install_rstudio_server_sh()

Examples

## Not run: 
dbutils.ini.install_rstudio_server_sh()

## End(Not run)

Cluster-scoped init script for directory restorations

Description

Cluster-scoped init script for directory restorations

Usage

dbutils.ini.restore_directory_sh(persistent_path, ephemeral_path, script_name)

Arguments

persistent_path

path to the persistent directory

ephemeral_path

path to the ephemeral directory

script_name

name to use for the script file


Cluster-scoped init script restore-home-directory-for-.sh

Description

Cluster-scoped init script restore-home-directory-for-.sh

Usage

dbutils.ini.restore_home_directory_for_user_sh(
  user = dbutils.credentials.current_user(),
  persistent_path = dbutils.home.path("dbfs", "file", user),
  ephemeral_path = dbutils.home.path("file", "file", user)
)

Arguments

user

current user

persistent_path

path to the persistent home files

ephemeral_path

path to the ephemeral home files

Examples

## Not run: 
dbutils.ini.restore_home_directory_for_user_sh()

## End(Not run)

Cluster-scoped init script restore-r-library-for-.sh

Description

Cluster-scoped init script restore-r-library-for-.sh

Usage

dbutils.ini.restore_r_library_for_user_sh(
  user = dbutils.credentials.current_user(),
  persistent_path = dbutils.rlib.path("persistent", "file", user),
  ephemeral_path = dbutils.rlib.path("ephemeral", "file", user)
)

Arguments

user

current user

persistent_path

path to the persistent library

ephemeral_path

path to the ephemeral library

Examples

## Not run: 
dbutils.ini.restore_r_library_for_user_sh()

## End(Not run)

Cluster-scoped init script add-facl--to-.sh

Description

Cluster-scoped init script add-facl--to-.sh

Usage

dbutils.ini.setfacl_user_to_path_sh(
  user = dbutils.credentials.current_user(),
  path = dbutils.rlib.path(user = user),
  perms = "rwx"
)

Arguments

user

user to add

path

path for which to add user with given permissions

perms

desired permissions to grant

Examples

## Not run: 
## modify R library path permissions
dbutils.ini.setfacl_user_to_path_sh()

## modify home directory permissions
dbutils.ini.setfacl_user_to_path_sh(path = dbutils.home.path())

## End(Not run)

Write cluster-scoped init script

Description

Write cluster-scoped init script

Usage

dbutils.ini.write(
  x,
  name = attr(x, "name"),
  user = dbutils.credentials.current_user(),
  path = sprintf("/Workspace/Users/%s/cluster_init_scripts/%s", user, name)
)

Arguments

x

script contents to write

name

name of the script

user

current user

path

path to write script

Value

TRUE if successful

Examples

## Not run: 
## Restore R Library for Current User
dbutils.ini.write(dbutils.ini.restore_r_library_for_user_sh())

## Write .Rprofile(s)
### for Current User
dbutils.ini.write(dbutils.ini.write_rprofile_for_user_sh())
### for Root
dbutils.ini.write(dbutils.ini.write_rprofile_for_root_sh())

## Write .Renviron
kvps <- dplyr::tibble(key = "key_name", value = "key_value")
### for Current User
dbutils.ini.write(dbutils.ini.write_renviron_for_user_sh(kvps))
### for Root
dbutils.ini.write(dbutils.ini.write_renviron_for_root_sh(kvps))

## Databricks CLI
### for Current User
dbutils.ini.write(dbutils.ini.install_databricks_cli_for_user_sh())
### for Root
dbutils.ini.write(dbutils.ini.install_databricks_cli_for_root_sh())

## Install RStudio Server
dbutils.ini.write(dbutils.ini.install_rstudio_server_sh())

## Add Current User as Super User with Password
dbutils.ini.write(dbutils.ini.add_sudo_user_sh("rosebud"))

## Install ODBC Driver
dbutils.ini.write(dbutils.ini.install_odbc_driver_sh())

## Edit R Library Path Permissions for Current User
dbutils.ini.write(dbutils.ini.setfacl_user_to_path_sh())

## Install cTAKES
dbutils.ini.write(dbutils.ini.install_ctakes_sh())

## Install OpenAI
dbutils.ini.write(dbutils.ini.install_openai_sh())

## End(Not run)
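
The default path in the Usage section is built from the user name and the script's "name" attribute. The sketch below evaluates the same sprintf() call on a stand-in object. The script text and the file name hello.sh are hypothetical.

```r
# Stand-in for a script object; the "name" attribute is what the default
# `name = attr(x, "name")` reads.
script <- structure("#!/bin/bash\necho 'hello'", name = "hello.sh")

user <- "dborker"  # example user name used elsewhere in this manual
name <- attr(script, "name")

# Same format string as the documented default for `path`.
path <- sprintf("/Workspace/Users/%s/cluster_init_scripts/%s", user, name)
path
```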

Cluster-scoped init script write-Renviron-for-root.sh

Description

Cluster-scoped init script write-Renviron-for-root.sh

Usage

dbutils.ini.write_renviron_for_root_sh(key_values)

Arguments

key_values

dplyr::tibble() of key and value pairs.

Examples

## Not run: 
dbutils.ini.write_renviron_for_root_sh(
  dplyr::tibble(
    key = c("KEY_1", "KEY_2"),
    value = c("VALUE_1", "VALUE_2")
  )
)

## End(Not run)
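
An .Renviron file holds one KEY=VALUE pair per line. As a sketch of how a key/value table maps onto that layout, assuming nothing about the script the package actually generates, the lines can be built with sprintf(). The helper name renviron_lines_sketch is hypothetical, and a base data frame stands in for the tibble.

```r
# Hypothetical helper: format key/value pairs as .Renviron lines.
renviron_lines_sketch <- function(key_values) {
  sprintf("%s=%s", key_values$key, key_values$value)
}

# A base data frame with the same columns as the documented tibble.
kvps <- data.frame(
  key = c("KEY_1", "KEY_2"),
  value = c("VALUE_1", "VALUE_2"),
  stringsAsFactors = FALSE
)

renviron_lines <- renviron_lines_sketch(kvps)
writeLines(renviron_lines)
```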

Cluster-scoped init script write-Renviron-for-.sh

Description

Cluster-scoped init script write-Renviron-for-.sh

Usage

dbutils.ini.write_renviron_for_user_sh(
  key_values,
  user = dbutils.credentials.current_user()
)

Arguments

key_values

dplyr::tibble() of key and value pairs.

user

user for which to write .Renviron

Examples

## Not run: 
dbutils.ini.write_renviron_for_user_sh(
  dplyr::tibble(
    key = c("KEY_1", "KEY_2"),
    value = c("VALUE_1", "VALUE_2")
  )
)

## End(Not run)

Cluster-scoped init script write-Rprofile-for-root.sh

Description

Cluster-scoped init script write-Rprofile-for-root.sh

Usage

dbutils.ini.write_rprofile_for_root_sh(
  x = sprintf(".libPaths(c(\"%s\", .libPaths()))", dbutils.rlib.path("ephemeral",
    "file"))
)

Arguments

x

.Rprofile contents

Examples

## Not run: 
dbutils.ini.write_rprofile_for_root_sh()

## End(Not run)
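
The default x in the Usage section is a single line of R code produced by sprintf(). The sketch below evaluates the same format string with a hypothetical library path in place of dbutils.rlib.path(), to show what ends up in the .Rprofile.

```r
lib_path <- "/tmp/example-r-library"  # hypothetical path

# Same format string as the documented default for `x`.
rprofile_line <- sprintf(".libPaths(c(\"%s\", .libPaths()))", lib_path)
cat(rprofile_line, "\n")

# The generated line is itself valid R code.
parsed <- parse(text = rprofile_line)
```

When R runs that line at startup, the given library is placed first on the search path for packages.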

Cluster-scoped init script write-Rprofile-for-.sh

Description

Cluster-scoped init script write-Rprofile-for-.sh

Usage

dbutils.ini.write_rprofile_for_user_sh(
  user = dbutils.credentials.current_user(),
  x = sprintf(".libPaths(c(\"%s\", .libPaths()))", dbutils.rlib.path("ephemeral",
    "file", user))
)

Arguments

user

user for which to write .Rprofile

x

.Rprofile contents

Examples

## Not run: 
dbutils.ini.write_rprofile_for_user_sh()

## End(Not run)

List details of installed packages

Description

List details of installed packages

Usage

dbutils.rlib.details(libpath = .libPaths())

Arguments

libpath

path(s) to libraries

Value

a dplyr::tibble() containing package name, path, and version

Examples

## Not run: 
dbutils.rlib.details()

## End(Not run)
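
A comparable table can be assembled in base R from utils::installed.packages(). This is a sketch only: the helper name pkg_details_sketch and the column names are hypothetical, and it returns a plain data frame rather than a tibble.

```r
# Hypothetical helper: name, library path, and version of installed packages.
pkg_details_sketch <- function(libpath = .libPaths()) {
  pkgs <- utils::installed.packages(lib.loc = libpath)
  data.frame(
    name = unname(pkgs[, "Package"]),
    path = unname(pkgs[, "LibPath"]),
    version = unname(pkgs[, "Version"]),
    stringsAsFactors = FALSE
  )
}

details <- pkg_details_sketch()
head(details)
```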

Install the R Package Spark

Description

Install the R Package Spark

Usage

dbutils.rlib.install_spark(version = "3.3.1", ...)

Arguments

version

version to install

...

additional arguments passed to pak::pkg_install()

Examples

## Not run: 
dbutils.rlib.install_spark()

## End(Not run)

Library path

Description

Library path

Usage

dbutils.rlib.path(
  type = c("ephemeral", "persistent"),
  fmt = c("file", "spark"),
  user = dbutils.credentials.current_user()
)

Arguments

type

the type of path to generate

fmt

how to format the address

user

username of the current user

Value

path to the recommended ephemeral or persistent library locations

Examples

dbutils.rlib.path(user = "dborker")
dbutils.rlib.path(fmt = "spark", user = "dborker")
dbutils.rlib.path("persistent", user = "dborker")
dbutils.rlib.path("persistent", "spark", "dborker")

Persist R library

Description

Persist R library

Usage

dbutils.rlib.persist(
  ephemeral_path = dbutils.rlib.path("ephemeral", "spark"),
  persistent_path = dbutils.rlib.path("persistent", "spark")
)

Arguments

ephemeral_path

path to ephemeral library location

persistent_path

path to persistent library location

Value

TRUE if successful

Examples

## Not run: 
dbutils.rlib.persist()

## End(Not run)

Restore R library

Description

Restore R library

Usage

dbutils.rlib.restore(
  persistent_path = dbutils.rlib.path("persistent", "spark"),
  ephemeral_path = dbutils.rlib.path("ephemeral", "spark")
)

Arguments

persistent_path

path to persistent library location

ephemeral_path

path to ephemeral library location

Value

TRUE if successful

Examples

## Not run: 
dbutils.rlib.restore()

## End(Not run)

Set default library

Description

Set default library

Usage

dbutils.rlib.set_default(path = dbutils.rlib.path("ephemeral", "file"))

Arguments

path

path to desired library

Examples

## Not run: 
dbutils.rlib.set_default()

## End(Not run)

Load secrets

Description

Load secrets

Usage

dbutils.secrets.load(scope, select, pattern = "", replacement = "", ...)

Arguments

scope

name of the desired scope/key vault

select

character string containing a regular expression to be matched when selecting key vault secrets

pattern

character string containing a regular expression to be matched for replacement in the given key vault secret name

replacement

a replacement for matched pattern

...

further arguments passed to or from other methods

Note

Azure Key Vault does not allow _ in secret names, while - is not a valid character in syntactic R variable names. Therefore, each - in a Key Vault secret name is replaced with _ in the R name after the desired pattern replacement has occurred.
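
The naming rule in this note can be sketched in base R: select by regular expression, apply the optional pattern replacement, then swap - for _. The helper secret_r_names_sketch and the secret names are hypothetical. The sketch only derives names and does not read any secrets.

```r
# Hypothetical helper: derive R variable names from Key Vault secret names.
secret_r_names_sketch <- function(secret_names, select,
                                  pattern = "", replacement = "") {
  # Keep only the secrets whose names match `select`.
  kept <- grep(select, secret_names, value = TRUE)
  # Apply the optional pattern replacement first ...
  if (nzchar(pattern)) kept <- gsub(pattern, replacement, kept)
  # ... then swap "-" for "_" so the result is a syntactic R name.
  gsub("-", "_", kept, fixed = TRUE)
}

r_names <- secret_r_names_sketch(
  c("DATABRICKS-TOKEN", "BOX-USER", "OTHER"),  # made-up secret names
  select = "-",
  pattern = "^DATABRICKS-"
)
r_names
```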

Examples

## Not run: 
dbutils.secrets.load("wusm-prod-biostats-kv", ".+")

## End(Not run)

Load secrets from the Biostats Production Key Vault

Description

Load secrets from the Biostats Production Key Vault

Usage

dbutils.secrets.load.wusm_prod_biostats_kv(select = ".+", ...)

Arguments

select

character string containing a regular expression to be matched when selecting key vault secrets

...

further arguments passed to or from other methods

Examples

## Not run: 
dbutils.secrets.load.wusm_prod_biostats_kv()

## End(Not run)

Load secrets from the Databrokers Production Key Vault

Description

Load secrets from the Databrokers Production Key Vault

Usage

dbutils.secrets.load.wusm_prod_databrokers_kv(select = ".+", ...)

Arguments

select

character string containing a regular expression to be matched when selecting key vault secrets

...

further arguments passed to or from other methods

Examples

## Not run: 
dbutils.secrets.load.wusm_prod_databrokers_kv()

## End(Not run)

Retrieve secrets as a tibble

Description

Retrieve secrets as a tibble

Usage

dbutils.secrets.tbl_df(scope, select, pattern = "", replacement = "", ...)

Arguments

scope

name of the desired scope/key vault

select

character string containing a regular expression to be matched when selecting key vault secrets

pattern

character string containing a regular expression to be matched for replacement in the given key vault secret name

replacement

a replacement for matched pattern

...

further arguments passed to or from other methods

Examples

## Not run: 
dbutils.secrets.tbl_df("wusm-prod-biostats-kv", ".+")

## End(Not run)

Retrieve secrets as a tibble from the Biostats Production Key Vault

Description

Retrieve secrets as a tibble from the Biostats Production Key Vault

Usage

dbutils.secrets.tbl_df.wusm_prod_biostats_kv(select = ".+", ...)

Arguments

select

character string containing a regular expression to be matched when selecting key vault secrets

...

further arguments passed to or from other methods

Examples

## Not run: 
dbutils.secrets.tbl_df.wusm_prod_biostats_kv()

## End(Not run)

Retrieve secrets as a tibble from the Databrokers Production Key Vault

Description

Retrieve secrets as a tibble from the Databrokers Production Key Vault

Usage

dbutils.secrets.tbl_df.wusm_prod_databrokers_kv(select = ".+", ...)

Arguments

select

character string containing a regular expression to be matched when selecting key vault secrets

...

further arguments passed to or from other methods

Examples

## Not run: 
dbutils.secrets.tbl_df.wusm_prod_databrokers_kv()

## End(Not run)

Print an Init-Script

Description

Print an Init-Script

Usage

## S3 method for class 'init_script'
print(x, ...)

Arguments

x

object of class init_script

...

further arguments to be passed to or from methods
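
For readers unfamiliar with S3 print methods, the sketch below shows the general pattern on a stand-in class. The class name init_script_sketch is hypothetical, and what the package's own method prints is not shown in this manual.

```r
# Hypothetical class name, chosen to avoid clashing with wubik's own method.
print.init_script_sketch <- function(x, ...) {
  writeLines(unclass(x))  # show the script text without the class attribute
  invisible(x)            # print methods conventionally return x invisibly
}

script <- structure("#!/bin/bash\necho 'hello'", class = "init_script_sketch")

# print() dispatches on the class and calls the method above.
printed <- capture.output(print(script))
printed
```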


RStudio Server URL

Description

RStudio Server URL

Usage

rstudio_server_url(
  display_html = TRUE,
  host = Sys.getenv("DATABRICKS_HOST"),
  org_id = Sys.getenv("DATABRICKS_ORG_ID"),
  cluster_id = Sys.getenv("DATABRICKS_CLUSTER_ID"),
  port = 8787
)

Arguments

display_html

print as clickable link or just the url as text

host

the databricks host

org_id

the organization id

cluster_id

the cluster id

port

port for the server

Examples

rstudio_server_url(
  display_html = FALSE,
  host = "adb-1234567812345678.12.azuredatabricks.net",
  org_id = "1234567890123456",
  cluster_id = "1234-123456-1234abcd"
)
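
The arguments correspond to the parts of a Databricks driver-proxy address. The sketch below assumes the layout https://host/driver-proxy/o/org_id/cluster_id/port/ and is not taken from the package source. The exact string returned by rstudio_server_url() may differ, and the helper name proxy_url_sketch is hypothetical.

```r
# Hypothetical helper: compose a driver-proxy style URL from its parts.
proxy_url_sketch <- function(host, org_id, cluster_id, port = 8787) {
  sprintf(
    "https://%s/driver-proxy/o/%s/%s/%d/",
    host, org_id, cluster_id, as.integer(port)
  )
}

url <- proxy_url_sketch(
  host = "adb-1234567812345678.12.azuredatabricks.net",
  org_id = "1234567890123456",
  cluster_id = "1234-123456-1234abcd"
)
url
```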

SQL Query

Description

SQL Query

Usage

sql(sqlQuery)

Arguments

sqlQuery

character string of sql syntax

See SparkR::sql for details.


Install wubik

Description

Install wubik

Usage

wubik_install(pkg = "the-mad-statter/wubik", ...)

Arguments

pkg

wubik github reference

...

additional arguments passed to pak::pkg_install()

Value

(Invisibly) A data frame with information about the installed package(s).

Examples

## Not run: 
wubik_install()

## End(Not run)