| Title: | Simultaneous Confidence Region Excursion Sets |
|---|---|
| Description: | Provides computational tools for estimating inverse regions and constructing the corresponding simultaneous outer and inner confidence regions. Acceptable input includes both one-dimensional and two-dimensional data for linear, logistic, functional, and spatial generalized least squares regression models. Functions are also available for constructing simultaneous confidence bands (SCBs) for these models. The definition of simultaneous confidence regions (SCRs) follows Sommerfeld et al. (2018) <doi:10.1080/01621459.2017.1341838>. Methods for estimating inverse regions, SCRs, and the nonparametric bootstrap are based on Ren et al. (2024) <doi:10.1093/jrsssc/qlae027>. Methods for constructing SCBs are described in Crainiceanu et al. (2024) <doi:10.1201/9781003278726> and Telschow et al. (2022) <doi:10.1016/j.jspi.2021.05.008>. |
| Authors: | Zhuoran Yu [aut, cre], Armin Schwartzman [aut], Junting Ren [aut], Julia Wrobel [aut] |
| Maintainer: | Zhuoran Yu <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-17 07:21:23 UTC |
| Source: | https://github.com/angelayustat/scores |
A spatial dataset containing historical (1971-1999) and future (2041-2069) mean summer (June–August) surface temperatures over North America, used for evaluating increase of mean summer temperature between the 20th and 21st centuries in North America, and constructing simultaneous confidence bands via generalized least squares (GLS) modeling.
data(climate_data)data(climate_data)
A list with the following components:
ZA list containing spatial data with three components:
x (longitude), y (latitude), and obs, a 3D array of observations with dimensions [lon, lat, n].
The first na slices of z come from mean summer temperature (June-August) in North America recorded from 1971 to 1999,
and the last nb slices contain mean summer temperature from 2041 to 2069.
maskA logical or numeric matrix of dimensions length(lon) × length(lat).
Values are set to 1 for land and NA elsewhere based on the elevation matrix orog > 0.
XA numeric design matrix with na + nb rows and 4 columns, constructed for generalized least squares (GLS) regression.
The rows correspond to spatial replicates from na current years and nb future years.
The columns are:
X1: Group indicator (0 for current years (1971-1999), 1 for future years (2041-2069))
X2: Intercept
X3: Centered time variable ta for current years (1971-1999) (0 for future years (2041-2069))
X4: Centered time variable tb for future years (2041-2069) (0 for current years (1971-1999))
correlationA character string set to "corAR1", indicating that an autoregressive correlation structure of order 1 (AR(1))
is used for GLS fitting.
The data are arranged on a regular longitude–latitude grid, with spatial masking for land-only analysis. AR(1) correlation structure is assumed for statistical modeling.
Processed from data-raw/climate_data.R using the readr package.
Sommerfeld, M., Sain, S., & Schwartzman, A. (2018). Confidence regions for spatial excursion sets from repeated random field observations, with an application to climate. Journal of the American Statistical Association, 113(523), 1327–1340. doi:10.1080/01621459.2017.1341838
This function computes Correlation and Multiplicity Adjusted (CMA) confidence bands for a specified group in a functional outcome regression model using parameter simulations approach with Gaussian multiplier bootstrap.
cma( data_df, object, fitted = TRUE, alpha = 0.05, outcome, domain, subset = NULL, id, nboot = NULL )cma( data_df, object, fitted = TRUE, alpha = 0.05, outcome, domain, subset = NULL, id, nboot = NULL )
data_df |
A functional data frame that contains both name and values for
variables including functional outcome, domain (e.g. time) and ID (e.g. subject names)
used to fit the model |
object |
A fitted Function-on-Scalar Regression (FoSR) object (e.g., from mgcv::gam()/mgcv::bam()). |
fitted |
Logical. Whether to estimate the simultaneous confidence bands for the fitted mean function or the fitted parameter function
Default is |
alpha |
Significance level for SCB. Default is 0.05. |
outcome |
A character string specifying the name of the outcome variable used in the model. |
domain |
A character string specifying the name of the domain variable (e.g. time) used in the model. |
subset |
An atomic character vector (e.g., c("user = 1", "age = 30"))
specifying the target function for constructing the SCB.
Each element must be of the form |
id |
A character string specifying the name of the ID variable. |
nboot |
An integer specifying the number of bootstrap samples used to construct the confidence bands. Default is 10,000. |
A list containing:
mu_hat |
Estimated mean function for the group of interest. |
domain |
The domain used. |
se_hat |
Standard errors of the estimated means. |
scb_low |
Lower bound of the simultaneous confidence band. |
scb_up |
Upper bound of the simultaneous confidence band. |
type |
A character description of the output type. |
Crainiceanu, C. M., Goldsmith, J., Leroux, A., & Cui, E. (2024). Functional Data Analysis with R. Chapman and Hall/CRC.
# example using pupil data if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re")+ s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) results <- cma(pupil_fpca, fosr_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") results <- cma(pupil, mean_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id", nboot = 100) }# example using pupil data if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re")+ s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) results <- cma(pupil_fpca, fosr_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") results <- cma(pupil, mean_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id", nboot = 100) }
Computes the inverse logit transformation.
expit(x)expit(x)
x |
A numeric input. |
Value between 0 and 1.
expit(0) # returns 0.5 expit(c(-2, 0, 2))expit(0) # returns 0.5 expit(c(-2, 0, 2))
The dataset ipad contains tablet-based task performance measures, pupillography
features, blood cannabinoid metabolite concentrations, and cardiovascular
measures collected 40 minutes after smoking (or after a rest period for
controls). Each row is one participant at this timepoint. The identifier
id has been converted to a factor, and the data have been filtered to
timept = 2 only.
data(ipad)data(ipad)
A tibble, one row per participant at timepoint 2.
Variables are grouped below.
idParticipant identifier (factor).
timeptTimepoint indicator (fixed at 2 = 40 minutes).
use_groupParticipant use group: 1 = Daily, 2 = Occasional, 3 = No Use.
recent_smokeRecent use at this timepoint: 0 = No Use, 1 = Use.
t_thc, t_thc_oh, t_thc_cooh, t_thc_gluc,
t_thc_cooh_gluc, t_cbg, t_cbd, t_cbn,
t_mmr1, t_mmr2.
p_fpc1–p_fpc6 (functional pupil components 1–6);
p_PMC_pctChg (percent change at point of minimum constriction);
p_auc (AUC of the pupillary constriction curve).
i_prop_false_timeout,
i_prop_failed1, i_prop_failed2,
i_judgement_time1, i_judgement_time2,
i_time_outside_reticle, i_time_on_edge,
i_prop_hit, i_correct_reaction2,
i_reaction_time_max2, i_reaction_time2,
i_rep_shapes12, i_rep_shapes34,
i_memory_time12, i_memory_time34,
i_composite_score.
h_hr (heart rate), h_dbp (diastolic blood pressure),
h_sbp (systolic blood pressure).
Participants completed an iPad test assessing reaction time, decision making,
working memory, and spatial-motor performance before and after cannabis use
(or a rest period for controls). This dataset retains the post (40-minute)
measurements only. Consider converting use_group and
recent_smoke to factors with informative labels for modeling/plotting.
Units for biochemical and physiological variables follow the original source.
Smith, S. J., Wrobel, J., Brooks-Russell, A., Kosnett, M. J., & Sammel, M. D. (2023). A Latent Variable Analysis of Psychomotor and Neurocognitive Performance After Acute Cannabis Smoking. Cannabis (Albuquerque, N.M.), 6(2), 123–132. doi:10.26828/cannabis/2023/000156
data(ipad)data(ipad)
This function is an internal function for constructing SCBs for functional data.
mean_response_predict( data_df, object, fitted = TRUE, outcome, domain, subset = NULL, id )mean_response_predict( data_df, object, fitted = TRUE, outcome, domain, subset = NULL, id )
data_df |
A functional data frame that contains both name and values for
variables including functional outcome, domain (e.g. time) and ID (e.g. subject names)
used to fit the model |
object |
A fitted Function-on-Scalar Regression (FoSR) model object (e.g., from mgcv::gam()/mgcv::bam()). |
fitted |
Logical. Whether to estimate the simultaneous confidence bands for the fitted mean function or the fitted parameter function
Default is |
outcome |
A character string specifying the name of the outcome variable used in the model. |
domain |
A character string specifying the name of the domain variable (e.g. time) used in the model. |
subset |
An atomic character vector (e.g., c("user = 1", "age = 30"))
specifying the target function for constructing the SCB.
Each element must be of the form |
id |
A character string specifying the name of the ID variable. |
A list containing the following elements:
Numeric vector of sorted unique domain used for prediction
Data frame with prediction results, containing:
mean: Predicted mean values
se: Standard errors
Linear predictor matrix (design matrix) used for confidence interval calculations
Vector of model coefficients for selected group
Variance-covariance matrix corresponding to the selected group coefficients
Crainiceanu, C. M., Goldsmith, J., Leroux, A., & Cui, E. (2024). Functional Data Analysis with R. Chapman and Hall/CRC.
if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re") + s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) results <- mean_response_predict(pupil_fpca, fosr_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") results <- mean_response_predict(pupil, mean_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") }if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re") + s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) results <- mean_response_predict(pupil_fpca, fosr_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") results <- mean_response_predict(pupil, mean_mod, fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") }
Visualizes simultaneous confidence regions of upper and lower excursion sets for discrete, 1D or 2D data, using contour or band plots. Supports plotting confidence regions at multiple levels and labeling contours.
plot_cs( SCB, levels, type = "upper", x, y = NULL, mu_hat = NULL, mu_true = NULL, together = TRUE, xlab = "X1", ylab = "X2", level_label = TRUE, min.size = 5, palette = "gray", color_level_label = "black" )plot_cs( SCB, levels, type = "upper", x, y = NULL, mu_hat = NULL, mu_true = NULL, together = TRUE, xlab = "X1", ylab = "X2", level_label = TRUE, min.size = 5, palette = "gray", color_level_label = "black" )
SCB |
A numeric list returned by |
levels |
A numeric vector or list of scalers for different levels or matrix
containing interval sets to construct the confidence regions.
If |
type |
A character specifying the type of inverse sets to fit.
Choices are |
x |
A numerical vector of x-axis coordinates for 1D and 2D cases.
For discrete coordinates, use a character vector.
The order of x should correspond to the order of |
y |
Optional vector of y-axis coordinates for 2D data. |
mu_hat |
A numeric array (1D) or matrix (2D) of estimated means.
If |
mu_true |
Optional numeric array (1D) or matrix (2D) of true means,
which overrides |
together |
Optional logical value for plotting option.
If |
xlab |
Optional character for the label of the x-axis. Default is |
ylab |
Optional character for the label of the y-axis. Default is |
level_label |
Optional logical input for level displaying option.
If |
min.size |
Optional logical input for minimum number of points
required for a contour to be labeled. Default is |
palette |
Optional character value for the name of the HCL color palette
to use when plotting multiple levels together. Default is |
color_level_label |
Optional character value for the color used
for contour level labels. Default is |
A ggplot2 object that includes both simultaneous confidence intervals
and simultaneous confidence region of excursion sets corresponding to levels assigned.
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
if (requireNamespace("mgcv", quietly = TRUE)) { # example using pupil data data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re") + s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) pupil_multiplier <- SCB_functional_outcome(data = pupil_fpca, object = fosr_mod, method = "multiplier", outcome = "percent_change", domain = "seconds", subset= c("use = 1"), id = "id") pupil_multiplier <- tibble::as_tibble(pupil_multiplier) plot_cs(pupil_multiplier,levels = c(-18), x = pupil_multiplier$domain, mu_hat = pupil_multiplier$mu_hat, xlab = "", ylab = "", level_label = T, min.size = 40, palette = "Spectral", color_level_label = "black") } x <- rnorm(50) epsilon <- rnorm(50,0,sqrt(2)) y <- -1 + x + epsilon df <- data.frame(x = x, y = y) grid <- data.frame(x = seq(-1, 1, length.out = 50)) model <- "y ~ x" results <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid) results <- tibble::as_tibble(results) plot_cs(results, levels = c(0), x = seq(-1, 1, length.out = 50), mu_hat = results$Mean, xlab = "x1", ylab = "y", level_label = T, min.size = 40, palette = "Spectral", color_level_label = "black")if (requireNamespace("mgcv", quietly = TRUE)) { # example using pupil data data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re") + s(id, by = Phi3, bs="re") + s(id, by = Phi4, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) pupil_multiplier <- SCB_functional_outcome(data = pupil_fpca, object = fosr_mod, method = "multiplier", outcome = "percent_change", domain = "seconds", subset= c("use = 1"), id = "id") pupil_multiplier <- tibble::as_tibble(pupil_multiplier) plot_cs(pupil_multiplier,levels = c(-18), x = pupil_multiplier$domain, mu_hat = pupil_multiplier$mu_hat, xlab = "", ylab = "", level_label = T, min.size = 40, palette = "Spectral", color_level_label = "black") } x <- rnorm(50) epsilon <- rnorm(50,0,sqrt(2)) y <- -1 + x + epsilon df <- data.frame(x = x, y = y) grid <- data.frame(x = seq(-1, 1, length.out = 50)) model <- "y ~ x" results <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid) results <- tibble::as_tibble(results) plot_cs(results, levels = c(0), x = seq(-1, 1, length.out = 50), mu_hat = results$Mean, xlab = "x1", ylab = "y", level_label = T, min.size = 40, palette = "Spectral", color_level_label = "black")
Processes data by fitting a mean GAM model, extracting residuals, performing FPCA, and merging the results to create an enhanced dataset for functional regression analysis.
prepare_pupil_fpca(input_data, k_mean = 30, k_fpca = 15, example = "original")prepare_pupil_fpca(input_data, k_mean = 30, k_fpca = 15, example = "original")
input_data |
Raw pupil data |
k_mean |
Number of basis functions for mean model smooth terms (default: 30) |
k_fpca |
Number of knots for FPCA estimation (default: 15) |
example |
Choice for different model. If |
A tibble containing:
Original pupil variables
FPCA eigenfunctions (Phi1, Phi2,...)
Sorted by ID and domain
if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) processed_data <- prepare_pupil_fpca(pupil) processed_data <- prepare_pupil_fpca(pupil, k_mean = 5, k_fpca = 5) }if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) processed_data <- prepare_pupil_fpca(pupil) processed_data <- prepare_pupil_fpca(pupil, k_mean = 5, k_fpca = 5) }
Dataset contains functional observation of pupil size percent change after a light stimulus. Participants in the cannabis use group smoked cannabis flower or concentrate 40 minutes prior to the pupillometry measurement. Goal of this data is to understand differences in pupil response to light driven by acute cannabis users. Measurements were collected on the right eye.
data(pupil)data(pupil)
A tibble with 15000 rows and 10 variables:
Factor. Subject identifier (127 unique levels).
Character. Original usage group classification (e.g., "Daily - Flower", "No Use").
Numeric. Binary indicator of cannabis use 40 minute prior to the light stimulus. (1 = user, 0 = non-user)
Integer. Subject's age.
Numeric. Binary indicator of subject's gender: 1 = Female, 0 = Male.
Numeric. Body Mass Index.
Numeric. Alcohol use score.
Numeric. Time in seconds since light stimulus.
Numeric. Percent change relative to baseline.
Numeric. Percent change in the outcome of interest.
Processed from data-raw/pupil_load.R using the readr and dplyr packages.
Godbole, S., Leroux, A., Brooks-Russell, A., Subramanian, P. S., Kosnett, M. J., & Wrobel, J. (2024). A Study of Pupil Response to Light as a Digital Biomarker of Recent Cannabis Use. Digital biomarkers, 8(1), 83–92. doi:10.1159/000538561
This function builds simultaneous confidence bands through parametric and bootstrap approaches.
SCB_functional_outcome( data_df, object = NULL, method, fitted = TRUE, alpha = 0.05, outcome, domain, subset = NULL, id, nboot = NULL, method_SD = "t", weights = "rademacher" )SCB_functional_outcome( data_df, object = NULL, method, fitted = TRUE, alpha = 0.05, outcome, domain, subset = NULL, id, nboot = NULL, method_SD = "t", weights = "rademacher" )
data_df |
A functional data frame that contains both name and values for
variables including functional outcome, domain (e.g. time) and ID (e.g. subject names)
used to fit the model |
object |
A fitted Function-on-Scalar Regression (FoSR) object
(e.g., from mgcv::gam()/mgcv::bam()). Default is |
method |
A character string specifying the approach:
|
fitted |
Logical. Whether to estimate the simultaneous confidence bands for the fitted mean function or the fitted parameter function
Default is |
alpha |
Significance level for SCB. Default is 0.05. |
outcome |
A character string specifying the name of the outcome variable used in the model. |
domain |
A character string specifying the name of the domain variable (e.g. time) used in the model. |
subset |
An atomic character vector (e.g., c("user = 1", "age = 30"))
specifying the target function for constructing the SCB.
Each element must be of the form |
id |
A character string specifying the name of the ID variable. |
nboot |
An integer specifying the number of bootstrap samples used to construct the confidence bands. Default is 10,000 for cma, 5000 for Multiplier Bootstrap. |
method_SD |
Method for SD estimation: "t" or "regular". Default is "t". |
weights |
Multiplier type: "rademacher", "gaussian", or "mammen". Default is "rademacher". |
A list containing:
Estimated mean function for the group of interest.
The domain used.
Standard errors of the estimated means.
Lower bound of the simultaneous confidence band.
Upper bound of the simultaneous confidence band.
A character description of the output type.
# example using pupil data if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) # CMA approach results <- SCB_functional_outcome(data_df = pupil, object = fosr_mod, method = "cma", fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") # multiplier bootstrap results <- SCB_functional_outcome(data_df = pupil, object = fosr_mod, method = "multiplier", fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") # multiplier bootstrap pupil_multiplier <- SCB_functional_outcome(data = pupil, object = mean_mod, method = "multiplier", outcome = "percent_change", domain = "seconds", subset= c("use = 1"), id = "id") }# example using pupil data if (requireNamespace("mgcv", quietly = TRUE)) { data(pupil) pupil_fpca <- prepare_pupil_fpca(pupil) fosr_mod <- mgcv::bam(percent_change ~ s(seconds, k=30, bs="cr") + s(seconds, by = use, k=30, bs = "cr") + s(id, by = Phi1, bs="re") + s(id, by = Phi2, bs="re"), method = "fREML", data = pupil_fpca, discrete = TRUE) # CMA approach results <- SCB_functional_outcome(data_df = pupil, object = fosr_mod, method = "cma", fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") # multiplier bootstrap results <- SCB_functional_outcome(data_df = pupil, object = fosr_mod, method = "multiplier", fitted = TRUE, outcome = "percent_change", domain = "seconds", subset = c("use = 1"), id = "id") mean_mod <- mgcv::gam(percent_change ~ s(seconds, k = 5, bs = "cr") + s(seconds, by = use, k = 5, bs = "cr"), data = pupil, method = "REML") # multiplier bootstrap pupil_multiplier <- SCB_functional_outcome(data = pupil, object = mean_mod, method = "multiplier", outcome = "percent_change", domain = "seconds", subset= c("use = 1"), id = "id") }
Construct Simultaneous Confidence Bands for a Spatial Generalized Least Squares Model
SCB_gls_geospatial( sp_list, level = NULL, data_fit = NULL, w = NULL, correlation = NULL, corpar = NULL, groups = NULL, V = NULL, alpha = 0.1, N = 1000, mask = NULL )SCB_gls_geospatial( sp_list, level = NULL, data_fit = NULL, w = NULL, correlation = NULL, corpar = NULL, groups = NULL, V = NULL, alpha = 0.1, N = 1000, mask = NULL )
sp_list |
A list containing the spatial coordinates and the observations. Should include the following components:
|
level |
A optional numeric threshold value used to test whether the estimated mean surface significantly deviates from it. Default is NULL. |
data_fit |
A design matrix used to fit the generalized least squares
(GLS) model. Each row corresponds to an observation, and each column to a covariate
to be included in the model. Outcome/observation should not be included.
The first column is typically an intercept column,
which will contain only 1s, if an intercept is included in the model.
Categorical variables in |
w |
A numeric vector specifying the target function for constructing the SCB,
by giving a linear combination of the regression coefficients in the GLS model.
Default is |
correlation |
A character string specifying the name of
the correlation structure (e.g., |
corpar |
A list containing parameters to be passed to
the correlation structure function specified in |
groups |
A vector of group identifiers used to define
the within-group correlation structure (e.g., repeated measures, time blocks).
If not specified, defaults to |
V |
An optional array of known covariance matrices of
shape |
alpha |
A numerical value specifying the confidence level
for the Simultaneous Confidence Bands. Defalut is |
N |
An integer specifying the number of bootstrap samples to
construct the Simultaneous Confidence Bands. Default is |
mask |
An optional logical matrix same dimensions as
|
A list containing the following components:
scb_upA matrix of upper bounds for
the simultaneous confidence bands at each spatial location
corresponding to the target function specified by w.
scb_lowA matrix of lower bounds for
the simultaneous confidence bands at each spatial location
corresponding to the target function specified by w.
mu_hatA matrix of estimated mean values at each spatial location
corresponding to the target function specified by w.
norm_estA matrix of standardized test statistics (mu_hat - level) / SE.
thresThe bootstrap threshold used to construct the confidence bands.
xThe vector of x-coordinates corresponding to the columns of the spatial grid.
yThe vector of y-coordinates corresponding to the rows of the spatial grid.
Sommerfeld, M., Sain, S., & Schwartzman, A. (2018). Confidence regions for spatial excursion sets from repeated random field observations, with an application to climate. Journal of the American Statistical Association, 113(523), 1327–1340. doi:10.1080/01621459.2017.1341838
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
data(climate_data) # Construct confidence sets for the increase of the mean temperature (June-August) # in North America between the 20th and 21st centuries temp = SCB_gls_geospatial(sp_list = climate_data$Z, level = 2, data_fit = climate_data$X, w = c(1,0,0,0), correlation = climate_data$correlation, mask = climate_data$mask, alpha = 0.1) example_list <- list(x = climate_data$Z$x[50:60], y = climate_data$Z$y[40:50], obs = climate_data$Z$obs[50:60, 40:50,]) temp = SCB_gls_geospatial(sp_list = example_list, level = 2, data_fit = climate_data$X, w = c(1,0,0,0), correlation = NULL, mask = NULL, alpha = 0.1, N = 50)data(climate_data) # Construct confidence sets for the increase of the mean temperature (June-August) # in North America between the 20th and 21st centuries temp = SCB_gls_geospatial(sp_list = climate_data$Z, level = 2, data_fit = climate_data$X, w = c(1,0,0,0), correlation = climate_data$correlation, mask = climate_data$mask, alpha = 0.1) example_list <- list(x = climate_data$Z$x[50:60], y = climate_data$Z$y[40:50], obs = climate_data$Z$obs[50:60, 40:50,]) temp = SCB_gls_geospatial(sp_list = example_list, level = 2, data_fit = climate_data$X, w = c(1,0,0,0), correlation = NULL, mask = NULL, alpha = 0.1, N = 50)
This function fits a linear model and constructs simultaneous confidence bands (SCB) using a non-parametric bootstrap method for the mean outcome of regression on a fixed test set design matrix
SCB_linear_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05, grid_df_boot = NULL )SCB_linear_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05, grid_df_boot = NULL )
df_fit |
A data frame containing the training design matrix used to fit the linear model. Acceptable input format includes numeric and factor. |
model |
A character string representing the formula for the linear model
(e.g., |
grid_df |
A data frame specifying the covariate settings that define the
mean outcome for which simultaneous confidence bands (SCB) are constructed.
Each row represents one covariate combination at which predictions and
SCBs are evaluated. Column names should match variables in the fitted model,
but |
n_boot |
Number of bootstrap samples used in the non-parametric bootstrap procedure to generate the empirical distribution. Default is 1000. |
alpha |
Significance level for the confidence band (e.g., 0.05 for 95% confidence). Default is 0.05. |
grid_df_boot |
An optional data frame specifying the input grid at which
predictions are evaluated during bootstrap resampling.
This allows SCBs to be constructed on a denser set of covariate values
if desired. If NULL, uses |
A data frame with the following columns:
Lower bound of the simultaneous confidence band.
Predicted mean response from the fitted model.
Upper bound of the simultaneous confidence band.
All columns from grid_df, representing the prediction grid.
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
set.seed(262) x1 <- rnorm(100) epsilon <- rnorm(100,0,sqrt(2)) y <- -1 + x1 + epsilon df <- data.frame(x1 = x1, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100)) model <- "y ~ x1" results <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100)set.seed(262) x1 <- rnorm(100) epsilon <- rnorm(100,0,sqrt(2)) y <- -1 + x1 + epsilon df <- data.frame(x1 = x1, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100)) model <- "y ~ x1" results <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100)
This function fits a logistic regression model and constructs simultaneous confidence bands (SCB) using a non-parametric bootstrap method for the mean outcome of regression on a fixed test set design matrix
SCB_logistic_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05 )SCB_logistic_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05 )
df_fit |
A data frame containing the training design matrix used to fit the logistic model. |
model |
A character string representing the formula for the logistic model (e.g., |
grid_df |
A data frame specifying the covariate settings that define the
mean outcome for which simultaneous confidence bands (SCB) are constructed.
Each row represents one covariate combination at which predictions and
SCBs are evaluated. Column names should match variables in the fitted model,
but |
n_boot |
Number of bootstrap samples used in the non-parametric bootstrap procedure to generate the empirical distribution. Default is 1000. |
alpha |
Significance level for the confidence band (e.g., 0.05 for 95% confidence). Default is 0.05. |
A data frame with the following columns:
Lower bound of the simultaneous confidence band.
Predicted mean response from the fitted model.
Upper bound of the simultaneous confidence band.
All columns from grid_df, representing the prediction grid.
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
set.seed(262) x1 <- rnorm(100) mu <- -1 + x1 p <- expit(mu) y <- rbinom(100, size = 1, prob = p) df <- data.frame(x1 = x1, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100)) model <- "y ~ x1" results <- SCB_logistic_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100)set.seed(262) x1 <- rnorm(100) mu <- -1 + x1 p <- expit(mu) y <- rbinom(100, size = 1, prob = p) df <- data.frame(x1 = x1, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100)) model <- "y ~ x1" results <- SCB_logistic_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100)
This function fits either a linear or logistic regression model and computes simultaneous confidence bands (SCBs) for the model coefficients using a non-parametric bootstrap procedure.
SCB_regression_coef( df_fit, model, n_boot = 5000, alpha = 0.05, type = "linear" )SCB_regression_coef( df_fit, model, n_boot = 5000, alpha = 0.05, type = "linear" )
df_fit |
A data frame containing the design matrix and response variable used to fit the model. |
model |
A character string specifying the regression formula (e.g., |
n_boot |
Integer. Number of bootstrap samples to use for constructing the SCBs. Default is 5000. |
alpha |
Numeric. Significance level for the confidence bands (e.g., 0.05 for 95% SCBs). Default is 0.05. |
type |
A character string specifying the model type. Either |
A data frame with the following columns:
Lower bound of the simultaneous confidence band. The first row corresponds to the intercept, and subsequent rows correspond to regression coefficients.
Estimated values. The first element is the intercept estimate, and the remaining are coefficient estimates.
Upper bound of the simultaneous confidence band. The first row corresponds to the intercept, and subsequent rows correspond to regression coefficients.
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
library(MASS) set.seed(262) M <- 5 rho <- 0.4 n <- 100 beta <- rnorm(M, mean = 0, sd = 1) Sigma <- outer(1:M, 1:M, function(i, j) rho^abs(i - j)) X <- MASS::mvrnorm(n = n, mu = rep(0, M), Sigma = Sigma) epsilon <- rnorm(n, mean = 0, sd = 1) y <- X %*% beta + epsilon df <- as.data.frame(X) names(df) <- paste0("x", 1:M) df$y <- as.vector(y) model <- "y ~ ." results <- SCB_regression_coef(df, model, n_boot = 100)library(MASS) set.seed(262) M <- 5 rho <- 0.4 n <- 100 beta <- rnorm(M, mean = 0, sd = 1) Sigma <- outer(1:M, 1:M, function(i, j) rho^abs(i - j)) X <- MASS::mvrnorm(n = n, mu = rep(0, M), Sigma = Sigma) epsilon <- rnorm(n, mean = 0, sd = 1) y <- X %*% beta + epsilon df <- as.data.frame(X) names(df) <- paste0("x", 1:M) df$y <- as.vector(y) model <- "y ~ ." results <- SCB_regression_coef(df, model, n_boot = 100)
This function fits a user-specified regression model (linear or logistic), and constructs simultaneous confidence bands (SCBs) for either the mean outcome or the regression coefficients. SCBs are obtained using a nonparametric bootstrap procedure evaluated on a fixed test design matrix, providing simultaneous inference across the entire range of covariates or parameters of interest.
SCB_regression_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05, grid_df_boot = NULL, type = "linear", fitted = TRUE, w = NULL )SCB_regression_outcome( df_fit, model, grid_df = NULL, n_boot = 1000, alpha = 0.05, grid_df_boot = NULL, type = "linear", fitted = TRUE, w = NULL )
df_fit |
A data frame containing the training design matrix used to fit the linear model. Acceptable input format includes numeric and factor. |
model |
A character string representing the formula for the linear model
(e.g., |
grid_df |
A data frame specifying the covariate settings that define the
mean outcome or linear combination for which simultaneous confidence bands (SCB)
are constructed.
Each row represents one covariate combination at which predictions and
SCBs are evaluated. Column names should match variables in the fitted model,
but |
n_boot |
Number of bootstrap samples used in the non-parametric bootstrap procedure to generate the empirical distribution. Default is 1000. |
alpha |
Significance level for the confidence band (e.g., 0.05 for 95% confidence). Default is 0.05. |
grid_df_boot |
An optional data frame specifying the input grid at which
predictions are evaluated during bootstrap resampling.
This allows SCBs to be constructed on a denser set of covariate values
if desired. If NULL, uses |
type |
A character string specifying the model type. Either |
fitted |
Logical. Whether to estimate the simultaneous confidence bands for regression outcomes or coefficients.
Default is |
w |
A numeric matrix that specifies the linear combinations of regression coefficients
for which simultaneous confidence bands (SCBs) are to be constructed.
The number of columns should be equal to the number of coefficients in the
regression model fitted.
Default is NULL, will return SCBs for all coefficients and the intercept.
This argument is only for |
A data frame with the following columns:
Lower bound of the simultaneous confidence band.
Predicted mean response from the fitted model.
Upper bound of the simultaneous confidence band.
All columns from grid_df, representing the prediction grid.
Optional, if fitted = TRUE
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
set.seed(262) x1 <- rnorm(100) x2 <- rnorm(100) epsilon <- rnorm(100,0,sqrt(2)) y <- -1 + x1 + x2 + epsilon df <- data.frame(x1 = x1, x2 = x2, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100), x2 = seq(-1, 1, length.out = 100)) model <- "y ~ x1 + x2" w <- matrix(c(0, 1, 1), ncol = 3) results <- SCB_regression_outcome(df_fit = df, model = model, grid_df = grid, w = w, n_boot = 100)set.seed(262) x1 <- rnorm(100) x2 <- rnorm(100) epsilon <- rnorm(100,0,sqrt(2)) y <- -1 + x1 + x2 + epsilon df <- data.frame(x1 = x1, x2 = x2, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100), x2 = seq(-1, 1, length.out = 100)) model <- "y ~ x1 + x2" w <- matrix(c(0, 1, 1), ncol = 3) results <- SCB_regression_outcome(df_fit = df, model = model, grid_df = grid, w = w, n_boot = 100)
This function constructs simultaneous confidence regions (SCRs) for upper and lower excursion sets, and interval sets from simultaneous confidence bands (SCBs). It allows estimation of inner and outer confidence region under single or multiple thresholds. Visualization of the confidence region is also included, along with a containment check for the coverage of true or estimated functions.
scb_to_cs( scb_up, scb_low, levels, true_mean = NULL, est_mean = NULL, x1 = NULL, x2 = NULL, type = "upper", return_contain_only = FALSE, return_plot = FALSE, xlab = NULL, ylab = NULL )scb_to_cs( scb_up, scb_low, levels, true_mean = NULL, est_mean = NULL, x1 = NULL, x2 = NULL, type = "upper", return_contain_only = FALSE, return_plot = FALSE, xlab = NULL, ylab = NULL )
scb_up |
A numeric vector (1D) or matrix (2D) containing the upper simultaneous confidence interval. |
scb_low |
A numeric vector (1D) or matrix (2D) containing
the lower bounds of the simultaneous confidence bands.
Dimensions of |
levels |
A numeric vector or list of scalers for different levels or matrix
containing interval sets to construct the confidence sets.
If |
true_mean |
Optional matrix of the true mean function.
Should have the same dimension as |
est_mean |
Optional matrix of the estimated mean function,
used for plotting if |
x1 |
A numeric vector of coordinates for the first dimension used
for plotting the inner and outer confidence region. Default is NULL.
Dimension of |
x2 |
A numeric vector of coordinates for the second dimension used
for plotting inner and outer confidence region. Default is NULL.
Dimension of |
type |
A character string specifying the type of inverse set to construct
if levels are not a matrix. Choices are |
return_contain_only |
Logical. If |
return_plot |
Logical. If |
xlab |
A character for the name of the x axis used for plotting the inner and outer confidence region. Default is NULL. |
ylab |
A character for the name of the y axis used for plotting the inner and outer confidence region. Default is NULL. |
A list containing the following components:
A vector (or list) of threshold levels used to define the
confidence sets. Same as the input levels.
(Optional) A list of logical matrices indicating whether each
point is within the simultaneous inner confidence set for each level.
Returned only when return_contain_only = FALSE and type != "two-sided".
(Optional) A list of logical matrices indicating whether each
point is within the simultaneous outer confidence set for each level.
Returned only when return_contain_only = FALSE and type != "two-sided".
(Two-sided only) A list of logical matrices
indicating lower bound containment (for type = "two-sided" and return_contain_only = FALSE).
(Two-sided only) A list of logical matrices
indicating upper bound containment (for type = "two-sided" and return_contain_only = FALSE).
A logical vector indicating
whether the true mean is fully contained within each level's simultaneous
inner and outer confidence region. Returned only if true_mean is provided.
A single logical value indicating whether the true mean
is contained in all levels' simultaneous inner and outer confidence region.
Returned only if true_mean is provided.
(Optional) A list of ggplot2 objects for visualizing the
SCBs and simultaneous confidence region across all levels,
returned when return_plot = TRUE. Includes both a combined plot and
individual plots per level.
Ren, J., Telschow, F. J. E., & Schwartzman, A. (2024). Inverse set estimation and inversion of simultaneous confidence intervals. Journal of the Royal Statistical Society: Series C (Applied Statistics), 73(4), 1082–1109. doi:10.1093/jrsssc/qlae027
set.seed(262) x1 <- rnorm(100) x2 <- rnorm(100) y <- -1 + x1 - 0.5 * x2 + rnorm(100,0,sqrt(2)) df <- data.frame(x1 = x1, x2 = x2, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100), x2 = seq(-1, 1, length.out = 100)) model <- "y ~ x1 + x2 " result <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100) scb_to_cs(result$scb_up, result$scb_low, c(-1, -0.5, 0.5, 1), x1 = grid$x1, x2 = grid$x2, est_mean = results$Mean)set.seed(262) x1 <- rnorm(100) x2 <- rnorm(100) y <- -1 + x1 - 0.5 * x2 + rnorm(100,0,sqrt(2)) df <- data.frame(x1 = x1, x2 = x2, y = y) grid <- data.frame(x1 = seq(-1, 1, length.out = 100), x2 = seq(-1, 1, length.out = 100)) model <- "y ~ x1 + x2 " result <- SCB_linear_outcome(df_fit = df, model = model, grid_df = grid, n_boot = 100) scb_to_cs(result$scb_up, result$scb_low, c(-1, -0.5, 0.5, 1), x1 = grid$x1, x2 = grid$x2, est_mean = results$Mean)