The fairadapt() function performs data adaptation only once. Sometimes it is desirable to repeat this process in order to make uncertainty estimates about the adaptation that is performed. The wrapper function fairadaptBoot() does so by performing the fairadapt() procedure multiple times and keeping the resulting data transformations in memory. For a worked example of how to use fairadaptBoot() for uncertainty quantification, see the fairadapt vignette.
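
To make the idea of repeating the adaptation concrete, the sketch below resamples the training rows with replacement, reruns a single adaptation step on each resample, and measures how much the adapted test values vary across repetitions. The adaptation step adapt_once() is a hypothetical one-variable quantile-matching toy, not the fairadapt algorithm. Row resampling is assumed here to correspond only loosely to the finite-sample uncertainty that rand.mode = "finsamp" refers to.

```r
# Hypothetical stand-in for one adaptation run (NOT the fairadapt algorithm):
# test values of x in group "b" are mapped to the same quantile level of
# group "a" in the (resampled) training data; group "a" is left unchanged.
adapt_once <- function(train, test) {
  ref <- train$x[train$grp == "a"]    # reference group values
  src <- train$x[train$grp == "b"]    # group to be adapted
  out <- test$x
  is_b <- test$grp == "b"
  u <- ecdf(src)(test$x[is_b])        # quantile levels within group "b"
  out[is_b] <- quantile(ref, probs = u, names = FALSE, type = 1)
  out
}

set.seed(1)
train <- data.frame(grp = rep(c("a", "b"), each = 100),
                    x = c(rnorm(100, mean = 1), rnorm(100, mean = 0)))
test <- data.frame(grp = c("a", "b", "b"), x = c(0.5, 0, 1))

n_boot <- 20
boot_res <- replicate(n_boot, {
  idx <- sample(nrow(train), replace = TRUE)   # bootstrap resample of rows
  adapt_once(train[idx, ], test)
})
dim(boot_res)            # one row per test observation, one column per repetition
apply(boot_res, 1, sd)   # spread of each adapted test value across repetitions
```

The first test row belongs to the unadapted group, so its spread is zero; the other rows show how much the adapted values depend on the particular training sample.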

fairadaptBoot(
  formula,
  prot.attr,
  adj.mat,
  train.data,
  test.data = NULL,
  cfd.mat = NULL,
  top.ord = NULL,
  res.vars = NULL,
  quant.method = rangerQuants,
  keep.object = FALSE,
  n.boot = 100,
  rand.mode = c("finsamp", "quant", "both"),
  test.seed = 2022,
  ...
)

Arguments

formula

Object of class formula describing the response and the covariates.

prot.attr

A character string naming the binary protected attribute. Must be one of the entries of colnames(adj.mat).

adj.mat

Matrix of class matrix encoding the relationships in the causal graph: adj.mat[i, j] == 1L implies the existence of an edge from node i to node j. Must include all the variables appearing in the formula object. If adj.mat is NULL, the top.ord argument must be supplied.
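
Writing adj.mat by hand becomes error-prone for larger graphs. A minimal sketch, assuming the graph is available as an edge list, builds the matrix from "from" and "to" vectors in the convention above. The helper edges_to_adj() is hypothetical and not part of fairadapt; the edges chosen reproduce the graph used in the example on this page.

```r
# Hypothetical helper (not part of fairadapt): build an adjacency matrix in
# the adj.mat convention, where adj[i, j] == 1L means an edge i -> j.
edges_to_adj <- function(nodes, from, to) {
  adj <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
                dimnames = list(nodes, nodes))
  adj[cbind(from, to)] <- 1L   # two-column character index matched to dimnames
  adj
}

nodes <- c("gender", "edu", "test", "score")
adj <- edges_to_adj(nodes,
                    from = c("gender", "gender", "edu", "edu", "test"),
                    to   = c("edu",    "test",   "test", "score", "score"))
adj
"gender" %in% colnames(adj)   # prot.attr must be one of the column names
```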

train.data, test.data

Training and testing data, both of class data.frame. test.data defaults to NULL.

cfd.mat

Symmetric matrix of class matrix encoding the bidirected edges in the causal graph: cfd.mat[i, j] == cfd.mat[j, i] == 1L implies the existence of a bidirected edge between nodes i and j. Must include all the variables appearing in the formula object.
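
Because a bidirected edge has no direction, both cfd.mat[i, j] and cfd.mat[j, i] must be set. The sketch below fills both entries for each listed pair and checks symmetry with base R's isSymmetric(). The choice of edu and test as a confounded pair is purely illustrative and not taken from the example data.

```r
# Sketch: encode bidirected (confounded) pairs as a symmetric 0/1 matrix,
# following the convention cfd[i, j] == cfd[j, i] == 1L.
nodes <- c("gender", "edu", "test", "score")
cfd <- matrix(0L, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))

# illustrative assumption: edu and test share an unobserved confounder
pairs <- list(c("edu", "test"))
for (p in pairs) {
  cfd[p[1], p[2]] <- 1L
  cfd[p[2], p[1]] <- 1L   # set both directions to keep the matrix symmetric
}
isSymmetric(cfd)
```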

top.ord

A vector of class character describing the topological ordering of the causal graph. Default value is NULL, but this argument must be supplied if adj.mat is not specified. It must also include all the variables appearing in the formula object.
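
Since top.ord can stand in for adj.mat, it may help to see how one valid ordering is derived from an adjacency matrix. The sketch uses Kahn's algorithm, repeatedly removing nodes that have no remaining parents. The helper topo_order() is hypothetical and not exported by fairadapt. Note that a graph may admit several valid orderings, and an ordering alone does not record which edges are present.

```r
# Hypothetical helper (not part of fairadapt): topological ordering of the
# graph encoded by adj (adj[i, j] == 1 means i -> j), via Kahn's algorithm.
# Returns NULL if the graph contains a directed cycle.
topo_order <- function(adj) {
  indeg <- colSums(adj)            # number of incoming edges per node
  remaining <- rownames(adj)
  ord <- character(0)
  while (length(remaining) > 0) {
    sources <- remaining[indeg[remaining] == 0]
    if (length(sources) == 0) return(NULL)   # no parentless node left: cycle
    ord <- c(ord, sources)
    remaining <- setdiff(remaining, sources)
    # in-degrees counting only parents that have not been placed yet
    indeg <- colSums(adj[remaining, , drop = FALSE])
  }
  ord
}

nodes <- c("gender", "edu", "test", "score")
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 1,
                0, 0, 0, 1,
                0, 0, 0, 0),
              ncol = 4, byrow = TRUE, dimnames = list(nodes, nodes))
topo_order(adj)
```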

res.vars

A vector of class character listing all the resolving variables, which should not be changed by the adaptation procedure. Default value is NULL, corresponding to no resolving variables. Resolving variables should be a subset of the descendants of the protected attribute.
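
The requirement that resolving variables be descendants of the protected attribute can be checked before calling fairadaptBoot(). A minimal sketch, assuming the graph is given as adj.mat, collects all descendants by following outgoing edges. The helper descendants() and the choice res_vars = "test" are hypothetical. For the graph in the example on this page, the descendants of gender are edu, test and score, which match the adapting variables printed there.

```r
# Hypothetical helper (not part of fairadapt): all descendants of `node` in
# the graph encoded by adj (adj[i, j] == 1 means an edge i -> j).
descendants <- function(adj, node) {
  found <- character(0)
  frontier <- node
  while (length(frontier) > 0) {
    # children of every node in the current frontier
    children <- colnames(adj)[colSums(adj[frontier, , drop = FALSE]) > 0]
    frontier <- setdiff(children, found)   # only nodes not seen before
    found <- union(found, frontier)
  }
  found
}

nodes <- c("gender", "edu", "test", "score")
adj <- matrix(c(0, 1, 1, 0,
                0, 0, 1, 1,
                0, 0, 0, 1,
                0, 0, 0, 0),
              ncol = 4, byrow = TRUE, dimnames = list(nodes, nodes))

res_vars <- c("test")                         # illustrative choice
all(res_vars %in% descendants(adj, "gender"))
```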

quant.method

A function choosing the method used for quantile regression. Default value is rangerQuants (random forest quantile regression). Other implemented options are linearQuants and mcqrnnQuants. A custom function may also be supplied, in which case a corresponding method for the S3 generic computeQuants must be added as well.

keep.object

A logical scalar indicating whether all the fairadapt S3 objects built in the bootstrap repetitions should be saved.

n.boot

An integer giving the number of bootstrap iterations.

rand.mode

A string, taking values "finsamp", "quant" or "both", corresponding to considering finite sample uncertainty, quantile uncertainty, or both.

test.seed

A seed for the randomness involved in breaking quantiles for discrete variables. This argument is only relevant when rand.mode is "quant" or "both" and is ignored otherwise.
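
For a discrete variable the empirical CDF has jumps, so an observed value does not correspond to a single quantile level. One common way to resolve this, assumed here for illustration only and not claimed to be fairadapt's exact procedure, is the randomized probability integral transform: the level is drawn uniformly between P(X < x) and P(X <= x). The sketch shows why such draws need a seed to be reproducible. The function randomized_pit() is hypothetical.

```r
# Randomized probability integral transform for a discrete sample.
# Illustration only; not necessarily fairadapt's exact tie-breaking rule.
# Note: set.seed() here resets the global random number generator.
randomized_pit <- function(x, sample_vals, seed) {
  set.seed(seed)
  Fn <- ecdf(sample_vals)
  upper <- Fn(x)                                                      # P(X <= x)
  lower <- vapply(x, function(v) mean(sample_vals < v), numeric(1))   # P(X < x)
  runif(length(x), min = lower, max = upper)
}

sample_vals <- c(1, 1, 2, 2, 2, 3)
u1 <- randomized_pit(c(2, 2), sample_vals, seed = 2022)
u2 <- randomized_pit(c(2, 2), sample_vals, seed = 2022)
identical(u1, u2)   # same seed, same quantile levels
u1                  # both levels lie between 2/6 and 5/6
```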

...

Additional arguments forwarded to the function passed as quant.method.

Value

An object of class fairadaptBoot, containing the original and adapted training and testing data, together with the causal graph and some additional meta-information.

References

Plecko, D. & Meinshausen, N. (2019). Fair Data Adaptation with Quantile Preservation

Examples

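library(fairadapt)  # provides fairadaptBoot() and the uni_admission data

n_samp <- 200
# causal graph: gender -> edu, gender -> test, edu -> test, edu -> score, test -> score
uni_dim <- c(       "gender", "edu", "test", "score")
uni_adj <- matrix(c(       0,     1,      1,       0,
                           0,     0,      1,       1,
                           0,     0,      0,       1,
                           0,     0,      0,       0),
                  ncol = length(uni_dim),
                  dimnames = rep(list(uni_dim), 2),
                  byrow = TRUE)

# train on the first n_samp rows, test on the last n_samp rows,
# with 5 bootstrap repetitions
uni_ada <- fairadaptBoot(score ~ .,
  train.data = head(uni_admission, n = n_samp),
  test.data = tail(uni_admission, n = n_samp),
  adj.mat = uni_adj,
  prot.attr = "gender",
  n.boot = 5
)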

uni_ada
#> 
#> Call:
#> fairadaptBoot(formula = score ~ ., prot.attr = "gender", adj.mat = uni_adj, 
#>     train.data = head(uni_admission, n = n_samp), test.data = tail(uni_admission, 
#>         n = n_samp), n.boot = 5)
#> 
#> Bootstrap repetitions: 5 
#> 
#> Adapting variables:
#>   edu, test, score
#> 
#> Based on protected attribute gender 
#> 
#>   AND
#> 
#> Based on causal graph:
#>        gender edu test score
#> gender      0   1    1     0
#> edu         0   0    1     1
#> test        0   0    0     1
#> score       0   0    0     0
#>