Fitting vine copula models

Automated fitting and model selection for vine copula models with continuous or discrete data. Selection of the structure is performed using the algorithm of Dissmann et al. (2013).

Usage

vinecop(
  data,
  var_types = rep("c", NCOL(data)),
  family_set = "all",
  structure = NA,
  par_method = "mle",
  nonpar_method = "constant",
  mult = 1,
  selcrit = "aic",
  weights = numeric(),
  psi0 = 0.9,
  presel = TRUE,
  allow_rotations = TRUE,
  trunc_lvl = Inf,
  tree_crit = "tau",
  threshold = 0,
  keep_data = FALSE,
  vinecop_object = NULL,
  show_trace = FALSE,
  cores = 1,
  tree_algorithm = "mst_prim"
)

Arguments

data: a matrix or data.frame with at least two columns, containing the (pseudo-)observations for the variables (copula data should have approximately uniform margins). More columns are required for discrete models, see Details.
var_types: variable types, a length d vector; e.g., c("c", "c") for two continuous variables, or c("c", "d") for first variable continuous and second discrete.
family_set: a character vector of families; see bicop() for additional options.
structure: an rvine_structure object, namely a compressed representation of the vine structure, or an object that can be coerced into one (see rvine_structure() and as_rvine_structure()). The dimension must be length(pair_copulas[[1]]) + 1; structure = NA performs automatic selection based on Dissmann's algorithm. See Details for partial selection of the structure.
par_method: the estimation method for parametric models, either "mle" for maximum likelihood or "itau" for inversion of Kendall's tau (only available for one-parameter families and "t").
nonpar_method: the estimation method for nonparametric models, either "constant" for the standard transformation estimator, or "linear"/"quadratic" for the local-likelihood approximations of order one/two.
mult: multiplier for the smoothing parameters of nonparametric families. Values larger than 1 make the estimate more smooth, values less than 1 less smooth.
selcrit: criterion for family selection, one of "loglik", "aic", "bic", or "mbic". For vinecop() there is the additional option "mbicv".
weights: optional vector of weights for each observation.
psi0: prior probability of a non-independence copula (only used for selcrit = "mbic" and selcrit = "mbicv").
presel: whether the family set should be thinned out according to symmetry characteristics of the data.
allow_rotations: whether to allow rotations of the copula.
trunc_lvl: the truncation level of the vine copula; Inf means no truncation, NA indicates that the truncation level should be selected automatically by mBICV().
tree_crit: the criterion for tree selection, one of "tau", "rho", "hoeffd", "mcor", or "joe" for Kendall's \(\tau\), Spearman's \(\rho\), Hoeffding's \(D\), maximum correlation, or logarithm of the partial correlation, respectively.
threshold: for thresholded vine copulas; NA indicates that the threshold should be selected automatically by mBICV().
keep_data: whether the data should be stored (necessary for using fitted()).
vinecop_object: a vinecop object to be updated; if provided, only the parameters are fit; structure and families are kept the same.
show_trace: logical; whether a trace of the fitting progress should be printed.
cores: number of cores to use; if more than 1, estimation of pair copulas within a tree is done in parallel.
tree_algorithm: the algorithm for building the spanning tree ("mst_prim", "mst_kruskal", "random_weighted", or "random_unweighted") during the tree-wise structure selection. "mst_prim" and "mst_kruskal" use Prim's and Kruskal's algorithms, respectively, to select the maximum spanning tree, maximizing the sum of the edge weights (i.e., the values of tree_crit). "random_weighted" and "random_unweighted" use Wilson's algorithm to generate a random spanning tree, either with probability proportional to the product of the edge weights (weighted) or uniformly (unweighted).

Value

Objects inheriting from vinecop and vinecop_dist for vinecop(). In addition to the entries provided by vinecop_dist(), there are:

threshold, the (set or estimated) threshold used for thresholding the vine.
data (optionally, if keep_data = TRUE was used), the dataset that was passed to vinecop().
controls, a list with the fit controls that were passed to vinecop().
nobs, the number of observations that were used to fit the model.

Details

Missing data

If there are missing data (i.e., NA entries), incomplete observations are discarded before fitting a pair-copula. This is done on a pair-by-pair basis so that the maximal available information is used.
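
The following base-R sketch illustrates what pair-by-pair deletion means compared with discarding every incomplete row up front. It is only an illustration, not the package's internal code; the toy matrix u_na and the helper pair_complete() are hypothetical names introduced here.

```r
# Toy copula data with missing entries (hypothetical example data).
set.seed(1)
u_na <- matrix(runif(30), nrow = 10, ncol = 3)
u_na[1, 1] <- NA   # observation 1 lacks variable 1
u_na[2, 3] <- NA   # observation 2 lacks variable 3

# Hypothetical helper: keep the rows that are complete for the pair (i, j).
pair_complete <- function(data, i, j) {
  data[complete.cases(data[, c(i, j)]), c(i, j), drop = FALSE]
}

nrow(pair_complete(u_na, 1, 2))  # 9: only row 1 is dropped
nrow(pair_complete(u_na, 2, 3))  # 9: only row 2 is dropped
nrow(na.omit(u_na))              # 8: listwise deletion drops both rows
```

Each pair keeps nine of the ten observations, whereas listwise deletion would leave only eight for every pair.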

Discrete variables

The dependence measures used to select trees (default: Kendall's tau) are corrected for ties (see wdm::wdm).

Let n be the number of observations and d the number of variables. When at least one variable is discrete, two types of "observations" are required in data: the first n x d block contains realizations of \(F_{X_j}(X_j)\). The second n x d block contains realizations of \(F_{X_j}(X_j^-)\). The minus indicates a left-sided limit of the cdf. For, e.g., an integer-valued variable, it holds \(F_{X_j}(X_j^-) = F_{X_j}(X_j - 1)\). For continuous variables the left limit and the cdf itself coincide. Respective columns can be omitted in the second block.
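
As a small base-R sketch of this layout, assume two count variables whose Poisson(1) margins are known (an assumption made only for illustration; the names x, upper, lower and u_disc are hypothetical). The first block holds the cdf values and the second block the left limits.

```r
# Integer-valued toy data with Poisson(1) margins (margins assumed known).
set.seed(1)
x <- matrix(rpois(20 * 2, lambda = 1), nrow = 20, ncol = 2)

upper <- ppois(x, lambda = 1)      # realizations of F(X)
lower <- ppois(x - 1, lambda = 1)  # realizations of F(X^-) = F(X - 1)

u_disc <- cbind(upper, lower)      # n x 2d layout: first block, then second
dim(u_disc)                        # 20 rows, 4 columns
all(lower < upper)                 # left limit lies strictly below the cdf
all(lower[x == 0] == 0)            # F(-1) = 0 for a count variable
```

A matrix built this way would be passed to vinecop() together with var_types = c("d", "d").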

Structure selection

Selection of the structure is performed using the algorithm of Dissmann et al. (2013); see References. The dependence measure used to select trees (default: Kendall's tau) is corrected for ties and can be changed using the tree_crit argument, which can be set to "tau", "rho", "hoeffd", "mcor", or "joe". Both Prim's (default: "mst_prim") and Kruskal's ("mst_kruskal") algorithms are available through tree_algorithm for selecting the maximum spanning tree. An alternative to maximum spanning tree selection is to use random spanning trees, which are also selected through tree_algorithm and come in two flavors, both using Wilson's algorithm based on loop-erased random walks:

"random_weighted" generates a random spanning tree with probability proportional to the product of the weights (i.e., the dependence) of the edges in the tree.
"random_unweighted" generates a random spanning tree uniformly over all spanning trees satisfying the proximity condition.
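
To make the maximum spanning tree step concrete, here is a minimal base-R sketch of Prim's algorithm applied to the first tree only, with absolute Kendall's tau as edge weight as in Dissmann et al. (2013). It is not the package's implementation: it ignores the proximity condition that constrains higher trees, and all variable names (w, in_tree, edges) are hypothetical.

```r
# Toy data: five variables driven by a common factor (hypothetical example).
set.seed(1)
x <- rnorm(50) * matrix(1, 50, 5) + 0.5 * matrix(rnorm(50 * 5), 50, 5)

# Edge weights: absolute Kendall's tau for every pair of variables.
w <- abs(cor(x, method = "kendall"))
d <- ncol(w)

# Prim's algorithm for a maximum spanning tree on the complete graph.
in_tree <- c(TRUE, rep(FALSE, d - 1))        # start from variable 1
edges <- matrix(NA_integer_, nrow = d - 1, ncol = 2)
for (k in seq_len(d - 1)) {
  # Weights of all edges that connect the current tree to a new variable.
  cand <- w[in_tree, !in_tree, drop = FALSE]
  best <- which(cand == max(cand), arr.ind = TRUE)[1, ]
  from <- which(in_tree)[best[1]]
  to <- which(!in_tree)[best[2]]
  edges[k, ] <- c(from, to)                  # add the heaviest such edge
  in_tree[to] <- TRUE
}
edges  # d - 1 edges forming the first tree
```

In higher trees the same idea applies, but only edges allowed by the proximity condition are candidates and the weights are computed from pseudo-observations of the previously fitted pair copulas.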

Partial structure selection

It is possible to fix the vine structure only in the first trees and select the remaining ones automatically. To specify only the first k trees, supply a k-truncated rvine_structure() or rvine_matrix(). All remaining trees up to trunc_lvl will then be selected automatically.

References

Dissmann, J. F., E. C. Brechmann, C. Czado, and D. Kurowicka (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59 (1), 52-69.

Examples

## simulate dummy data
x <- rnorm(30) * matrix(1, 30, 5) + 0.5 * matrix(rnorm(30 * 5), 30, 5)
u <- pseudo_obs(x)

## fit and select the model structure, family and parameters
fit <- vinecop(u)
summary(fit)
#> # A data.frame: 10 x 11 
#>  tree edge conditioned conditioning var_types   family rotation
#>     1    1        3, 4                    c,c      bb7      180
#>     1    2        4, 1                    c,c   gumbel        0
#>     1    3        2, 1                    c,c      bb7        0
#>     1    4        1, 5                    c,c   gumbel        0
#>     2    1        3, 1            4       c,c gaussian        0
#>     2    2        4, 2            1       c,c     tawn        0
#>     2    3        2, 5            1       c,c     tawn      180
#>     3    1        3, 2         1, 4       c,c      joe        0
#>     3    2        4, 5         2, 1       c,c    indep        0
#>     4    1        3, 5      2, 1, 4       c,c    indep        0
#>        parameters df  tau loglik
#>          2.2, 2.9  2 0.64   18.0
#>               2.2  1 0.55   12.0
#>          2.0, 1.4  2 0.54   11.9
#>               2.1  1 0.52   10.1
#>              0.34  1 0.22    1.8
#>     0.3, 1.0, 4.1  3 0.27    5.5
#>  0.42, 0.30, 7.00  3 0.20    3.0
#>               1.3  1 0.15    1.7
#>                    0 0.00    0.0
#>                    0 0.00    0.0
plot(fit)
contour(fit)


## select by BIC from one-parameter families
fit <- vinecop(u, family_set = "onepar", selcrit = "bic")
summary(fit)
#> # A data.frame: 10 x 11 
#>  tree edge conditioned conditioning var_types   family rotation parameters df
#>     1    1        3, 4                    c,c   gumbel        0        2.7  1
#>     1    2        2, 1                    c,c gaussian        0       0.76  1
#>     1    3        4, 1                    c,c   gumbel        0        2.2  1
#>     1    4        1, 5                    c,c   gumbel        0        2.1  1
#>     2    1        3, 1            4       c,c gaussian        0       0.32  1
#>     2    2        2, 4            1       c,c  clayton      180       0.53  1
#>     2    3        4, 5            1       c,c      joe      180        1.3  1
#>     3    1        3, 2         1, 4       c,c  clayton      180       0.43  1
#>     3    2        2, 5         4, 1       c,c gaussian        0       0.12  1
#>     4    1        3, 5      2, 1, 4       c,c gaussian        0      0.086  1
#>    tau loglik
#>  0.629  16.37
#>  0.546  10.70
#>  0.548  12.02
#>  0.522  10.10
#>  0.209   1.50
#>  0.209   2.23
#>  0.145   0.77
#>  0.176   1.08
#>  0.076   0.21
#>  0.055   0.12

## 1-truncated, Gaussian D-vine
fit <- vinecop(u, structure = dvine_structure(1:5), family_set = "gauss", trunc_lvl = 1)
plot(fit)
contour(fit)


## Partial structure selection with only first tree specified
structure <- rvine_structure(order = 1:5, list(rep(5, 4)))
structure
#> 5-dimensional R-vine structure ('rvine_structure'), 1-truncated
#> 5 5 5 5 5
#>       4  
#>     3    
#>   2      
#> 1        
fit <- vinecop(u, structure = structure, family_set = "gauss")
plot(fit)

## Model for discrete data
x <- qpois(u, 1)  # transform to Poisson margins
# we require two types of observations (see Details)
u_disc <- cbind(ppois(x, 1), ppois(x - 1, 1))
fit <- vinecop(u_disc, var_types = rep("d", 5))

## Model for mixed data
x <- qpois(u[, 1], 1)  # transform first variable to Poisson margin
# we require two types of observations (see Details)
u_disc <- cbind(ppois(x, 1), u[, 2:5], ppois(x - 1, 1))
fit <- vinecop(u_disc, var_types = c("d", rep("c", 4)))