Vinecop.select

Vinecop.select(self, data: numpy.ndarray[dtype=float64, shape=(*, *), order='F'], controls: pyvinecopulib.FitControlsVinecop = FitControlsVinecop()) → None

select() behaves differently depending on the object’s current truncation level and the truncation level specified in the controls, called trunc_lvl and controls.trunc_lvl respectively in what follows. Essentially, controls.trunc_lvl defines the object’s truncation level after calling select():

If controls.trunc_lvl <= trunc_lvl, the families and parameters for all pairs in trees up to and including controls.trunc_lvl are selected, using the current structure.

If controls.trunc_lvl > trunc_lvl, select() behaves as above for all trees up to and including trunc_lvl, and then selects the structure for the higher trees along with their families and parameters. This includes the case where trunc_lvl = 0, namely where the structure is fully unspecified.
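The two cases above can be summarized in a small sketch. The helper plan_select below is hypothetical (it is not part of pyvinecopulib) and assumes a vine on d variables, which has at most d - 1 trees. It only reports which tree levels would keep the current structure and which would get a newly selected one.

```python
def plan_select(d, trunc_lvl, controls_trunc_lvl):
    """Hypothetical helper, not a pyvinecopulib function.

    Returns (reused, new): tree levels whose structure is kept and tree
    levels whose structure is newly selected, following the two cases above.
    """
    max_trees = d - 1
    # Assumption: truncation levels larger than d - 1 are capped at d - 1.
    current = min(trunc_lvl, max_trees)
    target = min(controls_trunc_lvl, max_trees)

    # Families and parameters are selected on the existing structure
    # for every tree up to the smaller of the two levels.
    reused = list(range(1, min(current, target) + 1))
    # A structure is only selected for trees above the current level.
    new = list(range(current + 1, target + 1))
    return reused, new


# Case 1: controls.trunc_lvl <= trunc_lvl, no new structure is selected.
print(plan_select(d=5, trunc_lvl=3, controls_trunc_lvl=2))  # ([1, 2], [])
# Case 2: controls.trunc_lvl > trunc_lvl, trees 3 and 4 get a new structure.
print(plan_select(d=5, trunc_lvl=2, controls_trunc_lvl=4))  # ([1, 2], [3, 4])
# Fully unspecified structure (trunc_lvl = 0): every tree is selected.
print(plan_select(d=5, trunc_lvl=0, controls_trunc_lvl=4))  # ([], [1, 2, 3, 4])
```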

Selection of the structure is performed using the algorithm of Dissmann, J. F., E. C. Brechmann, C. Czado, and D. Kurowicka (2013). Selecting and estimating regular vine copulae and application to financial returns. Computational Statistics & Data Analysis, 59 (1), 52-69. The dependence measure used to select trees (default: Kendall’s tau) is corrected for ties (see the wdm library). It can be changed using controls.tree_criterion, which can be set to "tau", "rho" or "hoeffd". Both Prim’s (default: "mst_prim") and Kruskal’s ("mst_kruskal") algorithms are available through controls.tree_algorithm for the maximum spanning tree selection. An alternative to maximum spanning tree selection is to use random spanning trees, which can also be selected through controls.tree_algorithm and come in two flavors, both based on Wilson’s algorithm (loop-erased random walks):

"random_weighted" generates a random spanning tree with probability proportional to the product of the weights (i.e., the dependence) of the edges in the tree.

"random_unweighted" generates a random spanning tree uniformly over all spanning trees satisfying the proximity condition.
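To make the maximum spanning tree step concrete, here is a minimal pure-Python sketch for the first tree only. It computes a tie-corrected Kendall's tau (tau-b) for every pair of columns, uses its absolute value as the edge weight, and runs Prim's algorithm. The function names and the toy data are illustrative assumptions; this is not the library's implementation, and it ignores the proximity condition that restricts the candidate edges in higher trees.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b: pairs tied in x or in y shrink the denominator."""
    concordant = discordant = tied_x = tied_y = 0
    for a, b in combinations(range(len(x)), 2):
        dx, dy = x[a] - x[b], y[a] - y[b]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / math.sqrt(
        (n_pairs - tied_x) * (n_pairs - tied_y)
    )


def max_spanning_tree_prim(weights):
    """Prim's algorithm on a symmetric d x d weight matrix (list of lists)."""
    d = len(weights)
    in_tree = {0}
    edges = []
    while len(in_tree) < d:
        # Pick the heaviest edge that leaves the current tree.
        best = max(
            ((u, v) for u in in_tree for v in range(d) if v not in in_tree),
            key=lambda edge: weights[edge[0]][edge[1]],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Toy data: three columns, the second one containing ties.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1, 1, 2, 2, 3, 3],
    [6, 1, 5, 2, 4, 3],
]
d = len(columns)
weights = [
    [abs(kendall_tau_b(columns[i], columns[j])) if i != j else 0.0 for j in range(d)]
    for i in range(d)
]
tree = max_spanning_tree_prim(weights)
print(tree)  # [(0, 1), (0, 2)]
```

Kruskal's algorithm finds a spanning tree of the same total weight, so the choice between the two mainly affects how the tree is computed rather than the resulting dependence captured.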

If the controls object has been instantiated with select_families = False, then the method simply updates the parameters of the pair-copulas without selecting the families or the structure. This is equivalent to calling fit() for each pair-copula, albeit potentially in parallel if num_threads > 1.

When at least one variable is discrete, two types of “observations” are required: the first \(n \times d\) block contains realizations of \(F_Y(Y), F_X(X), \dots\); the second \(n \times d\) block contains realizations of \(F_Y(Y^-), F_X(X^-), \dots\). The minus indicates a left-sided limit of the cdf. For continuous variables, the left limit and the cdf itself coincide. For an integer-valued variable, for example, \(F_Y(Y^-) = F_Y(Y - 1)\). Columns of the second block that correspond to continuous variables can be omitted.
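As an illustration of this layout, the sketch below builds the rows for one integer-valued variable Y and one continuous variable X, giving an \(n \times (d + k)\) table with d = 2 and k = 1. The Poisson marginal, its rate, and the sample values are assumptions made purely for the example; in practice the marginal cdfs come from your own models or estimates.

```python
import math


def poisson_cdf(y, rate):
    """P(Y <= y) for a Poisson variable; 0 for y < 0."""
    if y < 0:
        return 0.0
    return sum(math.exp(-rate) * rate**k / math.factorial(k) for k in range(y + 1))


# Hypothetical sample: Y is integer-valued (discrete), X is continuous and
# already on the uniform scale, i.e. x_unif[i] = F_X(x_i).
y_counts = [0, 2, 1, 4, 3]
x_unif = [0.15, 0.62, 0.40, 0.91, 0.77]
rate = 2.0  # assumed Poisson rate, for illustration only

rows = []
for y, f_x in zip(y_counts, x_unif):
    f_y = poisson_cdf(y, rate)            # F_Y(y)
    f_y_minus = poisson_cdf(y - 1, rate)  # F_Y(y^-) = F_Y(y - 1)
    # First block: F_Y(y), F_X(x). Second block: only the discrete column,
    # because F_X(x^-) = F_X(x) for the continuous variable.
    rows.append([f_y, f_x, f_y_minus])

n, n_cols = len(rows), len(rows[0])
print(n, n_cols)  # 5 3
```

Converting rows to a float64 array in column-major order, as the signature requires, is left out to keep the sketch free of dependencies.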

If there are missing data (i.e., NaN entries), incomplete observations are discarded before fitting a pair-copula. This is done on a pair-by-pair basis so that the maximal available information is used.
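The difference between pair-by-pair deletion and dropping every incomplete row can be seen on a toy example. The sketch below is illustrative only: the data and the helper name are made up, and it works on raw column pairs, whereas the library applies the same idea to the inputs of each pair-copula.

```python
import math

nan = float("nan")
# Toy data with three variables; each row is one observation.
data = [
    [0.1, 0.4, 0.7],
    [0.2, nan, 0.6],
    [0.3, 0.5, nan],
    [nan, 0.8, 0.9],
    [0.5, 0.6, 0.2],
]


def complete_rows_for_pair(data, i, j):
    """Keep rows where both columns i and j are observed (not NaN)."""
    return [
        (row[i], row[j])
        for row in data
        if not (math.isnan(row[i]) or math.isnan(row[j]))
    ]


# Pair-by-pair deletion: each pair keeps every row it can use.
pairwise_counts = {
    (i, j): len(complete_rows_for_pair(data, i, j))
    for i in range(3)
    for j in range(i + 1, 3)
}
# Listwise deletion, for comparison: drop a row if any entry is missing.
listwise_count = sum(1 for row in data if not any(math.isnan(v) for v in row))

print(pairwise_counts)  # {(0, 1): 3, (0, 2): 3, (1, 2): 3}
print(listwise_count)   # 2
```

Each pair retains three observations here, while removing all incomplete rows up front would leave only two.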

Parameters:

data – \(n \times (d + k)\) or \(n \times 2d\) matrix of observations, where \(k\) is the number of discrete variables.
controls – The controls to the algorithm (see FitControlsVinecop()).