Title: | A Robust Integrated Variance Correlation |
---|---|
Description: | An integrated variance correlation is proposed to measure the dependence between a categorical or continuous random variable and a continuous random variable or vector. This package estimates the new correlation coefficient with parametric and nonparametric approaches, and implements tests of independence for several testing problems based on it. |
Authors: | Wei Xiong [aut], Han Pan [aut, cre], Hengjian Cui [aut] |
Maintainer: | Han Pan |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-10 05:46:14 UTC |
Source: | https://github.com/cran/IVCor |

This function calculates the integrated variance correlation between two random variables, or between a random variable and a multivariate random variable.
IVC(y, x, K, NN = 3, type)

Argument | Description |
---|---|
`y` | A numeric vector. |
`x` | A numeric vector or a data matrix. |
`K` | The number of quantile levels. |
`NN` | The number of B-spline basis functions; default is 3. |
`type` | Whether to measure linear or nonlinear correlation: `"linear"` measures linear correlation, and `"nonlinear"` measures linear or nonlinear correlation using B-splines. |

Returns the value of the corresponding sample statistic.
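```r
library(IVCor)

# linear model
n <- 100
x <- rnorm(n)
y <- 3 * x + rnorm(n)
IVC(y, x, K = 5, type = "linear")

# nonlinear model with a 3-column covariate matrix
n <- 100
p <- 3
x <- matrix(NA, nrow = n, ncol = p)
for (i in 1:p) {
  x[, i] <- rnorm(n)
}
y <- cos(x[, 1] + x[, 2]) + x[, 3]^2 + rnorm(n)
IVC(y, x, K = 5, type = "nonlinear")
```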
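When `type = "nonlinear"`, `NN` sets the number of B-spline basis functions. The sketch below only shows what such a basis looks like; it assumes `NN` plays the role of the `df` argument of `splines::bs()` and that a data matrix is expanded column by column, neither of which is confirmed by this manual.

```r
library(splines)  # ships with base R; provides bs()

set.seed(1)
n <- 100
x <- rnorm(n)

# Hypothetical reading of NN: the number of B-spline basis functions
# per covariate, passed here as the df argument of splines::bs().
NN <- 3

# With df = 3 and the default cubic degree, bs() uses no interior knots,
# so the basis has exactly NN columns evaluated at each observation.
B <- bs(x, df = NN)
dim(B)  # 100 rows, 3 columns

# For a data matrix, one basis per column can be placed side by side,
# giving an additive expansion with NN * p columns.
p <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
B_all <- do.call(cbind, lapply(seq_len(p), function(j) bs(X[, j], df = NN)))
dim(B_all)  # 100 rows, 9 columns
```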
This function calculates the critical values of the integrated variance correlation test at significance levels 0.1, 0.05 and 0.01.
IVC_crit(N = 500, realizations)

Argument | Description |
---|---|
`N` | An integer, which should be as large as possible; default is 500. |
`realizations` | The number of replications used to simulate the distribution under the null hypothesis. |

Returns the critical values at significance levels 0.1, 0.05 and 0.01.
```r
library(IVCor)

IVC_crit(N = 500, realizations = 100)
```
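`IVC_crit` obtains its critical values by simulating the distribution under the null hypothesis `realizations` times. A minimal sketch of that general Monte Carlo recipe follows; the helper `mc_critical_values` and the placeholder statistic `sqrt(N) * |cor|` are hypothetical stand-ins, not the package's integrated variance statistic.

```r
# Generic Monte Carlo recipe: simulate the statistic many times under the
# null hypothesis and read off its upper quantiles.
mc_critical_values <- function(null_stat, realizations,
                               alpha = c(0.1, 0.05, 0.01)) {
  draws <- replicate(realizations, null_stat())
  quantile(draws, probs = 1 - alpha, names = FALSE)
}

# Hypothetical null statistic for illustration only: sqrt(N) * |r| for the
# Pearson correlation r of N independent standard normal pairs.
N <- 500
null_stat <- function() sqrt(N) * abs(cor(rnorm(N), rnorm(N)))

set.seed(1)
crit <- mc_critical_values(null_stat, realizations = 100)
crit  # critical values for levels 0.1, 0.05 and 0.01, in that order
```

More realizations give more stable estimates of the upper quantiles, especially at the 0.01 level.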
This function calculates the integrated variance correlation for measuring interval independence, using quantile levels between `tau1` and `tau2`.
IVC_Interval(y, x, K, tau1, tau2, NN = 3, type)

Argument | Description |
---|---|
`y` | A numeric vector. |
`x` | A numeric vector or a data matrix. |
`K` | The number of quantile levels. |
`tau1` | The minimum quantile level. |
`tau2` | The maximum quantile level. |
`NN` | The number of B-spline basis functions; default is 3. |
`type` | Whether to measure linear or nonlinear correlation: `"linear"` measures linear correlation, and `"nonlinear"` measures linear or nonlinear correlation using B-splines. |

Returns the value of the sample statistic for interval independence.
```r
library(IVCor)
library(mvtnorm)  # provides rmvnorm()

# linear model with covariance sigma_x[i, j] = pho1^|i - j|
n <- 100
p <- 3
pho1 <- 0.5
mean_x <- rep(0, p)
sigma_x <- matrix(NA, nrow = p, ncol = p)
for (i in 1:p) {
  for (j in 1:p) {
    sigma_x[i, j] <- pho1^(abs(i - j))
  }
}
x <- rmvnorm(n, mean = mean_x, sigma = sigma_x, method = "chol")
y <- 2 * (x[, 1] + x[, 2] + x[, 3]) + rnorm(n)
IVC_Interval(y, x, K = 5, tau1 = 0.4, tau2 = 0.6, type = "linear")

# nonlinear model
n <- 100
x <- runif(n, min = -2, max = 2)
y <- exp(x^2) * rnorm(n)
IVC_Interval(y, x, K = 5, tau1 = 0.4, tau2 = 0.6, type = "nonlinear")
```
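`tau1` and `tau2` bound the range of quantile levels that `IVC_Interval` uses. As a hedged illustration, assuming the `K` levels are spaced evenly over `[tau1, tau2]` (the manual does not say how they are placed), the grid and the corresponding indicators `I(y <= q_k)` can be built as follows.

```r
# Hypothetical quantile-level grid: K equally spaced levels in [tau1, tau2].
tau1 <- 0.4
tau2 <- 0.6
K <- 5
taus <- seq(tau1, tau2, length.out = K)
taus  # 0.40 0.45 0.50 0.55 0.60

set.seed(1)
y <- rnorm(100)

# Empirical quantiles of y at those levels and the matching indicators:
# column k of ind is I(y <= q_k), so in this sketch only quantile levels
# between tau1 and tau2 are examined.
q <- quantile(y, probs = taus, names = FALSE)
ind <- sapply(q, function(qk) as.numeric(y <= qk))
dim(ind)       # 100 rows, 5 columns
colMeans(ind)  # close to taus
```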
This function calculates the integrated variance correlation between a discrete (categorical) response variable and a continuous random variable.
IVCCA(y, x, K)

Argument | Description |
---|---|
`y` | The categorical response vector. |
`x` | A numeric vector. |
`K` | The number of quantile levels. |

Returns the value of the corresponding sample statistic.
```r
library(IVCor)

n <- 100
# three equally likely categories
y <- sample(1:3, n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
# the mean of x depends on the category of y
x <- rnorm(n, mean = 2 * y, sd = 1)
IVCCA(y, x, K = 5)
```
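For a categorical `y`, independence from `x` means the conditional distribution of `x` is the same in every category. The sketch below is only an illustration of that idea, not the `IVCCA` estimator: it tabulates empirical conditional distribution functions at `K` hypothetical evaluation points (interior quantiles of `x`) for the same data-generating model as the example, and the rows differ clearly because the mean of `x` depends on `y`.

```r
set.seed(1)
n <- 100
y <- sample(1:3, n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
x <- rnorm(n, mean = 2 * y, sd = 1)

K <- 5
# Hypothetical evaluation points: K interior quantiles of x.
q <- quantile(x, probs = (1:K) / (K + 1), names = FALSE)

# Row r, column k: proportion of x <= q_k among observations with y == r.
cond_cdf <- t(sapply(sort(unique(y)), function(r) {
  sapply(q, function(qk) mean(x[y == r] <= qk))
}))
round(cond_cdf, 2)  # rows would be roughly equal under independence
```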
This function calculates the critical values of the integrated variance correlation test with a discrete response at significance levels 0.1, 0.05 and 0.01.
IVCCA_crit(R, N = 500, realizations)

Argument | Description |
---|---|
`R` | The number of categories. |
`N` | An integer, which should be as large as possible; default is 500. |
`realizations` | The number of replications used to simulate the distribution under the null hypothesis. |

Returns the critical values at significance levels 0.1, 0.05 and 0.01.
```r
library(IVCor)

IVCCA_crit(R = 5, N = 500, realizations = 100)
```
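`IVCCA_crit` takes the number of categories `R` directly, so `R` should presumably match the number of distinct categories in the response being tested. A minimal sketch for reading it off the data, assuming the response is stored in a vector `y`; the `IVCCA_crit` call is left commented out because it needs the package installed.

```r
set.seed(1)
n <- 100
y <- sample(1:3, n, replace = TRUE)

# Number of distinct categories actually observed in y.
R <- length(unique(y))
R  # 3

# IVCCA_crit(R = R, N = 500, realizations = 100)
```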
This function tests independence between a categorical variable and a continuous variable using the integrated variance correlation.
IVCCAT(y, x, K, num_per, type)

Argument | Description |
---|---|
`y` | A categorical response vector. |
`x` | A numeric vector. |
`K` | The number of quantile levels. |
`num_per` | The number of permutations. |
`type` | Whether the number of categories is treated as fixed or as infinite: `"fixed"` uses a permutation test, and `"infinity"` uses an asymptotic normal distribution to compute the p-value. |

Returns the p-value of the corresponding hypothesis test.
```r
library(IVCor)

# small R: 3 categories, x independent of y, permutation test
n <- 100
x <- runif(n, 0, 1)
y <- sample(1:3, n, replace = TRUE, prob = c(1/3, 1/3, 1/3))
IVCCAT(y, x, K = 5, num_per = 20, type = "fixed")

# large R: 20 categories, asymptotic normal p-value (num_per is omitted)
n <- 200
y <- sample(1:20, n, replace = TRUE, prob = rep(1/20, 20))
mu_x <- sample(c(1, 2, 3, 4), 20, replace = TRUE, prob = c(1/4, 1/4, 1/4, 1/4))
x <- 2 * mu_x[y] + rcauchy(n)
IVCCAT(y, x, K = 10, type = "infinity")
```
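With `type = "infinity"`, `IVCCAT` computes the p-value from an asymptotic normal distribution. Assuming a one-sided test on a statistic `z` that has already been standardized to be approximately N(0, 1) under independence (the standardization itself is not documented here), the p-value reduces to an upper-tail normal probability. The helper `asymptotic_p_value` is hypothetical.

```r
# Hypothetical: z is the statistic after centering and scaling so that it
# is approximately N(0, 1) under independence, with large values
# indicating dependence (a one-sided, upper-tail test).
asymptotic_p_value <- function(z) pnorm(z, lower.tail = FALSE)

asymptotic_p_value(0)      # 0.5
asymptotic_p_value(1.645)  # about 0.05
asymptotic_p_value(2.326)  # about 0.01
```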
This function calculates the integrated variance correlation between two random variables using local linear estimation.
IVCLLQ(y, x, K)

Argument | Description |
---|---|
`y` | A numeric vector. |
`x` | A numeric vector. |
`K` | The number of quantile levels. |

Returns the value of the corresponding sample statistic.
```r
library(IVCor)

n <- 100
x <- rnorm(n)
y <- exp(x) + rnorm(n)
IVCLLQ(y, x, K = 4)
```
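`IVCLLQ` relies on local linear estimation. The sketch below is a generic local linear kernel smoother with a Gaussian kernel and a hand-picked bandwidth `h`, both assumptions; smoothing an indicator `I(y <= c)` against `x`, as in the last line, is one hypothetical way such a smoother can estimate a conditional probability. It is not the package's estimator.

```r
# Local linear kernel smoother: fit a weighted straight line around x0
# and return its intercept, i.e. the fitted value at x0.
local_linear <- function(x, y, x0, h) {
  w <- dnorm((x - x0) / h)  # Gaussian kernel weights, bandwidth h
  X <- cbind(1, x - x0)     # local design: intercept and slope
  XtW <- t(X * w)           # rows of X scaled by w, then transposed
  beta <- solve(XtW %*% X, XtW %*% y)  # weighted least squares
  beta[1, 1]
}

set.seed(1)
n <- 100
x <- rnorm(n)
y <- exp(x) + rnorm(n)

# Hypothetical use: smoothing I(y <= median(y)) against x estimates
# P(Y <= median(y) | X = x0) at x0 = 0.
ind <- as.numeric(y <= median(y))
local_linear(x, ind, x0 = 0, h = 0.5)
```

A useful property of the local linear fit is that it reproduces straight lines exactly, which makes it less biased near the boundary of the data than a local constant (Nadaraya-Watson) fit.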
This function tests the significance of linear or nonlinear correlation using the integrated variance correlation.
IVCT(y, x, K, num_per, NN = 3, type)

Argument | Description |
---|---|
`y` | The response vector. |
`x` | A numeric vector or a data matrix. |
`K` | The number of quantile levels. |
`num_per` | The number of permutations. |
`NN` | The number of B-spline basis functions; default is 3. |
`type` | Whether to test linear or nonlinear correlation: `"linear"` targets linear correlation, and `"nonlinear"` targets linear or nonlinear correlation using B-splines. |

Returns the p-value of the corresponding hypothesis test.
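```r
library(IVCor)

# linear model (x and y independent)
n <- 100
x <- rnorm(n)
y <- rnorm(n)
IVCT(y, x, K = 5, num_per = 20, type = "linear")

# nonlinear model with a 4-column covariate matrix
n <- 100
p <- 4
x <- matrix(NA, nrow = n, ncol = p)
for (i in 1:p) {
  x[, i] <- runif(n, 0, 1)
}
y <- 3 * ifelse(x[, 1] > 0.5, 1, 0) * x[, 2] + 3 * cos(x[, 3])^2 * x[, 1] +
  3 * (x[, 4]^2 - 1) * x[, 1] + rnorm(n)
IVCT(y, x, K = 5, num_per = 20, type = "nonlinear")
```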
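`IVCT` reports a p-value based on `num_per` permutations. A minimal sketch of the generic permutation procedure, with a hypothetical helper `permutation_p_value` and absolute Pearson correlation as a placeholder statistic; the package may use a different p-value convention, for example without the add-one correction used here.

```r
# Generic permutation test: shuffling y breaks any dependence on x while
# keeping both marginal distributions unchanged.
permutation_p_value <- function(y, x, stat_fun, num_per) {
  observed <- stat_fun(y, x)
  permuted <- replicate(num_per, stat_fun(sample(y), x))
  # Add-one correction: counts the observed data as one of the
  # permutations, so the p-value is never exactly zero.
  (1 + sum(permuted >= observed)) / (1 + num_per)
}

# Hypothetical placeholder statistic: absolute Pearson correlation.
abs_cor <- function(y, x) abs(cor(y, x))

set.seed(1)
n <- 100
x <- rnorm(n)
y <- 3 * x + rnorm(n)
p_val <- permutation_p_value(y, x, abs_cor, num_per = 20)
p_val  # 1/21, the smallest value possible with 20 permutations
```

With only 20 permutations the p-value cannot fall below 1/21, so larger values of `num_per` are needed for decisions at small significance levels such as 0.01.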
This function tests interval independence using the integrated variance correlation.
IVCT_Interval(y, x, tau1, tau2, K, num_per, NN = 3, type)

Argument | Description |
---|---|
`y` | The response vector. |
`x` | A numeric vector or a data matrix. |
`tau1` | The minimum quantile level. |
`tau2` | The maximum quantile level. |
`K` | The number of quantile levels. |
`num_per` | The number of permutations. |
`NN` | The number of B-spline basis functions; default is 3. |
`type` | Whether to test linear or nonlinear correlation: `"linear"` targets linear correlation, and `"nonlinear"` targets linear or nonlinear correlation using B-splines. |

Returns the p-value of the corresponding hypothesis test.
```r
library(IVCor)
library(mvtnorm)  # provides rmvnorm()

# linear model: y independent of x
n <- 100
p <- 3
pho1 <- 0.5
mean_x <- rep(0, p)
sigma_x <- matrix(NA, nrow = p, ncol = p)
for (i in 1:p) {
  for (j in 1:p) {
    sigma_x[i, j] <- pho1^(abs(i - j))
  }
}
x <- rmvnorm(n, mean = mean_x, sigma = sigma_x, method = "chol")
y <- rnorm(n)
IVCT_Interval(y, x, tau1 = 0.5, tau2 = 0.75, K = 5, num_per = 20, type = "linear")

# nonlinear model: x and y share noise only where x_til <= -0.5 and y_til <= -0.675
n <- 100
x_til <- runif(n, min = -1, max = 1)
y_til <- rnorm(n)
epsilon <- rnorm(n)
x <- x_til + 2 * epsilon * ifelse(x_til <= -0.5 & y_til <= -0.675, 1, 0)
y <- y_til + 2 * epsilon * ifelse(x_til <= -0.5 & y_til <= -0.675, 1, 0)
IVCT_Interval(y, x, tau1 = 0.6, tau2 = 0.8, K = 5, num_per = 20, type = "nonlinear")
```
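In the nonlinear `IVCT_Interval` example, `x` and `y` share the noise term `epsilon` only where `x_til <= -0.5` and `y_til <= -0.675` (note that -0.675 is close to `qnorm(0.25)`), so the dependence is confined to one corner of the joint distribution. Because `x_til` and `y_til` are independent, the probability of that region is a product of marginal probabilities, which a quick simulation confirms; the names `p_region` and `mc` are illustrative only.

```r
# P(x_til <= -0.5, y_til <= -0.675) = P(x_til <= -0.5) * P(y_til <= -0.675)
# because x_til ~ U(-1, 1) and y_til ~ N(0, 1) are drawn independently.
p_region <- punif(-0.5, min = -1, max = 1) * pnorm(-0.675)
p_region  # about 0.0625, i.e. roughly 6 observations out of 100

# Monte Carlo check using the same data-generating steps as the example
set.seed(1)
n <- 1e5
x_til <- runif(n, min = -1, max = 1)
y_til <- rnorm(n)
mc <- mean(x_til <= -0.5 & y_til <= -0.675)
mc
```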
This function tests significance using the integrated variance correlation with local linear estimation.
IVCTLLQ(y, x, K, num_per)

Argument | Description |
---|---|
`y` | A numeric vector. |
`x` | A numeric vector. |
`K` | The number of quantile levels. |
`num_per` | The number of permutations. |

Returns the p-value of the corresponding hypothesis test.
```r
library(IVCor)

n <- 100
x <- runif(n, -1, 1)
y <- 2 * cos(2 * x) + rnorm(n)
IVCTLLQ(y, x, K = 5, num_per = 100)
```