Title: | A Collection of Graphon Estimation Methods |
---|---|
Description: | Provides a not-so-comprehensive list of methods for estimating graphon, a symmetric measurable function, from a single or multiple of observed networks. For a detailed introduction on graphon and popular estimation techniques, see the paper by Orbanz, P. and Roy, D.M.(2014) <doi:10.1109/TPAMI.2014.2334607>. It also contains several auxiliary functions for generating sample networks using various network models and graphons. |
Authors: | Kisung You [aut, cre] |
Maintainer: | Kisung You <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.5 |
Built: | 2024-11-07 06:13:22 UTC |
Source: | https://github.com/kisungyou/graphon |
The graphon provides a not-so-comprehensive list of methods for estimating graphon, a symmetric measurable function, from a single or multiple of observed networks. It also contains several auxiliary functions for generating sample networks using various network models and graphons.
Graphon - graph function - is a symmetric measurable function
that arise
in studying exchangeable random graph models as well as sequence of dense graphs. In the language of
graph theory, it can be understood as a two-stage procedural network modeling that 1) each vertex/node in the graph
is assigned an independent random variable from uniform distribution
, and
2) each edge
is randomly determined with probability
. Due to such
procedural aspect, the term probability matrix and graphon will be interchangeably used
in further documentation.
The package mainly consists of two types of functions whose names start with 'est'
and 'gmodel'
for estimation algorithms and graph models, respectively.
The 'est'
family has 4 estimation methods at the current version,
est.LG
for empirical degree sorting in stochastic blockmodel.
est.SBA
for stochastic blockmodel approximation.
est.USVT
for universal singular value thresholding.
est.nbdsmooth
for neighborhood smoothing.
est.completion
for matrix completion from a partially revealed data.
Also, the current release has following graph models implemented,
gmodel.P
generates a binary graph given an arbitrary probability matrix.
gmodel.ER
is an implementation of Erdos-Renyi random graph models.
gmodel.block
is used to generate networks with block structure.
gmodel.preset
has 10 exemplary graphon models for simulation.
Kisung You
The performance of Stochastic Blockmodel Approximation (SBA) method is
contingent on the number of blocks it finds during estimation process,
which is rougly determined by a precision parameter delta
. cv.SBA
tests multiple of delta values to find the optimal one that minimizes
the cross validation risk. Note that the optimal delta is not bound to be a single value.
cv.SBA(A, vecdelta = seq(0.1, 1, by = 0.1))
cv.SBA(A, vecdelta = seq(0.1, 1, by = 0.1))
A |
either
|
vecdelta |
a vector containing target delta values to be tested. |
a named list containing
optimal delta values that minimize the cross validation risk J.
cross validation risk values.
Chan SH, Airoldi EM (2014). “A Consistent Histogram Estimator for Exchangeable Graph Models.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, I-208–I-216.
Airoldi EM, Costa TB, Chan SH (2013). “Stochastic blockmodel approximation of a graphon: Theory and consistent estimation.” In Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds.), Advances in Neural Information Processing Systems 26, 692–700. Curran Associates, Inc.
## Not run: ## generate a graphon of type No.8 with 3 clusters W = gmodel.preset(3,id=8) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 15 observations from a given probability matrix A = gmodel.P(P,rep=15) ## cross validate SBA algorithm over different deltas rescv = cv.SBA(A,vecdelta=c(0.1,0.5,0.9)) print(rescv$optdelta) ## End(Not run)
## Not run: ## generate a graphon of type No.8 with 3 clusters W = gmodel.preset(3,id=8) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 15 observations from a given probability matrix A = gmodel.P(P,rep=15) ## cross validate SBA algorithm over different deltas rescv = cv.SBA(A,vecdelta=c(0.1,0.5,0.9)) print(rescv$optdelta) ## End(Not run)
est.completion
adopts a matrix completion scheme,
which is common in missing data or matrix reconstruction studies.
When given a multiple of, or a single observation, we consider
non-existent edges as missing entries and apply the completion scheme.
See OptSpace
for a more detailed introduction.
est.completion( A, rank = NA, tolerance = 0.001, maxiter = 20, progress = FALSE, adjust = TRUE )
est.completion( A, rank = NA, tolerance = 0.001, maxiter = 20, progress = FALSE, adjust = TRUE )
A |
either
|
rank |
an estimated rank condition for the matrix; |
tolerance |
a tolerance level for singular value thresholding from OptSpace method. |
maxiter |
the number of maximum iterations for OptSpace method. |
progress |
a logical value; |
adjust |
a logical value; |
an corresponding probability matrix.
Keshavan RH, Montanari A, Oh S (2010). “Matrix Completion From a Few Entries.” IEEE Transactions on Information Theory, 56(6), 2980-2998. ISSN 0018-9448.
## generate a graphon of type No.5 with 3 clusters W = gmodel.preset(3,id=5) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 10 observations from a given probability matrix A = gmodel.P(P,rep=10) ## apply the method res_r3 = est.completion(A,rank=3) # use rank-3 approximation res_r9 = est.completion(A,rank=9) # use rank-9 approximation res_rN = est.completion(A,adjust=FALSE) # stop the code if guess works poorly ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(res_r3, main="rank 3") image(res_r9, main="rank 9") image(res_rN, main="rank is guessed") par(opar)
## generate a graphon of type No.5 with 3 clusters W = gmodel.preset(3,id=5) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 10 observations from a given probability matrix A = gmodel.P(P,rep=10) ## apply the method res_r3 = est.completion(A,rank=3) # use rank-3 approximation res_r9 = est.completion(A,rank=9) # use rank-9 approximation res_rN = est.completion(A,adjust=FALSE) # stop the code if guess works poorly ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(res_r3, main="rank 3") image(res_r9, main="rank 9") image(res_rN, main="rank is guessed") par(opar)
est.LG
takes a 2-stage approach. First it adopts largest gap criterion on empirical degrees to
estimate blocks of a given network under Stochastic Blockmodel framework.
Then a consistent histogram estimator is utilized to estimate graphons based on
estimated blocks in a given network.
est.LG(A, K = 2)
est.LG(A, K = 2)
A |
an |
K |
the number of blocks provided by an user. |
a named list containing
a matrix of 3D histogram.
an corresponding probability matrix.
a length- list where each element is a vector of nodes/indices
for each cluster.
Channarond A, Daudin J, Robin S (2011). “Classification and estimation in the Stochastic Block Model based on the empirical degrees.” ArXiv e-prints. 1110.6517.
Chan SH, Airoldi EM (2014). “A Consistent Histogram Estimator for Exchangeable Graph Models.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, I-208–I-216.
## generate a graphon of type No.5 with 3 clusters W = gmodel.preset(3,id=10) ## create a probability matrix for 20 nodes graphW = gmodel.block(W,n=20) P = graphW$P ## draw 23 observations from a given probability matrix A = gmodel.P(P,rep=23,symmetric.out=TRUE) ## run LG algorithm with a rough guess for K=2,3,4 res2 = est.LG(A,K=2) res3 = est.LG(A,K=3) res4 = est.LG(A,K=4) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(P, main="original P matrix") image(res2$P, main="LG with K=2") image(res3$P, main="LG with K=3") image(res4$P, main="LG with K=4") par(opar)
## generate a graphon of type No.5 with 3 clusters W = gmodel.preset(3,id=10) ## create a probability matrix for 20 nodes graphW = gmodel.block(W,n=20) P = graphW$P ## draw 23 observations from a given probability matrix A = gmodel.P(P,rep=23,symmetric.out=TRUE) ## run LG algorithm with a rough guess for K=2,3,4 res2 = est.LG(A,K=2) res3 = est.LG(A,K=3) res4 = est.LG(A,K=4) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(P, main="original P matrix") image(res2$P, main="LG with K=2") image(res3$P, main="LG with K=3") image(res4$P, main="LG with K=4") par(opar)
est.nbdsmooth
takes the expectation of the adjacency matrix
in that it directly aims at estimating network edge probabilities without
imposing structural assumptions as of usual graphon estimation requires,
such as piecewise lipschitz condition. Note that this method is
for symmetric adjacency matrix only, i.e., undirected networks.
est.nbdsmooth(A)
est.nbdsmooth(A)
A |
either
|
a named list containing
a quantile threshold value.
a matrix of estimated edge probabilities.
Zhang Y, Levina E, Zhu J (2017). “Estimating network edge probabilities by neighbourhood smoothing.” Biometrika, 104(4), 771-783.
## generate a graphon of type No.4 with 3 clusters W = gmodel.preset(3,id=4) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 5 observations from a given probability matrix A = gmodel.P(P,rep=5,symmetric.out=TRUE) ## run nbdsmooth algorithm res2 = est.nbdsmooth(A) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(P, main="original P matrix") image(res2$P, main="nbdsmooth estimated P") par(opar)
## generate a graphon of type No.4 with 3 clusters W = gmodel.preset(3,id=4) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 5 observations from a given probability matrix A = gmodel.P(P,rep=5,symmetric.out=TRUE) ## run nbdsmooth algorithm res2 = est.nbdsmooth(A) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(P, main="original P matrix") image(res2$P, main="nbdsmooth estimated P") par(opar)
est.SBA
takes a 2-stage approach for estimating graphons
based on exchangeable random graph models. First, it finds a
Stochastic Blockmodel Approximation (SBA) of the graphon. Then,
it uses clustering information to estimate graphon using a consistent
histogram estimator.
est.SBA(A, delta = 0.5)
est.SBA(A, delta = 0.5)
A |
either
|
delta |
a precision parameter larger than 0. |
a named list containing
a matrix fo 3D histogram.
an corresponding probability matrix.
a length- list where each element is a vector of nodes/indices
for each cluster.
Airoldi EM, Costa TB, Chan SH (2013). “Stochastic blockmodel approximation of a graphon: Theory and consistent estimation.” In Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds.), Advances in Neural Information Processing Systems 26, 692–700. Curran Associates, Inc.
Chan SH, Airoldi EM (2014). “A Consistent Histogram Estimator for Exchangeable Graph Models.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, I-208–I-216.
## generate a graphon of type No.6 with 3 clusters W = gmodel.preset(3,id=6) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 17 observations from a given probability matrix A = gmodel.P(P,rep=17) ## run SBA algorithm with different deltas (0.2,0.5,0.8) res2 = est.SBA(A,delta=0.2) res3 = est.SBA(A,delta=0.5) res4 = est.SBA(A,delta=0.8) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(P); title("original P") image(res2$P); title("SBA with delta=0.2") image(res3$P); title("SBA with delta=0.5") image(res4$P); title("SBA with delta=0.8") par(opar)
## generate a graphon of type No.6 with 3 clusters W = gmodel.preset(3,id=6) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 17 observations from a given probability matrix A = gmodel.P(P,rep=17) ## run SBA algorithm with different deltas (0.2,0.5,0.8) res2 = est.SBA(A,delta=0.2) res3 = est.SBA(A,delta=0.5) res4 = est.SBA(A,delta=0.8) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(2,2), pty="s") image(P); title("original P") image(res2$P); title("SBA with delta=0.2") image(res3$P); title("SBA with delta=0.5") image(res4$P); title("SBA with delta=0.8") par(opar)
est.USVT
is a generic matrix estimation method first
proposed for the case where a noisy realization of the matrix is given.
Universal Singular Value Thresholding (USVT), as its name suggests,
utilizes singular value decomposition of observations in addition to
thresholding over singular values achieved from the decomposition.
est.USVT(A, eta = 0.01)
est.USVT(A, eta = 0.01)
A |
either
|
eta |
a positive number in |
a named list containing
a vector of sorted singular values.
a threshold to disregard singular values.
a matrix of estimated edge probabilities.
Chatterjee S (2015). “Matrix estimation by Universal Singular Value Thresholding.” Ann. Statist., 43(1), 177–214.
## generate a graphon of type No.1 with 3 clusters W = gmodel.preset(3,id=1) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 5 observations from a given probability matrix A = gmodel.P(P,rep=5,symmetric.out=TRUE) ## run USVT algorithm with different eta values (0.01,0.1) res2 = est.USVT(A,eta=0.01) res3 = est.USVT(A,eta=0.1) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(P, main="original P matrix") image(res2$P, main="USVT with eta=0.01") image(res3$P, main="USVT with eta = 0.1") par(opar)
## generate a graphon of type No.1 with 3 clusters W = gmodel.preset(3,id=1) ## create a probability matrix for 100 nodes graphW = gmodel.block(W,n=100) P = graphW$P ## draw 5 observations from a given probability matrix A = gmodel.P(P,rep=5,symmetric.out=TRUE) ## run USVT algorithm with different eta values (0.01,0.1) res2 = est.USVT(A,eta=0.01) res3 = est.USVT(A,eta=0.1) ## compare true probability matrix and estimated ones opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(P, main="original P matrix") image(res2$P, main="USVT with eta=0.01") image(res3$P, main="USVT with eta = 0.1") par(opar)
Given a stochastic blockmodel W,
gmodel.block
generates an (n-by-n) binary random graphs. All K blocks have
same number of nodes, or almost identical if n is not a multiple
of K. Parameter noloop
controls whether generated observations
have an edge from a node to itself, called a loop.
gmodel.block(W, n, rep = 1, noloop = TRUE)
gmodel.block(W, n, rep = 1, noloop = TRUE)
W |
a |
n |
the number of nodes for each observation. |
rep |
the number of observations to be generated. |
noloop |
a logical value; TRUE for graphs without self-loops, FALSE otherwise. |
a named list containing
depending on rep
value,
an observation, or
a length-rep
list where each element
is an observation is an realization from the model.
an probability matrix of generating each edge.
## set inputs W = matrix(c(0.9,0.2,0.2,0.7),nr=2) n = 200 ## generate 2 observations without self-loops. out <- gmodel.block(W,n,rep=2,noloop=TRUE) ## visualize generated graphs opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(out$G[[1]]); title("Observation 1") image(out$G[[2]]); title("Observation 2") par(opar)
## set inputs W = matrix(c(0.9,0.2,0.2,0.7),nr=2) n = 200 ## generate 2 observations without self-loops. out <- gmodel.block(W,n,rep=2,noloop=TRUE) ## visualize generated graphs opar = par(no.readonly=TRUE) par(mfrow=c(1,2), pty="s") image(out$G[[1]]); title("Observation 1") image(out$G[[2]]); title("Observation 2") par(opar)
Erdos-Renyi random graph model is one of the most popular and
fundamental examples in modeling networks. Given n nodes,
gmodel.ER
generates edges randomly from Bernoulli distribution
with a globally specified probability.
gmodel.ER(n, mode = "prob", par = 0.5, rep = 1)
gmodel.ER(n, mode = "prob", par = 0.5, rep = 1)
n |
the number of nodes to be generated |
mode |
'prob' (default) for edges to be drawn from Bernoulli distribution independently, or 'num' for a graph to have a fixed number of edges placed randomly |
par |
a real number |
rep |
the number of observations to be generated. |
In network science, 'ER' model is often interchangeably used in where we have fixed number of edges to be placed at random. The original use of edge-generating probability is from Gilbert (1959). Therefore, we set this algorithm to be flexible in that user can create either a fixed number of edges placed at random or set global edge-generating probability and draw independent observations following Bernoulli distribution.
depending on rep
value, either
an observation matrix, or
a length-rep
list where each element
is an observation is an realization from the model.
Erdös P, Rényi A (1959). “On Random Graphs I.” Publicationes Mathematicae Debrecen, 6, 290.
Gilbert EN (1959). “Random Graphs.” Ann. Math. Statist., 30(4), 1141–1144.
## generate 3 graphs with a global with probability 0.5 graph3 = gmodel.ER(100,par=0.5,rep=3) ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(graph3[[1]], main="1st sample") image(graph3[[2]], main="2nd sample") image(graph3[[3]], main="3rd sample") par(opar)
## generate 3 graphs with a global with probability 0.5 graph3 = gmodel.ER(100,par=0.5,rep=3) ## visualize opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(graph3[[1]], main="1st sample") image(graph3[[2]], main="2nd sample") image(graph3[[3]], main="3rd sample") par(opar)
Given an probability matrix
,
gmodel.P
generates
binary observation graphs corresponding to Bernoulli distribution
whose parameter matches to the element of .
gmodel.P(P, rep = 1, noloop = TRUE, symmetric.out = FALSE)
gmodel.P(P, rep = 1, noloop = TRUE, symmetric.out = FALSE)
P |
an |
rep |
the number of observations to be generated. |
noloop |
a logical value; TRUE for graphs without self-loops, FALSE otherwise. |
symmetric.out |
a logical value; FALSE for generated graphs to be nonsymmetric, TRUE otherwise. Note that TRUE is supported only if the input matrix P is symmetric. |
depending on rep
value, either
an (n-by-n)
observation matrix, or
a length-rep
list where each element
is an observation is an (n-by-n)
realization from the model.
## set inputs modelP <- matrix(runif(16),nrow=4) ## generate 3 observations without self-loops. out <- gmodel.P(modelP,rep=3,noloop=TRUE) ## visualize generated graphs opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(out[[1]], main="1st sample") image(out[[2]], main="2nd sample") image(out[[3]], main="3rd sample") par(opar)
## set inputs modelP <- matrix(runif(16),nrow=4) ## generate 3 observations without self-loops. out <- gmodel.P(modelP,rep=3,noloop=TRUE) ## visualize generated graphs opar = par(no.readonly=TRUE) par(mfrow=c(1,3), pty="s") image(out[[1]], main="1st sample") image(out[[2]], main="2nd sample") image(out[[3]], main="3rd sample") par(opar)
gmodel.preset
generates one of pre-specified graphons
of size . Users can select one of 10 different graphons by
their
id
, an integer from 1 to 10. The table of available graphons
follows that of the reference article given below.
gmodel.preset(n, id = 1, sort = TRUE)
gmodel.preset(n, id = 1, sort = TRUE)
n |
the number of nodes for a graphon to be generated. |
id |
an integer from 1 to 10, each corresponding to a specific graphon model. |
sort |
a logical value; TRUE to sort in an decreasing order of degree, FALSE otherwise. |
an graphon matrix.
Chan SH, Airoldi EM (2014). “A Consistent Histogram Estimator for Exchangeable Graph Models.” In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, I-208–I-216.
## Not run: ## Generate 3 random graphons of nodal size 100. n = 100 r3 = (sample(1:10,3)) W1 = gmodel.preset(n,id=r3[1]) W2 = gmodel.preset(n,id=r3[2]) W3 = gmodel.preset(n,id=r3[3]) ## Generate corresponding observations and plot them A1 = gmodel.P(W1) A2 = gmodel.P(W2) A3 = gmodel.P(W3) \dontshow{ for (i in 1:10){ W = gmodel.preset(100,id=i) } } ## End(Not run)
## Not run: ## Generate 3 random graphons of nodal size 100. n = 100 r3 = (sample(1:10,3)) W1 = gmodel.preset(n,id=r3[1]) W2 = gmodel.preset(n,id=r3[2]) W3 = gmodel.preset(n,id=r3[3]) ## Generate corresponding observations and plot them A1 = gmodel.P(W1) A2 = gmodel.P(W2) A3 = gmodel.P(W3) \dontshow{ for (i in 1:10){ W = gmodel.preset(100,id=i) } } ## End(Not run)