Title: Toolkit for Topological Data Analysis
Description: Topological data analysis studies the structure and shape of data using topological features. We provide a variety of algorithms for learning with the persistent homology of data, based on functional summaries, for clustering, hypothesis testing, visualization, and other tasks. We refer to Wasserman (2018) <doi:10.1146/annurev-statistics-031017-100045> for a statistical perspective on the topic.
Authors: Kisung You [aut, cre], Byeongsu Yu [aut]
Maintainer: Kisung You <[email protected]>
License: MIT + file LICENSE
Version: 0.1.2
Built: 2024-11-02 05:16:07 UTC
Source: https://github.com/kisungyou/tdakit
Persistence Landscape (PL) is a functional summary of persistent homology that is constructed from a given "homology" object.
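For reference, in Bubenik's formulation each birth-death pair (b, d) in the diagram contributes a tent function, and the landscape collects their pointwise order statistics:
\[
\Lambda_{(b,d)}(t) = \max\bigl(0,\ \min(t - b,\; d - t)\bigr),
\qquad
\lambda_k(t) = \text{the } k\text{-th largest value of } \{\Lambda_{(b_i,d_i)}(t)\}_i .
\]
The returned object stores these functions evaluated on a grid of nseq points.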
diag2landscape(homology, dimension = 1, k = 0, nseq = 1000)
homology : an object of S3 class "homology" generated by diagRips().
dimension : dimension of features to be considered (default: 1).
k : the number of top landscape functions to be used (default: 0). When k = 0, all landscape functions are used.
nseq : grid size at which the landscape functions are evaluated (default: 1000).
a list object of S3 class "landscape" containing:
lambda : a matrix of landscape functions evaluated on the grid, one column per landscape function.
tseq : a length-nseq vector of the domain grid.
dimension : dimension of the features considered.
Peter Bubenik (2018). “The Persistence Landscape and Some of Its Properties.” arXiv:1810.04963.
# ---------------------------------------------------------------------------
# Persistence Landscape of 'iris' Dataset
#
# We will extract landscapes of dimensions 0, 1, and 2.
# For each feature, only the top 5 landscape functions are plotted.
# ---------------------------------------------------------------------------
## Prepare 'iris' data
XX = as.matrix(iris[,1:4])

## Compute Persistence Diagram
pdrips = diagRips(XX, maxdim=2)

## Convert to Landscapes of Each Dimension
land0 <- diag2landscape(pdrips, dimension=0, k=5)
land1 <- diag2landscape(pdrips, dimension=1, k=5)
land2 <- diag2landscape(pdrips, dimension=2, k=5)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2))
plot(pdrips$Birth, pdrips$Death, col=as.factor(pdrips$Dimension),
     pch=19, main="persistence diagram", xlab="Birth", ylab="Death")
matplot(land0$tseq, land0$lambda, type="l", lwd=3, main="dimension 0", xlab="t")
matplot(land1$tseq, land1$lambda, type="l", lwd=3, main="dimension 1", xlab="t")
matplot(land2$tseq, land2$lambda, type="l", lwd=3, main="dimension 2", xlab="t")
par(opar)
Persistence Silhouette (PS) is a functional summary of persistent homology that is constructed from a given "homology" object. PS is a weighted average of landscape functions, so it reduces to a single summary function.
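A common form of the silhouette, which the exponent p below appears to refer to, weights each tent function by the persistence of its feature:
\[
\phi^{(p)}(t) \;=\; \frac{\sum_j |d_j - b_j|^p \, \Lambda_{(b_j, d_j)}(t)}{\sum_j |d_j - b_j|^p},
\]
so features with large persistence dominate as p grows.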
diag2silhouette(homology, dimension = 1, p = 2, nseq = 100)
homology : an object of S3 class "homology" generated by diagRips().
dimension : dimension of features to be considered (default: 1).
p : an exponent for the weight function of the form |d - b|^p (default: 2).
nseq : grid size at which the silhouette function is evaluated (default: 100).
a list object of S3 class "silhouette" containing:
lambda : the silhouette function evaluated on the grid.
tseq : a length-nseq vector of the domain grid.
dimension : dimension of the features considered.
# ---------------------------------------------------------------------------
# Persistence Silhouette of 'iris' Dataset
#
# We will extract silhouettes of dimensions 0, 1, and 2.
# ---------------------------------------------------------------------------
## Prepare 'iris' data
XX = as.matrix(iris[,1:4])

## Compute Persistence Diagram
pdrips = diagRips(XX, maxdim=2)

## Convert to Silhouettes of Each Dimension
sil0 <- diag2silhouette(pdrips, dimension=0)
sil1 <- diag2silhouette(pdrips, dimension=1)
sil2 <- diag2silhouette(pdrips, dimension=2)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2))
plot(pdrips$Birth, pdrips$Death, col=as.factor(pdrips$Dimension),
     pch=19, main="persistence diagram", xlab="Birth", ylab="Death")
plot(sil0$tseq, sil0$lambda, type="l", lwd=3, main="dimension 0", xlab="t")
plot(sil1$tseq, sil1$lambda, type="l", lwd=3, main="dimension 1", xlab="t")
plot(sil2$tseq, sil2$lambda, type="l", lwd=3, main="dimension 2", xlab="t")
par(opar)
diagRips computes the persistence diagram of the Vietoris-Rips filtration constructed on a point cloud represented as a matrix or a "dist" object. This function is a thin wrapper around TDAstats, which in turn wraps the Ripser library.
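As a reminder of the underlying construction, the Vietoris-Rips complex at scale r contains a simplex for every finite subset of points whose pairwise distances are all at most r,
\[
\mathrm{VR}_r(X) \;=\; \bigl\{\, \sigma \subseteq X \;:\; d(x_i, x_j) \le r \ \text{for all } x_i, x_j \in \sigma \,\bigr\},
\]
and the persistence diagram records the scales at which homological features appear (Birth) and disappear (Death) as r increases up to the threshold.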
diagRips(data, maxdim = 1, threshold = Inf)
data : a matrix of row-stacked observations or a "dist" object.
maxdim : maximum dimension of the computed homological features (default: 1).
threshold : maximum value of the filtration (default: Inf).
a data frame object of S3 class "homology" with the following columns:
Dimension : dimension of the corresponding feature.
Birth : birth of the feature.
Death : death of the feature.
Raoul R. Wadhwa, Drew F.K. Williamson, Andrew Dhawan, Jacob G. Scott (2018). “TDAstats: R Pipeline for Computing Persistent Homology in Topological Data Analysis.” Journal of Open Source Software, 3(28), 860. ISSN 2475-9066.
Ulrich Bauer (2019). “Ripser: Efficient Computation of Vietoris-Rips Persistence Barcodes.” arXiv:1908.02518.
# ---------------------------------------------------------------------------
# Check consistency of two types of inputs : 'matrix' and 'dist' objects
# ---------------------------------------------------------------------------
# Use 'iris' data and compute its distance matrix
XX = as.matrix(iris[,1:4])
DX = stats::dist(XX)

# Compute VR Diagram with two inputs
vr.mat = diagRips(XX)
vr.dis = diagRips(DX)

col1 = as.factor(vr.mat$Dimension)
col2 = as.factor(vr.dis$Dimension)

# Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
plot(vr.mat$Birth, vr.mat$Death, pch=19, col=col1, main="from 'matrix'")
plot(vr.dis$Birth, vr.dis$Death, pch=19, col=col2, main="from 'dist'")
par(opar)
L_p Distance of Multiple Functional Summaries
Given multiple functional summaries Λ_1(t), ..., Λ_N(t), compute the L_p distance in a pairwise sense.
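A sketch of the quantity being computed, assuming the usual L_p metric between summary functions and numerical integration over the shared grid:
\[
d_p(\Lambda_i, \Lambda_j) \;=\; \Bigl( \int \bigl|\Lambda_i(t) - \Lambda_j(t)\bigr|^p \, dt \Bigr)^{1/p},
\]
where, for multi-level landscapes, the integral is additionally summed over the landscape levels.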
fsdist(fslist, p = 2, as.dist = TRUE)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
p : an exponent in the L_p norm (default: 2).
as.dist : logical; if TRUE (default), a "dist" object is returned; otherwise, a symmetric matrix is returned.
an S3 "dist" object or a symmetric matrix of pairwise distances, according to the as.dist parameter.
# ---------------------------------------------------------------------------
# Compute L_2 Distance for 3 Types of Landscapes and Silhouettes
#
# We will compare dim=0,1 with top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
#  We try to get distance in dimensions 0 and 1.
list_land0 = list()
list_land1 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
  list_land1[[i]] = diag2landscape(list_rips[[i]], dimension=1, k=5)
}

## Compute Silhouettes
list_sil0 = list()
list_sil1 = list()
for (i in 1:(3*ndata)){
  list_sil0[[i]] = diag2silhouette(list_rips[[i]], dimension=0)
  list_sil1[[i]] = diag2silhouette(list_rips[[i]], dimension=1)
}

## Compute L2 Distance Matrices
ldmat0 = fsdist(list_land0, p=2, as.dist=FALSE)
ldmat1 = fsdist(list_land1, p=2, as.dist=FALSE)
sdmat0 = fsdist(list_sil0, p=2, as.dist=FALSE)
sdmat1 = fsdist(list_sil1, p=2, as.dist=FALSE)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
image(ldmat0[,(3*(ndata)):1], axes=FALSE, main="Landscape : dim=0")
image(ldmat1[,(3*(ndata)):1], axes=FALSE, main="Landscape : dim=1")
image(sdmat0[,(3*(ndata)):1], axes=FALSE, main="Silhouette : dim=0")
image(sdmat1[,(3*(ndata)):1], axes=FALSE, main="Silhouette : dim=1")
par(opar)
L_p Distance between Two Sets of Functional Summaries
Given two sets of functional summaries Λ_1(t), ..., Λ_M(t) and Ω_1(t), ..., Ω_N(t), compute the L_p distance across all pairs.
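The result is the same L_p distance as in fsdist, evaluated over cross pairs only, so the output is presumably the matrix
\[
D_{mn} \;=\; d_p(\Lambda_m, \Omega_n), \qquad m = 1, \ldots, M, \; n = 1, \ldots, N .
\]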
fsdist2(fslist1, fslist2, p = 2)
fslist1 : a length-M list of functional summaries, e.g., "landscape" or "silhouette" objects.
fslist2 : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
p : an exponent in the L_p norm (default: 2).
an (M x N) distance matrix.
# ---------------------------------------------------------------------------
# Compute L1 and L2 Distance for Two Sets of Landscapes
#
# First set consists of {Class 1, Class 2}, while
# Second set consists of {Class 1, Class 3} where
#
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata      = 10
list_rips1 = list()
list_rips2 = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4, sd=4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4, sd=4), ncol=4)
  dat4 = gen2circles(n=100, sd=1)$data

  list_rips1[[i]]       = diagRips(dat1, maxdim=1)
  list_rips1[[i+ndata]] = diagRips(dat2, maxdim=1)

  list_rips2[[i]]       = diagRips(dat3, maxdim=1)
  list_rips2[[i+ndata]] = diagRips(dat4, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=10 Functions
#  We try to get distance in dimension 1 only for faster comparison.
list_pset1 = list()
list_pset2 = list()
for (i in 1:(2*ndata)){
  list_pset1[[i]] = diag2landscape(list_rips1[[i]], dimension=1, k=10)
  list_pset2[[i]] = diag2landscape(list_rips2[[i]], dimension=1, k=10)
}

## Compute L1 and L2 Distance Matrix
dmat1 = fsdist2(list_pset1, list_pset2, p=1)
dmat2 = fsdist2(list_pset1, list_pset2, p=2)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
image(dmat1[,(2*ndata):1], axes=FALSE, main="distance for p=1")
image(dmat2[,(2*ndata):1], axes=FALSE, main="distance for p=2")
par(opar)
Also known as the k-sample problem, this test examines whether multiple functional summaries are equally distributed via energy statistics.
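For two samples X and Y, the energy distance underlying such tests is
\[
\mathcal{E}(X, Y) \;=\; 2\,\mathbb{E}\,\|X - Y\| \;-\; \mathbb{E}\,\|X - X'\| \;-\; \mathbb{E}\,\|Y - Y'\|,
\]
with the norm taken as the functional distance between summaries. The k-sample statistic ("original") and its DISCO decomposition ("disco") aggregate such pairwise terms, and the p-value is obtained from the bootstrap replicates below; the exact aggregation follows the energy-statistics literature and is stated here only as a sketch.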
fseqdist(fslist, label, method = c("original", "disco"), mc.iter = 999)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
label : a length-N vector of class labels.
method : (case-sensitive) name of the method; one of "original" or "disco".
mc.iter : the number of bootstrap replicates (default: 999).
a (list) object of S3 class htest containing:
method : name of the test.
statistic : the value of the test statistic.
p.value : the p-value under the null hypothesis of equal distributions.
# ---------------------------------------------------------------------------
# Test for Equality of Distributions via Energy Statistics
#
# We will compare dim=0's top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land0 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Create Label and Run the Test with Different Options
list_lab = c(rep(1,ndata), rep(2,ndata), rep(3,ndata))
fseqdist(list_land0, list_lab, method="original")
fseqdist(list_land0, list_lab, method="disco")
Given multiple functional summaries Λ_1(t), ..., Λ_N(t), perform hierarchical agglomerative clustering with the L_2 distance.
fshclust(
  fslist,
  method = c("single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", "median"),
  members = NULL
)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
method : the agglomeration method to be used; one of "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", or "median".
members : NULL or a vector whose length equals the number of summaries; see stats::hclust for details.
an object of class "hclust"; see stats::hclust for details.
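A minimal sketch of turning the returned dendrogram into hard cluster labels with stats::cutree; list_summaries is a hypothetical list of "landscape" or "silhouette" objects:

## 'list_summaries' is a hypothetical list of functional summaries
hc  <- fshclust(list_summaries, method="average")   # agglomerative clustering
lab <- stats::cutree(hc, k=3)                        # cut the dendrogram into 3 groups
table(lab)                                           # inspect cluster sizes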
# ---------------------------------------------------------------------------
# Hierarchical Agglomerative Clustering of Functional Summaries
#
# We will cluster dim=0 under top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}
list_lab = c(rep(1,ndata), rep(2,ndata), rep(3,ndata))

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land0 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Run MDS for Visualization
embed = fsmds(list_land0, ndim=2)

## Clustering with 'single' and 'complete' linkage
hc.sing <- fshclust(list_land0, method="single")
hc.comp <- fshclust(list_land0, method="complete")

## Visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(embed, pch=19, col=list_lab, main="2-dim embedding")
plot(hc.sing, main="single linkage")
plot(hc.comp, main="complete linkage")
par(opar)
K-Groups Clustering of Multiple Functional Summaries by Energy Distance
Given N functional summaries Λ_1(t), ..., Λ_N(t), perform k-groups clustering by energy distance using the L_2 metric.
fskgroups(fslist, k = 2, ...)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
k : the number of clusters (default: 2).
... : extra parameters for the underlying energy-distance clustering routine.
a length-N vector of class labels (from 1 to k).
# ---------------------------------------------------------------------------
# K-Groups Clustering via Energy Distance
#
# We will cluster dim=0 under top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land0 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Run K-Groups Clustering with different K's
label2  = fskgroups(list_land0, k=2)
label3  = fskgroups(list_land0, k=3)
label4  = fskgroups(list_land0, k=4)
truelab = rep(c(1,2,3), each=ndata)

## Run MDS & Visualization
embed = fsmds(list_land0, ndim=2)
opar  = par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
plot(embed, col=truelab, pch=19, main="true label")
plot(embed, col=label2, pch=19, main="k=2 label")
plot(embed, col=label3, pch=19, main="k=3 label")
plot(embed, col=label4, pch=19, main="k=4 label")
par(opar)
Given N functional summaries Λ_1(t), ..., Λ_N(t), perform k-medoids clustering on the pairwise L_2 distances.
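The generic objective being minimized, with the k medoids m_1, ..., m_k chosen among the summaries themselves and d the pairwise distance above:
\[
\min_{m_1, \ldots, m_k \in \{\Lambda_1, \ldots, \Lambda_N\}} \; \sum_{i=1}^{N} \; \min_{1 \le j \le k} \, d(\Lambda_i, m_j).
\]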
fskmedoids(fslist, k = 2)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
k : the number of clusters (default: 2).
a length-N vector of class labels (from 1 to k).
# ---------------------------------------------------------------------------
# K-Medoids Clustering of Functional Summaries
#
# We will cluster dim=0 under top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land0 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Run K-Medoids Clustering with different K's
label2  = fskmedoids(list_land0, k=2)
label3  = fskmedoids(list_land0, k=3)
label4  = fskmedoids(list_land0, k=4)
truelab = rep(c(1,2,3), each=ndata)

## Run MDS & Visualization
embed = fsmds(list_land0, ndim=2)
opar  = par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
plot(embed, col=truelab, pch=19, main="true label")
plot(embed, col=label2, pch=19, main="k=2 label")
plot(embed, col=label3, pch=19, main="k=3 label")
plot(embed, col=label4, pch=19, main="k=4 label")
par(opar)
Given multiple functional summaries Λ_1(t), ..., Λ_N(t), apply multidimensional scaling to obtain a low-dimensional representation in Euclidean space. Usually, ndim = 2 or 3 is chosen for visualization.
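For the "classical" option, the standard recipe from the pairwise distance matrix D is double centering followed by an eigendecomposition:
\[
B = -\tfrac{1}{2}\, J D^{(2)} J, \qquad J = I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^\top,
\]
where D^{(2)} holds the squared distances; the embedding uses the top ndim eigenvectors of B scaled by the square roots of their eigenvalues. The "metric" option instead minimizes a stress criterion between the embedded and original distances.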
fsmds(fslist, ndim = 2, method = c("classical", "metric"))
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
ndim : an integer-valued target dimension (default: 2).
method : name of the algorithm type, either "classical" or "metric" (default: "classical").
an (N x ndim) matrix of the embedding.
# ---------------------------------------------------------------------------
# Multidimensional Scaling for Multiple Landscapes and Silhouettes
#
# We will compare dim=0 landscape and silhouette functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Landscape and Silhouettes of Dimension 0
list_land = list()
list_sils = list()
for (i in 1:(3*ndata)){
  list_land[[i]] = diag2landscape(list_rips[[i]], dimension=0)
  list_sils[[i]] = diag2silhouette(list_rips[[i]], dimension=0)
}
list_lab = rep(c(1,2,3), each=ndata)

## Run Classical/Metric Multidimensional Scaling
land_cmds = fsmds(list_land, method="classical")
land_mmds = fsmds(list_land, method="metric")
sils_cmds = fsmds(list_sils, method="classical")
sils_mmds = fsmds(list_sils, method="metric")

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,2))
plot(land_cmds, pch=19, col=list_lab, main="Landscape+CMDS")
plot(land_mmds, pch=19, col=list_lab, main="Landscape+MMDS")
plot(sils_cmds, pch=19, col=list_lab, main="Silhouette+CMDS")
plot(sils_mmds, pch=19, col=list_lab, main="Silhouette+MMDS")
par(opar)
Given multiple functional summaries Λ_1(t), ..., Λ_N(t), compute their mean summary.
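The mean is the pointwise average over the common grid,
\[
\bar{\Lambda}(t) \;=\; \frac{1}{N} \sum_{i=1}^{N} \Lambda_i(t),
\]
returned as a functional summary object.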
fsmean(fslist)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
a functional summary object.
# ---------------------------------------------------------------------------
# Mean of 10 Persistence Landscapes from '2holes' data
# ---------------------------------------------------------------------------
## Generate 10 Diagrams with 'gen2holes()' function
list_rips = list()
for (i in 1:10){
  list_rips[[i]] = diagRips(gen2holes(n=100, sd=2)$data, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land = list()
for (i in 1:10){
  list_land[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Compute the Mean of the Landscapes
ldmean = fsmean(list_land)

## Visualize
sam5 <- sort(sample(1:10, 5, replace=FALSE))
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3), pty="s")
for (i in 1:5){
  tgt = list_land[[sam5[i]]]
  matplot(tgt$tseq, tgt$lambda[,1:5], type="l", lwd=3,
          main=paste("landscape no.",sam5[i]))
}
matplot(ldmean$tseq, ldmean$lambda[,1:5], type="l", lwd=3, main="mean landscape")
par(opar)
Norm of a Single Functional Summary
Given a functional summary Λ(t), compute its L_p norm.
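The quantity computed is, up to numerical integration on the grid,
\[
\|\Lambda\|_p \;=\; \Bigl( \int \bigl|\Lambda(t)\bigr|^p \, dt \Bigr)^{1/p},
\]
with an additional sum over landscape levels for multi-level landscapes.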
fsnorm(fsobj, p = 2)
fsobj : a functional summary object, e.g., a "landscape" or "silhouette" object.
p : an exponent in the L_p norm (default: 2).
an L_p norm value.
## Generate Toy Data from 'gen2circles()'
dat = gen2circles(n=100)$data

## Compute PD, Landscapes, and Silhouettes
myPD  = diagRips(dat, maxdim=1)
myPL0 = diag2landscape(myPD, dimension=0)
myPL1 = diag2landscape(myPD, dimension=1)
myPS0 = diag2silhouette(myPD, dimension=0)
myPS1 = diag2silhouette(myPD, dimension=1)

## Compute 2-norm
fsnorm(myPL0, p=2)
fsnorm(myPL1, p=2)
fsnorm(myPS0, p=2)
fsnorm(myPS1, p=2)
Given N functional summaries Λ_1(t), ..., Λ_N(t), perform the spectral clustering proposed by Zelnik-Manor and Perona using a set of data-driven bandwidth parameters.
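In the self-tuning construction of Zelnik-Manor and Perona, each observation receives its own bandwidth, here presumably the distance to its nnbd-th nearest neighbor, and the affinity between two summaries is
\[
A_{ij} \;=\; \exp\!\Bigl( -\,\frac{d(\Lambda_i, \Lambda_j)^2}{\sigma_i\, \sigma_j} \Bigr),
\qquad
\sigma_i = d\bigl(\Lambda_i, \Lambda_{i,(\mathrm{nnbd})}\bigr),
\]
followed by standard spectral clustering on the resulting affinity matrix.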
fssc05Z(fslist, k = 2, nnbd = 5)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
k : the number of clusters (default: 2).
nnbd : neighborhood size used to define the data-driven bandwidth parameters (default: 5).
a length-N vector of class labels (from 1 to k).
Zelnik-Manor L, Perona P (2005). “Self-Tuning Spectral Clustering.” In Saul LK, Weiss Y, Bottou L (eds.), Advances in Neural Information Processing Systems 17, 1601–1608. MIT Press.
# ---------------------------------------------------------------------------
# Self-Tuning Spectral Clustering of Functional Summaries
#
# We will cluster dim=0 under top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land0 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Run Spectral Clustering using Different K's
label2  = fssc05Z(list_land0, k=2)
label3  = fssc05Z(list_land0, k=3)
label4  = fssc05Z(list_land0, k=4)
truelab = rep(c(1,2,3), each=ndata)

## Run MDS & Visualization
embed = fsmds(list_land0, ndim=2)
opar  = par(no.readonly=TRUE)
par(mfrow=c(2,2), pty="s")
plot(embed, col=truelab, pch=19, main="true label")
plot(embed, col=label2, pch=19, main="k=2 label")
plot(embed, col=label3, pch=19, main="k=3 label")
plot(embed, col=label4, pch=19, main="k=4 label")
par(opar)
Given multiple functional summaries Λ_1(t), ..., Λ_N(t), compute the weighted sum with a specified vector of weights w_1, ..., w_N.
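The weighted sum is computed pointwise over the common grid,
\[
\Lambda_w(t) \;=\; \sum_{i=1}^{N} w_i\, \Lambda_i(t).
\]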
fssum(fslist, weight = NULL)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
weight : a weight vector of length N; if NULL (default), equal weights are used.
a functional summary object.
# ---------------------------------------------------------------------------
# Weighted Average of 10 Persistence Landscapes from '2holes' data
# ---------------------------------------------------------------------------
## Generate 10 Diagrams with 'gen2holes()' function
list_rips = list()
for (i in 1:10){
  list_rips[[i]] = diagRips(gen2holes(n=100, sd=2)$data, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
list_land = list()
for (i in 1:10){
  list_land[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
}

## Some Random Weights
wrand = abs(stats::rnorm(10))
wrand = wrand/sum(wrand)

## Compute Weighted Sum of Landscapes
ldsum = fssum(list_land, weight=wrand)

## Visualize
sam5 <- sort(sample(1:10, 5, replace=FALSE))
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3), pty="s")
for (i in 1:5){
  tgt = list_land[[sam5[i]]]
  matplot(tgt$tseq, tgt$lambda[,1:5], type="l", lwd=3,
          main=paste("landscape no.",sam5[i]))
}
matplot(ldsum$tseq, ldsum$lambda[,1:5], type="l", lwd=3, main="weighted sum")
par(opar)
Given N functional summaries Λ_1(t), ..., Λ_N(t), t-SNE mimics the pattern of probability distributions over pairs of Banach-valued objects in a low-dimensional target embedding space by minimizing the Kullback-Leibler divergence.
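For completeness, the objective t-SNE minimizes is the Kullback-Leibler divergence between the input-similarity distribution P, built here from pairwise distances between the summaries, and its heavy-tailed counterpart Q in the embedding:
\[
\mathrm{KL}(P \,\|\, Q) \;=\; \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}} .
\]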
fstsne(fslist, ndim = 2, ...)
fslist : a length-N list of functional summaries, e.g., "landscape" or "silhouette" objects.
ndim : an integer-valued target dimension (default: 2).
... : extra parameters for the underlying t-SNE routine, such as perplexity.
a named list containing:
embed : an (N x ndim) matrix whose rows are embedded observations.
an error measure given by the discrepancy between the embedded and original distances.
# ---------------------------------------------------------------------------
# t-SNE and Multidimensional Scaling for Multiple Landscapes and Silhouettes
#
# We will compare dim=0 landscape and silhouette functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Landscape and Silhouettes of Dimension 0
list_land = list()
list_sils = list()
for (i in 1:(3*ndata)){
  list_land[[i]] = diag2landscape(list_rips[[i]], dimension=0)
  list_sils[[i]] = diag2silhouette(list_rips[[i]], dimension=0)
}
list_lab = rep(c(1,2,3), each=ndata)

## Run t-SNE and Classical/Metric MDS
land_cmds = fsmds(list_land, method="classical")
land_mmds = fsmds(list_land, method="metric")
land_tsne = fstsne(list_land, perplexity=5)$embed
sils_cmds = fsmds(list_sils, method="classical")
sils_mmds = fsmds(list_sils, method="metric")
sils_tsne = fstsne(list_sils, perplexity=5)$embed

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3))
plot(land_cmds, pch=19, col=list_lab, main="Landscape+CMDS")
plot(land_mmds, pch=19, col=list_lab, main="Landscape+MMDS")
plot(land_tsne, pch=19, col=list_lab, main="Landscape+tSNE")
plot(sils_cmds, pch=19, col=list_lab, main="Silhouette+CMDS")
plot(sils_mmds, pch=19, col=list_lab, main="Silhouette+MMDS")
plot(sils_tsne, pch=19, col=list_lab, main="Silhouette+tSNE")
par(opar)
It generates data from two intersecting circles.
gen2circles(n = 496, sd = 0)
n : the total number of observations to be generated.
sd : level of additive white noise.
a list containing:
data : an (n x 2) data matrix of row-stacked observations.
label : a length-n vector of class labels.
## Generate Data with Different Noise Levels
nn = 200
x1 = gen2circles(n=nn, sd=0)
x2 = gen2circles(n=nn, sd=0.1)
x3 = gen2circles(n=nn, sd=0.25)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
plot(x1$data, pch=19, main="sd=0.00", col=x1$label)
plot(x2$data, pch=19, main="sd=0.10", col=x2$label)
plot(x3$data, pch=19, main="sd=0.25", col=x3$label)
par(opar)
It generates data from two intertwined circles with empty interiors (holes).
gen2holes(n = 496, sd = 0)
n : the total number of observations to be generated.
sd : level of additive white noise.
a list containing:
data : an (n x 2) data matrix of row-stacked observations.
label : a length-n vector of class labels.
## Generate Data with Different Noise Levels
nn = 200
x1 = gen2holes(n=nn, sd=0)
x2 = gen2holes(n=nn, sd=0.1)
x3 = gen2holes(n=nn, sd=0.25)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3), pty="s")
plot(x1$data, pch=19, main="sd=0.00", col=x1$label)
plot(x2$data, pch=19, main="sd=0.10", col=x2$label)
plot(x3$data, pch=19, main="sd=0.25", col=x3$label)
par(opar)
Given multiple persistence landscapes Λ_1(t), ..., Λ_N(t), compute the persistence landscape kernel in the L_2 sense.
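A sketch of the L_2 inner-product form such a kernel typically takes, with λ_k^(i) denoting the k-th landscape function of the i-th summary; whether the package applies further scaling is not specified here:
\[
K_{ij} \;=\; \langle \Lambda_i, \Lambda_j \rangle_{L_2} \;=\; \sum_{k} \int \lambda^{(i)}_k(t)\, \lambda^{(j)}_k(t)\, dt .
\]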
plkernel(landlist)
landlist : a length-N list of "landscape" objects obtained from diag2landscape().
an (N x N) kernel matrix.
Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt (2015). “A stable multi-scale kernel for topological machine learning.” Proc. 2015 IEEE Conf. Comp. Vision & Pat. Rec. (CVPR ’15).
# ---------------------------------------------------------------------------
# Persistence Landscape Kernel in Dimension 0 and 1
#
# We will compare dim=0,1 with top-5 landscape functions with
# - Class 1 : 'iris' dataset with noise
# - Class 2 : samples from 'gen2holes()'
# - Class 3 : samples from 'gen2circles()'
# ---------------------------------------------------------------------------
## Generate Data and Diagram from VR Filtration
ndata     = 10
list_rips = list()
for (i in 1:ndata){
  dat1 = as.matrix(iris[,1:4]) + matrix(rnorm(150*4), ncol=4)
  dat2 = gen2holes(n=100, sd=1)$data
  dat3 = gen2circles(n=100, sd=1)$data

  list_rips[[i]]           = diagRips(dat1, maxdim=1)
  list_rips[[i+ndata]]     = diagRips(dat2, maxdim=1)
  list_rips[[i+(2*ndata)]] = diagRips(dat3, maxdim=1)
}

## Compute Persistence Landscapes from Each Diagram with k=5 Functions
#  We compute kernels in dimensions 0 and 1.
list_land0 = list()
list_land1 = list()
for (i in 1:(3*ndata)){
  list_land0[[i]] = diag2landscape(list_rips[[i]], dimension=0, k=5)
  list_land1[[i]] = diag2landscape(list_rips[[i]], dimension=1, k=5)
}

## Compute Persistence Landscape Kernel Matrix
plk0 <- plkernel(list_land0)
plk1 <- plkernel(list_land1)

## Visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,2), pty="s")
image(plk0[,(3*(ndata)):1], axes=FALSE, main="Kernel : dim=0")
image(plk1[,(3*(ndata)):1], axes=FALSE, main="Kernel : dim=1")
par(opar)
Given a persistent homology of the data represented by a reconstructed complex in an S3 class "homology" object, visualize it as either a barcode or a persistence diagram using ggplot2.
## S3 method for class 'homology'
plot(x, ...)
x : a "homology" object generated by diagRips().
... : extra parameters, including method, which should be one of "barcode" or "diagram".
a ggplot2 object.
# Use 'iris' data
XX = as.matrix(iris[,1:4])

# Compute VR Diagram
homology = diagRips(XX)

# Plot with 'barcode'
opar <- par(no.readonly=TRUE)
plot(homology, method="barcode")
par(opar)
Given a persistence landscape object of S3 class "landscape", visualize the landscape functions using ggplot2.
## S3 method for class 'landscape'
plot(x, ...)
x : a "landscape" object obtained from diag2landscape().
... : extra parameters for plotting.
a ggplot2 object.
# Use 'iris' data
XX = as.matrix(iris[,1:4])

# Compute Persistence diagram and landscape of order 0
homology  = diagRips(XX)
landscape = diag2landscape(homology, dimension=0)

# Plot the Landscape
opar <- par(no.readonly=TRUE)
plot(landscape)
par(opar)