1 Introduction

This report will focus on the reproducibility of six different articles, the first three of which are somewhat interrelated.

Unless otherwise specified (or followed by an error message) the results shown should be taken as identical to the ones in the paper. It should also be noted that somehow, all of the packages researched in this report turned out to be still maintained and available for the current (3.6.3) version of R.

2 The ade4 Package: Implementing the Duality Diagram for Ecologists

###################################################
### chunk number 5: ni1
###################################################
library(ade4)
apropos("dudi.")

##  [1] "dudi.acm"       "dudi.coa"       "dudi.dec"       "dudi.fca"      
##  [5] "dudi.fpca"      "dudi.hillsmith" "dudi.mix"       "dudi.nsc"      
##  [9] "dudi.pca"       "dudi.pco"       "dudi.type"

FULLY REP

###################################################
### chunk number 6: ni2
###################################################
args(as.dudi)

## function (df, col.w, row.w, scannf, nf, call, type, tol = 1e-07, 
##     full = FALSE) 
## NULL

FULLY REP

###################################################
### chunk number 7: ni3
###################################################
methods(class="dudi")

##  [1] [         bca       biplot    inertia   predict   print     scatter  
##  [8] screeplot summary   supcol    suprow    t         wca      
## see '?methods' for accessing help and source code

MOSTLY REP

Here is the first deviation from the results shown in the paper. The methods which should have been shown are as follows: “print.dudi”, “scatter.dudi” and “t.dudi”. It should be noted that these methods are still included in the package, although they’ve been shortened to “print”, “scatter” and “t”.

###################################################
### chunk number 8: ex1
###################################################
data(dunedata)
sapply(dunedata$envir,class)

## $A1
## [1] "numeric"
## 
## $moisture
## [1] "integer"
## 
## $manure
## [1] "integer"
## 
## $use
## [1] "ordered" "factor" 
## 
## $management
## [1] "factor"

FULLY REP

###################################################
### chunk number 9: ex2
###################################################
dunedata$envir$use <- factor(dunedata$envir$use,ordered=FALSE)
summary(dunedata$envir)

##        A1            moisture       manure           use    management
##  Min.   : 2.800   Min.   :1.0   Min.   :0.00   hayfield:7   BF:3      
##  1st Qu.: 3.500   1st Qu.:1.0   1st Qu.:0.00   both    :8   HF:5      
##  Median : 4.200   Median :2.0   Median :2.00   grazing :5   NM:6      
##  Mean   : 4.850   Mean   :2.9   Mean   :1.75                SF:6      
##  3rd Qu.: 5.725   3rd Qu.:5.0   3rd Qu.:3.00                          
##  Max.   :11.500   Max.   :5.0   Max.   :4.00

FULLY REP

###################################################
### chunk number 10: ex3
###################################################
dd1 <- dudi.hillsmith(dunedata$envir, scannf = FALSE,nf=2)
dd1

## Duality diagramm
## class: mix dudi
## $call: dudi.hillsmith(df = dunedata$envir, scannf = FALSE, nf = 2)
## 
## $nf: 2 axis-components saved
## $rank: 8
## eigen values: 2.542 1.858 1.231 0.9899 0.6927 ...
##   vector length mode    content       
## 1 $cw    10     numeric column weights
## 2 $lw    20     numeric row weights   
## 3 $eig   8      numeric eigen values  
## 
##   data.frame nrow ncol content             
## 1 $tab       20   10   modified array      
## 2 $li        20   2    row coordinates     
## 3 $l1        20   2    row normed scores   
## 4 $co        10   2    column coordinates  
## 5 $c1        10   2    column normed scores
## other elements: assign index cr center norm

MOSTLY REP

In this chunk results are nearly identical with the exception of the last row: “other elements: assign index cr center norm” instead of “other elements: assign index cr”, whatever may it mean.

###################################################
### chunk number 11: ex4
###################################################
scatter.dudi(dd1)

## Error in scatter.dudi(dd1): nie udało się znaleźć funkcji 'scatter.dudi'

And here we have the first error. As noted earlier, some methods’ names have been shortened, such as “scatter.dudi”. In the current package it’s just called “scatter()”.

scatter(dd1)

HAD TO CHANGE & STILL SOMEWHAT DIFFERENT

After invoking the correct name of the method the error message goes away. However, the graph drawn isn’t the same as the one in the paper. The barplots in the corner match, but sor some reason the background graph is a vertically flipped version of the one shown in the article.

The package has visibly undergone updates, both functional and aesthetic, since the 2007 version, thanks to the dedication and continuos maintenance provided by the selfless authours. However, overall it may be said that the paper is generally reproducible.

3 untb, an R Package For Simulating Ecological Drift Under the Unified Neutral Theory of Biodiversity

###################################################
### chunk number 3: SaundersSummary
###################################################
data(saunders)
summary(saunders.tot)

## Number of individuals: 8419 
## Number of species: 176 
## Number of singletons: 42 
## Most abundant species: Ventojassa (1046 individuals)
## estimated theta:  31.35894

FULLY REP

###################################################
### chunk number 4: prestonSaunders
###################################################
preston(saunders.tot,n=9)

##                   1   2  3-4  5-8  9-16  17-32  33-64  65-128  129-Inf
## number of species 42 14   21   22    13     20     15      11       18

FULLY REP

###################################################
### chunk number 5: prestonSaundersTemp
###################################################
jj.preston <- preston(saunders.tot)
jj.summary <- summary(saunders.tot)

FULLY REP

###################################################
### chunk number 6: calculate_uncertainty_Saunders
###################################################
n <- 10
J <- no.of.ind(saunders.tot)
unc <- list()
theta_hat <- optimal.theta(saunders.tot)
for(i in 1:n){
  unc[[i]] <- rand.neutral(J=J, theta=theta_hat)
}

FULLY REP

###################################################
### chunk number 7: plotSaunders
###################################################
plot(saunders.tot,uncertainty=FALSE)
for(i in 1:n){
  points(unc[[i]],type="l",col="grey")
}

NO REP

As can be surmised, the graph above differs greatly from the one in the article. It is absolutely unreadable. It is also the first divergence from the results shown in the paper.

###################################################
### chunk number 8: optimalThetaSaunders
###################################################
optimal.theta(saunders.tot)

## [1] 31.35894

FULLY REP

###################################################
### chunk number 9: supportTheta
###################################################
S <- no.of.spp(saunders.tot)
J <- no.of.ind(saunders.tot)
theta <- seq(from=25,to=39,len=55)
jj <- theta.likelihood(theta=theta,S=S,J=J,give.log=TRUE)
support <- jj-max(jj)
plot(theta,support,xlab=paste("Biodiversity parameter",expression(theta)),ylab="support")
abline(h= -2)

FULLY REP

###################################################
### chunk number 10: 
###################################################
load("bci1982.Rdata")

## Error in readChar(con, 5L, useBytes = TRUE): nie można otworzyć połączenia

NO REP

The link given in the paper to the dataset above leads to nothing. Brief google searches did not help with locating it. It might or might not be lost forever. Because of that most of the chunks below result in errors.

###################################################
### chunk number 11: optimal.params
###################################################
op.bci.1982 <- optimal.params(bci1982, l)

## Error in optimal.params(bci1982, l): nie znaleziono obiektu 'l'

NO REP

###################################################
### chunk number 12: flish2
###################################################
op.bci.1982

## Error in eval(expr, envir, enclos): nie znaleziono obiektu 'op.bci.1982'

NO REP

###################################################
### chunk number 13: estimateMandTheta
###################################################
load("mle.Rdata")

## Error in readChar(con, 5L, useBytes = TRUE): nie można otworzyć połączenia

plot(x100,log="xy",xlim=c(1,80),ylim=c(0.001,1),col="black",pch=1,
main=paste("Maximum likelihood estimates of ", expression(m), " and ",expression(theta))
     )

## Error in plot(x100, log = "xy", xlim = c(1, 80), ylim = c(0.001, 1), col = "black", : nie znaleziono obiektu 'x100'

points(x1000,col="red",pch=2)

## Error in points(x1000, col = "red", pch = 2): nie znaleziono obiektu 'x1000'

points(x10000,col="blue",pch=3)

## Error in points(x10000, col = "blue", pch = 3): nie znaleziono obiektu 'x10000'

points(50,0.01,pch=4,lwd=3,cex=2)

## Error in plot.xy(xy.coords(x, y), type = type, ...): plot.new has not been called yet

legend( "bottomleft", c("100","1000","10000"),col=c("black","red","blue"), pch=1:3, title="Local community size")

## Error in strwidth(legend, units = "user", cex = cex, font = text.font): plot.new has not been called yet

NO REP

###################################################
### chunk number 14: e.lowandhigh
###################################################
n <- 20
x <- expected.abundance(J=n, theta=3)
e.low  <- expected.abundance(J=n,theta=4)
e.high <- expected.abundance(J=n,theta=2)

FULLY REP

###################################################
### chunk number 15: expectedAbundance
###################################################
plot(x)
segments(x0=1:n,x1=1:n,y0=e.low,y1=e.high)

FULLY REP

The graph above does not use the missing dataset above and thus remains true to the one shown in the article.

###################################################
### chunk number 16: calculate_thirdRank
###################################################
rank3 <- table(replicate(1000,rand.neutral(J=20,theta=2)[3]))

FULLY REP

###################################################
### chunk number 17: plot_thirdRank
###################################################
plot(rank3,xlab="abundance of third ranked species",ylab="frequency")

FULLY REP

Same as the last one.

###################################################
### chunk number 18: calculate_species_table
###################################################
 {
set.seed(0);
a <- species.table(untb(start=rep(1,60),prob=0.002, gens=40000,keep=TRUE))
}

FULLY REP

###################################################
### chunk number 19: matplot_species_table
###################################################
matplot(a,type="l",lty=1,xlab="time (generation)",ylab="abundance")

FULLY REP

In contrast to the two graphs above, while this one still can be generated, it is vastly different from the result shown in the paper.

###################################################
### chunk number 20: SampleTenThousand
###################################################
set.seed(0)
rn <- rand.neutral(5e6, theta=50)
jj <- isolate(rn,size=10000)
a <- untb(start=jj, prob=0.01, D=10000, gens=1000, meta=rn)
a.logkda <- logkda(a)

NO REP

op <- optimal.params(a,log.kda=a.logkda)

## Error in optimal.params(a, log.kda = a.logkda): nie znaleziono obiektu 'a.logkda'

v.opt <- volkov(no.of.ind(a), op, bins=TRUE)

## Error in volkov(no.of.ind(a), op, bins = TRUE): nie znaleziono obiektu 'op'

v.true <- volkov(no.of.ind(a), c(100,0.01), bins=TRUE)

NO REP

###################################################
### chunk number 21: PlotSampleTenThousand
###################################################
pa <- preston(a,n=12)

## Error in preston(a, n = 12): nie znaleziono obiektu 'outnames'

pa.names <- sub(" ", "", names(pa))
jj <- plot(pa,ylim=c(0,27),axisnames=FALSE,
ylab="Number of species",xlab="Abundance class")
axis(1, at=jj, labels=FALSE, lty=0)
text(jj, par("usr")[3]-0.65, srt=90, cex=0.8, adj=1, labels=pa.names,xpd=TRUE)

points(jj, v.opt[1:12], type="b",col="red",pch=1)

## Error in xy.coords(x, y): nie znaleziono obiektu 'v.opt'

points(jj, v.true[1:12], type="b",col="blue",pch=4)
par(xpd=2)
legend("topright", c("best estimate","true"), pch=c(1,4), col=c("red","blue"), lty=c(1,1))

MOSTLY REP

The graph is almost identical, but because of problems with ‘optimal.params’ it lacks the ‘best estimate’ line.

###################################################
### chunk number 22: differentThetas
###################################################
set.seed(0)
f <- function(gens,p){
  display.untb(start=sample(as.census(untb(start=1:100,gens=gens,D=100,prob=p))),gens=0,main="",cex=1.7, asp=1)
}

g <- function(u="title", ...){
  par(srt=0)
  par(mai=c(0,0,0,0))
  plot.new()
  text(0.5,0.5,u,...)
}

h <- function(u="title", ...){
  par(mai=c(0,0,0,0))
  par(srt=90)
  plot.new()
  text(0.5,0.5,u, ...)
}

nf <- layout(matrix(
                    c(00,01,00,02,00,03,
                      04,05,00,06,00,07,
                      00,00,00,00,00,00,
                      08,09,00,10,00,11,
                      00,00,00,00,00,00,
                      12,13,00,14,00,15,
                      00,00,00,00,00,00,
                      16,17,00,18,00,19),8,6,byrow=TRUE),
             c(1,4, 1,4, 1,4),
             c(1,4, 1,4, 1,4, 1,4),
             TRUE)

g(expression(t==10))
g(expression(t==50))
g(expression(t==100))

h(expression(theta==0))
f(10,0)
f(50,0)
f(100,0)
h(expression(theta==0.1))
f(10,0.001)
f(50,0.001)
f(100,0.001)
h(expression(theta==1))
f(10,0.01)
f(50,0.01)
f(100,0.01)
h(expression(theta==10))
f(10,0.1)
f(50,0.1)
f(100,0.1)

MOSTLY REP

This graph, while seeming similar at first glance, is different. That might be because of the different seed - the method sample is used in generating this plot.

Like the package before, untb is also still maintained. The original 2007 article is a great deal less reproducible than the one describing ade4. Less than half of the graphs are possible to reproduce.

4 The bio.infer R Package: Maximum Likelihood Method for Predicting Environmental Conditions from Assemblage Composition

# Merge EMAP biological data with standardized taxonomy
library(bio.infer)
options(width = 60)
data(itis.ttable)
data(bcnt.emapw)
bcnt.tax <- get.taxonomic(bcnt.emapw, itis.ttable,
                          outputFile = "sum.tax.table.txt")

## Error in get.taxonomic(bcnt.emapw, itis.ttable, outputFile = "sum.tax.table.txt"): nieużywane argumenty (itis.ttable, outputFile = "sum.tax.table.txt")

HAD TO CHANGE STH

While the first chunk resulting in an error message might seem discouraging, it is relatively easy to resolve. As stated before, all of the packages are being maintained and have had numerous updates since 2007. The method ‘get.taxonomic’ has been simplified and currently it does not require the two last arguments.

bcnt.tax <- get.taxonomic(bcnt.emapw)

# Show excerpt from full taxonomic table
df1 <- read.delim("sum.tax.table.txt")
incvec <- df1[, "FAMILY"] == "EPHEMERIDAE"
incvec[is.na(incvec)] <- F
print(df1[incvec,c("ORDER", "FAMILY", "GENUS", "SPECIES", "TAXANAME")])

## Error in `[.data.frame`(df1, incvec, c("ORDER", "FAMILY", "GENUS", "SPECIES", : undefined columns selected

HAD TO CHANGE STH

Another error: this one due to the column “TAXANAME” being changed to “taxaname.orig”.

print(df1[incvec,c("ORDER", "FAMILY", "GENUS", "SPECIES", "taxaname.orig")])

## [1] ORDER         FAMILY        GENUS         SPECIES       taxaname.orig
## <0 rows> (or 0-length row.names)

# compute taxon-environment relationships for EMAP species data only
data(envdata.emapw)
coef <- taxon.env(form = ~STRMTEMP + STRMTEMP^2, bcnt = bcnt.tax, 
                  envdata = envdata.emapw, bcnt.siteid = "ID.NEW",
                  bcnt.abndid = "ABUND", env.siteid = "ID.NEW",
                  tlevs = "SPECIES", dumpdata = TRUE)

## Model formula:  resp ~STRMTEMP+I(STRMTEMP^2) 
## Minimum number of occurrences:  30 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DESPAXIA.AUGUSTA 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DRUNELLA.COLORADENSIS 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for MOSELIA.INFUSCATA 
## Number of taxa modeled:
##   TAXON.LEVEL NUM.MODS
## 1     SPECIES       59

MOSTLY REP

The results in the article are “64”, not “59”.

# Echo names of coefficient data
names(coef)

## [1] "tnames"   "csave"    "xvar"     "xlims"    "form"     "roc"      "raw.data"

FULLY REP

# View taxon-environment relationships
view.te(coef,plotform = "windows")

FULLY REP

# Plot histogram of area under ROC values
par(xaxs = "i", yaxs = "i", mar = c(4,4,1,1))
breaks <- seq(from =0.5,to = 1, by = 0.05)
hist(coef[["roc"]], col = "lightgray", breaks =breaks,
     xlab = "", ylab = "", main = "")
mtext("Area under ROC", side = 1, line = 2.3)
mtext("Number of taxa", side = 2, line = 2.3)

MOSTLY REP

Slightly different distribution than in the graph from the article.

# Estimate taxon-environment relationships for all taxa
coef <- taxon.env(form = ~STRMTEMP + STRMTEMP^2, bcnt = bcnt.tax, 
                  envdata = envdata.emapw, bcnt.siteid = "ID.NEW", 
                  bcnt.abndid = "ABUND", env.siteid = "ID.NEW",
                  tlevs = "all", dumpdata = FALSE)

## Model formula:  resp ~STRMTEMP+I(STRMTEMP^2) 
## Minimum number of occurrences:  30 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for PLEUROCERIDAE 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DIPLOPERLINI 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DESPAXIA 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for JUGA 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for MOSELIA 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DESPAXIA.AUGUSTA 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for DRUNELLA.COLORADENSIS 
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred 
##  for MOSELIA.INFUSCATA 
## Number of taxa modeled:
##     TAXON.LEVEL NUM.MODS
## 1    SUBKINGDOM        0
## 2  INFRAKINGDOM        0
## 3   SUPERPHYLUM        2
## 4        PHYLUM        4
## 5     SUBPHYLUM        4
## 6   INFRAPHYLUM        0
## 7    SUPERCLASS        0
## 8         CLASS        8
## 9      SUBCLASS        5
## 10   INFRACLASS        1
## 11   SUPERORDER        6
## 12        ORDER       17
## 13     SUBORDER       14
## 14   INFRAORDER       13
## 15  SUPERFAMILY       24
## 16       FAMILY       84
## 17    SUBFAMILY       52
## 18        TRIBE       38
## 19     SUBTRIBE        1
## 20        GENUS      201
## 21      SPECIES       59

MOSTLY REP

Slightly different numbers than the ones in the article.

# Assign operational taxonomic units (OTU) to OR data
data(bcnt.OR)
bcnt.tax.OR <- get.taxonomic(bcnt.OR, itis.ttable)

## Error in get.taxonomic(bcnt.OR, itis.ttable): nieużywany argument (itis.ttable)

bcnt.otu.OR <- get.otu(bcnt.tax.OR, coef)

## Check OTU assignments in sum.otu.txt

HAD TO CHANGE STH

Another ‘get.taxonomic’ method requiring an update of the arguments.

bcnt.tax.OR <- get.taxonomic(bcnt.OR)

## The following taxa are not in ITIS:
## ALBERTATHYAS
## HYDRACARINA
## ORIBATEI
## PALAEGAPETUS
## RADOTANYPUS
## TURBELLARIA

bcnt.otu.OR <- get.otu(bcnt.tax.OR, coef)

## Check OTU assignments in sum.otu.txt

FULLY REP

# Compute inferences for temperature for one site in OR
# and plot likelihood curve
ss <- makess(bcnt.otu.OR)
inferences <- mlsolve(ss, coef, site.sel = "99046CSR", bruteforce = T)

print(inferences)

##        SVN STRMTEMP Inconsistent
## 1 99046CSR 15.86332        FALSE

MOSTLY REP

Slightly different numbers in STRMTEMP, 15.86332 instead of 15.6759.

# Compute inferences at all sites in OR
inferences <- mlsolve(ss, coef, site.sel = "all", bruteforce = F)

FULLY REP

# Compare inferences in OR to measured temperature
data(envdata.OR)
df1 <- merge(envdata.OR, inferences, by.x = "STRM.ID", by.y = "SVN")
par(mar=c(3.4,3.4,1,1), pty = "s")
lim0 <- range(c(df1$STRMTEMP, df1$temp), na.rm = T)
plot(df1$STRMTEMP, df1$temp, xlab = "", ylab = "", xlim = lim0, 
ylim = lim0, axes = F)
axis(1)
axis(2, las = 1)
box(bty = "l")
mtext("Inferred temperature", side = 1, line = 2.3)
mtext("Measured temperature", side = 2, line = 2.3)
abline(0,1, lty = "dashed")

sqdiff <- (df1$temp - df1$STRMTEMP)^2
n <- sum(! is.na(sqdiff))
rmserr <- sqrt(sum(sqdiff, na.rm = T)/n)

FULLY REP

# Examine pre-computed taxon-environment relationships
data(coef.west.wt)
view.te(coef.west.wt, plotform = "windows")

FULLY REP

# Compute inferences at a single site in OR using \
# pre-computed taxon-environment relationships
bcnt.otu.OR <- get.otu(bcnt.tax.OR, coef.west.wt)

## Check OTU assignments in sum.otu.txt

ss <- makess(bcnt.otu.OR)
inference <- mlsolve(ss, coef.west.wt, site.sel = "99046CSR", bruteforce = T)

print(inference)

##        SVN      sed STRMTEMP Inconsistent
## 1 99046CSR 16.59015 16.88423        FALSE

MOSTLY REP

Another set of slightly different numbers.

To conclude - most of the graphs are identical, with somewhat different (but still similar) numerical values. Much like both of the packages before this one, bio.infer is still being maintained, and hence some methods and datasets are diffent. The article isn’t fully reproducible, but it could be said that it is somewhat reproducible.

5 The pls Package: Principal Component and Partial Least Squares Regression in R

###################################################
library("pls")

FULLY REP

###################################################
data("yarn")
data("oliveoil")
data("gasoline")

FULLY REP

###################################################
matplot(t(gasoline$NIR), type = "l", lty = 1, ylab = "log(1/R)", xaxt = "n")
ind <- pretty(seq(from = 900, to = 1700, by = 2))
ind <- ind[ind >= 900 & ind <= 1700]
ind <- (ind - 898) / 2
axis(1, ind, colnames(gasoline$NIR)[ind])

FULLY REP

###################################################
gasTrain <- gasoline[1:50,]
gasTest <- gasoline[51:60,]

FULLY REP

###################################################
gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")

FULLY REP

###################################################
summary(gas1)

## Data:    X dimension: 50 401 
##  Y dimension: 50 1
## Fit method: kernelpls
## Number of components considered: 10
## 
## VALIDATION: RMSEP
## Cross-validated using 50 leave-one-out segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           1.545    1.357   0.2966   0.2524   0.2476   0.2398   0.2319
## adjCV        1.545    1.356   0.2947   0.2521   0.2478   0.2388   0.2313
##        7 comps  8 comps  9 comps  10 comps
## CV      0.2386   0.2316   0.2449    0.2673
## adjCV   0.2377   0.2308   0.2438    0.2657
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
## X         78.17    85.58    93.41    96.06    96.94    97.89    98.38    98.85
## octane    29.39    96.85    97.89    98.26    98.86    98.96    99.09    99.16
##         9 comps  10 comps
## X         99.02     99.19
## octane    99.28     99.39

MOSTLY REP

Here is our first discrepancy with the article. In the last line, the last values are 99.2, 99.41 instead of 99.19 and 99.39.

###################################################
plot(RMSEP(gas1), legendpos = "topright")

FULLY REP

Both the graph above and 2 further graphs seem to be identical.

###################################################
plot(gas1, ncomp = 2, asp = 1, line = TRUE)

FULLY REP

###################################################
plot(gas1, plottype = "scores", comps = 1:3)

FULLY REP

###################################################
explvar(gas1)

##     Comp 1     Comp 2     Comp 3     Comp 4     Comp 5     Comp 6     Comp 7 
## 78.1707683  7.4122245  7.8241556  2.6577773  0.8768214  0.9466384  0.4921537 
##     Comp 8     Comp 9    Comp 10 
##  0.4723207  0.1688272  0.1693770

MOSTLY REP

Here, as before, the numbers vary in rounding. In the article there are 4 numbers after the decimal point when 7 is given here.

###################################################
plot(gas1, "loadings", comps = 1:2, legendpos = "topleft", labels = "numbers", xlab = "nm")
abline(h = 0)

FULLY REP

###################################################
predict(gas1, ncomp = 2, newdata = gasTest)

## , , 2 comps
## 
##      octane
## 51 87.94125
## 52 87.25242
## 53 88.15832
## 54 84.96913
## 55 85.15396
## 56 84.51415
## 57 87.56190
## 58 86.84622
## 59 89.18925
## 60 87.09116

MOSTLY REP

As above: here 5 decimal places, in article 2.

###################################################
RMSEP(gas1, newdata = gasTest)

## (Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps  
##      1.5369       1.1696       0.2445       0.2341       0.3287       0.2780  
##     6 comps      7 comps      8 comps      9 comps     10 comps  
##      0.2703       0.3301       0.3571       0.4090       0.6116

MOSTLY REP

Here we have another change: in reproduction, the value of the Intercept is 1.53369, when in the article it is 1.5114. The other values are identical. This may indicate that there is another seed in newer versions of R.

###################################################
dens1 <- plsr(density ~ NIR, ncomp = 5, data = yarn)

FULLY REP

###################################################
dim(oliveoil$sensory)

## [1] 16  6

plsr(sensory ~ chemical, data = oliveoil)

## Partial least squares regression , fitted with the kernel algorithm.
## Call:
## plsr(formula = sensory ~ chemical, data = oliveoil)

FULLY REP

###################################################
trainind <- which(yarn$train == TRUE)
dens2 <- update(dens1, subset = trainind)

FULLY REP

###################################################
dens3 <- update(dens1, ncomp = 10)

FULLY REP

###################################################
olive1 <- plsr(sensory ~ chemical, scale = TRUE, data = oliveoil)

FULLY REP

###################################################
gas2 <- plsr(octane ~ msc(NIR), ncomp = 10, data = gasTrain)

FULLY REP

###################################################
predict(gas2, ncomp = 3, newdata = gasTest)

## , , 3 comps
## 
##      octane
## 51 87.92446
## 52 87.26647
## 53 88.13582
## 54 84.76078
## 55 85.07921
## 56 84.55116
## 57 87.24373
## 58 86.71653
## 59 89.02589
## 60 86.98519

FULLY REP

###################################################
gas2.cv <- crossval(gas2, segments = 10)
plot(MSEP(gas2.cv), legendpos='topright')

summary(gas2.cv, what = "validation")

## Data:    X dimension: 50 401 
##  Y dimension: 50 1
## Fit method: kernelpls
## Number of components considered: 10
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           1.545    1.310   0.2851   0.2535   0.2398   0.2366   0.2406
## adjCV        1.545    1.308   0.2827   0.2523   0.2367   0.2330   0.2352
##        7 comps  8 comps  9 comps  10 comps
## CV      0.2400   0.2413   0.2508    0.2575
## adjCV   0.2348   0.2354   0.2430    0.2485

MOSTLY REP

Here the results differ by +/- a few thousandths. Again, this is the fault of another seed, because here is the croswalidation using random segments.

###################################################
plot(gas1, plottype = "coef", ncomp=1:3, legendpos = "bottomleft", labels = "numbers", xlab = "nm")

FULLY REP

###################################################
plot(gas1, plottype = "correlation")

FULLY REP

###################################################
predict(gas1, ncomp = 2:3, newdata = gasTest[1:5,])

## , , 2 comps
## 
##      octane
## 51 87.94125
## 52 87.25242
## 53 88.15832
## 54 84.96913
## 55 85.15396
## 
## , , 3 comps
## 
##      octane
## 51 87.94907
## 52 87.30484
## 53 88.21420
## 54 84.86945
## 55 85.24244

MOSTLY REP

Here again, rounding up to the fifth decimal number when the article is to the second.

###################################################
predict(gas1, comps = 2, newdata = gasTest[1:5,])

##      octane
## 51 87.53322
## 52 86.30322
## 53 87.35217
## 54 85.81561
## 55 85.31952

MOSTLY REP

Same as above.

###################################################
drop(predict(gas1, ncomp = 2:3, newdata = gasTest[1:5,]))

##     2 comps  3 comps
## 51 87.94125 87.94907
## 52 87.25242 87.30484
## 53 88.15832 88.21420
## 54 84.96913 84.86945
## 55 85.15396 85.24244

MOSTLY REP

Same as above.

###################################################
predplot(gas1, ncomp = 2, newdata = gasTest, asp = 1, line = TRUE)

FULLY REP

###################################################
pls.options()

## $mvralg
## [1] "kernelpls"
## 
## $plsralg
## [1] "kernelpls"
## 
## $cpplsalg
## [1] "cppls"
## 
## $pcralg
## [1] "svdpc"
## 
## $parallel
## NULL
## 
## $w.tol
## [1] 2.220446e-16
## 
## $X.tol
## [1] 1e-12

MOSTLY REP

Here you can see that the package has developed, because in the article only $mvralg, $plsralg and $pcralg are shown as output. In addition, $plsralg has the value of kernelpls.

###################################################
pls.options(plsralg = "oscorespls")

FULLY REP

###################################################
pls.options("plsralg")

## $plsralg
## [1] "oscorespls"

rm(.pls.Options)
pls.options("plsralg")

## $plsralg
## [1] "oscorespls"

MOSTLY REP

Here $plsralg should have the value of kernelpls.

###################################################
X <- gasTrain$NIR
Y <- gasTrain$octane
ncomp <- 5
cvPreds <- matrix(nrow = nrow(X), ncol = ncomp)
for (i in 1:nrow(X)) {
    fit <- simpls.fit(X[-i,], Y[-i], ncomp = ncomp, stripped = TRUE)
    cvPreds[i,] <- (X[i,] - fit$Xmeans) %*% drop(fit$coefficients) + fit$Ymeans
}

FULLY REP

###################################################
sqrt(colMeans((cvPreds - Y)^2))

## [1] 1.3569509 0.2966201 0.2524084 0.2475784 0.2397937

MOSTLY REP

Same as before here rounded up to the seventh number and not as in the article to the fourth.

The article is very reproducible. The results are practically the same, they differ only in the case of more accurate development and randomly selected objects, probably by another seed.

6 Empirical Mode Decomposition and Hilbert Spectrum

This article (Donghoh and Hee-Seok, 2009) deals with the package EMD used to decompose the signal to the so-called intrinsic mode function.

Link https://doi.org/10.32614/RJ-2009-002

ndata <- 3000
tt <- seq(0, 9, length=ndata)
xt <- sin(pi * tt)

FULLY REP

library(EMD)
extrema(xt)

## $minindex
##      [,1] [,2]
## [1,]  501  501
## [2,] 1167 1167
## [3,] 1834 1834
## [4,] 2500 2500
## 
## $maxindex
##      [,1] [,2]
## [1,]  168  168
## [2,]  834  834
## [3,] 1500 1501
## [4,] 2167 2167
## [5,] 2833 2833
## 
## $nextreme
## [1] 9
## 
## $cross
##       [,1] [,2]
##  [1,]    1    1
##  [2,]  334  335
##  [3,]  667  668
##  [4,] 1000 1001
##  [5,] 1333 1334
##  [6,] 1667 1668
##  [7,] 2000 2001
##  [8,] 2333 2334
##  [9,] 2666 2667
## 
## $ncross
## [1] 9

FULLY REP

There were no problems with reproducing this fragment.

ndata <- 3000
par(mfrow=c(1,1), mar=c(1,1,1,1))
tt2 <- seq(0, 9, length=ndata)
xt2 <- sin(pi * tt2) + sin(2* pi * tt2) + sin(6 * pi * tt2)  + 0.5 * tt2

FULLY REP

plot(tt2, xt2, xlab="", ylab="", type="l", axes=FALSE); box()
tryimf <- extractimf(xt2, tt2, check=TRUE)

MOSTLY REP

This part is not fully reproducible. We were unable to record all stages of the process. It was also different from the original.

par(mfrow=c(3,1), mar=c(2,1,2,1))
try <- emd(xt2, tt2, boundary="wave")

FULLY REP

It was a very simple task, which was completely reproduced.

par(mfrow=c(3,1), mar=c(2,1,2,1))
par(mfrow=c(try$nimf+1, 1), mar=c(2,1,2,1))
rangeimf <- range(try$imf)
for(i in 1:try$nimf){
  plot(tt2, try$imf[,i], type="l", xlab="", ylab="",
       ylim=rangeimf, main= paste(i, "-th IMF", sep="")); 
  abline(h=0)
}
plot(tt2, try$residue, xlab="", ylab="", main="residue", type="l")

FULLY REP

This part also gives the same results as in the original article.

tt <- seq(0, 0.1, length = 2001)[1:2000]
f1 <- 1776; f2 <- 1000
xt <- sin(2*pi*f1*tt) * (tt <= 0.033 | tt >= 0.067) + sin(2*pi*f2*tt)

FULLY REP

interm1 <- emd(xt, tt, boundary="wave", max.imf=2, plot.imf=FALSE)
par(mfrow=c(3, 1), mar=c(3,2,2,1))
plot(tt, xt, main="Signal", type="l")
rangeimf <- range(interm1$imf)
plot(tt, interm1$imf[,1], type="l", xlab="", ylab="", ylim=rangeimf, main="IMF 1")
plot(tt, interm1$imf[,2], type="l", xlab="", ylab="", ylim=rangeimf, main="IMF 2")

FULLY REP

The histogram has been correctly reproduced.

par(mfrow=c(1,1), mar=c(2,4,1,1))
tmpinterm <- extrema(interm1$imf[,1])
zerocross <- as.numeric(round(apply(tmpinterm$cross, 1, mean)))
hist(diff(tt[zerocross[seq(1, length(zerocross), by=2)]]), freq=FALSE, xlab="", main="")

FULLY REP

interm2 <- emd(xt, tt, boundary="wave", max.imf=2, plot.imf=FALSE, interm=0.0007)

FULLY REP

par(mfrow=c(2,1), mar=c(2,2,3,1), oma=c(0,0,0,0))
rangeimf <- range(interm2$imf)
plot(tt,interm2$imf[,1], type="l", main="IMF 1 after treating intermittence",
     xlab="", ylab="", ylim=rangeimf)
plot(tt,interm2$imf[,2], type="l", main="IMF 2 after treating intermittence",
     xlab="", ylab="", ylim=rangeimf)

FULLY REP

Both spectrograms have been reproduced without any problems, but are not very visible.

test1 <- hilbertspec(interm1$imf)
spectrogram(test1$amplitude[,1], test1$instantfreq[,1])

FULLY REP

test2 <- hilbertspec(interm2$imf, tt=tt)
spectrogram(test2$amplitude[,1], test2$instantfreq[,1])

FULLY REP

data(lena)
z <- lena[seq(1, 512, by=4), seq(1, 512, by=8)]

FULLY REP

This part takes the most time to process. That is why we decided to reduce the size of the photo even more than the authors of the article.

lenadecom <- emd2d(z, max.imf = 4)

FULLY REP

This part is fully reproducible.

imageEMD(z=z, emdz=lenadecom, extrema=TRUE, col=gray(0:100/100))

FULLY REP

7 AdMit: Adaptive Mixtures of Student-t Distributions

AdMit is a package described in the June 2009 issue of The R Journal. The latest version on CRAN was released on April 20, 2020, so it approaches the category still updated old packages.. The authors of this article did not include additional code sources or scripts, but the code that generates the results is given in the paragraphs in the text, as it would be written in the R console. This resulted in additional unnecessary copying and formatting.

# install.packages("AdMit")
library(AdMit)

FULLY REP

In version R 3.6.3, the installation was quick and uncomplicated.

set.seed(1234)

This seed was given by the author. Since it is widely known that the update of R 3.6.0 has changed the way the random generator works, you can already expect slightly different results.

GelmanMeng <- function(x, log = TRUE) {
    if (is.vector(x)) x <- matrix(x, nrow = 1)
    r <- -0.5 * ( 5 * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2 - 10 * x[,1] * x[,2] - 6 * x[,1] - 7 * x[,2] )
    if (!log) r <- exp(r)
    as.vector(r)
}

FULLY REP

The above code creates a kernel, which is later visualized on graphs, but the code generating it has not been attached

(outAdMit <- AdMit(KERNEL = GelmanMeng, mu0 = c(0, 0.1)))

## $CV
## [1] 7.8797364 1.7576829 0.9566853 0.8893195
## 
## $mit
## $mit$p
##       cmp1       cmp2       cmp3       cmp4 
## 0.64051741 0.07157414 0.20844640 0.07946205 
## 
## $mit$mu
##             k1        k2
## cmp1 0.3639585 3.2002046
## cmp2 3.7088254 0.3175160
## cmp3 1.3855024 1.1625991
## cmp4 2.4625620 0.6313581
## 
## $mit$Sigma
##            k1k1        k1k2        k2k1       k2k2
## cmp1 0.03902518 -0.15605612 -0.15605612 1.22561184
## cmp2 0.86080447 -0.08397973 -0.08397973 0.02252490
## cmp3 0.12716098 -0.06816106 -0.06816106 0.10222707
## cmp4 0.21956988 -0.01357107 -0.01357107 0.02490221
## 
## $mit$df
## [1] 1
## 
## 
## $summary
##   H METHOD.mu TIME.mu METHOD.p TIME.p        CV
## 1 1      BFGS    0.03     NONE   0.00 7.8797364
## 2 2      BFGS    0.01   NLMINB   0.01 1.7576829
## 3 3      BFGS    0.06   NLMINB   0.04 0.9566853
## 4 4      BFGS    0.08   NLMINB   0.05 0.8893195

(outAdMitIS <- AdMitIS(N = 1e5, KERNEL = GelmanMeng, mit = outAdMit$mit))

## $ghat
## [1] 0.9679769 2.2285986
## 
## $NSE
## [1] 0.003747675 0.005276410
## 
## $RNE
## [1] 0.6440028 0.6318012

MOSTLY REP

The results are usually a little different from the original ones, most likely due to a different random number generator. Furthermore, it can be seen that a different method of rounding numbers is used.

(outAdMitMH <- AdMitMH(N = 101000, KERNEL = GelmanMeng, mit = outAdMit$mit))

HAD TO CHANGE STH

This code in the markdown generates 50000 lines of results and proudly declares that it omitted another 50000, but in the R itself it generates in an accessible way.

For the purpose of this article you had to rewrite this line of code.

# rewritten upper chunk
head(outAdMitMH$draws, 7)

##          k1         k2
## 1 0.9095752 -2.2127290
## 2 0.6661026  1.1082125
## 3 2.2159885  0.7770256
## 4 0.2783000  3.3669236
## 5 0.2783000  3.3669236
## 6 0.1393632  4.0439795
## 7 1.1690606  0.8822938

outAdMitMH$accept

## [1] 0.5061188

MOSTLY REP

Here again the results are almost equal to the original.

library("coda")
draws <- as.mcmc(outAdMitMH$draws)
draws <- window(draws, start = 1001)
colnames(draws) <- c("X1", "X2")
summary(draws)$stat

##         Mean        SD    Naive SE Time-series SE
## X1 0.9601571 0.9430263 0.002982111    0.005108426
## X2 2.2440007 1.3290242 0.004202743    0.007185532

effectiveSize(draws) / niter(draws)

##        X1        X2 
## 0.3407794 0.3420960

MOSTLY REP

Here coda package was introduced to check the performance of AdMit. Luckily, it also installed itself and worked without complications, and the results almost look like the original ones.

In conclusion, the AdMit is an example of an still updated old package, the operation of which is a little different from the 2009 version, but generally speaking, still works as intended.

8 asympTest: A Simple R Package for Classical Parametric Statistical Tests and Confidence Intervals in Large Samples

data(DIGdata)
attach(DIGdata)
x <- na.omit(DIABP[TRTMT==0])
y <- na.omit(DIABP[TRTMT==1])
c(length(x),length(y))

## [1] 3400 3395

FULLY REP

var.test(DIABP ~ TRTMT, data = DIGdata, na.action = na.omit)

## 
##  F test to compare two variances
## 
## data:  DIABP by TRTMT
## F = 0.92954, num df = 3399, denom df = 3394, p-value = 0.03328
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8690651 0.9942238
## sample estimates:
## ratio of variances 
##          0.9295411

MOSTLY REP - x/y changed to DIABP/TRTMT

asymp.test(DIABP ~ TRTMT, data = DIGdata, na.action = na.omit, parameter = "dVar")

## 
##  Two-sample asymptotic difference of variances test
## 
## data:  DIABP by TRTMT
## statistic = -1.5272, p-value = 0.1267
## alternative hypothesis: true difference of variances is not equal to 0
## 95 percent confidence interval:
##  -21.160491   2.626127
## sample estimates:
## difference of variances 
##               -9.267182

FULLY REP

n <- 1000
x <- runif(n, max = sqrt(12))
asymp.test(x, par = "var", alt = "gr", ref = 0.97)

## 
##  One-sample asymptotic variance test
## 
## data:  x
## statistic = 1.2399, p-value = 0.1075
## alternative hypothesis: true variance is greater than 0.97
## 95 percent confidence interval:
##  0.9587403       Inf
## sample estimates:
## variance 
## 1.004479

MOSTLY REP, mainly because of the differing seed.

chisq.stat <- (n-1)*var(x)/0.97
pchisq(chisq.stat, n-1, lower.tail = F)

## [1] 0.2118375

MOSTLY REP same as above

9 PMML: An Open Standard for Sharing Models

pmml is a package coming from June 2009 R Journal issue. It introduces an XML-based syntax for predictive models. Last update comes from 22nd April 2020.

library(XML)
library(pmml)
library(rpart)
data(iris)

my.rpart <- rpart(Species ~ ., data=iris)

my.rpart

## n= 150 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)  
##   2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
##   3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)  
##     6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
##     7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *

FULLY REP

pmml(my.rpart)

I assume that it is FULLY REP, altough the full output can’t be seen in the original article.

library(kernlab)
audit <- read.csv(file("http://rattle.togaware.com/audit.csv"))

FULLY REP

myksvm <- ksvm(as.factor(TARGET_Adjusted) ~ ., data=audit[,c(2:10,13)], kernel = "rbfdot", prob.model=TRUE)

HAD TO CHANGE STH Column TARGET_Adjusted was originally called Adjusted.

pmml(myksvm, data=audit)

I assume that it is FULLY REP, altough the full output can’t be seen in the original article.

10 neuralnet: Training of Neural Networks

neuralnet is a package coming from December 2009 R Journal issue. It introduces a package for neural networks in R. Last update comes from 2nd February 2019.

library(neuralnet)

nn <- neuralnet(
    case~age+parity+induced+spontaneous,
    data=infert, hidden=2, err.fct="ce",
    linear.output=FALSE)
nn

MOSTLY REP It’s impossible to get the same output, because the way of displaying this object is somewhat different.

nn$result.matrix

##                                  [,1]
## error                    1.248993e+02
## reached.threshold        9.491906e-03
## steps                    2.086000e+03
## Intercept.to.1layhid1    5.572500e+00
## age.to.1layhid1         -1.145075e-01
## parity.to.1layhid1       1.697873e+00
## induced.to.1layhid1     -2.126889e+00
## spontaneous.to.1layhid1 -3.243860e+00
## Intercept.to.1layhid2   -7.108569e-01
## age.to.1layhid2         -1.647246e+00
## parity.to.1layhid2       6.813142e+00
## induced.to.1layhid2      9.585948e-01
## spontaneous.to.1layhid2 -2.017367e+02
## Intercept.to.case        3.657058e+00
## 1layhid1.to.case        -5.438585e+00
## 1layhid2.to.case        -4.250943e+01

MOSTLY REP Results may be randomly generated. What is more, rounding and displaying is different.

out <- cbind(nn$covariate, nn$net.result[[1]])
dimnames(out) <- list(NULL, c("age","parity","induced", "spontaneous","nn-output"))
head(out)

##      age parity induced spontaneous nn-output
## [1,]  26      6       1           2 0.1546483
## [2,]  42      1       1           0 0.6195402
## [3,]  39      6       2           0 0.1447027
## [4,]  34      4       2           0 0.1541379
## [5,]  35      3       1           1 0.3531119
## [6,]  36      4       2           1 0.4925770

MOSTLY REP Results may be randomly generated. What is more, rounding and displaying is different.

nn.bp <- neuralnet(
    case~age+parity+induced+spontaneous,
    data=infert, hidden=2, err.fct="ce",
    linear.output=FALSE,
    algorithm="backprop",
    learningrate=0.01)
nn.bp

MOSTLY REP It’s impossible to get the same output, because the way of displaying this object is somewhat different.

head(nn$generalized.weights[[1]])

##              [,1]         [,2]       [,3]       [,4]
## [1,] 0.0093511100 -0.138654607 0.17368961 0.26490563
## [2,] 0.1514230061 -2.245241193 2.81256481 4.28963046
## [3,] 0.0005445869 -0.008074744 0.01011496 0.01542906
## [4,] 0.0089167345 -0.132213856 0.16562142 0.25260030
## [5,] 0.1055518133 -1.565081062 1.96054301 2.99015509
## [6,] 0.1359830667 -2.016303803 2.52577992 3.85223567

MOSTLY REP Results may be randomly generated, so there may be differences.

plot(nn)

MOSTLY REP Results may be randomly generated, which can result in differences. Results can’t be displayed by Markdown, but in basic R it works just fine.

par(mfrow=c(2,2))
gwplot(nn,selected.covariate="age",
         min=-2.5, max=5)
gwplot(nn,selected.covariate="parity",
         min=-2.5, max=5)
gwplot(nn,selected.covariate="induced",
         min=-2.5, max=5)
gwplot(nn,selected.covariate="spontaneous",
         min=-2.5, max=5)

MOSTLY REP Results almost look the same, altough some values can be seen to be significantly different.

new.output <- compute(nn,
                      covariate=matrix(c(22,1,0,0,
                                         22,1,1,0,
                                         22,1,0,1,
                                         22,1,1,1),
                                       byrow=TRUE, ncol=4))
new.output$net.result

##           [,1]
## [1,] 0.1499576
## [2,] 0.1956141
## [3,] 0.3110241
## [4,] 0.8524786

MOSTLY REP Results may be randomly generated, so there may be differences.

ci <- confidence.interval(nn.new, alpha=0.05)
ci$lower.ci

NO REP Authors gave us code to use on described nn.new, but didn’t do the same with how to create this object, so this chunk is not reproducible.

11 mvtnorm: New Numerical Algorithm for Multivariate Normal Probabilities

library("mvtnorm")
m <- 3
S <- diag(m)
S[2, 1] <- 1 / 4
S[1, 2] <- 1 / 4
S[3, 1] <- 1 / 5
S[1, 3] <- 1 / 5
S[3, 2] <- 1 / 3
S[2, 3] <- 1 / 3
S <- as.matrix(S)

pmvnorm(lower = -c(1,4,2), upper = c(1,4,2), 
        mean=rep(0, m), sigma = S, algorithm = Miwa())

## [1] 0.6536804
## attr(,"error")
## [1] NA
## attr(,"msg")
## [1] "Normal Completion"

HAD TO CHANGE STH, authors probably made mistake here. The matrix was not created in a proper way. After index fixing there was no problem with reproducing the output.

12 tmvtnorm: A Package for the TruncatedMultivariate Normal Distribution

library(tmvtnorm)
mu <- c(0.5, 0.5)
sigma <- matrix(c(1, 0.8, 0.8, 2), 2, 2)
a <- c(-1, -Inf)
b <- c(0.5, 4)

FULLY REP

X <- rtmvnorm(n=10000, mean=mu, sigma=sigma, lower=a, upper=b, algorithm="rejection")
plot(X)

FULLY REP

alpha <- pmvnorm(lower=a, upper=b, mean=mu, sigma=sigma)

FULLY REP

X <- rtmvnorm(n=10000, mean=mu, sigma=sigma, lower=a, upper=b, algorithm="gibbs")
plot(X)

FULLY REP

X2 <- rtmvnorm(n=10000, mean=mu, sigma=sigma, lower=a, upper=b, algorithm="gibbs", burn.in.samples=100, thinning = 5)
plot(X2)

FULLY REP

x  <- seq(-1, 0.5, by=0.1)
fx <- dtmvnorm(x, mu, sigma, lower=a, upper=b, margin=1)
plot(fx)

FULLY REP

13 Party on! - A New, Conditional Variable-Importance Measure for Random Forests Available in the party Package

party is a package coming from December 2009 R Journal issue. It introduces a new method of partitioning in random forest based machine learning algorithms. Last update comes from 5th March 2020.

library(party)

set.seed(42)
readingSkills.cf <- cforest(score ~ ., data = readingSkills, control = cforest_unbiased(mtry = 2, ntree = 50))
set.seed(42)
varimp(readingSkills.cf)

## nativeSpeaker           age      shoeSize 
##      12.08613      76.53061      18.20340

MOSTLY REP

set.seed(42)
varimp(readingSkills.cf, conditional = TRUE)

## nativeSpeaker           age      shoeSize 
##     11.100889     43.374060      1.144268

MOSTLY REP

14 deSolve

deSolve and other -Solve packages (+ ReacTran) are packages coming from December 2010 R Journal issue. They introduce methods of solving differential equations in R.

vdpol <- function (t, y, mu) {list(c(y[2],mu * (1 - y[1]^2) * y[2] - y[1]))}

library(deSolve)

yini <- c(y1 = 2, y2 = 0)
stiff <- ode(y = yini, func = vdpol,times = 0:3000, parms = 1000)
nonstiff <- ode(y = yini, func = vdpol,times = seq(0, 30, by = 0.01),parms = 1)
head(stiff, n = 3)

##      time       y1            y2
## [1,]    0 2.000000  0.0000000000
## [2,]    1 1.999333 -0.0006670373
## [3,]    2 1.998666 -0.0006674088

FULLY REP

plot(stiff, type = "l", which = "y1",lwd = 2, ylab = "y",main = "IVP ODE, stiff")

FULLY REP

plot(nonstiff, type = "l", which = "y1",lwd = 2, ylab = "y",main = "IVP ODE, nonstiff")

FULLY REP

15 bvpSolve

Prob14 <- function(x, y, xi)  {list(c(y[2],1/xi * (y[1] - (xi*pi*pi+1) * cos(pi*x))))}

library(bvpSolve)

## Error in library(bvpSolve): there is no package called 'bvpSolve'

x <- seq(-1, 1, by = 0.01)
shoot <- bvpshoot(yini = c(0, NA),yend = c(0, NA), x = x, parms = 0.01,func = Prob14)

## Error in bvpshoot(yini = c(0, NA), yend = c(0, NA), x = x, parms = 0.01, : nie udało się znaleźć funkcji 'bvpshoot'

twp <- bvptwp(yini = c(0, NA), yend = c(0,NA), x = x, parms = 0.0025,func = Prob14)

## Error in bvptwp(yini = c(0, NA), yend = c(0, NA), x = x, parms = 0.0025, : nie udało się znaleźć funkcji 'bvptwp'

coll <- bvpcol(yini = c(0, NA),yend = c(0, NA), x = x, parms = 1e-04,func = Prob14)

## Error in bvpcol(yini = c(0, NA), yend = c(0, NA), x = x, parms = 1e-04, : nie udało się znaleźć funkcji 'bvpcol'

NO REP, package bvpSolve cannot be installed.

daefun<-function(t, y, dy, parms) {
  res1 <- - dy[1] - 0.04 * y[1] +1e4 * y[2] * y[3]
  res2 <- - dy[2] + 0.04 * y[1] -1e4 * y[2] * y[3] - 3e7 * y[2]^2
  res3 <- y[1] + y[2] + y[3] - 1
  list(c(res1, res2, res3),
       error = as.vector(y[1] + y[2] + y[3]) - 1)}
yini  <- c(y1 = 1, y2 = 0, y3 = 0)
dyini <- c(-0.04, 0.04, 0)
times <- 10 ^ seq(-6,6,0.1)

print(system.time(out <-daspk(y = yini,
                              dy = dyini,
                              times = times, res = daefun,
                              parms = NULL)))

##    user  system elapsed 
##    0.06    0.00    0.06

FULLY REP, time of reprodution is of course shorter.

plot(out, ylab = "conc.", xlab = "time",type = "l", lwd = 2, log = "x")
mtext("IVP DAE", side = 3, outer = TRUE,line = -1)

MOSTLY REP, the error plot is different, but the tendency is the same.

16 ReacTran

library(ReacTran)
Grid <- setup.grid.1D(N = 1000, L = 10)

pde1D <-function(t, C, parms)  {
  tran <- tran.1D(C = C, D = D,C.down = Cext, dx = Grid)$dC
  list(tran - Q) }

D   <- 1  
Q   <- 1  
Cext <- 20

FULLY REP, no errors while loading.

17 RootSolve

library(rootSolve)
print(system.time(std <- steady.1D(y = runif(Grid$N), func = pde1D, parms = NULL, nspec = 1)))

##    user  system elapsed 
##       0       0       0

FULLY REP, produced in no time.

plot (Grid$x.mid, std$y, type = "l",lwd = 2, main = "steady-state PDE", xlab = "x", ylab = "C", col = "red")

FULLY REP

analytical <- Q/2/D*(Grid$x.mid^2 - 10^2) + Cext
max(abs(analytical - std$y))

## [1] 1.250002e-05

FULLY REP

require(deSolve)
times <- seq(0, 100, by = 1)
system.time(out <- ode.1D(y = rep(1, Grid$N),times = times, func = pde1D,parms = NULL, nspec = 1))

##    user  system elapsed 
##    0.21    0.01    0.23

FULLY REP, it was of course reproduced faster.

tail(out[, 1:4], n = 3)

##        time         1         2         3
##  [99,]   98 -27.55783 -27.55773 -27.55754
## [100,]   99 -27.61735 -27.61725 -27.61706
## [101,]  100 -27.67542 -27.67532 -27.67513

FULLY REP

image(out, xlab = "time, days",ylab = "Distance, cm",main = "PDE", add.contour = TRUE)

MOSTLY REP, colours a little bit brighter and more intensive.

18 Conclusions

There has been a change of seed, which hinders accurate reproduction;
Sometimes the datasets are no longer available;
There may have been name changes.

Generally speaking, it is possible to reproduce with little effort and understanding of the code.

19 Session info

sessionInfo()

## R version 3.6.3 (2020-02-29)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18362)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
## [3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
## [5] LC_TIME=Polish_Poland.1250    
## 
## attached base packages:
##  [1] stats4    grid      tcltk     stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ReacTran_1.4.3.1  shape_1.4.4       rootSolve_1.8.2.1 deSolve_1.28     
##  [5] party_1.3-4       strucchange_1.5-2 zoo_1.8-8         modeltools_0.2-23
##  [9] tmvtnorm_1.4-10   gmm_1.6-4         sandwich_2.5-1    Matrix_1.2-18    
## [13] neuralnet_1.44.2  kernlab_0.9-29    rpart_4.1-15      pmml_2.3.1       
## [17] XML_3.99-0.3      knitr_1.26        asympTest_0.1.4   coda_0.19-3      
## [21] AdMit_2.1.5       mvtnorm_1.1-0     EMD_1.5.8         locfit_1.5-9.4   
## [25] fields_10.3       maps_3.3.0        spam_2.5-1        dotCall64_1.0-0  
## [29] pls_2.7-2         bio.infer_1.3-3   untb_1.7-4        ade4_1.7-15      
## 
## loaded via a namespace (and not attached):
##  [1] Brobdingnag_1.2-6  xfun_0.11          coin_1.3-1         partitions_1.9-22 
##  [5] splines_3.6.3      lattice_0.20-38    htmltools_0.4.0    yaml_2.2.0        
##  [9] gmp_0.5-13.6       survival_3.1-8     rlang_0.4.6        matrixStats_0.56.0
## [13] multcomp_1.4-13    stringr_1.4.0      codetools_0.2-16   evaluate_0.14     
## [17] parallel_3.6.3     TH.data_1.0-10     Rcpp_1.0.3         polynom_1.4-0     
## [21] digest_0.6.23      stringi_1.4.3      tools_3.6.3        magrittr_1.5      
## [25] MASS_7.3-51.5      libcoin_1.0-5      rmarkdown_2.1      compiler_3.6.3

Methodology

Zuzanna Mróz, Aleksander Podsiad, Michał Wdowski

1 Introduction

2 The ade4 Package: Implementing the Duality Diagram for Ecologists

3 untb, an R Package For Simulating Ecological Drift Under the Unified Neutral Theory of Biodiversity

4 The bio.infer R Package: Maximum Likelihood Method for Predicting Environmental Conditions from Assemblage Composition

5 The pls Package: Principal Component and Partial Least Squares Regression in R

6 Empirical Mode Decomposition and Hilbert Spectrum

7 AdMit: Adaptive Mixtures of Student-t Distributions

8 asympTest: A Simple R Package for Classical Parametric Statistical Tests and Confidence Intervals in Large Samples

10 neuralnet: Training of Neural Networks

11 mvtnorm: New Numerical Algorithm for Multivariate Normal Probabilities

12 tmvtnorm: A Package for the TruncatedMultivariate Normal Distribution

13 Party on! - A New, Conditional Variable-Importance Measure for Random Forests Available in the party Package

14 deSolve

15 bvpSolve

16 ReacTran

17 RootSolve

18 Conclusions

19 Session info