Introduction

Our package’s main purpose is to read raw MBA data, perform quality control on it, and normalize it. Unfortunately, different devices and labs produce different data formats, so we gathered several datasets on which the package can be tested. This document describes these datasets and their sources.

Some of our publicly available datasets are stored in the extdata folder of the package. The remaining ones, both the private datasets and the larger number of publicly available ones, are stored in a OneDrive folder that is accessible to the package developers.

How to access the files

The simplest way to access the files is to download them from our GitHub repository.

Another way is to locate the files with the system.file function, which returns the path to a file installed with the package. That path can then be used to read the data. For example:

dataset_name <- "CovidOISExPONTENT.csv"

dataset_filepath <- system.file("extdata", dataset_name, package = "PvSTATEM", mustWork = TRUE)

The variable dataset_filepath now contains the path to the specified dataset on your computer. With this path, we can call the read_luminex_data function to read the data:

library(PvSTATEM)

plate <- read_luminex_data(dataset_filepath)
#> Reading Luminex data from: /home/runner/work/_temp/Library/PvSTATEM/extdata/CovidOISExPONTENT.csv
#> using format xPONENT
#> (WARNING)
#> Layout file not provided. Setting `use_layout_sample_names`,
#>       `use_layout_types` and `use_layout_dilutions` to FALSE.
#> (WARNING)
#> All dilutions in the plate are set to NA. Please check the dilutions in the layout file or sample names.
#> New plate object has been created with name: CovidOISExPONTENT!
#> 
plate
#> Plate with 96 samples and 30 analytes
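
To see which files are available before choosing one, the extdata directory can be listed directly. The snippet below is a minimal sketch: list_extdata is a hypothetical helper, not a function exported by PvSTATEM, and it relies only on base R.

```r
# Hypothetical helper (not part of PvSTATEM): list the files shipped in the
# extdata directory of an installed package.
list_extdata <- function(package, recursive = TRUE) {
  extdata_dir <- system.file("extdata", package = package)
  # system.file() returns "" when the package or the directory is missing
  if (!nzchar(extdata_dir)) {
    return(character(0))
  }
  # recursive = TRUE also lists files in subfolders of extdata
  list.files(extdata_dir, recursive = recursive)
}

available_files <- list_extdata("PvSTATEM")
print(available_files)
```

If the package is not installed, the helper returns an empty character vector instead of raising an error, which makes it safe to call in scripts that only probe for the data.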

Description of the datasets

Our datasets are divided into three main categories:

  • artificial - datasets created by us to test the package functionality
  • public - publicly available datasets produced within the PvSTATEM project or by the laboratories participating in the project
  • external - datasets gathered from the public domain, from external sources independent of the PvSTATEM project

Artificial datasets

To run simple unit tests and validate the most basic reading functionality of the package, we created a few artificial datasets. They are stored in the extdata folder of the package:

  • random.csv - a simple dataset with random values, used to test the basic functionality of the package
  • random2.csv - another simple dataset with random values, used to test the basic functionality of the package. This file has a corresponding artificial layout file, random_layout.csv
  • random_broken_colB.csv - a dataset with a broken column, which the package should detect and report as a warning

Public datasets

The datasets in this category are the most important for package development, since the main purpose of the package is to simplify data preprocessing within the PvSTATEM project.

Most of them are stored in the package’s OneDrive folder. The extdata folder contains two files from the Covid Oise examination:

  • CovidOISExPONTENT.csv, which is the IG4DC2~1.csv plate from the examination IgG_CovidOise4_30plex. It comes with the corresponding layout file CovidOISExPONTENT_layout.xlsx
  • CovidOISExPONTENT_CO.csv, which is the IGG_CO~1.csv plate from the examination IgG_CovidOise2_30plex, together with its corresponding layout file

Most of the examples and vignettes in the package are based on these datasets.
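
Reading a plate together with its layout file avoids the warnings shown earlier about the missing layout. The following is a sketch under assumptions: the argument name layout_filepath is assumed for read_luminex_data, and the layout filename is derived from the naming pattern listed above. The code is guarded so that it does nothing when the package or the files are not available.

```r
dataset_name <- "CovidOISExPONTENT.csv"
# Derive the layout filename from the pattern <name>_layout.xlsx described above
layout_name <- sub("\\.csv$", "_layout.xlsx", dataset_name)

plate <- NULL
if (requireNamespace("PvSTATEM", quietly = TRUE)) {
  plate_filepath <- system.file("extdata", dataset_name, package = "PvSTATEM")
  layout_filepath <- system.file("extdata", layout_name, package = "PvSTATEM")

  # system.file() returns "" for files that are not found
  if (nzchar(plate_filepath) && nzchar(layout_filepath)) {
    # Assumption: the layout is passed through an argument named `layout_filepath`
    plate <- PvSTATEM::read_luminex_data(
      plate_filepath,
      layout_filepath = layout_filepath
    )
  }
}
```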

External datasets

To check the package functionality on data from different sources, we gathered a few datasets from the public domain. These datasets are stored both in the OneDrive folder of the package and in the external subfolder of the extdata directory. The datasets are:

  • Chul_IgG3_1.csv - from the GitHub repo RTSS_Kisumu_Schisto
  • Chul_TotalIgG_2.csv - from the GitHub repo RTSS_Kisumu_Schisto
  • pone.0187901.s001.csv - data shipped with the drLumi package
  • New_Batch_6_20160309_174224.csv - dataset posted on ResearchGate
  • New_Batch_14_20140513_082522.csv - dataset posted on ResearchGate
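
Files in the external subfolder can be located by passing the subfolder as an additional path component to system.file. The sketch below uses only base R and assumes the file is present in the installed package. It prints the first lines of the raw file, which can help identify the export format before choosing how to read it.

```r
external_file <- "Chul_IgG3_1.csv"

# Each path component is passed as a separate argument to system.file()
external_path <- system.file(
  "extdata", "external", external_file,
  package = "PvSTATEM"
)

if (nzchar(external_path)) {
  # Peek at the first few raw lines of the file
  print(readLines(external_path, n = 5, warn = FALSE))
} else {
  message("File not found; is PvSTATEM installed with its external datasets?")
}
```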