Package 'RESIDE' reference manual

Title:	Rapid Easy Synthesis to Inform Data Extraction
Description:	Developed to assist researchers with planning analysis prior to obtaining data from Trusted Research Environments (TREs), also known as safe havens. Provides functionality to export and import marginal distributions and to synthesise data from those marginal distributions, with or without correlations, using a multivariate cumulative distribution (copula). Additionally, the International Stroke Trial (IST) is included as an example dataset under the ODC-By licence: Sandercock et al. (2011) <doi:10.7488/ds/104>, Sandercock et al. (2011) <doi:10.1186/1745-6215-12-101>.
Authors:	Ryan Field [aut, cre], David McAllister [aut], Claudia Geue [ctb]
Maintainer:	Ryan Field <[email protected]>
License:	GPL (>= 3)
Version:	0.3.3
Built:	2025-03-06 15:11:26 UTC
Source:	https://github.com/hehta/reside

RESIDE: Rapid Easy Synthesis to Inform Data Extraction

Description

Developed to assist researchers with planning analysis prior to obtaining data from Trusted Research Environments (TREs), also known as safe havens. Provides functionality to export and import marginal distributions and to synthesise data from those marginal distributions, with or without correlations, using a multivariate cumulative distribution (copula). Additionally, the International Stroke Trial (IST) is included as an example dataset under the ODC-By licence: Sandercock et al. (2011) doi:10.7488/ds/104, Sandercock et al. (2011) doi:10.1186/1745-6215-12-101.

Details

The RESIDE Package

This work was supported by the UKRI Strength in Places Fund (SIPF) Competition, project number 107140. The project title is SIPF The Living Laboratory driving economic growth in Glasgow through real world implementation of precision medicine.

Author(s)

Maintainer: Ryan Field [email protected] (ORCID)

Authors:

David McAllister [email protected] (ORCID)

Other contributors:

Claudia Geue [email protected] (ORCID) [contributor]

Export an empty correlation matrix

Description

A function to export a correlation matrix with the required variables as a csv file.

Usage

export_empty_cor_matrix(
  marginals,
  folder_path,
  file_name = "correlation_matrix.csv",
  create_folder = TRUE
)

Arguments

`marginals`	The marginal distributions
`folder_path`	Folder to export to.
`file_name`	(optional) file name, Default: 'correlation_matrix.csv'
`create_folder`	Whether the folder should be created, Default: TRUE

Details

This function exports an empty correlation matrix as a csv file. The file contains all the necessary variables, including dummy variables for factors. Dummy variables for factors may contain a missing category to represent missing data. Correlations should be added to the empty CSV and then imported using the import_cor_matrix function. Correlations should be supplied as rank order correlations. The correlation matrix should be symmetric and positive semi-definite.
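The symmetry and positive semi-definiteness requirements can be checked before a filled-in matrix is imported. The following is a minimal base R sketch; `is_valid_cor_matrix` is a hypothetical helper and not part of RESIDE, and the tolerance is an assumption.

```r
# Hypothetical helper (not part of RESIDE): check that a filled-in
# correlation matrix is square, symmetric, has a unit diagonal, and is
# positive semi-definite.
is_valid_cor_matrix <- function(m, tol = 1e-8) {
  if (!is.matrix(m) || nrow(m) != ncol(m)) return(FALSE)
  symmetric <- isSymmetric(unname(m), tol = tol)
  unit_diag <- all(abs(diag(m) - 1) < tol)
  in_range  <- all(abs(m) <= 1 + tol)
  if (!symmetric || !unit_diag || !in_range) return(FALSE)
  # Positive semi-definite: no eigenvalue is meaningfully negative
  eigenvalues <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(eigenvalues >= -tol)
}

# A valid 2 x 2 matrix
good <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)

# Symmetric with a unit diagonal, but the pairwise values are mutually
# inconsistent, so one eigenvalue is negative
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), nrow = 3)

is_valid_cor_matrix(good)
is_valid_cor_matrix(bad)
```

A matrix that fails the eigenvalue check usually contains pairwise correlations that cannot all hold at once, so the entries need revisiting rather than the check being relaxed.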

Value

No return value, called for exportation of files.

Examples

## Not run: 
 marginals <- import_marginal_distributions()
 export_empty_cor_matrix(
   marginals,
   folder_path = tempdir()
  )

## End(Not run)

Export Marginal Distributions

Description

Export the marginal distributions to CSV files

Usage

export_marginal_distributions(
  marginals,
  folder_path,
  create_folder = FALSE,
  force = FALSE
)

Arguments

`marginals`	an object of class RESIDE from `get_marginal_distributions`
`folder_path`	path to folder where to save files.
`create_folder`	if the folder does not exist should it be created, Default: FALSE
`force`	if the folder already contains marginal distribution files should they be removed, Default: FALSE

Details

Exports each of the marginal distributions to CSV files within a given folder, along with the continuous quantiles.

Value

No return value, called for exportation of files.

Examples


  marginal_distributions <- get_marginal_distributions(IST)
  export_marginal_distributions(
    marginal_distributions,
    folder_path = tempdir()
  )


Generate Marginal Distributions for a given data frame

Description

Generate Marginal Distributions from a given data frame with options to specify which variables to use.

Usage

get_marginal_distributions(df, variables = c(), print = FALSE)

Arguments

`df`	Data frame to get the marginal distributions from
`variables`	(Optional) variables (columns) to select, Default: c()
`print`	Whether to print the marginal distributions to the console, Default: FALSE

Details

This function generates marginal distributions from a given data frame. The marginals differ depending on the variable type. For binary variables, the mean and the number of missing values are stored. Continuous variables are first transformed, and both the mean and sd of the transformed variables are stored along with the quantile mapping for back transformation. For categorical variables, the number of observations in each category is stored, and missing values are categorised as "missing".
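To illustrate the idea of transforming a continuous variable and keeping a quantile mapping for back transformation, here is a minimal base R sketch. It assumes a rank-based normal-score transform and a 101-point quantile table; both are assumptions made for illustration, and the transformation used inside RESIDE may differ.

```r
# Illustrative only: a rank-based normal-score transform with a quantile
# table for back transformation. RESIDE's internal method may differ.
set.seed(1)
x <- rexp(500, rate = 0.1)            # skewed example variable

# Forward transform: ranks -> uniform scores -> standard normal scores
u <- (rank(x) - 0.5) / length(x)
z <- qnorm(u)
marginal <- list(mean = mean(z), sd = sd(z))

# Quantile mapping kept for back transformation
probs <- seq(0, 1, by = 0.01)
quantile_map <- data.frame(
  prob  = probs,
  value = as.numeric(quantile(x, probs = probs))
)

# Back-transform new normal draws by interpolating the quantile map
z_new <- rnorm(1000, mean = marginal$mean, sd = marginal$sd)
x_new <- approx(
  x    = quantile_map$prob,
  y    = quantile_map$value,
  xout = pnorm(z_new, mean = marginal$mean, sd = marginal$sd)
)$y

summary(x)
summary(x_new)
```

Only the mean, sd, and the quantile table are needed to generate `x_new`, which is why summaries of this kind can be shared without sharing the row-level data.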

Value

A list of marginal distributions of an S3 RESIDE Class

Examples

marginal_distributions <- get_marginal_distributions(
  IST,
  # pass the selection as a named argument (`=`), not an assignment (`<-`)
  variables = c(
    "SEX",
    "AGE",
    "ID14",
    "RSBP",
    "RATRIAL"
  )
)

Import a correlation matrix

Description

Imports a correlation matrix from a csv file generated by export_empty_cor_matrix

Usage

import_cor_matrix(file_path = "./correlation_matrix.csv")

Arguments

`file_path`	A path to the csv file, Default: './correlation_matrix.csv'

Details

This function imports user-specified correlations from the csv file exported by the export_empty_cor_matrix function. Correlations should be entered into the CSV file as rank order correlations. The correlation matrix should be symmetric and positive semi-definite.
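Rank order (Spearman) correlations can be computed in base R with `cor(method = "spearman")`. The sketch below uses a small made-up data frame; the column names and values are hypothetical and only show what a rank order correlation matrix looks like.

```r
# Illustrative only: rank order (Spearman) correlations from a made-up
# data frame. Column names and values are hypothetical.
set.seed(42)
age <- rnorm(200, mean = 70, sd = 10)
pilot <- data.frame(
  AGE  = age,
  RSBP = 100 + 0.8 * age + rnorm(200, sd = 15)
)

# Symmetric matrix with a unit diagonal, as expected for the CSV file
rank_cor <- cor(pilot, method = "spearman")
round(rank_cor, 2)
```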

Value

a matrix of correlations that can be used with synthesise_data

Examples

## Not run: 
  import_cor_matrix("correlation_matrix.csv")

## End(Not run)

Import Marginal Distributions

Description

Import the marginal distribution as exported from a Trusted Research Environment (TRE)

Usage

import_marginal_distributions(
  folder_path = ".",
  binary_variables_file = "",
  categorical_variables_file = "",
  continuous_variables_file = "",
  continuous_quantiles_file = "",
  summary_file = "summary.csv"
)

Arguments

`folder_path`	Where the marginal distribution files are located, Default: '.', see details.
`binary_variables_file`	filename for the binary variables file, Default: '', see details.
`categorical_variables_file`	filename for the categorical variables file, Default: '', see details.
`continuous_variables_file`	filename for the continuous variables file, Default: '', see details.
`continuous_quantiles_file`	filename for the continuous quantiles file, Default: '', see details.
`summary_file`	filename for the summary file, Default: 'summary.csv', see details.

Details

This function imports marginal distributions generated within a Trusted Research Environment (TRE) using the function export_marginal_distributions. The folder_path parameter gives the location of the files provided by the TRE and defaults to the current working directory. If a file parameter is left empty, the default file name is used.

Value

Returns an object of a RESIDE class

Examples

## Not run: 
  marginals <- import_marginal_distributions()

## End(Not run)

IST Dataset

Description

The International Stroke Trial Dataset

Usage

IST

Format

A data frame with 19435 rows and 112 columns:

AGE: Randomisation data: Age in years
CMPLASP: Other data and derived variables: Compliant for aspirin
CMPLHEP: Other data and derived variables: Compliant for heparin
CNTRYNUM: Other data and derived variables: Country code
COUNTRY: Other data and derived variables: Abbreviated country code
DALIVE: Recurrent stroke within 14 days: Discharged alive from hospital
DALIVED: Recurrent stroke within 14 days: Date Discharged alive from hospital
DAP: Data collected on 14 day/discharge form about treatments given in hospital: Non trial antiplatelet drug (Y/N)
DASP14: Data collected on 14 day/discharge form about treatments given in hospital: Aspirin given for 14 days or till death or discharge (Y/N)
DASPLT: Data collected on 14 day/discharge form about treatments given in hospital: Discharged on long term aspirin (Y/N)
DAYLOCAL: Randomisation data: Estimate of local day of week (assuming RDATE is Oxford)
DCAA: Data collected on 14 day/discharge form about treatments given in hospital: Calcium antagonists (Y/N)
DCAREND: Data collected on 14 day/discharge form about treatments given in hospital: Carotid surgery (Y/N)
DDEAD: Other events within 14 days: Dead on discharge form
DDEADC: Other events within 14 days: Cause of death (1-Initial stroke /2-Recurrent stroke (ischaemic or unknown) /3-Recurrent stroke (haemorrhagic) /4-Pneumonia /5-Coronary heart disease /6-Pulmonary embolism /7-Other vascular or unknown /8-Non-vascular /0-unknown)
DDEADD: Date of dead on discharge form (yyyy/mm/dd); NOTE: this death is not necessarily within 14 days of randomisation
DDEADX: Other events within 14 days: Comment on death
DDIAGHA: Final diagnosis of initial event: Haemorrhagic stroke
DDIAGISC: Final diagnosis of initial event: Ischaemic stroke
DDIAGUN: Final diagnosis of initial event: Indeterminate stroke
DEAD1: Indicator variables for specific causes of death: Initial stroke
DEAD2: Indicator variables for specific causes of death: Recurrent ischaemic/unknown stroke
DEAD3: Indicator variables for specific causes of death: Recurrent haemorrhagic stroke
DEAD4: Indicator variables for specific causes of death: Pneumonia
DEAD5: Indicator variables for specific causes of death: Coronary heart disease
DEAD6: Indicator variables for specific causes of death: Pulmonary embolism
DEAD7: Indicator variables for specific causes of death: Other vascular or unknown
DEAD8: Indicator variables for specific causes of death: Non vascular
DGORM: Data collected on 14 day/discharge form about treatments given in hospital: Glycerol or mannitol (Y/N)
DHAEMD: Data collected on 14 day/discharge form about treatments given in hospital: Haemodilution (Y/N)
DHH14: Data collected on 14 day/discharge form about treatments given in hospital: Medium dose heparin given for 14 days etc in pilot (combine with above)
DIED: Other data and derived variables: Indicator variable for death (1=died; 0=did not die)
DIVH: Data collected on 14 day/discharge form about treatments given in hospital: Non trial intravenous heparin (Y/N)
DLH14: Data collected on 14 day/discharge form about treatments given in hospital: Low dose heparin given for 14 days or till death/discharge (Y/N)
DMAJNCH: Data collected on 14 day/discharge form about treatments given in hospital: Major non-cerebral haemorrhage (Y/N)
DMAJNCHD: Data collected on 14 day/discharge form about treatments given in hospital: Date of Major non-cerebral haemorrhage (yyyy/mm/dd)
DMAJNCHX: Data collected on 14 day/discharge form about treatments given in hospital: Comment of Major non-cerebral haemorrhage
DMH14: Data collected on 14 day/discharge form about treatments given in hospital: Medium dose heparin given for 14 days or till death/discharge (Y/N)
DNOSTRK: Final diagnosis of initial event: Not a stroke
DNOSTRKX: Final diagnosis of initial event: Comment on Not a stroke
DOAC: Data collected on 14 day/discharge form about treatments given in hospital: Other anticoagulants (Y/N)
DPE: Other events within 14 days: Pulmonary embolism
DPED: Other events within 14 days: Date of Pulmonary embolism (yyyy/mm/dd)
DPLACE: Other events within 14 days: Discharge destination (A-Home /B-Relatives home /C-Residential care /D-Nursing home /E-Other hospital departments /U-Unknown)
DRSH: Recurrent stroke within 14 days: Haemorrhagic stroke
DRSHD: Recurrent stroke within 14 days: Date of Haemorrhagic stroke (yyyy/mm/dd)
DRSISC: Recurrent stroke within 14 days: Ischaemic recurrent stroke
DRSISCD: Recurrent stroke within 14 days: Date of Ischaemic recurrent stroke (yyyy/mm/dd)
DRSUNK: Recurrent stroke within 14 days: Unknown type
DRSUNKD: Recurrent stroke within 14 days: Date of Unknown type (yyyy/mm/dd)
DSCH: Data collected on 14 day/discharge form about treatments given in hospital: Non trial subcutaneous heparin (Y/N)
DSIDE: Data collected on 14 day/discharge form about treatments given in hospital: Other side effect (Y/N)
DSIDED: Data collected on 14 day/discharge form about treatments given in hospital: Date of Other side effect
DSIDEX: Data collected on 14 day/discharge form about treatments given in hospital: Comment of Other side effect
DSTER: Data collected on 14 day/discharge form about treatments given in hospital: Steroids (Y/N)
DTHROMB: Data collected on 14 day/discharge form about treatments given in hospital: Thrombolysis (Y/N)
DVT14: Indicator variables for specific causes of death: Indicator of deep vein thrombosis on discharge form
EXPD14: Other data and derived variables: Predicted probability of death at 14 days
EXPD6: Other data and derived variables: Predicted probability of death at 6 months
EXPDD: Other data and derived variables: Predicted probability of death/dependence at 6 months
FAP: Data collected at 6 months: On antiplatelet drugs
FDEAD: Data collected at 6 months: Dead at six month follow-up (Y/N)
FDEADC: Data collected at 6 months: Cause of death (1-Initial stroke /2-Recurrent stroke (ischaemic or unknown) /3-Recurrent stroke (haemorrhagic) /4-Pneumonia /5-Coronary heart disease /6-Pulmonary embolism /7-Other vascular or unknown /8-Non-vascular /0-unknown)
FDEADD: Data collected at 6 months: Date of death; NOTE: this death is not necessarily within 6 months of randomisation
FDEADX: Data collected at 6 months: Comment on death
FDENNIS: Data collected at 6 months: Dependent at 6 month follow-up (Y/N)
FLASTD: Data collected at 6 months: Date of last contact
FOAC: Data collected at 6 months: On anticoagulants
FPLACE: Data collected at 6 months: Place of residence at 6 month follow-up (A-Home /B-Relatives home /C-Residential care /D-Nursing home /E-Other hospital departments /U-Unknown)
FRECOVER: Data collected at 6 months: Fully recovered at 6 month follow-up (Y/N)
FU1_COMP: Other data and derived variables: Date discharge form completed
FU1_RECD: Other data and derived variables: Date discharge form received
FU2_DONE: Other data and derived variables: Date 6 month follow-up done
H14: Indicator variables for specific causes of death: Cerebral bleed/haemorrhagic stroke within 14 days; this is a slightly wider definition than DRSH and is used for analysis of cerebral bleeds
HOSPNUM: Randomisation data: Hospital number
HOURLOCAL: Randomisation data: Local time – hours
HTI14: Indicator variables for specific causes of death: Indicator of haemorrhagic transformation within 14 days
ID14: Other data and derived variables: Indicator of death at 14 days
ISC14: Indicator variables for specific causes of death: Indicator of ischaemic stroke within 14 days
MINLOCAL: Randomisation data: Local time – minutes
NCB14: Indicator variables for specific causes of death: Indicator of any non-cerebral bleed within 14 days
NCCODE: Other data and derived variables: Coding of compliance (see Table 3) doi:10.1186/1745-6215-13-24
NK14: Indicator variables for specific causes of death: Indicator of indeterminate stroke within 14 days
OCCODE: Other data and derived variables: Six month outcome (1-dead /2-dependent /3-not recovered /4-recovered /8 or 9 – missing status)
ONDRUG: Data collected on 14 day/discharge form about treatments given in hospital: Estimate of time in days on trial treatment
PE14: Indicator variables for specific causes of death: Indicator of pulmonary embolism within 14 days
RASP3: Randomisation data: Aspirin within 3 days prior to randomisation (Y/N)
RATRIAL: Randomisation data: Atrial fibrillation (Y/N); not coded for pilot phase - 984 patients
RCONSC: Randomisation data: Conscious state at randomisation (F - fully alert, D - drowsy, U - unconscious)
RCT: Randomisation data: CT before randomisation (Y/N)
RDATE: Randomisation data: Date of randomisation
RDEF1: Randomisation data: Face deficit (Y/N/C=can't assess)
RDEF2: Randomisation data: Arm/hand deficit (Y/N/C=can't assess)
RDEF3: Randomisation data: Leg/foot deficit (Y/N/C=can't assess)
RDEF4: Randomisation data: Dysphasia (Y/N/C=can't assess)
RDEF5: Randomisation data: Hemianopia (Y/N/C=can't assess)
RDEF6: Randomisation data: Visuospatial disorder (Y/N/C=can't assess)
RDEF7: Randomisation data: Brainstem/cerebellar signs (Y/N/C=can't assess)
RDEF8: Randomisation data: Other deficit (Y/N/C=can't assess)
RDELAY: Randomisation data: Delay between stroke and randomisation in hours
RHEP24: Randomisation data: Heparin within 24 hours prior to randomisation (Y/N)
RSBP: Randomisation data: Systolic blood pressure at randomisation (mmHg)
RSLEEP: Randomisation data: Symptoms noted on waking (Y/N)
RVISINF: Randomisation data: Infarct visible on CT (Y/N)
RXASP: Randomisation data: Trial aspirin allocated (Y/N)
RXHEP: Randomisation data: Trial heparin allocated (M/L/N) [M is coded as H=high in pilot]
SET14D: Other data and derived variables: Known to be dead or alive at 14 days (1=Yes, 0=No); this does not necessarily mean that we know outcome at 6 months – see OCCODE for this
SEX: Randomisation data: M=male; F=female
STRK14: Indicator variables for specific causes of death: Indicator of any stroke within 14 days
STYPE: Randomisation data: Stroke subtype (TACS/PACS/POCS/LACS/other)
TD: Other data and derived variables: Time of death or censoring in days
TRAN14: Indicator variables for specific causes of death: Indicator of major non-cerebral bleed within 14 days

...

Details

Obtained from Sandercock, Peter; Niewada, Maciej; Czlonkowska, Anna. (2011). International Stroke Trial database (version 2), [dataset]. University of Edinburgh. Department of Clinical Neurosciences. doi:10.7488/ds/104. Available under the ODC-By licence.

Author(s)

Sandercock P et al. [email protected]

References

doi:10.7488/ds/104

print.RESIDE

Description

S3 override for print RESIDE

Usage

## S3 method for class 'RESIDE'
print(x, ...)

Arguments

`x`	an object of class RESIDE
`...`	Other parameters; currently none are used.

Details

S3 Override for RESIDE Class

Value

No return value, called to print to the terminal.

Examples

print(
  marginal_distributions <- get_marginal_distributions(
    IST,
    # pass the selection as a named argument (`=`), not an assignment (`<-`)
    variables = c(
      "SEX",
      "AGE",
      "ID14",
      "RSBP",
      "RATRIAL"
    )
  )
)

Synthesise data from marginal distributions

Description

Allows the synthesis of data from marginal distributions obtained from a Trusted Research Environment (TRE)

Usage

synthesise_data(marginals, correlation_matrix = NULL, ...)

synthesize_data(marginals, correlation_matrix = NULL, ...)

Arguments

`marginals`	an object of class RESIDE
`correlation_matrix`	Correlation Matrix see `export_empty_cor_matrix` and `import_cor_matrix`, Default: NULL
`...`	Additional parameters; currently none are used.

Details

This function synthesises a dataset from marginals imported using import_marginal_distributions. By default the dataset will not contain correlations. User-specified correlations can be added using the correlation_matrix parameter; see export_empty_cor_matrix and import_cor_matrix for more details.
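As a rough illustration of how correlated data can be drawn from marginal distributions with a copula, the following base R sketch uses a Gaussian copula with made-up marginals. It is not RESIDE's implementation: the target correlation, the exponential marginal, and the binary mean of 0.3 are all assumptions chosen for the example.

```r
# Illustrative Gaussian copula sketch in base R; not RESIDE's actual code.
set.seed(123)
n <- 2000
target_cor <- matrix(c(1, 0.6, 0.6, 1), nrow = 2)

# 1. Correlated standard normal draws via the Cholesky factor
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(target_cor)

# 2. Map each column to uniform scores with the normal CDF
u <- pnorm(z)

# 3. Apply each marginal's inverse CDF (made-up marginals)
synthetic <- data.frame(
  continuous = qexp(u[, 1], rate = 0.1),     # skewed continuous variable
  binary     = as.integer(u[, 2] > 1 - 0.3)  # binary variable with mean 0.3
)

# The marginals are preserved and the dependence carries through
mean(synthetic$binary)
cor(synthetic$continuous, synthetic$binary, method = "spearman")
```

Replacing `target_cor` with the identity matrix gives independent columns, which corresponds to synthesising without correlations.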

Value

a data frame of simulated data

Examples

## Not run: 
   marginals <- import_marginal_distributions()
   df <- synthesise_data(marginals)

## End(Not run)
