Obtaining Marginal Distributions
Marginal distributions should first be obtained using the
get_marginal_distributions() function.
To obtain the marginal distributions for all variables, specify only
the dataset:
library(RESIDE)
marginals <- get_marginal_distributions(IST)
To obtain marginal distributions for selected variables only, list them
using the variables parameter:
library(RESIDE)
marginals <- get_marginal_distributions(
  IST,
  variables = c(
    "SEX",
    "AGE",
    "ID14",
    "RSBP",
    "RATRIAL",
    "SET14D",
    "DSIDED"
  )
)
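Before requesting selected variables, it can help to confirm that each name is present in the dataset. The following is a minimal sketch in base R, assuming a small stand-in data frame; example_data, requested and missing_vars are hypothetical names used for illustration and are not part of RESIDE:

```r
# Stand-in data frame; in practice this would be the dataset passed to
# get_marginal_distributions() (for example IST).
example_data <- data.frame(
  SEX = c("M", "F", "F"),
  AGE = c(71, 64, 80),
  RSBP = c(140, 160, 155)
)

# Variables we intend to request (hypothetical selection).
requested <- c("SEX", "AGE", "RATRIAL")

# Names that are requested but absent from the dataset.
missing_vars <- setdiff(requested, names(example_data))

if (length(missing_vars) > 0) {
  message("Not found in dataset: ", paste(missing_vars, collapse = ", "))
}
```

If missing_vars is empty, every requested name matches a column and the selection can be passed to the variables parameter.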
Printing the Marginal Distributions Prior to Export
Marginal distributions can be printed as they are generated by setting
the print parameter:
library(RESIDE)
marginals <- get_marginal_distributions(
  IST,
  print = TRUE
)
Or from a stored marginals object:
library(RESIDE)
marginals <- get_marginal_distributions(IST)
print(marginals)
Exporting Marginal Distributions
Marginal distributions can be exported using the
export_marginal_distributions() function, specifying the marginal
distributions (generated by get_marginal_distributions()) and a folder
path:
library(RESIDE)
marginals <- get_marginal_distributions(IST)
export_marginal_distributions(
  marginals,
  folder_path = "/Users/ryan/marginals"
)
This folder should exist and not contain any previously exported
marginal distributions. You can create the folder automatically using
the create_folder parameter:
library(RESIDE)
marginals <- get_marginal_distributions(IST)
export_marginal_distributions(
  marginals,
  folder_path = "/Users/ryan/marginals",
  create_folder = TRUE
)
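The folder requirements described above can be checked before exporting. This is a minimal sketch in base R, not part of RESIDE; check_export_folder() is a hypothetical helper, and it treats any file already in the folder as a reason not to export there, which is stricter than the stated requirement:

```r
# Hypothetical helper: report whether a folder is ready to receive an export.
# Returns "missing", "not empty" or "ready".
check_export_folder <- function(folder_path) {
  if (!dir.exists(folder_path)) {
    return("missing")
  }
  if (length(list.files(folder_path)) > 0) {
    return("not empty")
  }
  "ready"
}

# Example: a path in the session's temporary directory that has not been
# created yet.
check_export_folder(file.path(tempdir(), "marginals_not_yet_created"))
```

A result of "missing" suggests either creating the folder first or using create_folder = TRUE; "not empty" suggests choosing a different folder.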
Files created by export_marginal_distributions()
The following files will be created by the
export_marginal_distributions() function:
- binary_variables.csv - Contains the marginal distributions
  for binary variables, including:
  - Variable Name
  - Mean
  - Number of Missing Observations
- categorical_variables.csv - Contains the marginal
  distributions for categorical variables, including:
  - Variable Name & Category Name
  - Number of Observations in Each Category
  - NB: Missing observations are coded as a separate
    category labelled missing.
- continuous_variables.csv - Contains the marginal
  distributions for continuous variables, including:
  - Variable Name
  - Transformed Mean
  - Transformed Standard Deviation
  - Number of Missing Observations
  - Number of Decimal Places
- continuous_quantiles.csv - Contains the quantile mapping to
  allow for back transformation. For each continuous variable this
  contains:
  - The original quantile value
  - The transformed quantile value
  - An epsilon value to indicate the amount of thinning applied
- summary.csv - Contains an overall summary of the dataset,
  including:
  - Number of Rows
  - Number of Columns
  - Variable Names (for validation)
These files should then be sent to the user.
NB: If there are no variables of a certain type, the
corresponding file will not be created.
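Because some files may be absent, it can be useful to check which of the files listed above are present in the export folder before sending them. This is a minimal sketch in base R; list_exported_files() and expected_files are hypothetical names, and the file names are taken from the list above:

```r
# File names as listed above.
expected_files <- c(
  "binary_variables.csv",
  "categorical_variables.csv",
  "continuous_variables.csv",
  "continuous_quantiles.csv",
  "summary.csv"
)

# Hypothetical helper: return a named logical vector indicating which of
# the expected files exist in folder_path.
list_exported_files <- function(folder_path) {
  present <- file.exists(file.path(folder_path, expected_files))
  names(present) <- expected_files
  present
}
```

For example, list_exported_files("/Users/ryan/marginals") would return TRUE for each file that was written and FALSE for any variable type with no variables in the dataset.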