Title: | Do a Git Style Diff of the Rows Between Two Dataframes with Similar Structure |
---|---|
Description: | Compares two data frames that have the same column structure and shows the rows that have changed. It also gives a git-style diff format to quickly see what has changed, in addition to summary statistics. |
Authors: | Alex Joseph [aut, cre] |
Maintainer: | Alex Joseph |
License: | MIT + file LICENSE |
Version: | 2.3.5 |
Built: | 2024-11-16 03:03:55 UTC |
Source: | https://github.com/alexsanjoseph/comparedf |
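As a rough illustration of what a row-level, git-style diff means, the base-R sketch below marks rows that appear only in the new data frame with "+" and rows that appear only in the old one with "-", dropping rows that are identical in both. It is not compareDF's implementation: the helper name `git_style_diff`, the single `key` column and the whole-row string comparison are assumptions made for this example.

```r
# Hypothetical helper: a simplified illustration of a git-style row diff,
# not compareDF's implementation. Assumes both data frames have the same
# columns and that `key` names a column identifying each row.
git_style_diff <- function(df_new, df_old, key) {
  # Collapse each row into one string so whole rows can be compared at once
  row_sig <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
  new_sig <- row_sig(df_new)
  old_sig <- row_sig(df_old)

  # Rows of df_new with no identical row in df_old are additions ("+")
  added <- df_new[!(new_sig %in% old_sig), , drop = FALSE]
  # Rows of df_old with no identical row in df_new are removals ("-")
  removed <- df_old[!(old_sig %in% new_sig), , drop = FALSE]

  out <- rbind(
    data.frame(chng_type = rep("+", nrow(added)), added,
               stringsAsFactors = FALSE, check.names = FALSE),
    data.frame(chng_type = rep("-", nrow(removed)), removed,
               stringsAsFactors = FALSE, check.names = FALSE)
  )
  # Sort by key, putting the "-" (old) row before the "+" (new) row
  out <- out[order(out[[key]], out$chng_type == "+"), , drop = FALSE]
  rownames(out) <- NULL
  out
}

old <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
new <- data.frame(id = c(1, 2, 4), score = c(10, 25, 40))
res <- git_style_diff(new, old, key = "id")
res
```

Ordering by the key keeps the old ("-") and new ("+") versions of a changed row next to each other, much like a unified diff; the unchanged row with id 1 does not appear.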
Do a git-style comparison between two data frames of similar columnar structure
compare_df( df_new, df_old, group_col, exclude = NULL, tolerance = 0, tolerance_type = "ratio", stop_on_error = TRUE, keep_unchanged_rows = FALSE, keep_unchanged_cols = TRUE, change_markers = c("+", "-", "="), round_output_to = 3 )
Argument | Description |
---|---|
df_new | The data frame whose changes will be shown as additions (green) |
df_old | The data frame whose changes will be shown as removals (red) |
group_col | A character string or character vector naming the columns by which to group the rows (used with group_by) |
exclude | The columns to exclude from the comparison |
tolerance | The amount up to which changes in numeric values are ignored in the visual representation; with tolerance_type = 'ratio' it is a fraction. Defaults to 0, so any change in a value is shown. Does not apply to categorical variables. |
tolerance_type | The type of comparison for numeric values, either 'ratio' or 'difference'. Defaults to 'ratio'. |
stop_on_error | Whether or not to stop on acceptable errors |
keep_unchanged_rows | Whether to keep unchanged rows in the output. Defaults to FALSE. |
keep_unchanged_cols | Whether to keep unchanged columns in the output. Defaults to TRUE. |
change_markers | The labels to use for the added, removed and unchanged change types, e.g. c("new", "old", "unchanged"). Defaults to c("+", "-", "="). |
round_output_to | Number of digits to round the output to. Defaults to 3. |
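The two tolerance modes can be pictured with the following sketch. It assumes that 'ratio' compares the relative change |new - old| / |old| against tolerance and that 'difference' compares the absolute change |new - old|; treat it as an illustration of those two ideas rather than the package's exact rule (its handling of zeros or NA, for instance, may differ). The helper name `within_tolerance` is hypothetical.

```r
# Hypothetical helper illustrating two tolerance modes; the exact rule used
# inside compareDF (e.g. handling of zeros and NA) may differ.
within_tolerance <- function(new, old, tolerance = 0,
                             tolerance_type = c("ratio", "difference")) {
  tolerance_type <- match.arg(tolerance_type)
  change <- abs(new - old)
  if (tolerance_type == "ratio") {
    # Relative change; when the old value is 0, only an unchanged value passes
    change <- ifelse(old == 0, ifelse(change == 0, 0, Inf), change / abs(old))
  }
  # TRUE means the change is small enough to be ignored
  change <= tolerance
}

old <- c(100, 100, 0, 5)
new <- c(104, 110, 0, 5.5)
ratio_ok <- within_tolerance(new, old, tolerance = 0.05, tolerance_type = "ratio")
diff_ok  <- within_tolerance(new, old, tolerance = 1, tolerance_type = "difference")
ratio_ok
diff_ok
```

Note how the same change of 0.5 is ignored under an absolute tolerance of 1 but flagged under a 5% relative tolerance, because it is 10% of the old value 5.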
Create a human-readable output table from the comparison output. Currently 'html' and 'xlsx' are supported.
create_output_table( comparison_output, output_type = "html", file_name = NULL, limit = 100, color_scheme = c(addition = "#52854C", removal = "#FC4E07", unchanged_cell = "#999999", unchanged_row = "#293352"), headers = NULL, change_col_name = "chng_type", group_col_name = "grp" )
Argument | Description |
---|---|
comparison_output | Output from the comparison function (compare_df) |
output_type | Type of comparison output, either 'html' or 'xlsx'. Defaults to 'html'. |
file_name | Where to write the output to. Defaults to NULL, which sends the output to the RStudio viewer (not supported for 'xlsx'). |
limit | Maximum number of rows to show in the diff. Values above 1000 are not recommended for HTML. |
color_scheme | The color scheme to use for the output, as a vector or list with named elements. Defaults to c(addition = "#52854C", removal = "#FC4E07", unchanged_cell = "#999999", unchanged_row = "#293352"). |
headers | A character vector of column names to be used in the table. Defaults to NULL. |
change_col_name | Name of the change column to use in the table. Defaults to "chng_type". |
group_col_name | Name of the group column to use in the table (if there are multiple grouping variables). Defaults to "grp". |
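To show how a named color scheme can drive an HTML rendering, here is a base-R sketch that colors each row of a diff by its change type. It is a hypothetical helper (`diff_to_html`), not create_output_table itself: it only reuses the default color values from the usage line and assumes "+" maps to addition, "-" to removal and "=" to unchanged_row.

```r
# Hypothetical helper: renders a diff data frame as a bare HTML table,
# coloring each row by its change type. Not the package's renderer, and
# cell values are not HTML-escaped.
diff_to_html <- function(diff_df, change_col = "chng_type",
                         color_scheme = c(addition = "#52854C",
                                          removal = "#FC4E07",
                                          unchanged_cell = "#999999",
                                          unchanged_row = "#293352")) {
  # Assumed mapping from change markers to entries of the color scheme
  marker_to_color <- c("+" = color_scheme[["addition"]],
                       "-" = color_scheme[["removal"]],
                       "=" = color_scheme[["unchanged_row"]])
  header <- paste0("<tr>", paste0("<th>", names(diff_df), "</th>", collapse = ""),
                   "</tr>")
  rows <- vapply(seq_len(nrow(diff_df)), function(i) {
    color <- marker_to_color[[diff_df[[change_col]][i]]]
    cells <- paste0("<td>", vapply(diff_df[i, ], as.character, character(1)),
                    "</td>", collapse = "")
    paste0('<tr style="color:', color, '">', cells, "</tr>")
  }, character(1))
  paste0("<table>", header, paste(rows, collapse = ""), "</table>")
}

diff_df <- data.frame(chng_type = c("-", "+"), id = c(2, 2), score = c(20, 25))
html <- diff_to_html(diff_df)
html
```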
Convert the comparison output to a wide format, which is easier to compare side by side.
create_wide_output(comparison_output, suffix = c("_new", "_old"))
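Argument | Description |
---|---|
comparison_output | Output from the comparison function (compare_df) |
suffix | Suffixes that identify the columns coming from the new and old data frames. Defaults to c("_new", "_old"). |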
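A wide layout puts the new and old values of each changed row next to each other. A minimal base-R sketch of that reshaping uses merge() with the same default suffixes; it assumes a long diff with a chng_type column of "+"/"-" markers and a single key column, and the helper name `to_wide` is hypothetical rather than part of the package.

```r
# Hypothetical helper: reshapes a long "+"/"-" diff into one row per key,
# with new and old values side by side. Not the package's implementation.
to_wide <- function(diff_df, key, change_col = "chng_type",
                    suffix = c("_new", "_old")) {
  value_cols <- setdiff(names(diff_df), change_col)
  new_rows <- diff_df[diff_df[[change_col]] == "+", value_cols, drop = FALSE]
  old_rows <- diff_df[diff_df[[change_col]] == "-", value_cols, drop = FALSE]
  # all = TRUE keeps keys found on only one side (pure additions or removals);
  # suffixes renames the clashing value columns from each side
  merge(new_rows, old_rows, by = key, all = TRUE, suffixes = suffix)
}

diff_df <- data.frame(chng_type = c("-", "+", "-", "+"),
                      id = c(2, 2, 3, 4), score = c(20, 25, 30, 40))
w <- to_wide(diff_df, key = "id")
w
```

Rows that exist on only one side get NA in the other side's columns, which makes pure additions and removals easy to spot.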
A manually created dataset showing the hypothetical scores of two divisions of students.
Column | Description |
---|---|
Division | The division to which the student belongs |
Student | Name of the student |
Maths, Physics, Chemistry, Art | Scores of the student across different subjects |
Discipline, PE | Grades of the student across different subjects |
results_2010
A data frame with 12 rows and 8 columns
A manually created dataset showing the hypothetical scores of two divisions of students.
Column | Description |
---|---|
Division | The division to which the student belongs |
Student | Name of the student |
Maths, Physics, Chemistry, Art | Scores of the student across different subjects |
Discipline, PE | Grades of the student across different subjects |
results_2011
A data frame with 13 rows and 8 columns
Some versions of RStudio don't automatically show the HTML pane for the HTML output. This function is a workaround.
view_html(comparison_output)
Argument | Description |
---|---|
comparison_output | Output from the compareDF comparison function (compare_df) |
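One common way to view HTML outside the RStudio viewer is to write it to a temporary file and open that file in a browser. The sketch below does this with base R's tempfile(), writeLines() and utils::browseURL(); this section does not say whether view_html works this way, so treat the helper `show_html_file` as hypothetical.

```r
# Hypothetical helper: writes an HTML string to a temporary file and, in an
# interactive session, opens it in the default browser. Not view_html itself.
show_html_file <- function(html, open = interactive()) {
  path <- tempfile(fileext = ".html")
  writeLines(html, path)
  if (open) utils::browseURL(path)
  # Return the file path invisibly so it can be reused or inspected
  invisible(path)
}

path <- show_html_file("<table><tr><td>+</td><td>2</td></tr></table>")
```

Defaulting `open` to interactive() keeps the helper from trying to launch a browser in scripts or batch jobs.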