library(eurobarometer)
library(dplyr)
library(knitr)
library(stringr)

The eurobarometer package relies on the survey class system of retroharmonize. You do not have to load the entire retroharmonize package: eurobarometer imports whatever it needs and modifies it as required.

ZA6863 <- read_rds(
  system.file("examples", "ZA6863.rds", package = "eurobarometer")
)
#> Survey read:
#> id: ZA6863
#> filename: ZA6863.rds
#> doi: doi:10.4232/1.12847
ZA7576 <- read_rds(
  system.file("examples", "ZA7576.rds", package = "eurobarometer")
)
#> Survey read:
#> id: ZA7576
#> filename: ZA7576.rds
#> doi: doi:10.4232/1.13393

Metadata

The metadata analysis is a first step to help both variable name and value label harmonization.

ZA6863_metadata <- gesis_metadata_create(ZA6863)
ZA7576_metadata <- gesis_metadata_create(ZA7576)

Variables of base types numeric and character can be safely concatenated. The labelled, mainly categorical variables require special attention: their valid range and missing range must be harmonized before binding the two tables together.

ZA6863_items <- ZA6863_metadata %>%
  filter (
    class_orig %in% c("character", "numeric") |
    str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  filter ( var_name_suggested != 'not_given' ) %>%
  pull (var_name_suggested)
ZA7576_items <- ZA7576_metadata %>%
  filter (
    class_orig %in% c("character", "numeric") |
    str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  filter ( var_name_suggested != 'not_given' ) %>%
  pull (var_name_suggested)
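
To illustrate why the labelled variables singled out above need care, here is a minimal base R sketch. The vectors `a` and `b` and their codes are made up for illustration and are not taken from the surveys. Base `c()` drops every attribute except names, so the value labels are lost and codes with different meanings end up side by side.

```r
# Hypothetical toy vectors: the same question coded differently in two surveys.
a <- structure(c(1, 2, 3), labels = c(trust = 1, not_trust = 2, dk = 3))
b <- structure(c(1, 2, 9), labels = c(trust = 1, not_trust = 2, inap = 9))

# Base c() keeps only the values; the labels attribute is dropped.
ab <- c(a, b)
attributes(ab)
#> NULL
```

After concatenation nothing records that 3 meant "dk" in one survey and 9 meant "inap" in the other, which is why the valid and missing ranges have to be harmonized first.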

In this case, the var_label_suggest() function worked perfectly, so we can approve the suggestions of gesis_metadata_create().

Let’s select the variables with identical names from the two surveys:

hZA6863 <- ZA6863 %>%
  stats::setNames ( nm = ZA6863_metadata$var_name_suggested ) %>%
  select ( all_of(intersect(ZA6863_items, ZA7576_items)))
hZA7576 <- ZA7576 %>%
  stats::setNames ( nm = ZA7576_metadata$var_name_suggested ) %>%
  select ( all_of(intersect(ZA6863_items, ZA7576_items)))
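
The renaming and selection step can be sketched on toy data in base R. The data frames `s1` and `s2`, their archive-style column names, and the suggested names below are all hypothetical; they only stand in for the surveys and for the var_name_suggested column of the metadata tables.

```r
# Hypothetical toy surveys with archive-style column names.
s1 <- data.frame(qa8a_1 = c(1, 2), qa8a_2 = c(2, 1), d11 = c(34, 51))
s2 <- data.frame(qa6a_1 = c(2, 2), d11 = c(28, 45), qa6a_9 = c(1, 3))

# Hypothetical suggested names, given in column order.
s1_names <- c("trust_army", "trust_police", "age_exact")
s2_names <- c("trust_army", "age_exact", "trust_media")

# Keep only the variables that exist under the same name in both surveys.
common <- intersect(s1_names, s2_names)

h1 <- stats::setNames(s1, s1_names)[, common]
h2 <- stats::setNames(s2, s2_names)[, common]

identical(names(h1), names(h2))
#> [1] TRUE
```

Because `intersect()` returns the names in the order of its first argument, both tables end up with the same columns in the same order.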

Next, have a look at their value labelling. [It is unclear why these are not identical.]

ZA6863_trust <- ZA6863_metadata %>%
  filter ( str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  select ( labels, na_labels ) %>%
  tidyr::unnest( cols = c(labels, na_labels) ) %>%
  distinct_all()

ZA6863_trust$labels[1]
#> Tend to trust 
#>             1
ZA6863_trust$labels[2]
#> Tend not to trust 
#>                 2
ZA7576_trust <- ZA7576_metadata %>%
  filter ( str_sub(var_name_suggested, 1,5) == 'trust' ) %>%
  select ( labels, na_labels ) %>%
  tidyr::unnest( cols = c(labels, na_labels) ) %>%
  distinct_all()

ZA7576_trust$labels[1]
#> Tend to trust 
#>             1
ZA7576_trust$labels[2]
#> Tend not to trust 
#>                 2

Harmonize the value labels

retroharmonize::harmonize_values() is a prototype of the harmonization function. It should be adjusted to survey-specific and question-block-specific idiosyncrasies. Eventually this should be the work of various vocabulary tables, but the prototype already works when the harmonization regular expressions are supplied either as a list or as a data frame.

Because we would like to have the same harmonization for the whole question block, in this case we wrap the prototype in a helper with a fixed set of regular expressions. retroharmonize::harmonize_values() normalizes the labels, so you do not have to deal with differences in capitalization. To better understand the harmonization procedure, please refer to the Harmonize Value Labels vignette of the retroharmonize package.

With a better imputing system, this could be automated to a high degree, probably harmonizing all trend variables at the same time. A harmonize_eurobarometer function should eventually deal with this.

harmonize_trust <- function(x) {
  retroharmonize::harmonize_values(
    x = x,
    harmonize_label = NULL,
    # from: regex matched against the normalized label,
    # to: harmonized label, numeric_values: harmonized code
    harmonize_labels = list(
      from = c("^tend\\sto|^trust", "^tend\\snot|not\\strust", "^dk|^don", "^inap"),
      to = c("trust", "not_trust", "do_not_know", "inap"),
      numeric_values = c(1, 0, 99997, 99999)
    ),
    # codes treated as user-defined missing values
    na_values = c(do_not_know = 99997, declined = 99998, inap = 99999),
    na_range = NULL,
    id = "survey_id",
    name_orig = NULL
  )
}
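
To see what the regular expressions in the helper pick up, the matching step can be mimicked in base R. This is only a sketch: `match_label()` is a hypothetical helper, and lower-casing with `tolower()` is an assumption standing in for whatever normalization retroharmonize::harmonize_values() performs internally.

```r
# Patterns, target labels, and codes copied from harmonize_trust().
from <- c("^tend\\sto|^trust", "^tend\\snot|not\\strust", "^dk|^don", "^inap")
to <- c("trust", "not_trust", "do_not_know", "inap")
numeric_values <- c(1, 0, 99997, 99999)

# Hypothetical helper: position of the first pattern that matches
# the lower-cased label, or NA when nothing matches.
match_label <- function(label) {
  hits <- vapply(from, grepl, logical(1), x = tolower(label))
  if (any(hits)) unname(which(hits)[1]) else NA_integer_
}

# Original value labels as they appear in the two example surveys.
original <- c("Tend to trust", "Tend not to trust", "DK",
              "Inap. (CY-TCC in isocntry)")
position <- vapply(original, match_label, integer(1), USE.NAMES = FALSE)

data.frame(original, harmonized = to[position], code = numeric_values[position])
```

Each of the four original labels is matched by a different pattern, so every label receives exactly one harmonized label and code. A label that matches no pattern would show up as NA here, which is a quick way to spot wording the patterns do not yet cover.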

Choosing the first trust vector, we can see that the harmonization records all metadata for reproducibility.

harmonize_trust (hZA6863$trust_army)
#>  [1]     1     0 99997     0 99997     0     1     1     1     1 99997     1
#> [13]     1     0     0     0     0     1     0     1     1     1     0     1
#> [25]     1     0     1     1     1     1     0     1     0     0     1 99999
#> [37] 99999 99999 99999 99999     1     0     1     0     0     1 99997 99997
#> [49]     0     1
#> attr(,"labels")
#>   not_trust       trust do_not_know        inap 
#>           0           1       99997       99999 
#> attr(,"label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"na_values")
#> [1] 99997 99999
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"                
#> [3] "haven_labelled"                     
#> attr(,"id")
#> [1] "survey_id"
#> attr(,"survey_id_name")
#> [1] "x"
#> attr(,"survey_id_values")
#>     2     1     3     9 
#>     0     1 99997 99999 
#> attr(,"survey_id_label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"survey_id_labels")
#>              Tend to trust          Tend not to trust 
#>                          1                          2 
#>                         DK Inap. (CY-TCC in isocntry) 
#>                          3                          9 
#> attr(,"survey_id_na_values")
#> [1] 9
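
The recorded history can be read as a crosswalk between the original and the harmonized codes. The base R sketch below copies the values from the survey_id_values attribute printed above and assumes that its names are the original codes and its values the harmonized ones; `revert()` is a hypothetical helper, not a retroharmonize function.

```r
# Copied from the survey_id_values attribute in the output above:
# names are assumed to be original codes, values the harmonized codes.
survey_id_values <- c("2" = 0, "1" = 1, "3" = 99997, "9" = 99999)

crosswalk <- data.frame(
  original = as.numeric(names(survey_id_values)),
  harmonized = unname(survey_id_values)
)

# Hypothetical helper: map harmonized codes back to the original coding.
revert <- function(x) crosswalk$original[match(x, crosswalk$harmonized)]

revert(c(1, 0, 99997, 99999))
#> [1] 1 2 3 9
```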

The coding appears very similar, so we use the same helper function for the same question in the other survey:

harmonize_trust (hZA7576$trust_army)
#>  [1]     1     1     1     1     1     1     1     0     0     1     1 99997
#> [13]     0     1     1     1     0     0 99997     1 99997     1     0     0
#> [25]     0     1     0     1     0     0     1     0     0     0     1     0
#> [37]     0     1     0     1     0 99999 99999 99999 99999
#> attr(,"labels")
#>   not_trust       trust do_not_know        inap 
#>           0           1       99997       99999 
#> attr(,"label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"na_values")
#> [1] 99997 99999
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"                
#> [3] "haven_labelled"                     
#> attr(,"id")
#> [1] "survey_id"
#> attr(,"survey_id_name")
#> [1] "x"
#> attr(,"survey_id_values")
#>     2     1     3     9 
#>     0     1 99997 99999 
#> attr(,"survey_id_label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"survey_id_labels")
#>              Tend to trust          Tend not to trust 
#>                          1                          2 
#>                         DK Inap. (CY-TCC in isocntry) 
#>                          3                          9 
#> attr(,"survey_id_na_values")
#> [1] 9
trust_in_army <- retroharmonize::concatenate(
  x = harmonize_trust ( hZA6863$trust_army),
  y = harmonize_trust ( hZA7576$trust_army)
  )
trust_in_army
#>  [1]     1     0 99997     0 99997     0     1     1     1     1 99997     1
#> [13]     1     0     0     0     0     1     0     1     1     1     0     1
#> [25]     1     0     1     1     1     1     0     1     0     0     1 99999
#> [37] 99999 99999 99999 99999     1     0     1     0     0     1 99997 99997
#> [49]     0     1     1     1     1     1     1     1     1     0     0     1
#> [61]     1 99997     0     1     1     1     0     0 99997     1 99997     1
#> [73]     0     0     0     1     0     1     0     0     1     0     0     0
#> [85]     1     0     0     1     0     1     0 99999 99999 99999 99999
#> attr(,"id")
#> [1] "survey_id"
#> attr(,"labels")
#>   not_trust       trust do_not_know        inap 
#>           0           1       99997       99999 
#> attr(,"label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"na_values")
#> [1] 99997 99999
#> attr(,"class")
#> [1] "retroharmonize_labelled_spss_survey" "haven_labelled_spss"                
#> [3] "haven_labelled"                     
#> attr(,"survey_id_name")
#> [1] "x"
#> attr(,"survey_id_values")
#>     2     1     3     9 
#>     0     1 99997 99999 
#> attr(,"survey_id_label")
#> [1] "TRUST IN INSTITUTIONS: ARMY"
#> attr(,"survey_id_labels")
#>              Tend to trust          Tend not to trust 
#>                          1                          2 
#>                         DK Inap. (CY-TCC in isocntry) 
#>                          3                          9 
#> attr(,"survey_id_na_values")
#> [1] 9

The attributes are complex because they keep two options open: reverting to the historical coding, and choosing between a categorical and a numeric representation in R.

summary ( as_factor(trust_in_army))
#>   not_trust       trust do_not_know        inap 
#>          35          43           8           9
summary ( as_numeric(trust_in_army))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>  0.0000  0.0000  1.0000  0.5513  1.0000  1.0000      17
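
The difference between the two representations can be reproduced in base R without the survey classes. This is a sketch of the idea only, not of how as_factor() and as_numeric() are implemented; the vector `x` is rebuilt from the frequencies shown in the factor summary above.

```r
# Harmonized codes with the frequencies reported in the factor summary above.
x <- c(rep(0, 35), rep(1, 43), rep(99997, 8), rep(99999, 9))
labels <- c(not_trust = 0, trust = 1, do_not_know = 99997, inap = 99999)
na_values <- c(99997, 99999)

# Categorical view: user-defined missing values remain as levels.
x_factor <- factor(x, levels = labels, labels = names(labels))
table(x_factor)

# Numeric view: user-defined missing values become NA.
x_numeric <- ifelse(x %in% na_values, NA_real_, x)
sum(is.na(x_numeric))
#> [1] 17
round(mean(x_numeric, na.rm = TRUE), 4)
#> [1] 0.5513

# A naive mean over the raw codes is dominated by the missing-value codes.
mean(x)
```

The numeric view reproduces the mean and the 17 missing values of the summary above, while the naive mean over the raw codes is meaningless.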

Let’s repeat the same harmonization for all trust variables.

hZA7576 <- hZA7576 %>%
  mutate_at (vars (starts_with("trust")), harmonize_trust )
hZA6863 <- hZA6863 %>%
  mutate_at (vars (starts_with("trust")), harmonize_trust )

hZA6863 %>%
  select ( all_of(c("trust_army", "trust_european_union")))
#> # A tibble: 50 x 2
#>                  trust_army     trust_european_union
#>                <retroh_dbl>             <retroh_dbl>
#>  1     1 [trust]                1 [trust]           
#>  2     0 [not_trust]            1 [trust]           
#>  3 99997 (NA) [do_not_know] 99997 (NA) [do_not_know]
#>  4     0 [not_trust]            1 [trust]           
#>  5 99997 (NA) [do_not_know]     1 [trust]           
#>  6     0 [not_trust]            0 [not_trust]       
#>  7     1 [trust]            99997 (NA) [do_not_know]
#>  8     1 [trust]                1 [trust]           
#>  9     1 [trust]                0 [not_trust]       
#> 10     1 [trust]                1 [trust]           
#> # ... with 40 more rows

Given that the other selected variables have identical (harmonized) names and are of base type numeric or character, after harmonizing the trust labels and na_values we can bind the two panels with vctrs::vec_rbind() or dplyr::bind_rows(). Unfortunately, the generic c() method cannot be implemented to work with this type.

panel <- vctrs::vec_rbind (
  hZA6863, hZA6863
)
#> Warning in x_attr_names == paste0(x_id, "_name"): longer object length is not a
#> multiple of shorter object length


The panel is created and is ready for export to other statistical software or for further analysis in R. While some basic arithmetic methods are implemented for the labelled_spss_survey class of the retroharmonize package, using the full range of R statistical packages requires the analyst to choose a base R type that is compatible with them. Since the trust variables are categorical, they can be recast with the as_factor() or as_numeric() methods. Again, the base R as.factor() or as.numeric() will give a legible but incorrect representation.

The factor representation presents the user-defined missing values as categories:

panel %>%
  mutate_at (vars (starts_with("trust")), as_factor ) %>%
  summary()
#>      doi            gesis_archive_version_and_date     uniqid        
#>  Length:100         Length:100                     Min.   :1.10e+08  
#>  Class :character   Class :character               1st Qu.:3.20e+08  
#>  Mode  :character   Mode  :character               Median :6.30e+08  
#>                                                    Mean   :6.12e+08  
#>                                                    3rd Qu.:1.00e+09  
#>                                                    Max.   :1.00e+09  
#>  country_code_iso_3166       trust_army  trust_european_union
#>  Length:100            not_trust  :34   not_trust  :38       
#>  Class :character      trust      :46   trust      :40       
#>  Mode  :character      do_not_know:10   do_not_know:12       
#>                        inap       :10   inap       :10       
#>                                                              
#>                                                              
#>  trust_european_union_tcc  trust_justice_system trust_national_government
#>  not_trust: 4             not_trust  :42        not_trust  :44           
#>  trust    : 6             trust      :46        trust      :34           
#>  inap     :90             do_not_know: 2        do_not_know:12           
#>                           inap       :10        inap       :10           
#>                                                                          
#>                                                                          
#>  trust_national_parliament      trust_police trust_political_parties
#>  not_trust  :40            not_trust  :30    not_trust  :56         
#>  trust      :36            trust      :56    trust      :24         
#>  do_not_know:14            do_not_know: 4    do_not_know:10         
#>  inap       :10            inap       :10    inap       :10         
#>                                                                     
#>                                                                     
#>  trust_political_parties_tcc trust_public_administration
#>  not_trust:10                not_trust  :38             
#>  inap     :90                trust      :44             
#>                              do_not_know: 8             
#>                              inap       :10             
#>                                                         
#>                                                         
#>  trust_regional_local_authorities  trust_united_nations
#>  not_trust  :38                   not_trust  :38       
#>  trust      :40                   trust      :38       
#>  do_not_know:12                   do_not_know:14       
#>  inap       :10                   inap       :10       
#>                                                        
#>                                                        
#>  trust_united_nations_tcc weight_result_from_target_redressment
#>  not_trust: 4             Min.   :0.4376                       
#>  trust    : 6             1st Qu.:0.7858                       
#>  inap     :90             Median :1.0876                       
#>                           Mean   :1.1537                       
#>                           3rd Qu.:1.4168                       
#>                           Max.   :2.5878                       
#>  weight_germany   weight_extrapolated_population_aged_gt_15
#>  Min.   :0.0000   Min.   :  203.8                          
#>  1st Qu.:0.0000   1st Qu.: 1506.2                          
#>  Median :0.0000   Median : 5816.5                          
#>  Mean   :0.3344   Mean   :18774.5                          
#>  3rd Qu.:0.5392   3rd Qu.:24301.3                          
#>  Max.   :2.1909   Max.   :95773.7

And let’s compare this with the numeric representation, where the user-defined missing values are treated as missing:

panel %>%
  mutate_at (vars (starts_with("trust")), as_numeric ) %>%
  summary()
#>      doi            gesis_archive_version_and_date     uniqid        
#>  Length:100         Length:100                     Min.   :1.10e+08  
#>  Class :character   Class :character               1st Qu.:3.20e+08  
#>  Mode  :character   Mode  :character               Median :6.30e+08  
#>                                                    Mean   :6.12e+08  
#>                                                    3rd Qu.:1.00e+09  
#>                                                    Max.   :1.00e+09  
#>                                                                      
#>  country_code_iso_3166   trust_army    trust_european_union
#>  Length:100            Min.   :0.000   Min.   :0.0000      
#>  Class :character      1st Qu.:0.000   1st Qu.:0.0000      
#>  Mode  :character      Median :1.000   Median :1.0000      
#>                        Mean   :0.575   Mean   :0.5128      
#>                        3rd Qu.:1.000   3rd Qu.:1.0000      
#>                        Max.   :1.000   Max.   :1.0000      
#>                        NA's   :20      NA's   :22          
#>  trust_european_union_tcc trust_justice_system trust_national_government
#>  Min.   :0.0              Min.   :0.0000       Min.   :0.0000           
#>  1st Qu.:0.0              1st Qu.:0.0000       1st Qu.:0.0000           
#>  Median :1.0              Median :1.0000       Median :0.0000           
#>  Mean   :0.6              Mean   :0.5227       Mean   :0.4359           
#>  3rd Qu.:1.0              3rd Qu.:1.0000       3rd Qu.:1.0000           
#>  Max.   :1.0              Max.   :1.0000       Max.   :1.0000           
#>  NA's   :90               NA's   :12           NA's   :22               
#>  trust_national_parliament  trust_police    trust_political_parties
#>  Min.   :0.0000            Min.   :0.0000   Min.   :0.0            
#>  1st Qu.:0.0000            1st Qu.:0.0000   1st Qu.:0.0            
#>  Median :0.0000            Median :1.0000   Median :0.0            
#>  Mean   :0.4737            Mean   :0.6512   Mean   :0.3            
#>  3rd Qu.:1.0000            3rd Qu.:1.0000   3rd Qu.:1.0            
#>  Max.   :1.0000            Max.   :1.0000   Max.   :1.0            
#>  NA's   :24                NA's   :14       NA's   :20             
#>  trust_political_parties_tcc trust_public_administration
#>  Min.   :0                   Min.   :0.0000             
#>  1st Qu.:0                   1st Qu.:0.0000             
#>  Median :0                   Median :1.0000             
#>  Mean   :0                   Mean   :0.5366             
#>  3rd Qu.:0                   3rd Qu.:1.0000             
#>  Max.   :0                   Max.   :1.0000             
#>  NA's   :90                  NA's   :18                 
#>  trust_regional_local_authorities trust_united_nations trust_united_nations_tcc
#>  Min.   :0.0000                   Min.   :0.0          Min.   :0.0             
#>  1st Qu.:0.0000                   1st Qu.:0.0          1st Qu.:0.0             
#>  Median :1.0000                   Median :0.5          Median :1.0             
#>  Mean   :0.5128                   Mean   :0.5          Mean   :0.6             
#>  3rd Qu.:1.0000                   3rd Qu.:1.0          3rd Qu.:1.0             
#>  Max.   :1.0000                   Max.   :1.0          Max.   :1.0             
#>  NA's   :22                       NA's   :24           NA's   :90              
#>  weight_result_from_target_redressment weight_germany  
#>  Min.   :0.4376                        Min.   :0.0000  
#>  1st Qu.:0.7858                        1st Qu.:0.0000  
#>  Median :1.0876                        Median :0.0000  
#>  Mean   :1.1537                        Mean   :0.3344  
#>  3rd Qu.:1.4168                        3rd Qu.:0.5392  
#>  Max.   :2.5878                        Max.   :2.1909  
#>                                                        
#>  weight_extrapolated_population_aged_gt_15
#>  Min.   :  203.8                          
#>  1st Qu.: 1506.2                          
#>  Median : 5816.5                          
#>  Mean   :18774.5                          
#>  3rd Qu.:24301.3                          
#>  Max.   :95773.7                          
#> 

Documentation

trust_in_army_doc <- retroharmonize::document_survey_item(
  trust_in_army)
trust_in_army_doc$code_table %>% kable ()
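| values| survey_id_values|labels      |survey_id_labels           |missing |
|------:|----------------:|:-----------|:--------------------------|:-------|
|      0|                1|not_trust   |Tend to trust              |FALSE   |
|      1|                2|trust       |Tend not to trust          |FALSE   |
|  99997|                3|do_not_know |DK                         |TRUE    |
|  99999|                9|inap        |Inap. (CY-TCC in isocntry) |TRUE    |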
trust_in_army_doc$history_var_label
#>                         label               survey_id_label 
#> "TRUST IN INSTITUTIONS: ARMY" "TRUST IN INSTITUTIONS: ARMY"