This document is a reference to the names of the URL Breakdowns, Attributes, and User Reports tables. It describes the most up-to-date version of the tables that make up the URL Shares dataset. When we perform the twice-yearly data refreshes (January and July), we add new months of data to these tables rather than creating new tables. We update the information below to reflect the latest data refresh.
The full codebook for the URL Shares dataset is hosted on the Social Science One website. It includes information about the scope, structure, fields, and privacy-preserving characteristics.
We do not update the codebook when we perform the twice-yearly data refreshes. Instead, this Overview of URL Shares data tables page will always contain the most current dates.
Object function | Table | Environment | Partitions available |
---|---|---|---|
URL Breakdown | erc_condor_url_breakdowns_dp_clean_partitioned_v2 | Private | From 2017-01 to 2022-06 |
URL Attributes | erc_condor_url_attributes_dp_final_v3 | Private | 2022-07 (Will have complete data available in the latest partition of year_month) |
User Reports | erc_condor_user_reports_dp_final | Private | From 2017-01 to 2022-06 |
URL Attributes Public | erc_condor_url_attributes_dp_final_public_v3 | Public | Will have complete data available in the latest partition (2022-07-12) |
Fact Checked URL Attributes Public | erc_condor_url_attributes_public_fact_checked | Public | Will have complete data available in the latest partition (2022-07) |
The erc_condor_url_breakdowns_dp_clean_partitioned_v2 table contains data from January 2017 to June 2022. When new months of data are added, they are loaded into the same erc_condor_url_breakdowns_dp_clean_partitioned_v2 table. New data go into new partitions of the year_month column, so no previous data are overwritten.
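As a rough illustration of how the monthly partitions line up, the sketch below lists every year_month label in a range and builds a query string restricted to that range. The helper names (`year_month_range`, `breakdowns_query`) are hypothetical, and the sketch assumes that year_month is stored as a YYYY-MM string and that the query environment accepts standard SQL `BETWEEN`; neither is confirmed by this page.

```python
def year_month_range(start: str, end: str) -> list[str]:
    """Return every YYYY-MM label from start to end, inclusive."""
    start_year, start_month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    labels = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return labels


def breakdowns_query(start: str, end: str) -> str:
    """Build a query string limited to the given year_month partitions.

    Assumption: year_month is a YYYY-MM string, so lexicographic order
    matches chronological order and BETWEEN selects the intended months.
    """
    table = "erc_condor_url_breakdowns_dp_clean_partitioned_v2"
    return (
        f"SELECT * FROM {table} "
        f"WHERE year_month BETWEEN '{start}' AND '{end}'"
    )


if __name__ == "__main__":
    months = year_month_range("2017-01", "2022-06")
    print(len(months), months[0], months[-1])
    print(breakdowns_query("2017-01", "2022-06"))
```

Filtering on the partition column keeps a query to the months of interest; when a later refresh adds months, only the end label needs to change.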
The erc_condor_url_attributes_dp_final_v3 table contains all the data in the latest partition of the year_month column (the date format is YYYY-MM; for example, January 2017 is written as 2017-01). We added the year_month column (a partition column) to erc_condor_url_attributes_dp_final_v3 so that we can update data in future releases without having to create a new table with a different name.
To explain how this works in practical terms: all the data are currently in the year_month partition 2022-07. If the next update to the dataset were to happen in August 2022, the data would appear in the 2022-08 partition of year_month. Complete data are available in the latest partition, so there is no need to query older partitions; filter to the appropriate year_month partition in the WHERE clause of any query on this table.
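The latest-partition rule described above can be sketched as follows. This is a minimal example under stated assumptions: the helper names (`latest_partition`, `attributes_query`) are hypothetical, the list of available partitions is supplied by the caller, and year_month is assumed to be a YYYY-MM string, so the largest label is also the most recent one.

```python
def latest_partition(partitions: list[str]) -> str:
    """Return the most recent YYYY-MM label from a list of partitions."""
    if not partitions:
        raise ValueError("no partitions supplied")
    # Zero-padded YYYY-MM strings sort chronologically, so max() is enough.
    return max(partitions)


def attributes_query(
    partitions: list[str],
    table: str = "erc_condor_url_attributes_dp_final_v3",
) -> str:
    """Build a query string that reads only the latest year_month partition."""
    latest = latest_partition(partitions)
    return f"SELECT * FROM {table} WHERE year_month = '{latest}'"


if __name__ == "__main__":
    # Hypothetical state after a future refresh adds a 2022-08 partition.
    print(attributes_query(["2022-07", "2022-08"]))
```

Because each new partition holds the complete data, a query built this way does not need to combine results from older partitions.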
The erc_condor_user_reports_dp_final table contains data from January 2017 to June 2022. When new months of data are added, they are loaded into the same erc_condor_user_reports_dp_final table. New data go into new partitions of the year_month column, so no previous data are overwritten.
The erc_condor_url_attributes_dp_final_public_v3 table holds the columns of the URL attributes table that are approved for inclusion in the public environment. Complete data are available in the latest partition, so there is no need to query older partitions.
The erc_condor_url_attributes_public_fact_checked table holds the columns of the URL attributes table that are approved for inclusion in the public environment, restricted to fact-checked URLs. Complete data are available in the latest year_month partition, so there is no need to query older partitions. Here, year_month refers to the year and month in which the refresh happened.