There is one accessible database in the private environment:
The accessible tables within that database are:
Complete data is available in the latest partition, so there is no need to query older partitions. We recommend using the latest version of each data table as that will give you access to more columns or data.
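
As a minimal sketch of the "latest version" recommendation, the snippet below groups table names by base name and keeps the one with the highest `_vN` suffix. The function name `latest_versions` is hypothetical, and the sketch assumes that a name without a suffix counts as version 1. The table names are the ones documented on this page.

```python
import re

# Table names as documented on this page.
TABLES = [
    "erc_condor_url_attributes_dp_final",
    "erc_condor_url_attributes_dp_final_v2",
    "erc_condor_url_attributes_dp_final_v3",
    "erc_condor_url_breakdowns_dp_clean_partitioned",
    "erc_condor_url_breakdowns_dp_clean_partitioned_v2",
    "erc_condor_user_reports_dp_final",
]

# Matches names that end in "_v<digits>" and captures the base name.
_VERSION_SUFFIX = re.compile(r"^(?P<base>.+?)_v(?P<num>\d+)$")


def latest_versions(table_names):
    """Return {base_name: newest_table_name}.

    Assumption: a table name without a "_vN" suffix is version 1.
    """
    best = {}
    for name in table_names:
        match = _VERSION_SUFFIX.match(name)
        if match:
            base, version = match["base"], int(match["num"])
        else:
            base, version = name, 1
        if base not in best or version > best[base][0]:
            best[base] = (version, name)
    return {base: name for base, (_, name) in best.items()}


latest = latest_versions(TABLES)
for base, name in latest.items():
    print(f"{base} -> {name}")
```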
See the full codebook for the URL Shares dataset which is hosted on the Social Science One website.
Aggregate statistics exist in the tables marked “DP”. They have noise added to certain variables for differential privacy.
An artificial example data set with a few observations is available here; you might find it helpful in understanding the fields described below.
Within erc_condor_url_attributes_dp_final
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
clean_url | string | The web page URL after processing. This is the full URL, not just the domain. URLs that are no longer reachable persist in the data. The URLs have been processed in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private or sensitive data. |
full_domain | string | The full domain name from the URL. |
parent_domain | string | The parent domain name from the URL. |
first_post_time | timestamp | The date and time when the URL was first posted by a user on Facebook. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2017-12-02 18:10:00. |
first_post_time_unix | bigint | The first_post_time field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
share_title | string | The title provided by the author of the URL's content, pulled from the og:title field in the original HTML if possible. |
share_main_blurb | string | The description provided by the author of the URL's content, pulled from the og:description field in the original HTML if possible. |
tpfc_rating | string | If the URL was sent to third-party fact-checkers (TPFC), indicates whether and how they rated it. See the full codebook for the URL Shares dataset which is hosted on the Social Science One website for additional information on third-party fact checking. |
tpfc_first_fact_check | timestamp | The date and time the article was first fact-checked. NULL indicates the article has not been fact-checked. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2017-12-02 18:10:00. |
tpfc_first_fact_check_unix | bigint | The tpfc_first_fact_check field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
public_shares_top_country | string | URL shares are tallied by country and the country with the most (differentially private) shares is provided as an ISO 3166-1 alpha-2 code. This field is not indicative of all locations where this article was posted. Rather, it is the top country among users who shared it. |
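
The relationship between the timestamp columns and their `_unix` counterparts can be sketched as follows. The helper names are hypothetical, and the sketch assumes the timestamps are in UTC, which this page does not state. Under that assumption, the documented UNIX example 1449079800 corresponds to 2015-12-02 18:10:00.

```python
from datetime import datetime, timezone

FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS, as documented


def truncate_to_ten_minutes(ts: datetime) -> datetime:
    """Floor a datetime to the start of its ten-minute increment."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def to_unix(text: str) -> int:
    """Convert a documented-format timestamp to seconds since 1970-01-01 00:00:00.

    Assumption: the timestamp is expressed in UTC.
    """
    parsed = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def from_unix(seconds: int) -> str:
    """Inverse of to_unix, under the same UTC assumption."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FORMAT)


# A hypothetical raw post time, truncated the way the table describes.
raw = datetime(2017, 12, 2, 18, 17, 42)
truncated = truncate_to_ten_minutes(raw).strftime(FORMAT)
print(truncated)
print(from_unix(1449079800))
```

Because values are truncated to ten-minute increments, every UNIX value produced this way is a multiple of 600 seconds.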
Within erc_condor_url_attributes_dp_final_v2
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
clean_url | string | The web page URL after processing. This is the full URL, not just the domain. URLs that are no longer reachable persist in the data. The URLs have been processed in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private or sensitive data. |
parent_domain | string | The parent domain name from the URL. |
full_domain | string | The full domain name from the URL. |
first_post_time | timestamp | The date and time when the URL was first posted by a user on Facebook. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2017-12-02 18:10:00. |
first_post_time_unix | bigint | The first_post_time field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
share_title | string | The title provided by the author of the URL's content, pulled from the og:title field in the original HTML if possible. |
share_main_blurb | string | The description provided by the author of the URL's content, pulled from the og:description field in the original HTML if possible. |
tpfc_rating | string | If the URL was sent to third-party fact-checkers (TPFC), indicates whether and how they rated it. See the full codebook for the URL Shares dataset which is hosted on the Social Science One website for additional information on third-party fact checking. |
tpfc_first_fact_check | timestamp | The date and time the article was first fact-checked. NULL indicates the article has not been fact-checked. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2017-12-02 18:10:00. |
tpfc_first_fact_check_unix | bigint | The tpfc_first_fact_check field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
public_shares_top_country | string | URL shares are tallied by country and the country with the most (differentially private) shares is provided as an ISO 3166-1 alpha-2 code. This field is not indicative of all locations where this article was posted. Rather, it is the top country among users who shared it. |
Within erc_condor_url_attributes_dp_final_v3
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
clean_url | string | The web page URL after processing. This is the full URL, not just the domain. URLs that are no longer reachable persist in the data. The URLs have been processed in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private or sensitive data. |
parent_domain | string | The parent domain name from the URL. |
full_domain | string | The full domain name from the URL. |
first_post_time | timestamp | The date and time when the URL was first posted by a user on Facebook. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2022-12-02 18:10:00. |
first_post_time_unix | bigint | The first_post_time field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
share_title | string | The title provided by the author of the URL's content, pulled from the og:title field in the original HTML if possible. |
share_main_blurb | string | The description provided by the author of the URL's content, pulled from the og:description field in the original HTML if possible. |
tpfc_rating | string | If the URL was sent to third-party fact-checkers (TPFC), indicates whether and how they rated it. See the full codebook for the URL Shares dataset which is hosted on the Social Science One website for additional information on third-party fact checking. |
tpfc_first_fact_check | timestamp | The date and time the article was first fact-checked. NULL indicates the article has not been fact-checked. Date-times are truncated to ten-minute increments. The exact format is YYYY-MM-DD HH:MM:SS. For example: 2017-12-02 18:10:00. |
tpfc_first_fact_check_unix | bigint | The tpfc_first_fact_check field translated into UNIX time, which is the number of seconds since 1970-01-01 00:00:00. For example: 1449079800. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
public_shares_top_country | string | URL shares are tallied by country and the country with the most (differentially private) shares is provided as an ISO 3166-1 alpha-2 code. This field is not indicative of all locations where this article was posted. Rather, it is the top country among users who shared it. |
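
One possible use of the paired UNIX columns is measuring how long after its first post a URL was first fact-checked. The rows below are hypothetical illustrations, not real data, and `fact_check_delay_hours` is a name introduced for this sketch. NULL is represented as `None`.

```python
from typing import Optional

# Hypothetical rows; only the column names come from the table above.
rows = [
    {"url_rid": "example-1",
     "first_post_time_unix": 1449079800,
     "tpfc_first_fact_check_unix": 1449166200},
    {"url_rid": "example-2",
     "first_post_time_unix": 1449079800,
     "tpfc_first_fact_check_unix": None},  # NULL: not fact-checked
]


def fact_check_delay_hours(row: dict) -> Optional[float]:
    """Hours from first post to first fact check, or None if never fact-checked."""
    checked = row["tpfc_first_fact_check_unix"]
    if checked is None:
        return None
    return (checked - row["first_post_time_unix"]) / 3600


delays = {row["url_rid"]: fact_check_delay_hours(row) for row in rows}
print(delays)
```

Since both timestamps are truncated to ten-minute increments, a delay computed this way can differ from the true delay by up to ten minutes in either direction.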
Within erc_condor_url_breakdowns_dp_clean_partitioned
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
age_bracket | string | Age data from users' profiles. Brackets include 18-24, 25-34, 35-44, 45-54, 55-64, 65+, and NULL. |
gender | string | Gender data from users' profiles. Values include male, female, and other. |
political_page_affinity | integer, ordered | A measurement of users' connections to Pages with similar audiences as Pages representing politicians of known political affiliation/ideology, based on Barberá et al. [2015]. See the full codebook for the URL Shares dataset which is hosted on the Social Science One website for additional information about our political page affinity model. |
views | bigint | Number of users who viewed a post containing the URL. |
clicks | bigint | Number of users who clicked on the URL. |
shares | bigint | Number of users who shared the URL in a post or reshared such a post. |
likes | bigint | Number of users who "liked" posts containing the URL. |
loves | bigint | Number of users who "loved" posts containing the URL. |
hahas | bigint | Number of users who reacted with "haha" to posts containing the URL. |
wows | bigint | Number of users who reacted with "wow" to posts containing the URL. |
sorrys | bigint | Number of users who reacted with "sad" to posts containing the URL. Note that the official name for this reaction is "sad", but the column name in this dataset is "sorrys". |
angers | bigint | Number of users who reacted with "angry" to posts containing the URL. |
comments | bigint | Number of users who commented on posts containing the URL. |
total_share_without_clicks | bigint | Number of users who shared a post containing the URL but did not actually click on the link themselves. Some users share articles without first clicking through to the actual content. This number might help identify articles that users are sharing without reading, or URLs used in organized campaigns to spread content. |
c (country) | string | Country in which the actions recorded in this table occurred. This variable is stored in a column called **c** in the dataset. Data is partitioned on this variable and includes data for countries needed to conduct analysis for research proposals already approved through Social Science One. See the latest release of the full codebook for the URL Shares dataset which is hosted on the Social Science One website for a list of included countries. |
year_month | string | Year and month. Data is partitioned on this variable. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
public_shares_top_country | string | URL shares are tallied by country and the country with the most (differentially private) shares is provided as an ISO 3166-1 alpha-2 code. This field is not indicative of all locations where this article was posted. Rather, it is the top country among users who shared it. |
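
Because this table is partitioned on `c` and `year_month`, filtering on both columns is a natural way to limit how much data a query reads. The sketch below only builds a query string. The helper `breakdowns_query`, the generic SQL syntax, the two-letter format for `c`, and the `YYYY-MM` format for `year_month` are all assumptions, since this page does not specify the query engine or the value formats.

```python
import re


def breakdowns_query(table: str, country: str, year_month: str) -> str:
    """Build a SELECT restricted to one country and one month partition.

    Assumptions: `c` holds a two-letter country code and `year_month`
    looks like YYYY-MM; neither format is confirmed by the documentation.
    Inputs are validated so that no unexpected text reaches the query.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[A-Z]{2}", country):
        raise ValueError(f"unexpected country code: {country!r}")
    if not re.fullmatch(r"\d{4}-\d{2}", year_month):
        raise ValueError(f"unexpected year_month: {year_month!r}")
    return (
        "SELECT url_rid, age_bracket, gender, views, clicks, shares\n"
        f"FROM {table}\n"
        f"WHERE c = '{country}' AND year_month = '{year_month}'"
    )


query = breakdowns_query(
    "erc_condor_url_breakdowns_dp_clean_partitioned", "US", "2019-07"
)
print(query)
```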
Within erc_condor_url_breakdowns_dp_clean_partitioned_v2
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
age_bracket | string | Age data from users' profiles. Brackets include 18-24, 25-34, 35-44, 45-54, 55-64, 65+, and NULL. |
gender | string | Gender data from users' profiles. Values include male, female, and other. |
political_page_affinity | integer, ordered | A measurement of users' connections to Pages with similar audiences as Pages representing politicians of known political affiliation/ideology, based on Barberá et al. [2015]. See the full codebook for the URL Shares dataset which is hosted on the Social Science One website for additional information about our political page affinity model. |
views | bigint | Number of users who viewed a post containing the URL. |
clicks | bigint | Number of users who clicked on the URL. |
shares | bigint | Number of users who shared the URL in a post or reshared such a post. |
likes | bigint | Number of users who "liked" posts containing the URL. |
loves | bigint | Number of users who "loved" posts containing the URL. |
hahas | bigint | Number of users who reacted with "haha" to posts containing the URL. |
wows | bigint | Number of users who reacted with "wow" to posts containing the URL. |
sorrys | bigint | Number of users who reacted with "sad" to posts containing the URL. Note that the official name for this reaction is "sad", but the column name in this dataset is "sorrys". |
angers | bigint | Number of users who reacted with "angry" to posts containing the URL. |
comments | bigint | Number of users who commented on posts containing the URL. |
total_share_without_clicks | bigint | Number of users who shared a post containing the URL but did not actually click on the link themselves. Some users share articles without first clicking through to the actual content. This number might help identify articles that users are sharing without reading, or URLs used in organized campaigns to spread content. |
c (country) | string | Country in which the actions recorded in this table occurred. This variable is stored in a column called **c** in the dataset. Data is partitioned on this variable and includes data for countries needed to conduct analysis for research proposals already approved through Social Science One. See the latest release of the full codebook for the URL Shares dataset which is hosted on the Social Science One website for a list of included countries. |
year_month | string | Year and month. Data is partitioned on this variable. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
public_shares_top_country | string | URL shares are tallied by country and the country with the most (differentially private) shares is provided as an ISO 3166-1 alpha-2 code. This field is not indicative of all locations where this article was posted. Rather, it is the top country among users who shared it. |
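
The columns above suggest that each row covers one combination of URL, demographic bucket, country, and month, so URL-level totals require summing across rows. A minimal sketch with hypothetical rows follows; `totals_by_url` and `share_without_click_rate` are names introduced here. Because the counts carry differential-privacy noise, such sums are estimates rather than exact tallies.

```python
from collections import defaultdict

# Hypothetical rows; values are illustrative, not real data.
rows = [
    {"url_rid": "example-1", "age_bracket": "18-24", "gender": "female",
     "shares": 40, "total_share_without_clicks": 10},
    {"url_rid": "example-1", "age_bracket": "25-34", "gender": "male",
     "shares": 60, "total_share_without_clicks": 30},
    {"url_rid": "example-2", "age_bracket": None, "gender": "other",
     "shares": 0, "total_share_without_clicks": 0},
]

METRICS = ("shares", "total_share_without_clicks")


def totals_by_url(rows):
    """Sum each metric over all breakdown rows that share a url_rid."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for row in rows:
        for metric in METRICS:
            totals[row["url_rid"]][metric] += row[metric]
    return dict(totals)


def share_without_click_rate(total):
    """Fraction of shares made without a click.

    Returns None when the share total is zero or negative, which
    noise-added counts might produce (an assumption of this sketch).
    """
    if total["shares"] <= 0:
        return None
    return total["total_share_without_clicks"] / total["shares"]


totals = totals_by_url(rows)
rates = {url: share_without_click_rate(t) for url, t in totals.items()}
print(rates)
```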
Within erc_condor_user_reports_dp_final
the columns available are:
Column name | Data type | Description |
---|---|---|
url_rid | string | A unique URL ID created specifically for this data set. |
year_month | string | Year and month. Data is partitioned on this variable. |
spam_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as spam over the period from January 1, 2017 to July 31, 2019. |
false_news_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as false news over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
hate_speech_usr_feedback | bigint | The total number of unique users who reported posts containing the URL as hate speech over the period from January 1, 2017 to July 31, 2019. The User Reports Table contains monthly aggregations of the data in this column for months subsequent to July 2019. |
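
Since this table carries both `url_rid` and `year_month`, a per-URL monthly series can be assembled by grouping rows on `url_rid` and ordering them by month. The rows below are hypothetical, `monthly_series` is a name introduced for this sketch, and the `YYYY-MM` format for `year_month` is an assumption (chosen because it sorts chronologically as a string).

```python
from collections import defaultdict

# Hypothetical rows; values are illustrative, not real data.
rows = [
    {"url_rid": "example-1", "year_month": "2019-09", "false_news_usr_feedback": 5},
    {"url_rid": "example-1", "year_month": "2019-08", "false_news_usr_feedback": 12},
    {"url_rid": "example-2", "year_month": "2019-08", "false_news_usr_feedback": 3},
]


def monthly_series(rows, metric):
    """Return {url_rid: [(year_month, value), ...]} sorted by month."""
    series = defaultdict(list)
    for row in rows:
        series[row["url_rid"]].append((row["year_month"], row[metric]))
    return {url: sorted(points) for url, points in series.items()}


series = monthly_series(rows, "false_news_usr_feedback")
# Sum of monthly values per URL.
summed = {url: sum(value for _, value in points) for url, points in series.items()}
print(series)
print(summed)
```

A sum across months is not necessarily a count of unique users, since the same user could plausibly file reports in more than one month.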