Overview

The URL Shares dataset is one of the most comprehensive collections of URLs shared on social media to date. The dataset contains privacy-protected aggregates showing public shares, user-flagged false news, hate speech, reactions, spam, and the ratio of shares without clicks. We also provide content summaries and third-party fact-checking ratings.

The full updated codebook, which contains details about both the data and our privacy implementation, is available here.

Differences between the two environments

The URL Shares dataset allows you to access data tables that can help you study URL interactions and distribution. There are two environments in which you can access the data: the public environment and the private environment.

The public environment uses the fbri_prod_public database, which gives you access to three tables. The private environment uses the fbri_prod_private database, which gives you access to six tables.

Although the public environment gives you access to a smaller subset of the data, that data is downloadable. The private environment offers more tables, but its data is not downloadable.
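If a notebook needs to decide, for example, whether an export step is allowed, the differences above can be captured in a small lookup. This is a minimal sketch, not part of any provided library: the ENVIRONMENTS dictionary and the can_download helper are hypothetical names, and only the database names, table counts, and download rules come from this page.

```python
# Hypothetical summary of the two environments, based on the description above.
# ENVIRONMENTS and can_download are illustrative names, not a provided API.
ENVIRONMENTS = {
    "public": {"database": "fbri_prod_public", "table_count": 3, "downloadable": True},
    "private": {"database": "fbri_prod_private", "table_count": 6, "downloadable": False},
}


def can_download(environment: str) -> bool:
    """Return True if data queried in this environment may be downloaded."""
    try:
        return ENVIRONMENTS[environment]["downloadable"]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None


print(can_download("public"))   # True
print(can_download("private"))  # False
```

Even where downloading is allowed, downloaded data remains subject to the Data sharing for reproducibility and publication policy guide.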

Accessing the environments

Once you have been granted access, the Get started section explains how to get set up to use the dataset. The steps cover setting up the VPN, connecting to the environment, and running a simple query.
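A simple query refers to a table through its environment's database. The sketch below only builds the SQL text; how you submit it depends on the tooling described in Get started and is not shown here. It assumes a SQL dialect that supports LIMIT. The helper build_preview_query is hypothetical, and example_table is a placeholder rather than a real table in the dataset.

```python
# Hypothetical helper that builds a small preview query; it does not execute anything.
DATABASES = {"public": "fbri_prod_public", "private": "fbri_prod_private"}


def build_preview_query(environment: str, table: str, limit: int = 10) -> str:
    """Return SQL that selects the first `limit` rows of `table` in the given environment."""
    if environment not in DATABASES:
        raise ValueError(f"unknown environment: {environment!r}")
    if not table.isidentifier():
        # Reject anything that is not a plain table name, to avoid malformed SQL.
        raise ValueError(f"unexpected table name: {table!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {DATABASES[environment]}.{table} LIMIT {limit}"


# "example_table" is a placeholder; substitute a table name from the codebook.
print(build_preview_query("public", "example_table"))
# SELECT * FROM fbri_prod_public.example_table LIMIT 10
```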

Getting assistance

For support questions, we use a tool called Direct Support to quickly triage your questions, routing each one to the most appropriate expert. See Get help for information and instructions. You might also find an answer to your question in the URL Shares FAQ.

Using the public environment

The Public environment guides provide information on effective dataset usage in the public environment. Specifically, there are guides covering basic functionality, data visualization, additional processing using aggregate functions, avoiding query limits, and topic modeling.

For information on using Statistically Valid Inference (svinfer) for statistical modeling, see the private environment guides. svinfer is typically used in the private environment, where a larger volume of data is available.

Be sure to use sample code in the environment (public or private) for which it was intended.

Sample code could fail when run in an incompatible environment.
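One way to catch a mismatch early is to check a query's text before running it. The sketch below is a hypothetical guard, not a provided tool: check_environment simply flags a query that names the other environment's database. Because it only looks for database names, it will not catch every incompatibility.

```python
import re

# Database names for each environment, as described on this page.
DATABASES = {"public": "fbri_prod_public", "private": "fbri_prod_private"}


def check_environment(sql: str, environment: str) -> None:
    """Raise ValueError if `sql` names a database that belongs to a different environment."""
    if environment not in DATABASES:
        raise ValueError(f"unknown environment: {environment!r}")
    for other_env, database in DATABASES.items():
        if other_env == environment:
            continue
        # \b prevents matches inside longer identifiers.
        if re.search(rf"\b{re.escape(database)}\b", sql, flags=re.IGNORECASE):
            raise ValueError(
                f"query references {database}, which belongs to the {other_env} environment"
            )


# Passes silently: the query matches the current environment.
check_environment("SELECT * FROM fbri_prod_public.example_table LIMIT 10", "public")
```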

Downloading data from the public environment

To avoid any issues regarding usage of downloaded data, be sure to follow the Data sharing for reproducibility and publication policy guide. If you have any questions, reach out to us by creating a Direct Support ticket.

Using the private environment

The private environment guides provide information on effective dataset usage in the private environment. Specifically, there are guides covering basic functionality, data visualization, additional processing using aggregate functions, avoiding query limits, and topic modeling. Because you have access to more tables in the private environment, we also include a tutorial on statistical modeling using svinfer. Nothing prevents you from using svinfer in the public environment as well.

Be sure to use sample code in the environment (public or private) for which it was intended.

Sample code could fail when run in an incompatible environment.

Policies

See the Data sharing for reproducibility and publication policy guide for important policy guidelines.

Important

Access to and use of the data covered by these data sharing guidelines are subject to the Research Data Agreement. These guidelines are provided for information purposes only and may be subject to change. If you have any questions about these guidelines, please reach out to us using Direct Support.

For researchers interested in accessing URL Shares

If you are a researcher who has not yet been onboarded to the URL Shares dataset but would like to start the process, click here for information and guidance.