Data visualization

You can create statistical data visualizations with the seaborn library.

The following examples assume you have already created a pandas DataFrame named df by running code like this:

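from fbri.private.sql.query import execute
import pandas as pd

# Source database and table to query
database = "fbri_prod_private"
table = "erc_condor_url_attributes_dp_final_v3"

# Select up to 20 rows from the table
sql = f"""
SELECT *
FROM {database}.{table}
LIMIT 20
"""

# Run the query; the next line reads its results from attributes.tsv
result = execute(sql, "attributes.tsv")
# Load the tab-separated results into a DataFrame
df = pd.read_csv("attributes.tsv", delimiter="\t")
# Display the DataFrame
df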

Data visualization examples

If a blank cell is not already available, create a new cell by clicking + in the notebook's top navigation bar.

Single value distribution plot

Insert the following code into a blank cell:

import seaborn as sns
sns.distplot(df.spam_usr_feedback)

This code imports the seaborn library and then plots the distribution of the values in the spam_usr_feedback column. sns.distplot() works with numeric columns, such as those holding integer counts.

You can do the same for other numeric columns, such as false_news_usr_feedback and hate_speech_usr_feedback, by passing that column to sns.distplot() instead.

For the false_news_usr_feedback column, the call is sns.distplot(df.false_news_usr_feedback).

For the hate_speech_usr_feedback column, the call is sns.distplot(df.hate_speech_usr_feedback).

Pair plot

To create a pair plot, use the following code:

sns.pairplot(df[['spam_usr_feedback', 'false_news_usr_feedback', 'hate_speech_usr_feedback']])

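In this example, sns.pairplot() creates a pair plot from the spam_usr_feedback, false_news_usr_feedback, and hate_speech_usr_feedback columns of df. Running the code yields a 3-by-3 grid of plots: each off-diagonal panel is a scatter plot of one column against another, and each diagonal panel shows the distribution of a single column.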

Learn more