Additional processing

The querying library uses the AWS Athena API. Some performance optimizations such as partitioning have been applied and should be leveraged when querying.

For example, public_shares_top_country is a partition of the Country attribute, so be sure to indicate the country in your code. The following code sample demonstrates this (WHERE public_shares_top_country='US').

from fbri.public.sql.query import execute
import pandas as pd
database = "fbri_prod_public"
table = "erc_condor_url_attributes_dp_final_public_v3"

sql = f"""
SELECT public_shares_top_country AS Country, COUNT(*) AS Count 
FROM {database}.{table} 
WHERE public_shares_top_country = 'US' 
GROUP BY public_shares_top_country"""

result = execute(sql, "top_country_counts.tsv")
df = pd.read_csv("top_country_counts.tsv", delimiter = '\t')
df

This produces an aggregate statistic that should look similar to this example:

Note

You can customize the label for a created tables column by modifying the word after "as" in the SELECT statement.

You can query further to investigate the rows that make up this population. For example, the following code only requests data with US in the public_shares_top_country column.

from fbri.public.sql.query import execute
import pandas as pd
database = "fbri_prod_public"
table = "erc_condor_url_attributes_dp_final_public_v3"

sql = f"""
SELECT *
FROM {database}.{table} 
WHERE public_shares_top_country = 'US' 
LIMIT 20"""

result = execute(sql, "top_country_counts.tsv")
df = pd.read_csv("top_country_counts.tsv", delimiter = '\t')
df

The results look similar to this example: