The querying library uses the AWS Athena API. Some performance optimizations such as partitioning have been applied and should be leveraged when querying.
For example, public_shares_top_country
is a partition of the Country
attribute, so be sure to indicate the country in your code. The following code sample demonstrates this (WHERE public_shares_top_country='US'
).
from fbri.public.sql.query import execute import pandas as pd database = "fbri_prod_public" table = "erc_condor_url_attributes_dp_final_public_v3" sql = f""" SELECT public_shares_top_country AS Country, COUNT(*) AS Count FROM {database}.{table} WHERE public_shares_top_country = 'US' GROUP BY public_shares_top_country""" result = execute(sql, "top_country_counts.tsv") df = pd.read_csv("top_country_counts.tsv", delimiter = '\t') df
This produces an aggregate statistic that should look similar to this example:
Note
You can customize the label for a created tables column by modifying the word after "as" in the SELECT statement.
You can query further to investigate the rows that make up this population. For example, the following code only requests data with US
in the public_shares_top_country
column.
from fbri.public.sql.query import execute import pandas as pd database = "fbri_prod_public" table = "erc_condor_url_attributes_dp_final_public_v3" sql = f""" SELECT * FROM {database}.{table} WHERE public_shares_top_country = 'US' LIMIT 20""" result = execute(sql, "top_country_counts.tsv") df = pd.read_csv("top_country_counts.tsv", delimiter = '\t') df
The results look similar to this example: