Once you import our query module, you can use it to query the dataset using your own custom SQL queries. The code sample below shows importing the query module; specifying the database, table, and API table; defining a SQL query; and assigning the query to a variable (sql
).
library(fbrir)
athena = AthenaFacade$new()
database <- "fbri_prod_atp"
table <- "ad_targeting_dataset_siep_aug_2020"
api_table <- "ad_archive_api"
# Define your own SQL query and assign to variable 'sql'
sql <- sprintf("SELECT * FROM %s.%s LIMIT 5", database, table)
athena$QueryAthena(sql)
The dataframe result from the R example would look similar to this (blurred intentionally):
The dataframe result from the Python example would look similar to this (blurred intentionally):
The sample queries in the rest of this section demonstrate the types of queries you can perform against the dataset. For each example, we show the SQL first followed by a tabbed codeblock with Python and R so you can see how the SQL is used to define the sql
variable. To try out a sample query, copy the R or Python from the codeblock and paste it into your Jupyter notebook cell. Then run the code.
Dataframe results screenshots have been blurred intentionally. They are intended only to show the output format you can expect.
SELECT archive_id, include FROM {database}.{table} WHERE CARDINALITY( FILTER( CAST(JSON_EXTRACT(include, '$') AS ARRAY(MAP(VARCHAR, VARCHAR))), (x) -> ELEMENT_AT(x, 'Joe Biden') = 'Interests')) > 0
sql <- sprintf("SELECT archive_id, include FROM %s.%s WHERE CARDINALITY( FILTER( CAST(JSON_EXTRACT(include, '$') AS ARRAY(MAP(VARCHAR, VARCHAR))), (x) -> ELEMENT_AT(x, 'Joe Biden') = 'Interests')) > 0", database, table)
athena$QueryAthena(sql)
WITH education_table AS ( SELECT REDUCE( CAST(JSON_EXTRACT(include, '$') AS ARRAY(MAP(VARCHAR, VARCHAR))), MAP(), (s, x) -> ( MAP_CONCAT(s, MAP_FILTER(x, (k, v) -> v = 'Education level')) ), s -> s ) AS education_levels FROM {database}.{table} ) SELECT education_levels, COUNT(*) AS count FROM education_table GROUP BY education_levels; ORDER BY -count;
sql <- sprintf("WITH education_table AS ( SELECT REDUCE( CAST(JSON_EXTRACT(include, '$') AS ARRAY(MAP(VARCHAR, VARCHAR))), MAP(), (s, x) -> (MAP_CONCAT(s, MAP_FILTER(x, (k, v) -> v = 'Education level'))), s -> s) AS education_levels FROM %s.%s) SELECT education_levels, COUNT(*) AS count FROM education_table GROUP BY education_levels ORDER BY -count;", database, table)
athena$QueryAthena(sql)
SELECT exclusion, exclusion_type, COUNT(*) FROM {database}.{table} CROSS JOIN UNNEST(CAST(JSON_EXTRACT(exclude, '$') AS MAP(VARCHAR, VARCHAR))) AS t ( exclusion, exclusion_type ) GROUP BY exclusion, exclusion_type;
sql <- sprintf("SELECT exclusion, exclusion_type, COUNT(*) FROM %s.%s CROSS JOIN UNNEST(CAST(JSON_EXTRACT(exclude, '$') AS MAP(VARCHAR, VARCHAR))) AS t (exclusion, exclusion_type) GROUP BY exclusion, exclusion_type;", database, table)
athena$QueryAthena(sql)
In this query, we use the Ad Library data table (ad_archive_api
) to combine targeting data with delivery data.
SELECT a.* FROM {database}.{table} AS a, {database}.{api_table} AS b WHERE a.ds = b.ds AND a.archive_id = b.fbid AND b.reached_countries LIKE '%US%';
sql <- sprintf("SELECT a.* FROM %s.%s AS a, %s.%s AS b WHERE a.ds = b.ds AND a.archive_id = b.fbid AND b.reached_countries LIKE '%%US%%';", database, table, database, api_table)
athena$QueryAthena(sql)