Queries in Meta Content Library and API are powered by a version of Facebook’s search engine infrastructure, which relies on a combination of indexing and ranking to return relevant entities. The retrieval function combines matched and ranked IDs for a query using the same distributed in-memory caching layer that serves live platform content. This ensures that results reflect the most current state of content on the platform and meet privacy-preserving visibility criteria (see Frequently asked questions for details).
As content is created or modified on Meta platforms, associated entities are registered to a search index built from individual words extracted from text fields as tokens. Tokenization generally isolates words separated by spaces or punctuation ( ?@$%^*()+=~[{}];:"<>|. ), with some URL normalization and locale-specific adjustments for non-English languages. Tokenization is exact, in that it does not introduce additional word variants ("cats" will not be tokenized to "cat"). Direct mentions of users or other platform entities (via @) are not tokenized and are not searchable. Direct mentions are scrubbed from returned text fields if the mentioned account does not meet the producer eligibility criteria.
At query time, relevant content is identified by exact matching between tokens and individual search terms. Candidate matches are then subject to additional filtering via Boolean query logic (see Advanced search guidelines) and any other selected filters. Matching is performed independently for each word in a query, so searching by phrase is not supported (the queries “All for one” and “One for all” are equivalent).
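To make this behavior concrete, the following toy R sketch mimics the tokenization and matching rules described above. It is an illustration only, not Meta's actual tokenizer.
# Toy illustration of the tokenization and matching rules described
# above; this is NOT Meta's actual tokenizer.
tokenize <- function(text) {
  # Split on runs of spaces or punctuation, then lowercase the result
  tokens <- unlist(strsplit(text, "[[:space:][:punct:]]+"))
  tolower(tokens[tokens != ""])
}
tokenize("All for one!")                                    # "all" "for" "one"
# Word order does not matter: both queries yield the same token set
setequal(tokenize("All for one"), tokenize("One for all"))  # TRUE
# Matching is exact: "cat" does not match "cats"
"cat" %in% tokenize("My cats are sleeping")                 # FALSE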
The Content Library API uses search endpoints to extract a subset of data from an extremely large online database. Each search endpoint specifies the search path to be used and sets the values of the parameters that determine which data will be returned.
The parameters accepted by each search are described in the search guides.
The following sections describe how to use search endpoints to achieve various objectives, providing query examples in R (via the reticulate interface to the Python client).
Unless you specify otherwise, a search endpoint is synchronous: you submit the query and wait for the results, and you cannot submit another query until the first one finishes. Synchronous search results are displayed 10 at a time (a "page"), and you request the next pages one by one. You can use the limit parameter to change the number of entries returned per page, up to a maximum of 500. This type of search is most useful when the data matching the search criteria is expected to be small, or when you just want to sample some results to see whether they are appropriate for your research and don't necessarily need to see everything. This can be a step in the process of fine-tuning your search criteria.
Synchronous searches can return a maximum of 1,000 results. When you need a larger set of results, use asynchronous searches.
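For example, the following R code runs a synchronous post search and pages through all of the results: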
library(reticulate)
client <- import("metacontentlibraryapi")$MetaContentLibraryAPIClient
client$set_default_version(client$LATEST_VERSION)
# Submit a synchronous search for posts matching "mountains"
response <- client$get(
  path = "search/facebook_posts",
  params = list(
    "q" = "mountains",
    "since" = "1072915200",
    "until" = "1721058315"
  )
)
# Collect every page of results, including the final page
all_posts <- list()
repeat {
  page <- jsonlite::fromJSON(response$text, flatten = TRUE)$data
  all_posts <- dplyr::bind_rows(all_posts, page)
  if (!client$has_next(response)) break
  response <- client$`next`(response)  # fetch the next page
}
all_posts
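If you want larger pages, you can raise the page size with the limit parameter. In the sketch below, limit is passed as an ordinary query parameter alongside q; treat that placement as an assumption, since this guide only names the parameter.
# Request up to 500 entries per page. Passing "limit" inside params
# is an assumption about where the parameter goes.
response <- client$get(
  path = "search/facebook_posts",
  params = list("q" = "mountains", "limit" = 500)
)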
Asynchronous search is available for when you want to work with all of the data returned from a search, up to the results limit that applies to each individual asynchronous search. Asynchronous searches can take some time (minutes to days) to complete because they return all of the requested data, not just one page at a time. However, because the search runs in the background, you don't have to wait for the results before submitting another search or doing other work. Once the search results are ready, you can fetch them.
The search result limit for each asynchronous search is 100,000 results. See Estimating response size.
Researcher Platform also has an asynchronous search feature, but that is strictly for static database use and does not work the same way. With Content Library API, you can only use the async_search endpoints described here.
When you submit an asynchronous search, the API returns a handle (an ID) indicating successful submission if the expected number of results is below the 100,000 search results limit, or an error message if the expected number of results is over the limit.
If the search is successfully submitted, you can check on its status, which is either IN_PROGRESS or COMPLETE. Use the get_status() method to get the status and the get_data() method to get the data:
library(reticulate)
client <- import("metacontentlibraryapi")$MetaContentLibraryAPIClient
async_utils <- import("metacontentlibraryapi")$MetaContentLibraryAPIAsyncUtils
response <- client$get(
path="async_search/facebook_posts",
params=list("q"="mountains", "since"=1072915200, "until"=1720820310))
jsonlite::fromJSON(response$text, flatten = TRUE)
# Check the status of an asynchronous query
status_response <- async_utils$get_status(response=response)
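If you prefer to block until the search finishes, you can poll get_status() periodically. The sketch below assumes the status string appears in a field named status in the JSON body; that field name is an assumption, so inspect the status response in your own environment first.
# Poll until the asynchronous search completes. The "status" field
# name is an assumption about the JSON layout of the response.
repeat {
  status_response <- async_utils$get_status(response = response)
  status <- jsonlite::fromJSON(status_response$text, flatten = TRUE)$status
  if (identical(status, "COMPLETE")) break
  Sys.sleep(60)  # wait a minute between checks
}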
When the status shows COMPLETE, you can fetch the data. You can also write the data to a file, which is stored in JSON format in the /previous_searches/ folder in your Jupyter environment.
# Retrieve the results of an asynchronous query
get_data_response <- async_utils$get_data(response = response)
# Write data to file
async_utils$write_data_to_file(response = response)
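A written file can then be read back with jsonlite like any other JSON file. The exact file name is generated for you, so the sketch below simply loads the most recently written file; the relative path and .json extension are assumptions based on the description above.
# Load the most recently written search results back into R. The
# folder path and file extension are assumptions, not confirmed here.
files <- list.files("previous_searches", pattern = "\\.json$", full.names = TRUE)
latest <- files[which.max(file.mtime(files))]
results <- jsonlite::fromJSON(latest, flatten = TRUE)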
Use the get_all_async_queries() method to get a list of all your previously executed asynchronous searches.
response <- async_utils$get_all_async_queries()
jsonlite::fromJSON(response$text, flatten=TRUE)
Use the estimate parameter to get a rough idea of how much data would be returned from a search you have defined. Since the API can only return up to 100,000 results from a single asynchronous search, it can be helpful to know in advance whether your search is likely to fail because the response size is too large. If the estimate comes out higher than 100,000, consider modifying the parameters to reduce the response size. You can continue to modify the search parameters and get new estimates until the search results are predicted to fall below the maximum allowed.
Estimates are typically most useful for post searches because the number of results tends to be higher, but you can use them to estimate the size of the data that would be returned by any search.
library(reticulate)
client <- import("metacontentlibraryapi")$MetaContentLibraryAPIClient
client$set_default_version(client$LATEST_VERSION)
# Request an estimate
response <- client$get(
  path = "search/facebook_posts",
  params = list("q" = "mountains"),
  estimate = TRUE)
# Display the estimate
jsonlite::fromJSON(response$text, flatten=TRUE)
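You might wrap the estimate in a quick check before submitting the full asynchronous search. The sketch below assumes the estimate appears in a field named estimated_results in the JSON body; that field name is an assumption, so inspect the estimate response in your own environment first.
# Submit the asynchronous search only if the estimate is under the
# 100,000-result limit. The "estimated_results" field name is an
# assumption about the JSON layout of the estimate response.
estimate <- jsonlite::fromJSON(response$text, flatten = TRUE)$estimated_results
if (!is.null(estimate) && estimate < 100000) {
  response <- client$get(
    path = "async_search/facebook_posts",
    params = list("q" = "mountains")
  )
} else {
  message("Estimate exceeds the limit; narrow the search, e.g. with since/until.")
}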