Private Computation: Reference

1: Semi-Auto Data Ingestion/Preparation

The following instructions apply to both Private Lift and Private Attribution as they share the same Conversions API Gateway pipeline and requirements. They are provided to support your efforts to set up your Private Computation environment/PCS AWS Infrastructure to complete Step 2 in the Partner Setup Guide (which describes the work needed to set up and run a private measurement study).

Before You Begin

  1. The Conversions API Gateway version should be 1.0.2 or greater.
  2. When setting up Private Computation environment/PCS AWS Infrastructure, Step 2 in the Setup Guide, semi-auto data ingestion needs to be installed as outlined in the instructions detailed below.
  3. The semi-auto pipeline needs at least 10-15 minutes to initialize after deployment.

Glossary

S3 Data Bucket: The bucket created or provided when you run the deploy command during Private Lift Environment setup. It corresponds to this parameter:

“-d <S3 data output prefix>” 

S3 Config Bucket: The bucket created or provided when you run the deploy command during Private Lift Environment setup. It corresponds to this parameter:

“-s <S3 config bucket prefix>”

Input

Prepare Your CSV File

Prepare a CSV file containing the columns detailed in the table below. Follow the instructions in the Description column.

Table 1: Column Name and Description

email

Match key to be used to create the Private ID.

Format:

  • Trim any leading and trailing spaces
  • Lowercase
  • SHA-256 hashed (see the “How to Hash Email” section below; the same process applies to phone)

device_id

  • Match key to be used to create the Private ID.
  • Format:
    • Lowercase the IDFA/AAID
    • Keep the hyphens
  • Optional, but highly recommended to include this column if available

phone

  • Match key to be used to create the Private ID.
  • Format:
    • Digits only, including the country code, with no symbols or spaces (for example, 16125919838 instead of 1-612-591-9838)
    • SHA-256 hashed (same process as email)
  • Optional, but highly recommended to include this column if available

client_ip_address

  • Match key to be used to create the Private ID.
  • Format:
    • Can be either IPv4 or IPv6
  • Optional, but highly recommended to include this column if available

data_source_id

The Pixel or App ID (must match the data source ID in the study).

timestamp

Event timestamp (in unix time seconds)

conversion_value

  • Conversion value
  • Convert to USD
  • Can be float or integer

event_type

Event type (for example, Purchase, AddToCart, and so on). Must match the event name in the study.

action_source

The value is app, website, or others depending on your data.

  1. The file name cannot contain any spaces or special characters (for example, brackets such as (, [, {).
  2. The column names should be in lowercase and need to exactly match the names listed above. The order doesn't matter.
  3. If you only have one match key for all events, don't include the other match key columns in the CSV file. For example, if you will only use email, don't add a device_id column to the CSV file.
  4. If you would like to use multi-key matching in the future, include all the PII data columns in the CSV file.
  5. For Windows users, open the CSV file in a text editor. There may be formatting issues (for example, 1234567890 is automatically changed to 1.23E+10) if you open the CSV file in Excel.
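
If you script the CSV preparation, the following Python sketch applies the normalization and hashing rules above in one place. The raw_events sample, the pc_events.csv file name, and the chosen column subset are illustrative assumptions, not requirements:

import csv
import hashlib

def sha256_normalized(value):
    # Trim leading/trailing spaces, lowercase, then SHA-256 hash (email/phone rule above).
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

# Illustrative raw events; replace with your own data source.
raw_events = [
    {
        "email": " Example@FB.com ",                          # trimmed, lowercased, hashed below
        "device_id": "AAAABBBB-CCCC-DDDD-EEEE-FFFF00001111",  # IDFA/AAID, lowercased, hyphens kept
        "data_source_id": "1234567890",                       # Pixel or App ID from the study
        "timestamp": 1672531200,                              # Unix time in seconds
        "conversion_value": 12.34,                            # already converted to USD
        "event_type": "Purchase",                             # must match the event name in the study
        "action_source": "website",
    },
]

columns = ["email", "device_id", "data_source_id", "timestamp",
           "conversion_value", "event_type", "action_source"]

# File name has no spaces or special characters, per rule 1 above.
with open("pc_events.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    for event in raw_events:
        row = dict(event)
        row["email"] = sha256_normalized(row["email"])
        row["device_id"] = row["device_id"].lower()
        writer.writerow(row)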

Quick Format/Encryption Check

After you have prepared the CSV file in line with the Input section above, please perform the following quick sanity check to make sure your CSV file is ready:

  1. Check the header names to make sure they exactly match the names (cases, spellings) listed in Table 1 above.

  2. Before hashing the email, the raw email should have the leading and trailing spaces trimmed and be in lowercase.

  3. Check if the email addresses are hashed.

    • Check if the email column is SHA256 hashed. The value should look like “30a79640dfd8293d4f4965ec11821f640ca77979ca0a6b365f06372f81a3f602” instead of “123@gmail.com”.

    • Check if the SHA256 process is correct. Please do the same SHA256 process on this fake email address "123@gmail.com", and the expected value should be "30a79640dfd8293d4f4965ec11821f640ca77979ca0a6b365f06372f81a3f602".

  4. (If phone is available) Check if the phone is formatted and hashed correctly.

    • The original phone should be 16125919838 instead of 1-612-591-9838.

    • The value should look like dc7c6a9e4f0a28c9be8461414daab780eb50d7567f1548614c5e143c32bfbd8b instead of numbers.

    • Check if the SHA256 process is correct, similar to the “email” example above
  5. Check if the timestamp values are in Unix time and in the expected range. Randomly choose one or two values from the timestamp column, convert them to a readable format, and make sure they fall in the expected time range. (These checks can also be scripted, as sketched after this list.)
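
Here is a minimal Python sketch of the checks above, assuming the file is named pc_events.csv and that email is the match key you want to verify. The 2015 lower bound on the timestamp range is an illustrative assumption:

import csv
import hashlib
import re
from datetime import datetime, timezone

SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")
REQUIRED_COLUMNS = {"email", "data_source_id", "timestamp",
                    "conversion_value", "event_type", "action_source"}

# Check 3: known-answer test using the fake email and expected hash from this documentation.
assert hashlib.sha256(b"123@gmail.com").hexdigest() == (
    "30a79640dfd8293d4f4965ec11821f640ca77979ca0a6b365f06372f81a3f602"
)

with open("pc_events.csv", newline="") as f:
    reader = csv.DictReader(f)
    # Check 1: header names must exactly match the expected lowercase names.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    assert not missing, f"Missing columns: {missing}"
    for row in reader:
        # Check 3: email values must look like SHA-256 hex digests, not raw addresses.
        assert SHA256_HEX.match(row["email"]), f"Email not hashed: {row['email']}"
        # Check 5: timestamps must be Unix seconds in a plausible range (bounds are illustrative).
        ts = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        assert datetime(2015, 1, 1, tzinfo=timezone.utc) <= ts <= datetime.now(timezone.utc), ts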

Upload CSV File

Upload the CSV file to the same S3 Data Bucket, under the semi-automated-data-ingestion directory.
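
If you prefer to upload from code rather than the AWS console, here is a minimal boto3 sketch. The bucket name fb-pc-data-1101e2e and the local file name are illustrative; substitute your own S3 Data Bucket and CSV file:

import boto3

s3 = boto3.client("s3")

# Bucket and file names are illustrative; use your own S3 Data Bucket and CSV file.
s3.upload_file(
    Filename="pc_events.csv",
    Bucket="fb-pc-data-1101e2e",
    Key="semi-automated-data-ingestion/pc_events.csv",
)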

Wait until the data mentioned in the Output section is showing in the S3 bucket.

Output

The re-processed and re-partitioned data will be ready in the same S3 Data Bucket, merged with the standard Conversions API Gateway data pipeline storage (not under semi-automated-data-ingestion).

Depending on the data size, it could take 5 to 30 mins to appear in S3.

Sample output (The S3 Data Bucket is “fb-pc-data-1101e2e” in this case.):




Generate Computation-ready Input CSV File

After you get the output, there are two more steps to generate the final/Computation-ready input CSV file to run Private Lift or Private Attribution.

  • Run AWS Glue Crawler. You have two options:

    • Wait for at most one hour. The AWS Glue Crawler runs every hour. Or you can:

    • Trigger the AWS Glue Crawler manually. Navigate to the AWS console -> AWS Glue -> Crawlers (on the left menu bar), and run the crawler with the following name (or trigger it programmatically, as sketched after this list):

mpc-events-crawler-<Tag> 
  • Follow the “Run every time you want to get results for a study” section in the PCS Partner Setup Guide to use AWS Athena to query the data and generate the CSV file.
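
If you would rather trigger the crawler from code than from the console, a minimal boto3 sketch follows; replace <Tag> with your own deployment tag:

import time
import boto3

glue = boto3.client("glue")

# Replace <Tag> with your deployment tag.
crawler_name = "mpc-events-crawler-<Tag>"

glue.start_crawler(Name=crawler_name)

# Optionally poll until the crawler returns to the READY state.
while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
    time.sleep(30)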

How to Hash Email

Here is an example of how to perform the SHA-256 hash in Python 3.x. Equivalent implementations in other languages are also supported; just make sure they produce the expected output for the following example:

Example email input: example@fb.com

Example sha256 output:

7a1d9f839aa2d4f3f348e8303bfcf699fd7c243baeb55238ee2d1bcd7b80f30e

Python 3.x:

import hashlib
sha256_output=hashlib.sha256(b'example@fb.com').hexdigest()

Or:

hashlib.sha256(bytes("example@fb.com", encoding="utf-8")).hexdigest()

Presto-SQL:

SELECT
  LOWER(
    TO_HEX(
      SHA256(CAST('example@fb.com' AS VARBINARY))
    )
  )

2: How to Set “Canary” Tier

Sometimes Meta would like you to run on the “canary” tier. Here is how you can set it up.

Open the Private Computation Infrastructure Shell:

https://<capig.instance.url>/hub/shell 

Run the following update commands:

config write CloudResources /IMAGE_TAG canary

3: Configure an AWS IAM User with Minimal Permissions for Future Computation after Initial Infra Deployment

a. Open your AWS account and enter the IAM component.



b. Select an AWS IAM user. Either create a user or re-use an existing user.



c. Enter the user page and click Add permissions.



d. Attach the fb-pc-policy-<tag> to the user.



e. Add permission.

f. Now you can generate the access key and secret key for this user, to fill in during the next step in the Meta cloud platform.
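
If you prefer to perform steps b through f from code rather than the console, here is a minimal boto3 sketch. The user name and account ID are illustrative placeholders, and the policy ARN must point at your deployed fb-pc-policy-<tag>:

import boto3

iam = boto3.client("iam")

# Illustrative values; replace with your own user name, AWS account ID, and deploy tag.
user_name = "pc-computation-user"
policy_arn = "arn:aws:iam::0123456789:policy/fb-pc-policy-<tag>"

# Create the user (skip this call if re-using an existing user) and attach the policy.
iam.create_user(UserName=user_name)
iam.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)

# Generate the access key and secret key to fill in on the Meta cloud platform.
key = iam.create_access_key(UserName=user_name)["AccessKey"]
print(key["AccessKeyId"], key["SecretAccessKey"])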

4: Redeploying with the Deployment UI - Data Migration Process Instructions

For advertisers who have onboarded to Private Computation products and would now like to redeploy with the Deployment UI, please read and follow the instructions detailed below. These instructions apply when an advertiser, after first onboarding and having their data ingested into a data bucket, redeploys and the new data is ingested into a new, separate data bucket created by the Deployment UI. In order to query the data in the initial data bucket, the advertiser needs to go through the data migration process detailed in these instructions.

  • The “old data bucket” is the bucket created before. In this doc, the name for this old data bucket is fb-pc-data-<OLD TAG>.

  • The “new data bucket” is the bucket created by the Deployment UI. In this doc, the name for the new data bucket is fb-pc-data-<NEW TAG>.

  • The goal is to move the data from the “old data bucket” (fb-pc-data-<OLD TAG>) to the “new data bucket” (fb-pc-data-<NEW TAG>).

  • DO NOT undeploy any resources before the data migration is done.

Case 1: If you are NOT using the AWS credentials generated with policy fb-pc-policy-<TAG>

  1. After the deployment with the Deployment UI is done, wait at least 15 mins. This allows the leftover data to be ingested into the old data bucket (fb-pc-data-<OLD TAG>).
  2. Check whether SSE-KMS type encryption is enabled on the old data bucket (fb-pc-data-<OLD TAG>).
    • Go to the bucket and click the Properties tab.
    • Scroll to the “Default encryption” section, and if the SSE-KMS encryption is enabled, it should look like this:

    • If SSE-KMS is enabled, you have to do step 3, below, to change the encryption type.

  3. Edit server-side encryption (only do this step if you verified it’s SSE-KMS in step 2). In the old data bucket (fb-pc-data-<OLD TAG>), select all the folders whose names start with “year=”. For example, “year=2022/”, “year=2021/”, and so on.
    • Click the “Edit server-side encryption” button under the “Actions” dropdown list.

    • Change it to the “SSE-S3” type and click Save changes.

  4. Copy the folders (a scripted alternative is sketched after this list). In the old data bucket (fb-pc-data-<OLD TAG>), select all the folders whose names start with “year=”. For example, “year=2022/”, “year=2021/”, and so on.
    • Click the Copy button under the “Actions” dropdown list.

    • Copy them to the new data bucket (fb-pc-data-<NEW TAG>) by choosing the new data bucket as the “Destination”, and click the Copy button.

  5. Check that the copied folders are showing in the new data bucket (fb-pc-data-<NEW TAG>).

  6. Run the Glue crawler.
    • Go to AWS Glue and click “Crawlers” on the left side.

    • Find the crawler with the name “mpc-events-crawler-<NEW TAG>”.
    • Click Run Crawler.
    • Wait until the crawler finishes.

  7. Verify the data can be queried in Athena.
    • Go to AWS Athena.
    • Choose the database “mpc-events-db-<NEW TAG>”.
    • A table with the name “fb_pc_data_<NEW TAG WITH UNDERSCORE>” should show up in the “Tables” section.
    • Run a simple query.
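
For step 4, when there are many objects to copy, a scripted copy can be easier than the console. Here is a minimal boto3 sketch; the bucket names are placeholders for fb-pc-data-<OLD TAG> and fb-pc-data-<NEW TAG>:

import boto3

s3 = boto3.client("s3")

# Placeholders: replace with your actual old and new data bucket names.
OLD_BUCKET = "fb-pc-data-<OLD TAG>"
NEW_BUCKET = "fb-pc-data-<NEW TAG>"

# Copy every object under the year=... prefixes from the old bucket to the new one.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=OLD_BUCKET, Prefix="year="):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=NEW_BUCKET,
            Key=obj["Key"],
            CopySource={"Bucket": OLD_BUCKET, "Key": obj["Key"]},
        )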

Case 2: If you are using the AWS credentials generated with policy fb-pc-policy-<OLD TAG>

  • After the deployment with the Deployment UI is done, attach the new policy fb-pc-policy-<NEW TAG> as soon as possible.
  • Follow steps 2 to 7 in Case 1.

5: Ad-Hoc System Diagnosis

System diagnosis is a way to validate the completeness and connectedness of your deployed infrastructure (including VPC peering) before any computation runs are kicked off.

Once you have completed the AWS infrastructure deployment, you will see a summary page. On the deployment summary page, a new section has been added for validating the infrastructure dynamically, as shown below:


Click Run system diagnostics and provide AWS admin-level credentials to continue. If the system diagnosis finishes successfully, you will see the following result:

In case of any failure, you will see a download button to download logs, which you can then share with Meta to further help in debugging.

6: Sharing Diagnostic Data with Meta

To help Meta troubleshoot issues and improve the product, you can send diagnostic data to Meta either manually or automatically.

At Meta, the diagnostic data will be kept for no longer than 30 days, and will be access controlled.

Limitations for automatic collection of diagnostic data:

  • Currently only for Private Lift on top of the Private Computation Infrastructure UI.
  • Logs collection happens at the end of a computation run.
  • Logs collection won’t happen if the computation run failed to start, for example, due to invalid AWS credentials assigned to config values or a failure in input data preparation.

Manual Sharing with Meta

The diagnostic data is always collected automatically after every study run completes (with success or failure), and is saved to two locations in the S3 bucket used for input data, in the advertiser’s cloud account:

In the folder:

s3://fb-pc-data-<ENVIRONMENT_TAG>/logging/

The log archive file should look similar to:

logs_20221105T044117.481056Z_study-14827452455_run-12.zip

The archive file contains multiple logs from: output.txt (that is, coordinator logs), worker containers, data pipeline (Athena, Kinesis, Glue, Crawler).

In the folder containing result data, for example:

s3://fb-pc-data-<ENVIRONMENT_TAG>/query-results/fbpcs_instances_14827452455_12/

Log files can be: output.txt, job-debug.txt, and download_logs_cli.txt. output.txt is the same as in the archive file above. The other two files help with debugging the run launch, logs collection, and upload.

Usually, sharing the first part of the logs data is sufficient for Meta support engineers to investigate any issue related to a study run. In the computation UI, if you see a clickable link “Share Diagnostic Data with Meta” under a calculation run status, you can click it to trigger the logs upload. The logs upload should finish within 5 minutes.

In case the second part of the logs data is required for investigation, the advertiser engineer has to manually download it from the S3 buckets and share it with Meta support engineers.
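
For that manual download, a minimal boto3 sketch follows; the bucket name is a placeholder for your own fb-pc-data-<ENVIRONMENT_TAG> bucket, and the logging/ prefix mirrors the first location described above:

import os
import boto3

s3 = boto3.client("s3")

# Placeholder: replace with your own data bucket name.
BUCKET = "fb-pc-data-<ENVIRONMENT_TAG>"

# Download every log archive under the logging/ prefix to the current directory.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="logging/"):
    for obj in page.get("Contents", []):
        file_name = os.path.basename(obj["Key"])
        if file_name:  # skip the folder placeholder key itself
            s3.download_file(BUCKET, obj["Key"], file_name)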

Automatic Sharing with Meta

You can opt in for automatic logs upload in the following stages:

  • For a new deployment of Conversions API Gateway: there is a “Share diagnostic data settings” setting in the “Customize environment” step. You can choose whether or not to automatically share the diagnostic data.

  • For an upgrade or new deployment of Conversions API Gateway: you will see a one-time popup dialog when refreshing any page (except the “Updates” page) within the Private Computation Solution app. You can choose whether or not to automatically share the diagnostic data, and save the setting.

  • After deployment of Conversions API Gateway: you can open the Environment tab within the Private Computation Solution app and find the “Automatic diagnostic data sharing” setting. Click the Edit button to update your choice, and save the setting.

By default, logs are not automatically uploaded after a successful run. You can manually click the “Share Diagnostic Data with Meta” link under a successful run to upload the corresponding logs. Or you can contact Meta support engineers to change the Conversions API Gateway config to automatically upload the logs after every successful run.

Logs upload status per computation run and troubleshooting:

  • In normal cases, the logs upload status shows one of the following: a) “Diagnostic Data was shared with Meta at <timestamp with timezone>” (non-clickable), which means the logs upload succeeded; or b) “Share Diagnostics Data with Meta” (clickable), shown when automatic sharing is not enabled or the computation run succeeded.

  • The status is “Error Sharing Diagnostic Data with Meta” with a clickable “Retry”: the logs upload failed after multiple internal attempts. You can click the Retry button and then confirm in the dialog to try again. If the logs upload keeps failing after multiple retries, contact Meta support engineers for help.

  • The status is missing: the computation run failed in the early preparation stages of the calculation run, and no logs have been collected automatically. When investigation is needed, Meta support engineers might still ask you to manually retrieve logs from specific S3 buckets or CloudWatch locations, and then share them with Meta.

7: Enable Multi-Key for Private Lift

To improve the performance and quality of matching, you can enable the multi-key feature (expected to increase the match rate by approximately 8 percent; only Private Lift is supported at this time).

  • Open Private Computation Infrastructure Shell: https://<private-computation-infrastructure.instance.url>/hub/shell

  • Run the following update commands:

    • config write Athena /USE_MULTIKEY true

The multi-key feature is disabled by default. To disable it again, repeat the steps above but replace true with false.

8: New Requirements on Graph API Access Token Permissions Are Enforced

In June 2022, we updated the instructions for generating the Graph API token in Step 2. The consolidated list of permissions (required for both Private Lift and Private Attribution) is: ads_management, ads_read, business_management, and private_computation_access. We recommend that you cross-check the access token permission list to ensure it has the full set of desired permission scopes. Follow these steps:

  • Go to Access Token Debugger: https://developers.facebook.com/tools/debug/accesstoken

  • Place the access token in use into the input box, then click “Debug”.

  • Verify that all required permissions (ads_management, ads_read, business_management, and private_computation_access) are listed in “Scopes”.

    • If yes, no actions needed.

    • If not, Meta recommends asking the advertiser to re-generate the long-lived access token per instructions in Step 2. Once the new access token is ready, it should be good to go.
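
As an alternative to the Access Token Debugger UI, the same check can be scripted against the Graph API debug_token endpoint. The sketch below is a minimal example using the requests library; pass the token you want to inspect as input_token, and either the same token or an app access token as access_token:

import requests

ACCESS_TOKEN = "<your long-lived access token>"  # placeholder
REQUIRED_SCOPES = {"ads_management", "ads_read", "business_management",
                   "private_computation_access"}

# Inspect the token with the Graph API debug_token endpoint.
response = requests.get(
    "https://graph.facebook.com/debug_token",
    params={"input_token": ACCESS_TOKEN, "access_token": ACCESS_TOKEN},
    timeout=30,
)
scopes = set(response.json().get("data", {}).get("scopes", []))

missing = REQUIRED_SCOPES - scopes
print("All required permissions present." if not missing else f"Missing scopes: {missing}")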

9: Advanced Setting on Infrastructure Deployment Page, on Modal Stepper “Get VPC Details from Meta”

While deploying, if instructed by a Meta representative, you can use the advanced option in getting VPC details, as shown below.


Fill in the required fields and click Next.

  • Amazon Web Services Region: This is the AWS region where the resources would be deployed. It should be the same as the region used for Conversions API Gateway deployment. (This region should also match the Meta side AWS region.)

  • Publisher (Meta) Account ID: Meta AWS Account number provided by a Meta representative.

  • Publisher (Meta) VPC ID: Meta VPC ID provided by a Meta representative.

10: How to Retry a Failed VPC Peering Connection during Deployment

In case of a failed VPC peering connection during infrastructure deployment, you should see a screen similar to the following:


Click Retry VPC peering. You should see a pop-up window similar to the one below:


Input your business ID and Graph API token obtained earlier in Step 2 of the Setup Guide, and click Retry.

If the VPC peering status still shows as failed, please contact Meta to further assist you.

11: How to Resolve the JOB_NOT_PROVISIONED_ERROR in the Events Uploader Modal

If you see an error message that looks like this:


Check the following:

  • Go to the IAM AWS services page.

  • Click Policies.

  • Search for the deployed policy.

    • It should look like:
‘fb-pc-iam-policy-<deploy_tag>’
  • Click {} JSON.

  • Check if the policy has permission to access the glue-ETL-<deploy_tag> resource.

    • Search for glue-ETL on that page.
  • The allowed Resource should look like the following: arn:aws:glue:us-west-2:0123456789:job/glue-ETL-deploytag123

If this glue-ETL resource permission is missing, then:

  • Click Edit policy -> JSON.

  • Add this JSON block next to the other statements (first replace the <> sections with your own deployment values):

{
    "Effect": "Allow",
    "Action": [
        "glue:Get*",
        "glue:BatchGet*",
        "glue:List*",
        "glue:QuerySchemaVersionMetadata",
        "glue:CheckSchemaVersionValidity",
        "glue:SearchTables"
    ],
    "Resource": [
        "arn:aws:glue:<region>:<your_AWS_account_id>:job/glue-ETL-<deploy_tag>"
    ]
}

  • Example of Resource name: arn:aws:glue:us-west-2:0123456789:job/glue-ETL-mydeployment-123.

  • Do not update the existing “glue:*” statement. Instead, add a new section with the above block.

Save the updated policy.

Refresh the Deployment UI page.

Open the Uploader modal and check if the problem has been resolved.
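
If you want to check the attached policy programmatically instead of in the console, here is a minimal boto3 sketch; the policy ARN is an illustrative placeholder built from your account ID and deploy tag:

import json
import boto3

iam = boto3.client("iam")

# Placeholder ARN: replace the account ID and deploy tag with your own values.
policy_arn = "arn:aws:iam::0123456789:policy/fb-pc-iam-policy-<deploy_tag>"

# Fetch the default version of the policy document.
policy = iam.get_policy(PolicyArn=policy_arn)["Policy"]
document = iam.get_policy_version(
    PolicyArn=policy_arn,
    VersionId=policy["DefaultVersionId"],
)["PolicyVersion"]["Document"]

# Look for a statement whose Resource mentions the glue-ETL job.
statements = document["Statement"]
if isinstance(statements, dict):
    statements = [statements]
has_glue_etl = any("glue-ETL" in json.dumps(stmt.get("Resource", "")) for stmt in statements)
print("glue-ETL resource permission present:", has_glue_etl)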

12: How to Resolve the BUCKET_CORS_MISSING_ERROR in the Events Uploader Modal

If you see this error in the Uploader modal:


Follow these steps to resolve the error:

  • Open the Deployment UI at /hub/pcs/deployment.

  • Click the Data Ingestion Bucket link.

  • Click the Permissions tab.

  • Scroll down to the Cross-origin resource sharing (CORS) section.

  • Click Edit.

  • Paste this block into the CORS config section:

    • Update the AllowedOrigins to be your EC2 instance’s domain name.
[
    {
        "AllowedHeaders": [],
        "AllowedMethods": [
            "PUT"
        ],
        "AllowedOrigins": [
            "https://<sub.domain.com>"
        ],
        "ExposeHeaders": []
    }
]
  • Click Save changes.

It should look similar to this:


  • Refresh the Deployment UI page.

  • Open the Uploader modal and check if the problem has been resolved.
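
If you prefer to apply the CORS rule from code rather than the console, here is a minimal boto3 sketch that sets the same rule; the bucket name and origin are illustrative placeholders:

import boto3

s3 = boto3.client("s3")

# Placeholders: use your Data Ingestion Bucket name and your EC2 instance's domain.
BUCKET = "fb-pc-data-<ENVIRONMENT_TAG>"
ORIGIN = "https://<sub.domain.com>"

# Apply the same CORS rule shown above.
s3.put_bucket_cors(
    Bucket=BUCKET,
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": [],
                "AllowedMethods": ["PUT"],
                "AllowedOrigins": [ORIGIN],
                "ExposeHeaders": [],
            }
        ]
    },
)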

13: Private Computation Infrastructure Upgrade Guideline and Questions

Q: What if I have an old instance of Private Computation Infrastructure where I have deployed infrastructure already, should I still see a VPC peering status on the deployment UI?

A: No, you would not. The previous VPC peering connection status will just get carried forward. So if the previous VPC peering connection was in a pending state, you can either contact a Meta representative to accept the connection request from Meta manually, or undeploy and redeploy the infrastructure to take advantage of the latest automatic VPC peering feature.

Q: After upgrading, I see a popup dialog box showing “Share diagnostic data with Meta”. What should I do?

A: To help Meta better troubleshoot issues and improve the product, it’s highly recommended to opt in to diagnostic data sharing with Meta. It will automatically upload logs to Meta within 5 minutes after a failed run. No customer data (for example, user identities, pixel events) will be included in the collected diagnostic data, and the data is retained for a maximum of 30 days, with access controlled. More details can be found in the section Sharing Diagnostic Data with Meta.

14: Ensure Logging Permission Exists

Check the following:

  • Go to the IAM AWS services page.

  • Click Policies.

  • Search for the deployed policy.

    • It should look like:
‘fb-pc-iam-policy-<deploy_tag>’
  • Click {} JSON.

  • Check if the policy has AWS CloudWatch permissions.
    • Search for logs:* on that page.

  • If the logs:* permission is missing, continue to the next step.
    • Otherwise, if the below permission already exists in the policy, then no further setup is required.

  • Click Edit policy -> JSON.

  • Add this JSON block next to the other statements:

 {
    "Action": [
        "logs:*"
    ],
    "Effect": "Allow",
    "Resource": "*"
},
  • Save the updated policy.
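
If you would rather add the logs:* statement programmatically instead of editing the JSON in the console, here is a minimal boto3 sketch; the policy ARN is an illustrative placeholder, and note that IAM keeps at most five versions of a policy, so you may need to delete an old version first:

import json
import boto3

iam = boto3.client("iam")

# Placeholder ARN: replace the account ID and deploy tag with your own values.
policy_arn = "arn:aws:iam::0123456789:policy/fb-pc-iam-policy-<deploy_tag>"

# Fetch the current default policy document.
policy = iam.get_policy(PolicyArn=policy_arn)["Policy"]
document = iam.get_policy_version(
    PolicyArn=policy_arn,
    VersionId=policy["DefaultVersionId"],
)["PolicyVersion"]["Document"]

if isinstance(document["Statement"], dict):
    document["Statement"] = [document["Statement"]]

# Append the CloudWatch Logs statement from this section.
document["Statement"].append({
    "Action": ["logs:*"],
    "Effect": "Allow",
    "Resource": "*",
})

# Publish the updated document as the new default policy version.
iam.create_policy_version(
    PolicyArn=policy_arn,
    PolicyDocument=json.dumps(document),
    SetAsDefault=True,
)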