We are sunsetting On-Premises API. Refer to our On-Premises API Sunset document for details, and to learn how to migrate to our next-generation Cloud API.

High-Throughput Recommendations

High-Throughput messaging is only available to Business Solution Providers (BSPs).

Recommendations for Solution Partners and governments to achieve the best performance with the WhatsApp Business API.

This document covers:

Performance
Setup
Strategies
Multiple Phone Numbers

Performance

See the Performance section of the Messages reference.

Monitoring

The first step in understanding and troubleshooting performance bottlenecks is to set up monitoring. While you can build your own monitoring setup by making metrics and stats API calls, we strongly recommend setting up the instance monitoring provided by WhatsApp, which provides granular insights into all components of the system. Recommendations later in this document frequently refer to graphs from this official monitoring solution.

Evaluating Performance

Load testing is a common engineering practice used to evaluate your system performance under a certain load in a controlled environment. Performance evaluation is available for all Solution Partners. These tests can help evaluate the message throughput of your system under production-like traffic before going into production and can help measure the resilience of your webhook system. It is commonly used to:

Troubleshoot performance issues
Benchmark system performance
Optimize infra cost

You can test a load by submitting a Direct Support ticket. Select the business, then Ask a Question. From the dropdown select:

WABiz: Request Outbound Load Testing (beta) to test outbound loads. During each test, messages are sent from the load test number to real numbers at a specified throughput. To initiate a load test, you need to:

Prepare a messaging script that will be executed during the load test. The script sends a message to at least 250,000 phone numbers using a message template provided by Meta, and it must be able to run at different throughputs. Both you and Meta will monitor the execution of the script during the test.
Create a new setup that can support high throughput and register an outbound load test number provided by Meta. Note that no messages will be delivered and no charges will be incurred; however, you will receive the sent, delivered, and read webhook notifications.
Set up a Grafana monitoring dashboard for the high throughput template according to our dev docs

Before starting the outbound load test, call the contacts node for all recipient numbers. This ensures an optimal load test setup.

Setup

Message throughput of a WhatsApp Business API number depends on the following factors:

Performance of the WhatsApp Business API client
Performance of the database
Performance of the webhook

In this section, we make recommendations on how to achieve optimal performance for each component.

WhatsApp Business API Client

To set up a WhatsApp Business API client, we strongly recommend that you use the deployment templates provided by WhatsApp.

This is recommended for the following reasons:

Failures are easier to debug because these templates are developed by the WhatsApp team.
The WhatsApp team has built best practices into the templates and benchmarked different throughput options, so it is easier for businesses to set up a WhatsApp Business API client with stable throughput.

Database

The database is critical to the overall performance of the system. It's important that the database is located as close to the Coreapp as possible, because the Coreapp performs intensive I/O operations against the database when sending messages or processing callbacks.

In addition to the Database Configuration recommendations in the previous section, we recommend keeping Average DB Write Query Latency and Average DB Transaction Latency below 15ms.

For phone numbers with a large database (more than 2 million rows in the messages table) or phone numbers that send messages at more than 120 messages / second, you should run DB garbage collection at least once a day during low-traffic time to maintain the database at a stable level.

If you require additional monitoring or insights, modify the RDS instance to apply the following optional changes:

Storage Autoscaling
- Select Enable storage autoscaling
Monitoring
- Select Enable enhanced monitoring
Performance Insights
- Select Enable Performance Insights

If you observe higher DB latency and have a large number of rows in the messageStore.messages_receipt_log table, we recommend upgrading to v2.39.2 or later, where indexes were added to improve database performance.

For businesses using a MySQL database, running a multiconnect setup with more than 16 shards, and expecting loads of 100 or more messages per second, we recommend setting max_prepared_stmt_count to 22,000. Businesses should make sure their database servers have enough RAM to support this limit.

For more information about max_prepared_stmt_count, see the MySQL server system variables documentation.

Webhooks

The webhook is another component critical to overall messaging performance. It is important to keep the Avg Callback Request Latency below 80ms so that it does not impact system performance.

When the callback request latency is too high, callbacks start to accumulate in the callback queue, which has a maximum size of 100,000. When the callback queue is full, Coreapp considers the system to be under heavy load and prevents it from sending messages. Meanwhile, Coreapp continues to accept callbacks, such as sent/delivery receipts and incoming messages from the WhatsApp server, and appends them to the callback queue. In such situations, the system cannot send any messages until the queue drains down to a certain size.

To minimize webhook latency, we recommend that you set up your webhooks in the following way:

Deploy the webhooks endpoint as close as possible to the Coreapp (but not on the same host).
For each callback request received by the webhook server, first respond with a 200 OK within 80ms before executing any business logic. Failing to do so would significantly increase the webhook latency. We highly recommend that you use a queue system for your webhook implementation and process callback requests asynchronously.
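The acknowledge-then-process pattern described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a production implementation: the in-process queue stands in for a real queue system, handle_callback is a hypothetical placeholder for your business logic, and the port is arbitrary. It also assumes that TLS is terminated in front of this server.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# In-process queue standing in for a durable queue system (assumption).
callback_queue = queue.Queue()
processed = []  # Filled by the placeholder business logic below.


def handle_callback(payload):
    """Hypothetical business logic; replace with your own processing."""
    processed.append(payload)


def worker():
    """Parse and process callbacks off the request path."""
    while True:
        raw = callback_queue.get()
        try:
            handle_callback(json.loads(raw))
        except json.JSONDecodeError:
            pass  # A real system would log or dead-letter malformed bodies.
        finally:
            callback_queue.task_done()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Enqueue the raw body (a cheap in-memory operation), then
        # acknowledge right away. No parsing or business logic runs here.
        callback_queue.put(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # Keep per-request logging off the hot path in this sketch.


def start_worker():
    """Start one background worker thread and return it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def serve(port=8080):
    """Run the endpoint; the port is an arbitrary illustrative choice."""
    start_worker()
    ThreadingHTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling serve() starts the worker and the endpoint. Because the handler only reads the body and enqueues it, response time stays independent of how long the business logic takes.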

We recommend always configuring Webhooks, even if your business use case is only to send notification messages, because it's important to listen to customers. The Webhook also receives error message notifications.

When a Webhook is configured, make sure that it follows the recommendations above; otherwise, it can break the system under heavy load. Although configuring Webhooks is important, if you do not need them it is better for performance to have no Webhook than an incorrectly configured one.

Mark Messages as Read

The Coreapp dispatches user messages to the Webhook. Once the messages are received, you can choose to mark messages as read (for instance, blue tick marks in the consumer version of WhatsApp). If the Webhooks are configured correctly and message volumes are within the limits, this can create a good user experience. However, if volumes are almost hitting thresholds on the Coreapp, you may not want to implement this.

Media Auto-download

When a message with media is received, the WhatsApp Business API client downloads the media. Once the media is downloaded, you receive a notification through your webhook. By default, media of all types are automatically downloaded. To save bandwidth and Coreapp workloads, you can choose to enable auto-download only for media types you intend to process.
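As an illustration, the helper below builds a settings body that limits auto-download to selected media types. It is a sketch that assumes the application settings accept a media.auto_download list, as we understand the On-Premises application settings; the function name is hypothetical, and the exact field names and allowed values should be verified against the settings reference before use.

```python
def build_auto_download_settings(media_types):
    """Build a settings body enabling auto-download for the given types only.

    Assumption: the application settings accept {"media": {"auto_download": [...]}}.
    """
    # De-duplicate and sort so the resulting body is deterministic.
    return {"media": {"auto_download": sorted(set(media_types))}}


# Example: only images and documents are processed by the business.
settings_body = build_auto_download_settings(["image", "document", "image"])
```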

Strategies

Check Contacts

A check contacts API call is only required for one-way notifications. If you are only replying to incoming messages, you do not need to check the contact before replying.

If your campaign is outbound focused, it is recommended that you run contact checks on all contacts that you plan to send messages to in advance. For example, if you send significantly more outbound messages than replying to inbound messages, you may want to run contact checks overnight.

When you pre-run contact check before the campaign:

You can break down all contacts into batches and check each batch of contacts with one request.
Check contact results are cached in the database for 7 days. To ensure a fresh 7 days of cached data, you may use the force_check parameter while pre-running contact check.

During the campaign, as the check contact result has been cached, you may skip the check before sending messages.
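The pre-campaign batching described above might look like the following sketch. It only builds the request bodies and does not send them. The batch size is an assumption (this document does not specify one), the function names are hypothetical, and the blocking field reflects our understanding of the contacts node, so confirm it against the contacts reference.

```python
from typing import Iterator, List


def contact_batches(contacts: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size contacts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(contacts), batch_size):
        yield contacts[start:start + batch_size]


def build_contacts_request(batch: List[str], force_check: bool = True) -> dict:
    """Build one contacts request body for a batch of phone numbers.

    force_check=True asks for a fresh check instead of a cached result,
    which restarts the 7-day cache window mentioned above.
    """
    return {"blocking": "wait", "contacts": batch, "force_check": force_check}


# Example with placeholder numbers and an illustrative batch size of 2.
numbers = ["+15550000001", "+15550000002", "+15550000003",
           "+15550000004", "+15550000005"]
bodies = [build_contacts_request(b) for b in contact_batches(numbers, 2)]
```

Each body in bodies corresponds to one request, so five contacts with a batch size of 2 produce three requests.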

Media Messages

Media messages reduce throughput by a significant factor. Avoid sending media messages when you need peak throughput.

Sending a Single Media File to Many Users

You may consider media messages if you can send the same media file to many or all users (for example, your company logo). There are two ways to send media: by media ID or by media URL. In this case, send media only by media ID, because the media can be uploaded once to the Coreapp and reused many times. If you send media by URL, the media file is downloaded each time.

Although the upload only happens once, there are additional costly computations that need to be done in the Coreapp to send media messages, so use this option only if necessary. It is recommended that you use only text messages under high load. You may create a feature flag to toggle sending media messages on and off.
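A feature flag of this kind could be sketched as below. The function name and the media_enabled flag are hypothetical, and the payload shapes follow our understanding of the messages node (an image referenced by a previously uploaded media ID, or a plain text body); check them against the messages reference.

```python
from typing import Optional


def build_message(to: str, text: str, media_id: Optional[str] = None,
                  media_enabled: bool = True) -> dict:
    """Build a message body, falling back to text when media is switched off."""
    if media_enabled and media_id is not None:
        # Reuse an already-uploaded media ID rather than a URL.
        return {
            "recipient_type": "individual",
            "to": to,
            "type": "image",
            "image": {"id": media_id, "caption": text},
        }
    # Under high load (flag off) or without a media ID, send text only.
    return {
        "recipient_type": "individual",
        "to": to,
        "type": "text",
        "text": {"body": text},
    }


# Example: the same call site yields text once the flag is turned off.
with_media = build_message("15550000001", "Hello", media_id="media-id-placeholder")
text_only = build_message("15550000001", "Hello", media_id="media-id-placeholder",
                          media_enabled=False)
```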

Launching

It's strongly recommended you roll out your launch in a graduated manner. Always start with launching to a fraction of your users; learn the behaviors of traffic, then do a wider launch. Similarly, if you are launching globally, consider initially launching to a region with lower traffic, then gradually adding other regions.

Multiple Phone Numbers

If throughput needs are much higher than a single phone number can handle, you can consider using multiple phone numbers.

Follow all of the above recommendations for a single phone number.
Do not use the same database instance to host multiple phone numbers.
Once a phone number is launched to send messages, deprecating it is likely to lead to a bad user experience. Launch only the set of phone numbers you need, and plan carefully how to migrate traffic away gracefully if you need to deprecate a phone number.

Strategies to split traffic among multiple phone numbers

Choose a rule to split traffic that makes sense for your use case, for example:
- Language-based — One phone number answers English questions while a second phone number answers in Spanish
- Region-based — One phone number answers for users in the United States while a second phone number answers users in India

Customer care messages

If you are driving incoming traffic through a www.wa.me/$phone URL, add one more level of redirection through your HTTP server. The server's responsibility is to determine which phone numbers can serve the incoming request and to pick the correct one using your “split-traffic” strategy. For example, if you use 2 phone numbers, set up the entry point as example.com/whatsapp, and when a user clicks on the link, evenly redirect them to wa.me/$phone1 or wa.me/$phone2 under the hood.
If a user is already messaging a phone number that is receiving unexpectedly high traffic or is hitting its load limits, redirect the user to a different phone number.
During launch, if you anticipate traffic spikes, keep one additional phone number running a WhatsApp Business API client that is ready to serve the traffic.
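The redirection logic above can be sketched as a small round-robin router that skips numbers marked as overloaded. The class and method names are hypothetical, and how a number is detected as overloaded is left to your own monitoring. The sketch only chooses the target URL; wiring it into an HTTP redirect is up to your server.

```python
import itertools
import threading


class NumberRouter:
    """Evenly rotate wa.me redirects across phone numbers, skipping busy ones."""

    def __init__(self, phone_numbers):
        self._numbers = list(phone_numbers)
        if not self._numbers:
            raise ValueError("at least one phone number is required")
        self._cycle = itertools.cycle(self._numbers)
        self._overloaded = set()
        self._lock = threading.Lock()  # Requests may arrive concurrently.

    def mark_overloaded(self, phone, overloaded=True):
        """Flag or unflag a number based on your own load monitoring."""
        with self._lock:
            if overloaded:
                self._overloaded.add(phone)
            else:
                self._overloaded.discard(phone)

    def next_redirect_url(self):
        """Return the wa.me URL for the next number that is not overloaded."""
        with self._lock:
            for _ in range(len(self._numbers)):
                phone = next(self._cycle)
                if phone not in self._overloaded:
                    return f"https://wa.me/{phone}"
            # Every number is overloaded: fall back to plain rotation.
            return f"https://wa.me/{next(self._cycle)}"


# Example with two placeholder numbers.
router = NumberRouter(["15550000001", "15550000002"])
first_four = [router.next_redirect_url() for _ in range(4)]
router.mark_overloaded("15550000001")
after_overload = [router.next_redirect_url() for _ in range(2)]
```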

Notifications

For notifications, you can estimate how many users each phone number can handle and divide the traffic accordingly. When the system receives unexpectedly high traffic, you can spin up a new WhatsApp Business API client with a different phone number.
To prevent negative feedback (i.e., spam/blocks), it’s important to map each recipient number to a single WhatsApp Business API client phone number. For example, if multiple phone numbers are hosted for ABC CupCakes and a user placed two orders for cupcakes, they probably expect to receive these two order notifications from a single WhatsApp Business API client number rather than a different number each time.
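One way to keep each recipient on a single sender number is a deterministic hash of the recipient number, sketched below. The function name is hypothetical. Note that this simple modulo scheme reassigns some recipients whenever the pool of sender numbers changes, so a persisted recipient-to-sender mapping may be preferable if you expect to add or remove numbers.

```python
import hashlib
from typing import Sequence


def sender_for_recipient(recipient: str, sender_numbers: Sequence[str]) -> str:
    """Pick the same sender number for a given recipient on every call."""
    if not sender_numbers:
        raise ValueError("at least one sender number is required")
    # SHA-256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(recipient.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(sender_numbers)
    return sender_numbers[index]


# Example: both order notifications for one customer use the same sender.
pool = ["15550000001", "15550000002", "15550000003"]
first_order_sender = sender_for_recipient("15551230000", pool)
second_order_sender = sender_for_recipient("15551230000", pool)
```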