We are sunsetting On-Premises API. Refer to our On-Premises API Sunset document for details, and to learn how to migrate to our next-generation Cloud API.

Demystifying Send Message Performance

This doc will be deprecated soon. For higher performance needs, please refer to the High Throughput Recommendations documentation.

The WhatsApp Business API client allows your business to communicate with your customers in a programmatic way over WhatsApp. When you install the API client, it comes with a Webapp container and Coreapp container.

  • The Coreapp is primarily responsible for all the backend operations such as sending messages via the WhatsApp servers, processing callbacks received from the WhatsApp servers, saving incoming messages/callbacks to the database, etc.
  • The Webapp is primarily responsible for hosting the API, enabling communication with the Coreapp.

This document covers:

Background

Check Contacts

To send a message to a recipient, you first need to check whether the recipient has a valid WhatsApp account.

When you send an API call to the contacts endpoint, the Coreapp talks to the WhatsApp servers to verify that the recipient has a WhatsApp account, then caches that status in the database for 7 days. Because of this cache, the Coreapp does not need to check the recipient's account status with the server again for the next 7 days. A side effect is that the Coreapp may accept a message for a WhatsApp account that was deleted within the last 7 days. In this case, no error messages are returned to the WhatsApp Business API client; the message will go to the WhatsApp servers, but will not be sent on to the recipient.
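The 7-day cache rule can be sketched as a simple freshness check. This is a minimal illustration, not the Coreapp's actual implementation; the function name and the choice to treat an entry that is exactly 7 days old as expired are assumptions.

```python
from datetime import datetime, timedelta

# Cache lifetime stated in the text above.
CONTACT_CACHE_TTL = timedelta(days=7)


def needs_server_check(last_checked, now):
    """Return True when the cached status is missing or at least 7 days old.

    `last_checked` is None when the number has never been checked.
    Treating an entry that is exactly 7 days old as expired is an assumption.
    """
    if last_checked is None:
        return True
    return now - last_checked >= CONTACT_CACHE_TTL


now = datetime(2024, 1, 10)
print(needs_server_check(datetime(2024, 1, 5), now))  # cached 5 days ago
print(needs_server_check(datetime(2024, 1, 1), now))  # cached 9 days ago
```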

It is also useful to know that you can check more than one phone number with a single contacts API call. Similarly, the Coreapp can include more than one phone number in its calls to the WhatsApp servers. The WhatsApp servers limit the Coreapp to one contacts call in progress at any given time, while the Coreapp rate limits the WhatsApp Business API client to 40 calls/sec. While a contacts call from the Coreapp to the server is in progress, the Coreapp accumulates all incoming check contacts requests; once it receives a response from the servers, it makes the next contacts call with all the accumulated phone numbers.
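The accumulate-then-send behavior described above can be modeled with a small toy class. This is only a sketch of the described behavior under simplifying assumptions (single-threaded, no errors or timeouts); the class and method names are hypothetical and do not correspond to anything in the Coreapp.

```python
class ContactsCoalescer:
    """Toy model: one server call in flight; numbers arriving meanwhile are merged."""

    def __init__(self):
        self.in_flight = None   # numbers in the call currently at the server
        self.pending = []       # numbers accumulated while that call is in progress
        self.server_calls = []  # history of batches sent, kept for inspection

    def submit(self, numbers):
        """Accept phone numbers from a client contacts request."""
        self.pending.extend(numbers)
        if self.in_flight is None:
            self._send_next()

    def on_server_response(self):
        """The in-flight call finished; send everything accumulated, if any."""
        self.in_flight = None
        if self.pending:
            self._send_next()

    def _send_next(self):
        # Move all accumulated numbers into a single server call.
        self.in_flight = list(self.pending)
        self.pending.clear()
        self.server_calls.append(self.in_flight)


coalescer = ContactsCoalescer()
coalescer.submit(["+15550000001"])                  # sent immediately
coalescer.submit(["+15550000002"])                  # queued, a call is in flight
coalescer.submit(["+15550000003", "+15550000004"])  # queued as well
coalescer.on_server_response()                      # one call carries all three
print(len(coalescer.server_calls))
```

Three client requests result in two server calls here, which is why the server-side contacts rate can be lower than the client-side rate.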

Send Message

When a message is sent using the messages endpoint, the Coreapp checks the database to verify that the recipient has an existing WhatsApp account. If the account's status is valid, it persists the message in the job queue (database), then asynchronously issues the message ID to the client and attempts to send the message to the recipient number via the servers.

Each recipient in WhatsApp has two keys: a public key and a private key. Public keys are shared with senders so that the sender can encrypt the message; the recipient decrypts it using the private key. When your business sends messages with the WhatsApp Business API, the sender is the Coreapp. The first time the Coreapp messages a recipient, a public key exchange takes place between the Coreapp and the recipient, and these keys are then used to encrypt and decrypt subsequent messages. Before that first exchange, however, the Coreapp has no recipient key to encrypt with. In this case, the Coreapp requests a pre-key from the server, which it can use to encrypt the message. After encrypting the message using the pre-key or public key, the Coreapp attempts to send the message via the servers.

The above explanations highlight that for the Coreapp to send one message successfully, it needs to make up to three successful roundtrip exchanges with the server:

  1. Checking whether the recipient has a valid WhatsApp account with a contacts API call,
  2. Fetching the pre-key from the server, if it's the first time the Coreapp is communicating with the recipient, in order to encrypt the message, and
  3. Sending the message to the recipient.
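The count of roundtrips listed above can be expressed as a small function. This is an illustrative sketch only; the function and parameter names are hypothetical.

```python
def server_roundtrips(contact_cached, has_recipient_key):
    """Count the Coreapp-to-server roundtrips needed to send one message."""
    trips = 1  # sending the message itself is always required
    if not contact_cached:
        trips += 1  # contacts check against the WhatsApp servers
    if not has_recipient_key:
        trips += 1  # pre-key fetch for a first-time recipient
    return trips


print(server_roundtrips(contact_cached=False, has_recipient_key=False))  # worst case
print(server_roundtrips(contact_cached=True, has_recipient_key=True))    # best case
```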

Performance

Using the WhatsApp Business API client, a throughput of 70 messages per second (mps) can be achieved. We tested this performance using the setup specified below.

Coreapp performance might degrade if the volume of messaging is above 70 mps, so the number of API requests to the messages endpoint is rate limited to 80 per second. Request rate limits are implemented for each TCP endpoint, and messages and contacts calls run on different TCP endpoints. This means it's possible to make both 80 messages calls per second and 40 check contacts calls per second without being rate limited.
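One way for a client to stay under these limits is to pace its own requests. The sketch below is a hypothetical client-side helper, not part of the WhatsApp Business API client; it spaces calls evenly and takes an injectable clock and sleep function so it can be tested without waiting.

```python
import time


class Pacer:
    """Spaces calls evenly so that at most `rate_per_sec` are issued per second."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        if self.next_slot is None or now >= self.next_slot:
            # Already past the slot: go ahead and schedule the following one.
            self.next_slot = now + self.interval
            return 0.0
        delay = self.next_slot - now
        self.sleep(delay)
        self.next_slot += self.interval
        return delay


# Separate pacers, because the two endpoints are rate limited independently.
messages_pacer = Pacer(70)  # tested throughput, below the 80/sec limit
contacts_pacer = Pacer(40)  # contacts limit stated above
```

A caller would invoke `messages_pacer.wait()` before each messages request and `contacts_pacer.wait()` before each contacts request.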

The report below covers 70 mps, with the expected latencies and throughput of the different system components that interact with the Coreapp. You can use these numbers as benchmarks to compare against your own setup.

Setup used

Client

We measured performance on AWS infrastructure, using the AWS template to launch a WhatsApp Business API client. Below are the configuration parameters used when launching this template.

General configuration

  EnvType                      production
  HAEnabled                    disabled
  NumCoreappInstances          2

Network configuration

  VpcId                        ****
  SubnetIDs                    ****
  SubnetCount                  2
  LBScheme                     internet-facing
  LBSubnets                    ****

Container configuration

  InstanceType                 c4.4xlarge
  EBSVolumeSize                128
  KeyName                      ****
  WAEntContRegistry            docker.whatsapp.biz
  WAEntContTag                 v2.31.4

Database configuration

  ConfigOnDB                   TRUE
  DBEngineVersion              5.7.26
  DBHostname
  DBIOPS                       10000
  DBIdleConnectionTimeoutMS    180000
  DBMultiAZEnabled             disabled
  DBStorageCapacity            1024
  DBStorageType                io1
  DBInstanceClass              db.m5.12xlarge
  DBUser                       ****
  DBPassword                   ****
  DBPort                       3306
  PersistDBConn                enabled

* Remaining configuration left with default values

Tests

The tests are designed to generate traffic simulating real-world usage of the WhatsApp Business API client.

A total of 4 concurrent test clients generate messages, each sending a message and waiting for a 200 OK response. As soon as the 200 OK response is received, the test client sends the next message without any delay. For each test run, we sent a total of 10,000 messages.
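The shape of such a test client can be sketched as follows. This is not the test harness used for the results in this document; `send_message` is a hypothetical callable that stands in for the HTTP request to the messages endpoint and returns the HTTP status code.

```python
import threading


def run_load_test(send_message, total_messages, num_clients=4):
    """Send `total_messages` using `num_clients` workers.

    Each worker sends a message, waits for the response, then immediately
    sends the next one. Returns the number of 200 responses received.
    """
    remaining = total_messages
    ok = 0
    lock = threading.Lock()

    def worker():
        nonlocal remaining, ok
        while True:
            with lock:
                if remaining == 0:
                    return
                remaining -= 1
            status = send_message()  # blocks until the response arrives
            if status == 200:
                with lock:
                    ok += 1

    threads = [threading.Thread(target=worker) for _ in range(num_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return ok


# Stand-in sender that always succeeds; a real client would issue an HTTP call.
print(run_load_test(lambda: 200, total_messages=100))
```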

Further, the test clients make a contacts API call for each message before sending. Each test run generally took around 8-9 minutes to complete. The results below are based on these tests. For each test run, we cleared the contacts cache and all pre-keys; this means the Coreapp has to make two additional roundtrip exchanges with the servers for these two operations.

Monitoring

It's important to configure a monitoring setup for the Coreapp, as it helps measure the performance of the Coreapp and its interactions with other system components. While you can build your own monitoring setup by making metrics and stats API calls, we strongly recommend setting up the instance monitoring provided by WhatsApp. This is the setup used for this demonstration.

Results

Terminology

Definitions of terms for each test operation as seen in the following graphs:

  • messaging:ok — A messages call from the WhatsApp Business API client to the Coreapp
  • contacts:ok — A contacts call from the WhatsApp Business API client to the Coreapp
  • UnifiedSyncResult:ok — A contacts call from the Coreapp to the server
  • SendMessage:ok — A messages call from the Coreapp to the server
  • SendGetPreKeyBatchResult:ok — A pre-key fetch call from the Coreapp to the server

Endpoint requests/sec

[Graph: Endpoint requests/sec]

The number of operations per second (OPS) for messaging is a constant 70 for the entire test duration.

Average server request latency

[Graph: Average server request latency]

The latency for each server call may not be the same because the kind of work the server needs to perform for each operation is different. But, in general, these latency values are in the range of 80 to 150 ms. If the latency values with the servers are much higher than this, you should focus on debugging and correcting your configuration.

Server requests/sec

[Graph: Server requests/sec]

The OPS for sending a message from the Coreapp to the server (SendMessage:ok) is almost always the same as the throughput of messages calls from the WhatsApp Business API client to the Coreapp. The OPS for a pre-key fetch (SendGetPreKeyBatchResult:ok) may be less than or equal to the OPS for sending a message, because a pre-key fetch call to the server is not required when the Coreapp has messaged the recipient before.

The OPS for checking contacts (UnifiedSyncResult:ok) will be less than or equal to the OPS for sending a message. This is because only one contacts call from the Coreapp to the servers can be in progress at a time. While a contacts call is in progress, the Coreapp consolidates all incoming contacts requests and sends them in the next call, after receiving the response to the previous one.

Database write latency

[Graph: Database write latency]

One of the primary factors in performance is the database. It's important that the database is located as close to the Coreapp as possible. Sending messages and processing callbacks both generate heavy IO on the database. This sample of IO write latency, taken at a throughput of 20 mps, suggests that the recommended database write latency is ~5 ms.

Callback Latency

For each message, there can be up to three callbacks from the WhatsApp servers to the Coreapp: sent, delivered, and read notifications. When callbacks are received, the Coreapp persists them in the database, asynchronously returns success to the WhatsApp server, and forwards them to the callback server. As the number of callbacks increases, so does the number of writes to the database; this is another reason to keep monitoring database latency.

Here is the callback latency graph from our testing. Note that it covers only sent notifications, not delivered or read notifications.

[Graph: Callback latency]

The default size of the callback queue is 100,000. If callback server latency and callback volume are both high, callbacks accumulate in the callback queue. When the callback queue is full, the Coreapp stops accepting all messages API calls but continues to accept callbacks from the server, appending them to the callback queue. The Coreapp's throughput for sending messages then drops to zero until the queue has drained to a certain point. This is why it's important that the latency with your callback server falls into the above range.
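To get a feel for how quickly a slow callback server can fill the queue, the following sketch estimates the time until the queue is full. It is a simple linear model under assumed constant rates; the function name and the example rates are hypothetical, and only the 100,000 capacity comes from the text above.

```python
def seconds_until_queue_full(capacity, arrival_per_sec, drain_per_sec, backlog=0):
    """Estimate seconds until the callback queue fills, or None if it never does.

    Assumes constant arrival and drain rates, which real traffic will not have.
    """
    if backlog >= capacity:
        return 0.0
    growth = arrival_per_sec - drain_per_sec
    if growth <= 0:
        return None  # the callback server keeps up, so the queue does not grow
    return (capacity - backlog) / growth


# Illustrative rates only: 70 callbacks/sec arriving, 50/sec forwarded.
print(seconds_until_queue_full(100_000, arrival_per_sec=70, drain_per_sec=50))
```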

Job queue performance

[Graph: Job queue performance]

The job queue of the Coreapp is almost flat, with ~1% usage for the entire test duration.

mysqlslap

Please refer to the Debugging section below to learn about mysqlslap and its recommended configuration.

Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 5.648 seconds
    Minimum number of seconds to run all queries: 5.648 seconds
    Maximum number of seconds to run all queries: 5.648 seconds
    Number of clients running queries: 5
    Average number of queries per client: 1000

Running the suggested query against the Single-AZ database took ~5 seconds. In other words, each of the 5 concurrent clients took approximately 5 seconds to execute its 1000 writes. This means each write operation took ~5 ms on average.
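The per-write figure can be derived directly from the mysqlslap report. The helper below is a hypothetical sketch that parses the two relevant lines of the output shown above; it assumes each client runs its queries sequentially, so the average run time divided by the queries per client approximates the latency of one write.

```python
import re


def per_query_ms(report):
    """Return the average milliseconds per query from a mysqlslap report."""
    avg = re.search(r"Average number of seconds to run all queries: ([\d.]+)", report)
    per_client = re.search(r"Average number of queries per client: (\d+)", report)
    if avg is None or per_client is None:
        raise ValueError("report does not contain the expected mysqlslap lines")
    return float(avg.group(1)) / int(per_client.group(1)) * 1000.0


single_az_report = """
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 5.648 seconds
    Number of clients running queries: 5
    Average number of queries per client: 1000
"""
print(f"{per_query_ms(single_az_report):.3f} ms per write")
```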

Common pitfalls

The way a database is configured is important to achieving high throughput. In this section, we show how a single database setting can affect Coreapp performance, and we publish the resulting degraded metrics. These should make it easier to compare against your own metrics when Coreapp performance is not ideal.

Multi-AZ vs. Single-AZ

As mentioned, we used AWS RDS as the database for our testing. Creating the RDS instance with Multi-AZ reduced the throughput of the Coreapp by at least 25%. The reduction is most noticeable when sustaining peak load for longer periods.

Multi-AZ: Amazon RDS provides high availability and failover support for database instances using Multi-AZ deployments (see High Availability (Multi-AZ) for Amazon RDS for more information). The important part, AWS claims, is that database instances using Multi-AZ deployments may have increased write and commit latency compared to a Single-AZ deployment due to the synchronous data replication that occurs, and this impact is more noticeable for large and write-intensive database instances.

Database write latency

[Graph: Database write latency]

According to the "Amazon RDS Under the Hood: Multi-AZ" blog post, Multi-AZ increases database commit latencies by between 2 ms and 5 ms. With our test client, which has a write-intensive load, the commit latencies are ~15 ms, roughly 3 times the ~5 ms measured with Single-AZ. The above graph shows the average database write latency while using Multi-AZ.

Endpoint requests/sec

[Graph: Endpoint requests/sec]

Here is the number of operations per second (OPS) for messaging when using Multi-AZ during one of our test runs. It sustained ~12 OPS. Across database configurations with different storage types (magnetic vs. SSD), the throughput of the Coreapp fluctuated between 10 and 15 OPS when using Multi-AZ. This is a large drop from the 20 OPS achieved with Single-AZ.

CloudWatch write IOPS

[Graph: CloudWatch write IOPS]

Here is the CloudWatch graph comparing the write IOPS of Multi-AZ and Single-AZ, with the rest of the setup held constant. While Single-AZ could reach more than 1800 IOPS, Multi-AZ barely reaches 1000 IOPS. This reduction in database write IOPS affects the performance of the WhatsApp Business API client. In our testing, Coreapp performance dropped from 20 mps to ~10 to 15 mps.

Job queue performance

[Graph: Job queue performance]

The above is the job queue utilization with Multi-AZ turned on.

At its best, the Coreapp using Multi-AZ delivers ~15 mps, and we can see the job queue reaching its peak under intense load. By design, once the job queue reaches its limit, the Coreapp waits until 50% of the queue has been served before accepting new jobs.
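The stop-and-resume behavior of the job queue can be illustrated with a small model. This is a sketch of the rule as described, not the Coreapp's implementation; the class name is hypothetical, and interpreting "50% of the queue is served" as "the queue has drained to half its capacity" is an assumption.

```python
class JobQueueGate:
    """Toy model: stop accepting jobs when full, resume once half has drained."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.size = 0
        self.blocked = False

    def try_enqueue(self):
        """Return True if the job was accepted, False if the queue is blocked."""
        if self.blocked:
            return False
        self.size += 1
        if self.size >= self.capacity:
            self.blocked = True  # limit reached: reject jobs until half is served
        return True

    def complete_one(self):
        """Mark one queued job as served."""
        if self.size > 0:
            self.size -= 1
        if self.blocked and self.size <= self.capacity // 2:
            self.blocked = False  # assumed resume point: half of capacity


gate = JobQueueGate(capacity=4)
accepted = [gate.try_enqueue() for _ in range(5)]
print(accepted)  # the fifth job is rejected because the queue is full
```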

On the other hand, the job queue of the Coreapp using Single-AZ is almost a flat ~1% usage. This is a large difference in the performance of the job queue between using Single-AZ and Multi-AZ. These results should help us understand the important role the database plays in Coreapp performance.

mysqlslap

Please refer to the Debugging section below to learn about mysqlslap and its recommended configuration.

Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 15.581 seconds
    Minimum number of seconds to run all queries: 15.581 seconds
    Maximum number of seconds to run all queries: 15.581 seconds
    Number of clients running queries: 5
    Average number of queries per client: 1000

Running the suggested query against the database with Multi-AZ turned on took ~15 seconds. In other words, each of the 5 concurrent clients took approximately 15 seconds to execute its 1000 writes. This means each write operation took ~15 ms on average, compared with ~5 ms for Single-AZ.
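The two mysqlslap runs in this document can be compared with a one-line calculation. The function name is hypothetical; the inputs are the average run times reported above for Single-AZ (5.648 seconds) and Multi-AZ (15.581 seconds).

```python
def slowdown(baseline_seconds, measured_seconds):
    """Return how many times slower the measured run is than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return measured_seconds / baseline_seconds


single_az_seconds = 5.648
multi_az_seconds = 15.581
print(f"Multi-AZ is about {slowdown(single_az_seconds, multi_az_seconds):.2f}x slower")
```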

Debugging

Debugging database latency

The mysqlslap tool simulates client load on a MySQL server by emulating multiple clients accessing the server at once. It then reports the overall time taken to execute those queries.

Query

mysqlslap -uWA_DB_USERNAME -pWA_DB_PASSWORD -h WA_DB_HOSTNAME -P WA_DB_PORT --auto-generate-sql --concurrency=5 --number-int-cols=5 --auto-generate-sql-load-type=write --auto-generate-sql-secondary-indexes=2 --auto-generate-sql-execute-number=1000 --engine=innodb --commit=1 --auto-generate-sql-add-autoincrement --auto-generate-sql-unique-write-number=200 --verbose

Example query values

mysqlslap -uroot -pmypassword -h dbhostname.rds.amazonaws.com -P 3306 --auto-generate-sql --concurrency=5 --number-int-cols=5 --auto-generate-sql-load-type=write --auto-generate-sql-secondary-indexes=2 --auto-generate-sql-execute-number=1000 --engine=innodb  --commit=1  --auto-generate-sql-add-autoincrement --auto-generate-sql-unique-write-number=200 --verbose

Make sure to run these queries from the machine running the Coreapp. This query simulates 5 concurrent clients, each performing 1000 inserts into the database (the sample outputs in this document report an average of 1000 queries per client).
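If you script this check, building the command as an argument list avoids shell quoting problems and copy-paste issues such as typographic dashes replacing "--". The helper below is a hypothetical sketch; it uses only the flags from the query above, and the function and parameter names are not part of any WhatsApp or MySQL tooling.

```python
def build_mysqlslap_args(user, password, host, port, concurrency=5, execute_number=1000):
    """Build the mysqlslap argument list used in this section."""
    return [
        "mysqlslap",
        f"-u{user}",
        f"-p{password}",
        "-h", host,
        "-P", str(port),
        "--auto-generate-sql",
        f"--concurrency={concurrency}",
        "--number-int-cols=5",
        "--auto-generate-sql-load-type=write",
        "--auto-generate-sql-secondary-indexes=2",
        f"--auto-generate-sql-execute-number={execute_number}",
        "--engine=innodb",
        "--commit=1",
        "--auto-generate-sql-add-autoincrement",
        "--auto-generate-sql-unique-write-number=200",
        "--verbose",
    ]


# Placeholder values matching the example query above.
args = build_mysqlslap_args("root", "mypassword", "dbhostname.rds.amazonaws.com", 3306)
print(" ".join(args))
```

The list can then be passed to `subprocess.run(args, capture_output=True, text=True)` on the machine running the Coreapp.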

Debugging callback server latency

The WhatsApp Business API troubleshooting tool (wadebug) can be used to debug callback server latency.

Snapshots

In the above sections, we presented the critical system components that contribute to performance. If you would like to look at all the metrics, or at more detailed metrics for the above components, there are snapshots of those results:

For these two tests, a total of 10,000 templated messages were sent.

Also, here is the snapshot for the long-running test. In this test, a total of 100,000 templated messages were sent.

Delays

If only a subset of numbers experiences delays, the issue is likely on the recipient's end rather than in the customer's integration. These delays in delivery can happen for a number of reasons, including:

  • Phone is off
  • Phone is in airplane mode, or mobile data is off, unavailable, or on a bad connection
  • Background data is turned off for WhatsApp (in this case messages are only delivered when WhatsApp is open on the phone)