This article was written in collaboration with Aleksandr Sergeev, a Software Engineer at Meta, and Jesslyn Tannady, a Developer Advocate at Meta.
For today's interview, we have Aleksandr Sergeev, a software engineer on the Facebook iOS Reliability Team. The Facebook iOS Reliability Team focuses on ensuring that the Facebook mobile app on iOS works reliably and predictably. You can find his LinkedIn here.
My days tend to fall into two categories: low-urgency days and high-urgency days.
On my low-urgency days, I do work that must be done but does not necessarily need immediate attention. This type of work makes up about 90% of my days. It may be working on systems to prevent bugs from getting into production or creating tools that help fix bugs already in production. Additionally, Meta has systems that monitor crashes and notify the person responsible for fixing each crash. However, when the app crashes in a novel way, the system may not know whom to notify. Reports like this come to Reliability Engineers like me. It is my job to triage the issue to the person who owns the affected module so that it gets fixed; if the owners have a hard time fixing the problem, it is my job to help them. Even though it is not very fun to work in crisis mode, the reward of seeing something important fixed right there and then is more immediate.
On my high-urgency days, I am putting out fires, such as an outage on the Facebook iOS app. I untangle crash logs, read code, and debug on those days. I then either switch unfruitful tests off, write code to fix the issue or report what I learned about the issue to other investigators. Most of the code I work on to optimize app performance lives behind experiments.
Often when we're writing code and rolling out new features, we hypothesize how this will affect our users' experience, but we can never be sure. We generally A/B test feature rollouts to gather metrics like performance and bandwidth usage. We test new functionality with a small group of users in our experiment sandboxes, and we then roll it out to our broader user base if the results are favorable. Because the code lives behind experiments, we can rapidly alter the application's behavior without recompiling it.
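To make the idea of code living behind an experiment concrete, here is a minimal C++ sketch. It is not Meta's tooling: `ExperimentConfig`, `loadFeed` and the `new_feed_loader` flag are hypothetical names, and the flag values are assumed to come from a server in a real app.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical in-memory experiment store. In a real app the values would be
// fetched remotely, which is what lets behavior change without a rebuild.
class ExperimentConfig {
public:
    void set(const std::string& name, bool enabled) { flags_[name] = enabled; }

    // Unknown experiments default to "off" so untested code stays disabled.
    bool isEnabled(const std::string& name) const {
        auto it = flags_.find(name);
        return it != flags_.end() && it->second;
    }

private:
    std::unordered_map<std::string, bool> flags_;
};

// Picks the code path based on the flag value at call time.
std::string loadFeed(const ExperimentConfig& config) {
    if (config.isEnabled("new_feed_loader")) {
        return "new loader";
    }
    return "old loader";
}
```

Flipping the flag back to `false` returns every caller to the old path, which is the same mechanism that lets a problematic experiment be switched off.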
Meta is a very metrics-driven company. We're constantly trying to innovate, but we need to ensure that code changes that affect our products result in positive outcomes. As a mobile developer at Meta, the code I write often touches large-scale projects, and I can easily see the ways my work impacts real users.
When working with the Facebook iOS codebase, I primarily work with Objective-C, C++ and a mix of them (Objective-C++). I use Xcode as a dev environment, LLDB as a debugger and Meta's internal tools for data analysis, analyzing error logs and A/B testing.
Even though Swift is now the recommended language for developing iOS apps, many libraries for building iOS apps were designed with older languages like Objective-C and C++ in mind. Xcode is an excellent dev environment when you're building iOS apps. Apple made it, and it's generally easier to develop within one ecosystem. I like that LLDB, Xcode's default debugger, comes with excellent support for debugging on the desktop, on iOS devices and in the simulator.
If you're interested in learning more about Objective-C++ for mobile development, I recommend checking out ComponentKit, a declarative UI framework for iOS. It has a great Getting Started guide, and since ComponentKit is open source, you can check out the source code in its GitHub repo.
I recently fixed a problem with UICollectionView's delegate that was known to cause issues. But first, let me start by explaining UICollectionView and delegate objects.
UICollectionView is a class in Apple's UIKit framework that manages data items and presents them using customizable layouts. We use UICollectionView in the Facebook iOS app (e.g., to display items in Newsfeed).
Each collection view may have one and only one delegate object associated with it. However, we sometimes need more than one delegate to keep the code clean. For example, we might need one object to collapse the navigation bar when a user scrolls through Newsfeed and another one to handle taps on items in Newsfeed.
Each delegate has a contract (an Objective-C protocol) that it needs to conform to. However, this contract is pretty relaxed: all the delegate's methods are "optional," so technically, a developer does not have to implement any of these methods.
UICollectionView can figure out which methods have been implemented, but this process is time-consuming; therefore, UICollectionView keeps a cache of methods that the delegate implements.
Here's where things get complicated. Any delegate in a chain of delegates can disappear at any point, i.e., be nullified unexpectedly. Delegates may get nullified because they are stored as weak references, and it's common to use weak references to avoid retain cycles and memory leaks. In some cases, when the second delegate in the chain got nullified, our code did not handle the unexpected nullification correctly.
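The failure mode can be modeled outside UIKit. The C++ sketch below is only an analogy, not UICollectionView's real implementation: `ToyCollectionView`, `ProxyDelegate`, `TapHandler` and `SendResult` are hypothetical names, `std::weak_ptr` stands in for an Objective-C weak reference, and `respondsToTap` stands in for the check of which optional methods a delegate implements.

```cpp
#include <cassert>
#include <memory>

enum class SendResult { Delivered, Skipped, Failed };

// Second delegate in the chain: the object that actually handles taps.
struct TapHandler {
    int lastItem = -1;
    void handleTap(int item) { lastItem = item; }
};

// First delegate in the chain; forwards taps to a weakly held TapHandler.
struct ProxyDelegate {
    std::weak_ptr<TapHandler> next;

    // True only while the second delegate is still alive.
    bool respondsToTap() const { return !next.expired(); }

    // Returns false when there is no longer anyone to forward to.
    bool handleTap(int item) {
        if (auto target = next.lock()) {
            target->handleTap(item);
            return true;
        }
        return false;
    }
};

// Toy view that asks its delegate what it implements once and caches the answer.
class ToyCollectionView {
public:
    void setDelegate(ProxyDelegate* d) {
        delegate_ = d;
        cachedRespondsToTap_ = d != nullptr && d->respondsToTap();
    }

    bool cachedRespondsToTap() const { return cachedRespondsToTap_; }

    // Trusts the cache instead of asking the delegate again.
    SendResult sendTap(int item) {
        if (!cachedRespondsToTap_) return SendResult::Skipped;
        return delegate_->handleTap(item) ? SendResult::Delivered
                                          : SendResult::Failed;
    }

private:
    ProxyDelegate* delegate_ = nullptr;
    bool cachedRespondsToTap_ = false;
};
```

Once the `TapHandler` is destroyed, the cached answer is stale: the view still believes taps can be handled, so `sendTap` reaches a broken chain and reports `Failed` rather than `Skipped`.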
This problem used to be rare, but it had recently started popping up more and more often. Something had changed in the lifecycle of our second delegate. Our new Facebook iOS release started behaving unpredictably and throwing somewhat obscure errors. UICollectionView was trying to send a message to its delegate, but the chain of delegates was broken because the second delegate had been nullified, and we were unable to process the message.
The person who wrote this code back in 2017 was no longer at Meta, so I had to do a lot of research into what happened and how to fix it. In the end, the fix was to attach a guarding object to our delegates. The guarding object notifies us when a delegate in the chain is about to be nullified, so that we can tell UICollectionView that our ability to handle messages has changed. UICollectionView then updates its internal cache of the methods that the delegate implements.
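The guarding-object idea can be sketched in C++ as well. This is an illustration under assumptions, not the actual fix: `DeallocGuard`, `attachGuard`, `ToyCollectionView`, `ProxyDelegate` and `TapHandler` are hypothetical names, and re-setting the delegate is assumed to be what makes the view rebuild its cache.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Runs a callback when it is destroyed. It is owned by the object it guards,
// so the callback fires while that object is going away.
class DeallocGuard {
public:
    explicit DeallocGuard(std::function<void()> onDestroy)
        : onDestroy_(std::move(onDestroy)) {}
    ~DeallocGuard() { if (onDestroy_) onDestroy_(); }
    DeallocGuard(const DeallocGuard&) = delete;
    DeallocGuard& operator=(const DeallocGuard&) = delete;

private:
    std::function<void()> onDestroy_;
};

// Second delegate in the chain; carries an optional guard.
struct TapHandler {
    std::unique_ptr<DeallocGuard> guard;
};

// First delegate; holds the second delegate weakly.
struct ProxyDelegate {
    std::weak_ptr<TapHandler> next;
    bool respondsToTap() const { return !next.expired(); }
};

// Toy view that caches what its delegate can handle.
class ToyCollectionView {
public:
    // Setting the delegate (again) recomputes the cache.
    void setDelegate(ProxyDelegate* d) {
        cachedRespondsToTap_ = d != nullptr && d->respondsToTap();
    }
    bool cachedRespondsToTap() const { return cachedRespondsToTap_; }

private:
    bool cachedRespondsToTap_ = false;
};

// Attaches a guard so the view's cache is refreshed when `handler` dies.
// `proxy` and `view` must outlive `handler`.
void attachGuard(TapHandler& handler, ProxyDelegate& proxy,
                 ToyCollectionView& view) {
    handler.guard = std::make_unique<DeallocGuard>([&proxy, &view] {
        proxy.next.reset();        // drop the dying link explicitly
        view.setDelegate(&proxy);  // force the cache to be recomputed
    });
}
```

With the guard attached, destroying the `TapHandler` leaves the view with an up-to-date cache, so it no longer tries to send messages through a broken chain.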
Usually, we can mitigate problems by switching off the experiments that guard the problematic code. However, we had to submit a hotfix to the App Store in this case.
We are still trying to figure out what caused the decrease in lifespan of this specific delegate, and we hope to get to the bottom of this problem soon.
Absolutely. For example, we launched an experiment to improve video performance on the Facebook iOS app. We hypothesized that enabling buffered disk writes for the video cache would reduce disk writes and crashes, and we also hypothesized that this change might slightly increase cache misses.
So we ran the experiment and were pleased to see massive improvements in disk usage. However, we discovered that this came at the cost of higher memory and network usage. Ultimately, we decided to cut the experiment short to evaluate how we could reduce our code's memory and network regressions. We hope to rerun this experiment soon after improving our code.
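The trade-off behind buffered writes can be shown with a small C++ sketch. It is a simplified model, not the video cache itself: `BufferedCacheWriter` is a hypothetical class, and it counts simulated disk writes instead of touching a real file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collects chunks in memory and "writes to disk" only when the buffer is full.
class BufferedCacheWriter {
public:
    explicit BufferedCacheWriter(std::size_t capacity) : capacity_(capacity) {}

    // Appends to the in-memory buffer; flushes once it reaches capacity.
    void write(const std::string& chunk) {
        buffer_ += chunk;
        if (buffer_.size() >= capacity_) flush();
    }

    // Simulates a single disk write for everything buffered so far.
    void flush() {
        if (buffer_.empty()) return;
        bytesOnDisk_ += buffer_.size();
        ++diskWrites_;
        buffer_.clear();
    }

    std::size_t diskWrites() const { return diskWrites_; }
    std::size_t bytesOnDisk() const { return bytesOnDisk_; }
    std::size_t bytesInMemory() const { return buffer_.size(); }

private:
    std::size_t capacity_;
    std::string buffer_;
    std::size_t diskWrites_ = 0;
    std::size_t bytesOnDisk_ = 0;
};
```

Writing ten 2-byte chunks with a capacity of 1 costs ten simulated disk writes, while a capacity of 8 costs two and leaves 4 bytes sitting in memory. Fewer writes are paid for with data that is held in memory and is not yet on disk.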
Yes, I do from time to time. For example, I found a bug in the Clang compiler in 2020. I discovered this bug when I saw an obscure log message that read something close to "__weak variable at 0x7ffeefbff410 holds 0x7ffeefbff430 instead of 0x1006b2750. This is probably an incorrect use of objc_storeWeak() and objc_loadWeak(). Break on objc_weak_error to debug."
I looked at the code and kept stripping lines until I had isolated the 20 lines that were causing the issue. Even though I still didn't understand the problem, I had identified where it was coming from. I shared my findings with the LLVM support group. LLVM is the umbrella project that Clang is part of. Other devs in the support group confirmed that it was a compiler problem.
It's incredible that just identifying the source of a problem adds value. I didn't have to be a Clang expert or know how to fix the problem to be helpful. Discovering bugs like this doesn't happen every day, but it was exciting to be able to contribute to the changes in Clang.
Yes, there are some big misconceptions. For example, I often hear people talk about how mobile development is as simple as parsing JSON files generated by a server and displaying that data on a screen. Modern mobile development is more diverse than that! One can work on a product, infrastructure or application reliability, and all of these are very different and essential jobs.
Mobile developers who work on products focus on features close to the end-user (e.g., Facebook Newsfeed). They typically follow strict feature launch deadlines, so they need tools to make sure they can meet these deadlines.
Mobile developers who work on infrastructure build tools that accelerate feature development. For example, the engineers who work on tools like ComponentKit help iOS developers at Meta build UIs faster. ComponentKit is a React-inspired view framework for iOS that was initially created at Meta and later open-sourced so that anyone can use it.
And finally, mobile developers who work on application reliability, like me, make sure that we have scalable and highly reliable software systems. Once features get launched, we need to ensure that the product works reliably. If an app kept crashing, all of the hard work from the developers on product and infrastructure would go to waste.
The most exciting part about working in the mobile space today is investigating and resolving very complex problems, and collaborating with brilliant people worldwide. I am looking forward to seeing new interfaces for interacting with mobile devices in the future. It seems like there's a lot of exciting work being done at Meta around wrist-based neural interfaces, hand-tracking on mobile devices and haptic experiences. It's fantastic to see all the next-gen inventions developing natural, intuitive ways to interact with computing platforms. I'm eager to see how this translates to pushing frontiers in human-mobile computer interaction.
I am a big fan of Apple's documentation, which goes over many low-level details of mobile development that tend to get overlooked but are very important. I don't think I would have uncovered that LLVM bug or understood what was wrong with the UICollectionView delegate if I had not read the Apple documentation on memory management. And it's all thanks to my first mentor, who recommended that I read these docs many years ago.
Having an expert with a lot of experience tackling the types of problems you're trying to tackle is a game-changer. It's helpful to have someone who can answer my questions and point me to resources relevant to what I am trying to learn; that helped me focus my growth as a mobile developer. It would have taken me much longer to stumble upon the resources that my mentor pointed me to!