TikTok is a video-sharing app that lets users create and share short videos. It impresses users with precisely personalized recommendations, served just “for you”. It is highly addictive and very popular among young people, and it is powered by artificial intelligence technologies.
Table of Contents
- TikTok Architecture
- Big data frameworks
- Machine learning
- Microservices architecture
- Back-of-the-envelope calculation
1. TikTok Architecture
The architecture of the TikTok recommendation system includes three components: big data frameworks, machine learning, and microservices architecture.
1. Big data frameworks are the starting point of the system. They provide real-time stream processing, data computing, and data storage.
2. Machine learning is the brain of the recommendation system. A range of machine learning and deep learning algorithms and techniques are applied to build models and generate recommendations to suit individual preferences.
3. Microservices architecture is the underlying infrastructure that makes the whole system fast and efficient.
2. Big data frameworks
No data, no intelligence.
Most of the data comes from users' smartphones, including the operating system, installed apps, and so on. More importantly, TikTok pays special attention to user activity logs, such as watch time, swipes, likes, shares, and comments.
The log data is collected and aggregated through Flume and Scribe, then piped into a Kafka queue. Apache Storm then processes the data streams in real time, alongside other components of the Apache Hadoop ecosystem.
The Apache Hadoop ecosystem is a distributed system for data processing and storage. It includes MapReduce, the first generation of distributed data processing systems, which processes data in parallel as batch jobs. YARN is a framework for job scheduling and cluster resource management. HDFS is a distributed file system. HBase is a scalable, distributed database that supports structured data storage for large tables. Hive is a data warehouse infrastructure that provides data summarization and querying. ZooKeeper is a high-performance coordination service.
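To make the MapReduce batch model concrete, here is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to activity logs. It does not use any Hadoop API; the record layout, the sample data, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical activity-log records: (user_id, video_id, event, watch_seconds).
LOGS = [
    ("u1", "v1", "watch", 12.0),
    ("u2", "v1", "watch", 30.0),
    ("u1", "v2", "like", 0.0),
    ("u3", "v1", "like", 0.0),
    ("u2", "v2", "watch", 5.0),
]


def map_phase(record):
    """Emit (key, value) pairs: video id and watch time for watch events."""
    _user_id, video_id, event, watch_seconds = record
    if event == "watch":
        yield video_id, watch_seconds


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def total_watch_time(logs):
    """Run map, shuffle, and reduce in sequence over a batch of logs."""
    pairs = chain.from_iterable(map_phase(record) for record in logs)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce calls would run on many machines in parallel, and the shuffle would move data across the network; here everything runs in one process to show only the data flow.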
As data volumes grow quickly, real-time data processing frameworks come into the picture. Apache Spark is the third-generation framework, which supports near-real-time distributed processing for big data workloads. Spark improves on MapReduce's performance by processing data in memory. In the last couple of years, TikTok has adopted the fourth-generation framework, Flink, which is designed for native real-time stream processing.
The database systems include MySQL, MongoDB, and many others.
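Stream processors typically group an unbounded event stream into time windows and emit a result as each window closes, instead of waiting for a full batch. The sketch below illustrates that idea with a tumbling (fixed, non-overlapping) window in plain Python. It is not the Flink or Spark API; the event layout and the function name are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Yield (window_start, {video_id: view_count}) as each window closes.

    `events` is an iterable of (timestamp_seconds, video_id) tuples,
    assumed to be sorted by timestamp. Windows with no events are skipped.
    """
    current_start = None
    counts = defaultdict(int)
    for timestamp, video_id in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = defaultdict(int)
            current_start = window_start
        counts[video_id] += 1
    if counts:
        # Flush the last, still-open window when the stream ends.
        yield current_start, dict(counts)
```

Because the function is a generator, results for early windows are available before later events have been read, which is the property that separates streaming from batch processing.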
3. Machine learning
This is the core of how TikTok earned its household name for a “hyper-personalized, addictive algorithm”.
Once the vast datasets pour in, the next steps are content analysis, user profiling, and context analysis. Neural-network deep learning frameworks such as TensorFlow are used for computer vision and natural language processing (NLP). Computer vision interprets the images in photos and videos. NLP covers classification, labeling, and evaluation.
Classic machine learning algorithms are used, including logistic regression (LR), convolutional neural networks (CNN), recurrent neural networks (RNN), and gradient boosting decision trees (GBDT). Common recommendation approaches are applied as well, such as content-based filtering (CBF), collaborative filtering (CF), and the more advanced matrix factorization (MF).
The secret weapons that TikTok uses to read your mind are:
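As an illustration of matrix factorization, the sketch below learns a small latent vector for each user and each video so that their dot product approximates an observed engagement score. It is a textbook SGD formulation in plain Python, not TikTok's model; the function names, hyperparameters, and the idea of a numeric engagement score per (user, video) pair are assumptions.

```python
import math
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
             epochs=500, seed=0):
    """Factorize sparse (user, item, score) triples into P (users) and Q (items)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, score in ratings:
            err = score - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted score is the dot product of the user and item vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed triples."""
    total = sum((score - predict(P, Q, u, i)) ** 2 for u, i, score in ratings)
    return math.sqrt(total / len(ratings))
```

After training, `predict(P, Q, u, i)` can be evaluated for pairs that were never observed, which is how the factorization produces recommendations for videos a user has not yet seen.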
1. Algorithm experimentation platform: engineers experiment with mixing multiple machine learning algorithms, such as LR and deep neural networks (DNN), then run A/B tests and make adjustments.
2. Extensive classification and labeling: the models are based on user engagement signals such as watch time and swipes, in addition to the commonly used likes and shares. (What you do, as a reflection of your subconscious, says more about you than what you say in public.) The number of user features, vectors, and categories exceeds that of most recommendation systems in the world, and more are added all the time.
3. User feedback engine: it updates the models after retrieving feedback from users over multiple iterations. The experience management platform is built on this engine, which ultimately improves the predictions and recommendations.
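The feedback loop in the third item can be sketched as online learning: each new feedback event nudges the model slightly, without retraining from scratch. The example below uses logistic regression with one SGD step per event. It is only an illustration under stated assumptions; the class name, the feature layout, and the binary "engaged" label are hypothetical and not taken from TikTok's system.

```python
import math


class OnlineLogisticModel:
    """Logistic regression updated one feedback event at a time."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        """Return the estimated probability that the user engages."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, engaged):
        """Apply one SGD step on log loss for a single feedback event."""
        err = self.predict(x) - (1.0 if engaged else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err
```

A serving loop would call `predict` to score a candidate video, show it, and then call `update` once the user's reaction (for example, watching to the end versus swiping away) is known.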
To address the cold-start problem in recommendation, a recall strategy is used: it selects thousands of candidates from tens of millions of videos, choosing those that have proven to be popular and of high quality.
Meanwhile, some of the AI work has been moved to the client side for very fast response. This includes smaller-scale real-time training, modeling, and inference done on the device. Machine learning frameworks such as TensorFlow Lite or ByteNN are used on the client side.
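A recall step of this kind can be sketched as "filter, score, keep the top k". The example below is a minimal illustration, assuming each video already carries a popularity score and a quality score in [0, 1]; the `Video` fields, the weighting, and the quality threshold are hypothetical choices, not TikTok's actual criteria.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    popularity: float  # assumed normalized engagement score in [0, 1]
    quality: float     # assumed content-quality score in [0, 1]


def recall_candidates(videos, k, min_quality=0.5, quality_weight=0.3):
    """Return the k best videos that pass the quality threshold."""
    def score(video):
        return ((1.0 - quality_weight) * video.popularity
                + quality_weight * video.quality)

    eligible = (v for v in videos if v.quality >= min_quality)
    # nlargest keeps only k items in memory while scanning the full pool.
    return heapq.nlargest(k, eligible, key=score)
```

Using `heapq.nlargest` avoids sorting the whole pool, which matters when the pool has tens of millions of entries and only a few thousand candidates are needed. For a new user with no history, the returned list can be served directly, since it does not depend on any user features.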
4. Microservices architecture
TikTok has embraced cloud-native infrastructure. Recommendation components such as user profiling, prediction, cold start, recall, and the user feedback engine are served as APIs. The services are hosted in clouds such as Amazon AWS and Microsoft Azure. As the output of the system, the curated videos are pushed to users through the cloud.
TikTok employs Kubernetes-based containerization technology. Kubernetes is known as a container orchestrator: a toolset that automates the application life cycle. Kubeflow is dedicated to deploying machine learning workflows on Kubernetes.
As part of the cloud-native stack, a service mesh is another tool, one that handles service-to-service communication. It controls how different parts of an application share data with one another, and it inserts features or services at the platform layer rather than the application layer.
Because of the high-concurrency requirement, the services are built with the Go language and gRPC. At TikTok, Go has become the dominant language for service development because of its good built-in networking and concurrency support. gRPC is a Remote Procedure Call framework for building and connecting services efficiently.
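The concurrency pattern behind such services can be sketched as a fan-out: one request calls several downstream services at the same time, each with a deadline, and slow ones are dropped instead of delaying the response. The production services are described above as Go and gRPC; Python's asyncio is used here only to keep the illustration short. The service names and delays are hypothetical, and `call_service` sleeps instead of making a network call.

```python
import asyncio


async def call_service(name, delay_s, payload):
    """Stand-in for a remote call; sleeps instead of using the network."""
    await asyncio.sleep(delay_s)
    return name, f"{name}:{payload}"


async def fan_out(user_id, services, timeout_s=0.2):
    """Call all services concurrently and keep the replies that beat the deadline.

    `services` maps a service name to its simulated latency in seconds.
    """
    async def guarded(name, delay_s):
        return await asyncio.wait_for(
            call_service(name, delay_s, user_id), timeout_s
        )

    results = await asyncio.gather(
        *(guarded(name, delay) for name, delay in services.items()),
        return_exceptions=True,
    )
    replies = {}
    for result in results:
        if isinstance(result, Exception):
            continue  # timed out or failed: degrade gracefully
        name, value = result
        replies[name] = value
    return replies
```

The total latency is bounded by the slowest call or the timeout, whichever is smaller, not by the sum of all calls. In Go, the same shape is usually written with goroutines and a `context` deadline.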
TikTok's success comes from going the extra mile to provide the best user experience. It builds in-house tools to maximize performance at the low (system) level. For example, ByteMesh is an improved version of a service mesh, KiteX is a high-performance Golang gRPC framework, and Sonic is an enhanced Golang JSON library. Other in-house tools and systems include parameter servers, ByteNN, and Abase, to name a few.
As Xiang Liang, a machine learning principal at TikTok, put it, sometimes the infrastructure beneath is more important than the (machine learning) algorithms above.
5. Back-of-the-envelope calculation (2022)
| Metric | Value |
|---|---|
| Daily active users (US) | 50 million |
| Monthly active users (worldwide / US) | 1.2 billion / 138 million |
| Videos watched per minute | 167 million |
| Average time spent per day | 52 minutes |
| Feedback update latency | Within 10 minutes |
| Daily training data | 300 TB to 1 PB |
*The figures above are not official statistics, and they may change over time.
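A few derived estimates follow from these figures by simple arithmetic. The sketch below computes them; the constants are the unofficial numbers quoted above, and applying the 52-minute daily average to US daily active users is an assumption, since the source does not say which population that average covers.

```python
# Unofficial 2022 figures quoted in this section.
US_DAU = 50e6                    # daily active users (US)
US_MAU = 138e6                   # monthly active users (US)
VIDEOS_PER_MINUTE = 167e6        # videos watched per minute
MINUTES_PER_USER_PER_DAY = 52    # average time spent per day
TRAINING_BYTES_PER_DAY = (300e12, 1e15)  # 300 TB to 1 PB

SECONDS_PER_DAY = 24 * 60 * 60


def derived_estimates():
    """Return rough rates derived from the figures above."""
    low, high = TRAINING_BYTES_PER_DAY
    return {
        "videos_per_second": VIDEOS_PER_MINUTE / 60,
        "us_dau_over_mau": US_DAU / US_MAU,
        # Assumes the 52-minute average applies to US daily active users.
        "us_watch_hours_per_day": US_DAU * MINUTES_PER_USER_PER_DAY / 60,
        "training_gb_per_second_low": low / SECONDS_PER_DAY / 1e9,
        "training_gb_per_second_high": high / SECONDS_PER_DAY / 1e9,
    }
```

With these inputs, the estimates come to roughly 2.8 million videos per second, a US DAU/MAU ratio of about 36%, about 43 million watch hours per day in the US, and a sustained training-data ingest rate between about 3.5 GB/s and 11.6 GB/s.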