Retail Rocket Technologies

Retail Rocket software engineering team’s vision is to change the world of e-commerce and make it truly personalized.

Retail Rocket At A Glance:

Analytical cluster of more than 250 servers in 6 different data centers
More than 230 million unique visitors use our clients' websites monthly
More than 1,000 companies connected to Retail Rocket worldwide
Our servers process more than 450,000 external API requests per minute
API requests per second at peak: 15,000+
More than 200 man-years have been invested in the development of the platform
Zero customer data lost in 8 years

Data Science Approach

The essence of Retail Rocket’s work is to identify the needs of an online store customer by analyzing their behavior (clickstream, search queries, transaction history, etc.). In addition, we focus on the product matrix of the retailer and also personalize communications in any channel (website, mobile app, email, SMS, etc.). To form personalized recommendations, our data science team developed a scalable mathematical foundation. The following are several approaches we use today:

Content Filtering
Bayesian Statistics
Collaborative Filtering
Real-time Hybrid Personalization Algorithms
Predictive Analytics Based on Machine Learning and Markov Chains
Many Others.

Technology Stack

Analytical Platform

For machine learning, we use Spark based on the Hadoop Yarn platform – a cluster computing system that is best suited for our current tasks. As for native Hadoop components, we have Apache Kafka for data delivery, the distributed Machine Learning Mahout Library, and the Oozie Task Scheduler.

Retail Rocket team has a repository on GitHub with many interesting projects: an engine for A/B tests in JavaScript, a Spark MultiTool library in Scala, scripts for deploying a Hadoop cluster using Puppet, and others.

Apache Spark
Hadoop
Clickhouse
Scala
Kafka
Redis

Frontend

Almost everything the end-user receives is processed in clusters of Linux servers. The code is written in C#, Asp.Net MVC. All data is stored and distributed in three database management systems: Redis, MongoDB, and PostgreSQL.

When we need to ensure the interaction of distributed components, for example, when calculating a user segment, Thrift is used. Additionally, in order for various subsystems to receive a data stream from online stores, Kafka transport, mentioned above, is used.

.NET Core
C#
Kafka
AWS Lambda
PostgreSQL
Redis
NGINX
MongoDB

Development Process

In development, our team sticks to the methodology of continuous delivery of new functionality to customers (2,000+ stores are currently connected to us).

We use a bundle of Git + GitLab with the unit tests (as of the beginning of 2021, we have more than 3,000+ tests), acceptance tests, and code review.