Retail Rocket Technologies

Since 2012, when the first line of Retail Rocket code was written, the Retail Rocket engineering and data science team has been passionate about changing the world of e-commerce and making it truly personalized.

A few numbers briefly describing Retail Rocket:
• More than 80 servers (mostly in Germany).
• 100+ million unique visitors (unique cookies) processed per month.
• 1000+ webshops connected to Retail Rocket all over the world.
• 450,000+ external requests per minute (on average).
• 45 man-years invested in the development.

Data science approach

At its core, Retail Rocket identifies the needs of webshop users by analyzing their behavior and product databases. Generating personalized recommendations in real time requires state-of-the-art fundamental approaches and algorithms, including:

  • Content filtering.
  • Collaborative filtering.
  • Predictive analytics based on machine learning and Markov chains.
  • Bayesian statistics.
  • Real-time hybrid personalization algorithms.

… and many more.
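To make the collaborative-filtering idea above concrete, here is a minimal sketch of item-to-item similarity computed with cosine similarity over view events. All names and the data format are illustrative assumptions, not Retail Rocket's actual algorithm or API.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(views):
    """Cosine similarity between items, based on which users viewed them.

    `views` is a list of (user_id, item_id) pairs. This is a toy
    illustration of item-to-item collaborative filtering, not the
    production algorithm.
    """
    users_by_item = defaultdict(set)
    for user, item in views:
        users_by_item[item].add(user)

    items = list(users_by_item)
    sims = {}
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                # cosine similarity over binary "viewed" vectors
                score = overlap / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims

views = [("u1", "tv"), ("u1", "soundbar"),
         ("u2", "tv"), ("u2", "soundbar"),
         ("u3", "tv"), ("u3", "kettle")]
sims = item_similarities(views)
```

Items viewed together by many of the same users score close to 1, which is what makes "people who viewed this also viewed" blocks possible.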

Roman Zykov, Retail Rocket Chief Data Scientist, speaking at the annual RecSys 2016 conference at MIT (Boston)
Activity in the data science community

Our engineering team is a very active member of the data science community, with a number of publications, awards at data science competitions, and talks at notable industry events.




Analytics cluster

For machine learning we use the Spark framework on top of the Hadoop YARN platform, writing our cluster-computing jobs in Scala, a functional programming language. Among native Hadoop components, we use Apache Flume for data transfer, Apache Mahout for machine learning, and the Oozie scheduler.
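The functional style that makes Spark jobs concise can be shown with a plain-Python analogue (a real job would use Spark's RDD or DataFrame API in Scala): counting product views per item from raw log lines. The log format here is an assumption made up for illustration.

```python
from functools import reduce

# Assumed tab-separated log format: date, user, event, item.
log_lines = [
    "2017-01-01\tu1\tview\ttv",
    "2017-01-01\tu2\tview\ttv",
    "2017-01-01\tu1\tview\tkettle",
]

# Analogous to rdd.map(...): extract (item, 1) pairs from each line.
pairs = map(lambda line: (line.split("\t")[3], 1), log_lines)

# Analogous to reduceByKey(_ + _): fold the pairs into per-item counts.
counts = reduce(
    lambda acc, kv: {**acc, kv[0]: acc.get(kv[0], 0) + kv[1]},
    pairs,
    {},
)
```

The same two transformations, map then reduce by key, scale from three lines on a laptop to terabytes of clickstream on a cluster, which is exactly the appeal of the Spark model.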

The Retail Rocket team maintains a GitHub repository with a number of interesting projects: a simple A/B-testing engine in JavaScript, the Spark MultiTool library for Scala, and Hadoop cluster deployment scripts using Puppet.
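The heart of any A/B-testing engine is deterministic bucketing: the same visitor must land in the same variant on every visit. A minimal Python sketch of that idea (the function name and parameters are illustrative, not the JavaScript engine's real API):

```python
import hashlib

def ab_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically assign a user to a test variant.

    Hashing (experiment, user_id) keeps each user in the same bucket
    across visits, while different experiments split users independently.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

Because assignment is a pure function of the inputs, no per-user state needs to be stored to keep the experiment consistent.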



Almost everything a user interacts with is served by IIS web servers; the code is written in C# with ASP.NET MVC. For storage we use Redis, MongoDB, and PostgreSQL.

When distributed components need to interact, for example to compute a user segment from the User-Agent header (audience profiling), we use Thrift. To stream data from the online webshops to the various subsystems we use the Flume transport already mentioned above.
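A toy sketch of the kind of request such an audience-profiling service answers: map a raw User-Agent string to a coarse device segment. The rules and segment names below are invented for illustration; real profiling is far richer.

```python
def segment_from_user_agent(user_agent):
    """Classify a User-Agent string into a coarse device segment.

    Illustrative only: production profiling would use a proper
    UA-parsing library and many more signals.
    """
    ua = user_agent.lower()
    if "iphone" in ua or "android" in ua or "mobile" in ua:
        return "mobile"
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    return "desktop"
```

Wrapping a function like this behind a Thrift interface lets the C# web tier and the JVM-side analytics components call it over one strongly typed contract.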

Development process

We are advocates of continuous delivery as the methodology most efficient for our customers (as of today we work with more than 1000 stores).

To support it, we use a Git + GitLab + TeamCity toolchain with automated unit tests (1200+ as of the beginning of 2017), acceptance tests, and code review procedures.