AI for Hyper-Personalized Shopping

Deliver tailored experiences that enhance customer satisfaction with the power of AI.

Image courtesy of Verneek


Industry: Retail / Consumer Packaged Goods
Business Goal: Return on Investment
Technologies: Generative AI, Apache Spark

Adapting to Customer Expectations in the Digital Age

The consumer shift to e-commerce is rapidly transforming the retail industry. The average consumer is exposed to up to 10,000 advertisements per day, but today’s digitally connected customers are savvy. According to McKinsey, 71 percent of customers expect an increasingly personalized online shopping experience. They also expect an endless selection of products and are conditioned to look elsewhere after a poor shopping experience.

Meanwhile, a retailer’s inventory is complex, with thousands, if not millions, of products that change seasonally. Add to that the intensely competitive marketplace that’s emerged over the past decade, with dozens of retailers selling the same or similar products. Experience differentiation is increasingly becoming a key to attracting and retaining customers. The winners have harnessed the power of AI and data science to offer real-time, hyper-personalized customer experiences that increase cart size, build brand affinity, and improve conversion.

How Retailers Are Scaling Personalization With AI

Recommender Systems

Making real-time product recommendations is extremely challenging. Customer preferences and needs change constantly. What they want today may change tomorrow or even within a single shopping session.

To use insights gained from the mass volumes of data collected while customers shop, a retailer needs a system that can quickly create a 360-degree view of each customer to accurately predict preferences and make real-time recommendations. The broader the product offering, the larger the operations and customer base grow, and the harder this puzzle becomes to solve, which is why it requires AI.
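At its simplest, a recommender learns from which products are purchased together. The sketch below is a toy co-occurrence recommender in plain Python, with hypothetical purchase data; production systems use learned embeddings and far larger catalogs, but the core idea of ranking items by shared purchase signals is the same.

```python
from collections import defaultdict

# Toy purchase histories, keyed by customer (hypothetical data).
purchases = {
    "alice": ["sneakers", "socks", "water_bottle"],
    "bob": ["sneakers", "socks", "running_shorts"],
    "carol": ["socks", "running_shorts"],
}

# Count how often each pair of items is bought by the same customer.
co_counts = defaultdict(lambda: defaultdict(int))
for items in purchases.values():
    for a in items:
        for b in items:
            if a != b:
                co_counts[a][b] += 1

def recommend(item, k=2):
    """Return the k items most often co-purchased with `item`."""
    ranked = sorted(co_counts[item].items(), key=lambda kv: -kv[1])
    return [name for name, _ in ranked[:k]]

print(recommend("sneakers"))  # "socks" ranks first: co-purchased twice
```

Real deployments replace the raw counts with model scores, but the retrieve-then-rank shape of the pipeline carries over.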

Retailers are using AI recommender systems to drive every action shoppers take, from visiting a webpage to interacting with a shopping advisor (chatbot) to shopping on social media. These systems also improve conversion by surfacing relevant products from the exponentially growing number of available options.

With 25.7 million active customers and 150,000 products on site at a time, ASOS implemented NVIDIA Triton™ Inference Server to develop recommendations and personalized search systems that offer roughly 500,000 recommendations per second and 5 billion recommendations daily.

Personalized Recommendations

Brands that excel at delivering personalized shopping experiences can expect loyal customers, increased revenue, and a significant boost to their bottom line. 

Stitch Fix, the fashion e-commerce company, combines the art of personal styling expertise with insights powered by GPU-accelerated deep learning to recommend custom outfits to their clients. Clients use Pinterest imagery to communicate their personal style to the Stitch Fix stylists, but as Stitch Fix rapidly grew to 4 million active users, having stylists map the volume of photos to match to inventory wasn’t scalable. Stitch Fix changed the game with their image recognition system that automates outfit styling with deep learning. Now Stitch Fix’s more than 50 style recommendation algorithms match clothing and accessories to clients based on their unique style preferences.

“We aim to provide our customers with the best shopping experience. Our catalog contains over 150,000 products with 100 items being added every week. To ensure we can offer personalized recommendations at scale, we leverage machine learning for high-quality recommendations throughout the customer journey.”

Rick Bruins, Machine Learning Engineer, ASOS

Streamlining the Creation of Omnichannel Experiences

NVIDIA Merlin for Recommender Systems

NVIDIA Merlin™ is an end-to-end framework for building high-performing recommenders at any scale. Using NVIDIA Merlin, data scientists and machine learning engineers are empowered to streamline building pipelines for session-based recommendations and more. Merlin components and capabilities are optimized to support the retrieval, filtering, scoring, and ordering of hundreds of terabytes of data, all accessible through easy-to-use APIs. With Merlin, better predictions, increased click-through rates, and faster deployment to production are within reach.

Building personalized customer engagement, retention, and brand loyalty strengthens companies’ economic engines. Companies that rely heavily upon engaging online with potential and established customers to drive customer retention and upsell are considering session-based recommender methods. Session-based recommenders enable data scientists, machine learning engineers, and their companies to build a streamlined recommender pipeline when little or no online user history is available. Leading companies are using session-based recommenders to increase model accuracy and drive quality customer engagement. The NVIDIA Merlin AI workflow for next-item prediction is designed to help companies build effective, personalized recommendations.

Optimal Inference for Generative AI Workloads

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use inference microservices designed to accelerate the deployment of generative AI across your enterprise. This versatile runtime supports open community models and NVIDIA AI Foundation models from the NVIDIA API catalog, as well as custom AI models. NIM builds on NVIDIA Triton™ Inference Server, a powerful and scalable open source platform for deploying AI models, and is optimized for large language model (LLM) inference on NVIDIA GPUs with NVIDIA TensorRT-LLM. NIM is engineered to facilitate seamless AI inferencing with high throughput and low latency, while preserving the accuracy of predictions. You can now deploy AI applications anywhere with confidence, whether on-premises or in the cloud.

NVIDIA RAPIDS for GPU-Accelerated Data Processing

Retailers are realizing significant ROI and cost savings using data analytics and machine learning. 

Many enterprises use Apache Spark for key operations such as ingesting raw data into data lakes, business process analytics, loading data into data warehouses, and data preprocessing at the start of machine learning pipelines. However, slow, CPU-based infrastructure is constraining growing workloads. And slow processing costs time, money, and energy — resulting in a larger carbon footprint.

The NVIDIA RAPIDS™ Accelerator for Apache Spark takes advantage of NVIDIA GPUs to accelerate Apache Spark workloads without code changes. It operates as a plug-in to popular Apache Spark platforms. The RAPIDS Accelerator speeds up selected Spark operations while allowing other operations to continue running on the CPU. As a result, processing time is accelerated up to 5X, allowing the same work to be completed with 4X less infrastructure.

“Clients know what they like when they see it, so imagery is very helpful in allowing our clients to express their style to us.”

TJ Torres, Data Scientist, Stitch Fix


Faster Execution Time

  • Move data in and out of data lakes more quickly
  • Take advantage of faster analytics
  • Accelerate AI pipelines


Lower Costs

  • Save on cloud usage costs
  • Reduce power consumption and carbon footprint

NVIDIA Merlin for Recommender Systems

NVIDIA Merlin is an end-to-end framework for building high-performing recommender systems at any scale.

Merlin includes libraries, methods, and tools that streamline the building of recommenders by addressing common challenges in preprocessing, feature engineering, training, inference, and deployment to production. This is all accessible through easy-to-use APIs.
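A concrete example of the preprocessing stage is categorical encoding: mapping raw item and user identifiers to contiguous integer ids that an embedding table can index. The plain-Python sketch below illustrates the step; it is not Merlin’s actual API, which performs this on the GPU over much larger datasets.

```python
# Minimal sketch of categorical encoding, a typical recommender
# preprocessing step (illustrative only, not Merlin's API).
def categorify(values):
    """Map each distinct value to a stable integer id, starting at 1.

    Id 0 is reserved for out-of-vocabulary values seen at inference time.
    """
    vocab = {}
    ids = []
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab) + 1
        ids.append(vocab[v])
    return ids, vocab

item_ids, vocab = categorify(["shoes", "hat", "shoes", "belt"])
print(item_ids)  # [1, 2, 1, 3]
```

Reserving id 0 for unseen values is a common convention, since new products appear between training runs.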

NVIDIA NeMo for a Hyper-Personalized Shopping Advisor

NVIDIA NeMo™ is an end-to-end platform for developing custom generative AI, anywhere. It includes tools for training and customization, retrieval-augmented generation, guardrails, data curation, and pretrained models, offering enterprises an easy, cost-effective, and fast way to adopt generative AI.

NVIDIA NIM for Inference

NVIDIA NIM™ is a containerized inference microservice that includes industry-standard APIs, domain-specific code, optimized inference engines, and an enterprise runtime, with support for optimized community models as well. Tailored inference engines for each model and hardware configuration ensure optimal latency and throughput on accelerated infrastructure, lowering the cost of scaling inference workloads and enhancing the end-user experience. Developers can further enhance accuracy and performance by aligning and fine-tuning models with proprietary data sources within their data center’s boundaries.
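Those industry-standard APIs include an OpenAI-compatible chat endpoint, so a shopping advisor can call a deployed NIM with an ordinary HTTP request. The sketch below builds such a request; the host, port, and model name are placeholders for whatever is actually deployed in your environment.

```python
import json
import urllib.request

# Placeholder endpoint for a locally deployed NIM microservice.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_request(model, question):
    """Build an OpenAI-style chat-completion payload for a shopping advisor."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful shopping advisor."},
            {"role": "user", "content": question},
        ],
        "max_tokens": 128,
    }

payload = build_request("meta/llama3-8b-instruct", "Suggest a gift under $50.")
print(json.dumps(payload, indent=2))

# Sending it requires a running NIM container at NIM_URL:
# req = urllib.request.Request(
#     NIM_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the request shape matches the OpenAI API, existing client libraries can usually be pointed at the NIM endpoint without code changes.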

Accelerating Apache Spark

For enterprises using Apache Spark, the RAPIDS Accelerator for Apache Spark uses NVIDIA GPUs to accelerate processing of extract, transform, and load (ETL) pipelines. 

The software provides automatic acceleration of Spark jobs via a plug-in that integrates with all major Spark platforms. No code changes are required—operations that can’t be accelerated will continue to run on the CPU with Spark’s built-in implementations.
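Enabling the plug-in is a configuration change rather than a code change. A minimal `spark-submit` sketch is shown below; the jar path, version, and resource settings are placeholders that depend on your cluster.

```shell
# Enable the RAPIDS Accelerator via Spark configuration only.
# Jar path and GPU resource amounts are environment-specific placeholders.
spark-submit \
  --jars /opt/sparkRapidsPlugin/rapids-4-spark_2.12.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  my_etl_job.py
```

The unmodified `my_etl_job.py` runs as before; the plug-in decides per-operation whether to execute on GPU or fall back to the CPU.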

The market often misunderstands the capabilities of GPUs, assuming they’re solely for AI projects. However, GPUs can offer significant speed and efficiency gains for data processing tasks.

The RAPIDS Accelerator for Apache Spark transparently accelerates existing Spark jobs; it doesn’t replace Spark. It adds the benefit of GPU acceleration while still running Spark, ensuring compatibility and maximizing performance.

The RAPIDS Accelerator inspects the Spark physical plan. It looks for executors and operators that have been coded to run on the GPU and replaces them in the physical plan. For operations that aren’t accelerated on the GPU, it continues to use the CPU executor or operator. The RAPIDS Accelerator inserts a row-to-columnar or columnar-to-row conversion step when data moves from the CPU to the GPU and vice versa. Users can observe the CPU operators replaced by GPU operators in the Spark UI, in the explain output, or in the driver log.

Building recommenders is hard. There’s no one-size-fits-all solution. Merlin is designed and built by NVIDIA AI data scientists, engineers, and researchers with deep and practical recommender expertise. Merlin provides a selection of libraries, wrapped model architectures, and methods that are anchored in best practices to help accelerate building, training, optimizing, and deploying recommender systems. Data scientists, machine learning engineers, and their companies can choose which NVIDIA Merlin components, or libraries, to integrate into their existing workflow, or they can build a recommender system from scratch.

Build Hyper-Personalized Shopping Experiences