The payoff from the move to NVIDIA GPUs was immediate: inference latency dropped by 10X. But Bing’s engineers weren’t about to stop there.
They incorporated the NVIDIA cuDNN GPU-accelerated deep learning library into their code and switched the GPU driver from the Windows Display Driver Model (WDDM) to the Tesla Compute Cluster (TCC) model, dropping latency to 40 milliseconds for a total performance improvement of 60X. To detect more object categories in an image, they moved from the two-stage Fast R-CNN approach to a one-stage “single shot detection” (SSD) process. This sped up the feature 10X and enabled detection of more than 80 object categories.
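To make the one-stage idea concrete, here is a minimal sketch using an off-the-shelf SSD model pretrained on COCO, whose label set is roughly the “over 80 categories” scale mentioned above. It assumes PyTorch/torchvision and a placeholder image file, and is an illustration rather than Bing’s production code. (As an aside, the WDDM-to-TCC switch on Windows is typically made with the nvidia-smi utility’s driver-model option.)

```python
# Minimal one-stage ("single shot") detection sketch with a COCO-pretrained
# SSD from torchvision. Illustrative tooling only, not Bing's production
# model; "photo.jpg" is a placeholder file name.
import torch
from torchvision.io import read_image
from torchvision.models.detection import ssd300_vgg16, SSD300_VGG16_Weights
from torchvision.transforms.functional import convert_image_dtype

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = SSD300_VGG16_Weights.DEFAULT
model = ssd300_vgg16(weights=weights).eval().to(device)

img = convert_image_dtype(read_image("photo.jpg"), torch.float).to(device)

# A single forward pass yields boxes, labels, and scores for every object
# at once; there is no separate region-proposal stage, which is where the
# one-stage speedup comes from.
with torch.no_grad():
    detections = model([img])[0]

categories = weights.meta["categories"]  # COCO label set (80+ classes)
for box, label, score in zip(detections["boxes"], detections["labels"],
                             detections["scores"]):
    if score > 0.5:
        print(categories[int(label)],
              [round(v) for v in box.tolist()],
              float(score))
```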
The Bing team also leverages a filter triggering model and Microsoft’s ObjectStore key-value store to limit the amount of data they need to process and to cache results for future use. This saves over 90 percent of their costs, making it economically feasible to serve the volume of requests they receive daily.
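The snippet below sketches that trigger-then-cache flow. The trigger model, detector, and cache interface are hypothetical stand-ins, since ObjectStore’s API is internal to Microsoft; a plain dictionary plays the key-value store here.

```python
# Sketch of the trigger-then-cache pattern described above. The trigger,
# detector, and cache are hypothetical stand-ins, not Microsoft APIs.
import hashlib

class DictCache:
    """Stand-in for a key-value store such as ObjectStore."""
    def __init__(self):
        self._store = {}
    def get(self, key):
        return self._store.get(key)
    def set(self, key, value):
        self._store[key] = value

def visual_search(image_bytes, trigger, detector, cache):
    # 1) A cheap filter model gates the expensive GPU pass, so requests
    #    unlikely to benefit never reach the detector at all.
    if not trigger(image_bytes):
        return []
    # 2) A content hash keys the cache, so repeat images are served from
    #    the store instead of being re-processed.
    key = hashlib.sha256(image_bytes).hexdigest()
    cached = cache.get(key)
    if cached is not None:
        return cached
    # 3) Only first-time cache misses pay for GPU inference; the result
    #    is written back for every future request with the same image.
    results = detector(image_bytes)
    cache.set(key, results)
    return results
```

Wired up this way, only the first request for a given image pays for GPU inference; repeats become a key-value lookup, which is how gating plus caching can eliminate the bulk of the processing cost.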
The user experience offered by Bing Visual Search reflects these extra efforts. From the Bing search page, a user can select “image search,” type in text or upload a picture, and then either select hotspots automatically detected on the picture or draw a box around the parts of interest to trigger near-instantaneous search results. Drawing the box over, say, a purse generates numerous purse-buying opportunities, complete with pricing.
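Mechanically, the drawn box is just a pixel-space crop that feeds the same search pipeline; a toy illustration (Pillow assumed, coordinates and file name invented):

```python
# Toy illustration: the user-drawn box becomes a crop of the image, and
# that crop is what gets queried against the visual-search index.
from PIL import Image

def region_query(image_path, box):
    # box is (left, upper, right, lower) in pixels, e.g. the purse the
    # user outlined on the page.
    return Image.open(image_path).crop(box)

crop = region_query("street_scene.jpg", (120, 80, 420, 360))
# In production, this crop (or its feature embedding) would be matched
# against indexed product images to surface results with pricing.
```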
On the development and deployment side, switching to NVIDIA GPUs has made the Bing team more agile and increased their rate of learning and innovation. With CPUs, it would take months to run updated models over the entire dataset of billions of images after every significant change. With GPUs, the same job finishes in a fraction of that time, making it practical to update the models frequently and deliver more features to Bing’s users.