High performance CPU/GPU clusters - Increasing throughput and decreasing latency by leveraging mechanical sympathy

Location:

Legends Ballroom - Robinson-Whitman

Track:

Abstract:

Certain areas of finance, in particular Risk Management, have been running large compute clusters for years. However, simply relying on the cluster providers for orchestration does not necessarily lead to increased performance and definitely does not exploit the hardware resources efficiently. Similarly, various hardware acceleration devices such as FPGAs and GPUs have been gaining traction in finance, but leveraging them correctly in clusters raises the bar even higher. Achieving ROI on a large cluster of heterogeneous resources requires tailored data layouts and protocols, mechanical sympathy, and specialized job orchestration algorithms. In this talk we explore some interesting strategies to increase cluster performance using techniques such as topology-aware communication, job-aware caching, pre-fetching, and real-time/streaming versus batch processing. We will also look at the techniques employed by some of the foremost technology pioneers and leaders in the cluster field and evaluate their applicability to finance.

Clive Saha

Clive graduated with a BS in Computer Science from Cornell and went on to get a Master's in Electrical and Computer Engineering from the Cornell Systems Lab. After spending several years on Wall Street working on distributed job control systems for Risk Management, he moved to Google. Over the last 8 years there, he has worked on some of the biggest distributed systems in the world - first on Web Search and then on a Paxos implementation that underpins most of Google's cluster software. He's currently working on large scale machine learning systems for better video experiences at YouTube in Paris.