Native GPGPU Tools in Quant Finance: LIBOR Swaption Portfolio Pricer (Monte-Carlo)

Benchmark Description

This application prices a portfolio of LIBOR swaptions under a LIBOR Market Model using Monte-Carlo simulation. It also computes the portfolio's Greeks, here the deltas with respect to the forward rates.

In each Monte-Carlo path, the LIBOR forward rates are generated randomly at all required maturities following the LIBOR Market Model, starting from the initial LIBOR rates. The swaption portfolio payoff is then computed and discounted to the pricing date. Averaging the per-path prices gives the final net present value of the portfolio.
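The per-path procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration only, assuming a one-factor log-Euler discretisation of the forward rates under the spot measure (a common choice for this model, but not one the page states) and a payer-style swaption payoff. The function names, the flat 5% curve, the 20% volatilities and the tiny problem size are all hypothetical; none of this is the benchmark's source code or data.

```python
import math
import random


def simulate_path(L0, lam, delta, n_mat, rng):
    """Evolve the forward rates over n_mat time steps (assumed log-Euler scheme)."""
    L = list(L0)
    n_rates = len(L)
    for n in range(n_mat):
        # One Brownian increment per time step (one-factor assumption).
        sqez = math.sqrt(delta) * rng.gauss(0.0, 1.0)
        v = 0.0  # running drift sum over the rates that are still alive
        for i in range(n + 1, n_rates):
            vol = lam[i - n - 1]  # volatility indexed by time to reset
            con = delta * vol
            v += con * L[i] / (1.0 + delta * L[i])
            L[i] *= math.exp(con * v + vol * (sqez - 0.5 * con))
    return L


def portfolio_payoff(L, delta, n_mat, maturities, swap_rates):
    """Payoff of the swaption portfolio on one path, discounted to the pricing date."""
    b, s = 1.0, 0.0
    B, S = [], []  # zero-coupon bond prices and annuities seen from exercise
    for n in range(n_mat, len(L)):
        b /= 1.0 + delta * L[n]
        s += delta * b
        B.append(b)
        S.append(s)
    payoff = 0.0
    for m, k in zip(maturities, swap_rates):
        # Payer swaption on unit notional: max(1 - B_m - K * S_m, 0).
        payoff += max(1.0 - B[m - 1] - k * S[m - 1], 0.0)
    for n in range(n_mat):
        # Discount with the rates that were fixed along the path.
        payoff /= 1.0 + delta * L[n]
    return payoff


def mc_price(n_paths, L0, lam, delta, n_mat, maturities, swap_rates, seed=0):
    """Average the discounted per-path payoffs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        L = simulate_path(L0, lam, delta, n_mat, rng)
        total += portfolio_payoff(L, delta, n_mat, maturities, swap_rates)
    return total / n_paths


# Hypothetical toy inputs: 8 quarterly forward rates, exercise after 4 steps.
L0 = [0.05] * 8
lam = [0.2] * 8
price = mc_price(2000, L0, lam, 0.25, 4, [2, 4], [0.04, 0.06])
```

On a GPU, each thread would run the equivalent of one `simulate_path` plus `portfolio_payoff` call, and the final average would become a parallel reduction.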

The full algorithm is illustrated in the processing graph below:


More details can be found in Prof. Mike Giles’ notes [1].

This benchmark uses a portfolio of 15 swaptions with maturities between 4 and 40 years and 80 forward rates, and hence computes 80 delta Greeks. Performance is measured for Monte-Carlo path counts ranging from 64K to 2,048K.

[1] M. Giles, “Monte Carlo evaluation of sensitivities in computational finance,” HERCMA Conference, Athens, Sep. 2007.

  • Application Class: Pricer
  • Model: LIBOR Market Model
  • Instrument Type: Swaption Portfolio
  • Numerical Method: Monte-Carlo
  • Portfolio Size: 15 swaptions
  • Maturities: 4 to 40 years
  • Number of Forward Rates: 80
  • Number of Sensitivities: 80
  • Monte-Carlo Paths: 64K-2,048K
  • Operating System: Red Hat Enterprise Linux 7.1 (64-bit)
  • CPU: Intel Xeon E5-2666 v3 (Haswell)
  • RAM: 64GB
  • GPU: Nvidia Tesla K40
  • GPU clocks: maximum
  • GPU driver: 352.79
  • Precision Mode: double
  • Compiler: GCC 4.8

The application is executed repeatedly, recording the wall-clock time for each run, until the estimated timing error is below a specified value. The measured time covers the full algorithm from inputs to outputs, including GPU memory allocations and data transfers. The reported speedup is relative to a sequential implementation running on a single CPU core.
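The measurement procedure above could look roughly like the following sketch. The page does not say how the timing error is estimated or what the threshold is, so the standard error of the mean and the 2% relative tolerance used here are assumptions, as are the names `time_until_stable` and `speedup`.

```python
import math
import statistics
import time


def time_until_stable(run, rel_tol=0.02, min_runs=5, max_runs=1000):
    """Call run() repeatedly until the estimated timing error is small enough.

    Assumption: the error estimate is the standard error of the mean
    wall-clock time, compared against rel_tol times the mean.
    """
    times = []
    while len(times) < max_runs:
        start = time.perf_counter()
        run()  # should cover the whole pipeline: allocation, transfers, compute
        times.append(time.perf_counter() - start)
        if len(times) >= min_runs:
            mean = statistics.mean(times)
            sem = statistics.stdev(times) / math.sqrt(len(times))
            if sem <= rel_tol * mean:
                break
    return statistics.mean(times), len(times)


def speedup(sequential_seconds, parallel_seconds):
    """Speedup of the parallel version relative to the sequential baseline."""
    return sequential_seconds / parallel_seconds


# Hypothetical usage with a stand-in workload instead of the real pricer.
mean_seconds, n_runs = time_until_stable(lambda: sum(range(20000)))
```

Timing the whole `run` callable, rather than only the kernel, matches the stated intent of including allocations and data transfers in the measurement.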

GPGPU Toolkit Comparison

Tool   | Version | Provider | Random Number Generator | Reduction   | Threads (Core Kernel)                 | Global Memory (1024K paths)
CUDA   | 7.5     | Nvidia   | cuRand (MRG32K3a)       | thrust      | 1 thread/path; 128 threads/block      | 336MB
OpenCL | 1.2     | Nvidia   | clRNG (MRG32K3a)        | handcrafted | 1 thread/path; 128 threads/work group | 336MB
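The thread layout and memory figures in the table imply some simple arithmetic, sketched below. It assumes that "K" means 1024 and "MB" means 2^20 bytes, which the page does not state; the helper name `blocks_for` is hypothetical.

```python
THREADS_PER_BLOCK = 128  # from the table: 128 threads per block / work group


def blocks_for(n_paths, threads_per_block=THREADS_PER_BLOCK):
    """Blocks (or work groups) needed with one thread per path, rounded up."""
    return (n_paths + threads_per_block - 1) // threads_per_block


n_paths = 1024 * 1024                       # "1024K paths", assuming K = 1024
n_blocks = blocks_for(n_paths)              # 8192 blocks of 128 threads
bytes_per_path = 336 * 2**20 / n_paths      # 336.0 bytes, assuming MB = 2**20 bytes
doubles_per_path = bytes_per_path / 8       # 42.0 double-precision values per path
```

Under these unit assumptions, the 336MB reported for 1024K paths corresponds to 336 bytes, or 42 doubles, of global memory per path.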

Speedup vs. Sequential*

*the sequential version runs on a single core of an Intel Xeon E5-2698 v3 CPU
