Vectorising and distributing NTTs to count Goldbach partitions on Arm-based supercomputers

Ricardo Jorge Bastos Cordeiro de Jesus, Tomas Oliveira e Silva, Michele Weiland

Research output: Contribution to conference (Paper)

Abstract / Description of output

In this paper we explore the usage of SVE to vectorise number-theoretic transforms (NTTs). In particular, we show that 64-bit modular arithmetic operations, including modular multiplication, can be efficiently implemented with SVE instructions. The vectorisation of NTT loops and kernels involving 64-bit modular operations was not possible in previous Arm-based SIMD architectures, since these architectures lacked crucial instructions to efficiently implement modular multiplication. We test and evaluate our SVE implementation on the A64FX processor in an HPE Apollo 80 system. Furthermore, we implement a distributed NTT for the computation of large-scale exact integer convolutions. We evaluate this transform on HPE Apollo 70, Cray XC50, and HPE Apollo 80 systems, where we demonstrate good scalability to thousands of cores. Finally, we describe how these methods can be utilised to count the number of Goldbach partitions of all even numbers to large limits. We present some preliminary results concerning this problem, in particular a histogram of the number of Goldbach partitions of the even numbers up to 2^40.
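The link between exact integer convolutions and Goldbach partitions can be illustrated with a small, scalar sketch. It is not the authors' SVE or distributed implementation: it assumes a single 30-bit NTT-friendly modulus (998244353, primitive root 3) rather than the 64-bit moduli discussed in the abstract, and the function names `ntt`, `prime_indicator`, and `goldbach_counts` are hypothetical. Squaring the transform of the prime indicator sequence yields, for each n, the number of ordered prime pairs (p, q) with p + q = n; halving after correcting for the pair p = q gives the number of partitions.

```python
MOD = 998244353  # assumed NTT-friendly prime, 119 * 2**23 + 1
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place iterative radix-2 NTT; len(a) must be a power of two <= 2**23."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes with modular multiplications.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def prime_indicator(limit):
    """Sieve of Eratosthenes: ind[k] is 1 if k is prime, else 0."""
    ind = [1] * (limit + 1)
    ind[0] = 0
    if limit >= 1:
        ind[1] = 0
    p = 2
    while p * p <= limit:
        if ind[p]:
            for m in range(p * p, limit + 1, p):
                ind[m] = 0
        p += 1
    return ind


def goldbach_counts(limit):
    """Map each even n in [4, limit] to its number of partitions p + q, p <= q."""
    ind = prime_indicator(limit)
    size = 1
    while size < 2 * limit + 1:  # room for the full linear convolution
        size <<= 1
    a = ind + [0] * (size - len(ind))
    ntt(a)
    a = [x * x % MOD for x in a]  # pointwise square = self-convolution
    ntt(a, invert=True)
    # a[n] counts ordered pairs; add the p == q case once, then halve.
    return {n: (a[n] + ind[n // 2]) // 2 for n in range(4, limit + 1, 2)}
```

With this sketch, `goldbach_counts(100)[100]` evaluates to 6, matching the partitions 3+97, 11+89, 17+83, 29+71, 41+59, and 47+53. The result is exact only while every convolution value stays below the modulus, which is one reason larger limits call for 64-bit modular arithmetic.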
Original language: English
Number of pages: 15
Publication status: Published - 3 May 2021
Event: Cray User Group 2021 - Online
Duration: 3 May 2021 to 5 May 2021


Conference: Cray User Group 2021
Abbreviated title: CUG 2021

Keywords / Materials (for Non-textual outputs)

  • A64FX
  • Arm
  • Goldbach partitions
  • modular multiplication
  • NTT
  • SVE
  • ThunderX2
  • vectorisation

