Khronos Releases SYCL 2020 Specification
Major update includes dozens of new features and closer alignment with ISO C++
Significant SYCL adoption in embedded, desktop, and HPC markets
Beaverton, OR – February 9, 2021 – Today, The Khronos® Group, an open consortium of industry-leading companies creating advanced interoperability standards, announces the ratification and public release of the SYCL™ 2020 final specification—the open standard for single source C++ parallel programming. A major milestone encompassing years of specification development, SYCL 2020 builds on the functionality of SYCL 1.2.1 to provide improved programmability, smaller code size and increased performance. Based on C++17, SYCL 2020 enables easier acceleration of standard C++ applications and drives a closer alignment with the ISO C++ roadmap.
First introduced in 2014, SYCL is a C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and tensor accelerators. SYCL 2020 will further accelerate adoption and deployment of SYCL across multiple platforms, including the use of diverse acceleration API backends in addition to OpenCL™.
SYCL 2020 integrates more than 40 new features including updates for streamlined coding and smaller code size. Some key additions include:
● Unified Shared Memory (USM) enables code with pointers to work naturally without buffers or accessors
● Parallel reductions add a built-in reduction operation to avoid boilerplate code and achieve maximum performance on hardware with built-in reduction operation acceleration
● Work group and subgroup algorithms add efficient parallel operations between work items
● Class template argument deduction (CTAD) and template deduction guides simplify class template instantiation
● Simplified use of accessors with a built-in reduction operation reduces boilerplate code and streamlines the use of C++ software design patterns
● Expanded interoperability enables efficient acceleration by diverse backend acceleration APIs
● SYCL atomic operations are now more closely aligned to standard C++ atomics to enhance parallel programming freedom
More information can be found in the SYCL FAQ posted on the Khronos Blog today.
“SYCL 2020’s primary goal is to achieve closer convergence with ISO C++, furthering our work to bring parallel heterogeneous programming to modern C++ through open standards. SYCL can leverage diverse processors to accelerate problems in many application domains including HPC, automotive, and machine learning,” said Michael Wong, Codeplay distinguished engineer, ISO C++ Directions Group and SYCL working group chair. “SYCL has a growing number of implementers and researchers working on real-world applications in markets ranging from supercomputing to embedded processing. The insights from that work, along with the feedback we collected from the SYCL 2020 provisional specification, have enabled the SYCL Working Group to deliver a feature-rich final specification that balances enhanced performance with backwards compatibility. I am excited by the simplicity and higher expressiveness offered by SYCL 2020 and we will continue to evolve SYCL to meet market needs.”
In parallel with the release of the SYCL 2020 specification, the SYCL ecosystem continues to grow with increased development of compilers, runtimes, libraries, and tools. Intel’s oneAPI Data Parallel C++ (DPC++) already incorporates many SYCL 2020 features. Codeplay’s ComputeCpp SYCL 1.2.1 conformant implementation includes selected SYCL 2020 features as extensions, including support for DSPs and RISC-V with more features being added over time. The Intel and Codeplay implementations are based on the LLVM open-source compiler framework. hipSYCL from Heidelberg University also supports key SYCL 2020 features starting from version 0.9. Developers can download many of these implementations and experiment with SYCL 2020 features today.
At Argonne National Laboratory, SYCL enables developers to easily scale C++ applications to use accelerator clusters in exascale supercomputer systems. In Europe, the Cineca supercomputing center is using the Celerity distributed runtime system, built on top of SYCL, to program the new Marconi100 cluster, which is ranked #11 in the Top500 (Nov 2020).
The SYCL Working Group encourages users and tool implementers to download and explore the new specification. Feedback on the SYCL standard is always welcome, including requests for future features. Feedback can be provided by visiting the Khronos SYCL Community Forum, the SYCL tech site, or Khronos Slack Channel.
IWOCL & SYCLcon 2021, chaired by SYCL Working Group Chair Michael Wong and sponsored by Khronos, takes place online on April 27-29. It will include an online SYCL tutorial covering the new SYCL 2020 features, as well as a dedicated SYCL panel discussion. Registration is now open at www.iwocl.org/.
Industry Support for SYCL 2020
“Our users will benefit from features in the SYCL 2020 specification. New features, such as support for unified memory (USM) and reductions, are important capabilities for programming high-performance-computing hardware. In addition, support for C++17 will allow our users to write better C++ code, with both language features (such as deduction guides) and library features (such as std::optional). Other new features (such as softening the requirements on kernel functions and sharing data between host and devices) are an important step for implementing backend support for SYCL in the Kokkos and RAJA performance portability ecosystems,” said Nevin Liber, computer scientist, Argonne National Laboratory’s Leadership Computing Facility.
“At Cineca, based on our experience, we confirm the value that SYCL is bringing to the development of high-performance computing in a hybrid environment. In fact, through SYCL, it is possible to build a common and portable environment for the development of computing-intensive applications to be executed on HPC architectures configured with floating point accelerators, which allows industries and scientific communities to use the common availability of development tools, libraries of algorithms, and accumulated experience,” said Sanzio Bassini, director of supercomputing, Application Innovation Dept, Cineca. “Cineca is already running the distributed Celerity runtime on top of several SYCL implementations on the new Marconi100 cluster, ranked no. 11 in the Top500, providing users with a unified API for both the roughly 4000 NVIDIA Volta V100 GPUs and the IBM Power9 host processors. SYCL 2020 is a big step towards a much leaner API that unlocks all the potential provided by modern C++ standards for accelerated data-parallel kernels, making the development of large-scale scientific software easier and more sustainable, whether for industry-oriented domain applications or for scientific domain-oriented applications.”
“Codeplay has been deeply involved in SYCL from its original definition and we are now enabling the standard on a range of systems with our ComputeCpp product. We strongly believe SYCL is the only software standard to link all the high performance processors to a unified programming solution,” said Andrew Richards, founder and CEO, Codeplay Software. “Developers will find that SYCL 2020 refines the standard to streamline their development and adds some crucial new enhancements to improve productivity.”
“Imagination recognises the benefit of SYCL across multiple markets. Our software stacks have been designed to improve SYCL performance, enabling a straightforward path to exploit the teraflops of compute performance in our latest IP,” said Mark Butler, Vice President of Software Engineering, Imagination Technologies. “The ability to quickly port workloads from other proprietary APIs is a huge benefit, easing the transition from development on desktop to deployment on embedded systems. SYCL 2020 is a positive step forward for this API, enabling higher levels of performance, which will benefit developers and platform creators.”
“The SYCL 2020 final specification brings significant features to the industry that enable C++ developers to more productively build high-performance heterogeneous applications with unified programming across XPU architectures,” said Jeff McVeigh, Intel vice president, Datacenter XPU Products and Solutions. “Several capabilities pioneered in the open source oneAPI C++/DPC++ compiler, such as unified shared memory, group algorithms, and sub-groups, contributed to this community effort. Open, cross-architecture programming is required for accelerated distributed computing; we look forward to continuing our collaboration to address the needs of the developer ecosystem.”
“With thousands of users and a wide range of applications using NERSC’s resources, we must support a wide range of programming models. In addition to directive-based approaches, we see modern C++ language-based approaches to accelerator programming, such as SYCL, as an important component of our programming environment offering for users of Perlmutter,” said Brandon Cook, application performance specialist at NERSC. “Further, this work supports the productivity of scientific application developers and users through performance portability of applications between Aurora and Perlmutter.”
“NSITEXE supports the SYCL 2020 technology, which is gaining attention in embedded applications,” said Hideki Sugimoto, CTO, NSITEXE, Inc. “SYCL is very important to increase productivity by hiding complexities from users. We are considering adopting this technology in our next generation of IP platforms.”
“For Renesas, SYCL is a key enabler for automotive ADAS/AD software developers that allows them to easily use the highly-efficient, heterogeneous accelerators of the R-Car SoC Series through the open Khronos standard,” said Cyril Cordoba, Director of ADAS Segment Marketing Department, Renesas.
“We are excited about the extensive list of features and improvements released with the new SYCL 2020 specification,” said Thomas Fahringer, head of the Distributed and Parallel Systems Group at the University of Innsbruck. “The API becomes terser and more developer friendly, while also introducing new ways for expert users to exercise fine-grained control over state-of-the-art hardware features. The move to a generalized backend model opens up new possibilities to integrate with existing legacy solutions, which is especially important in scientific research environments. As co-developers of the Celerity project, together with the University of Salerno, we are welcoming these changes and look forward to applying them within distributed-memory research and industry applications, for example as part of the recently launched EuroHPC LIGATE project.”
“Xilinx is excited about the progress achieved with SYCL 2020,” said Ralph Wittig, fellow, Xilinx. “This single-source C++ framework unifies host and device code for various kinds of accelerators in the same C++ program. With host-fallback device execution, developers can emulate device code on a CPU, exploring hardware-software co-design for adaptable computing devices. SYCL is now extensible via customizable back-ends, enabling device plug-ins for FPGAs and ACAPs.”