Software Transactional Memory for GPU Architectures. Nilanjan Goswami, GPU architect, Advanced Computing Lab. Transactional Memory for Heterogeneous CPU-GPU Systems. ACLE Q3 2019 documentation. Towards a Software Transactional Memory for Heterogeneous CPU. Transactional Synchronization Extensions (Wikipedia). Abbreviations: TM, transactional memory; STM, software transactional memory; HTM, hardware transactional memory; HyTM, hybrid transactional memory; TSX, Intel's Transactional Synchronization Extensions; HLE, hardware lock elision; RTM, restricted transactional memory; GPU, graphics processing unit; GPGPU, general-purpose computation on graphics processing units; CPU, central processing unit. Towards a Software Transactional Memory for Graphics Processors. Heterogeneous accelerated processing units (APUs) integrate a multicore CPU and a GPU within the same chip. Sep 15, 2008: The graphics memory is the GPU's version of host memory. To improve the programmability of GPUs and thus extend their usage to a wider range of applications, the authors propose to enable transactional memory (TM) on GPUs. GPU-STM, a software TM for GPUs, enables simplified data synchronization on GPUs, scales to thousands of transactions, ensures livelock freedom, runs on commercially available GPUs, and outperforms GPU coarse-grained locks by up to 20x.
Modern GPUs have shown promising results in accelerating computation-intensive and numerical workloads with limited dynamic data sharing. Sadayappan, Yongjian Chen, Haibo Lin, and Tin-Fook Ngai. Scheduling Techniques for GPU Architectures with Processing-in-Memory Capabilities. Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kay. Toward a Software Transactional Memory for Heterogeneous. Both hardware and software transactional memories have been proposed for GPU architectures. Evaluation of AMD's Advanced Synchronization Facility Within a Complete Transactional Memory Stack. Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing. Software Transactional Memory. On the GPU, main memory is accessed via a cache hierarchy where, in most cases, the L1 data cache is not coherent. If this mechanism is required very often, it may harm performance. On the downside, software implementations usually come with a performance penalty compared to hardware. The unconverted parts of the Java program could use up the CPU's multicore resources with their multithreaded workload. Ennals, Efficient Software Transactional Memory, technical report, Intel Research Cambridge, UK, 2005.
Transactional Memory for Heterogeneous Systems (arXiv). Software Transactional Memory for GPU Architectures (proceedings). Toward a Software Transactional Memory for Heterogeneous CPU. A question that arises in our smart highways use case is this. Exploration of Lock-Based Software Transactional Memory, Justin Gottschlich. CPU and GPU architectures, memory subsystem design, hardware-software co-design.
Energy Efficiency of Software Transactional Memory in a. TM simplifies software development for parallel architectures by providing the programmer with the illusion that code blocks, called transactions, execute atomically. Improvements in Hardware Transactional Memory for GPU Architectures. Matt software transactional memory, Herlihy's hardware accelerator concept. System-wide data consistency issues can be handled by a GPU-friendly design of software transactional memory. Thesis, Department of Electrical and Computer Engineering, University of Colorado. Hardware Support for Scratchpad Memory Transactions on GPU. To make applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). Hardware Transactional Memory for GPU Architectures (UBC ECE). For a set of TM-enhanced GPU applications, Kilo TM captures 59% of the performance of fine-grained locking, and is on average 128x faster than executing all transactions serially, for an estimated hardware area overhead of 0. However, ensuring atomicity for complex data types is a task delegated to programmers.
Advanced Computer Architecture and Systems, detailed. Today most people who make effective use of GPUs undergo a steep learning curve and are forced to program close to the machine using special GPU programming languages. To evaluate TLLL, we use it to implement six widely used programs, and compare it with the state-of-the-art ad hoc GPU synchronization, GPU software transactional memory (STM), and CPU hardware. GPU-LocalTM allocates transactional metadata in the existing memory resources, minimizing the storage requirements for TM support. To improve the programmability of GPUs and thus extend their usage to a wider range of applications, the authors propose to enable transactional memory (TM) on GPUs via Kilo TM, a novel hardware TM system that scales to thousands of concurrent transactions. In addition, it ensures forward progress through an automatic serialization mechanism. An STM system that supports per-thread transactions faces new challenges. Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multithreaded software through lock elision. Computing Without Processors, August 2011, Communications. Next-generation CUDA architecture, code-named Fermi. There are three ways to copy data to the GPU memory: implicitly through calResMap/calResUnmap, explicitly via calCtxMemCopy, or via a custom copy shader that reads from PCIe memory and writes to GPU memory.
Rafael Ubal, David Kaeli, Department of Electrical and Computer Engineering. The ability of the GPU to handle considerably more threads than the CPU has recently led to increased interest in utilising transactional memory for GPUs. While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near. This dissertation aims to reduce the burden on GPU software developers with two major enhancements to GPU architectures. I have been working on software transactional memory for an in-memory database.
Modern GPU architectures have a memory hierarchy that needs to be explicitly programmed to obtain good performance. It is accessible only by the GPU, not by the CPU. The major challenges include ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. GPU Computing Architecture for Irregular Parallelism (UBC). One hardware proposal, Kilo TM, can scale to thousands of concurrent transactions. On the hardware side, Kilo TM was proposed in 2011. Secondly, the conflict detection mechanism is based on unified read/write signatures, i.e. Scheduling Techniques for GPU Architectures with Processing. Accelerating GPU Hardware Transactional Memory with Snapshot. To appear in the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2014. Hardware Transactional Memory for GPU Architectures. In this paper, we analyze the performance and energy efficiency. An Efficient Software Transactional Memory Using Commit-Time Invalidation.
First, thread block compaction (TBC) is a microarchitecture innovation that reduces the performance penalty caused by branch divergence in GPU applications. Yunlong Xu, Rui Wang, Nilanjan Goswami, Tao Li, and Depei Qian. Transactional memory (TM) is an optimistic approach to achieve this goal. Programming GPUs is challenging for applications with irregular fine-grained communication between threads. Qingda Lu, Christophe Alias, Uday Bondhugula, Sriram Krishnamoorthy, J. With TM, the programmer does not need to write code with locks to ensure mutual exclusion. Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors. Improvements in Hardware Transactional Memory for GPU. Software Transactional Memory for GPU Architectures (IEEE Xplore).
However, the performance and energy overhead of Kilo TM may deter GPU vendors from incorporating it into future designs. Modern APUs implement CPU-GPU platform atomics for simple data types. His research interests include parallel programming, software transactional memory, and distributed architectures.
Efficient Transactional-Memory-Based Implementation of Morph. Many TM systems have been proposed in the last two decades for multicore architectures [7], implemented either in hardware or software or a combination. Software Transactional Memory for GPU Architectures, Yunlong Xu. Compiler, Architecture and Tools Conference: program abstracts. To reduce this effort, prior work has proposed supporting transactional memory on GPU architectures. Towards a Software Transactional Memory for Heterogeneous. Hardware Transactional Memory for GPU Architectures, Wilson W. Aamodt, University of British Columbia, Canada. Motivation.
Hardware Transactional Memory for GPU Architectures. Each kernel launch dispatches a hierarchy of threads (a grid of blocks). A CUDA program starts on a CPU and then launches parallel compute kernels onto a GPU. And now, having read about Intel's hardware TM, I have many curious questions. Hardware Support for Local Memory Transactions on GPU. Or would these kinds of building blocks be just what we want? We propose GPU-LocalTM, a hardware transactional memory (TM), as an alternative to data locking mechanisms in local memory. Hardware Support for Local Memory Transactions on GPU Architectures, Alejandro Villegas, Angeles Navarro. Software transactional memory provides transactional memory semantics in a software runtime library or the programming language, and requires minimal hardware support, typically an atomic compare-and-swap operation or equivalent.
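To make the compare-and-swap point concrete, here is a minimal host-side C++ sketch of an optimistic transaction on a single memory word. It is an illustration only, not the algorithm used by GPU-STM or Kilo TM, and every name in it (`TVar`, `TxResult`, `atomically`, `demo_conflict`) is hypothetical. The sketch assumes a 32-bit value packed with a 32-bit version number so that one atomic compare-and-swap can both validate the snapshot and commit the update.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical transactional cell: high 32 bits hold a version, low 32 bits hold the value.
struct TVar {
    std::atomic<std::uint64_t> word{0};
};

// Result of a committed transaction: the written value and how many attempts it took.
struct TxResult {
    std::uint32_t value;
    int attempts;
};

static std::uint64_t pack(std::uint32_t version, std::uint32_t value) {
    return (static_cast<std::uint64_t>(version) << 32) | value;
}
static std::uint32_t version_of(std::uint64_t w) { return static_cast<std::uint32_t>(w >> 32); }
static std::uint32_t value_of(std::uint64_t w) { return static_cast<std::uint32_t>(w); }

// Run `body` as an optimistic transaction on one cell:
// snapshot the word, compute the new value, then commit with a single CAS.
// If another commit changed the word in between, the CAS fails and we retry.
template <typename F>
TxResult atomically(TVar& tv, F body) {
    int attempts = 0;
    for (;;) {
        ++attempts;
        std::uint64_t snapshot = tv.word.load(std::memory_order_acquire);
        std::uint32_t updated = body(value_of(snapshot));
        std::uint64_t desired = pack(version_of(snapshot) + 1, updated);
        if (tv.word.compare_exchange_strong(snapshot, desired,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
            return {updated, attempts};
        }
        // Conflict detected: discard the computed value and start over.
    }
}

// Deterministic conflict demo: during its first attempt the outer transaction
// is interrupted by another commit, so it must abort once and retry.
TxResult demo_conflict() {
    TVar account;
    account.word.store(pack(0, 100));
    bool interfered = false;
    return atomically(account, [&](std::uint32_t balance) {
        if (!interfered) {
            interfered = true;
            // Stand-in for a concurrent thread committing first.
            atomically(account, [](std::uint32_t b) { return b + 5; });
        }
        return balance + 10;
    });
}
```

In `demo_conflict`, the first attempt reads a balance of 100 but fails to commit because the interfering update has already bumped the version; the retry reads 105 and commits 115. The version counter is there so that a value which changes and then returns to its old number still counts as a conflict. A real STM tracks read and write sets over many locations, which this single-word sketch deliberately leaves out.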