Shared Memory Programming

GPUで用いられるシェアードメモリの原理

DRAMで作られたメモリからデータを読み、処理を行ってメモリに書き戻すというのは時間が掛かり効率が悪い。このため、CPUでは小容量だが高速のメモリをキャッシュとして使うが、GPUでは伝統的にローカルな小容量のスクラッチパッドメモリを用いてきた。

マイナビニュース

超並列プロセサ - GeForceアーキテクチャとCUDAプログラミング

このループの中で、_shared_ float As[BLOCK_SIZE][BLOCK_SIZE];でシェアードメモリ上に部分行列Asの領域を確保する。また、Bsに対しても同様にシェアードメモリ上に領域を確保する。そして、As[ty][tx] = A[a + wA * ty + tx];とBs[ty][tx] = B[b + wB * ty + tx];で、各スレッドは自分が ...

Nature

Memory Models and Consistency in Concurrent Programming

Memory models offer the formal frameworks that define how operations on memory are executed in environments with concurrent processes. By establishing rules for the ordering and visibility of memory ...

insideHPC

Samsung Joins OpenMP Architecture Review Board

Beaverton, Oregon — The OpenMP Architecture Review Board (ARB) today announced that Samsung has joined the board. The OpenMP ARB is a group of hardware and software vendors and research organizations ...

Communications of the ACM

The Golden Rule of Big Memory: Persistence Is Not Harmful

Large-scale applications, such as generative AI, recommendation systems, big data, and HPC systems, require large-capacity ...

一部の結果でアクセス不可の可能性があるため、非表示になっています。

アクセス不可の結果を表示する