Results
Found 134 publication records. Showing 133 according to the selection in the facets
Hits |
Authors |
Title |
Venue |
Year |
Link |
Author keywords |
145 | Bo Kågström, Per Ling, Charles Van Loan |
GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark.  |
ACM Trans. Math. Softw.  |
1998 |
DBLP DOI BibTeX RDF |
GEMM-based level 3 BLAS, matrix-matrix kernels, parallelization, memory hierarchy, vectorization, FORTRAN 77, blocked algorithms |
101 | Yinan Li 0002, Jack J. Dongarra, Stanimire Tomov |
A Note on Auto-tuning GEMM for GPUs.  |
ICCS (1)  |
2009 |
DBLP DOI BibTeX RDF |
matrix multiply, GPUs, Auto-tuning, dense linear algebra |
83 | Isak Jonsson, Bo Kågström |
Parallel Triangular Sylvester-Type Matrix Equation Solvers for SMP Systems Using Recursive Blocking.  |
PARA  |
2000 |
DBLP DOI BibTeX RDF |
Sylvester-type matrix equations, recursion, superscalar, level 3 BLAS, GEMM-based, automatic blocking |
78 | Michel J. Daydé, Iain S. Duff, Antoine Petitet |
A parallel block implementation of Level-3 BLAS for MIMD vector processors.  |
ACM Trans. Math. Softw.  |
1994 |
DBLP DOI BibTeX RDF |
matrix-matrix kernels, parallelization, vectorization, Level-3 BLAS |
66 | Bo Kågström, Charles Van Loan |
Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues.  |
ACM Trans. Math. Softw.  |
1998 |
DBLP DOI BibTeX RDF |
GEMM-based level 3 BLAS, matrix-matrix kernels, parallelization, memory hierarchy, vectorization, FORTRAN 77, blocked algorithms |
63 | Michael J. Feeley, Norman C. Hutchinson, Suprio Ray |
Realistic Mobility for Mobile Ad Hoc Network Simulation.  |
ADHOC-NOW  |
2004 |
DBLP DOI BibTeX RDF |
GEMM, MANET, Mobility Model |
59 | Ahmed Sherif Zekri, Stanislav G. Sedukhin |
The general matrix multiply-add operation on 2D torus.  |
IPDPS  |
2006 |
DBLP DOI BibTeX RDF |
|
51 | John S. McCaskill, Thomas Maeke, Udo Gemm, Ludger Schulte, Uwe Tangen |
NGEN: A Massively Parallel Reconfigurable Computer for Biological Simulation: Towards a Self-Organizing Computer.  |
ICES  |
1996 |
DBLP DOI BibTeX RDF |
|
45 | Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen |
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
45 | Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen |
FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs.  |
HPDC  |
2023 |
DBLP DOI BibTeX RDF |
|
44 | Robert Granat, Bo Kågström |
Evaluating Parallel Algorithms for Solving Sylvester-Type Matrix Equations: Direct Transformation-Based Versus Iterative Matrix-Sign-Function-Based Methods.  |
PARA  |
2004 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels–Stewart method, explicit blocking, c-stable matrices, PSLICOT, level 3 BLAS, continuous-time, GEMM-based, ScaLAPACK, Newton iteration, matrix sign function |
44 | Bo Kågström |
Management of Deep Memory Hierarchies - Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Computations.  |
PARA  |
2004 |
DBLP DOI BibTeX RDF |
automatic variable blocking, hybrid data structures, superscalar kernels, SMP parallelization, library software, ESSL, RECSY, periodic systems, factorizations, recursion, superscalar, LAPACK, level 3 BLAS, dense linear algebra, GEMM-based, SLICOT, matrix equations |
44 | Robert Granat, Isak Jonsson, Bo Kågström |
Combining Explicit, Recursive Blocking for Solving Triangular Sylvester-Type Matrix Equations on Distributed Memory Platforms.  |
Euro-Par  |
2004 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels–Stewart method, ScaLAPACK-style algorithms, RECSY, blocking, LAPACK, recursive algorithms, level 3 BLAS, continuous-time, GEMM-based, automatic blocking |
44 | Isak Jonsson, Bo Kågström |
RECSY - A High Performance Library for Sylvester-Type Matrix Equations.  |
Euro-Par  |
2003 |
DBLP DOI BibTeX RDF |
Sylvester-type matrix equations, RECSY, recursion, superscalar, LAPACK, level 3 BLAS, GEMM-based, SLICOT, automatic blocking |
44 | Robert Granat, Bo Kågström, Peter Poromaa |
Parallel ScaLAPACK-Style Algorithms for Solving Continuous-Time Sylvester Matrix Equations.  |
Euro-Par  |
2003 |
DBLP DOI BibTeX RDF |
Sylvester matrix equation, Bartels-Stewart method, ScaLAPACK-style algorithms, blocking, level 3 BLAS, continuous-time, GEMM-based, SLICOT |
44 | Isak Jonsson, Bo Kågström |
Recursive blocked algorithms for solving triangular systems - Part I: one-sided and coupled Sylvester-type matrix equations.  |
ACM Trans. Math. Softw.  |
2002 |
DBLP DOI BibTeX RDF |
SMP parallelization, generalized coupled Sylvester, standard Sylvester and Lyapunov, recursion, superscalar, LAPACK, level-3 BLAS, GEMM-based, SLICOT, Matrix equations, automatic blocking |
44 | Isak Jonsson, Bo Kågström |
Recursive blocked algorithms for solving triangular systems - Part II: two-sided and generalized Sylvester and Lyapunov matrix equations.  |
ACM Trans. Math. Softw.  |
2002 |
DBLP DOI BibTeX RDF |
SMP parallelization, generalized Sylvester and Lyapunov, standard discrete-time Sylvester and Lyapunov, recursion, superscalar, LAPACK, level-3 BLAS, GEMM-based, SLICOT, Matrix equations, automatic blocking |
39 | Vasily Volkov, James Demmel |
Benchmarking GPUs to tune dense linear algebra.  |
SC  |
2008 |
DBLP DOI BibTeX RDF |
|
39 | Bjarne Stig Andersen, Jerzy Wasniewski, Fred G. Gustavson |
A recursive formulation of Cholesky factorization of a matrix in packed storage.  |
ACM Trans. Math. Softw.  |
2001 |
DBLP DOI BibTeX RDF |
Cholesky factorization and solution, complex Hermitian matrices, novel packed matrix data structures, real symmetric matrices, BLAS, recursive algorithms, positive definite matrices |
39 | Fred G. Gustavson, Isak Jonsson |
High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage.  |
PARA  |
2000 |
DBLP DOI BibTeX RDF |
packed format, level 3 BLAS parallelism, recursive algorithm, Cholesky factorization, recursive data structure |
39 | Michel J. Daydé, Iain S. Duff |
The RISC BLAS: a blocked implementation of level 3 BLAS for RISC processors.  |
ACM Trans. Math. Softw.  |
1999 |
DBLP DOI BibTeX RDF |
matrix-matrix kernels, blocking, loop-unrolling, level 3 BLAS, RISC processors |
24 | Samuel Williams 0001, John Shalf, Leonid Oliker, Shoaib Kamil 0001, Parry Husbands, Katherine A. Yelick |
Scientific Computing Kernels on the Cell Processor.  |
Int. J. Parallel Program.  |
2007 |
DBLP DOI BibTeX RDF |
GEMM, SpMV, three level memory, FFT, sparse matrix, Cell processor, Stencil |
24 | Samuel Williams 0001, John Shalf, Leonid Oliker, Shoaib Kamil 0001, Parry Husbands, Katherine A. Yelick |
The potential of the cell processor for scientific computing.  |
Conf. Computing Frontiers  |
2006 |
DBLP DOI BibTeX RDF |
GEMM, SpMV, three level memory, FFT, sparse matrix, cell processor, stencil |
23 | Susana Ortega-Cisneros |
Design and Implementation of an NoC-Based Convolution Architecture With GEMM and Systolic Arrays.  |
IEEE Embed. Syst. Lett.  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Cong Guo 0003, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen 0002, Minyi Guo |
Accelerating Sparse DNNs Based on Tiled GEMM.  |
IEEE Trans. Computers  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li |
SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow.  |
ACM Trans. Design Autom. Electr. Syst.  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Venkata Sai Praneeth Karempudi, Sairam Sri Vatsavai, Ishan G. Thakkar, Oluwaseun Adewunmi Alo, Jeffrey Todd Hastings, Justin Scott Woods |
A Low-Dissipation and Scalable GEMM Accelerator with Silicon Nitride Photonics.  |
CoRR  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Sairam Sri Vatsavai, Venkata Sai Praneeth Karempudi, Oluwaseun Adewunmi Alo, Ishan G. Thakkar |
A Comparative Analysis of Microrings Based Incoherent Photonic GEMM Accelerators.  |
CoRR  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Cong Guo 0003, Fengchen Xue, Jingwen Leng, Yuxian Qiu, Yue Guan, Weihao Cui, Quan Chen 0002, Minyi Guo |
Accelerating Sparse DNNs Based on Tiled GEMM.  |
CoRR  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Jaeyong Jang, Yulhwa Kim, Juheun Lee, Jae-Joon Kim |
FIGNA: Integer Unit-Based Accelerator Design for FP-INT GEMM Preserving Numerical Accuracy.  |
HPCA  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Seonghun Jeong, Jooyeon Lee, Jaeha Kung |
A Full SW-HW Demonstration of GEMM Accelerators with RISC-V Instruction Extensions.  |
ICEIC  |
2024 |
DBLP DOI BibTeX RDF |
|
23 | Lili Xu, Binjie Chen, Chenhao Huang, Mengmeng Zhou, Shucheng You, Fangming Jiang, Weirong Chen, Jinsong Deng |
Identifying PM2.5-Related Health Burden in the Context of the Integrated Development of Urban Agglomeration Using Remote Sensing and GEMM Model.  |
Remote. Sens.  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Sandeep Kumar Sharma, Amit Chaurasia, Vijay Shankar Sharma, Chiranji Lal Chowdhary, Shakila Basheer |
GEMM, a Genetic Engineering-Based Mutual Model for Resource Allocation of Grid Computing.  |
IEEE Access  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Jordi Fornt, Pau Fontova-Musté, Martí Caro, Jaume Abella 0001, Francesc Moll, Josep Altet, Christoph Studer |
An Energy-Efficient GeMM-Based Convolution Accelerator With On-the-Fly im2col.  |
IEEE Trans. Very Large Scale Integr. Syst.  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Iryna De Albuquerque Silva, Thomas Carle, Adrien Gauffriau, Claire Pagetti |
Extending a predictable machine learning framework with efficient gemm-based convolution routines.  |
Real Time Syst.  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Hyeonjin Kim, William J. Song |
LAS: Locality-Aware Scheduling for GEMM-Accelerated Convolutions in GPUs.  |
IEEE Trans. Parallel Distributed Syst.  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Louis Ledoux, Marc Casas |
Open-Source GEMM Hardware Kernels Generator: Toward Numerically-Tailored Computations.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen |
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin 0001, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna |
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Saeed Maleki |
Look-Up mAI GeMM: Increasing AI GeMMs Performance by Nearly 2.5x via msGeMM.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan 0002, Siva Kumar Sastry Hari, Timothy Tsai 0002, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant J. Nair, Kevin J. Barker, Ang Li 0006 |
MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Ruqing G. Xu, Field G. Van Zee, Robert A. van de Geijn |
GEMMFIP: Unifying GEMM in BLIS.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré |
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Devangi N. Parikh, Robert A. van de Geijn, Greg M. Henry |
Cascading GEMM: High Precision from Low Precision.  |
CoRR  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Enrico Reggiani, Alessandro Pappalardo, Max Doblas, Miquel Moretó, Mauro Olivieri, Osman Sabri Unsal, Adrián Cristal |
Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices.  |
HPCA  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Ranggi Hwang, Minhoo Kang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu |
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks.  |
HPCA  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Geonhwa Jeong, Sana Damani, Abhimanyu Rajeshkumar Bambhaniya, Eric Qin 0001, Christopher J. Hughes, Sreenivas Subramoney, Hyesoon Kim, Tushar Krishna |
VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs.  |
HPCA  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Susmita Dey Manasi, Suvadeep Banerjee, Abhijit Davare, Anton A. Sorokin, Steven M. Burns, Desmond A. Kirkpatrick, Sachin S. Sapatnekar |
Reusing GEMM Hardware for Efficient Execution of Depthwise Separable Convolution on ASIC-Based DNN Accelerators.  |
ASP-DAC  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Jie Lei, Héctor Martínez, José Flich, Enrique S. Quintana-Ortí |
GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal.  |
ISC Workshops  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Guosheng Yu, Zhihong Lv, Haijiang Wang, Zilong Huang, Jicheng Chen |
Task-aware Scheduling and Performance Optimization on Yitian710 SoC for GEMM-based Workloads on the Cloud.  |
AICAS  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | RuQing G. Xu, Field G. Van Zee, Robert A. van de Geijn |
Towards a Unified Implementation of GEMM in BLIS.  |
ICS  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen |
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs.  |
ICS  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Alexey Romanov, Andrei Turkin, Oleg Myakinin, Fiodar Tsupko, Jiexing Gao |
Parameter Estimation via Time Modeling for MLIR Implementation of GEMM.  |
OPTIMA  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Yongseung Yu, Donghyun Son, Younghyun Lee, Sunghyun Park 0004, Giha Ryu, Myeongjin Cho, Jiwon Seo 0002, Yongjun Park 0001 |
Tailoring CUTLASS GEMM using Supervised Learning.  |
ICCD  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, John Paul Shen |
tuGEMM: Area-Power-Efficient Temporal Unary GEMM Architecture for Low-Precision Edge AI.  |
ISCAS  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Evan Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré |
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.  |
NeurIPS  |
2023 |
DBLP BibTeX RDF |
|
23 | Tahsin Tariq Banna, Swakshar Deb, Sejuti Rahman, Shafin Rahman |
GEMM: A Graph Embedded Model for Memorability Prediction.  |
IJCNN  |
2023 |
DBLP DOI BibTeX RDF |
|
23 | Zhiwei Yang, Lu Lu, Ruimin Wang |
A batched GEMM optimization framework for deep learning.  |
J. Supercomput.  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Thomas Faingnaert, Tim Besard, Bjorn De Sutter |
Flexible Performant GEMM Kernels on GPUs.  |
IEEE Trans. Parallel Distributed Syst.  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Sergio Barrachina 0001, Manuel F. Dolz, Pablo San Juan, Enrique S. Quintana-Ortí |
Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors.  |
J. Parallel Distributed Comput.  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Nihat Mert Cicek, Xipeng Shen, Ozcan Ozturk 0001 |
Energy Efficient Boosting of GEMM Accelerators for DNN via Reuse.  |
ACM Trans. Design Autom. Electr. Syst.  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Yunan Zhang, Po-An Tsai, Hung-Wei Tseng 0001 |
SIMD2: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM.  |
CoRR  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Jianyu Yao, Boqian Shi, Chunyang Xiang, Haipeng Jia, Chendi Li, Hang Cao, Yunquan Zhang |
IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM.  |
CoRR  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Minhoo Kang, Ranggi Hwang, Jiwon Lee, Dongyun Kam, Youngjoo Lee, Minsoo Rhu |
GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks.  |
CoRR  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Mark Gates, Asim YarKhan, Dalal Sukkari, Kadir Akbudak, Sébastien Cayrols, Daniel Bielich, Mohammed A. Al Farhan, Jack J. Dongarra |
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.  |
|
2022 |
DOI RDF |
|
23 | Mark Gates, Asim YarKhan, Dalal Sukkari, Kadir Akbudak, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Mohammed A. Al Farhan, Jack J. Dongarra |
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.  |
|
2022 |
DOI RDF |
|
23 | Bo Wang, Sheng Ma, Zhong Liu, Libo Huang, Yuan Yuan 0034, Yi Dai |
SADD: A Novel Systolic Array Accelerator with Dynamic Dataflow for Sparse GEMM in Deep Learning.  |
NPC  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Cunyang Wei, Haipeng Jia, Yunquan Zhang, Kun Li, Luhan Wang |
LBBGEMM: A Load-balanced Batch GEMM Framework on ARM CPUs.  |
HPCC/DSS/SmartCity/DependSys  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Arthur Francisco Lorenzon, Sandro Matheus V. N. Marques, Antoni C. Navarro, Vicenç Beltran 0001 |
Seamless optimization of the GEMM kernel for task-based programming models.  |
ICS  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Chunhua Xiao, Chen Shi, Dandan Xu, Fangzhu Lin, Kun Ning |
SDST-Accelerating GEMM-based Convolution through Smart Data Stream Transformation.  |
CCGRID  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Bo Wang, Sheng Ma, Yuan Yuan 0034, Yi Dai, Wei Jiang, Xiang Hou, Xiao Yi, Rui Xu |
SparG: A Sparse GEMM Accelerator for Deep Learning Applications.  |
ICA3PP  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Dennis Agyemanh Nana Gookyi, Eunchong Lee, Kyungho Kim, Sung-Joon Jang, Sang-Seol Lee |
Exploring GEMM Operations on Different Configurations of the Gemmini Accelerator.  |
ISOCC  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Bingyi Zhang, Akhilesh R. Jaiswal, Clynn Mathew, Ravi Teja Lakkireddy, Ajey P. Jacob, Sasindu Wijeratne, Viktor K. Prasanna |
Modeling the Energy Efficiency of GEMM using Optical Random Access Memory.  |
HPEC  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Yunan Zhang, Po-An Tsai, Hung-Wei Tseng 0001 |
SIMD2: a generalized matrix instruction set for accelerating tensor computation beyond GEMM.  |
ISCA  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Ananda Samajdar, Eric Qin 0001, Michael Pellauer, Tushar Krishna |
Self adaptive reconfigurable arrays (SARA): learning flexible GEMM accelerator configuration and mapping-space using ML.  |
DAC  |
2022 |
DBLP DOI BibTeX RDF |
|
23 | Di Wu 0016, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim 0001, Joshua San Miguel |
uGEMM: Unary Computing for GEMM Applications.  |
IEEE Micro  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Qingchang Han, Hailong Yang, Ming Dun, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian |
Towards efficient tile low-rank GEMM computation on sunway many-core processors.  |
J. Supercomput.  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Mochamad Asri, Dhairya Malhotra, Jiajun Wang, George Biros, Lizy K. John, Andreas Gerstlauer |
Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods.  |
IEEE Trans. Parallel Distributed Syst.  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Ananda Samajdar, Michael Pellauer, Tushar Krishna |
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration.  |
CoRR  |
2021 |
DBLP BibTeX RDF |
|
23 | Ratko Pilipovic, Vladimir Risojevic, Janko Bozic, Patricio Bulic, Uros Lotric |
An Approximate GEMM Unit for Energy-Efficient Object Detection.  |
Sensors  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Reza Hojabr, Ali Sedaghati, Amirali Sharifian, Ahmad Khonsari, Arrvindh Shriraman |
SPAGHETTI: Streaming Accelerators for Highly Sparse GEMM on FPGAs.  |
HPCA  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Jianyu Yao, Boqian Shi, Chunyang Xiang, Haipeng Jia, Chendi Li, Hang Cao, Yunquan Zhang |
IAAT: A Input-Aware Adaptive Tuning framework for Small GEMM.  |
ICPADS  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Malith Jayaweera, Kaustubh Shivdikar, Yanzhi Wang, David R. Kaeli |
JAXED: Reverse Engineering DNN Architectures Leveraging JIT GEMM Libraries.  |
SEED  |
2021 |
DBLP DOI BibTeX RDF |
|
23 | Zhi Gang Liu, Paul N. Whatmough, Matthew Mattina |
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference.  |
IEEE Comput. Archit. Lett.  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Uday Bondhugula |
High Performance Code Generation in MLIR: An Early Case Study with GEMM.  |
CoRR  |
2020 |
DBLP BibTeX RDF |
|
23 | Zhi Gang Liu, Paul N. Whatmough, Matthew Mattina |
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference.  |
CoRR  |
2020 |
DBLP BibTeX RDF |
|
23 | Thomas Faingnaert, Tim Besard, Bjorn De Sutter |
Flexible Performant GEMM Kernels on GPUs.  |
CoRR  |
2020 |
DBLP BibTeX RDF |
|
23 | Natalie Beams, Ahmad Abdelfattah, Stan Tomov, Jack J. Dongarra, Tzanio V. Kolev, Yohann Dudouit |
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs.  |
ScalA@SC  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Eric Qin 0001, Ananda Samajdar, Hyoukjun Kwon, Vineet Nadella, Sudarshan Srinivasan, Dipankar Das 0002, Bharat Kaul, Tushar Krishna |
SIGMA: A Sparse and Irregular GEMM Accelerator with Flexible Interconnects for DNN Training.  |
HPCA  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Ioannis Oroutzoglou, Dimosthenis Masouros, Konstantina Koliogeorgi, Sotirios Xydis, Dimitrios Soudris |
Exploration of GPU sharing policies under GEMM workloads.  |
SCOPES  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Guoning Lu, Dong Xu 0015, Ning Wang, Xiao Zhang, Degen Zhen, Hong Lei, Yunlong Bai, Dehui Kong, Hang Ruan, Zhifeng Chi, Xiankui Xiong, Ke Xu 0014 |
A Design of 16TOPS Efficient GEMM Module in Deep Learning Accelerator.  |
ICTA  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Yunping Zhao, Jianzhuang Lu, Xiaowen Chen |
A Design of GEMM Parallel Computing Accelerator Based on Vector SIMD Technology.  |
ICCTA  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Philip Colangelo, Shayan Sengupta, Martin Margala |
Sparse Persistent GEMM Accelerator using OpenCL for Intel FPGAs.  |
ISCAS  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Andrew Anderson 0001, Aravind Vasudevan, Cormac Keane, David Gregg |
High-Performance Low-Memory Lowering: GEMM-based Algorithms for DNN Convolution.  |
SBAC-PAD  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Sheng Wei Pang, Chai Quek, Dilip K. Prasad |
GEMM-eMFIS (FRI/E): A Novel General Episodic Memory Mechanism For Fuzzy Neural Networks.  |
IJCNN  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | Di Wu 0016, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim 0001, Joshua San Miguel |
UGEMM: Unary Computing Architecture for GEMM Applications.  |
ISCA  |
2020 |
DBLP DOI BibTeX RDF |
|
23 | S. Kala, Babita R. Jose, Jimson Mathew, Nalesh Sivanandan |
High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture.  |
IEEE Trans. Very Large Scale Integr. Syst.  |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Roktaek Lim, Yeongha Lee, Raehyun Kim, Jaeyoung Choi, Myungho Lee |
Auto-tuning GEMM kernels on the Intel KNL and Intel Skylake-SP processors.  |
J. Supercomput.  |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Xing Su, Xiangke Liao, Hao Jiang 0001, Canqun Yang, Jingling Xue |
SCP: Shared Cache Partitioning for High-Performance GEMM.  |
ACM Trans. Archit. Code Optim.  |
2019 |
DBLP DOI BibTeX RDF |
|
23 | Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha |
NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques.  |
CoRR  |
2019 |
DBLP BibTeX RDF |
|