MetaKernel: Enabling Efficient Encrypted Neural Network Inference Through Unified MVM and Convolution
Practical encrypted neural network inference under the CKKS fully homomorphic encryption (FHE) scheme relies heavily on accelerating two key kernel operations: Matrix-Vector Multiplication (MVM) and Convolution (Conv). However, existing solutions—such as expert-tuned libraries and domain-specific languages—are designed in an ad hoc manner, leading to significant inefficiencies caused by excessive rotations.
We introduce MKR, a novel composition-based compiler approach that optimizes MVM and Conv kernel operations for DNN models under CKKS within a unified framework. MKR decomposes each kernel into composable units, called MetaKernels, to enhance SIMD parallelism within ciphertexts (via horizontal batching) and computational parallelism across them (via vertical batching). Our approach tackles previously unaddressed challenges, including reducing rotation overhead through a rotation-aware cost model for data packing, while also ensuring high slot utilization, uniform handling of inputs with arbitrary sizes, and compatibility with the output tensor layout. Implemented in a production-quality FHE compiler, MKR achieves inference time speedups of $10.08\times$-$185.60\times$ for individual MVM and Conv kernels and $1.75\times$-$11.84\times$ for end-to-end inference compared to a state-of-the-art FHE compiler. Moreover, MKR enables homomorphic execution of large DNN models for the first time, where prior methods fail, significantly advancing the practicality of FHE compilers.
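The rotation overhead the abstract refers to can be seen in the classic diagonal (Halevi-Shoup) method for homomorphic MVM, where every matrix diagonal costs one ciphertext rotation. The sketch below is a plaintext simulation only (NumPy `roll` standing in for a CKKS slot rotation); the function name `diagonal_mvm` and the setup are illustrative assumptions, not MetaKernel's actual implementation.

```python
import numpy as np

def rotate(v, k):
    # Stand-in for a CKKS slot rotation: left-rotate the slot vector by k.
    return np.roll(v, -k)

def diagonal_mvm(M, x):
    """Diagonal-method y = M @ x: one rotation per diagonal (n rotations
    for an n x n matrix). This per-diagonal rotation cost is the kind of
    overhead that rotation-aware packing strategies aim to reduce."""
    n = M.shape[0]
    y = np.zeros(n)
    for k in range(n):
        # k-th generalized diagonal: diag_k[i] = M[i, (i + k) mod n]
        diag = np.array([M[i, (i + k) % n] for i in range(n)])
        y += diag * rotate(x, k)  # elementwise SIMD multiply + accumulate
    return y

M = np.arange(16, dtype=float).reshape(4, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(diagonal_mvm(M, x), M @ x)
```

Because each rotation is expensive under CKKS, packing choices that batch several small MVMs into one ciphertext (and thus share or eliminate rotations) can dominate end-to-end cost, which is the design space the cost model above targets.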
Thu 16 Oct (displayed time zone: Perth)
16:00 - 17:30 | Neural Network (OOPSLA) at Orchid West | Chair(s): Jiasi Shen (The Hong Kong University of Science and Technology)
16:00 (15m) Talk | Convex Hull Approximation for Activation Functions | OOPSLA | Zhongkui Ma (The University of Queensland), Zihan Wang (The University of Queensland and CSIRO's Data61), Guangdong Bai (University of Queensland)
16:15 (15m) Talk | Cost of Soundness in Mixed-Precision Tuning | OOPSLA | Pre-print
16:30 (15m) Talk | Finch: Sparse and Structured Tensor Programming with Control Flow | OOPSLA | Willow Ahrens (Massachusetts Institute of Technology), Teodoro F. Collin (MIT CSAIL), Radha Patel (MIT CSAIL), Kyle Deeds (University of Washington), Changwan Hong (Massachusetts Institute of Technology), Saman Amarasinghe (Massachusetts Institute of Technology)
16:45 (15m) Talk | MetaKernel: Enabling Efficient Encrypted Neural Network Inference Through Unified MVM and Convolution | OOPSLA | Peng Yuan (Ant Group), Yan Liu (Ant Group), Jianxin Lai (Ant Group), Long Li (Ant Group), Tianxiang Sui (Ant Group), Linjie Xiao (Ant Group), Xiaojing Zhang (Ant Group), Qing Zhu (Ant Group), Jingling Xue (University of New South Wales)
17:00 (15m) Talk | Quantization with Guaranteed Floating-Point Neural Network Classifications | OOPSLA
17:15 (15m) Talk | The Continuous Tensor Abstraction: Where Indices are Real | OOPSLA | Jaeyeon Won (MIT), Willow Ahrens (Massachusetts Institute of Technology), Teodoro F. Collin (MIT CSAIL), Joel S Emer (MIT/NVIDIA), Saman Amarasinghe (Massachusetts Institute of Technology)