2023
A Survey of Techniques for Optimizing Transformer Inference
Transformers, Inference Optimization, Pruning, Quantization, Knowledge Distillation, Neural Architecture Search, Hardware Acceleration
A comprehensive survey of techniques for optimizing the inference phase of transformer networks.