no code implementations • 24 Nov 2023 • Seonghak Kim, Gyeongdo Ham, SuIn Lee, Donggon Jang, Daeshik Kim
To distill optimal knowledge by adjusting non-target class predictions, we apply a higher temperature to low-energy samples to create smoother distributions and a lower temperature to high-energy samples to create sharper distributions.
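The per-sample temperature idea in the excerpt above can be illustrated with a minimal sketch. It assumes the common energy score `-T * logsumexp(logits / T)`. The function names, the two energy cut-offs (`low_cut`, `high_cut`), and the values `base_t` and `delta` are hypothetical choices made for illustration; the excerpt does not specify how the authors compute energy or select thresholds.

```python
import math


def energy(logits, t=1.0):
    """Energy score -t * logsumexp(logits / t) (assumed definition)."""
    scaled = [x / t for x in logits]
    m = max(scaled)
    return -t * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def softmax(logits, t=1.0):
    """Temperature-scaled softmax, shifted by the max for stability."""
    scaled = [x / t for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_temperature(e, low_cut, high_cut, base_t=4.0, delta=2.0):
    """Hypothetical rule: raise T for low energy, lower T for high energy."""
    if e <= low_cut:
        return base_t + delta  # low-energy sample: smoother distribution
    if e >= high_cut:
        return base_t - delta  # high-energy sample: sharper distribution
    return base_t


def soften_batch(batch_logits, low_cut, high_cut):
    """Return (temperature, softened distribution) for each sample."""
    results = []
    for logits in batch_logits:
        t = pick_temperature(energy(logits), low_cut, high_cut)
        results.append((t, softmax(logits, t)))
    return results


if __name__ == "__main__":
    batch = [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    for t, probs in soften_batch(batch, low_cut=-5.0, high_cut=-3.0):
        print(t, [round(p, 3) for p in probs])
```

In this sketch a confidently classified sample (large maximum logit, hence low energy) receives the higher temperature, while a sample with flat logits (higher energy) receives the lower one. In practice the cut-offs would likely be derived from the energy distribution of the training set rather than fixed by hand.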
no code implementations • 24 Nov 2023 • Gyeongdo Ham, Seonghak Kim, SuIn Lee, Jae-Hyeok Lee, Daeshik Kim
Furthermore, we propose a method called cosine similarity weighted temperature (CSWT) to further improve performance.
no code implementations • 23 Nov 2023 • Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim
Knowledge distillation (KD) improves the performance of efficient, lightweight models (i.e., the student model) by transferring knowledge from more complex models (i.e., the teacher model).
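As a rough illustration of what transferring knowledge from teacher to student can mean, the sketch below computes the widely used temperature-scaled distillation loss: the KL divergence between the softened teacher and student distributions, multiplied by T squared. This is the generic formulation, not necessarily the loss used in the paper listed here, and the names `log_softmax` and `kd_loss` as well as the default `t=4.0` are illustrative assumptions.

```python
import math


def log_softmax(logits, t=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / t for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def kd_loss(student_logits, teacher_logits, t=4.0):
    """T^2 * KL(teacher || student) on temperature-softened outputs."""
    log_p = log_softmax(teacher_logits, t)  # teacher (target) distribution
    log_q = log_softmax(student_logits, t)  # student distribution
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return t * t * kl


if __name__ == "__main__":
    teacher = [3.0, 0.5, -1.0]
    print(kd_loss([2.0, 1.0, 0.0], teacher))  # student disagrees somewhat
    print(kd_loss(teacher, teacher))          # identical logits give zero loss
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. The T squared factor keeps gradient magnitudes comparable across temperatures.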
no code implementations • 22 Feb 2020 • Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, Deog-Kyoon Jeong
The attention mechanism is widely adopted by many state-of-the-art neural networks for computer vision, natural language processing, and machine translation, and accounts for a large portion of total execution time.