Search Results for author: H. T. Kung

Found 27 papers, 8 papers with code

DeepMachining: Online Prediction of Machining Errors of Lathe Machines

no code implementations • 25 Mar 2024 Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng

Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states.

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

1 code implementation • 23 Feb 2024 Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

First, current personalization techniques fail to reliably extend to multiple concepts -- we hypothesize this to be due to the mismatch between complex scenes and simple text descriptions in the pre-training dataset (e.g., LAION).

Image Generation

Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

1 code implementation • 8 Jul 2023 Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels that reduce the computation and memory access requirements of deep neural networks (DNNs).

Management Scheduling
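The outer-product formulation behind Rosko can be illustrated in a few lines: the product C = A·B is built as a sum of rank-1 updates, and any update whose column of A is entirely zero is skipped, so the matching row of B is never fetched. This is a minimal NumPy sketch of the idea, not the paper's optimized kernel; the function name is illustrative.

```python
import numpy as np

def spmm_outer_skip(A, B):
    """Sparse-dense matmul C = A @ B via outer products.

    Accumulates one rank-1 update per column of A, and skips any
    column that is entirely zero (its row of B is never read).
    """
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for j in range(k):
        col = A[:, j]
        nz = np.nonzero(col)[0]   # rows holding a nonzero in this column
        if nz.size == 0:          # row skipping: no work, no memory traffic
            continue
        C[nz, :] += np.outer(col[nz], B[j, :])
    return C
```

In a real kernel the nonzero structure would come from a compressed sparse format rather than a dense scan, but the skipping logic is the same.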

MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

no code implementations • 12 Apr 2023 Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting

We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.

Scheduling

StitchNet: Composing Neural Networks from Pre-Trained Fragments

1 code implementation • 5 Jan 2023 Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung

We then show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks, at a fraction of the computing resource and data requirements.

SpeedLimit: Neural Architecture Search for Quantized Transformer Models

no code implementations • 25 Sep 2022 Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints.

Neural Architecture Search Quantization +1

SphereFed: Hyperspherical Federated Learning

no code implementations • 19 Jul 2022 Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung

Federated Learning aims at training a global model from multiple decentralized devices (i.e., clients) without exchanging their private local data.

Federated Learning

SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

no code implementations CVPR 2022 Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li

We design deep neural networks (DNNs) and corresponding network splittings to distribute DNN workloads between camera sensors and a centralized aggregator on head-mounted devices, meeting system performance targets for inference accuracy and latency under the given hardware resource constraints.

3D Classification Distributed Computing +1

FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

no code implementations • 28 Oct 2021 Sai Qian Zhang, Bradley McDanel, H. T. Kung

Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values.

Quantization
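The shared-exponent idea behind Block Floating Point is compact enough to sketch: every value in a group keeps a fixed-width signed mantissa, while one exponent (set by the group's largest magnitude) is shared by all of them. This is an illustrative round-to-nearest version; the paper additionally uses stochastic rounding during training, which is omitted here.

```python
import numpy as np

def bfp_quantize(x, mantissa_bits=8):
    """Quantize a group of values to Block Floating Point (BFP).

    The whole group shares one exponent, derived from its largest
    magnitude; each value then keeps only a fixed-width signed mantissa.
    """
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x)
    shared_exp = int(np.floor(np.log2(max_abs)))        # one exponent per group
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))   # weight of mantissa LSB
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x / scale), lo, hi)         # fixed-width mantissas
    return mant * scale
```

Because only mantissas differ within a group, multiply-accumulate hardware can operate on integers and apply the shared exponent once, which is what makes BFP cheap for DNN training.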

Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

no code implementations • 13 Jul 2021 Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

Prior works usually assume that SC offers privacy benefits as only intermediate features, instead of private data, are shared from devices to the cloud.

Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

no code implementations CVPR 2022 Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H. T. Kung

Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data.

General Classification Out-of-Distribution Detection +1
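The core comparison behind NMD is simple to sketch: spatially average each channel of an input's activations and measure how far those means sit from channel means estimated on training data. This is a rough sketch under simplifying assumptions (the paper feeds the discrepancy vector to a lightweight classifier; a plain L2 norm stands in here, and the function name is illustrative).

```python
import numpy as np

def nmd_score(feats, train_means):
    """Out-of-distribution score in the spirit of Neural Mean Discrepancy.

    feats: per-layer activation maps for one input, each of shape (C, H, W).
    train_means: per-layer channel means from training data, each of shape (C,).
    Returns the distance between the input's spatially averaged channel
    activations and the training-set means; larger suggests OOD.
    """
    diffs = [f.mean(axis=(1, 2)) - mu for f, mu in zip(feats, train_means)]
    return float(np.linalg.norm(np.concatenate(diffs)))
```

In practice the training-set channel means can be read off for free from BatchNorm running statistics, which is part of what makes the detector efficient.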

exBERT: Extending Pre-trained Models with Domain-specific Vocabulary Under Constrained Training Resources

no code implementations • Findings of the Association for Computational Linguistics 2020 Wen Tai, H. T. Kung, Xin Dong, Marcus Comiter, Chang-Fu Kuo

We introduce exBERT, a training method to extend BERT pre-trained models from a general domain to a new pre-trained model for a specific domain with a new additive vocabulary under constrained training resources (i.e., constrained computation and data).

Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

no code implementations • 13 Jul 2020 H. T. Kung, Bradley McDanel, Sai Qian Zhang

To perform conversion from binary to SDR, we develop an efficient encoding method called HESE (Hybrid Encoding for Signed Expressions) that can be performed in one pass looking at only two bits at a time.

Quantization
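Signed-digit representations (SDRs) of the kind HESE targets can be illustrated with classic radix-2 Booth recoding, which likewise scans two adjacent bits per step and emits digits in {-1, 0, +1}. To be clear, this is a standard stand-in, not the paper's HESE rules, which are its own hybrid scheme.

```python
def booth_signed_digits(x, nbits=8):
    """Recode an unsigned integer into signed digits {-1, 0, +1}.

    Classic radix-2 Booth recoding: digit i is b[i-1] - b[i]
    (with b[-1] = 0), so each step looks at only two adjacent bits.
    """
    bits = [(x >> i) & 1 for i in range(nbits)] + [0]
    digits, prev = [], 0
    for b in bits:
        digits.append(prev - b)   # two-bit window: previous bit minus current
        prev = b
    return digits  # x == sum(d * 2**i for i, d in enumerate(digits))
```

For example, 7 = 0b0111 recodes to digits [-1, 0, 0, 1, 0], i.e. 8 - 1, turning a run of ones into just two nonzero terms; fewer nonzero digits means fewer add/subtract operations in a term-by-term multiplier.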

DaiMoN: A Decentralized Artificial Intelligence Model Network

1 code implementation • 19 Jul 2019 Surat Teerapittayanon, H. T. Kung

A main feature of DaiMoN is that it allows peers to verify the accuracy improvement of submitted models without knowing the test labels.

Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations

1 code implementation • MAPL 2019 Philippe Tillet, H. T. Kung, David Cox

The validation and deployment of novel research ideas in the field of Deep Learning is often limited by the availability of efficient compute kernels for certain basic primitives.

CheckNet: Secure Inference on Untrusted Devices

no code implementations • 17 Jun 2019 Marcus Comiter, Surat Teerapittayanon, H. T. Kung

CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker.

Full-stack Optimization for Accelerating CNNs with FPGA Validation

no code implementations • 1 May 2019 Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

A highlight of our full-stack approach, which contributes to the achieved high energy efficiency, is an efficient Selector-Accumulator (SAC) architecture for implementing the multiplier-accumulator (MAC) operation present in any digital CNN hardware.

Adversarial Learning of Semantic Relevance in Text to Image Synthesis

no code implementations • 12 Dec 2018 Miriam Cha, Youngjune L. Gwon, H. T. Kung

Instead of selecting random training examples, we perform negative sampling based on the semantic distance from a positive example in the class.

Image Generation MS-SSIM +1

Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

no code implementations • 7 Nov 2018 H. T. Kung, Bradley McDanel, Sai Qian Zhang

We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators.
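The column-combining idea in the title can be sketched greedily: columns of a sparse filter matrix whose nonzero rows do not collide are packed into a single dense column of the systolic array. This is a simplified sketch; the paper also permits limited conflicts that are resolved by pruning, which is omitted here, and the function name is illustrative.

```python
def column_combine(col_nonzeros):
    """Greedily pack sparse filter-matrix columns into dense groups.

    col_nonzeros: for each column, the set of row indices holding a
    nonzero weight. Columns with disjoint nonzero rows are combined
    into one systolic-array column, raising utilization.
    """
    groups = []                              # each: (member columns, occupied rows)
    for c, rows in enumerate(col_nonzeros):
        for members, occupied in groups:
            if not (occupied & rows):        # no row conflict: combine
                members.append(c)
                occupied |= rows
                break
        else:                                # conflicts everywhere: start a group
            groups.append(([c], set(rows)))
    return [members for members, _ in groups]
```

Fewer groups than original columns translates directly into a smaller, more fully utilized array for the same set of nonzero weights.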

Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

no code implementations • 21 Oct 2017 Bradley McDanel, Surat Teerapittayanon, H. T. Kung

At inference time, the number of channels used can be dynamically adjusted to trade off accuracy for lowered power consumption and reduced latency by selecting only a beginning subset of channels.

Image Classification
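The mechanism described above reduces, per output neuron, to a dot product evaluated over only a beginning subset of channels. A minimal sketch, with an illustrative function name; the paper trains with decreasing profile coefficients so that early channels carry the most information, which is what makes truncation graceful.

```python
import numpy as np

def incomplete_dot(w, x, fraction=1.0):
    """Dot product over only the first `fraction` of channels.

    With fraction=1.0 this is the full dot product; smaller fractions
    drop trailing channels at inference time, trading accuracy for
    fewer multiply-accumulates (lower power and latency).
    """
    k = max(1, int(round(len(w) * fraction)))
    return float(np.dot(w[:k], x[:k]))
```

Because `fraction` is chosen at inference time, a single trained network can serve several power/latency operating points without retraining.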

Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

1 code implementation • 6 Sep 2017 Surat Teerapittayanon, Bradley McDanel, H. T. Kung

In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.

Distributed Computing Object Recognition +1

BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

2 code implementations • 6 Sep 2017 Surat Teerapittayanon, Bradley McDanel, H. T. Kung

Deep neural networks are state-of-the-art methods for many learning tasks due to their ability to extract increasingly better features at each network layer.
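BranchyNet's early-exit control flow is easy to sketch: side branches produce class probabilities at intermediate layers, and an input exits at the first branch whose softmax entropy is below a threshold (low entropy meaning a confident prediction), so easy inputs never pay for the full network. A minimal sketch; the branch callables and thresholds here are stand-ins for trained exit classifiers.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a probability vector (natural log)."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def branchy_infer(x, branches, thresholds):
    """Run exit branches in order, returning early when confident.

    branches: callables mapping an input to class probabilities, ordered
    from shallowest to final exit. An early exit fires when its output
    entropy drops below the branch threshold; the last branch always exits.
    Returns (predicted class, index of the exit taken).
    """
    for exit_idx, (net, thr) in enumerate(zip(branches[:-1], thresholds)):
        probs = net(x)
        if entropy(probs) < thr:
            return int(np.argmax(probs)), exit_idx
    return int(np.argmax(branches[-1](x))), len(branches) - 1
```

The thresholds set the latency/accuracy trade-off: lower thresholds send more inputs deeper into the network.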

Embedded Binarized Neural Networks

2 code implementations • 6 Sep 2017 Bradley McDanel, Surat Teerapittayanon, H. T. Kung

Beyond minimizing the memory required to store weights, as in a BNN, we show that it is essential to minimize the memory used for temporaries which hold intermediate results between layers in feedforward inference.

Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

no code implementations • 5 Sep 2017 Miriam Cha, Youngjune Gwon, H. T. Kung

We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression.

Clustering Language Modelling +2

Adversarial nets with perceptual losses for text-to-image synthesis

no code implementations • 30 Aug 2017 Miriam Cha, Youngjune Gwon, H. T. Kung

Recent approaches in generative adversarial networks (GANs) can automatically synthesize realistic images from descriptive text.

Descriptive Image Generation

Multimodal Sparse Coding for Event Detection

no code implementations • 17 May 2016 Youngjune Gwon, William Campbell, Kevin Brady, Douglas Sturim, Miriam Cha, H. T. Kung

Unsupervised feature learning methods have proven effective for classification tasks based on a single modality.

Classification Event Detection +1

Multimodal sparse representation learning and applications

no code implementations • 19 Nov 2015 Miriam Cha, Youngjune Gwon, H. T. Kung

In this paper, we present a multimodal framework for learning sparse representations that can capture semantic correlation between modalities.

Classification Dictionary Learning +7
