Search Results for author: Wen-mei Hwu

Found 48 papers, 23 papers with code

xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem

no code implementations NAACL (TrustNLP) 2021 Samhita Vadrevu, Rakesh Nagi, JinJun Xiong, Wen-mei Hwu

In this paper, we use the Clique Partitioning Problem (CPP), which is an Integer Program (IP), to formulate ER as a graph partitioning problem and then highlight the explainable nature of this method.

Entity Resolution graph partitioning

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

1 code implementation 7 Mar 2024 Ali Hassani, Wen-mei Hwu, Humphrey Shi

We observe that our fused kernels successfully circumvent some of the unavoidable inefficiencies in unfused implementations.

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

1 code implementation 28 Jun 2023 Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-mei Hwu

To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, to enable GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory with a hybrid data placement strategy.

Graph Sampling

Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures

no code implementations 16 Jan 2023 Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-mei Hwu

Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs.

8k C++ code +1

Submission-Aware Reviewer Profiling for Reviewer Recommender System

no code implementations 8 Nov 2022 Omer Anjum, Alok Kamatar, Toby Liang, JinJun Xiong, Wen-mei Hwu

We propose an approach that learns from each abstract published by a potential reviewer the topics studied and the explicit context in which the reviewer studied the topics.

Recommendation Systems

Can Language Models Be Specific? How?

1 code implementation 11 Oct 2022 Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

We hope this work can bring to awareness the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.

Language Modelling Specificity

DEER: Descriptive Knowledge Graph for Explaining Entity Relationships

1 code implementation 21 May 2022 Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

Experiments demonstrate that our system can extract and generate high-quality relation descriptions for explaining entity relationships.

BIG-bench Machine Learning Descriptive +4

Understanding Jargon: Combining Extraction and Generation for Definition Modeling

1 code implementation 14 Nov 2021 Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

From the composition of this phrase, machines may guess twin prime is a certain kind of prime, but it is still difficult to deduce exactly what twin stands for without additional knowledge.

Text Generation

Graph Neural Network Training with Data Tiering

no code implementations 10 Nov 2021 Seung Won Min, Kun Wu, Mert Hidayetoğlu, JinJun Xiong, Xiang Song, Wen-mei Hwu

With our data tiering method, we additionally provide a new data placement and access strategy to further minimize the CPU-GPU communication overhead.

Fraud Detection

MLHarness: A Scalable Benchmarking System for MLCommons

no code implementations 9 Nov 2021 Yen-Hsiang Chang, Jianhao Pu, Wen-mei Hwu, JinJun Xiong

With society's growing adoption of machine learning (ML) and deep learning (DL) for various intelligent solutions, it becomes increasingly imperative to standardize a common set of measures for ML/DL models with large-scale open datasets under common development practices and resources so that people can benchmark and compare model quality and performance on common ground.

Benchmarking

Open Relation Modeling: Learning to Define Relations between Entities

1 code implementation Findings (ACL) 2022 Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

Relations between entities can be represented by different instances, e.g., a sentence containing both entities or a fact in a Knowledge Graph (KG).

Open Relation Modeling Relation +1

Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach

1 code implementation ACL 2021 Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

To support a fine-grained domain without relying on a matching corpus for supervision, we develop hierarchical core-fringe learning, which learns core and fringe terms jointly in a semi-supervised manner contextualized in the hierarchy of the domain.

Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

1 code implementation 29 Apr 2021 Jiachen Li, Bowen Cheng, Rogerio Feris, JinJun Xiong, Thomas S. Huang, Wen-mei Hwu, Humphrey Shi

Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric.

Object object-detection +1

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

1 code implementation 4 Mar 2021 Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.

Recommendation Systems

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses

1 code implementation 20 Jan 2021 Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation 1 Jan 2021 Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification Neural Architecture Search

TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes

1 code implementation 28 Dec 2020 Carl Pearson, Kun Wu, I-Hsin Chung, JinJun Xiong, Wen-mei Hwu

MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications.

Distributed, Parallel, and Cluster Computing

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations 14 Oct 2020 Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

1 code implementation 28 Jul 2020 Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, JinJun Xiong, Rakesh Nagi, Wen-mei Hwu

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations 6 May 2020 Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

1 code implementation CVPR 2020 Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work.

Semantic Segmentation Unsupervised Domain Adaptation

DLSpec: A Deep Learning Task Exchange Specification

no code implementations 26 Feb 2020 Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Deep Learning (DL) innovations are being introduced at a rapid pace.

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale

no code implementations 19 Feb 2020 Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.

Benchmarking

The Design and Implementation of a Scalable DL Benchmarking Platform

no code implementations 19 Nov 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios.

Benchmarking

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)

no code implementations 18 Nov 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., within $95\%$ accuracy and up to $4.4\times$ benchmarking time speedup on Amazon EC2 c5.xlarge).

Benchmarking Image Classification +3

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations 18 Nov 2019 Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

Benanza: Automatic $μ$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

no code implementations 16 Nov 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities.

Benchmarking

MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale

no code implementations 25 Sep 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.

Benchmarking

PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space

no code implementations IJCNLP 2019 Omer Anjum, Hongyu Gong, Suma Bhat, Wen-mei Hwu, JinJun Xiong

Finding the right reviewers to assess the quality of conference submissions is a time-consuming process for conference organizers.

Topic Models

SPGNet: Semantic Prediction Guidance for Scene Parsing

no code implementations ICCV 2019 Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, JinJun Xiong, Thomas Huang, Wen-mei Hwu, Honghui Shi

The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.

Pose Estimation Scene Parsing +2

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

no code implementations 19 Aug 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu

Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack).

BIG-bench Machine Learning

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

1 code implementation 25 Jun 2019 Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

object-detection Object Detection

A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications

no code implementations 22 Jun 2019 Omer Anjum, Wen-mei Hwu, JinJun Xiong

Recently we decided to conduct a more thorough study based on all past papers of the International Symposium on Computer Architecture (ISCA) from 1973 to 2018, which resulted in this article.

document understanding Natural Language Understanding

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

2 code implementations 20 May 2019 Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

object-detection Object Detection

Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking

no code implementations 29 Apr 2019 Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as "ML artifacts", are being proposed - leading to a diverse landscape of ML.

Benchmarking BIG-bench Machine Learning

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

2 code implementations 9 Apr 2019 Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.

C++ code object-detection +1

Frustrated with Replicating Claims of a Shared Model? A Solution

no code implementations 24 Nov 2018 Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that model owners and evaluators are hard-pressed to analyze and study them.

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

no code implementations 24 Nov 2018 Abdul Dakkak, Cheng Li, Simon Garcia de Gonzalo, JinJun Xiong, Wen-mei Hwu

Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines.

Distributed, Parallel, and Cluster Computing

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

no code implementations 23 Nov 2018 Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

While training on samples drawn from independent and identical distribution has been a de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner and on the selected examples progressively.

General Classification Image Classification +6

SCOPE: C3SR Systems Characterization and Benchmarking Framework

2 code implementations 18 Sep 2018 Carl Pearson, Abdul Dakkak, Cheng Li, Sarah Hashash, JinJun Xiong, Wen-mei Hwu

This report presents the design of the Scope infrastructure for extensible and portable benchmarking.

Performance
