Search Results for author: Hongyu Zhang

Found 52 papers, 30 papers with code

APIRecX: Cross-Library API Recommendation via Pre-Trained Language Model

no code implementations • EMNLP 2021 • Yuning Kang, Zan Wang, Hongyu Zhang, Junjie Chen, Hanmo You

APIRecX can migrate the knowledge of existing libraries to a new library, and can recommend APIs that are previously regarded as OOV.

Language Modelling

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

no code implementations • 26 Apr 2024 • Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

In particular, we first explore ways of transforming structured tabular data into sequential text prompts so as to feed them into LLMs, and analyze which table content contributes most to NL2Vis.

Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation

1 code implementation • 24 Apr 2024 • Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures?

counterfactual Counterfactual Explanation +2

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

1 code implementation • 24 Apr 2024 • Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Yulei Sui, Pan Zhou, Lichao Sun

As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desired to know if the code is AI-generated and by which model, especially for purposes like protecting intellectual property (IP) in industry and preventing academic misconduct in education.

Code Generation

Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach

1 code implementation • 22 Apr 2024 • Yao Wan, Guanghua Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Pan Zhou, Hai Jin, Lichao Sun

Subsequently, the membership classifier can be effectively employed to deduce the membership status of a given code sample based on the output of a target code completion model.

Code Completion Memorization

VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs

no code implementations • 9 Apr 2024 • Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang

Automatically generating UI code from webpage design visions can significantly alleviate the burden of developers, enabling beginner developers or designers to directly generate Web pages from design diagrams.

Code Generation

Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback

no code implementations • 25 Mar 2024 • Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Xuanhua Shi, Hai Jin

Large language models (LLMs) have shown remarkable progress in automated code generation.

Code Generation

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

1 code implementation • 5 Mar 2024 • Hongyu Zhang, Dongyi Zheng, Lin Zhong, Xu Yang, Jiyuan Feng, Yunqing Feng, Qing Liao

Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features.

Contrastive Learning Data Augmentation +6

NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

no code implementations • 20 Feb 2024 • Wei Zhao, Zhitao Hou, Siyuan Wu, Yan Gao, Haoyu Dong, Yao Wan, Hongyu Zhang, Yulei Sui, Haidong Zhang

Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis.

Natural Language Queries

High-dimensional Bayesian Optimization via Covariance Matrix Adaptation Strategy

1 code implementation • 5 Feb 2024 • Lam Ngo, Huong Ha, Jeffrey Chan, Vu Nguyen, Hongyu Zhang

To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum, and then use BO to optimize the objective function within these regions.

Bayesian Optimization

On the Semantics of LM Latent Space: A Vocabulary-defined Approach

no code implementations • 29 Jan 2024 • Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

In response, we introduce a pioneering method called vocabulary-defined semantics, which establishes a reference frame within the LM latent space, ensuring disentangled semantic analysis grounded in LM vocabulary.

Retrieval

KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

1 code implementation • 16 Jan 2024 • Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, Wenqiang Zhang

However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not.

Denoising

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

1 code implementation • 12 Jan 2024 • Yuling Shi, Hongyu Zhang, Chengcheng Wan, Xiaodong Gu

Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code.

Code Generation

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

no code implementations • 30 Dec 2023 • Yao Wan, Yang He, Zhangqian Bi, JianGuo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip S. Yu

We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models.

Representation Learning

Neuron-level LLM Patching for Code Generation

no code implementations • 8 Dec 2023 • Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

In this paper, we propose a novel and effective model editing approach, MENT, to patch LLMs in coding tasks.

Code Generation Model Editing

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

1 code implementation • 15 Sep 2023 • Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL.

Data Augmentation Disentanglement +3

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

no code implementations • 7 Sep 2023 • Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, JianHua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu

Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals.

EEG

SoTaNa: The Open-Source Software Development Assistant

1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.

Code Summarization

Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection

no code implementations • 24 Aug 2023 • Lin Yang, Junjie Chen, Shutao Gao, Zhihao Gong, Hongyu Zhang, Yue Kang, Huaan Li

This addresses the issue of unseen log events in training data, enhancing log representation.

Anomaly Detection

Decarbonizing the European energy system in the absence of Russian gas: Hydrogen uptake and carbon capture developments in the power, heat and industry sectors

no code implementations • 17 Aug 2023 • Goran Durakovic, Hongyu Zhang, Brage Rugstad Knudsen, Asgeir Tomasgard, Pedro Crespo del Granado

Hydrogen and carbon capture and storage are pivotal to decarbonize the European energy system in a broad range of pathway scenarios.

Modularizing while Training: A New Paradigm for Modularizing DNN Models

1 code implementation • 15 Jun 2023 • Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao

In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT).

Provably Efficient Bayesian Optimization with Unbiased Gaussian Process Hyperparameter Estimation

no code implementations • 12 Jun 2023 • Huong Ha, Vu Nguyen, Hongyu Zhang, Anton Van Den Hengel

Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures unbiased estimation from the observed data.

Bayesian Optimization

Log Parsing: How Far Can ChatGPT Go?

1 code implementation • 2 Jun 2023 • Van-Hoang Le, Hongyu Zhang

Our results show that ChatGPT can achieve promising results for log parsing with appropriate prompts, especially with few-shot prompting.

Language Modelling Large Language Model +1

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.

Reusing Deep Neural Network Models through Model Re-engineering

1 code implementation • 1 Apr 2023 • Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang, Zhaotian Li, Xudong Liu

Prior approaches to DNN model reuse have two main limitations: 1) reusing the entire model, while only a small part of the model's functionalities (labels) are required, would cause much overhead (e.g., computational and time costs for inference), and 2) model reuse would inherit the defects and weaknesses of the reused model, and hence put the new system under threats of security attack.

xASTNN: Improved Code Representations for Industrial Practice

no code implementations • 13 Mar 2023 • Zhiwei Xu, Min Zhou, Xibin Zhao, Yang Chen, Xi Cheng, Hongyu Zhang

The proposed xASTNN has three advantages.

Clone Detection Code Classification

Uncertainty-Aware Performance Prediction for Highly Configurable Software Systems via Bayesian Neural Networks

no code implementations • 27 Dec 2022 • Huong Ha, Zongwen Fan, Hongyu Zhang

We also develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model.

Automatic Semantic Modeling for Structural Data Source with the Prior Knowledge from Knowledge Base

1 code implementation • 21 Dec 2022 • Jiakang Xu, Wolfgang Mayer, Hongyu Zhang, Keqing He, Zaiwen Feng

Therefore, an automatic approach for learning the semantics of a data source is desirable.

Graph Matching

Exploring Representation-Level Augmentation for Code Search

1 code implementation • 21 Oct 2022 • Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, Yanlin Wang

In this paper, we explore augmentation methods that augment data (both code and query) at representation level which does not require additional data processing and training, and based on this we propose a general format of representation-level augmentation that unifies existing methods.

Code Search Contrastive Learning +1

Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances

no code implementations • 17 Sep 2022 • Minghua Ma, Zhao Tian, Max Hort, Federica Sarro, Hongyu Zhang, Qingwei Lin, Dongmei Zhang

In this paper, we propose an approach for the selection of the initial seeds to generate IDIs for fairness testing.

Decision Making Fairness

LogGD: Detecting Anomalies from System Logs by Graph Neural Networks

no code implementations • 16 Sep 2022 • Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies.

Anomaly Detection

Patching Weak Convolutional Neural Network Models through Modularization and Composition

1 code implementation • 11 Sep 2022 • Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang

To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model.

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

1 code implementation • 24 Jul 2022 • Chaozheng Wang, Yuanhang Yang, Cuiyun Gao, Yun Peng, Hongyu Zhang, Michael R. Lyu

Besides, the performance of fine-tuning strongly relies on the amount of downstream data, while in practice, the scenarios with scarce data are common.

Code Summarization Code Translation

CoCoSoDa: Effective Contrastive Learning for Code Search

no code implementations • 7 Apr 2022 • Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

However, there is still a lot of room for improvement in using contrastive learning for code search.

Code Search Contrastive Learning +2

Accelerating Code Search with Deep Hashing and Code Classification

no code implementations • ACL 2022 • Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael R. Lyu

Code search aims to retrieve reusable code snippets from a source code corpus based on natural language queries.

Classification Code Classification +2

RACE: Retrieval-Augmented Commit Message Generation

2 code implementations • 5 Mar 2022 • Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

Furthermore, RACE can boost the performance of existing Seq2Seq models in commit message generation.

Information Retrieval Retrieval +2

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

1 code implementation • 14 Feb 2022 • Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

In this paper, we conduct a thorough structural analysis aiming to provide an interpretation of pre-trained language models for source code (e.g., CodeBERT and GraphCodeBERT) from three distinctive perspectives: (1) attention analysis, (2) probing on the word embedding, and (3) syntax tree induction.

Code Completion Code Search +1

Log-based Anomaly Detection with Deep Learning: How Far Are We?

1 code implementation • 9 Feb 2022 • Van-Hoang Le, Hongyu Zhang

Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data.

Anomaly Detection

Cross-Language Binary-Source Code Matching with Intermediate Representations

1 code implementation • 19 Jan 2022 • Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin

Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment.

Malware Detection

Graph-based Incident Aggregation for Large-Scale Online Service Systems

1 code implementation • 27 Aug 2021 • Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, Michael R. Lyu

The proposed framework is evaluated with real-world incident data collected from a large-scale online service system of Huawei Cloud.

Graph Representation Learning Management

Log-based Anomaly Detection Without Log Parsing

1 code implementation • 4 Aug 2021 • Van-Hoang Le, Hongyu Zhang

The log parsing errors could cause the loss of important information for anomaly detection.

Anomaly Detection Log Parsing

On the Evaluation of Neural Code Summarization

1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.

Code Summarization Source Code Summarization

On the Evaluation of Commit Message Generation Models: An Experimental Study

1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang

We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.

Retrieval

Embedding API Dependency Graph for Neural Code Generation

1 code implementation • 29 Mar 2021 • Chen Lyu, Ruyun Wang, Hongyu Zhang, Hanwen Zhang, Songlin Hu

In recent years, many deep learning based approaches have been proposed, which can generate a sequence of code from a sequence of textual program description.

Code Generation Graph Embedding

A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization

2 code implementations • 14 Feb 2021 • Canh T. Dinh, Tung T. Vu, Nguyen H. Tran, Minh N. Dao, Hongyu Zhang

Non-Independent and Identically Distributed (non-IID) data distribution among clients is considered the key factor that degrades the performance of federated learning (FL).

Few-Shot Learning Multi-Task Learning +1

Confused Modulo Projection based Somewhat Homomorphic Encryption -- Cryptosystem, Library and Applications on Secure Smart Cities

no code implementations • 19 Dec 2020 • Xin Jin, Hongyu Zhang, XiaoDong Li, Haoyang Yu, Beisheng Liu, Shujiang Xie, Amit Kumar Singh, Yujie Li

To make this algorithm easy to use, we also designed and implemented an efficient general blind computing library based on CMP-SWHE.

Cloud Computing object-detection +2

Language Modelling for Source Code with Transformer-XL

1 code implementation • 31 Jul 2020 • Thomas Dowdell, Hongyu Zhang

It has been found that software, like natural language texts, exhibits "naturalness", which can be captured by statistical language models.

Language Modelling

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention

1 code implementation • 27 Dec 2019 • Thomas Dowdell, Hongyu Zhang

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner.

Language Modelling

Cost-Effective Testing of a Deep Learning Model through Input Reduction

1 code implementation • 25 Sep 2019 • Jianyi Zhou, Feng Li, Jinhao Dong, Hongyu Zhang, Dan Hao

Experiments with various DL models and datasets show that our approach can reduce the whole testing data to 4.6% on average, and can reliably estimate the performance of DL models.

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

no code implementations • 25 Apr 2017 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings.

Neural Programming by Example

no code implementations • 15 Mar 2017 • Chengxun Shu, Hongyu Zhang

In this paper, we propose a deep neural network (DNN) based PBE model called Neural Programming by Example (NPBE), which can learn from input-output strings and induce programs that solve string manipulation problems.

Deep API Learning

no code implementations • 27 May 2016 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query.

Information Retrieval Language Modelling +2
