Search Results for author: Hongyu Zhang

Found 52 papers, 30 papers with code

APIRecX: Cross-Library API Recommendation via Pre-Trained Language Model

no code implementations EMNLP 2021 Yuning Kang, Zan Wang, Hongyu Zhang, Junjie Chen, Hanmo You

APIRecX can migrate the knowledge of existing libraries to a new library, and can recommend APIs that are previously regarded as OOV.

Language Modelling

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

no code implementations 26 Apr 2024 Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

In particular, we first explore ways of transforming structured tabular data into sequential text prompts so as to feed them into LLMs, and analyze which table content contributes most to NL2Vis.

Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation

1 code implementation 24 Apr 2024 Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures?

counterfactual Counterfactual Explanation +2

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

1 code implementation 24 Apr 2024 Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Yulei Sui, Pan Zhou, Lichao Sun

As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desired to know if the code is AI-generated and by which model, especially for purposes like protecting intellectual property (IP) in industry and preventing academic misconduct in education.

Code Generation

Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach

1 code implementation 22 Apr 2024 Yao Wan, Guanghua Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Pan Zhou, Hai Jin, Lichao Sun

Subsequently, the membership classifier can be effectively employed to deduce the membership status of a given code sample based on the output of a target code completion model.

Code Completion Memorization

VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs

no code implementations 9 Apr 2024 Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang

Automatically generating UI code from webpage design visions can significantly alleviate the burden of developers, enabling beginner developers or designers to directly generate Web pages from design diagrams.

Code Generation

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

1 code implementation 5 Mar 2024 Hongyu Zhang, Dongyi Zheng, Lin Zhong, Xu Yang, Jiyuan Feng, Yunqing Feng, Qing Liao

Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features.

Contrastive Learning Data Augmentation +6

NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

no code implementations 20 Feb 2024 Wei Zhao, Zhitao Hou, Siyuan Wu, Yan Gao, Haoyu Dong, Yao Wan, Hongyu Zhang, Yulei Sui, Haidong Zhang

Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis.

Natural Language Queries

High-dimensional Bayesian Optimization via Covariance Matrix Adaptation Strategy

1 code implementation 5 Feb 2024 Lam Ngo, Huong Ha, Jeffrey Chan, Vu Nguyen, Hongyu Zhang

To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum, and then use BO to optimize the objective function within these regions.

Bayesian Optimization

On the Semantics of LM Latent Space: A Vocabulary-defined Approach

no code implementations 29 Jan 2024 Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

In response, we introduce a pioneering method called vocabulary-defined semantics, which establishes a reference frame within the LM latent space, ensuring disentangled semantic analysis grounded in LM vocabulary.

Retrieval

KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

1 code implementation 16 Jan 2024 Wei Tao, Yucheng Zhou, Yanlin Wang, Hongyu Zhang, Haofen Wang, Wenqiang Zhang

However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not.

Denoising

Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human Programmers

1 code implementation 12 Jan 2024 Yuling Shi, Hongyu Zhang, Chengcheng Wan, Xiaodong Gu

Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code.

Code Generation

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

no code implementations 30 Dec 2023 Yao Wan, Yang He, Zhangqian Bi, JianGuo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip S. Yu

We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models.

Representation Learning

Neuron-level LLM Patching for Code Generation

no code implementations 8 Dec 2023 Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

In this paper, we propose a novel and effective model editing approach, MENT, to patch LLMs in coding tasks.

Code Generation Model Editing

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

1 code implementation 15 Sep 2023 Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL.

Data Augmentation Disentanglement +3

DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection

no code implementations 7 Sep 2023 Cunhang Fan, Hongyu Zhang, Wei Huang, Jun Xue, JianHua Tao, Jiangyan Yi, Zhao Lv, Xiaopei Wu

Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals.

EEG

SoTaNa: The Open-Source Software Development Assistant

1 code implementation 25 Aug 2023 Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.

Code Summarization

Modularizing while Training: A New Paradigm for Modularizing DNN Models

1 code implementation 15 Jun 2023 Binhang Qi, Hailong Sun, Hongyu Zhang, Ruobing Zhao, Xiang Gao

In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT).

Provably Efficient Bayesian Optimization with Unbiased Gaussian Process Hyperparameter Estimation

no code implementations 12 Jun 2023 Huong Ha, Vu Nguyen, Hongyu Zhang, Anton Van Den Hengel

Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process, and employs a novel training loss function for the GP hyperparameter estimation process that ensures unbiased estimation from the observed data.

Bayesian Optimization

Log Parsing: How Far Can ChatGPT Go?

1 code implementation 2 Jun 2023 Van-Hoang Le, Hongyu Zhang

Our results show that ChatGPT can achieve promising results for log parsing with appropriate prompts, especially with few-shot prompting.

Language Modelling Large Language Model +1

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

1 code implementation 11 Apr 2023 Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.

Reusing Deep Neural Network Models through Model Re-engineering

1 code implementation 1 Apr 2023 Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang, Zhaotian Li, Xudong Liu

Prior approaches to DNN model reuse have two main limitations: 1) reusing the entire model, when only a small part of the model's functionalities (labels) is required, causes considerable overhead (e.g., computational and time costs for inference), and 2) model reuse inherits the defects and weaknesses of the reused model, and hence exposes the new system to security attacks.

Uncertainty-Aware Performance Prediction for Highly Configurable Software Systems via Bayesian Neural Networks

no code implementations 27 Dec 2022 Huong Ha, Zongwen Fan, Hongyu Zhang

We also develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model.

Exploring Representation-Level Augmentation for Code Search

1 code implementation 21 Oct 2022 Haochen Li, Chunyan Miao, Cyril Leung, Yanxian Huang, Yuan Huang, Hongyu Zhang, Yanlin Wang

In this paper, we explore augmentation methods that augment data (both code and query) at the representation level, which does not require additional data processing and training, and based on this we propose a general format of representation-level augmentation that unifies existing methods.

Code Search Contrastive Learning +1

LogGD: Detecting Anomalies from System Logs by Graph Neural Networks

no code implementations 16 Sep 2022 Yongzheng Xie, Hongyu Zhang, Muhammad Ali Babar

They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies.

Anomaly Detection

Patching Weak Convolutional Neural Network Models through Modularization and Composition

1 code implementation 11 Sep 2022 Binhang Qi, Hailong Sun, Xiang Gao, Hongyu Zhang

To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model.

No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence

1 code implementation 24 Jul 2022 Chaozheng Wang, Yuanhang Yang, Cuiyun Gao, Yun Peng, Hongyu Zhang, Michael R. Lyu

Besides, the performance of fine-tuning strongly relies on the amount of downstream data, while in practice, scenarios with scarce data are common.

Code Summarization Code Translation

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

1 code implementation 14 Feb 2022 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

In this paper, we conduct a thorough structural analysis aiming to provide an interpretation of pre-trained language models for source code (e.g., CodeBERT and GraphCodeBERT) from three distinctive perspectives: (1) attention analysis, (2) probing on the word embedding, and (3) syntax tree induction.

Code Completion Code Search +1

Log-based Anomaly Detection with Deep Learning: How Far Are We?

1 code implementation 9 Feb 2022 Van-Hoang Le, Hongyu Zhang

Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data.

Anomaly Detection

Cross-Language Binary-Source Code Matching with Intermediate Representations

1 code implementation 19 Jan 2022 Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin

Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment.

Malware Detection

Graph-based Incident Aggregation for Large-Scale Online Service Systems

1 code implementation 27 Aug 2021 Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, Michael R. Lyu

The proposed framework is evaluated with real-world incident data collected from a large-scale online service system of Huawei Cloud.

Graph Representation Learning Management

Log-based Anomaly Detection Without Log Parsing

1 code implementation 4 Aug 2021 Van-Hoang Le, Hongyu Zhang

Log parsing errors could cause the loss of important information for anomaly detection.

Anomaly Detection Log Parsing

On the Evaluation of Neural Code Summarization

1 code implementation 15 Jul 2021 Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.

Code Summarization Source Code Summarization

On the Evaluation of Commit Message Generation Models: An Experimental Study

1 code implementation 12 Jul 2021 Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang

We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.

Retrieval

Embedding API Dependency Graph for Neural Code Generation

1 code implementation 29 Mar 2021 Chen Lyu, Ruyun Wang, Hongyu Zhang, Hanwen Zhang, Songlin Hu

In recent years, many deep-learning-based approaches have been proposed that can generate a sequence of code from a textual program description.

Code Generation Graph Embedding

A New Look and Convergence Rate of Federated Multi-Task Learning with Laplacian Regularization

2 code implementations 14 Feb 2021 Canh T. Dinh, Tung T. Vu, Nguyen H. Tran, Minh N. Dao, Hongyu Zhang

Non-Independent and Identically Distributed (non-IID) data distribution among clients is considered the key factor that degrades the performance of federated learning (FL).

Few-Shot Learning Multi-Task Learning +1

Language Modelling for Source Code with Transformer-XL

1 code implementation 31 Jul 2020 Thomas Dowdell, Hongyu Zhang

It has been found that software, like natural language texts, exhibits "naturalness", which can be captured by statistical language models.

Language Modelling

Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention

1 code implementation 27 Dec 2019 Thomas Dowdell, Hongyu Zhang

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner.

Language Modelling

Cost-Effective Testing of a Deep Learning Model through Input Reduction

1 code implementation 25 Sep 2019 Jianyi Zhou, Feng Li, Jinhao Dong, Hongyu Zhang, Dan Hao

Experiments with various DL models and datasets show that our approach can reduce the whole testing data to 4.6% on average, and can reliably estimate the performance of DL models.

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

no code implementations 25 Apr 2017 Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings.

Neural Programming by Example

no code implementations 15 Mar 2017 Chengxun Shu, Hongyu Zhang

In this paper, we propose a deep neural network (DNN) based PBE model called Neural Programming by Example (NPBE), which can learn from input-output strings and induce programs that solve string manipulation problems.

Deep API Learning

no code implementations 27 May 2016 Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query.

Information Retrieval Language Modelling +2
