Code Search

48 papers with code • 5 benchmarks • 10 datasets

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search
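As a rough illustration of the retrieval setup described above, the sketch below ranks a toy corpus of code snippets against a natural-language query using bag-of-words cosine similarity. It is a deliberately simple lexical baseline, not the deep-learning approach of the cited source. The corpus and the helper names (`tokenize`, `cosine`, `search`) are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens, splitting snake_case and camelCase identifiers."""
    # Insert a space at lower-to-upper boundaries so "sortList" -> "sort List"
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the top-k snippets ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(code))), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

corpus = [
    "def read_file(path):\n    return open(path).read()",
    "def sortList(items):\n    return sorted(items)",
    "def http_get(url):\n    import urllib.request\n    return urllib.request.urlopen(url).read()",
]
for score, code in search("sort a list of items", corpus, k=1):
    print(round(score, 3), code.splitlines()[0])
```

Learned code search models replace the token-count vectors with dense embeddings from neural encoders, but the ranking step stays the same.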


CodeT5+: Open Code Large Language Models for Code Understanding and Generation

salesforce/codet5 13 May 2023

To address the limitations of existing code LLMs in architecture and pretraining tasks, we propose "CodeT5+", a family of encoder-decoder LLMs for code whose component modules can be flexibly combined to suit a wide range of downstream code tasks.

2,597 GitHub stars

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation

fsoft-ai4code/thevault 9 May 2023

We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code.

78 GitHub stars
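Datasets such as The Vault pair code with natural-language descriptions. One common way to obtain such pairs, sketched below for Python source only, is to walk the syntax tree and keep each function that has a docstring. This is a simplified illustration built on the standard `ast` module, not The Vault's actual extraction or quality-filtering pipeline; the `extract_pairs` helper and its minimum-docstring-length filter are hypothetical.

```python
import ast

def extract_pairs(source: str, min_doc_words: int = 3) -> list[dict]:
    """Return name/docstring/code records for documented functions."""
    tree = ast.parse(source)
    pairs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            # Skip undocumented functions and trivially short docstrings
            if not doc or len(doc.split()) < min_doc_words:
                continue
            pairs.append({
                "name": node.name,
                "docstring": doc,
                # Note: the code segment still contains the docstring itself
                "code": ast.get_source_segment(source, node),
            })
    return pairs

sample = '''
def add(a, b):
    """Return the sum of two numbers."""
    return a + b

def helper(x):
    """Helper."""
    return x

def undocumented(y):
    return y * 2
'''
for pair in extract_pairs(sample):
    print(pair["name"], "->", pair["docstring"])
```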

Code Execution with Pre-trained Language Models

microsoft/CodeBERT 8 May 2023

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code.

1,991 GitHub stars
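To make "the exact behavior of the code" concrete, the sketch below records a simple execution trace with Python's `sys.settrace` hook. For each executed line of a target function, it stores the line offset from the `def` line together with a snapshot of the local variables taken just before that line runs. The trace format is an assumption for illustration and is not the format used in the paper's data; `trace_execution` and `running_sum` are hypothetical names.

```python
import sys

def trace_execution(func, *args):
    """Run func(*args); record (line offset, locals before the line runs)."""
    trace = []
    first_line = func.__code__.co_firstlineno

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the target function
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            trace.append((frame.f_lineno - first_line, dict(frame.f_locals)))
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook
    return result, trace

def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = trace_execution(running_sum, 3)
for offset, local_vars in trace:
    print(offset, local_vars)
```

A model trained on such traces has to predict how variable values evolve line by line, rather than only the final result.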

REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models

reinforest-team/reinforest 5 May 2023

This paper introduces a novel code-to-code search technique that improves the performance of large language models (LLMs) by incorporating both static and dynamic features and by using both similar and dissimilar examples during training.

4 GitHub stars
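Training with both similar and dissimilar examples is often formalised as a margin-based objective. The sketch below shows a generic triplet margin loss over cosine similarities. The loss is zero once the anchor is closer to the positive (a semantically equivalent snippet, possibly in another language) than to the negative by at least the margin. This is a standard contrastive formulation chosen for illustration and is not necessarily the exact loss REINFOREST uses. The toy vectors and the margin value are hypothetical.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """max(0, margin - sim(anchor, positive) + sim(anchor, negative))."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

anchor = [1.0, 0.0, 0.0]     # toy embedding of a Java snippet
positive = [0.9, 0.1, 0.0]   # same behaviour written in Python
negative = [0.0, 1.0, 0.0]   # unrelated snippet
print(triplet_loss(anchor, positive, negative))
```

Swapping the positive and negative produces a large loss, which is the signal that pushes dissimilar code apart during training.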

One Adapter for All Programming Languages? Adapter Tuning for Code Search and Summarization

wangdeze18/multilingual-adapter-for-se 28 Mar 2023

To alleviate potential catastrophic forgetting in multilingual models, we freeze all pre-trained model parameters, insert a parameter-efficient adapter module, and fine-tune only the adapter.

17 GitHub stars
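The adapter approach can be sketched as a small bottleneck module inserted into a frozen layer. The hidden vector h is projected down to a few dimensions, passed through a nonlinearity, projected back up, and added to h through a residual connection. Only the adapter's weights would be trained. The plain-Python lists, the ReLU, the zero initialisation of the up-projection, and the sizes below are all assumptions; the paper's exact adapter configuration may differ.

```python
import random

def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class Adapter:
    """Bottleneck adapter: h + W_up . relu(W_down . h)."""

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0):
        rng = random.Random(seed)
        self.W_down = [[rng.gauss(0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
        # A zero up-projection makes the adapter start as the identity map
        self.W_up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: list[float]) -> list[float]:
        z = [max(0.0, v) for v in matvec(self.W_down, h)]
        return [h_i + u_i for h_i, u_i in zip(h, matvec(self.W_up, z))]

    def num_params(self) -> int:
        return sum(len(r) for r in self.W_down) + sum(len(r) for r in self.W_up)

hidden, bottleneck = 768, 64
# Q, K, V and output projection weights of one attention block (biases ignored)
frozen_attention_params = 4 * hidden * hidden
adapter = Adapter(hidden, bottleneck)
print("trainable adapter params:", adapter.num_params())
print("frozen attention params:", frozen_attention_params)
```

Because the adapter is small relative to the frozen layer, one shared set of pre-trained weights can serve many languages or tasks, each with its own lightweight adapter.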

Global Contrastive Batch Sampling via Optimization on Sample Permutations

vinayak1/gcbs 23 Oct 2022

Contrastive learning has recently achieved state-of-the-art performance in a wide range of tasks.

5 GitHub stars
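Contrastive training for retrieval commonly uses an InfoNCE-style loss with in-batch negatives. Each query's paired code snippet is the positive, and every other snippet in the same batch serves as a negative, which is why the composition of each batch matters. The plain-Python sketch below computes this loss for a toy batch. It shows the standard loss whose batches a method like GCBS tries to make more informative, not GCBS's permutation-optimisation algorithm itself. The temperature and vectors are hypothetical, and real systems usually L2-normalise embeddings first.

```python
import math

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def info_nce(queries: list[list[float]], codes: list[list[float]], tau: float = 0.1) -> float:
    """Mean cross-entropy where codes[i] is the positive for queries[i]
    and every other code in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, c) / tau for c in codes]
        # Log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own code
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives swapped
print(info_nce(queries, aligned), info_nce(queries, swapped))
```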

Exploring Representation-Level Augmentation for Code Search

alex-haochenli/racs 21 Oct 2022

In this paper, we explore methods that augment data (both code and queries) at the representation level, which requires no additional data processing or training. Building on this, we propose a general format of representation-level augmentation that unifies existing methods.

24 GitHub stars
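Representation-level augmentation operates on encoder outputs rather than on raw code or query text. Two simple operations of this kind are interpolating a representation with another one from the batch and randomly rescaling its dimensions. The sketch below implements both on plain vectors. The function names, the interpolation range, and the noise scale are hypothetical choices for illustration and are not the unified format the paper proposes.

```python
import random

def linear_interpolate(o: list[float], other: list[float], rng: random.Random,
                       low: float = 0.9) -> list[float]:
    """Mix o with another representation: a*o + (1 - a)*other, a ~ U(low, 1)."""
    a = rng.uniform(low, 1.0)
    return [a * x + (1 - a) * y for x, y in zip(o, other)]

def gaussian_scale(o: list[float], rng: random.Random, sigma: float = 0.1) -> list[float]:
    """Scale each dimension by (1 + eps), with eps ~ N(0, sigma^2)."""
    return [x * (1 + rng.gauss(0.0, sigma)) for x in o]

rng = random.Random(42)
code_vec = [0.5, -0.2, 0.8]
other_vec = [0.1, 0.4, -0.3]
for vec in (linear_interpolate(code_vec, other_vec, rng), gaussian_scale(code_vec, rng)):
    print([round(v, 3) for v in vec])
```

Both operations act directly on vectors that the encoder has already produced, so they need no extra data processing or extra encoder passes.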

XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence

reddy-lab-code-research/xlcost 16 Jun 2022

To the best of our knowledge, XLCoST is the largest parallel dataset for source code in terms of both size and number of languages.

57 GitHub stars

NS3: Neuro-Symbolic Semantic Code Search

shushanarakelyan/modular_code_search 21 May 2022

We compare our model, NS3 (Neuro-Symbolic Semantic Search), to a number of baselines, including state-of-the-art semantic code retrieval methods, and evaluate it on two datasets: CodeSearchNet and Code Search and Question Answering (CoSQA).

7 GitHub stars
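Retrieval results on benchmarks such as CodeSearchNet are commonly reported as mean reciprocal rank (MRR). For each query, take 1/rank of the correct snippet in the model's ranking, or 0 if it is missing, then average over all queries. The short sketch below computes this metric. The query and snippet IDs are made up, and the NS3 paper may report additional metrics.

```python
def mean_reciprocal_rank(rankings: dict[str, list[str]], gold: dict[str, str]) -> float:
    """rankings maps a query id to retrieved snippet ids (best first);
    gold maps a query id to its correct snippet id. Misses contribute 0."""
    total = 0.0
    for qid, ranked in rankings.items():
        if gold[qid] in ranked:
            total += 1.0 / (ranked.index(gold[qid]) + 1)
    return total / len(rankings)

rankings = {
    "q1": ["s3", "s1", "s7"],   # correct snippet at rank 2
    "q2": ["s2", "s5", "s9"],   # correct snippet at rank 1
    "q3": ["s4", "s6", "s8"],   # correct snippet not retrieved
}
gold = {"q1": "s1", "q2": "s2", "q3": "s0"}
print(mean_reciprocal_rank(rankings, gold))  # (1/2 + 1 + 0) / 3 = 0.5
```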

UniXcoder: Unified Cross-Modal Pre-training for Code Representation

microsoft/CodeBERT ACL 2022

Furthermore, we propose using multi-modal content, such as the AST and code comments, to learn representations of code fragments with contrastive learning, and then aligning representations across programming languages with a cross-modal generation task.

1,991 GitHub stars
08 Mar 2022