Code Search
49 papers with code • 5 benchmarks • 10 datasets
The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.
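A minimal sketch of how such retrieval typically works: the query and each code snippet are mapped into a shared vector space, and snippets are ranked by cosine similarity to the query. The `embed` function below is a hypothetical bag-of-words stand-in for a neural encoder; names and the toy corpus are illustrative, not from any paper on this page.

```python
import math
import re
from collections import Counter

def embed(text):
    # Hypothetical encoder: a bag-of-words vector over lowercase word pieces.
    # Real code search systems use neural encoders; splitting identifiers
    # like "sort_list" into "sort" and "list" stands in for subword tokenization.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, corpus):
    # Rank code snippets by similarity to the natural-language query.
    q = embed(query)
    return sorted(corpus, key=lambda snippet: cosine(q, embed(snippet)), reverse=True)

corpus = [
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
]
results = search("sort a list", corpus)
```

With this toy encoder, the query "sort a list" ranks the `sort_list` snippet first because its identifier shares the tokens "sort" and "list" with the query.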
Latest papers
AutoCodeRover: Autonomous Program Improvement
Recent progress in Large Language Models (LLMs) has significantly impacted the development process, where developers can use LLM-based programming assistants to achieve automated coding.
ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search
Retrieval-based code question answering seeks to match user queries in natural language to relevant code snippets.
Source Code Clone Detection Using Unsupervised Similarity Measures
Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection, code search, and recommendation.
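One simple unsupervised similarity measure of the kind used in clone detection is Jaccard similarity over token sets. The tokenizer and example snippets below are illustrative assumptions, not the paper's exact method:

```python
import re

def tokens(code):
    # Lexical tokenization: identifiers, numbers, and single punctuation marks.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))

def jaccard(a, b):
    # Jaccard similarity: |intersection| / |union| of the two token sets.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

c1 = "def add(a, b): return a + b"
c2 = "def add(x, y): return x + y"   # a rename-only (Type-2) clone of c1
c3 = "print('hello world')"          # unrelated code

sim_clone = jaccard(c1, c2)
sim_diff = jaccard(c1, c3)
```

An unsupervised clone detector would flag pairs whose similarity exceeds a chosen threshold; here `sim_clone` is much higher than `sim_diff`.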
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation
Our framework has several advantages over existing methods: (1) it is flexible and adaptable, because it can easily be extended to other downstream tasks that require code representation (such as code-clone detection and classification); (2) it is efficient and scalable, because it does not require a large model or a large amount of training data, and it can support any programming language; (3) it is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives; and (4) it can adjust the number of encoder parameters based on computing resources.
Language Models are Universal Embedders
As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario.
Rethinking Negative Pairs in Code Search
In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE.
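The structural relationship can be illustrated with a per-negative-pair weighting of the InfoNCE loss: when every weight is 1, the weighted form collapses to vanilla InfoNCE. The sketch below shows only this reduction; the paper's actual weight-estimation methods are not reproduced here.

```python
import math

def weighted_infonce(pos_sim, neg_sims, weights, tau=0.07):
    # InfoNCE with a weight per negative pair. With weights all equal to 1,
    # this is exactly the vanilla InfoNCE loss.
    pos = math.exp(pos_sim / tau)
    neg = sum(w * math.exp(s / tau) for w, s in zip(weights, neg_sims))
    return -math.log(pos / (pos + neg))

neg_sims = [0.2, 0.1, -0.3]

# Uniform weights: recovers the vanilla InfoNCE loss.
vanilla = weighted_infonce(0.9, neg_sims, [1.0] * len(neg_sims))

# Non-uniform weights (hypothetical values): down-weights the hardest negative,
# so the loss changes relative to the vanilla case.
soft = weighted_infonce(0.9, neg_sims, [0.5, 1.5, 1.0])
```

Down-weighting the highest-similarity negative shrinks the denominator, so `soft` comes out lower than `vanilla` in this example.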
MELT: Mining Effective Lightweight Transformations from Pull Requests
By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in Comby, a language for structural code search and replace.
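For flavor, Comby templates use holes written `:[name]` that bind well-balanced code fragments. The concrete rule below is a hypothetical illustration of the template style, not a rule mined by MELT:

```
# Hypothetical Comby-style rule: the match template on the left is searched
# structurally; bound holes are substituted into the rewrite template.
match:   df.append(:[rows])
rewrite: pd.concat([df, :[rows]])
```

Because `:[rows]` matches any balanced fragment, the rule applies regardless of how the argument expression is formatted.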
Constructing Multilingual Code Search Dataset Using Neural Machine Translation
Code search is the task of finding code snippets that semantically match a given natural language query.
Structure-Aware Language Model Pretraining Improves Dense Retrieval on Structured Data
SANTA proposes two pretraining methods to make language models structure-aware and learn effective representations for structured data: 1) Structured Data Alignment, which utilizes the natural alignment relations between structured data and unstructured data for structure-aware pretraining.
Backdooring Neural Code Search
Neural code search models hence underpin many modern code search engines.