About

The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.

Source: When Deep Learning Met Code Search
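As a rough illustration of the task definition above, the sketch below ranks code fragments against a natural-language query using bag-of-words cosine similarity. This is a toy lexical baseline chosen for brevity, not the approach of any paper listed on this page. The names `tokenize`, `cosine`, `search`, and the three-snippet `corpus` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, top_k=1):
    """Return the top_k code fragments most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(code))), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for _, code in scored[:top_k]]


# Hypothetical miniature corpus, for illustration only.
corpus = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def http_get(url): return urllib.request.urlopen(url).read()",
]

best = search("sort a list of items", corpus)[0]
print(best)
```

A purely lexical match like this only works when the query and the code share vocabulary, which is the gap that learned representations aim to close.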

Benchmarks

You can find evaluation results in the subtasks. You can also submit evaluation metrics for this task.

Subtasks

Datasets

Latest papers without code

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

15 Feb 2021

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.

CODE SEARCH, LANGUAGE MODELLING, SELF-SUPERVISED LEARNING

PalmTree: Learning an Assembly Language Model for Instruction Embedding

21 Jan 2021

Deep learning has demonstrated its strengths in numerous binary analysis tasks, including function boundary detection, binary code search, function prototype inference, value set analysis, etc.

BOUNDARY DETECTION, CODE SEARCH, LANGUAGE MODELLING

InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees

13 Dec 2020

We trained an InferCode model instance, using the Tree-based CNN as the encoder, on a large set of Java code. We then applied it to downstream unsupervised tasks such as code clustering, code clone detection, and cross-language code search, or reused it under a transfer learning scheme to continue training the model weights for supervised tasks such as code classification and method name prediction.

CODE SEARCH, GRAPH CONSTRUCTION, METHOD NAME PREDICTION, SELF-SUPERVISED LEARNING, TRANSFER LEARNING

COSEA: Convolutional Code Search with Layer-wise Attention

19 Oct 2020

However, most existing studies overlook the code's intrinsic structural logic, which indeed contains a wealth of semantic information, and fail to capture intrinsic features of code.

CODE SEARCH

Evaluation of Siamese Networks for Semantic Code Search

12 Oct 2020

With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common.

CODE SEARCH

GraphCodeBERT: Pre-training Code Representations with Data Flow

ICLR 2021

Instead of taking a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.

CODE SEARCH, CODE SUMMARIZATION, LANGUAGE MODELLING
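To make the "where-the-value-comes-from" relation in the GraphCodeBERT entry above concrete, here is a minimal sketch that lists, for each top-level assignment in a Python snippet, the variables its value is computed from. It is a simplified illustration built on Python's standard `ast` module, not the paper's data flow extraction. The function name `value_sources` is hypothetical, and the sketch ignores control flow, attribute access, and nested scopes.

```python
import ast


def value_sources(source):
    """Return (target, [variables read]) pairs for top-level assignments.

    Simplifying assumption: only plain `name = expression` statements at
    module level are considered.
    """
    edges = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue
        # Every Name node inside the right-hand side is a variable being read.
        reads = sorted({n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)})
        for target in node.targets:
            if isinstance(target, ast.Name):
                edges.append((target.id, reads))
    return edges


snippet = "x = a + b\ny = x * 2\n"
edges = value_sources(snippet)
for target, reads in edges:
    print(target, "<-", reads)
```

Here the value of `y` comes from `x`, which in turn comes from `a` and `b`. Renaming the variables would leave this relation unchanged.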

Simplifying Deep-Learning-Based Model for Code Search

29 May 2020

Experimental results showed the simplified model CodeMatcher outperforms DeepCS by 97% in terms of MRR (a widely used accuracy measure for code search), and it is over 66 times faster than DeepCS.

CODE SEARCH, INFORMATION RETRIEVAL
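The MRR figure quoted in the CodeMatcher entry above refers to mean reciprocal rank. Under its standard definition, each query scores 1/rank of the first correct result (0 if none is returned), and the scores are averaged over all queries. The sketch below assumes one correct answer per query. The function name and the sample data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Average of 1/rank of the relevant item across queries.

    ranked_results: one ranked list of candidate ids per query.
    relevant: the single correct id for each query (an assumption here).
    A query whose correct id is missing from its list contributes 0.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item == target:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Hypothetical results for three queries: hits at rank 1 and rank 2, then a miss.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
truth = ["a", "e", "z"]
mrr = mean_reciprocal_rank(ranked, truth)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```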

A Multi-Perspective Architecture for Semantic Code Search

ACL 2020

The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories.

CODE SEARCH, TEXT MATCHING

SCELMo: Source Code Embeddings from Language Models

28 Apr 2020

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair.

CODE SEARCH, PROGRAM REPAIR

Semantic Source Code Search: A Study of the Past and a Glimpse at the Future

15 Aug 2019

With the recent explosion in the size and complexity of source codebases and software projects, the need for efficient source code search engines has increased dramatically.

CODE SEARCH, INFORMATION RETRIEVAL