The goal of Code Search is to retrieve code fragments from a large code corpus that most closely match a developer’s intent, which is expressed in natural language.
Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks.
Deep learning has demonstrated its strengths in numerous binary analysis tasks, including function boundary detection, binary code search, function prototype inference, and value set analysis.
We trained an InferCode model instance, using the Tree-based CNN as the encoder, on a large set of Java code. We applied it to downstream unsupervised tasks such as code clustering, code clone detection, and cross-language code search, and reused it under a transfer learning scheme, continuing to train the model weights for supervised tasks such as code classification and method name prediction.
Code Search, Graph Construction, Method Name Prediction, Self-Supervised Learning, Transfer Learning
However, most existing studies overlook the code's intrinsic structural logic, which contains a wealth of semantic information, and therefore fail to capture the intrinsic features of code.
With the increase in the number of open repositories and discussion forums, the use of natural language for semantic code search has become increasingly common.
Instead of using a syntactic-level structure of code such as the abstract syntax tree (AST), we use data flow in the pre-training stage, a semantic-level structure of code that encodes the "where-the-value-comes-from" relation between variables.
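To make the "where-the-value-comes-from" relation concrete, the following is a minimal sketch that extracts variable-to-variable edges from simple Python assignments. It is an illustration under stated assumptions, not the extraction procedure used in the work above: the function name `data_flow_edges` is hypothetical, only plain `name = expression` assignments are handled, and variables are matched by name alone, so repeated definitions of the same variable are not distinguished.

```python
import ast


def data_flow_edges(source):
    """Return (target, source_variable) pairs for simple assignments.

    Hypothetical helper for illustration: each pair says that the value of
    `target` comes from `source_variable`. Control flow, attribute access,
    and tuple unpacking are ignored.
    """
    edges = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            # Names that appear on the right-hand side of the assignment.
            reads = sorted(
                {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
            )
            for target in node.targets:
                if isinstance(target, ast.Name):
                    edges.extend((target.id, read) for read in reads)
    return edges


if __name__ == "__main__":
    snippet = "x = a + b\ny = x * 2\n"
    for target, origin in data_flow_edges(snippet):
        print(f"{target} <- {origin}")
```

For the two-line snippet in the usage example, the edges link `x` to `a` and `b`, and `y` to `x`. Unlike an AST, these edges do not depend on how the expressions are nested syntactically.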
Experimental results show that the simplified model, CodeMatcher, outperforms DeepCS by 97% in terms of MRR (a widely used accuracy measure for code search) and is over 66 times faster than DeepCS.
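For readers unfamiliar with MRR (mean reciprocal rank), the sketch below shows the standard definition: for each query, take the reciprocal of the rank of the first relevant result, count 0 if none is returned, and average over all queries. The function name and the toy data are assumptions made for illustration. They are not taken from the CodeMatcher or DeepCS evaluation.

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """Compute MRR over a set of queries.

    ranked_results: one ranked list of result ids per query.
    relevant: one set of relevant result ids per query, in the same order.
    """
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, relevant_ids in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in relevant_ids:
                # Only the first relevant hit contributes for this query.
                total += 1.0 / rank
                break
    return total / len(ranked_results)


if __name__ == "__main__":
    # Toy data: first query hits at rank 1, second at rank 2, third misses.
    ranked = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    relevant = [{"a"}, {"y"}, {"r"}]
    print(mean_reciprocal_rank(ranked, relevant))
```

With the toy data, the per-query reciprocal ranks are 1, 1/2, and 0, so the mean is 0.5.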
The ability to match pieces of code to their corresponding natural language descriptions and vice versa is fundamental for natural language search interfaces to software repositories.
Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including tools for readability, code search, and program repair.
With the recent explosion in the size and complexity of source codebases and software projects, the need for efficient source code search engines has increased dramatically.