Trending Research

Ordered by GitHub stars accumulated over the last 3 days
1
A Multi-Object Rectified Attention Network for Scene Text Recognition
The MORAN consists of a multi-object rectification network and an attention-based sequence recognition network. The rectification network decreases the difficulty of recognition, enabling the attention-based sequence recognition network to read irregular text more easily.

118
1.20 stars / hour
 Paper  Code
2
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
A key technical challenge in performing 6D object pose estimation from an RGB-D image is to fully leverage the two complementary data sources. Prior works either extract information from the RGB image and depth separately or use costly post-processing steps, limiting their performance in highly cluttered scenes and real-time applications.

101
1.08 stars / hour
 Paper  Code
3
ALiPy: Active Learning in Python
Supervised machine learning methods usually require a large set of labeled examples for model training. However, in many real applications there are plentiful unlabeled data but limited labeled data, and the acquisition of labels is costly.

50
0.91 stars / hour
 Paper  Code
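A label-acquisition loop of the kind ALiPy manages can be sketched in a few lines. The version below uses plain scikit-learn with uncertainty sampling rather than ALiPy's own interfaces; the data, model choice, and query budget are illustrative assumptions.

    # Generic pool-based active learning with uncertainty sampling (not ALiPy's API).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    X = rng.rand(500, 10)
    y = (X[:, 0] > 0.5).astype(int)                  # simulated oracle labels

    labeled = [int(np.where(y == 1)[0][0]), int(np.where(y == 0)[0][0])]
    unlabeled = [i for i in range(len(X)) if i not in labeled]

    for _ in range(20):                              # query budget: 20 labels
        clf = LogisticRegression().fit(X[labeled], y[labeled])
        proba = clf.predict_proba(X[unlabeled])[:, 1]
        pick = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]   # most uncertain example
        labeled.append(pick)                         # acquire its label from the oracle
        unlabeled.remove(pick)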
4
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer networks have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables the Transformer to learn dependencies beyond a fixed length without disrupting temporal coherence.
662
0.73 stars / hour
 Paper  Code
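The recurrence behind Transformer-XL can be sketched compactly: each segment attends over its own hidden states plus states cached from the previous segment. The NumPy sketch below uses a single head, illustrative weight matrices, and omits the relative positional encodings the full model relies on.

    import numpy as np

    def attend_with_memory(h, mem, Wq, Wk, Wv):
        # Queries come from the current segment; keys/values also cover the cached previous segment.
        ctx = np.concatenate([mem, h], axis=0)            # [mem_len + seg_len, d]
        Q, K, V = h @ Wq, ctx @ Wk, ctx @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ V

    d, seg_len, mem_len = 8, 4, 4
    rng = np.random.RandomState(0)
    Wq, Wk, Wv = (rng.randn(d, d) * 0.1 for _ in range(3))
    mem = np.zeros((mem_len, d))
    for segment in rng.randn(3, seg_len, d):              # a stream of segments
        out = attend_with_memory(segment, mem, Wq, Wk, Wv)
        mem = segment                                     # segment-level recurrence: cache for the next step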
5
Gradient Harmonized Single-stage Detector
Despite the great success of two-stage detectors, the single-stage detector remains a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge imbalance in quantity between positive and negative examples and between easy and hard examples. In this work, we first point out that the essential effect of the two disharmonies can be summarized in terms of the gradient.

143
0.66 stars / hour
 Paper  Code
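The harmonization itself can be stated briefly: estimate how many examples share a similar gradient norm, then down-weight the over-represented ones. The NumPy snippet below is a simplified sketch of the classification-branch weighting; the paper's EMA smoothing is omitted and the binning is simplified.

    import numpy as np

    def ghm_weights(p, y, bins=10):
        # p: predicted probabilities, y: binary labels (both 1-D arrays).
        g = np.abs(p - y)                              # gradient norm of BCE w.r.t. the logit, in [0, 1]
        idx = np.minimum((g * bins).astype(int), bins - 1)
        counts = np.bincount(idx, minlength=bins)
        density = counts[idx] * bins                   # gradient density GD(g) over unit-length bins
        return len(g) / np.maximum(density, 1.0)       # beta_i = N / GD(g_i): rare gradient norms get more weight

These weights would multiply the per-example cross-entropy terms during training.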
6
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
11,194
0.54 stars / hour
 Paper  Code
7
U-Net: Convolutional Networks for Biomedical Image Segmentation
There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the extensive use of data augmentation to use the available annotated samples more efficiently.

86
0.51 stars / hour
 Paper  Code
8
Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation
More specifically, these techniques have been successfully applied to medical image classification, segmentation, and detection tasks. In this paper, we propose a Recurrent Convolutional Neural Network (RCNN) and a Recurrent Residual Convolutional Neural Network (RRCNN) based on U-Net, named RU-Net and R2U-Net respectively.

86
0.51 stars / hour
 Paper  Code
9
Attention U-Net: Learning Where to Look for the Pancreas
We propose a novel attention gate (AG) model for medical imaging that automatically learns to focus on target structures of varying shapes and sizes. Models trained with AGs implicitly learn to suppress irrelevant regions in an input image while highlighting salient features useful for a specific task.

86
0.51 stars / hour
 Paper  Code
10
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards.
118,880
0.46 stars / hour
 Paper  Code
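A minimal illustration of the dataflow-graph style described in the abstract, written against the 1.x Session API of that era; the same graph definition can be placed on a CPU, a GPU, or distributed workers without changing the code.

    import tensorflow as tf   # TensorFlow 1.x graph/session API

    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)            # adds a node to the dataflow graph; nothing runs yet

    with tf.Session() as sess:     # execution is deferred to the session
        print(sess.run(c))         # [[11.]]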
11
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction
The basic principles for designing convolutional neural network (CNN) structures for image-level, region-level, and pixel-level prediction are diverging. Network structures designed specifically for image classification are typically used as the default backbone for other tasks, including detection and segmentation, yet few backbones are designed to unify the advantages of networks built for pixel-level or region-level prediction, which may require very deep features at high resolution.
246
0.46 stars / hour
 Paper  Code
12
Weight Uncertainty in Neural Networks
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood.
118
0.43 stars / hour
 Paper  Code
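A minimal Bayes-by-Backprop layer in PyTorch, as a sketch of the idea rather than the authors' code: each forward pass samples weights via the reparameterization trick, and the KL divergence to a standard-normal prior is the "compression cost" added to the data loss.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BayesLinear(nn.Module):
        def __init__(self, n_in, n_out):
            super().__init__()
            self.w_mu = nn.Parameter(torch.zeros(n_out, n_in))
            self.w_rho = nn.Parameter(torch.full((n_out, n_in), -5.0))  # pre-softplus std dev

        def forward(self, x):
            sigma = F.softplus(self.w_rho)                        # positive standard deviation
            w = self.w_mu + sigma * torch.randn_like(sigma)       # reparameterized weight sample
            # KL( N(mu, sigma^2) || N(0, 1) ), summed over weights; add this to the data loss
            self.kl = (-torch.log(sigma) + 0.5 * (sigma ** 2 + self.w_mu ** 2) - 0.5).sum()
            return F.linear(x, w)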
13
Bayesian Convolutional Neural Networks with Variational Inference
We introduce Bayesian convolutional neural networks with variational inference, a variant of convolutional neural networks (CNNs), in which the intractable posterior probability distributions over weights are inferred by Bayes by Backprop. We demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures.
118
0.43 stars / hour
 Paper  Code
14
Weight Uncertainty in Neural Networks
We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood.
118
0.43 stars / hour
 Paper  Code
15
Bayesian Convolutional Neural Networks with Variational Inference
We introduce Bayesian convolutional neural networks with variational inference, a variant of convolutional neural networks (CNNs), in which the intractable posterior probability distributions over weights are inferred by Bayes by Backprop. We demonstrate how this reliable variational inference method can serve as a fundamental construct for various network architectures.
118
0.43 stars / hour
 Paper  Code
16
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

779
0.42 stars / hour
 Paper  Code
17
models
Models and examples built with TensorFlow
47,356
0.40 stars / hour
 Paper  Code
18
Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step.
36
0.38 stars / hour
 Paper  Code
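The QHM update itself is two lines; a sketch with illustrative hyperparameters, where nu interpolates between plain SGD (nu = 0) and momentum-dominated updates (nu = 1).

    def qhm_step(theta, grad, buf, lr=0.1, beta=0.999, nu=0.7):
        # buf: exponentially weighted moving average of past gradients
        buf = beta * buf + (1.0 - beta) * grad
        # quasi-hyperbolic update: a weighted average of the plain SGD step and the momentum step
        theta = theta - lr * ((1.0 - nu) * grad + nu * buf)
        return theta, buf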
19
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using the hallucinated edges as a prior. We evaluate our model end-to-end on the publicly available CelebA, Places2, and Paris StreetView datasets, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
656
0.33 stars / hour
 Paper  Code
20
A General Optimization-based Framework for Global Pose Estimation with Multiple Sensors
Local estimates produced by existing VO/VIO approaches are fused with global sensor measurements in a pose graph optimization. We highlight that our system is a general framework that can easily fuse various global sensors in a unified pose graph optimization.

254
0.31 stars / hour
 Paper  Code
21
A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors
In this paper, we propose a general optimization-based framework for odometry estimation that supports multiple sensor sets. We validate the performance of our system on public datasets and through real-world experiments with multiple sensors.

254
0.31 stars / hour
 Paper  Code
22
Deep Residual Learning for Image Recognition
We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. Deep residual nets are the foundation of our submissions to the ILSVRC & COCO 2015 competitions, where we also won first place on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
11,155
0.29 stars / hour
 Paper  Code
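The core building block is small; below is a basic residual block in PyTorch, a sketch that assumes matching input and output channels so the shortcut is the identity.

    import torch.nn as nn

    class BasicBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)   # identity shortcut: the block only has to learn a residual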
23
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
2,848
0.27 stars / hour
 Paper  Code
24
Neural Ordinary Differential Equations
Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed.
1,627
0.26 stars / hour
 Paper  Code
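A toy illustration of the continuous-depth idea: the "network" defines dh/dt and the forward pass is numerical integration. The fixed-step Euler loop below is for clarity only; the paper uses adaptive solvers and the adjoint method for constant-memory backpropagation.

    import numpy as np

    def ode_forward(h0, f, t0=0.0, t1=1.0, steps=100):
        h, t = h0, t0
        dt = (t1 - t0) / steps
        for _ in range(steps):
            h = h + dt * f(h, t)    # Euler step along the learned derivative
            t += dt
        return h

    W = np.array([[0.0, -1.0],
                  [1.0,  0.0]])     # toy parameters of the derivative "network"
    f = lambda h, t: np.tanh(W @ h)
    print(ode_forward(np.array([1.0, 0.0]), f))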
25
FaceBoxes: A CPU Real-time Face Detector with High Accuracy
The Rapidly Digested Convolutional Layers (RDCL) are designed to enable FaceBoxes to achieve real-time speed on the CPU. The Multiple Scale Convolutional Layers (MSCL) aim at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales.
306
0.24 stars / hour
 Paper  Code
26
Markerless tracking of user-defined features with deep learning
Quantifying behavior is crucial for many applications in neuroscience. Videography provides easy methods for the observation and recording of animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time consuming.

515
0.22 stars / hour
 Paper  Code
27
Consistent Individualized Feature Attribution for Tree Ensembles
Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature's assigned importance when the true impact of that feature actually increases.

3,357
0.22 stars / hour
 Paper  Code
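The linked repository appears to be the shap package, which exposes the paper's tree-ensemble method as TreeExplainer; a small usage sketch with a placeholder model and data (assumes the shap and xgboost packages are installed).

    import numpy as np
    import shap
    import xgboost

    X = np.random.rand(200, 5)
    y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
    model = xgboost.XGBClassifier().fit(X, y)

    explainer = shap.TreeExplainer(model)     # consistent, individualized attributions for tree ensembles
    shap_values = explainer.shap_values(X)    # one attribution per feature per prediction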
28
insightface
Face Recognition Project on MXNet
2,436
0.21 stars / hour
 Paper  Code
29
Auto-Keras: Efficient Neural Architecture Search with Network Morphism
Neural architecture search (NAS) has been proposed to automatically tune deep neural networks, but existing search algorithms usually suffer from high computational cost. Network morphism, which keeps the functionality of a neural network while changing its architecture, could help NAS by enabling more efficient training during the search.
4,362
0.21 stars / hour
 Paper  Code
30
Detectron
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
18,724
0.21 stars / hour
 Paper  Code
31
SplineCNN: Fast Geometric Deep Learning with Continuous B-Spline Kernels
We present Spline-based Convolutional Neural Networks (SplineCNNs), a variant of deep neural networks for irregularly structured and geometric input, e.g., graphs or meshes. Our main contribution is a novel convolution operator based on B-splines that makes the computation time independent of the kernel size, thanks to the local support property of the B-spline basis functions.
1,085
0.21 stars / hour
 Paper  Code
32
Self-Attention Generative Adversarial Networks
In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps.
22
0.20 stars / hour
 Paper  Code
33
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations?

56
0.20 stars / hour
 Paper  Code
34
Targeted Syntactic Evaluation of Language Models
We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence.

56
0.20 stars / hour
 Paper  Code
35
Mask R-CNN
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection.
9,747
0.20 stars / hour
 Paper  Code
36
Link and code: Fast indexing with graphs and compact regression codes
Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, setting aside the memory requirements. In this paper, we revisit these approaches by additionally considering the memory constraint required to index billions of images on a single server.
5,550
0.20 stars / hour
 Paper  Code
37
Billion-scale similarity search with GPUs
Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art.
5,550
0.20 stars / hour
 Paper  Code
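These similarity-search papers appear to link to the faiss library; a minimal CPU usage sketch with an exact index (the dimensionality and data are placeholders, and quantized or GPU indexes follow the same add/search pattern).

    import numpy as np
    import faiss                                        # assumes the faiss CPU build is installed

    d = 64                                              # vector dimensionality
    xb = np.random.rand(10000, d).astype('float32')     # database vectors
    xq = np.random.rand(5, d).astype('float32')         # query vectors

    index = faiss.IndexFlatL2(d)                        # exact L2 index
    index.add(xb)
    D, I = index.search(xq, 10)                         # distances and ids of the 10 nearest neighbors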
38
Polysemous codes
This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance.

5,550
0.20 stars / hour
 Paper  Code
39
BindsNET: A machine learning-oriented spiking neural networks library in Python
In this paper, we describe a new Python package for the simulation of spiking neural networks, specifically geared towards machine learning and reinforcement learning. We also provide an interface into the OpenAI gym library, allowing for training and evaluation of spiking networks on reinforcement learning problems.
227
0.19 stars / hour
 Paper  Code
40
tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

6,516
0.19 stars / hour
 Paper  Code
41
Deep Painterly Harmonization
Copying an element from a photo and pasting it into a painting is a challenging task. Applying photo compositing techniques in this context yields subpar results that look like a collage, and existing painterly stylization algorithms, which are global, perform poorly when applied locally.

5,118
0.19 stars / hour
 Paper  Code
42
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Learning sophisticated feature interactions behind user behaviors is critical for maximizing CTR in recommender systems. Despite great progress, existing methods seem to have a strong bias towards low- or high-order interactions, or require expert feature engineering.

528
0.18 stars / hour
 Paper  Code
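The factorization-machine half of DeepFM models pairwise feature interactions with shared latent factors; the NumPy sketch below shows the standard second-order FM term (the deep component that DeepFM adds on top of shared embeddings is omitted).

    import numpy as np

    def fm_second_order(x, V):
        # x: feature vector (n_features,), V: latent factors (n_features, k).
        # 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2] covers all pairwise interactions in O(nk).
        xv = x @ V
        xv_sq = (x ** 2) @ (V ** 2)
        return 0.5 * np.sum(xv ** 2 - xv_sq)

    x = np.array([1.0, 0.0, 1.0, 1.0])
    V = np.random.randn(4, 8) * 0.1
    print(fm_second_order(x, V))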
43
Progressive Neural Architecture Search
We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based optimization (SMBO) strategy, in which we search for structures in order of increasing complexity, while simultaneously learning a surrogate model to guide the search through structure space.
3,240
0.18 stars / hour
 Paper  Code
44
openpose
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
10,782
0.18 stars / hour
 Paper  Code
45
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning, we need to collect a large amount of annotated training data in a variety of conditions and environments.

6,777
0.18 stars / hour
 Paper  Code
46
openpose_unity_plugin
OpenPose's Unity Plugin for Unity users

39
0.18 stars / hour
 Paper  Code
47
RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
Recently, two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. COCO test-dev results reach 41.4 mAP for RetinaMask-101 vs. 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation.

97
0.18 stars / hour
 Paper  Code
48
Analogical Reasoning on Chinese Morphological and Semantic Relations
Analogical reasoning is effective in capturing linguistic regularities. This paper proposes an analogical reasoning task on Chinese.
3,782
0.17 stars / hour
 Paper  Code
49
Addressing the Fundamental Tension of PCGML with Discriminative Learning
This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for shaping a generator, the lower the return on investment from applying PCGML in the first place. In response, we propose the use of discriminative models (which capture the validity of a design rather than the distribution of the content) trained on positive and negative examples.

11,272
0.17 stars / hour
 Paper  Code
50
EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning
The edge generator hallucinates edges of the missing region (both regular and irregular) of the image, and the image completion network fills in the missing regions using the hallucinated edges as a prior. We evaluate our model end-to-end on the publicly available CelebA, Places2, and Paris StreetView datasets, and show that it outperforms current state-of-the-art techniques quantitatively and qualitatively.
254
0.17 stars / hour
 Paper  Code
51
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network
To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
282
0.17 stars / hour
 Paper  Code
52
Recovering Realistic Texture in Image Super-resolution by Deep Spatial Feature Transform
In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps.
282
0.17 stars / hour
 Paper  Code
53
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

282
0.17 stars / hour
 Paper  Code
54
Attention Is All You Need
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism.
1,916
0.17 stars / hour
 Paper  Code
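The attention mechanism at the Transformer's core is compact; below is a NumPy sketch of single-head scaled dot-product attention (multi-head attention runs several of these in parallel on linearly projected inputs).

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    rng = np.random.RandomState(0)
    Q, K, V = rng.randn(5, 16), rng.randn(7, 16), rng.randn(7, 16)
    print(scaled_dot_product_attention(Q, K, V).shape)   # (5, 16)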
55
A Structured Self-attentive Sentence Embedding
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence.

1,916
0.17 stars / hour
 Paper  Code
56
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers.
2,068
0.16 stars / hour
 Paper  Code