Feature Engineering

117 papers with code · Methodology

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table, with one row per observation and one column per feature.
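
As a rough illustration of gathering several tables into one, the sketch below aggregates a child table and joins the result onto a parent table. The `customers` and `orders` tables, their column names, and the `build_feature_table` helper are all hypothetical and chosen only for this example; a real project would more likely use a dataframe library.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per observation (customer).
customers = [
    {"customer_id": 1, "signup_year": 2015},
    {"customer_id": 2, "signup_year": 2017},
    {"customer_id": 3, "signup_year": 2018},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, orders):
    """Aggregate the child table and join it onto the parent table."""
    # Group order amounts by the key shared with the parent table.
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    # Emit one row per customer, with one column per feature.
    table = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        table.append({
            "customer_id": customer["customer_id"],
            "signup_year": customer["signup_year"],
            "order_count": len(spent),
            "total_amount": sum(spent),
            "mean_amount": mean(spent) if spent else 0.0,
        })
    return table


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Customers with no orders still get a row, with their aggregate features set to zero, so the resulting table keeps exactly one row per observation.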

The traditional approach to feature engineering is to build features one at a time using domain knowledge. This process, known as manual feature engineering, is tedious, time-consuming, and error-prone. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
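
To make the problem-dependence concrete, here is a minimal sketch of hand-written features for an imagined loan-application dataset. The field names (`income`, `debt`, `loan_amount`, `applied_on`) and the `manual_features` function are assumptions made up for illustration; each feature encodes a piece of domain knowledge that would not carry over to a different dataset.

```python
from datetime import date


def manual_features(row):
    """Build hand-crafted features for one hypothetical loan application."""
    income = row["income"]
    applied = date.fromisoformat(row["applied_on"])
    return {
        # Domain knowledge: ratios are often more informative than raw amounts.
        "debt_to_income": row["debt"] / income if income else 0.0,
        "loan_to_income": row["loan_amount"] / income if income else 0.0,
        # Domain knowledge: weekend applications may behave differently.
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "applied_on_weekend": applied.weekday() >= 5,
    }


example = {
    "income": 50000.0,
    "debt": 10000.0,
    "loan_amount": 25000.0,
    "applied_on": "2018-07-01",
}
features = manual_features(example)
print(features)
```

None of these features would apply to, say, a point-cloud or text dataset, which is why this style of code has to be written again for each new problem.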

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Named Entity Recognition with Bidirectional LSTM-CNNs

TACL 2016 zalandoresearch/flair

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance.

ENTITY LINKING · FEATURE ENGINEERING · NAMED ENTITY RECOGNITION · WORD EMBEDDINGS

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

CVPR 2018 charlesq34/pointnet

Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality.

3D OBJECT DETECTION · AUTONOMOUS NAVIGATION · FEATURE ENGINEERING · OBJECT LOCALIZATION

Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data

1 Jul 2018 shenweichen/DeepCTR

User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender systems and web search.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING · INFORMATION RETRIEVAL · RECOMMENDATION SYSTEMS

DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction

12 Apr 2018 shenweichen/DeepCTR

In this paper, we study two instances of DeepFM where its "deep" component is DNN and PNN respectively, which we denote as DeepFM-D and DeepFM-P. Comprehensive experiments are conducted to demonstrate the effectiveness of DeepFM-D and DeepFM-P over the existing models for CTR prediction, on both benchmark data and commercial data.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING · RECOMMENDATION SYSTEMS

Deep & Cross Network for Ad Click Predictions

17 Aug 2017 shenweichen/DeepCTR

Feature engineering has been the key to the success of many prediction models.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

18 Apr 2017 shenweichen/DeepCTR

CTR prediction in real-world business is a difficult machine learning problem with large-scale, nonlinear, sparse data.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING

DeepFM: A Factorization-Machine based Neural Network for CTR Prediction

13 Mar 2017 shenweichen/DeepCTR

Learning sophisticated feature interactions behind user behaviors is critical in maximizing CTR for recommender systems.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING · RECOMMENDATION SYSTEMS

Wide & Deep Learning for Recommender Systems

24 Jun 2016 shenweichen/DeepCTR

Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort.

CLICK-THROUGH RATE PREDICTION · FEATURE ENGINEERING · RECOMMENDATION SYSTEMS

End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF

ACL 2016 guillaumegenthial/sequence_tagging

State-of-the-art sequence labeling systems traditionally require large amounts of task-specific knowledge in the form of hand-crafted features and data pre-processing.

FEATURE ENGINEERING · NAMED ENTITY RECOGNITION · PART-OF-SPEECH TAGGING

De-identification of Patient Notes with Recurrent Neural Networks

10 Jun 2016 Franck-Dernoncourt/NeuroNER

We compare the performance of the system with state-of-the-art systems on two datasets: the i2b2 2014 de-identification challenge dataset, which is the largest publicly available de-identification dataset, and the MIMIC de-identification dataset, which we assembled and is twice as large as the i2b2 2014 dataset.

FEATURE ENGINEERING