Feature Engineering

392 papers with code • 1 benchmark • 5 datasets

Feature engineering is the process of taking a dataset and constructing explanatory variables — features — that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table whose rows contain the observations and whose columns contain the features.

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be rewritten for each new dataset.
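As a minimal sketch of this workflow, the snippet below manually engineers features from two hypothetical tables (customers and their transactions are illustrative names, not from any particular dataset): the child table is aggregated per customer and joined back onto the parent table, yielding one row per observation with features in the columns.

```python
import pandas as pd

# Hypothetical example data: observations (customers) and related
# records (transactions) live in separate tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "signup_year": [2019, 2020, 2021],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3],
    "amount": [10.0, 25.0, 5.0, 7.5, 12.5, 40.0],
})

# Manually engineered features: aggregate the child table per
# customer, choosing each statistic by hand.
agg = (
    transactions.groupby("customer_id")["amount"]
    .agg(txn_count="count", txn_total="sum", txn_mean="mean")
    .reset_index()
)

# Gather everything into a single table: one row per observation,
# features in the columns.
features = customers.merge(agg, on="customer_id", how="left")
print(features)
```

Every aggregation here (count, sum, mean) is a manual choice; a new dataset with different tables would require rewriting this code from scratch, which is exactly the cost automated feature engineering tools aim to remove.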

Most implemented papers

Knowledge-aware Graph Neural Networks with Label Smoothness Regularization for Recommender Systems

hwwang55/KGNN-LS 11 May 2019

Here we propose Knowledge-aware Graph Neural Networks with Label Smoothness regularization (KGNN-LS) to provide better recommendations.

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

opendilab/DI-engine 1 Mar 2016

We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems.

DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network

jaredleekatzman/DeepSurv 2 Jun 2016

We introduce DeepSurv, a Cox proportional hazards deep neural network and state-of-the-art survival method for modeling interactions between a patient's covariates and treatment effectiveness in order to provide personalized treatment recommendations.

Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks

kimiyoung/transfer 18 Mar 2017

Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks.

Neural Vector Spaces for Unsupervised Information Retrieval

cvangysel/cuNVSM 9 Aug 2017

We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval.

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Abdulk084/Smiles2vec 6 Dec 2017

Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software.

Disfluency Detection using Auto-Correlational Neural Networks

pariajm/deep-disfluency-detector EMNLP 2018

In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.

ML-Net: multi-label classification of biomedical texts with deep neural networks

jingcheng-du/ML_Net-1 13 Nov 2018

Due to this nature, the multi-label text classification task is often considered to be more challenging compared to the binary or multi-class text classification problems.

SAFE ML: Surrogate Assisted Feature Extraction for Model Learning

olagacek/SAFE 28 Feb 2019

Complex black-box predictive models may achieve high accuracy, but their opacity causes problems such as lack of trust, lack of stability, and sensitivity to concept drift.

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

csinva/disentangled-attribution-curves 18 May 2019

Tree ensembles, such as random forests and AdaBoost, are ubiquitous machine learning models known for achieving strong predictive performance across a wide variety of domains.