Feature Engineering

393 papers with code • 1 benchmark • 5 datasets

Feature engineering is the process of taking a dataset and constructing explanatory variables — features — that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with rows containing the observations and features in the columns.
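
As a minimal sketch of gathering several tables into one, the example below aggregates a child table and joins the result onto a parent table, giving one row per observation and one column per feature. The `customers` and `orders` tables, their column names, and the `build_feature_table` helper are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical parent table: one row per observation (a customer).
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 51},
    {"customer_id": 3, "age": 27},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 12.5},
]


def build_feature_table(customers, orders):
    """Aggregate the child table and join it onto the parent table."""
    # Group order amounts by the key shared between the two tables.
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    table = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        table.append({
            "customer_id": customer["customer_id"],
            "age": customer["age"],
            # Aggregated features derived from the child table.
            "order_count": len(spent),
            "total_amount": sum(spent, 0.0),
            "mean_amount": sum(spent) / len(spent) if spent else 0.0,
        })
    return table


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Customers with no orders still get a row, with the aggregates set to zero. Whether zero is the right fill value depends on the prediction problem.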

The traditional approach to feature engineering is to build features one at a time using domain knowledge, a tedious, time-consuming, and error-prone process known as manual feature engineering. The code for manual feature engineering is problem-dependent and must be re-written for each new dataset.
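
To illustrate why such code is problem-dependent, here is a rough sketch of hand-built features for a hypothetical taxi-ride record. The field names (`pickup_time`, `dropoff_time`, `distance_km`) and the choice of features are assumptions made for this example, and none of them would carry over to a dataset with a different schema.

```python
from datetime import datetime


def engineer_ride_features(record):
    """Build features one at a time from a hypothetical ride record."""
    pickup = datetime.fromisoformat(record["pickup_time"])
    dropoff = datetime.fromisoformat(record["dropoff_time"])
    duration_min = (dropoff - pickup).total_seconds() / 60.0

    return {
        # Domain knowledge: demand and traffic vary by hour of day.
        "pickup_hour": pickup.hour,
        # Domain knowledge: weekend travel patterns differ from weekdays.
        "is_weekend": pickup.weekday() >= 5,
        "duration_min": duration_min,
        # Guard against zero-length rides to avoid division by zero.
        "speed_km_per_h": (
            record["distance_km"] / (duration_min / 60.0)
            if duration_min > 0 else 0.0
        ),
    }


ride = {
    "pickup_time": "2023-09-30T08:15:00",
    "dropoff_time": "2023-09-30T08:45:00",
    "distance_km": 12.0,
}
features = engineer_ride_features(ride)
print(features)
```

Every feature here encodes an assumption about rides specifically, which is why this kind of code has to be rewritten for each new dataset.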

Benchmarks

These leaderboards are used to track progress in Feature Engineering.

Dataset	Best Model
2019_test set	CNN

Libraries

Use these libraries to find Feature Engineering models and implementations.

shenweichen/DeepCTR

6 papers

7,354

xue-pai/FuxiCTR

6 papers

788

UlionTse/mlgb

6 papers

311

DataCanvasIO/DeepTables

4 papers

636

See all 12 libraries.

Datasets

Subtasks

Imputation

Latest papers

Feature Interaction Aware Automated Data Representation Transformation

ehtesam3154/inhrecon • 29 Sep 2023

Creating an effective representation space is crucial for mitigating the curse of dimensionality, enhancing model generalization, addressing data sparsity, and leveraging classical models more effectively.

Context-Based Tweet Engagement Prediction

jovan_ns/2020recsystwitter • 28 Sep 2023

In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations.

Baichuan 2: Open Large-scale Language Models

baichuan-inc/baichuan2 • 19 Sep 2023

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering.

3,937

Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring

ai-zahran/E2E-R • IEEE Access 2023

In the first step, the pre-trained SSL model is fine-tuned on a phoneme recognition task to obtain better representations for the pronounced phonemes.

19 Sep 2023

SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

s3l-official/slmia-sr • 14 Sep 2023

Our attack is versatile and can work in both white-box and black-box scenarios.

Native Language Identification with Big Bird Embeddings

sergeykramp/mthesis-bigbird-embeddings • 13 Sep 2023

Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language.

Effective Multi-Graph Neural Networks for Illicit Account Detection on Cryptocurrency Transaction Networks

tommydzh/diam • 4 Sep 2023

Extensive experiments, comparing against 14 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently achieves the best performance to accurately detect illicit accounts, while being efficient.

Interpolation of mountain weather forecasts by machine learning

kazumaiwase/interpolation-of-mountain-weather-forecasts- • 27 Aug 2023

Recent advances in numerical simulation methods based on physical models and their combination with machine learning have improved the accuracy of weather forecasts.

TrajPy: empowering feature engineering for trajectory analysis across domains

ocbe-uio/trajpy • 22 Aug 2023

The TrajPy package was developed in Python 3 and released under the GNU GPL-3 license.

Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

sruthisudheer/comment-classification-of-c-code • 11 Aug 2023

The performance of the classical bag-of-words model and transformer-based models was explored to identify significant features from the given training corpus.
