Feature Engineering

393 papers with code • 1 benchmark • 5 datasets

Feature engineering is the process of taking a dataset and constructing explanatory variables, called features, that can be used to train a machine learning model for a prediction problem. Often, data is spread across multiple tables and must be gathered into a single table with observations in the rows and features in the columns.
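As a minimal sketch of gathering several tables into one, the snippet below aggregates a child table into per-observation features and joins them onto a parent table. The `customers` and `orders` tables, their column names, and the `build_feature_table` helper are all hypothetical and chosen only for illustration. It uses plain Python so it runs without third-party packages; a dataframe library would be the more usual choice in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical parent table: one row per customer (the observations).
customers = [
    {"customer_id": 1, "country": "NO"},
    {"customer_id": 2, "country": "DE"},
    {"customer_id": 3, "country": "NO"},
]

# Hypothetical child table: zero or more rows per customer.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]


def build_feature_table(customers, orders):
    """Return one row per customer with aggregated order features."""
    # Group order amounts by the key shared between the two tables.
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    rows = []
    for customer in customers:
        spent = amounts.get(customer["customer_id"], [])
        rows.append({
            "customer_id": customer["customer_id"],
            "country": customer["country"],
            "order_count": len(spent),
            "total_spent": sum(spent),
            # Assumption: customers without orders get 0.0 rather than a missing value.
            "mean_order": mean(spent) if spent else 0.0,
        })
    return rows


feature_table = build_feature_table(customers, orders)
for row in feature_table:
    print(row)
```

Each row of `feature_table` is one observation, and each key is one feature column. How to fill in values for customers with no orders is a modeling decision, not something the aggregation step dictates.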

The traditional approach to feature engineering is to build features one at a time using domain knowledge. This process, known as manual feature engineering, is tedious, time-consuming, and error-prone. The code for it is problem-dependent and must be rewritten for each new dataset.
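To make the problem-dependence concrete, here is a small sketch of handcrafted features for an invented credit-style record. The field names (`signup_time`, `debt`, `income`) and the `add_manual_features` helper are assumptions made for this example, not part of any dataset or library mentioned here.

```python
from datetime import datetime


def add_manual_features(row):
    """Return a copy of `row` with three handcrafted features added."""
    ts = datetime.fromisoformat(row["signup_time"])
    features = dict(row)
    # Domain guess: time of day and weekends may correlate with user behavior.
    features["signup_hour"] = ts.hour
    features["signup_is_weekend"] = ts.weekday() >= 5  # Monday is 0, Saturday is 5
    # Domain guess: the ratio is more informative than either raw amount alone.
    features["debt_to_income"] = (
        row["debt"] / row["income"] if row["income"] else None
    )
    return features


example = add_manual_features(
    {"signup_time": "2023-09-30T14:05:00", "debt": 5000.0, "income": 20000.0}
)
print(example)
```

Every line in the function encodes a guess tied to these particular columns, so a dataset with different fields would need a different function.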



Feature Interaction Aware Automated Data Representation Transformation

ehtesam3154/inhrecon 29 Sep 2023

Creating an effective representation space is crucial for mitigating the curse of dimensionality, enhancing model generalization, addressing data sparsity, and leveraging classical models more effectively.

0

Context-Based Tweet Engagement Prediction

jovan_ns/2020recsystwitter 28 Sep 2023

In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations.

1

Baichuan 2: Open Large-scale Language Models

baichuan-inc/baichuan2 19 Sep 2023

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering.

3,937

Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring

ai-zahran/E2E-R IEEE Access 2023

In the first step, the pre-trained SSL model is fine-tuned on a phoneme recognition task to obtain better representations for the pronounced phonemes.

9
19 Sep 2023

SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

s3l-official/slmia-sr 14 Sep 2023

Our attack is versatile and can work in both white-box and black-box scenarios.

4

Native Language Identification with Big Bird Embeddings

sergeykramp/mthesis-bigbird-embeddings 13 Sep 2023

Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language.

0

Effective Multi-Graph Neural Networks for Illicit Account Detection on Cryptocurrency Transaction Networks

tommydzh/diam 4 Sep 2023

Extensive experiments, comparing against 14 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently achieves the best performance to accurately detect illicit accounts, while being efficient.

3

Interpolation of mountain weather forecasts by machine learning

kazumaiwase/interpolation-of-mountain-weather-forecasts- 27 Aug 2023

Recent advances in numerical simulation methods based on physical models and their combination with machine learning have improved the accuracy of weather forecasts.

2

TrajPy: empowering feature engineering for trajectory analysis across domains

ocbe-uio/trajpy 22 Aug 2023

The TrajPy package was developed in Python 3 and released under the GNU GPL-3 license.

6

Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

sruthisudheer/comment-classification-of-c-code 11 Aug 2023

The performance of the classical bag-of-words model and of transformer-based models was explored to identify significant features from the given training corpus.

1