Search Results for author: Nicola Messina

Found 18 papers, 10 papers with code

Is CLIP the main roadblock for fine-grained open-world perception?

2 code implementations • 4 Apr 2024 • Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Fabrizio Falchi

Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training.

Autonomous Driving, Novel Concepts (+4 more)

The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding

1 code implementation • 29 Nov 2023 • Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Recent advancements in large vision-language models have enabled visual object detection in open-vocabulary scenarios, where object classes are defined in free-text form at inference time.

Object, object-detection (+1 more)

Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language

1 code implementation • 25 May 2023 • Nicola Messina, Jan Sedmidubsky, Fabrizio Falchi, Tomáš Rebok

Due to recent advances in pose-estimation methods, human motion can be extracted from a common video in the form of 3D skeleton sequences.

Metric Learning, Pose Estimation (+1 more)

Development of a Realistic Crowd Simulation Environment for Fine-grained Validation of People Tracking Methods

no code implementations • 26 Apr 2023 • Paweł Foszner, Agnieszka Szczęsna, Luca Ciampi, Nicola Messina, Adam Cygan, Bartosz Bizoń, Michał Cogiel, Dominik Golba, Elżbieta Macioszek, Michał Staniszewski

Generally, crowd datasets can be collected or generated from real or synthetic sources.

Multiple People Tracking, Unity

CrowdSim2: an Open Synthetic Benchmark for Object Detectors

no code implementations • 11 Apr 2023 • Paweł Foszner, Agnieszka Szczęsna, Luca Ciampi, Nicola Messina, Adam Cygan, Bartosz Bizoń, Michał Cogiel, Dominik Golba, Elżbieta Macioszek, Michał Staniszewski

Data scarcity has become one of the main obstacles to developing supervised models based on Artificial Intelligence in Computer Vision.

Object, Object Detection (+1 more)

Deep learning for structural health monitoring: An application to heritage structures

no code implementations • 4 Nov 2022 • Fabio Carrara, Fabrizio Falchi, Maria Girardi, Nicola Messina, Cristina Padovani, Daniele Pellegrini

Thanks to recent advancements in numerical methods, computer power, and monitoring technology, seismic ambient noise provides precious information about the structural behavior of old buildings.

Time Series, Time Series Forecasting (+1 more)

A Spatio-Temporal Attentive Network for Video-Based Crowd Counting

no code implementations • 24 Aug 2022 • Marco Avvenuti, Marco Bongiovanni, Luca Ciampi, Fabrizio Falchi, Claudio Gennaro, Nicola Messina

Automatic people counting from images has recently drawn attention for urban monitoring in modern Smart Cities due to the ubiquity of surveillance camera networks.

Crowd Counting

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

1 code implementation • 29 Jul 2022 • Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato, Rita Cucchiara

In the literature, image-text matching is often used as a pre-training objective to build architectures that can jointly handle images and text.

Ranked #22 on Cross-Modal Retrieval on COCO 2014

Image-text matching, Retrieval (+1 more)

Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching

2 code implementations • 21 Jun 2022 • Nicola Messina, Davide Alessandro Coccomini, Andrea Esuli, Fabrizio Falchi

With the increased accessibility of web and online encyclopedias, the amount of data to manage is constantly increasing.

Recurrent Vision Transformer for Solving Visual Reasoning Problems

no code implementations • 29 Nov 2021 • Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi

In the end, this study can lay the basis for a deeper understanding of the role of attention and recurrent connections for solving visual abstract reasoning tasks.

Visual Reasoning

Generative Adversarial Networks for Astronomical Images Generation

1 code implementation • 22 Nov 2021 • Davide Coccomini, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Space exploration has always been a source of inspiration for humankind, and thanks to modern telescopes, it is now possible to observe celestial bodies far away from us.

AIMH at SemEval-2021 Task 6: Multimodal Classification Using an Ensemble of Transformer Models

1 code implementation • SEMEVAL 2021 • Nicola Messina, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

This paper describes the system used by the AIMH Team to approach SemEval-2021 Task 6.

Multi-Label Classification

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

3 code implementations • 6 Jul 2021 • Davide Coccomini, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7.

Ranked #1 on DeepFake Detection on DFDC (using extra training data)

DeepFake Detection, Face Swapping

Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features

no code implementations • 1 Jun 2021 • Nicola Messina, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet

It is designed for producing fixed-size 1024-d vectors describing whole images and sentences, as well as variable-length sets of 1024-d vectors describing the various building components of the two modalities (image regions and sentence words respectively).

Image Retrieval, Image-text matching (+3 more)

Solving the Same-Different Task with Convolutional Neural Networks

no code implementations • 22 Jan 2021 • Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi

With the experiments carried out in this work, we demonstrate that residual connections, and more generally the skip connections, seem to have only a marginal impact on the learning of the proposed problems.

Overall - Test, Zero-shot Generalization

Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders

1 code implementation • 12 Aug 2020 • Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet

In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level.

Ranked #6 on Image Retrieval on Flickr30K 1K test

Cross-Modal Retrieval, Image Retrieval (+3 more)

Transformer Reasoning Network for Image-Text Matching and Retrieval

1 code implementation • 20 Apr 2020 • Nicola Messina, Fabrizio Falchi, Andrea Esuli, Giuseppe Amato

State-of-the-art results in image-text matching are achieved by letting image and text features from the two separate processing pipelines interact, usually through mutual attention mechanisms.

Image Retrieval, Image-text matching (+3 more)

Virtual to Real adaptation of Pedestrian Detectors

no code implementations • 9 Jan 2020 • Luca Ciampi, Nicola Messina, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

Furthermore, we demonstrate that our Domain Adaptation techniques reduce the Synthetic2Real Domain Shift, bringing the two domains closer and improving performance when the network is tested on real-world images.

Domain Adaptation, object-detection (+2 more)
