Search Results for author: Nicola Messina

Found 18 papers, 10 papers with code

Is CLIP the main roadblock for fine-grained open-world perception?

2 code implementations | 4 Apr 2024 | Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Fabrizio Falchi

Modern applications increasingly demand flexible computer vision models that adapt to novel concepts not encountered during training.

Autonomous Driving, Novel Concepts, +4 more

The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding

1 code implementation | 29 Nov 2023 | Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Recent advancements in large vision-language models enabled visual object detection in open-vocabulary scenarios, where object classes are defined in free-text formats during inference.

Object, object-detection, +1 more

Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language

1 code implementation | 25 May 2023 | Nicola Messina, Jan Sedmidubsky, Fabrizio Falchi, Tomáš Rebok

Due to recent advances in pose-estimation methods, human motion can be extracted from a common video in the form of 3D skeleton sequences.

Metric Learning, Pose Estimation, +1 more

Deep learning for structural health monitoring: An application to heritage structures

no code implementations | 4 Nov 2022 | Fabio Carrara, Fabrizio Falchi, Maria Girardi, Nicola Messina, Cristina Padovani, Daniele Pellegrini

Thanks to recent advancements in numerical methods, computer power, and monitoring technology, seismic ambient noise provides precious information about the structural behavior of old buildings.

Time Series, Time Series Forecasting, +1 more

A Spatio-Temporal Attentive Network for Video-Based Crowd Counting

no code implementations | 24 Aug 2022 | Marco Avvenuti, Marco Bongiovanni, Luca Ciampi, Fabrizio Falchi, Claudio Gennaro, Nicola Messina

Automatic people counting from images has recently drawn attention for urban monitoring in modern Smart Cities due to the ubiquity of surveillance camera networks.

Crowd Counting

Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching

2 code implementations | 21 Jun 2022 | Nicola Messina, Davide Alessandro Coccomini, Andrea Esuli, Fabrizio Falchi

With the increased accessibility of web and online encyclopedias, the amount of data to manage is constantly increasing.

Recurrent Vision Transformer for Solving Visual Reasoning Problems

no code implementations | 29 Nov 2021 | Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi

In the end, this study can lay the basis for a deeper understanding of the role of attention and recurrent connections for solving visual abstract reasoning tasks.

Visual Reasoning

Generative Adversarial Networks for Astronomical Images Generation

1 code implementation | 22 Nov 2021 | Davide Coccomini, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Space exploration has always been a source of inspiration for humankind, and thanks to modern telescopes, it is now possible to observe celestial bodies far away from us.

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

3 code implementations | 6 Jul 2021 | Davide Coccomini, Nicola Messina, Claudio Gennaro, Fabrizio Falchi

Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7.

Ranked #1 on DeepFake Detection on DFDC (using extra training data)

DeepFake Detection, Face Swapping

Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features

no code implementations | 1 Jun 2021 | Nicola Messina, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet

It is designed for producing fixed-size 1024-d vectors describing whole images and sentences, as well as variable-length sets of 1024-d vectors describing the various building components of the two modalities (image regions and sentence words respectively).

Image Retrieval, Image-text matching, +3 more

Solving the Same-Different Task with Convolutional Neural Networks

no code implementations | 22 Jan 2021 | Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi

With the experiments carried out in this work, we demonstrate that residual connections, and more generally the skip connections, seem to have only a marginal impact on the learning of the proposed problems.

Overall - Test, Zero-shot Generalization

Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders

1 code implementation | 12 Aug 2020 | Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet

In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level.

Cross-Modal Retrieval, Image Retrieval, +3 more

Transformer Reasoning Network for Image-Text Matching and Retrieval

1 code implementation | 20 Apr 2020 | Nicola Messina, Fabrizio Falchi, Andrea Esuli, Giuseppe Amato

State-of-the-art results in image-text matching are achieved by letting image and text features from the two separate processing pipelines interact, usually through mutual attention mechanisms.

Image Retrieval, Image-text matching, +3 more

Virtual to Real adaptation of Pedestrian Detectors

no code implementations | 9 Jan 2020 | Luca Ciampi, Nicola Messina, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato

Furthermore, we demonstrate that our Domain Adaptation techniques reduce the Synthetic2Real Domain Shift, bringing the two domains closer and improving performance when the network is tested on real-world images.

Domain Adaptation, object-detection, +2 more
