Search Results for author: Hirokatsu Kataoka

Found 48 papers, 19 papers with code

Primitive Geometry Segment Pre-training for 3D Medical Image Segmentation

1 code implementation 8 Jan 2024 Ryu Tadokoro, Ryosuke Yamada, Kodai Nakashima, Ryo Nakamura, Hirokatsu Kataoka

From the experimental results, we conclude that effective pre-training can be achieved by looking at primitive geometric objects only.

Image Segmentation Medical Image Segmentation +3

Traffic Incident Database with Multiple Labels Including Various Perspective Environmental Information

1 code implementation 17 Dec 2023 Shota Nishiyama, Takuma Saito, Ryo Nakamura, Go Ohtani, Hirokatsu Kataoka, Kensho Hara

Our proposed dataset aims to improve the performance of traffic accident recognition by annotating ten types of environmental information as teacher labels in addition to the presence or absence of traffic accidents.
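
As a rough illustration of the label design described above, the hedged sketch below treats accident presence plus ten environmental attributes as one multi-label target trained with a per-label binary cross-entropy loss; the backbone, label names, and tensor shapes are assumptions, not the released code.

```python
# Hypothetical sketch: accident presence plus ten environmental attribute labels
# trained as a single multi-label target with per-label BCE.
import torch
import torch.nn as nn

NUM_ENV_LABELS = 10                    # e.g. weather, road type, time of day (assumed)
NUM_OUTPUTS = 1 + NUM_ENV_LABELS       # accident presence + environmental attributes

backbone = nn.Sequential(              # stand-in for a video feature extractor
    nn.Conv3d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
)
head = nn.Linear(32, NUM_OUTPUTS)
criterion = nn.BCEWithLogitsLoss()

clip = torch.randn(4, 3, 16, 112, 112)                        # batch of short video clips
targets = torch.randint(0, 2, (4, NUM_OUTPUTS)).float()       # 0/1 teacher labels

logits = head(backbone(clip))
loss = criterion(logits, targets)
loss.backward()
```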

Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

no code implementations 23 Oct 2023 Shuhei Yokoo, Peifei Zhu, Yuchi Ishikawa, Mikihiro Tanaka, Masayoshi Kondo, Hirokatsu Kataoka

Our solution adopts the large multimodal models CLIP and BLIP-2 to filter and modify web-crawled data, and utilizes external datasets along with a bag of tricks to improve data quality.

Text Similarity
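
Below is a minimal sketch of the image-text similarity filtering that underpins the solution above, assuming the Hugging Face `transformers` CLIP implementation; the model choice and threshold are illustrative, and the BLIP-2 caption-modification step is not reproduced.

```python
# Keep an (image, caption) pair only if its CLIP cosine similarity is high enough.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def keep_pair(image: Image.Image, caption: str, threshold: float = 0.25) -> bool:
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    sim = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
    return sim >= threshold
```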

Constructing Image-Text Pair Dataset from Books

no code implementations 3 Oct 2023 Yamato Okamoto, Haruto Toyonaga, Yoshihisa Ijiri, Hirokatsu Kataoka

Digital archiving is becoming widespread owing to its effectiveness in protecting valuable books and providing knowledge to many people electronically.

Optical Character Recognition (OCR) Retrieval +1

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

1 code implementation ICCV 2023 Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

SegRCDB has a high potential to contribute to semantic segmentation pre-training and investigation by enabling the creation of large datasets without manual annotation.

Segmentation Semantic Segmentation

Diffusion-based Holistic Texture Rectification and Synthesis

no code implementations 26 Sep 2023 Guoqing Hao, Satoshi Iizuka, Kensho Hara, Edgar Simo-Serra, Hirokatsu Kataoka, Kazuhiro Fukui

We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images.

Texture Synthesis

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

no code implementations 4 Sep 2023 Ryota Yoshihashi, Yuya Otsuka, Kenji Doi, Tomohiro Tanaka, Hirokatsu Kataoka

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images.

Data Augmentation Image Generation +5

Pre-training Vision Transformers with Very Limited Synthesized Images

1 code implementation ICCV 2023 Ryo Nakamura, Hirokatsu Kataoka, Sora Takashima, Edgar Josafat Martinez Noriega, Rio Yokota, Nakamasa Inoue

Prior work on FDSL has shown that pre-training vision transformers on such synthetic datasets can yield competitive accuracy on a wide range of downstream tasks.

Data Augmentation

Pre-Training Auto-Generated Volumetric Shapes for 3D Medical Image Segmentation

1 code implementation CVPR Workshop 2023 Ryu Tadokoro, Ryosuke Yamada, Hirokatsu Kataoka

Inspired by this approach, we propose the Auto-generated Volumetric Shapes Database (AVS-DB) for data-scarce 3D medical image segmentation tasks.

Image Segmentation Medical Image Segmentation +3

Scapegoat Generation for Privacy Protection from Deepfake

no code implementations 6 Mar 2023 Gido Kato, Yoshihiro Fukuhara, Mariko Isogawa, Hideki Tsunashima, Hirokatsu Kataoka, Shigeo Morishima

To protect privacy and prevent malicious use of deepfakes, current studies propose methods that interfere with the generation process, such as detection and destruction approaches.

Face Swapping

Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

no code implementations CVPR 2023 Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

Unlike JFT-300M which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility.

Frequency-aware GAN for Adversarial Manipulation Generation

no code implementations ICCV 2023 Peifei Zhu, Genki Osada, Hirokatsu Kataoka, Tsubasa Takahashi

We observe that existing spatial attacks cause large degradation in image quality, and find that the loss of high-frequency detail components might be the major reason.

Adversarial Attack Image Manipulation
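
The toy snippet below illustrates, under assumed settings, the high-frequency observation above: it measures how much spectral energy beyond an arbitrary cutoff a perturbation changes, using a simple FFT radial mask. It is a diagnostic sketch, not the paper's frequency-aware GAN.

```python
# Compare high-frequency spectral energy of a clean image and a perturbed one.
import torch

def high_freq_energy(img: torch.Tensor, cutoff: float = 0.25) -> torch.Tensor:
    """Energy of spectral components farther than `cutoff` (relative) from DC."""
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    h, w = img.shape[-2:]
    yy, xx = torch.meshgrid(torch.linspace(-0.5, 0.5, h),
                            torch.linspace(-0.5, 0.5, w), indexing="ij")
    mask = (yy ** 2 + xx ** 2).sqrt() > cutoff       # keep only high frequencies
    return (spec.abs() ** 2 * mask).sum()

clean = torch.rand(1, 3, 64, 64)
attacked = clean + 0.03 * torch.randn_like(clean)     # stand-in for a spatial attack
print(high_freq_energy(clean), high_freq_energy(attacked))
```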

Graph Representation for Order-Aware Visual Transformation

no code implementations CVPR 2023 Yue Qiu, Yanjun Sun, Fumiya Matsuzawa, Kenji Iwata, Hirokatsu Kataoka

This paper proposes a new visual reasoning formulation that aims at discovering changes between image pairs and their temporal orders.

Visual Reasoning

Neural Density-Distance Fields

1 code implementation 29 Jul 2022 Itsuki Ueda, Yoshihiro Fukuhara, Hirokatsu Kataoka, Hiroaki Aizawa, Hidehiko Shishido, Itaru Kitahara

However, it is difficult to achieve high localization performance with density-field-based methods such as Neural Radiance Fields (NeRF) alone, since they do not provide density gradients in most empty regions.

Novel View Synthesis Visual Localization
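
The following 1-D toy sketch illustrates the limitation noted above: a density field is flat (zero spatial gradient) almost everywhere in empty space, whereas a distance field keeps an informative gradient everywhere. The field definitions are illustrative assumptions, not the paper's network.

```python
# Spatial gradients of a toy density field vs. a toy distance field along one axis.
import torch

x = torch.linspace(-1.0, 1.0, 200, requires_grad=True)
surface = 0.3                                                # location of a "wall"

density = 50.0 * torch.exp(-((x - surface) / 0.01) ** 2)     # sharp bump at the surface
distance = (x - surface).abs()                               # distance-to-surface field

g_density, = torch.autograd.grad(density.sum(), x)
g_distance, = torch.autograd.grad(distance.sum(), x)

print(g_density[:80].abs().max())    # ~0: no gradient signal far from the surface
print(g_distance[:80].abs().max())   # 1: distance field guides localization everywhere
```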

Replacing Labeled Real-image Datasets with Auto-generated Contours

no code implementations CVPR 2022 Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs).

Community-Driven Comprehensive Scientific Paper Summarization: Insight from cvpaper.challenge

no code implementations 17 Mar 2022 Shintaro Yamamoto, Hirokatsu Kataoka, Ryota Suzuki, Seitaro Shinagawa, Shigeo Morishima

To alleviate this problem, we organized a group of non-native English speakers to write summaries of papers presented at a computer vision conference and share the knowledge of the papers the group had read.

Describing and Localizing Multiple Changes with Transformers

2 code implementations ICCV 2021 Yue Qiu, Shintaro Yamamoto, Kodai Nakashima, Ryota Suzuki, Kenji Iwata, Hirokatsu Kataoka, Yutaka Satoh

Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and generate a natural language description of the changes.

Can Vision Transformers Learn without Natural Images?

1 code implementation 24 Mar 2021 Kodai Nakashima, Hirokatsu Kataoka, Asato Matsumoto, Kenji Iwata, Nakamasa Inoue

Moreover, although the ViT pre-trained without natural images produces visualizations that differ somewhat from those of an ImageNet pre-trained ViT, it can interpret natural image datasets to a large extent.

Fairness Self-Supervised Learning

Pre-training without Natural Images

2 code implementations 21 Jan 2021 Hirokatsu Kataoka, Kazushige Okayasu, Asato Matsumoto, Eisuke Yamagata, Ryosuke Yamada, Nakamasa Inoue, Akio Nakamura, Yutaka Satoh

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding?

Initialization Using Perlin Noise for Training Networks with a Limited Amount of Data

no code implementations 19 Jan 2021 Nakamasa Inoue, Eisuke Yamagata, Hirokatsu Kataoka

Our main idea is to initialize the network parameters by solving an artificial noise classification problem, where the aim is to classify Perlin noise samples into their noise categories.

Classification General Classification +1
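
A hedged sketch of the initialization recipe above: pre-train a small network to classify artificial noise samples into their generation categories, then reuse the weights. A blocky value-noise generator stands in for true Perlin noise here, and the architecture and category definition (base grid frequency) are assumptions.

```python
# Pre-train on an artificial noise-classification task, then reuse the weights.
import numpy as np
import torch
import torch.nn as nn

def value_noise(freq: int, size: int = 64) -> np.ndarray:
    """Cheap stand-in for Perlin noise: a random grid upsampled into blocks."""
    grid = np.random.rand(freq, freq)
    return np.kron(grid, np.ones((size // freq, size // freq)))

freqs = [2, 4, 8, 16]                      # noise categories (base grid frequency)
xs, ys = [], []
for label, f in enumerate(freqs):
    for _ in range(32):
        xs.append(value_noise(f))
        ys.append(label)
x = torch.tensor(np.stack(xs), dtype=torch.float32).unsqueeze(1)
y = torch.tensor(ys)

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, len(freqs)))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
for _ in range(5):                         # short pre-training loop
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()
    opt.step()
# net's weights can now initialize a network trained on a limited downstream dataset.
```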

Alleviating Over-segmentation Errors by Detecting Action Boundaries

2 code implementations 14 Jul 2020 Yuchi Ishikawa, Seito Kasai, Yoshimitsu Aoki, Hirokatsu Kataoka

Our model architecture consists of a long-term feature extractor and two branches: the Action Segmentation Branch (ASB) and the Boundary Regression Branch (BRB).

Action Classification Action Segmentation +2
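
A hedged PyTorch sketch of the two-branch layout described above: a shared dilated temporal-convolution feature extractor feeding (i) a frame-wise Action Segmentation Branch and (ii) a per-frame Boundary Regression Branch. Layer sizes and the dilation pattern are assumptions, not the released ASRF code.

```python
# Shared long-term feature extractor with segmentation and boundary heads.
import torch
import torch.nn as nn

class TwoBranchSegmenter(nn.Module):
    def __init__(self, in_dim=2048, hidden=64, num_classes=19):
        super().__init__()
        self.extractor = nn.Sequential(                     # long-term feature extractor
            nn.Conv1d(in_dim, hidden, 1),
            nn.Conv1d(hidden, hidden, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=4, dilation=4), nn.ReLU(),
        )
        self.asb = nn.Conv1d(hidden, num_classes, 1)        # Action Segmentation Branch
        self.brb = nn.Conv1d(hidden, 1, 1)                  # Boundary Regression Branch

    def forward(self, feats):                               # feats: (B, in_dim, T)
        h = self.extractor(feats)
        return self.asb(h), torch.sigmoid(self.brb(h))      # class logits, boundary prob

model = TwoBranchSegmenter()
logits, boundaries = model(torch.randn(1, 2048, 300))
```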

Retrieving and Highlighting Action with Spatiotemporal Reference

1 code implementation 19 May 2020 Seito Kasai, Yuchi Ishikawa, Masaki Hayashi, Yoshimitsu Aoki, Kensho Hara, Hirokatsu Kataoka

In this paper, we present a framework that jointly retrieves and spatiotemporally highlights actions in videos by enhancing current deep cross-modal retrieval methods.

Action Recognition Cross-Modal Retrieval +5
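
A minimal sketch of the cross-modal retrieval step assumed above: rank video clip embeddings by cosine similarity to a text-query embedding in a shared space. The encoders are placeholders, and the paper's spatiotemporal highlighting is not reproduced.

```python
# Rank clips in a gallery by cosine similarity to a text query embedding.
import torch
import torch.nn.functional as F

text_emb = F.normalize(torch.randn(1, 256), dim=-1)      # query embedding (placeholder)
video_embs = F.normalize(torch.randn(100, 256), dim=-1)  # gallery of clip embeddings

scores = video_embs @ text_emb.t()                        # cosine similarities
top5 = scores.squeeze(1).topk(5).indices                  # best-matching clips
print(top5.tolist())
```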

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

10 code implementations 10 Apr 2020 Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, Yutaka Satoh

Therefore, in the present paper, we conduct an exploration study in order to improve spatiotemporal 3D CNNs as follows: (i) Recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

General Classification Open-Ended Question Answering +2

Weakly Supervised Dataset Collection for Robust Person Detection

1 code implementation 27 Mar 2020 Munetaka Minoguchi, Ken Okayama, Yutaka Satoh, Hirokatsu Kataoka

To construct an algorithm that can provide robust person detection, we present a dataset with over 8 million images that was produced in a weakly supervised manner.

Human Detection

Augmented Cyclic Consistency Regularization for Unpaired Image-to-Image Translation

no code implementations 29 Feb 2020 Takehiko Ohkawa, Naoto Inoue, Hirokatsu Kataoka, Nakamasa Inoue

Herein, we propose Augmented Cyclic Consistency Regularization (ACCR), a novel regularization method for unpaired I2I translation.

Data Augmentation Image-to-Image Translation +1

Learning with Protection: Rejection of Suspicious Samples under Adversarial Environment

no code implementations 25 Sep 2019 Masahiro Kato, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima

Our main idea is to apply a framework of learning with rejection and adversarial examples to assist in the decision making for such suspicious samples.

BIG-bench Machine Learning Binary Classification +3
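
A toy sketch of the rejection rule this framework builds on: abstain on inputs whose predictive confidence falls below a threshold so that suspicious samples are handed off rather than classified. The threshold and logits are illustrative assumptions.

```python
# Reject low-confidence (possibly adversarial) samples instead of classifying them.
import torch
import torch.nn.functional as F

def predict_with_rejection(logits: torch.Tensor, threshold: float = 0.9):
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    pred = pred.clone()
    pred[conf < threshold] = -1          # -1 marks "rejected / suspicious"
    return pred

logits = torch.tensor([[4.0, 0.1, 0.2],   # confident sample
                       [1.0, 0.9, 1.1]])  # ambiguous sample -> rejected
print(predict_with_rejection(logits))     # tensor([ 0, -1])
```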

Automatic Paper Summary Generation from Visual and Textual Information

no code implementations 16 Nov 2018 Shintaro Yamamoto, Yoshihiro Fukuhara, Ryota Suzuki, Shigeo Morishima, Hirokatsu Kataoka

Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts.

Sentence

Understanding Fake Faces

no code implementations 22 Sep 2018 Ryota Natsume, Kazuki Inoue, Yoshihiro Fukuhara, Shintaro Yamamoto, Shigeo Morishima, Hirokatsu Kataoka

Face recognition research is one of the most active topics in computer vision (CV), and deep neural networks (DNN) are now filling the gap between human-level and computer-driven performance levels in face verification algorithms.

Face Recognition Face Verification

Neural Joking Machine: Humorous image captioning

no code implementations 30 May 2018 Kota Yoshida, Munetaka Minoguchi, Kenichiro Wani, Akio Nakamura, Hirokatsu Kataoka

In the present paper, in order to consider this question from an academic standpoint, we use a computer to generate image captions that draw a "laugh".

Image Captioning

Anticipating Traffic Accidents with Adaptive Loss and Large-scale Incident DB

no code implementations CVPR 2018 Tomoyuki Suzuki, Hirokatsu Kataoka, Yoshimitsu Aoki, Yutaka Satoh

In this paper, we propose a novel approach for traffic accident anticipation through (i) Adaptive Loss for Early Anticipation (AdaLEA) and (ii) a large-scale self-annotated incident database for anticipation.

Accident Anticipation
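
As a very rough illustration of an early-anticipation loss (not the adaptive AdaLEA schedule itself), the sketch below weights per-frame cross-entropy by an exponential factor in the time remaining until the accident, so frames nearer the accident contribute more; all names and values are assumptions.

```python
# Exponentially time-weighted anticipation loss for a positive (accident) video.
import torch
import torch.nn.functional as F

def early_anticipation_loss(logits, labels, time_to_accident, penalty=20.0):
    """logits/labels: (T,), time_to_accident: frames remaining until the accident (T,)."""
    ce = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    weight = torch.where(labels > 0,
                         torch.exp(-time_to_accident / penalty),  # grows as accident nears
                         torch.ones_like(ce))
    return (weight * ce).mean()

T, accident_frame = 100, 60
logits = torch.randn(T)                                   # per-frame accident scores
labels = torch.ones(T)                                    # every frame of a positive video
tta = torch.clamp(torch.tensor(float(accident_frame)) - torch.arange(T).float(), min=0.0)
print(early_anticipation_loss(logits, labels, tta))
```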

Drive Video Analysis for the Detection of Traffic Near-Miss Incidents

no code implementations 7 Apr 2018 Hirokatsu Kataoka, Teppei Suzuki, Shoko Oikawa, Yasuhiro Matsui, Yutaka Satoh

Because of their recent introduction, self-driving cars and vehicles equipped with advanced driver assistance systems (ADAS) have had little opportunity to learn the dangerous traffic scenarios (including near-miss incidents) that provide normal drivers with strong motivation to drive safely.

Self-Driving Cars

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

26 code implementations CVPR 2018 Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels.

Action Recognition
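
The hedged block below shows the basic building unit the study scales up: a residual block whose 3x3x3 kernels convolve jointly over time and space. Channel sizes are illustrative and this is not the authors' released 3D ResNet code.

```python
# Basic residual block with spatio-temporal 3D kernels.
import torch
import torch.nn as nn

class BasicBlock3D(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                       # x: (B, C, T, H, W)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)               # residual connection

clip = torch.randn(2, 64, 16, 56, 56)           # batch, channels, frames, height, width
print(BasicBlock3D()(clip).shape)
```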

Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

1 code implementation 25 Aug 2017 Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh

The 3D ResNets trained on the Kinetics dataset did not suffer from overfitting despite their large number of parameters, and achieved better performance than relatively shallow networks such as C3D.

Action Recognition Hand-Gesture Recognition +1

Collaborative Descriptors: Convolutional Maps for Preprocessing

no code implementations 10 May 2017 Hirokatsu Kataoka, Kaori Abe, Akio Nakamura, Yutaka Satoh

The paper presents a novel concept for collaborative descriptors between deeply learned and hand-crafted features.

Object Recognition

Motion Representation with Acceleration Images

no code implementations 30 Aug 2016 Hirokatsu Kataoka, Yun He, Soma Shirakabe, Yutaka Satoh

Information from temporal differentiation is an extremely important cue for motion representation.

Optical Flow Estimation
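
A minimal sketch of the acceleration-image idea: take the temporal derivative one step beyond optical flow by differencing consecutive flow fields; flow estimation itself is mocked with random arrays here.

```python
# Acceleration image as the difference of consecutive optical-flow fields.
import numpy as np

flow_t = np.random.randn(240, 320, 2).astype(np.float32)    # flow between frames t, t+1
flow_t1 = np.random.randn(240, 320, 2).astype(np.float32)   # flow between frames t+1, t+2

acceleration = flow_t1 - flow_t                  # second-order temporal cue per pixel (dx, dy)
magnitude = np.linalg.norm(acceleration, axis=-1)            # acceleration "image"
print(magnitude.shape)                           # (240, 320)
```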

Dominant Codewords Selection with Topic Model for Action Recognition

no code implementations 1 May 2016 Hirokatsu Kataoka, Masaki Hayashi, Kenji Iwata, Yutaka Satoh, Yoshimitsu Aoki, Slobodan Ilic

Latent Dirichlet allocation (LDA) is used to develop approximations of human motion primitives; these are mid-level representations, and they adaptively integrate dominant vectors when classifying human activities.

Action Recognition Temporal Action Localization
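
A hedged sketch of the LDA step described above: fit topics over bag-of-codewords histograms of video segments, then read off each topic's dominant codewords as mid-level motion primitives. Data sizes are synthetic placeholders.

```python
# Fit LDA topics over codeword histograms and extract dominant codewords per topic.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

num_videos, vocab_size = 200, 500
counts = np.random.randint(0, 5, size=(num_videos, vocab_size))   # codeword histograms

lda = LatentDirichletAllocation(n_components=10, random_state=0)
doc_topics = lda.fit_transform(counts)                # per-video topic mixtures

top_codewords = lda.components_.argsort(axis=1)[:, -10:]          # dominant codewords
print(doc_topics.shape, top_codewords.shape)          # (200, 10) (10, 10)
```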

Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection

no code implementations 25 Sep 2015 Hirokatsu Kataoka, Kenji Iwata, Yutaka Satoh

In this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture and very deep convolutional network (VGGNet) architecture.

General Classification Object Recognition
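
The sketch below mirrors the evaluation recipe implied above: extract fixed convolutional features from pre-trained AlexNet and VGG backbones for comparison on downstream tasks. Passing weights by name requires a recent torchvision, and the input batch is a placeholder.

```python
# Extract fixed conv features from pre-trained AlexNet and VGG-16 backbones.
import torch
from torchvision import models

alexnet = models.alexnet(weights="DEFAULT").eval()
vgg16 = models.vgg16(weights="DEFAULT").eval()

images = torch.randn(8, 3, 224, 224)                  # placeholder batch
with torch.no_grad():
    feat_alex = alexnet.features(images).flatten(1)   # conv-feature vectors
    feat_vgg = vgg16.features(images).flatten(1)

print(feat_alex.shape, feat_vgg.shape)                # (8, 9216) and (8, 25088)
```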
