Search Results for author: Junsong Chen

Found 9 papers, 5 papers with code

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

no code implementations • 7 Mar 2024 • Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

In this paper, we introduce PixArt-\Sigma, a Diffusion Transformer model~(DiT) capable of directly generating images at 4K resolution.

4k Image Captioning +1

Paper
Add Code

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

1 code implementation • 10 Jan 2024 • Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li

As a state-of-the-art, open-source image generation model, PIXART-{\delta} offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.

Image Generation

2,169

Paper
Code

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

345

Paper
Code

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

no code implementations • 12 Dec 2023 • Shentong Mo, Enze Xie, Yue Wu, Junsong Chen, Matthias Nießner, Zhenguo Li

Motivated by the inherent redundancy of 3D compared to 2D, we propose FastDiT-3D, a novel masked diffusion transformer tailored for efficient 3D point cloud generation, which greatly reduces training costs.

3D Generation Denoising +1

Paper
Add Code

PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

2 code implementations • 30 Sep 2023 • Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li

We hope PIXART-$\alpha$ will provide new insights to the AIGC community and startups to accelerate building their own high-quality yet low-cost generative models from scratch.

Image Generation Language Modelling

2,169

Paper
Code

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation

1 code implementation • 19 Apr 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

Paper
Code

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving

no code implementations • 3 Apr 2023 • Tianqi Wang, Sukmin Kim, Wenxuan Ji, Enze Xie, Chongjian Ge, Junsong Chen, Zhenguo Li, Ping Luo

In addition, we propose a new task, end-to-end motion and accident prediction, which can be used to directly evaluate the accident prediction ability for different autonomous driving algorithms.

3D Object Detection Autonomous Driving +1

Paper
Add Code

ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data

1 code implementation • CVPR 2023 • Haojie Zhao, Junsong Chen, Lijun Wang, Huchuan Lu

Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking.

Visual Tracking

Paper
Code

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation

no code implementations • ICCV 2023 • Chongjian Ge, Junsong Chen, Enze Xie, Zhongdao Wang, Lanqing Hong, Huchuan Lu, Zhenguo Li, Ping Luo

These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from either LiDAR, cameras, or both modalities.

3D Object Detection Autonomous Driving +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.