Search Results for author: Fan Yang

Found 286 papers, 107 papers with code

Improving Relevance Quality in Product Search using High-Precision Query-Product Semantic Similarity

no code implementations ECNLP (ACL) 2022 Alireza Bagheri Garakani, Fan Yang, Wen-Yu Hua, Yetian Chen, Michinari Momma, Jingyuan Deng, Yan Gao, Yi Sun

Ensuring relevance quality in product search is a critical task as it impacts the customer’s ability to find intended products in the short-term as well as the general perception and trust of the e-commerce system in the long term.

Re-Ranking Semantic Similarity +1

Spelling Correction using Phonetics in E-commerce Search

no code implementations ECNLP (ACL) 2022 Fan Yang, Alireza Bagheri Garakani, Yifei Teng, Yan Gao, Jia Liu, Jingyuan Deng, Yi Sun

In E-commerce search, spelling correction plays an important role to find desired products for customers in processing user-typed search queries.

Spelling Correction

Improving Evidence Retrieval with Claim-Evidence Entailment

no code implementations RANLP 2021 Fan Yang, Eduard Dragut, Arjun Mukherjee

Claim verification is challenging because it requires first to find textual evidence and then apply claim-evidence entailment to verify a claim.

Claim Verification Retrieval +1

Beyond 3DMM Space: Towards Fine-grained 3D Face Reconstruction

1 code implementation ECCV 2020 Xiangyu Zhu, Fan Yang, Di Huang, Chang Yu, Hao Wang, Jianzhu Guo, Zhen Lei, Stan Z. Li

However, most of their training data is constructed by 3D Morphable Model, whose space spanned is only a small part of the shape space.

3D Face Reconstruction

Self-learning Canonical Space for Multi-view 3D Human Pose Estimation

no code implementations19 Mar 2024 Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen

To facilitate the aggregation of the intra- and inter-view, we define a canonical parameter space, depicted by per-view camera pose and human pose and shape parameters ($\theta$ and $\beta$) of SMPL model, and propose a two-stage learning procedure.

3D Human Pose Estimation Self-Learning

Human Mesh Recovery from Arbitrary Multi-view Images

no code implementations19 Mar 2024 Xiaoben Li, Mancheng Meng, Ziyan Wu, Terrence Chen, Fan Yang, Dinggang Shen

Human mesh recovery from arbitrary multi-view images involves two characteristics: the arbitrary camera poses and arbitrary number of camera views.

Human Mesh Recovery Pose Estimation

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

1 code implementation14 Mar 2024 Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang

Large Vision Language Models have achieved fine-grained object perception, but the limitation of image resolution remains a significant obstacle to surpass the performance of task-specific experts in complex and dense scenarios.

Object Object Counting +3

Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior

no code implementations14 Mar 2024 Cheng Chen, Xiaofeng Yang, Fan Yang, Chengzeng Feng, Zhoujie Fu, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

In this paper, we present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.

Text to 3D

Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era

1 code implementation13 Mar 2024 Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu

Therefore, in this paper, we introduce Usable XAI in the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems, and (2) how LLMs can contribute to the advancement of XAI.

Automating Catheterization Labs with Real-Time Perception

no code implementations9 Mar 2024 Fan Yang, Benjamin Planche, Meng Zheng, Cheng Chen, Terrence Chen, Ziyan Wu

For decades, three-dimensional C-arm Cone-Beam Computed Tomography (CBCT) imaging system has been a critical component for complex vascular and nonvascular interventional procedures.

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

no code implementations8 Mar 2024 Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-Baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, Jilin Chen, Michael Isard, Paul Barham, Tom Hennigan, Ross Mcilroy, Melvin Johnson, Johan Schalkwyk, Eli Collins, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Clemens Meyer, Gregory Thornton, Zhen Yang, Henryk Michalewski, Zaheer Abbas, Nathan Schucher, Ankesh Anand, Richard Ives, James Keeling, Karel Lenc, Salem Haykal, Siamak Shakeri, Pranav Shyam, Aakanksha Chowdhery, Roman Ring, Stephen Spencer, Eren Sezener, Luke Vilnis, Oscar Chang, Nobuyuki Morioka, George Tucker, Ce Zheng, Oliver Woodman, Nithya Attaluri, Tomas Kocisky, Evgenii Eltyshev, Xi Chen, Timothy Chung, Vittorio Selo, Siddhartha Brahma, Petko Georgiev, Ambrose Slone, Zhenkai Zhu, James Lottes, Siyuan Qiao, Ben Caine, Sebastian Riedel, Alex Tomala, Martin Chadwick, Juliette Love, Peter Choy, Sid Mittal, Neil Houlsby, Yunhao Tang, Matthew Lamm, Libin Bai, Qiao Zhang, Luheng He, Yong Cheng, Peter Humphreys, Yujia Li, Sergey Brin, Albin Cassirer, Yingjie Miao, Lukas Zilka, Taylor Tobin, Kelvin Xu, Lev Proleev, Daniel Sohn, Alberto Magni, Lisa Anne Hendricks, Isabel Gao, Santiago Ontañón, Oskar Bunyan, Nathan Byrd, Abhanshu Sharma, Biao Zhang, Mario Pinto, Rishika Sinha, Harsh Mehta, Dawei Jia, Sergi Caelles, Albert Webson, Alex Morris, Becca Roelofs, Yifan Ding, Robin Strudel, Xuehan Xiong, Marvin Ritter, Mostafa Dehghani, Rahma Chaabouni, Abhijit Karmarkar, Guangda Lai, Fabian Mentzer, Bibo Xu, Yaguang Li, Yujing Zhang, Tom Le Paine, Alex Goldin, Behnam Neyshabur, Kate Baumli, Anselm Levskaya, Michael Laskin, Wenhao Jia, Jack W. Rae, Kefan Xiao, Antoine He, Skye Giordano, Lakshman Yagati, Jean-Baptiste Lespiau, Paul Natsev, Sanjay Ganapathy, Fangyu Liu, Danilo Martins, Nanxin Chen, Yunhan Xu, Megan Barnes, Rhys May, Arpi Vezer, Junhyuk Oh, Ken Franko, Sophie Bridgers, Ruizhe Zhao, Boxi Wu, Basil Mustafa, Sean Sechrist, Emilio Parisotto, Thanumalayan Sankaranarayana Pillai, Chris Larkin, Chenjie Gu, Christina Sorokin, Maxim Krikun, Alexey Guseynov, Jessica Landon, Romina Datta, Alexander Pritzel, Phoebe Thacker, Fan Yang, Kevin Hui, Anja Hauth, Chih-Kuan Yeh, David Barker, Justin Mao-Jones, Sophia Austin, Hannah Sheahan, Parker Schuh, James Svensson, Rohan Jain, Vinay Ramasesh, Anton Briukhov, Da-Woon Chung, Tamara von Glehn, Christina Butterfield, Priya Jhakra, Matthew Wiethoff, Justin Frye, Jordan Grimstad, Beer Changpinyo, Charline Le Lan, Anna Bortsova, Yonghui Wu, Paul Voigtlaender, Tara Sainath, Charlotte Smith, Will Hawkins, Kris Cao, James Besley, Srivatsan Srinivasan, Mark Omernick, Colin Gaffney, Gabriela Surita, Ryan Burnell, Bogdan Damoc, Junwhan Ahn, Andrew Brock, Mantas Pajarskas, Anastasia Petrushkina, Seb Noury, Lorenzo Blanco, Kevin Swersky, Arun Ahuja, Thi Avrahami, Vedant Misra, Raoul de Liedekerke, Mariko Iinuma, Alex Polozov, Sarah York, George van den Driessche, Paul Michel, Justin Chiu, Rory Blevins, Zach Gleicher, Adrià Recasens, Alban Rrustemi, Elena Gribovskaya, Aurko Roy, Wiktor Gworek, Séb Arnold, Lisa Lee, James Lee-Thorp, Marcello Maggioni, Enrique Piqueras, Kartikeya Badola, Sharad Vikram, Lucas Gonzalez, Anirudh Baddepudi, Evan Senter, Jacob Devlin, James Qin, Michael Azzam, Maja Trebacz, Martin Polacek, Kashyap Krishnakumar, Shuo-Yiin Chang, Matthew Tung, Ivo Penchev, Rishabh Joshi, Kate Olszewska, Carrie Muir, Mateo Wirth, Ale Jakse Hartman, Josh Newlan, Sheleem Kashem, Vijay Bolina, Elahe Dabir, Joost van Amersfoort, Zafarali Ahmed, James Cobon-Kerr, Aishwarya Kamath, Arnar Mar Hrafnkelsson, Le Hou, Ian Mackinnon, Alexandre Frechette, Eric Noland, Xiance Si, Emanuel Taropa, Dong Li, Phil Crone, Anmol Gulati, Sébastien Cevey, Jonas Adler, Ada Ma, David Silver, Simon Tokumine, Richard Powell, Stephan Lee, Michael Chang, Samer Hassan, Diana Mincu, Antoine Yang, Nir Levine, Jenny Brennan, Mingqiu Wang, Sarah Hodkinson, Jeffrey Zhao, Josh Lipschultz, Aedan Pope, Michael B. Chang, Cheng Li, Laurent El Shafey, Michela Paganini, Sholto Douglas, Bernd Bohnet, Fabio Pardo, Seth Odoom, Mihaela Rosca, Cicero Nogueira dos santos, Kedar Soparkar, Arthur Guez, Tom Hudson, Steven Hansen, Chulayuth Asawaroengchai, Ravi Addanki, Tianhe Yu, Wojciech Stokowiec, Mina Khan, Justin Gilmer, Jaehoon Lee, Carrie Grimes Bostock, Keran Rong, Jonathan Caton, Pedram Pejman, Filip Pavetic, Geoff Brown, Vivek Sharma, Mario Lučić, Rajkumar Samuel, Josip Djolonga, Amol Mandhane, Lars Lowe Sjösund, Elena Buchatskaya, Elspeth White, Natalie Clay, Jiepu Jiang, Hyeontaek Lim, Ross Hemsley, Jane Labanowski, Nicola De Cao, David Steiner, Sayed Hadi Hashemi, Jacob Austin, Anita Gergely, Tim Blyth, Joe Stanton, Kaushik Shivakumar, Aditya Siddhant, Anders Andreassen, Carlos Araya, Nikhil Sethi, Rakesh Shivanna, Steven Hand, Ankur Bapna, Ali Khodaei, Antoine Miech, Garrett Tanzer, Andy Swing, Shantanu Thakoor, Zhufeng Pan, Zachary Nado, Stephanie Winkler, Dian Yu, Mohammad Saleh, Loren Maggiore, Iain Barr, Minh Giang, Thais Kagohara, Ivo Danihelka, Amit Marathe, Vladimir Feinberg, Nimesh Ghelani, Dan Horgan, Helen Miller, Lexi Walker, Richard Tanburn, Mukarram Tariq, Disha Shrivastava, Fei Xia, Chung-Cheng Chiu, Khuslen Baatarsukh, Sina Samangooei, Fred Alcober, Axel Stjerngren, Paul Komarek, Katerina Tsihlas, Anudhyan Boral, Ramona Comanescu, Jeremy Chen, Ruibo Liu, Dawn Bloxwich, Charlie Chen, Yanhua Sun, Fangxiaoyu Feng, Matthew Mauger, Xerxes Dotiwalla, Vincent Hellendoorn, Michael Sharman, Ivy Zheng, Krishna Haridasan, Gabe Barth-Maron, Craig Swanson, Dominika Rogozińska, Alek Andreev, Paul Kishan Rubenstein, Ruoxin Sang, Dan Hurt, Gamaleldin Elsayed, Renshen Wang, Dave Lacey, Anastasija Ilić, Yao Zhao, Lora Aroyo, Chimezie Iwuanyanwu, Vitaly Nikolaev, Balaji Lakshminarayanan, Sadegh Jazayeri, Raphaël Lopez Kaufman, Mani Varadarajan, Chetan Tekur, Doug Fritz, Misha Khalman, David Reitter, Kingshuk Dasgupta, Shourya Sarcar, Tina Ornduff, Javier Snaider, Fantine Huot, Johnson Jia, Rupert Kemp, Nejc Trdin, Anitha Vijayakumar, Lucy Kim, Christof Angermueller, Li Lao, Tianqi Liu, Haibin Zhang, David Engel, Somer Greene, Anaïs White, Jessica Austin, Lilly Taylor, Shereen Ashraf, Dangyi Liu, Maria Georgaki, Irene Cai, Yana Kulizhskaya, Sonam Goenka, Brennan Saeta, Kiran Vodrahalli, Christian Frank, Dario de Cesare, Brona Robenek, Harry Richardson, Mahmoud Alnahlawi, Christopher Yew, Priya Ponnapalli, Marco Tagliasacchi, Alex Korchemniy, Yelin Kim, Dinghua Li, Bill Rosgen, Zoe Ashwood, Kyle Levin, Jeremy Wiesner, Praseem Banzal, Praveen Srinivasan, Hongkun Yu, Çağlar Ünlü, David Reid, Zora Tung, Daniel Finchelstein, Ravin Kumar, Andre Elisseeff, Jin Huang, Ming Zhang, Rui Zhu, Ricardo Aguilar, Mai Giménez, Jiawei Xia, Olivier Dousse, Willi Gierke, Soheil Hassas Yeganeh, Damion Yates, Komal Jalan, Lu Li, Eri Latorre-Chimoto, Duc Dung Nguyen, Ken Durden, Praveen Kallakuri, Yaxin Liu, Matthew Johnson, Tomy Tsai, Alice Talbert, Jasmine Liu, Alexander Neitz, Chen Elkind, Marco Selvi, Mimi Jasarevic, Livio Baldini Soares, Albert Cui, Pidong Wang, Alek Wenjiao Wang, Xinyu Ye, Krystal Kallarackal, Lucia Loher, Hoi Lam, Josef Broder, Dan Holtmann-Rice, Nina Martin, Bramandia Ramadhana, Daniel Toyama, Mrinal Shukla, Sujoy Basu, Abhi Mohan, Nick Fernando, Noah Fiedel, Kim Paterson, Hui Li, Ankush Garg, Jane Park, DongHyun Choi, Diane Wu, Sankalp Singh, Zhishuai Zhang, Amir Globerson, Lily Yu, John Carpenter, Félix de Chaumont Quitry, Carey Radebaugh, Chu-Cheng Lin, Alex Tudor, Prakash Shroff, Drew Garmon, Dayou Du, Neera Vats, Han Lu, Shariq Iqbal, Alex Yakubovich, Nilesh Tripuraneni, James Manyika, Haroon Qureshi, Nan Hua, Christel Ngani, Maria Abi Raad, Hannah Forbes, Anna Bulanova, Jeff Stanway, Mukund Sundararajan, Victor Ungureanu, Colton Bishop, Yunjie Li, Balaji Venkatraman, Bo Li, Chloe Thornton, Salvatore Scellato, Nishesh Gupta, Yicheng Wang, Ian Tenney, Xihui Wu, Ashish Shenoy, Gabriel Carvajal, Diana Gage Wright, Ben Bariach, Zhuyun Xiao, Peter Hawkins, Sid Dalmia, Clement Farabet, Pedro Valenzuela, Quan Yuan, Chris Welty, Ananth Agarwal, Mia Chen, Wooyeol Kim, Brice Hulse, Nandita Dukkipati, Adam Paszke, Andrew Bolt, Elnaz Davoodi, Kiam Choo, Jennifer Beattie, Jennifer Prendki, Harsha Vashisht, Rebeca Santamaria-Fernandez, Luis C. Cobo, Jarek Wilkiewicz, David Madras, Ali Elqursh, Grant Uy, Kevin Ramirez, Matt Harvey, Tyler Liechty, Heiga Zen, Jeff Seibert, Clara Huiyi Hu, Mohamed Elhawaty, Andrey Khorlin, Maigo Le, Asaf Aharoni, Megan Li, Lily Wang, Sandeep Kumar, Alejandro Lince, Norman Casagrande, Jay Hoover, Dalia El Badawy, David Soergel, Denis Vnukov, Matt Miecnikowski, Jiri Simsa, Anna Koop, Praveen Kumar, Thibault Sellam, Daniel Vlasic, Samira Daruki, Nir Shabat, John Zhang, Guolong Su, Jiageng Zhang, Jeremiah Liu, Yi Sun, Evan Palmer, Alireza Ghaffarkhah, Xi Xiong, Victor Cotruta, Michael Fink, Lucas Dixon, Ashwin Sreevatsa, Adrian Goedeckemeyer, Alek Dimitriev, Mohsen Jafari, Remi Crocker, Nicholas FitzGerald, Aviral Kumar, Sanjay Ghemawat, Ivan Philips, Frederick Liu, Yannie Liang, Rachel Sterneck, Alena Repina, Marcus Wu, Laura Knight, Marin Georgiev, Hyo Lee, Harry Askham, Abhishek Chakladar, Annie Louis, Carl Crous, Hardie Cate, Dessie Petrova, MICHAEL QUINN, Denese Owusu-Afriyie, Achintya Singhal, Nan Wei, Solomon Kim, Damien Vincent, Milad Nasr, Christopher A. Choquette-Choo, Reiko Tojo, Shawn Lu, Diego de Las Casas, Yuchung Cheng, Tolga Bolukbasi, Katherine Lee, Saaber Fatehi, Rajagopal Ananthanarayanan, Miteyan Patel, Charbel Kaed, Jing Li, Jakub Sygnowski, Shreyas Rammohan Belle, Zhe Chen, Jaclyn Konzelmann, Siim Põder, Roopal Garg, Vinod Koverkathu, Adam Brown, Chris Dyer, Rosanne Liu, Azade Nova, Jun Xu, Slav Petrov, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals

In this report, we present the latest model of the Gemini family, Gemini 1. 5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio.

Code Generation Retrieval

Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion

no code implementations5 Mar 2024 Meng Zheng, Benjamin Planche, Xuan Gong, Fan Yang, Terrence Chen, Ziyan Wu

3D patient body modeling is critical to the success of automated patient positioning for smart medical scanning and operating rooms.

Keypoint Detection

Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles

1 code implementation24 Feb 2024 Fan Yang, Pierre Le Bodic, Michael Kamp, Mario Boley

Gradient boosting of prediction rules is an efficient approach to learn potentially interpretable yet accurate probabilistic models.

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

no code implementations21 Feb 2024 Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k length to recover the short context window performance.

Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability

no code implementations16 Feb 2024 Haiyan Zhao, Fan Yang, Himabindu Lakkaraju, Mengnan Du

As large language models (LLMs) grow more powerful, concerns around potential harms like toxicity, unfairness, and hallucination threaten user trust.

Hallucination

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

no code implementations12 Feb 2024 Mateusz Łajszczak, Guillermo Cámbara, Yang Li, Fatih Beyhan, Arent van Korlaar, Fan Yang, Arnaud Joly, Álvaro Martín-Cortinas, Ammar Abbas, Adam Michalski, Alexis Moinet, Sri Karlapati, Ewa Muszyńska, Haohan Guo, Bartosz Putrycz, Soledad López Gambino, Kayeon Yoo, Elena Sokolova, Thomas Drugman

Echoing the widely-reported "emergent abilities" of large language models when trained on increasing volume of data, we show that BASE TTS variants built with 10K+ hours and 500M+ parameters begin to demonstrate natural prosody on textually complex sentences.

Disentanglement

Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

1 code implementation9 Feb 2024 Mingzhe Xing, Rongkai Zhang, Hui Xue, Qi Chen, Fan Yang, Zhen Xiao

These challenges motivate AndroidArena, an environment and benchmark designed to evaluate LLM agents on a modern operating system.

Date Understanding Language Modelling +1

Large Language Models As Faithful Explainers

no code implementations7 Feb 2024 Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Ruixiang Tang, Fan Yang, Mengnan Du, Xuanting Cai, Xia Hu

In this work, we introduce a generative explanation framework, xLLM, to improve the faithfulness of the explanations provided in natural language formats for LLMs.

Decision Making

Seeing is not always believing: The Space of Harmless Perturbations

no code implementations3 Feb 2024 Lu Chen, Shaofeng Li, Benhao Huang, Fan Yang, Zheng Li, Jie Li, Yuan Luo

In the context of deep neural networks, we expose the existence of a harmless perturbation space, where perturbations leave the network output entirely unaltered.

Privacy Preserving

DARCS: Memory-Efficient Deep Compressed Sensing Reconstruction for Acceleration of 3D Whole-Heart Coronary MR Angiography

no code implementations1 Feb 2024 Zhihao Xue, Fan Yang, Juan Gao, Zhuo Chen, Hao Peng, Chao Zou, Hang Jin, Chenxi Hu

While unrolling-based deep reconstruction methods have achieved state-of-the-art performance on 2D image reconstruction, their application to 3D reconstruction is hindered by the large amount of memory needed to train an unrolled network.

3D Reconstruction De-aliasing +2

TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis

no code implementations5 Jan 2024 Liwen Zhang, Lianzhen Zhong, Fan Yang, Di Dong, Hui Hui, Jie Tian

However, ranking loss only focus on the ranking of survival time and does not consider potential effect of samples for exact survival time values.

Survival Analysis

Disentangle Estimation of Causal Effects from Cross-Silo Data

no code implementations4 Jan 2024 Yuxuan Liu, Haozhao Wang, Shuang Wang, Zhiming He, Wenchao Xu, Jialiang Zhu, Fan Yang

Estimating causal effects among different events is of great importance to critical fields such as drug development.

LETA: Learning Transferable Attribution for Generic Vision Explainer

no code implementations23 Dec 2023 Guanchu Wang, Yu-Neng Chuang, Fan Yang, Mengnan Du, Chia-Yuan Chang, Shaochen Zhong, Zirui Liu, Zhaozhuo Xu, Kaixiong Zhou, Xuanting Cai, Xia Hu

To address this problem, we develop a pre-trained, DNN-based, generic explainer on large-scale image datasets, and leverage its transferability to explain various vision models for downstream tasks.

Gemini: A Family of Highly Capable Multimodal Models

no code implementations The Keyword 2023 Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Slav Petrov, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, Ryan Doherty, Eli Collins, Clemens Meyer, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, George Tucker, Enrique Piqueras, Maxim Krikun, Iain Barr, Nikolay Savinov, Ivo Danihelka, Becca Roelofs, Anaïs White, Anders Andreassen, Tamara von Glehn, Lakshman Yagati, Mehran Kazemi, Lucas Gonzalez, Misha Khalman, Jakub Sygnowski, Alexandre Frechette, Charlotte Smith, Laura Culp, Lev Proleev, Yi Luan, Xi Chen, James Lottes, Nathan Schucher, Federico Lebron, Alban Rrustemi, Natalie Clay, Phil Crone, Tomas Kocisky, Jeffrey Zhao, Bartek Perz, Dian Yu, Heidi Howard, Adam Bloniarz, Jack W. Rae, Han Lu, Laurent SIfre, Marcello Maggioni, Fred Alcober, Dan Garrette, Megan Barnes, Shantanu Thakoor, Jacob Austin, Gabriel Barth-Maron, William Wong, Rishabh Joshi, Rahma Chaabouni, Deeni Fatiha, Arun Ahuja, Ruibo Liu, Yunxuan Li, Sarah Cogan, Jeremy Chen, Chao Jia, Chenjie Gu, Qiao Zhang, Jordan Grimstad, Ale Jakse Hartman, Martin Chadwick, Gaurav Singh Tomar, Xavier Garcia, Evan Senter, Emanuel Taropa, Thanumalayan Sankaranarayana Pillai, Jacob Devlin, Michael Laskin, Diego de Las Casas, Dasha Valter, Connie Tao, Lorenzo Blanco, Adrià Puigdomènech Badia, David Reitter, Mianna Chen, Jenny Brennan, Clara Rivera, Sergey Brin, Shariq Iqbal, Gabriela Surita, Jane Labanowski, Abhi Rao, Stephanie Winkler, Emilio Parisotto, Yiming Gu, Kate Olszewska, Yujing Zhang, Ravi Addanki, Antoine Miech, Annie Louis, Laurent El Shafey, Denis Teplyashin, Geoff Brown, Elliot Catt, Nithya Attaluri, Jan Balaguer, Jackie Xiang, Pidong Wang, Zoe Ashwood, Anton Briukhov, Albert Webson, Sanjay Ganapathy, Smit Sanghavi, Ajay Kannan, Ming-Wei Chang, Axel Stjerngren, Josip Djolonga, Yuting Sun, Ankur Bapna, Matthew Aitchison, Pedram Pejman, Henryk Michalewski, Tianhe Yu, Cindy Wang, Juliette Love, Junwhan Ahn, Dawn Bloxwich, Kehang Han, Peter Humphreys, Thibault Sellam, James Bradbury, Varun Godbole, Sina Samangooei, Bogdan Damoc, Alex Kaskasoli, Sébastien M. R. Arnold, Vijay Vasudevan, Shubham Agrawal, Jason Riesa, Dmitry Lepikhin, Richard Tanburn, Srivatsan Srinivasan, Hyeontaek Lim, Sarah Hodkinson, Pranav Shyam, Johan Ferret, Steven Hand, Ankush Garg, Tom Le Paine, Jian Li, Yujia Li, Minh Giang, Alexander Neitz, Zaheer Abbas, Sarah York, Machel Reid, Elizabeth Cole, Aakanksha Chowdhery, Dipanjan Das, Dominika Rogozińska, Vitaly Nikolaev, Pablo Sprechmann, Zachary Nado, Lukas Zilka, Flavien Prost, Luheng He, Marianne Monteiro, Gaurav Mishra, Chris Welty, Josh Newlan, Dawei Jia, Miltiadis Allamanis, Clara Huiyi Hu, Raoul de Liedekerke, Justin Gilmer, Carl Saroufim, Shruti Rijhwani, Shaobo Hou, Disha Shrivastava, Anirudh Baddepudi, Alex Goldin, Adnan Ozturel, Albin Cassirer, Yunhan Xu, Daniel Sohn, Devendra Sachan, Reinald Kim Amplayo, Craig Swanson, Dessie Petrova, Shashi Narayan, Arthur Guez, Siddhartha Brahma, Jessica Landon, Miteyan Patel, Ruizhe Zhao, Kevin Villela, Luyu Wang, Wenhao Jia, Matthew Rahtz, Mai Giménez, Legg Yeung, Hanzhao Lin, James Keeling, Petko Georgiev, Diana Mincu, Boxi Wu, Salem Haykal, Rachel Saputro, Kiran Vodrahalli, James Qin, Zeynep Cankara, Abhanshu Sharma, Nick Fernando, Will Hawkins, Behnam Neyshabur, Solomon Kim, Adrian Hutter, Priyanka Agrawal, Alex Castro-Ros, George van den Driessche, Tao Wang, Shuo-Yiin Chang, Paul Komarek, Ross Mcilroy, Mario Lučić, Guodong Zhang, Wael Farhan, Michael Sharman, Paul Natsev, Paul Michel, Yong Cheng, Yamini Bansal, Siyuan Qiao, Kris Cao, Siamak Shakeri, Christina Butterfield, Justin Chung, Paul Kishan Rubenstein, Shivani Agrawal, Arthur Mensch, Kedar Soparkar, Karel Lenc, Timothy Chung, Aedan Pope, Loren Maggiore, Jackie Kay, Priya Jhakra, Shibo Wang, Joshua Maynez, Mary Phuong, Taylor Tobin, Andrea Tacchetti, Maja Trebacz, Kevin Robinson, Yash Katariya, Sebastian Riedel, Paige Bailey, Kefan Xiao, Nimesh Ghelani, Lora Aroyo, Ambrose Slone, Neil Houlsby, Xuehan Xiong, Zhen Yang, Elena Gribovskaya, Jonas Adler, Mateo Wirth, Lisa Lee, Music Li, Thais Kagohara, Jay Pavagadhi, Sophie Bridgers, Anna Bortsova, Sanjay Ghemawat, Zafarali Ahmed, Tianqi Liu, Richard Powell, Vijay Bolina, Mariko Iinuma, Polina Zablotskaia, James Besley, Da-Woon Chung, Timothy Dozat, Ramona Comanescu, Xiance Si, Jeremy Greer, Guolong Su, Martin Polacek, Raphaël Lopez Kaufman, Simon Tokumine, Hexiang Hu, Elena Buchatskaya, Yingjie Miao, Mohamed Elhawaty, Aditya Siddhant, Nenad Tomasev, Jinwei Xing, Christina Greer, Helen Miller, Shereen Ashraf, Aurko Roy, Zizhao Zhang, Ada Ma, Angelos Filos, Milos Besta, Rory Blevins, Ted Klimenko, Chih-Kuan Yeh, Soravit Changpinyo, Jiaqi Mu, Oscar Chang, Mantas Pajarskas, Carrie Muir, Vered Cohen, Charline Le Lan, Krishna Haridasan, Amit Marathe, Steven Hansen, Sholto Douglas, Rajkumar Samuel, Mingqiu Wang, Sophia Austin, Chang Lan, Jiepu Jiang, Justin Chiu, Jaime Alonso Lorenzo, Lars Lowe Sjösund, Sébastien Cevey, Zach Gleicher, Thi Avrahami, Anudhyan Boral, Hansa Srinivasan, Vittorio Selo, Rhys May, Konstantinos Aisopos, Léonard Hussenot, Livio Baldini Soares, Kate Baumli, Michael B. Chang, Adrià Recasens, Ben Caine, Alexander Pritzel, Filip Pavetic, Fabio Pardo, Anita Gergely, Justin Frye, Vinay Ramasesh, Dan Horgan, Kartikeya Badola, Nora Kassner, Subhrajit Roy, Ethan Dyer, Víctor Campos, Alex Tomala, Yunhao Tang, Dalia El Badawy, Elspeth White, Basil Mustafa, Oran Lang, Abhishek Jindal, Sharad Vikram, Zhitao Gong, Sergi Caelles, Ross Hemsley, Gregory Thornton, Fangxiaoyu Feng, Wojciech Stokowiec, Ce Zheng, Phoebe Thacker, Çağlar Ünlü, Zhishuai Zhang, Mohammad Saleh, James Svensson, Max Bileschi, Piyush Patil, Ankesh Anand, Roman Ring, Katerina Tsihlas, Arpi Vezer, Marco Selvi, Toby Shevlane, Mikel Rodriguez, Tom Kwiatkowski, Samira Daruki, Keran Rong, Allan Dafoe, Nicholas FitzGerald, Keren Gu-Lemberg, Mina Khan, Lisa Anne Hendricks, Marie Pellat, Vladimir Feinberg, James Cobon-Kerr, Tara Sainath, Maribeth Rauh, Sayed Hadi Hashemi, Richard Ives, Yana Hasson, Yaguang Li, Eric Noland, Yuan Cao, Nathan Byrd, Le Hou, Qingze Wang, Thibault Sottiaux, Michela Paganini, Jean-Baptiste Lespiau, Alexandre Moufarek, Samer Hassan, Kaushik Shivakumar, Joost van Amersfoort, Amol Mandhane, Pratik Joshi, Anirudh Goyal, Matthew Tung, Andrew Brock, Hannah Sheahan, Vedant Misra, Cheng Li, Nemanja Rakićević, Mostafa Dehghani, Fangyu Liu, Sid Mittal, Junhyuk Oh, Seb Noury, Eren Sezener, Fantine Huot, Matthew Lamm, Nicola De Cao, Charlie Chen, Gamaleldin Elsayed, Ed Chi, Mahdis Mahdieh, Ian Tenney, Nan Hua, Ivan Petrychenko, Patrick Kane, Dylan Scandinaro, Rishub Jain, Jonathan Uesato, Romina Datta, Adam Sadovsky, Oskar Bunyan, Dominik Rabiej, Shimu Wu, John Zhang, Gautam Vasudevan, Edouard Leurent, Mahmoud Alnahlawi, Ionut Georgescu, Nan Wei, Ivy Zheng, Betty Chan, Pam G Rabinovitch, Piotr Stanczyk, Ye Zhang, David Steiner, Subhajit Naskar, Michael Azzam, Matthew Johnson, Adam Paszke, Chung-Cheng Chiu, Jaume Sanchez Elias, Afroz Mohiuddin, Faizan Muhammad, Jin Miao, Andrew Lee, Nino Vieillard, Sahitya Potluri, Jane Park, Elnaz Davoodi, Jiageng Zhang, Jeff Stanway, Drew Garmon, Abhijit Karmarkar, Zhe Dong, Jong Lee, Aviral Kumar, Luowei Zhou, Jonathan Evens, William Isaac, Zhe Chen, Johnson Jia, Anselm Levskaya, Zhenkai Zhu, Chris Gorgolewski, Peter Grabowski, Yu Mao, Alberto Magni, Kaisheng Yao, Javier Snaider, Norman Casagrande, Paul Suganthan, Evan Palmer, Geoffrey Irving, Edward Loper, Manaal Faruqui, Isha Arkatkar, Nanxin Chen, Izhak Shafran, Michael Fink, Alfonso Castaño, Irene Giannoumis, Wooyeol Kim, Mikołaj Rybiński, Ashwin Sreevatsa, Jennifer Prendki, David Soergel, Adrian Goedeckemeyer, Willi Gierke, Mohsen Jafari, Meenu Gaba, Jeremy Wiesner, Diana Gage Wright, Yawen Wei, Harsha Vashisht, Yana Kulizhskaya, Jay Hoover, Maigo Le, Lu Li, Chimezie Iwuanyanwu, Lu Liu, Kevin Ramirez, Andrey Khorlin, Albert Cui, Tian Lin, Marin Georgiev, Marcus Wu, Ricardo Aguilar, Keith Pallo, Abhishek Chakladar, Alena Repina, Xihui Wu, Tom van der Weide, Priya Ponnapalli, Caroline Kaplan, Jiri Simsa, Shuangfeng Li, Olivier Dousse, Jeff Piper, Nathan Ie, Minnie Lui, Rama Pasumarthi, Nathan Lintz, Anitha Vijayakumar, Lam Nguyen Thiet, Daniel Andor, Pedro Valenzuela, Cosmin Paduraru, Daiyi Peng, Katherine Lee, Shuyuan Zhang, Somer Greene, Duc Dung Nguyen, Paula Kurylowicz, Sarmishta Velury, Sebastian Krause, Cassidy Hardin, Lucas Dixon, Lili Janzer, Kiam Choo, Ziqiang Feng, Biao Zhang, Achintya Singhal, Tejasi Latkar, Mingyang Zhang, Quoc Le, Elena Allica Abellan, Dayou Du, Dan McKinnon, Natasha Antropova, Tolga Bolukbasi, Orgad Keller, David Reid, Daniel Finchelstein, Maria Abi Raad, Remi Crocker, Peter Hawkins, Robert Dadashi, Colin Gaffney, Sid Lall, Ken Franko, Egor Filonov, Anna Bulanova, Rémi Leblond, Vikas Yadav, Shirley Chung, Harry Askham, Luis C. Cobo, Kelvin Xu, Felix Fischer, Jun Xu, Christina Sorokin, Chris Alberti, Chu-Cheng Lin, Colin Evans, Hao Zhou, Alek Dimitriev, Hannah Forbes, Dylan Banarse, Zora Tung, Jeremiah Liu, Mark Omernick, Colton Bishop, Chintu Kumar, Rachel Sterneck, Ryan Foley, Rohan Jain, Swaroop Mishra, Jiawei Xia, Taylor Bos, Geoffrey Cideron, Ehsan Amid, Francesco Piccinno, Xingyu Wang, Praseem Banzal, Petru Gurita, Hila Noga, Premal Shah, Daniel J. Mankowitz, Alex Polozov, Nate Kushman, Victoria Krakovna, Sasha Brown, Mohammadhossein Bateni, Dennis Duan, Vlad Firoiu, Meghana Thotakuri, Tom Natan, Anhad Mohananey, Matthieu Geist, Sidharth Mudgal, Sertan Girgin, Hui Li, Jiayu Ye, Ofir Roval, Reiko Tojo, Michael Kwong, James Lee-Thorp, Christopher Yew, Quan Yuan, Sumit Bagri, Danila Sinopalnikov, Sabela Ramos, John Mellor, Abhishek Sharma, Aliaksei Severyn, Jonathan Lai, Kathy Wu, Heng-Tze Cheng, David Miller, Nicolas Sonnerat, Denis Vnukov, Rory Greig, Jennifer Beattie, Emily Caveness, Libin Bai, Julian Eisenschlos, Alex Korchemniy, Tomy Tsai, Mimi Jasarevic, Weize Kong, Phuong Dao, Zeyu Zheng, Frederick Liu, Fan Yang, Rui Zhu, Mark Geller, Tian Huey Teh, Jason Sanmiya, Evgeny Gladchenko, Nejc Trdin, Andrei Sozanschi, Daniel Toyama, Evan Rosen, Sasan Tavakkol, Linting Xue, Chen Elkind, Oliver Woodman, John Carpenter, George Papamakarios, Rupert Kemp, Sushant Kafle, Tanya Grunina, Rishika Sinha, Alice Talbert, Abhimanyu Goyal, Diane Wu, Denese Owusu-Afriyie, Cosmo Du, Chloe Thornton, Jordi Pont-Tuset, Pradyumna Narayana, Jing Li, Sabaer Fatehi, John Wieting, Omar Ajmeri, Benigno Uria, Tao Zhu, Yeongil Ko, Laura Knight, Amélie Héliou, Ning Niu, Shane Gu, Chenxi Pang, Dustin Tran, Yeqing Li, Nir Levine, Ariel Stolovich, Norbert Kalb, Rebeca Santamaria-Fernandez, Sonam Goenka, Wenny Yustalim, Robin Strudel, Ali Elqursh, Balaji Lakshminarayanan, Charlie Deck, Shyam Upadhyay, Hyo Lee, Mike Dusenberry, Zonglin Li, Xuezhi Wang, Kyle Levin, Raphael Hoffmann, Dan Holtmann-Rice, Olivier Bachem, Summer Yue, Sho Arora, Eric Malmi, Daniil Mirylenka, Qijun Tan, Christy Koh, Soheil Hassas Yeganeh, Siim Põder, Steven Zheng, Francesco Pongetti, Mukarram Tariq, Yanhua Sun, Lucian Ionita, Mojtaba Seyedhosseini, Pouya Tafti, Ragha Kotikalapudi, Zhiyu Liu, Anmol Gulati, Jasmine Liu, Xinyu Ye, Bart Chrzaszcz, Lily Wang, Nikhil Sethi, Tianrun Li, Ben Brown, Shreya Singh, Wei Fan, Aaron Parisi, Joe Stanton, Chenkai Kuang, Vinod Koverkathu, Christopher A. Choquette-Choo, Yunjie Li, TJ Lu, Abe Ittycheriah, Prakash Shroff, Pei Sun, Mani Varadarajan, Sanaz Bahargam, Rob Willoughby, David Gaddy, Ishita Dasgupta, Guillaume Desjardins, Marco Cornero, Brona Robenek, Bhavishya Mittal, Ben Albrecht, Ashish Shenoy, Fedor Moiseev, Henrik Jacobsson, Alireza Ghaffarkhah, Morgane Rivière, Alanna Walton, Clément Crepy, Alicia Parrish, YuAn Liu, Zongwei Zhou, Clement Farabet, Carey Radebaugh, Praveen Srinivasan, Claudia van der Salm, Andreas Fidjeland, Salvatore Scellato, Eri Latorre-Chimoto, Hanna Klimczak-Plucińska, David Bridson, Dario de Cesare, Tom Hudson, Piermaria Mendolicchio, Lexi Walker, Alex Morris, Ivo Penchev, Matthew Mauger, Alexey Guseynov, Alison Reid, Seth Odoom, Lucia Loher, Victor Cotruta, Madhavi Yenugula, Dominik Grewe, Anastasia Petrushkina, Tom Duerig, Antonio Sanchez, Steve Yadlowsky, Amy Shen, Amir Globerson, Adam Kurzrok, Lynette Webb, Sahil Dua, Dong Li, Preethi Lahoti, Surya Bhupatiraju, Dan Hurt, Haroon Qureshi, Ananth Agarwal, Tomer Shani, Matan Eyal, Anuj Khare, Shreyas Rammohan Belle, Lei Wang, Chetan Tekur, Mihir Sanjay Kale, Jinliang Wei, Ruoxin Sang, Brennan Saeta, Tyler Liechty, Yi Sun, Yao Zhao, Stephan Lee, Pandu Nayak, Doug Fritz, Manish Reddy Vuyyuru, John Aslanides, Nidhi Vyas, Martin Wicke, Xiao Ma, Taylan Bilal, Evgenii Eltyshev, Daniel Balle, Nina Martin, Hardie Cate, James Manyika, Keyvan Amiri, Yelin Kim, Xi Xiong, Kai Kang, Florian Luisier, Nilesh Tripuraneni, David Madras, Mandy Guo, Austin Waters, Oliver Wang, Joshua Ainslie, Jason Baldridge, Han Zhang, Garima Pruthi, Jakob Bauer, Feng Yang, Riham Mansour, Jason Gelman, Yang Xu, George Polovets, Ji Liu, Honglong Cai, Warren Chen, XiangHai Sheng, Emily Xue, Sherjil Ozair, Adams Yu, Christof Angermueller, Xiaowei Li, Weiren Wang, Julia Wiesinger, Emmanouil Koukoumidis, Yuan Tian, Anand Iyer, Madhu Gurumurthy, Mark Goldenson, Parashar Shah, MK Blake, Hongkun Yu, Anthony Urbanowicz, Jennimaria Palomaki, Chrisantha Fernando, Kevin Brooks, Ken Durden, Harsh Mehta, Nikola Momchev, Elahe Rahimtoroghi, Maria Georgaki, Amit Raul, Sebastian Ruder, Morgan Redshaw, Jinhyuk Lee, Komal Jalan, Dinghua Li, Ginger Perng, Blake Hechtman, Parker Schuh, Milad Nasr, Mia Chen, Kieran Milan, Vladimir Mikulik, Trevor Strohman, Juliana Franco, Tim Green, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals

This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding.

Arithmetic Reasoning Code Generation +3

Implicit Modeling of Non-rigid Objects with Cross-Category Signals

no code implementations15 Dec 2023 Yuchun Liu, Benjamin Planche, Meng Zheng, Zhongpai Gao, Pierre Sibut-Bourde, Fan Yang, Terrence Chen, Ziyan Wu

To effectively capture the interrelation between these entities and ensure precise, collision-free representations, our approach facilitates signaling between category-specific fields to adequately rectify shapes.

Object

Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning

no code implementations14 Dec 2023 Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Fan Yang, Mao Yang

In this work, we propose CoT-Influx, a novel approach that pushes the boundary of few-shot Chain-of-Thoughts (CoT) learning to improve LLM mathematical reasoning.

Arithmetic Reasoning Few-Shot Learning +3

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

1 code implementation11 Dec 2023 Abdullah Rashwan, Jiageng Zhang, Ali Taalimi, Fan Yang, Xingyi Zhou, Chaochao Yan, Liang-Chieh Chen, Yeqing Li

With ResNet50 backbone, our MaskConver achieves 53. 6% PQ on the COCO panoptic val set, outperforming the modern convolution-based model, Panoptic FCN, by 9. 3% as well as transformer-based models such as Mask2Former (+1. 7% PQ) and kMaX-DeepLab (+0. 6% PQ).

Panoptic Segmentation

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

no code implementations3 Dec 2023 Fan Yang, Tianyi Chen, Xiaosheng He, Zhongang Cai, Lei Yang, Si Wu, Guosheng Lin

We propose AttriHuman-3D, an editable 3D human generation model, which address the aforementioned problems with attribute decomposition and indexing.

Attribute Disentanglement

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

no code implementations26 Nov 2023 Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies.

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

1 code implementation24 Nov 2023 Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang

More importantly, we present $\textbf{Griffon}$, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules.

Referring Expression Referring Expression Comprehension

Robust and Communication-Efficient Federated Domain Adaptation via Random Features

1 code implementation8 Nov 2023 Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert. C. Qiu, Zhenyu Liao

As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner.

Domain Adaptation Federated Learning

Detecting Generated Images by Real Images Only

no code implementations2 Nov 2023 Xiuli Bi, Bo Liu, Fan Yang, Bin Xiao, Weisheng Li, Gao Huang, Pamela C. Cosman

This paper approaches the generated image detection problem from a new perspective: Start from real images.

VisPercep: A Vision-Language Approach to Enhance Visual Perception for People with Blindness and Low Vision

no code implementations31 Oct 2023 Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang, Yi Fang

By combining the prompt and input image, a large vision-language model (i. e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks in the environment by analyzing the environmental objects and scenes, relevant to the prompt.

Language Modelling Prompt Engineering +1

Popularity, face and voice: Predicting and interpreting livestreamers' retail performance using machine learning techniques

no code implementations29 Oct 2023 Xiong Xiong, Fan Yang, Li Su

Livestreaming commerce, a hybrid of e-commerce and self-media, has expanded the broad spectrum of traditional sales performance determinants.

Explainable artificial intelligence Feature Importance

Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion

1 code implementation25 Oct 2023 Fan Yang, Xiaofei Wang

To address this issue, we proposed a method for extending the spatio-temporal behavior dataset in Student Classroom Scenarios (SCB-ST-Dataset4) through image dataset.

BitNet: Scaling 1-bit Transformers for Large Language Models

2 code implementations17 Oct 2023 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Language Modelling Quantization

Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization

no code implementations10 Oct 2023 Fan Yang, Wenxuan Zhou, Zuxin Liu, Ding Zhao, David Held

This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively.

reinforcement-learning Reinforcement Learning (RL) +1

A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors

no code implementations4 Oct 2023 Fan Yang

However, low accuracy in student classroom behavior detection is a prevalent issue.

SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior

1 code implementation4 Oct 2023 Fan Yang, Tao Wang

The use of deep learning methods to automatically detect students' classroom behavior is a promising approach for analyzing their class performance and improving teaching effectiveness.

GAFlow: Incorporating Gaussian Attention into Optical Flow

1 code implementation ICCV 2023 Ao Luo, Fan Yang, Xin Li, Lang Nie, Chunyu Lin, Haoqiang Fan, Shuaicheng Liu

Moreover, for reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM) which not only inherits properties from Gaussian distribution to instinctively revolve around the neighbor fields of each point but also is empowered to put the emphasis on contextually related regions during matching.

Optical Flow Estimation Representation Learning

Model-enhanced Vector Index

1 code implementation NeurIPS 2023 Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.

Natural Questions Quantization +1

AOSR-Net: All-in-One Sandstorm Removal Network

no code implementations16 Sep 2023 Yazhong Si, xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.

Image Enhancement Image Restoration

DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research

1 code implementation4 Sep 2023 Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Kwei-Herng Lai, Daochen Zha, Ruixiang Tang, Fan Yang, Alfredo Costilla Reyes, Kaixiong Zhou, Xiaoqian Jiang, Xia Hu

The exponential growth in scholarly publications necessitates advanced tools for efficient article retrieval, especially in interdisciplinary fields where diverse terminologies are used to describe similar research.

named-entity-recognition Named Entity Recognition +5

Explainability for Large Language Models: A Survey

no code implementations2 Sep 2023 Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du

For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge.

RecMind: Large Language Model Powered Agent For Recommendation

no code implementations28 Aug 2023 Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Yingzhen Yang

While the recommendation system (RS) has advanced significantly through deep learning, current RS approaches usually train and fine-tune models on task-specific datasets, limiting their generalizability to new recommendation tasks and their ability to leverage external knowledge due to model scale and data size constraints.

Explanation Generation Language Modelling +2

Multilingual context-based pronunciation learning for Text-to-Speech

no code implementations31 Jul 2023 Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end.

Words That Stick: Predicting Decision Making and Synonym Engagement Using Cognitive Biases and Computational Linguistics

no code implementations26 Jul 2023 Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

This research draws upon cognitive psychology and information systems studies to anticipate user engagement and decision-making on digital platforms.

Decision Making Marketing

A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing

no code implementations26 Jul 2023 Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution.

Language Modelling

One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

1 code implementation25 Jul 2023 Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, Di Guo, Guang Yang, Xiaobo Qu

We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%.

Medical Diagnosis MRI Reconstruction

ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer

no code implementations26 Jun 2023 Jiaxin Deng, Dong Shen, Shiyao Wang, Xiangyu Wu, Fan Yang, Guorui Zhou, Gaofeng Meng

However, most previous works treat the live as a whole item and explore the Click-through-Rate (CTR) prediction framework on item-level, neglecting that the dynamic changes that occur even within the same live room.

Click-Through Rate Prediction Dynamic Time Warping +1

A Pairing Enhancement Approach for Aspect Sentiment Triplet Extraction

no code implementations11 Jun 2023 Fan Yang, Mian Zhang, Gongzhen Hu, Xiabing Zhou

Aspect Sentiment Triplet Extraction (ASTE) aims to extract the triplet of an aspect term, an opinion term, and their corresponding sentiment polarity from the review texts.

Aspect Sentiment Triplet Extraction Contrastive Learning +1

Multi-Task Knowledge Enhancement for Zero-Shot and Multi-Domain Recommendation in an AI Assistant Application

no code implementations9 Jun 2023 Elan Markowitz, Ziyan Jiang, Fan Yang, Xing Fan, Tony Chen, Greg Ver Steeg, Aram Galstyan

We propose in this work to unify these approaches: Using information from interactions in other domains as well as external knowledge graphs to make predictions in a new domain that would be impossible with either information source alone.

Knowledge Graphs Recommendation Systems

Student Classroom Behavior Detection based on Improved YOLOv7

1 code implementation6 Jun 2023 Fan Yang

Accurately detecting student behavior in classroom videos can aid in analyzing their classroom performance and improving teaching effectiveness.

Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

no code implementations31 May 2023 Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

We find that previous gradient accumulation reduces activation memory but fails to be compatible with gradient memory reduction due to a contradiction between preserving gradients and releasing gradients.

Reversible Quantization Index Modulation for Static Deep Neural Network Watermarking

no code implementations29 May 2023 Junren Qin, Shanxiang Lyu, Fan Yang, Jiarui Deng, Zhihua Xia, Xiaochun Cao

In this paper, we propose a novel RDH-based static DNN watermarking scheme using quantization index modulation (QIM).

Quantization

Self-aware and Cross-sample Prototypical Learning for Semi-supervised Medical Image Segmentation

no code implementations25 May 2023 Zhenxi Zhang, Ran Ran, Chunna Tian, Heng Zhou, Xin Li, Fan Yang, Zhicheng Jiao

To address these issues, we propose a self-aware and cross-sample prototypical learning method (SCP-Net) to enhance the diversity of prediction in consistency learning by utilizing a broader range of semantic information derived from multiple inputs.

Image Segmentation Semantic Segmentation +1

Cross-supervised Dual Classifiers for Semi-supervised Medical Image Segmentation

no code implementations25 May 2023 Zhenxi Zhang, Ran Ran, Chunna Tian, Heng Zhou, Fan Yang, Xin Li, Zhicheng Jiao

This paper proposes a cross-supervised learning framework based on dual classifiers (DC-Net), including an evidential classifier and a vanilla classifier.

Image Segmentation Segmentation +2

Interpretation of Time-Series Deep Models: A Survey

no code implementations23 May 2023 Ziqi Zhao, Yucheng Shi, Shushan Wu, Fan Yang, WenZhan Song, Ninghao Liu

Deep learning models developed for time-series associated tasks have become more widely researched nowadays.

Time Series

Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding

no code implementations23 May 2023 Zheng Chen, Ziyan Jiang, Fan Yang, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, Aram Galstyan

This paper presents our "Collaborative Query Rewriting" approach, which specifically addresses the task of rewriting new user interactions that have not been previously observed in the user's history.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +9

Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models

no code implementations21 May 2023 Yijia Zhang, Lingran Zhao, Shijie Cao, WenQiang Wang, Ting Cao, Fan Yang, Mao Yang, Shanghang Zhang, Ningyi Xu

In this study, we conduct a comparative analysis of INT and FP quantization with the same bit-width, revealing that the optimal quantization format varies across different layers due to the complexity and diversity of tensor distribution.

Quantization

OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields

1 code implementation17 May 2023 Youtan Yin, Zhoujie Fu, Fan Yang, Guosheng Lin

This paper proposes a novel object-removing pipeline, named OR-NeRF, that can remove objects from 3D scenes with user-given points or text prompts on a single view, achieving better performance in less time than previous works.

3D scene Editing Novel View Synthesis +1

The Ways of Words: The Impact of Word Choice on Information Engagement and Decision Making

no code implementations16 May 2023 Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan Yang, Jennifer Romano

The framework was empirically validated in a large-scale user study measuring how word choice impacts the dimensions of IE.

Decision Making

Student Classroom Behavior Detection based on YOLOv7-BRA and Multi-Model Fusion

1 code implementation13 May 2023 Fan Yang, Tao Wang, Xiaofei Wang

We constructed a dataset, which contained 11, 248 labels and 4, 001 images, with an emphasis on the common behavior of raising hands in a classroom setting (Student Classroom Behavior dataset, SCB-Dataset).

SSD-MonoDETR: Supervised Scale-aware Deformable Transformer for Monocular 3D Object Detection

1 code implementation12 May 2023 Xuan He, Fan Yang, Kailun Yang, Jiacheng Lin, Haolong Fu, Meng Wang, Jin Yuan, Zhiyong Li

To tackle this problem, this paper proposes a novel "Supervised Scale-aware Deformable Attention" (SSDA) for monocular 3D object detection.

Monocular 3D Object Detection Object +1

HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation

no code implementations6 May 2023 Wenxuan Zhou, Bowen Jiang, Fan Yang, Chris Paxton, David Held

In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations.

Object

MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed

no code implementations17 Apr 2023 Xiaowen Shi, Ze Wang, Yuanying Cai, Xiaoxu Wu, Fan Yang, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang

There are two types of data employed to train reinforcement learning (RL) model for position allocation, named strategy data and random data.

Imitation Learning Position +2

Reweighted Mixup for Subpopulation Shift

no code implementations9 Apr 2023 Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, QinGhua Hu, Bingzhe Wu, Changqing Zhang, Jianhua Yao

Subpopulation shift exists widely in many real-world applications, which refers to the training and test distributions that contain the same subpopulation groups but with different subpopulation proportions.

Fairness Generalization Bounds

A large-scale dataset for end-to-end table recognition in the wild

1 code implementation27 Mar 2023 Fan Yang, Lei Hu, Xinwu Liu, Shuangping Huang, Zhenghui Gu

To this end, we propose a new large-scale dataset named Table Recognition Set (TabRecSet) with diverse table forms sourcing from multiple scenarios in the wild, providing complete annotation dedicated to end-to-end TR research.

Table annotation Table Detection +1

Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking

1 code implementation20 Mar 2023 Ruixiang Tang, Qizhang Feng, Ninghao Liu, Fan Yang, Xia Hu

To overcome this challenge, we introduce a clean-label backdoor watermarking framework that uses imperceptible perturbations to replace mislabeled samples.

Anomaly Detection

IRGen: Generative Modeling for Image Retrieval

1 code implementation17 Mar 2023 Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Mao Yang, Qingmin Liao, Baining Guo

While generative modeling has been ubiquitous in natural language processing and computer vision, its application to image retrieval remains unexplored.

Image Retrieval Retrieval

Data-centric Artificial Intelligence: A Survey

10 code implementations17 Mar 2023 Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Zhimeng Jiang, Shaochen Zhong, Xia Hu

Artificial Intelligence (AI) is making a profound impact in almost every domain.

Generation-Guided Multi-Level Unified Network for Video Grounding

no code implementations14 Mar 2023 Xing Cheng, Xiangyu Wu, Dong Shen, Hezheng Lin, Fan Yang

Video grounding aims to locate the timestamps best matching the query description within an untrimmed video.

Video Grounding

Spatial Attention and Syntax Rule Enhanced Tree Decoder for Offine Handwritten Mathematical Expression Recognition

no code implementations13 Mar 2023 Zihao Lin, Jinrong Li, Fan Yang, Shuangping Huang, Xu Yang, Jianmin Lin, Ming Yang

In this paper, we propose a novel model called Spatial Attention and Syntax Rule Enhanced Tree Decoder (SS-TD), which is equipped with spatial attention mechanism to alleviate the prediction error of tree structure and use syntax masks (obtained from the transformation of syntax rules) to constrain the occurrence of ungrammatical mathematical expression.

CoRTX: Contrastive Framework for Real-time Explanation

1 code implementation5 Mar 2023 Yu-Neng Chuang, Guanchu Wang, Fan Yang, Quan Zhou, Pushkar Tripathi, Xuanting Cai, Xia Hu

In this work, we propose a COntrastive Real-Time eXplanation (CoRTX) framework to learn the explanation-oriented representation and relieve the intensive dependence of explainer training on explanation labels.

Contrastive Learning

Deep Seam Prediction for Image Stitching Based on Selection Consistency Loss

no code implementations10 Feb 2023 Senmao Cheng, Fan Yang, Zhi Chen, Nanjun Yuan, Wenbing Tao

To our knowledge, the proposed DSeam is the first deep learning based seam prediction method for image stitching.

Image Stitching

A Unified Multi-view Multi-person Tracking Framework

no code implementations8 Feb 2023 Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang

Although there is a significant development in 3D Multi-view Multi-person Tracking (3D MM-Tracking), current 3D MM-Tracking frameworks are designed separately for footprint and pose tracking.

3D Multi-Person Pose Estimation Multiple People Tracking +2

Efficient XAI Techniques: A Taxonomic Survey

no code implementations7 Feb 2023 Yu-Neng Chuang, Guanchu Wang, Fan Yang, Zirui Liu, Xuanting Cai, Mengnan Du, Xia Hu

Finally, we summarize the challenges of deploying XAI acceleration methods to real-world scenarios, overcoming the trade-off between faithfulness and efficiency, and the selection of different acceleration methods.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI)

PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce

1 code implementation6 Feb 2023 Xiaowen Shi, Fan Yang, Ze Wang, Xiaoxu Wu, Muzhi Guan, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang

Then we design a novel omnidirectional attention mechanism in OCPM to capture the context information in the permutation.

Re-Ranking

Conditional generalized quantiles based on expected utility model and equivalent characterization of properties

no code implementations29 Jan 2023 Qinyu Wu, Fan Yang, Ping Zhang

As a counterpart to the (static) risk measures of generalized quantiles and motivated by Bellini et al. (2018), we propose a new kind of conditional risk measure called conditional generalized quantiles.

Data-centric AI: Perspectives and Challenges

1 code implementation12 Jan 2023 Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Xia Hu

The role of data in building AI systems has recently been significantly magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model advancements to ensuring data quality and reliability.

AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts

1 code implementation ICCV 2023 Tianlong Chen, Xuxi Chen, Xianzhi Du, Abdullah Rashwan, Fan Yang, Huizhong Chen, Zhangyang Wang, Yeqing Li

Instead of compressing multiple tasks' knowledge into a single model, MoE separates the parameter space and only utilizes the relevant model pieces given task type and its input, which provides stabilized MTL training and ultra-efficient inference.

Instance Segmentation Multi-Task Learning +3

Towards Blind Watermarking: Combining Invertible and Non-invertible Mechanisms

1 code implementation24 Dec 2022 Rui Ma, Mengxi Guo, Yi Hou, Fan Yang, Yuan Li, Huizhu Jia, Xiaodong Xie

The CIN is composed of the invertible part to achieve high imperceptibility and the non-invertible part to strengthen the robustness against strong noise attacks.

Exploring Stochastic Autoregressive Image Modeling for Visual Representation

1 code implementation3 Dec 2022 Yu Qi, Fan Yang, Yousong Zhu, Yufei Liu, Liwei Wu, Rui Zhao, Wei Li

By introducing stochastic prediction and the parallel encoder-decoder, SAIM significantly improve the performance of autoregressive image modeling.

Self-Supervised Learning

Instance-level Heterogeneous Domain Adaptation for Limited-labeled Sketch-to-Photo Retrieval

1 code implementation IEEE Transactions on Multimedia 2020 Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura

Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i. e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i. e., target domain).

Domain Adaptation Image Retrieval +1

MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts

1 code implementation25 Nov 2022 Xiangyu Xi, Jianwei Lv, Shuaipeng Liu, Wei Ye, Fan Yang, Guanglu Wan

As a pioneering exploration that expands event detection to the scenarios involving informal and heterogeneous texts, we propose a new large-scale Chinese event detection dataset based on user reviews, text conversations, and phone conversations in a leading e-commerce platform for food service.

Event Detection

Hard to Track Objects with Irregular Motions and Similar Appearances? Make It Easier by Buffering the Matching Space

no code implementations24 Nov 2022 Fan Yang, Shigeyuki Odashima, Shoichi Masui, Shan Jiang

To address this issue, our C-BIoU tracker adds buffers to expand the matching space of detections and tracks, which mitigates the effect of irregular motions in two aspects: one is to directly match identical but non-overlapping detections and tracks in adjacent frames, and the other is to compensate for the motion estimation bias in the matching space.

Motion Estimation Multi-Object Tracking +1

A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset

no code implementations19 Nov 2022 Jiaxin Deng, Dong Shen, Haojie Pan, Xiangyu Wu, Ximan Liu, Gaofeng Meng, Fan Yang, Size Li, Ruiji Fu, Zhongyuan Wang

Furthermore, based on this dataset, we propose an end-to-end model that jointly optimizes the video understanding objective with knowledge graph embedding, which can not only better inject factual knowledge into video understanding but also generate effective multi-modal entity embedding for KG.

Common Sense Reasoning Knowledge Graph Embedding +4

Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning

1 code implementation14 Nov 2022 Zelong Zeng, Fan Yang, Hong Liu, Shin'ichi Satoh

However, this type of method normally ignores the crucial knowledge hidden in the data (e. g., intra-class information variation), which is harmful to the generalization of the trained model.

Metric Learning

Deep-Learning-Empowered Inverse Design for Freeform Reconfigurable Metasurfaces

no code implementations11 Nov 2022 Changhao Liu, Fan Yang, Maokun Li, Shenheng Xu

Recently, artificial neural network empowered inverse design for metasurfaces has been developed that can design on-demand meta-atoms with diverse shapes and high performance, where the design process based on artificial intelligence is fast and automatic.

ISA-Net: Improved spatial attention network for PET-CT tumor segmentation

no code implementations4 Nov 2022 Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, HaiYan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wangg, Dong Liang, Tianye Niu, Xiaohua Zhuc, Zhanli Hua

In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET and the precise anatomical information of CT. We design an improved spatial attention network(ISA-Net) to increase the accuracy of PET or CT in detecting tumors, which uses multi-scale convolution operation to extract feature information and can highlight the tumor region location information and suppress the non-tumor region location information.

Segmentation STS +1

Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection

no code implementations3 Nov 2022 Fan Yang, Xinhao Xu, Hui Chen, Yuchen Guo, Jungong Han, Kai Ni, Guiguang Ding

To pick up the ground plane prior for M3OD, we propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go.

Monocular 3D Object Detection object-detection

SIMPLE-RC: Group Network Inference with Non-Sharp Nulls and Weak Signals

no code implementations31 Oct 2022 Jianqing Fan, Yingying Fan, Jinchi Lv, Fan Yang

To address these practical challenges, in this paper we propose a SIMPLE method with random coupling (SIMPLE-RC) for testing the non-sharp null hypothesis that a group of given nodes share similar (not necessarily identical) membership profiles under weaker signals.

Uncertainty Quantification

Revisiting Attention Weights as Explanations from an Information Theoretic Perspective

no code implementations31 Oct 2022 Bingyang Wen, K. P. Subbalakshmi, Fan Yang

Attention mechanisms have recently demonstrated impressive performance on a range of NLP tasks, and attention scores are often used as a proxy for model explainability.

Deep Attention

Forecasting Human Trajectory from Scene History

1 code implementation17 Oct 2022 Mancheng Meng, Ziyan Wu, Terrence Chen, Xiran Cai, Xiang Sean Zhou, Fan Yang, Dinggang Shen

We categorize scene history information into two types: historical group trajectory and individual-surroundings interaction.

Trajectory Prediction

SoccerNet 2022 Challenges Results

7 code implementations5 Oct 2022 Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks

2 code implementations28 Sep 2022 Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang

Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.

Multi-Label Classification Object +2

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training

no code implementations22 Sep 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Chen Zhang, Ying Cao, Quanlu Zhang, Yunxin Liu, Fan Yang, Minyi Guo

An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNN).

UMIX: Improving Importance Weighting for Subpopulation Shift via Uncertainty-Aware Mixup

1 code implementation19 Sep 2022 Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, Bingzhe Wu, Changqing Zhang, Jianhua Yao

Importance reweighting is a normal way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset.

Generalization Bounds

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization

1 code implementation30 Aug 2022 Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu

In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.

Quantization

Surrogate-assisted Multi-objective Neural Architecture Search for Real-time Semantic Segmentation

no code implementations14 Aug 2022 Zhichao Lu, Ran Cheng, Shihua Huang, Haoming Zhang, Changxiao Qiu, Fan Yang

The main challenges of applying NAS to semantic segmentation arise from two aspects: (i) high-resolution images to be processed; (ii) additional requirement of real-time inference speed (i. e., real-time semantic segmentation) for applications such as autonomous driving.

Autonomous Driving Image Classification +3

Differentially Private Counterfactuals via Functional Mechanism

no code implementations4 Aug 2022 Fan Yang, Qizhang Feng, Kaixiong Zhou, Jiahao Chen, Xia Hu

Counterfactual, serving as one emerging type of model explanation, has attracted tons of attentions recently from both industry and academia.

counterfactual valid

Improving Generalization of Metric Learning via Listwise Self-distillation

1 code implementation17 Jun 2022 Zelong Zeng, Fan Yang, Zheng Wang, Shin'ichi Satoh

Most deep metric learning (DML) methods employ a strategy that forces all positive samples to be close in the embedding space while keeping them away from negative ones.

Metric Learning

Accelerating Shapley Explanation via Contributive Cooperator Selection

1 code implementation17 Jun 2022 Guanchu Wang, Yu-Neng Chuang, Mengnan Du, Fan Yang, Quan Zhou, Pushkar Tripathi, Xuanting Cai, Xia Hu

Even though Shapley value provides an effective explanation for a DNN model prediction, the computation relies on the enumeration of all possible input feature coalitions, which leads to the exponentially growing complexity.

Learning Interpretable Decision Rule Sets: A Submodular Optimization Approach

no code implementations NeurIPS 2021 Fan Yang, Kai He, Linxiao Yang, Hongxia Du, Jingbang Yang, Bo Yang, Liang Sun

The learning problem is framed as a subset selection task in which a subset of all possible rules needs to be selected to form an accurate and interpretable rule set.

Tutel: Adaptive Mixture-of-Experts at Scale

2 code implementations7 Jun 2022 Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong

On efficiency, Flex accelerates SwinV2-MoE, achieving up to 1. 55x and 2. 11x speedup in training and inference over Fairseq, respectively.

Object Detection

Geo-Localization via Ground-to-Satellite Cross-View Image Retrieval

1 code implementation22 May 2022 Zelong Zeng, Zheng Wang, Fan Yang, Shin'ichi Satoh

The large variation of viewpoint and irrelevant content around the target always hinder accurate image retrieval and its subsequent tasks.

Image Retrieval Representation Learning +1

Demo: low-power communications based on RIS and AI for 6G

no code implementations21 May 2022 Mingyao Cui, Zidong Wu, Yuhao Chen, Shenheng Xu, Fan Yang, Linglong Dai

By jointly designing the hardware and software, this prototype can realize real-time 4K video transmission with much reduced power consumption.

NMA: Neural Multi-slot Auctions with Externalities for Online Advertising

no code implementations20 May 2022 Guogang Liao, Xuejian Li, Ze Wang, Fan Yang, Muzhi Guan, Bingqi Zhu, Yongkang Wang, Xingxing Wang, Dong Wang

Although VCG-based multi-slot auctions (e. g., VCG, WVCG) make it theoretically possible to model global externalities (e. g., the order and positions of ads and so on), they lack an efficient balance of both revenue and social welfare.

A Low-Cost, Controllable and Interpretable Task-Oriented Chatbot: With Real-World After-Sale Services as Example

no code implementations13 May 2022 Xiangyu Xi, Chenxu Lv, Yuncheng Hua, Wei Ye, Chaobo Sun, Shuaipeng Liu, Fan Yang, Guanglu Wan

Though widely used in industry, traditional task-oriented dialogue systems suffer from three bottlenecks: (i) difficult ontology construction (e. g., intents and slots); (ii) poor controllability and interpretability; (iii) annotation-hungry.

Chatbot Task-Oriented Dialogue Systems

Limited-memory BFGS Optimisation of Phase-Only Computer-Generated Hologram for Fraunhofer Diffraction

no code implementations10 May 2022 Jinze Sha, Andrew Kadis, Fan Yang, Timothy D. Wilkinson

We implement a novel limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimisation algorithm with cross entropy (CE) loss function, to produce phase-only computer-generated hologram (CGH) for holographic displays, with validation on a binary-phase modulation holographic projector.

Learning Individual Interactions from Population Dynamics with Discrete-Event Simulation Model

no code implementations4 May 2022 Yan Shen, Fan Yang, Mingchen Gao, Wen Dong

Traditional machine learning approaches capture complex system dynamics either with dynamic Bayesian networks and state space models, which is hard to scale because it is non-trivial to prescribe the dynamics with a sparse graph or a system of differential equations; or a deep neural networks, where the distributed representation of the learned dynamics is hard to interpret.

A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions

1 code implementation21 Apr 2022 Fan Yang

Spatio-temporal action detection is an important and challenging problem in video understanding.

Action Detection Video Understanding

SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration

1 code implementation28 Mar 2022 Zhi Chen, Kun Sun, Fan Yang, Wenbing Tao

In this paper, we present a second order spatial compatibility (SC^2) measure based method for efficient and robust point cloud registration (PCR), called SC^2-PCR.

Point Cloud Registration

Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

no code implementations17 Mar 2022 Yantao Gong, Cao Liu, Fan Yang, Xunliang Cai, Guanglu Wan, Jiansong Chen, Weipeng Zhang, Houfeng Wang

Experiments on the open datasets verify that our model outperforms the existing calibration methods and achieves a significant improvement on the calibration metric.

Intent Detection

UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

no code implementations CVPR 2022 Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

Furthermore, our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2. 5% with the same pre-training epochs in linear probing, and surpass current self-supervised object detection methods on COCO dataset, demonstrating its universality and potential.

Image Classification Object +4

Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification

no code implementations11 Mar 2022 Jianzhang Zheng, Fan Yang, Hao Shen, Xuan Tang, Mingsong Chen, Liang Song, Xian Wei

We propose an algorithmic framework that leverages the advantages of the DNNs for data self-expression and task-specific predictions, to improve image classification.

Classification Image Classification

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

1 code implementation ICLR 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo

This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.

Data Free Quantization

Learning Optical Flow with Adaptive Graph Reasoning

1 code implementation8 Feb 2022 Ao Luo, Fan Yang, Kunming Luo, Xin Li, Haoqiang Fan, Shuaicheng Liu

Our key idea is to decouple the context reasoning from the matching procedure, and exploit scene information to effectively assist motion estimation by learning to reason over the adaptive graph.

Motion Estimation Optical Flow Estimation +1

A comprehensive benchmark analysis for sand dust image reconstruction

no code implementations7 Feb 2022 Yazhong Si, Fan Yang, Ya Guo, Wei zhang, Yipu Yang

In this paper, we presented a comprehensive perceptual study and analysis of real-world sand dust images, then constructed a Sand-dust Image Reconstruction Benchmark (SIRB) for training Convolutional Neural Networks (CNNs) and evaluating algorithms performance.

Image Enhancement Image Reconstruction

Do Smart Glasses Dream of Sentimental Visions? Deep Emotionship Analysis for Eyewear Devices

1 code implementation24 Jan 2022 Yingying Zhao, Yuhu Chang, Yutian Lu, Yujiang Wang, Mingzhi Dong, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

Experimental studies with 20 participants demonstrate that, thanks to the emotionship awareness, EMOShip not only achieves superior emotion recognition accuracy over existing methods (80. 2% vs. 69. 4%), but also provides a valuable understanding of the cause of emotions.

Emotion Recognition

BBA-net: A bi-branch attention network for crowd counting

no code implementations22 Jan 2022 Yi Hou, Chengyang Li, Fan Yang, Cong Ma, Liping Zhu, Yuan Li, Huizhu Jia, Xiaodong Xie

Our method can integrate the pedestrian's head and body information to enhance the feature expression ability of the density map.

Crowd Counting

Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification

1 code implementation CVPR 2022 Zongbo Han, Fan Yang, Junzhou Huang, Changqing Zhang, Jianhua Yao

To the best of our knowledge, this is the first work to jointly model both feature and modality variation for different samples to provide trustworthy fusion in multi-modal classification.

Informativeness Medical Diagnosis +1

SC2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud Registration

no code implementations CVPR 2022 Zhi Chen, Kun Sun, Fan Yang, Wenbing Tao

In this paper, we present a second order spatial compatibility (SC^2) measure based method for efficient and robust point cloud registration (PCR), called SC^2-PCR.

Point Cloud Registration

DetarNet: Decoupling Translation and Rotation by Siamese Network for Point Cloud Registration

1 code implementation28 Dec 2021 Zhi Chen, Fan Yang, Wenbing Tao

In this paper, we propose a neural network named DetarNet to decouple the translation $t$ and rotation $R$, so as to overcome the performance degradation due to their mutual interference in point cloud registration.

Point Cloud Registration Translation

Neural Born Iteration Method For Solving Inverse Scattering Problems: 2D Cases

no code implementations18 Dec 2021 Tao Shan, Zhichao Lin, Xiaoqian Song, Maokun Li, Fan Yang, Zhensheng Xu

In this paper, we propose the neural Born iterative method (NeuralBIM) for solving 2D inverse scattering problems (ISPs) by drawing on the scheme of physics-informed supervised residual learning (PhiSRL) to emulate the computing process of the traditional Born iterative method (TBIM).

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

3 code implementations17 Dec 2021 Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification Instance Segmentation +6

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

1 code implementation24 Nov 2021 Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Text-to-Image Generation Text-to-Video Generation +2

Towards Privacy-Preserving Affect Recognition: A Two-Level Deep Learning Architecture

no code implementations14 Nov 2021 Jimiama M. Mase, Natalie Leesakul, Fan Yang, Grazziela P. Figueredo, Mercedes Torres Torres

Possible solutions to protect the privacy of users and avoid misuse of their identities are to: (1) extract anonymised facial features, namely action units (AU) from a database of images, discard the images and use AUs for processing and training, and (2) federated learning (FL) i. e. process raw images in users' local machines (local processing) and send the locally trained models to the main processing machine for aggregation (central processing).

Federated Learning Privacy Preserving +1

Defense Against Explanation Manipulation

no code implementations8 Nov 2021 Ruixiang Tang, Ninghao Liu, Fan Yang, Na Zou, Xia Hu

Explainable machine learning attracts increasing attention as it improves transparency of models, which is helpful for machine learning to be trusted in real applications.

Adversarial Attack BIG-bench Machine Learning

Causal-TGAN: Causally-Aware Synthetic Tabular Data Generative Adversarial Network

no code implementations29 Sep 2021 Bingyang Wen, Yupeng Cao, Fan Yang, Koduvayur Subbalakshmi, Rajarathnam Chandramouli

The flexibility of this architecture is its capability to support different types of expert knowledge (e. g., complete or partial) about the causal nature of the underlying phenomenon.

Generative Adversarial Network Image Generation

EXACT: Scalable Graph Neural Networks Training via Extreme Activation Compression

no code implementations ICLR 2022 Zirui Liu, Kaixiong Zhou, Fan Yang, Li Li, Rui Chen, Xia Hu

Based on the implementation, we propose a memory-efficient framework called ``EXACT'', which for the first time demonstrate the potential and evaluate the feasibility of training GNNs with compressed activations.

Graph Learning

Generalized Demographic Parity for Group Fairness

1 code implementation ICLR 2022 Zhimeng Jiang, Xiaotian Han, Chao Fan, Fan Yang, Ali Mostafavi, Xia Hu

We show the understanding of GDP from the probability perspective and theoretically reveal the connection between GDP regularizer and adversarial debiasing.

Attribute Fairness

LODE: Deep Local Deblurring and A New Benchmark

1 code implementation19 Sep 2021 Zerun Wang, Liuyu Xiang, Fan Yang, Jinzhao Qian, Jie Hu, Haidong Huang, Jungong Han, Yuchen Guo, Guiguang Ding

While recent deep deblurring algorithms have achieved remarkable progress, most existing methods focus on the global deblurring problem, where the image blur mostly arises from severe camera shake.

Deblurring

Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

2 code implementations9 Sep 2021 Xing Cheng, Hezheng Lin, Xiangyu Wu, Fan Yang, Dong Shen

In this paper, we propose a multi-stream Corpus Alignment network with single gate Mixture-of-Experts (CAMoE) and a novel Dual Softmax Loss (DSL) to solve the two heterogeneity.

Ranked #8 on Video Retrieval on MSVD (using extra training data)

Retrieval Text Retrieval +1

LinEasyBO: Scalable Bayesian Optimization Approach for Analog Circuit Synthesis via One-Dimensional Subspaces

no code implementations1 Sep 2021 Shuhan Zhang, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng

A large body of literature has proved that the Bayesian optimization framework is especially efficient and effective in analog circuit synthesis.

Bayesian Optimization

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions

no code implementations30 Aug 2021 Yang Wu, Dingheng Wang, Xiaotong Lu, Fan Yang, Guoqi Li, Weisheng Dong, Jianbo Shi

Visual recognition is currently one of the most important and active research areas in computer vision, pattern recognition, and even the general field of artificial intelligence.

Adaptive Label Smoothing To Regularize Large-Scale Graph Training

no code implementations30 Aug 2021 Kaixiong Zhou, Ninghao Liu, Fan Yang, Zirui Liu, Rui Chen, Li Li, Soo-Hyun Choi, Xia Hu

Graph neural networks (GNNs), which learn the node representations by recursively aggregating information from its neighbors, have become a predominant computational tool in many domains.

Node Clustering

Actuarial-consistency and two-step actuarial valuations: a new paradigm to insurance valuation

no code implementations30 Aug 2021 Karim Barigou, Daniël Linders, Fan Yang

This paper introduces new valuation schemes called actuarial-consistent valuations for insurance liabilities which depend on both financial and actuarial risks, which imposes that all actuarial risks are priced via standard actuarial principles.

RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting

no code implementations ICCV 2021 Jiachen Li, Fan Yang, Hengbo Ma, Srikanth Malla, Masayoshi Tomizuka, Chiho Choi

Motion forecasting plays a significant role in various domains (e. g., autonomous driving, human-robot interaction), which aims to predict future motion sequences given a set of historical observations.

Motion Forecasting Trajectory Prediction

Opinion Prediction with User Fingerprinting

1 code implementation RANLP 2021 Kishore Tumarada, Yifan Zhang, Fan Yang, Eduard Dragut, Omprakash Gnawali, Arjun Mukherjee

Experimental results show novel insights that were previously unknown such as better predictions for an increase in dynamic history length, the impact of the nature of the article on performance, thereby laying the foundation for further research.

Sentiment Analysis Time Series +1

An Efficient Asynchronous Batch Bayesian Optimization Approach for Analog Circuit Synthesis

no code implementations28 Jun 2021 Shuhan Zhang, Fan Yang, Dian Zhou, Xuan Zeng

A new strategy is proposed to better balance the exploration and exploitation and guarantee the diversity of the query points.

Bayesian Optimization

An Efficient Batch Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multi-objective Acquisition Ensemble

no code implementations28 Jun 2021 Shuhan Zhang, Fan Yang, Changhao Yan, Dian Zhou, Xuan Zeng

After achieving the first feasible point, we favor the feasible region by adopting a specially designed penalization term to the acquisition function ensemble.

Bayesian Optimization valid

A Scalable 256-Elements E-Band Phased-Array Transceiver for Broadband Communication

no code implementations20 Jun 2021 Xu Li, Wenyao Zhai, Morris Repeta, Hua Cai, Tyler Ross, Kimia Ansari, Sam Tiller, Hari Krishna Pothula, Dong Liang, Fan Yang, Yibo Lyu, Songlin Shuai, Guangjian Wang, Wen Tong

For E-band wireless communications, a high gain steerable antenna with sub-arrays is desired to reduce the implementation complexity.

Probabilistic Model Distillation for Semantic Correspondence

1 code implementation CVPR 2021 Xin Li, Deng-Ping Fan, Fan Yang, Ao Luo, Hong Cheng, Zicheng Liu

We address this problem with the use of a novel Probabilistic Model Distillation (PMD) approach which transfers knowledge learned by a probabilistic teacher model on synthetic data to a static student model with the use of unlabeled real image pairs.

Representation Learning Semantic correspondence

Model-Based Counterfactual Synthesizer for Interpretation

no code implementations16 Jun 2021 Fan Yang, Sahan Suresh Alva, Jiahao Chen, Xia Hu

To address these limitations, we propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models.

counterfactual Inductive Bias

From Paraphrasing to Semantic Parsing: Unsupervised Semantic Parsing via Synchronous Semantic Decoding

no code implementations ACL 2021 Shan Wu, Bo Chen, Chunlei Xin, Xianpei Han, Le Sun, Weipeng Zhang, Jiansong Chen, Fan Yang, Xunliang Cai

During synchronous decoding: the utterance paraphrasing is constrained by the structure of the logical form, therefore the canonical utterance can be paraphrased controlledly; the semantic decoding is guided by the semantics of the canonical utterance, therefore its logical form can be generated unsupervisedly.

Unsupervised semantic parsing

MST: Masked Self-Supervised Transformer for Visual Representation

no code implementations NeurIPS 2021 Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang

More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.

Language Modelling Masked Language Modeling +3

CAT: Cross Attention in Vision Transformer

1 code implementation10 Jun 2021 Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan

In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which alternates attention inner the image patch instead of the whole image to capture local information and apply attention between image patches which are divided from single-channel feature maps capture global information.

Calibrating multi-dimensional complex ODE from noisy data via deep neural networks

no code implementations7 Jun 2021 Kexuan Li, Fangfang Wang, Ruiqi Liu, Fan Yang, Zuofeng Shang

Our method is able to recover the ODE system without being subject to the curse of dimensionality and complicated ODE structure.

ModelPS: An Interactive and Collaborative Platform for Editing Pre-trained Models at Scale

1 code implementation18 May 2021 Yuanming Li, Huaizheng Zhang, Shanshan Jiang, Fan Yang, Yonggang Wen, Yong Luo

AI engineering has emerged as a crucial discipline to democratize deep neural network (DNN) models among software developers with a diverse background.

Model Editing

CT-Net: Complementary Transfering Network for Garment Transfer with Arbitrary Geometric Changes

no code implementations CVPR 2021 Fan Yang, Guosheng Lin

Garment transfer shows great potential in realistic applications with the goal of transfering outfits across different people images.

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

1 code implementation30 Apr 2021 Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan

Generating videos from text is a challenging task due to its high computational requirements for training and infinite possible answers for evaluation.

Ranked #16 on Text-to-Video Generation on MSR-VTT (CLIPSIM metric)

Text-to-Video Generation Video Generation

Mutual Graph Learning for Camouflaged Object Detection

1 code implementation CVPR 2021 Qiang Zhai, Xin Li, Fan Yang, Chenglizhao Chen, Hong Cheng, Deng-Ping Fan

Automatically detecting/segmenting object(s) that blend in with their surroundings is difficult for current models.

Graph Learning Object +2

Superresolving second-order correlation imaging using synthesized colored noise speckles

no code implementations11 Feb 2021 Zheng Li, Xiaoyu Nie, Fan Yang, Xiangpei Liu, Dongyu Liu, Xiaolong Dong, Xingchen Zhao, Tao Peng, M. Suhail Zubairy, Marlan O. Scully

We present a novel method to synthesize non-trivial speckles that can enable superresolving second-order correlation imaging.

Optics Image and Video Processing

Cascade Network with Guided Loss and Hybrid Attention for Finding Good Correspondences

1 code implementation31 Jan 2021 Zhi Chen, Fan Yang, Wenbing Tao

We then propose a hybrid attention block to extract feature, which integrates the Bayesian attentive context normalization (BACN) and channel-wise attention (CA).

Generative Counterfactuals for Neural Networks via Attribute-Informed Perturbation

no code implementations18 Jan 2021 Fan Yang, Ninghao Liu, Mengnan Du, Xia Hu

With the wide use of deep neural networks (DNN), model interpretability has become a critical concern, since explainable decisions are preferred in high-stake scenarios.

Attribute

Possible evidence of hydrogen emission in the first-overtone and multi-mode RR Lyrae variables

no code implementations24 Dec 2020 Xiao-Wei Duan, Xiao-Dian Chen, Li-Cai Deng, Fan Yang, Chao Liu, Anupam Bhardwaj, Hua-Wei Zhang

The nature of shock waves in non-fundamental mode RR Lyrae stars remains a mystery because of limited spectroscopic observations.

Solar and Stellar Astrophysics

A Polynomial Roth Theorem for Corners in Finite Fields

no code implementations21 Dec 2020 Rui Han, Michael T Lacey, Fan Yang

We prove a Roth type theorem for polynomial corners in the finite field setting.

Classical Analysis and ODEs Combinatorics Number Theory

Pattern-aware Data Augmentation for Query Rewriting in Voice Assistant Systems

no code implementations21 Dec 2020 Yunmo Chen, Sixing Lu, Fan Yang, Xiaojiang Huang, Xing Fan, Chenlei Guo

Query rewriting (QR) systems are widely used to reduce the friction caused by errors in a spoken language understanding pipeline.

Data Augmentation Friction +1

Cannot find the paper you are looking for? You can Submit a new open access paper.