Search Results for author: Soroush Vosoughi

Found 66 papers, 19 papers with code

TWEETSPIN: Fine-grained Propaganda Detection in Social Media Using Multi-View Representations

no code implementations • NAACL 2022 • Prashanth Vijayaraghavan, Soroush Vosoughi

Our model relies on multi-view representations of the input tweet data to (a) extract different aspects of the input text including the context, entities, their relationships, and external knowledge; (b) model their mutual interplay; and (c) effectively speed up the learning process by requiring fewer training examples.

Implicit Relations Logical Fallacies +1

Aligning Generative Language Models with Human Values

no code implementations • Findings (NAACL) 2022 • Ruibo Liu, Ge Zhang, Xinyu Feng, Soroush Vosoughi

Although current large-scale generative language models (LMs) can show impressive insights about factual knowledge, they do not exhibit similar success with respect to human values judgements (e.g., whether or not the generations of an LM are moral).

Text Generation Transfer Learning

DartmouthCS at SemEval-2022 Task 8: Predicting Multilingual News Article Similarity with Meta-Information and Translation

no code implementations • SemEval (NAACL) 2022 • Joseph Hajjar, Weicheng Ma, Soroush Vosoughi

This paper presents our approach for tackling SemEval-2022 Task 8: Multilingual News Article Similarity.

Dartmouth at SemEval-2022 Task 6: Detection of Sarcasm

no code implementations • SemEval (NAACL) 2022 • Rishik Lad, Weicheng Ma, Soroush Vosoughi

This paper introduces the result of Team Dartmouth’s experiments on each of the five subtasks for the detection of sarcasm in English and Arabic tweets.

Data Augmentation Sarcasm Detection

Multi-resolution Annotations for Emoji Prediction

no code implementations • EMNLP 2020 • Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

The lack of multi-label and aspect-level emoji prediction datasets is one of the bottlenecks for this task.

Multi-class Classification Natural Language Understanding

Disordered-DABS: A Benchmark for Dynamic Aspect-Based Summarization in Disordered Texts

no code implementations • 16 Feb 2024 • Xiaobo Guo, Soroush Vosoughi

Aspect-based summarization has seen significant advancements, especially in structured text.

Proto-lm: A Prototypical Network-Based Framework for Built-in Interpretability in Large Language Models

1 code implementation • 3 Nov 2023 • Sean Xie, Soroush Vosoughi, Saeed Hassanpour

Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), but their lack of interpretability has been a major concern.

Improving Representation Learning for Histopathologic Images with Cluster Constraints

1 code implementation • ICCV 2023 • Weiyi Wu, Chongyang Gao, Joseph DiPalma, Soroush Vosoughi, Saeed Hassanpour

This framework aims for transferable representation learning and semantically meaningful clustering by synergizing invariance loss and clustering loss in WSI analysis.

Clustering Representation Learning +1

Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction

1 code implementation • 5 Oct 2023 • Yiren Jian, Tingkai Liu, Yunzhe Tao, Chunhui Zhang, Soroush Vosoughi, Hongxia Yang

Our experimental findings demonstrate that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.

Representation Learning Text Generation

Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

1 code implementation • NeurIPS 2023 • Yiren Jian, Chongyang Gao, Soroush Vosoughi

We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training.

Joint Latent Topic Discovery and Expectation Modeling for Financial Markets

no code implementations • 1 Jun 2023 • Lili Wang, Chenghan Huang, Chongyang Gao, Weicheng Ma, Soroush Vosoughi

In the pursuit of accurate and scalable quantitative methods for financial market analysis, the focus has shifted from individual stock models to those capturing interrelations between companies and their stocks.

Graph-Level Embedding for Time-Evolving Graphs

no code implementations • 1 Jun 2023 • Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi

We evaluate our proposed model on five publicly available datasets for the task of temporal graph similarity ranking, and our model outperforms baseline methods.

Anomaly Detection Graph Representation Learning +4

Training Socially Aligned Language Models on Simulated Social Interactions

1 code implementation • 26 May 2023 • Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

Social alignment in AI systems aims to ensure that these models behave according to established societal values.

Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task

1 code implementation • 13 Feb 2023 • Yiren Jian, Chongyang Gao, Chen Zeng, Yunjie Zhao, Soroush Vosoughi

Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.

Transfer Learning

Capturing Topic Framing via Masked Language Modeling

no code implementations • 7 Feb 2023 • Xiaobo Guo, Weicheng Ma, Soroush Vosoughi

Differential framing of issues can lead to divergent world views on important issues.

Language Modelling Masked Language Modeling

Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits

no code implementations • 1 Jan 2023 • Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X Liu, Soroush Vosoughi

We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values.

reinforcement-learning Reinforcement Learning (RL) +1

Mind's Eye: Grounded Language Model Reasoning through Simulation

no code implementations • 11 Oct 2022 • Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai

By training solely on written text, current language models (LMs) miss the grounded experience of humans in the real-world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning.

Language Modelling

Language Models are Multilingual Chain-of-Thought Reasoners

2 code implementations • 6 Oct 2022 • Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei

Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.

GSM8K Math

Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings

1 code implementation • 20 Sep 2022 • Yiren Jian, Chongyang Gao, Soroush Vosoughi

This indicates that Transformer models are able to generalize better by doing a similar task (i.e., clustering) with unpaired examples from different modalities in a multi-task fashion.

Clustering Contrastive Learning +3

Robin: A Novel Online Suicidal Text Corpus of Substantial Breadth and Scale

no code implementations • 13 Sep 2022 • Daniel DiPietro, Vivek Hazari, Soroush Vosoughi

Suicide is a major public health crisis.

Interpretation Quality Score for Measuring the Quality of interpretability methods

no code implementations • 24 May 2022 • Yuansheng Xie, Soroush Vosoughi, Saeed Hassanpour

Machine learning (ML) models have been applied to a wide range of natural language processing (NLP) tasks in recent years.

Contrastive Learning for Prompt-Based Few-Shot Language Learners

1 code implementation • NAACL 2022 • Yiren Jian, Chongyang Gao, Soroush Vosoughi

Following this line of work, we present a contrastive learning framework that clusters inputs from the same class for better generality of models trained with only limited examples.

Contrastive Learning In-Context Learning +2

Embedding Hallucination for Few-Shot Language Fine-tuning

1 code implementation • NAACL 2022 • Yiren Jian, Chongyang Gao, Soroush Vosoughi

Few-shot language learners adapt knowledge from a pre-trained model to recognize novel classes from a few-labeled sentences.

Data Augmentation Hallucination +1

Non-Parallel Text Style Transfer with Self-Parallel Supervision

1 code implementation • ICLR 2022 • Ruibo Liu, Chongyang Gao, Chenyan Jia, Guangxuan Xu, Soroush Vosoughi

The performance of existing text style transfer models is severely limited by the non-parallel datasets on which the models are trained.

Imitation Learning Style Transfer +1

Knowledge Infused Decoding

1 code implementation • ICLR 2022 • Ruibo Liu, Guoqing Zheng, Shashank Gupta, Radhika Gaonkar, Chongyang Gao, Soroush Vosoughi, Milad Shokouhi, Ahmed Hassan Awadallah

Hence, they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks.

Ranked #2 on Question Answering on KILT: ELI5

counterfactual Question Answering +1

Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning

no code implementations • 30 Mar 2022 • Sean Xie, Soroush Vosoughi, Saeed Hassanpour

Artificial intelligence, particularly through recent advancements in deep learning, has achieved exceptional performances in many tasks in fields such as natural language processing and computer vision.

Decision Making reinforcement-learning +1

EnCBP: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in English

no code implementations • Findings (ACL) 2022 • Weicheng Ma, Samiha Datta, Lili Wang, Soroush Vosoughi

While cultural backgrounds have been shown to affect linguistic expressions, existing natural language processing (NLP) research on culture modeling is overly coarse-grained and does not examine cultural differences among speakers of the same language.

Cultural Vocal Bursts Intensity Prediction Language Modelling +5

Emotion-based Modeling of Mental Disorders on Social Media

no code implementations • 24 Jan 2022 • Xiaobo Guo, Yaojia Sun, Soroush Vosoughi

Our proposed model is different from other work in this area in that our model is based entirely on the emotional states, and the transition between these states of users on Reddit, whereas prior work is typically based on content-based representations (e.g., n-grams, language model embeddings, etc.).

Language Modelling

Graph Embedding via Diffusion-Wavelets-Based Node Feature Distribution Characterization

no code implementations • 14 Sep 2021 • Lili Wang, Chenghan Huang, Weicheng Ma, Xinyuan Cao, Soroush Vosoughi

Recent years have seen a rise in the development of representational learning methods for graph data.

Graph Embedding Representation Learning

Embedding Node Structural Role Identity Using Stress Majorization

no code implementations • 14 Sep 2021 • Lili Wang, Chenghan Huang, Weicheng Ma, Ying Lu, Soroush Vosoughi

In this paper, we present a novel and flexible framework using stress majorization, to transform the high-dimensional role identities in networks directly (without approximation or indirect modeling) to a low-dimensional embedding space.

Node Classification

GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks

no code implementations • EMNLP 2021 • Weicheng Ma, Renze Lou, Kai Zhang, Lili Wang, Soroush Vosoughi

Compared to AUTOSEM, a strong baseline method, GradTS improves the performance of MT-DNN with a bert-base-cased backend model, from 0.33% to 17.93% on 8 natural language understanding (NLU) tasks in the GLUE benchmarks.

Multi-Task Learning Natural Language Understanding

Language Model Augmented Relevance Score

no code implementations • ACL 2021 • Ruibo Liu, Jason Wei, Soroush Vosoughi

Although automated metrics are commonly used to evaluate NLG systems, they often correlate poorly with human judgements.

Language Modelling nlg evaluation

Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks

no code implementations • ACL 2021 • Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi

Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments.

XLM-R

Modulating Language Models with Emotions

no code implementations • Findings (ACL) 2021 • Ruibo Liu, Jason Wei, Chenyan Jia, Soroush Vosoughi

Generating context-aware language that embodies diverse emotions is an important step towards building empathetic NLP systems.

Response Generation

Embedding Heterogeneous Networks into Hyperbolic Space Without Meta-path

no code implementations • 18 Jun 2021 • Lili Wang, Chongyang Gao, Chenghan Huang, Ruibo Liu, Weicheng Ma, Soroush Vosoughi

A common type of network is the heterogeneous network, where the nodes (and edges) can be of different types.

Anatomy Link Prediction +1

Linguistic Complexity Loss in Text-Based Therapy

no code implementations • NAACL 2021 • Jason Wei, Kelly Finn, Emma Templeton, Thalia Wheatley, Soroush Vosoughi

The recent advent of online text-based therapy presents a new opportunity to analyze the complexity loss paradox in a novel operationalization: linguistic complexity loss in text-based therapy conversations.

A Survey of Data Augmentation Approaches for NLP

1 code implementation • Findings (ACL) 2021 • Steven Y. Feng, Varun Gangal, Jason Wei, Sarath Chandar, Soroush Vosoughi, Teruko Mitamura, Eduard Hovy

In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner.

Data Augmentation

Mitigating Political Bias in Language Models Through Reinforced Calibration

no code implementations • 30 Apr 2021 • Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush Vosoughi

Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings.

reinforcement-learning Reinforcement Learning (RL) +1

BigGreen at SemEval-2021 Task 1: Lexical Complexity Prediction with Assembly Models

1 code implementation • SEMEVAL 2021 • Aadil Islam, Weicheng Ma, Soroush Vosoughi

This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context.

Feature Engineering Lexical Complexity Prediction

Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic

1 code implementation • SEMEVAL 2021 • Yakoob Khan, Weicheng Ma, Soroush Vosoughi

This paper describes our approach to the Toxic Spans Detection problem (SemEval-2021 Task 5).

Data Augmentation Toxic Spans Detection

Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

1 code implementation • NAACL 2021 • Jason Wei, Chengyu Huang, Soroush Vosoughi, Yu Cheng, Shiqi Xu

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category.

Data Augmentation Few-Shot Text Classification +2

Feature Selection for Multivariate Time Series via Network Pruning

1 code implementation • 11 Feb 2021 • Kang Gu, Soroush Vosoughi, Temiloluwa Prioleau

In recent years, there has been an ever increasing amount of multivariate time series (MTS) data in various domains, typically generated by a large family of sensors such as wearable devices.

feature selection Network Pruning +2

Text Augmentation in a Multi-Task View

no code implementations • EACL 2021 • Jason Wei, Chengyu Huang, Shiqi Xu, Soroush Vosoughi

Traditional data augmentation aims to increase the coverage of the input distribution by generating augmented examples that strongly resemble original samples in an online fashion where augmented examples dominate training.

Text Augmentation text-classification +1

Political Depolarization of News Articles Using Attribute-aware Word Embeddings

no code implementations • 5 Jan 2021 • Ruibo Liu, Lili Wang, Chenyan Jia, Soroush Vosoughi

To detect polar words, we train a multi-attribute-aware word embedding model that is aware of ideology and topics on 360k full-length media articles.

Attribute Text Generation +1

Social media data reveals signal for public consumer perceptions

no code implementations • 26 Dec 2020 • Neeti Pokhriyal, Abenezer Dara, Benjamin Valentino, Soroush Vosoughi

By using decadal data (2008-2019) from Reddit, we show that both monthly and daily estimates of CCI can, indeed, be reliably estimated at least several months in advance, and that our model estimates are far superior to those generated by the existing methods.

Multi-modal Identification of State-Sponsored Propaganda on Social Media

no code implementations • 24 Dec 2020 • Xiaobo Guo, Soroush Vosoughi

The prevalence of state-sponsored propaganda on the Internet has become a cause for concern in the recent years.

Big Green at WNUT 2020 Shared Task-1: Relation Extraction as Contextualized Sequence Classification

no code implementations • EMNLP (WNUT) 2020 • Chris Miller, Soroush Vosoughi

Relation and event extraction is an important task in natural language processing.

Event Extraction General Classification +3

Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT

no code implementations • EMNLP (WNUT) 2020 • Dylan Whang, Soroush Vosoughi

We compared its performance to a suite of machine learning models.

General Classification Task 2

Improvements and Extensions on Metaphor Detection

no code implementations • ACL (unimplicit) 2021 • Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

Finally, we clean up the improper or outdated annotations in one of the MD benchmark datasets and re-benchmark it with our Transformer-based model.

Natural Language Understanding

An Empirical Survey of Unsupervised Text Representation Methods on Twitter Data

no code implementations • EMNLP (WNUT) 2020 • Lili Wang, Chongyang Gao, Jason Wei, Weicheng Ma, Ruibo Liu, Soroush Vosoughi

The field of NLP has seen unprecedented achievements in recent years.

Clustering Text Clustering

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

no code implementations • EMNLP 2020 • Ruibo Liu, Guangxuan Xu, Chenyan Jia, Weicheng Ma, Lili Wang, Soroush Vosoughi

For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training.

reinforcement-learning Reinforcement Learning (RL) +3

Enhanced Offensive Language Detection Through Data Augmentation

no code implementations • 5 Dec 2020 • Ruibo Liu, Guangxuan Xu, Soroush Vosoughi

In this work, we present Dager (Data Augmenter), a generation-based data augmentation method, that improves the performance of classification on imbalanced and low-resource data such as the offensive language dataset.

Data Augmentation Task 2

Embedding Node Structural Role Identity into Hyperbolic Space

no code implementations • 3 Nov 2020 • Lili Wang, Ying Lu, Chenghan Huang, Soroush Vosoughi

However, the work on network embedding in hyperbolic space has been focused on microscopic node embedding.

Network Embedding

Towards Improved Model Design for Authorship Identification: A Survey on Writing Style Understanding

no code implementations • 30 Sep 2020 • Weicheng Ma, Ruibo Liu, Li-Li Wang, Soroush Vosoughi

While other tasks based on linguistic style understanding benefit from deep learning methods, these methods have not behaved as well as traditional machine learning methods in many authorship-based tasks.

BIG-bench Machine Learning Natural Language Understanding

Emoji Prediction: Extensions and Benchmarking

1 code implementation • 14 Jul 2020 • Weicheng Ma, Ruibo Liu, Lili Wang, Soroush Vosoughi

In this paper, we extend the existing setting of the emoji prediction task to include a richer set of emojis and to allow multi-label classification on the task.

Benchmarking Multi-Label Classification

Query-Free Adversarial Transfer via Undertrained Surrogates

no code implementations • 1 Jul 2020 • Chris Miller, Soroush Vosoughi

Deep neural networks are vulnerable to adversarial examples -- minor perturbations added to a model's input which cause the model to output an incorrect prediction.

Adversarial Attack

Salienteye: Maximizing Engagement While Maintaining Artistic Style on Instagram Using Deep Neural Networks

no code implementations • 13 Jun 2020 • Lili Wang, Ruibo Liu, Soroush Vosoughi

Once trained on their accounts, users can have new photos sorted based on predicted engagement and style similarity to their previous work, thus enabling them to upload photos that not only have the potential to maximize engagement from their followers but also maintain their style of photography.

Object Recognition Transfer Learning

What Are People Asking About COVID-19? A Question Classification Dataset

2 code implementations • ACL 2020 • Jerry Wei, Chengyu Huang, Soroush Vosoughi, Jason Wei

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters.

Clustering General Classification

Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning

no code implementations • ACL 2017 • Prashanth Vijayaraghavan, Soroush Vosoughi, Deb Roy

In this paper, we present a demographic classifier for gender, age, political orientation and location on Twitter.

Classification General Classification +1

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

no code implementations • 26 Jul 2016 • Soroush Vosoughi, Prashanth Vijayaraghavan, Deb Roy

The vector representations generated by our model are generic, and hence can be applied to a variety of tasks.

DeepStance at SemEval-2016 Task 6: Detecting Stance in Tweets Using Character and Word-Level CNNs

no code implementations • SEMEVAL 2016 • Prashanth Vijayaraghavan, Ivan Sysoev, Soroush Vosoughi, Deb Roy

This paper describes our approach for the Detecting Stance in Tweets task (SemEval-2016 Task 6).

Data Augmentation Text Categorization

Automatic Detection and Categorization of Election-Related Tweets

no code implementations • 17 May 2016 • Prashanth Vijayaraghavan, Soroush Vosoughi, Deb Roy

With the rise in popularity of public social media and micro-blogging services, most notably Twitter, the people have found a venue to hear and be heard by their peers without an intermediary.

A Semi-automatic Method for Efficient Detection of Stories on Social Media

no code implementations • 17 May 2016 • Soroush Vosoughi, Deb Roy

In this paper, we present a novel semi-automatic tool that enables users to efficiently identify and track stories about real-world events on Twitter.

Digital Stylometry: Linking Profiles Across Social Networks

no code implementations • 17 May 2016 • Soroush Vosoughi, Helen Zhou, Deb Roy

There is an ever growing number of users with accounts on multiple social media and networking sites.

Enhanced Twitter Sentiment Classification Using Contextual Information

no code implementations • WS 2015 • Soroush Vosoughi, Helen Zhou, Deb Roy

This combined classifier outperforms the purely linguistic classifier, showing that integrating the rich contextual information available on Twitter into sentiment classification is a promising direction of research.

Classification General Classification +2

Tweet Acts: A Speech Act Classifier for Twitter

no code implementations • 17 May 2016 • Soroush Vosoughi, Deb Roy

We created a taxonomy of six speech acts for Twitter and proposed a set of semantic and syntactic features.

General Classification Multi-class Classification +1
