UFO: A UI-Focused Agent for Windows OS Interaction

microsoft/UFO 8 Feb 2024

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

4,089
0.40 stars / hour

RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems

microsoft/recai 11 Mar 2024

This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs).

Recommendation Systems

319
0.39 stars / hour

FinLangNet: A Novel Deep Learning Framework for Credit Risk Prediction Using Linguistic Analogy in Financial Data

leiyu0210/finlangnet 19 Apr 2024

Our research demonstrates that FinLangNet surpasses traditional statistical methods in predicting credit risk and that its integration with these methods enhances credit card fraud prediction models, achieving a significant improvement of over 1.5 points in the Kolmogorov-Smirnov metric.

41
0.39 stars / hour

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

openbmb/omnilmm 18 Mar 2024

To address the challenges, we present LLaVA-UHD, a large multimodal model that can efficiently perceive images in any aspect ratio and high resolution.

1,312
0.38 stars / hour

MAexp: A Generic Platform for RL-based Multi-Agent Exploration

duangzhu/maexp 19 Apr 2024

The sim-to-real gap poses a significant challenge in RL-based multi-agent exploration due to scene quantization and action discretization.

Multi-agent Reinforcement Learning, Quantization

35
0.37 stars / hour

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Leeroo-AI/mergoo 12 Mar 2024

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge.

Arithmetic Reasoning, Code Generation, +6 more

233
0.36 stars / hour

Code Llama: Open Foundation Models for Code

facebookresearch/codellama 24 Aug 2023

We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.

16k, Code Generation, +1 more

14,964
0.35 stars / hour

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

bytedance/MoMA 8 Apr 2024

This approach effectively synergizes reference image and text prompt information to produce valuable image features, facilitating an image diffusion model.

Image-to-Image Translation, Language Modelling, +1 more

29
0.34 stars / hour

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

FoundationVision/Groma 19 Apr 2024

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.

Language Modelling, Large Language Model, +2 more

131
0.34 stars / hour

LOHO: Latent Optimization of Hairstyles via Orthogonalization

dukebw/LOHO CVPR 2021

Therefore, we propose Latent Optimization of Hairstyles via Orthogonalization (LOHO), an optimization-based approach using GAN inversion to infill missing hair structure details in latent space during hairstyle transfer.

SSIM

196
0.33 stars / hour