Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings.
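For context, this image-text alignment objective is typically a symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. The PyTorch sketch below is a minimal illustration of that idea; the function name and the default temperature are assumptions for the example, not values taken from any particular paper.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched image/text embeddings.

    image_emb, text_emb: (batch, dim) tensors; row i of each is a matched pair.
    """
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix scaled by temperature: (batch, batch).
    logits = image_emb @ text_emb.t() / temperature

    # The matched pair for each row sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```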
Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create novel, deep layer-wise combinations of adapters to solve tasks.
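As a rough illustration of such hidden-state-conditioned gating, the sketch below mixes the low-rank updates of several frozen adapters using weights produced by a small gating head. The class name, shapes, and softmax gating are simplified assumptions for the example, not the official X-LoRA implementation.

```python
import torch
import torch.nn as nn

class GatedLoRAMixture(nn.Module):
    """Illustrative sketch: mix frozen LoRA adapters with hidden-state-dependent gates.

    Each adapter k contributes a low-rank update (x @ A_k^T @ B_k^T); a trainable
    gating head maps the layer's hidden state to one mixing weight per adapter.
    """

    def __init__(self, hidden_dim, rank, num_adapters):
        super().__init__()
        # Frozen, pre-trained LoRA factors for each adapter (placeholders here).
        self.A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, hidden_dim) * 0.01, requires_grad=False)
             for _ in range(num_adapters)])
        self.B = nn.ParameterList(
            [nn.Parameter(torch.zeros(hidden_dim, rank), requires_grad=False)
             for _ in range(num_adapters)])
        # Trainable gating head: hidden state -> one weight per adapter.
        self.gate = nn.Linear(hidden_dim, num_adapters)

    def forward(self, hidden, base_out):
        # hidden, base_out: (batch, seq, hidden_dim)
        weights = torch.softmax(self.gate(hidden), dim=-1)   # (batch, seq, K)
        out = base_out
        for k in range(len(self.A)):
            delta = hidden @ self.A[k].t() @ self.B[k].t()   # low-rank update
            out = out + weights[..., k:k + 1] * delta        # gated contribution
        return out
```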
The voice styles are neither directly copied from nor constrained by the style of the reference speaker.
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence.
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge.
We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.
Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.
ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.
Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation.