[Paper Review] Stable Diffusion & SDXL
September 15, 2024
SDXL extends Stable Diffusion with a larger U-Net backbone, multi-scale generation, and flexible text conditioning, enabling high-resolution, semantically ri...
September 09, 2024
UMT is a unified framework for video highlight detection and moment retrieval that flexibly integrates visual, audio, and optional text modalities to identif...
August 16, 2024
SAM2 generalizes promptable visual segmentation to video by integrating spatio-temporal memory, interactive prompting, and a data engine for fine-grained, ef...
July 23, 2024
The SlowFast network employs dual pathways, with the Slow Pathway capturing high-resolution spatial details and the Fast Pathway capturing rapid temporal cha...
July 23, 2024
DETR revolutionizes object detection by integrating the Transformer architecture’s global attention mechanism with CNN-extracted image features, utilizing a ...
July 13, 2024
Variational Autoencoders (VAEs) employ a probabilistic approach to latent variable modeling, optimizing a variational lower bound to perform efficient approx...
July 10, 2024
The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies ...
July 10, 2024
DDIM (Denoising Diffusion Implicit Models) is a generative model that makes image generation more efficient by replacing the Markovian DDPM sampling chain with a non-Markovian one that needs far fewer denoising steps.
July 10, 2024
BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that improves natural language understanding by pre-training on vast ...