Blog

2024

[Paper Review] Stable Diffusion & SDXL

September 15, 2024

SDXL extends Stable Diffusion with a larger U-Net backbone, multi-scale generation, and flexible text conditioning, enabling high-resolution, semantically ri...

Generative Models, Diffusion

[Paper Review] UMT (Unified Multimodal Transformers)

September 09, 2024

UMT is a unified framework for video highlight detection and moment retrieval that flexibly integrates visual, audio, and optional text modalities to identif...

Transformers, Video Highlight Detection

[Paper Review] Segment Anything Model 2 (SAM2)

August 16, 2024

SAM2 generalizes promptable visual segmentation to video by integrating spatio-temporal memory, interactive prompting, and a data engine for fine-grained, ef...

Computer Vision, Video Segmentation, Image Segmentation

[Paper Review] SlowFast Networks for Video Recognition

July 23, 2024

The SlowFast network employs dual pathways, with the Slow Pathway capturing high-resolution spatial details and the Fast Pathway capturing rapid temporal cha...

Video Understanding

[Paper Review] End-to-End Object Detection with Transformers (DETR)

July 23, 2024

DETR revolutionizes object detection by integrating the Transformer architecture’s global attention mechanism with CNN-extracted image features, utilizing a ...

Video Understanding

[Paper Review] VAE (Variational AutoEncoder)

July 13, 2024

Variational Autoencoders (VAEs) employ a probabilistic approach to latent variable modeling, optimizing a variational lower bound to perform efficient approx...

Generative Models, Autoencoder

[Paper Review] Transformers

July 10, 2024

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies ...

NLP, Encoder, Decoder, Attention, Transformers

[Paper Review] DDIM (Denoising Diffusion Implicit Models)

July 10, 2024

DDIM (Denoising Diffusion Implicit Models) accelerates image generation from diffusion models by replacing the Markovian reverse process with a non-Markovian, optionally deterministic sampler that needs far fewer denoising steps.

Generative Models, Diffusion

[Paper Review] BERT

July 10, 2024

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that improves natural language understanding by pre-training on vast ...

NLP, Encoder, Transformers

Junhyeong Park