Posts by Tags

Attention

[Paper Review] Transformers

5 minute read

Published:

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies and contextual relationships in text.

Autoencoder

[Paper Review] VAE (Variational AutoEncoder)

4 minute read

Published:

Variational Autoencoders (VAEs) employ a probabilistic approach to latent variable modeling, optimizing a variational lower bound to perform efficient approximate posterior inference and learning of generative models with continuous latent variables.

Computer Vision

[Paper Review] Segment Anything Model 2 (SAM2)

5 minute read

Published:

SAM2 generalizes promptable visual segmentation to video by integrating spatio-temporal memory, interactive prompting, and a data engine for fine-grained, efficient, and class-agnostic object segmentation across frames.

Decoder

[Paper Review] Transformers

5 minute read

Published:

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies and contextual relationships in text.

Diffusion

[Paper Review] Stable Diffusion & SDXL

3 minute read

Published:

SDXL extends Stable Diffusion with a larger U-Net backbone, multi-scale generation, and flexible text conditioning, enabling high-resolution, semantically rich image synthesis across diverse prompts and resolutions.

[Paper Review] DDIM (Denoising Diffusion Implicit Models)

5 minute read

Published:

DDIM (Denoising Diffusion Implicit Models) is a generative model that accelerates image generation by replacing the stochastic diffusion denoising process with a deterministic, non-Markovian sampling procedure.

Encoder

[Paper Review] Transformers

5 minute read

Published:

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies and contextual relationships in text.

[Paper Review] BERT

7 minute read

Published:

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that improves natural language understanding by pre-training on vast amounts of text to capture context from both directions.

Generative Models

[Paper Review] Stable Diffusion & SDXL

3 minute read

Published:

SDXL extends Stable Diffusion with a larger U-Net backbone, multi-scale generation, and flexible text conditioning, enabling high-resolution, semantically rich image synthesis across diverse prompts and resolutions.

[Paper Review] VAE (Variational AutoEncoder)

4 minute read

Published:

Variational Autoencoders (VAEs) employ a probabilistic approach to latent variable modeling, optimizing a variational lower bound to perform efficient approximate posterior inference and learning of generative models with continuous latent variables.

[Paper Review] DDIM (Denoising Diffusion Implicit Models)

5 minute read

Published:

DDIM (Denoising Diffusion Implicit Models) is a generative model that accelerates image generation by replacing the stochastic diffusion denoising process with a deterministic, non-Markovian sampling procedure.

Image Segmentation

[Paper Review] Segment Anything Model 2 (SAM2)

5 minute read

Published:

SAM2 generalizes promptable visual segmentation to video by integrating spatio-temporal memory, interactive prompting, and a data engine for fine-grained, efficient, and class-agnostic object segmentation across frames.

NLP

[Paper Review] Transformers

5 minute read

Published:

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies and contextual relationships in text.

[Paper Review] BERT

7 minute read

Published:

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that improves natural language understanding by pre-training on vast amounts of text to capture context from both directions.

Transformers

[Paper Review] UMT (Unified Multimodal Transformers)

3 minute read

Published:

UMT is a unified framework for video highlight detection and moment retrieval that flexibly integrates visual, audio, and optional text modalities to identify key moments in both query-based and query-free scenarios.

[Paper Review] Transformers

5 minute read

Published:

The Transformer is a deep learning architecture that enhances natural language processing by using self-attention mechanisms to capture long-range dependencies and contextual relationships in text.

[Paper Review] BERT

7 minute read

Published:

BERT (Bidirectional Encoder Representations from Transformers) is a deep learning model that improves natural language understanding by pre-training on vast amounts of text to capture context from both directions.

Video Highlight Detection

[Paper Review] UMT (Unified Multimodal Transformers)

3 minute read

Published:

UMT is a unified framework for video highlight detection and moment retrieval that flexibly integrates visual, audio, and optional text modalities to identify key moments in both query-based and query-free scenarios.

Video Segmentation

[Paper Review] Segment Anything Model 2 (SAM2)

5 minute read

Published:

SAM2 generalizes promptable visual segmentation to video by integrating spatio-temporal memory, interactive prompting, and a data engine for fine-grained, efficient, and class-agnostic object segmentation across frames.

Video Understanding

[Paper Review] SlowFast Networks for Video Recognition

4 minute read

Published:

The SlowFast network employs dual pathways for video recognition: a Slow pathway operating at a low frame rate to capture spatial semantics, and a Fast pathway operating at a high frame rate to capture fine-grained temporal motion.

[Paper Review] End-to-End Object Detection with Transformers (DETR)

5 minute read

Published:

DETR reframes object detection as a direct set-prediction problem, combining the Transformer's global attention mechanism with CNN-extracted image features and using bipartite matching to assign predictions to ground-truth objects across varied scales.