CVPR2024 | AIGC (Image Generation, Video Generation, 3D Generation, etc.) Paper Roundup (with paper links / open-source code / analysis) [Continuously Updated]


CVPR2024 | AIGC Paper Roundup (if you find this helpful, please like and bookmark)

  • Awesome-CVPR2024-AIGC
  • 1. Image Generation / Image Synthesis
      • Accelerating Diffusion Sampling with Optimized Time Steps
      • Adversarial Text to Continuous Image Generation
      • Amodal Completion via Progressive Mixed Context Diffusion
      • Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
      • Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
      • Attention Calibration for Disentangled Text-to-Image Personalization
      • Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
      • CapHuman: Capture Your Moments in Parallel Universes
      • CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
      • Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
      • Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
      • CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
      • Condition-Aware Neural Network for Controlled Image Generation
      • CosmicMan: A Text-to-Image Foundation Model for Humans
      • Countering Personalized Text-to-Image Generation with Influence Watermarks
      • Cross Initialization for Face Personalization of Text-to-Image Models
      • Customization Assistant for Text-to-image Generation
      • DeepCache: Accelerating Diffusion Models for Free
      • DemoFusion: Democratising High-Resolution Image Generation With No $$$
      • Desigen: A Pipeline for Controllable Design Template Generation
      • DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
      • Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
      • DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
      • Diversity-aware Channel Pruning for StyleGAN Compression
      • Discriminative Probing and Tuning for Text-to-Image Generation
      • Don’t drop your samples! Coherence-aware training benefits Conditional diffusion
      • Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
      • DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
      • Dynamic Prompt Optimizing for Text-to-Image Generation
      • ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
      • Efficient Dataset Distillation via Minimax Diffusion
      • ElasticDiffusion: Training-free Arbitrary Size Image Generation
      • EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
      • Enabling Multi-Concept Fusion in Text-to-Image Models
      • Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
      • FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
      • Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
      • FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
      • FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
      • Generalizable Tumor Synthesis
      • Generating Daylight-driven Architectural Design via Diffusion Models
      • Generative Unlearning for Any Identity
      • HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
      • High-fidelity Person-centric Subject-to-Image Synthesis
      • InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
      • InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
      • InstanceDiffusion: Instance-level Control for Image Generation
      • Instruct-Imagen: Image Generation with Multi-modal Instruction
      • Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
      • InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model
      • Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
      • Inversion-Free Image Editing with Natural Language
      • JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
      • LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
      • Learned representation-guided diffusion models for large-image generation
      • Learning Continuous 3D Words for Text-to-Image Generation
      • Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
      • Learning Multi-dimensional Human Preference for Text-to-Image Generation
      • LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
      • MACE: Mass Concept Erasure in Diffusion Models
      • MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
      • MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
      • MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
      • MindBridge: A Cross-Subject Brain Decoding Framework
      • MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
      • On the Scalability of Diffusion-based Text-to-Image Generation
      • OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
      • Personalized Residuals for Concept-Driven Text-to-Image Generation
      • Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
      • PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
      • PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
      • Plug-and-Play Diffusion Distillation
      • Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
      • Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
      • Readout Guidance: Learning Control from Diffusion Features
      • Relation Rectification in Diffusion Model
      • Residual Denoising Diffusion Models
      • Rethinking FID: Towards a Better Evaluation Metric for Image Generation
      • Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
      • Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
      • Rich Human Feedback for Text-to-Image Generation
      • SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
      • Self-correcting LLM-controlled Diffusion Models
      • Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
      • Shadow Generation for Composite Image Using Diffusion Model
      • Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
      • SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
      • StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
      • Structure-Guided Adversarial Training of Diffusion Models
      • Style Aligned Image Generation via Shared Attention
      • SVGDreamer: Text Guided SVG Generation with Diffusion Model
      • SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
      • Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
      • Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
      • Taming Stable Diffusion for Text to 360° Panorama Image Generation
      • TextCraftor: Your Text Encoder Can be Image Quality Controller
      • Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
      • TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
      • TokenCompose: Grounding Diffusion with Token-level Supervision
      • Towards Accurate Post-training Quantization for Diffusion Models
      • Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
      • Towards Memorization-Free Diffusion Models
      • Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
      • UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
      • UniGS: Unified Representation for Image Generation and Segmentation
      • Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
      • U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
      • ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models
      • When StyleGAN Meets Stable Diffusion: a 𝒲+ Adapter for Personalized Image Generation
      • X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
  • 2. Image Editing
      • An Edit Friendly DDPM Noise Space: Inversion and Manipulations
      • Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
      • Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
      • Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
      • DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
      • Deformable One-shot Face Stylization via DINO Semantic Guidance
      • DemoCaricature: Democratising Caricature Generation with a Rough Sketch
      • DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
      • DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
      • Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
      • DiffusionLight: Light Probes for Free by Painting a Chrome Ball
      • Diffusion Models Without Attention
      • Doubly Abductive Counterfactual Inference for Text-based Image Editing
      • Edit One for All: Interactive Batch Image Editing
      • Face2Diffusion for Fast and Editable Face Personalization
      • Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
      • FreeDrag: Feature Dragging for Reliable Point-based Image Editing
      • Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
      • Image Sculpting: Precise Object Editing with 3D Geometry Control
      • Inversion-Free Image Editing with Natural Language
      • PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models
      • Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
      • Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
      • PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
      • RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
      • SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
      • Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
      • SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
      • Text-Driven Image Editing via Learnable Regions
      • Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
      • TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
      • UniHuman: A Unified Model For Editing Human Images in the Wild
      • ZONE: Zero-Shot Instruction-Guided Local Editing
  • 3. Video Generation / Video Synthesis
      • 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
      • A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
      • BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
      • ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
      • Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
      • DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
      • DisCo: Disentangled Control for Realistic Human Dance Generation
      • FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
      • Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
      • FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
      • Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
      • GenTron: Diffusion Transformers for Image and Video Generation
      • Grid Diffusion Models for Text-to-Video Generation
      • Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation
      • Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
      • LAMP: Learn A Motion Pattern for Few-Shot Video Generation
      • Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
      • Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
      • MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
      • Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
      • Make Your Dream A Vlog
      • Make Pixels Dance: High-Dynamic Video Generation
      • MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
      • Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
      • PEEKABOO: Interactive Video Generation via Masked-Diffusion
      • Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
      • SimDA: Simple Diffusion Adapter for Efficient Video Generation
      • StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
      • SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
      • TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
      • Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
      • VideoBooth: Diffusion-based Video Generation with Image Prompts
      • VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
      • Video-P2P: Video Editing with Cross-attention Control
  • 4. Video Editing
      • A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
      • CAMEL: Causal Motion Enhancement tailored for lifting text-driven video editing
      • CCEdit: Creative and Controllable Video Editing via Diffusion Models
      • CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
      • FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
      • RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
      • VidToMe: Video Token Merging for Zero-Shot Video Editing
      • VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
  • 5. 3D Generation / 3D Synthesis
      • 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
      • Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
      • A Unified Approach for Text- and Image-guided 4D Scene Generation
      • BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
      • BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
      • CAD: Photorealistic 3D Generation via Adversarial Distillation
      • CAGE: Controllable Articulation GEneration
      • CityDreamer: Compositional Generative Model of Unbounded 3D Cities
      • Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
      • ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
      • ControlRoom3D: Room Generation using Semantic Proxy Rooms
      • DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
      • DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
      • DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
      • DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
      • Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
      • Diffusion Time-step Curriculum for One Image to 3D Generation
      • DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
      • DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
      • DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
      • Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
      • EscherNet: A Generative Model for Scalable View Synthesis
      • GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
      • GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
      • Gaussian Shell Maps for Efficient 3D Human Generation
      • HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
      • HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
      • Holodeck: Language Guided Generation of 3D Embodied AI Environments
      • HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
      • Interactive3D: Create What You Want by Interactive 3D Generation
      • InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
      • Intrinsic Image Diffusion for Single-view Material Estimation
      • Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
      • MoMask: Generative Masked Modeling of 3D Human Motions
      • Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
      • EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
      • OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
      • One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
      • Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
      • PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
      • PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
      • RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
      • SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
      • SceneWiz3D: Towards Text-guided 3D Scene Composition
      • SemCity: Semantic Scene Generation with Triplane Diffusion
      • Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
      • SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
      • Single Mesh Diffusion Models with Field Latents for Texture Generation
      • SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
      • SPAD: Spatially Aware Multiview Diffusers
      • Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
      • Text-to-3D using Gaussian Splatting
      • The More You See in 2D, the More You Perceive in 3D
      • Tiger: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
      • Towards Realistic Scene Generation with LiDAR Diffusion Models
      • UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
      • ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
  • 6. 3D Editing
      • GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
      • GenN2N: Generative NeRF2NeRF Translation
      • Makeup Prior Models for 3D Facial Makeup Estimation and Applications
  • 7. Multi-Modal Large Language Models
      • Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
      • Anchor-based Robust Finetuning of Vision-Language Models
      • Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
      • Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
      • Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
      • Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
      • Compositional Chain-of-Thought Prompting for Large Multimodal Models
      • Describing Differences in Image Sets with Natural Language
      • Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
      • Efficient Stitchable Task Adaptation
      • Efficient Test-Time Adaptation of Vision-Language Models
      • Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
      • FairCLIP: Harnessing Fairness in Vision-Language Learning
      • FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
      • FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
      • Generative Multimodal Models are In-Context Learners
      • GLaMM: Pixel Grounding Large Multimodal Model
      • GPT4Point: A Unified Framework for Point-Language Understanding and Generation
      • InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
      • Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
      • Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
      • LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
      • LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
      • Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
      • MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
      • MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
      • Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
      • OneLLM: One Framework to Align All Modalities with Language
      • One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
      • OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
      • Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
      • PixelLM: Pixel Reasoning with Large Multimodal Model
      • PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
      • Prompt Highlighter: Interactive Control for Multi-Modal LLMs
      • PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
      • Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
      • SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
      • SEED-Bench: Benchmarking Multimodal Large Language Models
      • SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
      • The Manga Whisperer: Automatically Generating Transcriptions for Comics
      • UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
      • VBench: Comprehensive Benchmark Suite for Video Generative Models
      • VideoChat: Chat-Centric Video Understanding
      • ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
      • ViTamin: Designing Scalable Vision Models in the Vision-language Era
      • ViT-Lens: Towards Omni-modal Representations
  • 8. Others
      • AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
      • Diff-BGM: A Diffusion Model for Video Background Music Generation
      • EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
      • On the Content Bias in Fréchet Video Distance
      • TexTile: A Differentiable Metric for Texture Tileability
  • References
  • Related Collections

Awesome-CVPR2024-AIGC

A Collection of Papers and Codes for CVPR2024 AIGC


Below is a roundup of this year's CVPR AIGC-related papers and code.

Stars, forks, and PRs are welcome~

Updates are published on GitHub first: Awesome-CVPR2024-AIGC — welcome to star it~

Zhihu: https://zhuanlan.zhihu.com/p/684325134

Please credit the source when referencing or reposting.

CVPR2024 website: https://cvpr.thecvf.com/Conferences/2024

CVPR accepted paper list: https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers

Full CVPR paper archive: https://openaccess.thecvf.com/CVPR2024

Conference dates: June 17-21, 2024

Paper acceptance announced: February 27, 2024

【Contents】

• 1. Image Generation / Image Synthesis
• 2. Image Editing
• 3. Video Generation / Video Synthesis
• 4. Video Editing
• 5. 3D Generation / 3D Synthesis
• 6. 3D Editing
• 7. Multi-Modal Large Language Models
• 8. Others

1. Image Generation / Image Synthesis

Accelerating Diffusion Sampling with Optimized Time Steps

• Paper: https://arxiv.org/abs/2402.17376
• Code: https://github.com/scxue/DM-NonUniform

Adversarial Text to Continuous Image Generation

• Paper: https://openreview.net/forum?id=9X3UZJSGIg9
• Code:

Amodal Completion via Progressive Mixed Context Diffusion

• Paper: https://arxiv.org/abs/2312.15540
• Code: https://github.com/k8xu/amodal

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

• Paper: https://arxiv.org/abs/2403.10255
• Code: https://github.com/zhenshij/arbitrary-scale-diffusion

Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

• Paper: https://arxiv.org/abs/2312.12471
• Code: https://github.com/zkawfanx/Atlantis

Attention Calibration for Disentangled Text-to-Image Personalization

• Paper: https://arxiv.org/abs/2403.18551
• Code: https://github.com/Monalissaa/DisenDiff

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

• Paper: https://arxiv.org/abs/2405.05252
• Code:

CapHuman: Capture Your Moments in Parallel Universes

• Paper: https://arxiv.org/abs/2402.18078
• Code: https://github.com/VamosC/CapHuman

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization

• Paper: https://arxiv.org/abs/2404.00521
• Code:

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2311.15773
• Code:

Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

• Paper: https://arxiv.org/abs/2402.00627
• Code: https://github.com/YanzuoLu/CFLD

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

• Paper: https://arxiv.org/abs/2310.01407
• Code: https://github.com/fast-codi/CoDi

Condition-Aware Neural Network for Controlled Image Generation

• Paper: https://arxiv.org/abs/2404.01143v1
• Code:

CosmicMan: A Text-to-Image Foundation Model for Humans

• Paper: https://arxiv.org/abs/2404.01294
• Code: https://github.com/cosmicman-cvpr2024/CosmicMan

Countering Personalized Text-to-Image Generation with Influence Watermarks

• Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_Countering_Personalized_Text-to-Image_Generation_with_Influence_Watermarks_CVPR_2024_paper.html
• Code:

Cross Initialization for Face Personalization of Text-to-Image Models

• Paper: https://arxiv.org/abs/2312.15905
• Code: https://github.com/lyuPang/CrossInitialization

Customization Assistant for Text-to-image Generation

• Paper: https://arxiv.org/abs/2312.03045
• Code:

DeepCache: Accelerating Diffusion Models for Free

• Paper: https://arxiv.org/abs/2312.00858
• Code: https://github.com/horseee/DeepCache

DemoFusion: Democratising High-Resolution Image Generation With No $$$

• Paper: https://arxiv.org/abs/2311.16973
• Code: https://github.com/PRIS-CV/DemoFusion

Desigen: A Pipeline for Controllable Design Template Generation

• Paper: https://arxiv.org/abs/2403.09093
• Code: https://github.com/whaohan/desigen

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

• Paper: https://arxiv.org/abs/2404.01342
• Code: https://github.com/OpenGVLab/DiffAgent

Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation

• Paper: https://arxiv.org/abs/2405.04356v1
• Code:

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

• Paper: https://arxiv.org/abs/2402.19481
• Code: https://github.com/mit-han-lab/distrifuser

Diversity-aware Channel Pruning for StyleGAN Compression

• Paper: https://arxiv.org/abs/2403.13548
• Code: https://github.com/jiwoogit/DCP-GAN

Discriminative Probing and Tuning for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2403.04321
• Code: https://github.com/LgQu/DPT-T2I

Don’t drop your samples! Coherence-aware training benefits Conditional diffusion

• Paper: https://arxiv.org/abs/2405.20324
• Code: https://github.com/nicolas-dufour/CAD

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

• Paper: https://arxiv.org/abs/2404.01050
• Code: https://github.com/haofengl/DragNoise

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

• Paper: https://arxiv.org/abs/2402.09812
• Code: https://github.com/KU-CVLAB/DreamMatcher

Dynamic Prompt Optimizing for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2404.04095
• Code: https://github.com/Mowenyii/PAE

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

• Paper: https://arxiv.org/abs/2312.04655
• Code: https://github.com/eclipse-t2i/eclipse-inference

Efficient Dataset Distillation via Minimax Diffusion

• Paper: https://arxiv.org/abs/2311.15529
• Code: https://github.com/vimar-gu/MinimaxDiffusion

ElasticDiffusion: Training-free Arbitrary Size Image Generation

• Paper: https://arxiv.org/abs/2311.18822
• Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official

EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

• Paper: https://arxiv.org/abs/2401.04608
• Code: https://github.com/JingyuanYY/EmoGen

Enabling Multi-Concept Fusion in Text-to-Image Models

• Paper: https://arxiv.org/abs/2404.03913v1
• Code:

Exact Fusion via Feature Distribution Matching for Few-shot Image Generation

• Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Zhou_Exact_Fusion_via_Feature_Distribution_Matching_for_Few-shot_Image_Generation_CVPR_2024_paper.html
• Code:

FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

• Paper: https://arxiv.org/abs/2403.06775
• Code:

Fast ODE-based Sampling for Diffusion Models in Around 5 Steps

• Paper: https://arxiv.org/abs/2312.00094
• Code: https://github.com/zju-pi/diff-sampler

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

• Paper: https://arxiv.org/abs/2312.07536
• Code: https://github.com/genforce/freecontrol

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

• Paper: https://arxiv.org/abs/2405.13870
• Code: https://github.com/aim-uofa/FreeCustom

Generalizable Tumor Synthesis

• Paper: https://www.cs.jhu.edu/~alanlab/Pubs24/chen2024towards.pdf
• Code: https://github.com/MrGiovanni/DiffTumor

Generating Daylight-driven Architectural Design via Diffusion Models

• Paper: https://arxiv.org/abs/2404.13353
• Code: https://github.com/unlimitedli/DDADesign

Generative Unlearning for Any Identity

• Paper: https://arxiv.org/abs/2405.09879
• Code: https://github.com/JJuOn/GUIDE

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

• Paper: https://arxiv.org/abs/2403.01693
• Code:

High-fidelity Person-centric Subject-to-Image Synthesis

• Paper: https://arxiv.org/abs/2311.10329
• Code: https://github.com/CodeGoat24/Face-diffuser

InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization

• Paper: https://arxiv.org/abs/2404.04650
• Code: https://github.com/xiefan-guo/initno

InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning

• Paper: https://arxiv.org/abs/2304.03411
• Code:

InstanceDiffusion: Instance-level Control for Image Generation

• Paper: https://arxiv.org/abs/2402.03290
• Code: https://github.com/frank-xwang/InstanceDiffusion

Instruct-Imagen: Image Generation with Multi-modal Instruction

• Paper: https://arxiv.org/abs/2401.01952
• Code:

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

• Paper: https://arxiv.org/abs/2306.00973
• Code: https://github.com/haoningwu3639/StoryGen

InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model

• Paper: https://arxiv.org/abs/2312.05849
• Code: https://github.com/jiuntian/interactdiffusion

Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models

• Paper: https://arxiv.org/abs/2308.15692
• Code:

Inversion-Free Image Editing with Natural Language

• Paper: https://arxiv.org/abs/2312.04965
• Code: https://github.com/sled-group/InfEdit

JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

• Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Zeng_JeDi_Joint-Image_Diffusion_Models_for_Finetuning-Free_Personalized_Text-to-Image_Generation_CVPR_2024_paper.html
• Code:

LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

• Paper: https://arxiv.org/abs/2404.00292
• Code: https://github.com/PanchengZhao/LAKE-RED

Learned representation-guided diffusion models for large-image generation

• Paper: https://arxiv.org/abs/2312.07330
• Code: https://github.com/cvlab-stonybrook/Large-Image-Diffusion

Learning Continuous 3D Words for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2402.08654
• Code: https://github.com/ttchengab/continuous_3d_words_code/

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

• Paper: https://arxiv.org/abs/2311.15841
• Code:

Learning Multi-dimensional Human Preference for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2405.14705
• Code:

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

• Paper: https://arxiv.org/abs/2305.11577
• Code: https://github.com/ewrfcas/LeftRefill

MACE: Mass Concept Erasure in Diffusion Models

• Paper: https://arxiv.org/abs/2403.06135
• Code: https://github.com/Shilin-LU/MACE

MarkovGen: Structured Prediction for Efficient Text-to-Image Generation

• Paper: https://arxiv.org/abs/2308.10997
• Code:

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

• Paper: https://arxiv.org/abs/2403.04290
• Code:

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

• Paper: https://arxiv.org/abs/2402.05408
• Code: https://github.com/limuloo/MIGC

MindBridge: A Cross-Subject Brain Decoding Framework

• Paper: https://arxiv.org/abs/2404.07850
• Code: https://github.com/littlepure2333/MindBridge

MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation

• Paper: https://arxiv.org/abs/2404.02790
• Code: https://huggingface.co/datasets/mulan-dataset/v1.0

On the Scalability of Diffusion-based Text-to-Image Generation

• Paper: https://arxiv.org/abs/2404.02883
• Code:

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

• Paper: https://arxiv.org/abs/2404.07990
• Code: https://github.com/Picsart-AI-Research/OpenBias

Personalized Residuals for Concept-Driven Text-to-Image Generation

• Paper: https://arxiv.org/abs/2405.12978
• Code:

Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models

• Paper: https://arxiv.org/abs/2404.15081
• Code:

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

• Paper: https://arxiv.org/abs/2312.04461
• Code: https://github.com/TencentARC/PhotoMaker

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

• Paper: https://arxiv.org/abs/2403.01852
• Code: https://github.com/cszy98/PLACE

Plug-and-Play Diffusion Distillation

• Paper: https://arxiv.org/abs/2406.01954
• Code:

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

• Paper: https://arxiv.org/abs/2305.16223
• Code: https://github.com/SHI-Labs/Prompt-Free-Diffusion

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

• Paper: https://arxiv.org/abs/2311.17002
• Code: https://github.com/ali-vilab/Ranni

Readout Guidance: Learning Control from Diffusion Features

• Paper: https://arxiv.org/abs/2312.02150
• Code: https://github.com/google-research/readout_guidance

Relation Rectification in Diffusion Model

• Paper: https://arxiv.org/abs/2403.20249
• Code: https://github.com/WUyinwei-hah/RRNet

Residual Denoising Diffusion Models

• Paper: https://arxiv.org/abs/2308.13712
• Code: https://github.com/nachifur/RDDM

Rethinking FID: Towards a Better Evaluation Metric for Image Generation

• Paper: https://arxiv.org/abs/2401.09603
• Code: https://github.com/google-research/google-research/tree/master/cmmd
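
The CMMD metric from the entry above replaces FID with a maximum mean discrepancy (MMD) distance computed over CLIP embeddings. As a rough illustration only (the official implementation is in the cmmd repo linked above), the sketch below shows the standard biased MMD² estimator with a Gaussian RBF kernel; the bandwidth, embedding dimensionality, and random toy inputs are all placeholder assumptions, not the paper's settings.

```python
# Hypothetical sketch, not the official cmmd code: biased MMD^2 estimator
# with a Gaussian RBF kernel between two sets of image embeddings.
import numpy as np

def rbf_mmd2(x: np.ndarray, y: np.ndarray, sigma: float = 10.0) -> float:
    """Biased MMD^2 between samples x (n, d) and y (m, d) under an RBF kernel."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d2 / (2 * sigma**2))
    # MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Toy usage: random vectors standing in for CLIP embeddings of real/generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(256, 512))
fake = rng.normal(loc=0.1, size=(256, 512))
print(rbf_mmd2(real, fake))
```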

Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

• Paper: https://arxiv.org/abs/2404.05384
• Code: https://github.com/SmilesDZgk/S-CFG
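
For context on the entry above: classifier-free guidance (CFG) is the standard sampling rule that extrapolates from the unconditional noise prediction toward the text-conditional one with a single global scale, and S-CFG questions applying that one scale uniformly across space. The snippet below is only the textbook CFG combination step, not the paper's spatially adaptive method; the array shapes and guidance scale are illustrative.

```python
# Textbook classifier-free guidance step (illustrative, not S-CFG itself):
# eps_hat = eps_uncond + w * (eps_cond - eps_uncond), with global scale w.
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float = 7.5) -> np.ndarray:
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy arrays standing in for U-Net noise predictions at one denoising step.
eps_u = np.zeros((4, 64, 64))
eps_c = np.ones((4, 64, 64))
print(cfg_combine(eps_u, eps_c).mean())  # -> 7.5
```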

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

• Paper: https://arxiv.org/abs/2311.13602
• Code: https://github.com/CyberAgentAILab/RALF

Rich Human Feedback for Text-to-Image Generation

• Paper: https://arxiv.org/abs/2312.10240
• Code:

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

• Paper: https://arxiv.org/abs/2401.08053
• Code:

Self-correcting LLM-controlled Diffusion Models

• Paper: https://arxiv.org/abs/2311.16090
• Code: https://github.com/tsunghan-wu/SLD

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

• Paper: https://arxiv.org/abs/2311.17216
• Code: https://github.com/hangligit/InterpretDiffusion

Shadow Generation for Composite Image Using Diffusion Model

• Paper: https://arxiv.org/abs/2308.09972
• Code: https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

• Paper: https://arxiv.org/abs/2312.04410
• Code: https://github.com/SHI-Labs/Smooth-Diffusion

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

• Paper: https://arxiv.org/abs/2312.16272
• Code: https://github.com/Xiaojiu-z/SSR_Encoder

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

• Paper: https://arxiv.org/abs/2312.01725
• Code: https://github.com/rlawjdghek/StableVITON

Structure-Guided Adversarial Training of Diffusion Models

• Paper: https://arxiv.org/abs/2402.17563
• Code:

Style Aligned Image Generation via Shared Attention

• Paper: https://arxiv.org/abs/2312.02133
• Code: https://github.com/google/style-aligned/

SVGDreamer: Text Guided SVG Generation with Diffusion Model

• Paper: https://arxiv.org/abs/2312.16476
• Code: https://github.com/ximinng/SVGDreamer

SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation

• Paper: https://arxiv.org/abs/2312.05239
• Code: https://github.com/VinAIResearch/SwiftBrush

Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting

• Paper: https://arxiv.org/abs/2310.08129
• Code: https://github.com/zzjchen/Tailored-Visions

Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models

• Paper: https://arxiv.org/abs/2403.08381
• Code: https://github.com/PangzeCheung/SingDiffusion

Taming Stable Diffusion for Text to 360° Panorama Image Generation

• Paper: https://arxiv.org/abs/2404.07949
• Code: https://github.com/chengzhag/PanFusion

TextCraftor: Your Text Encoder Can be Image Quality Controller

• Paper: https://arxiv.org/abs/2403.18978
• Code:

Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation

• Paper: https://arxiv.org/abs/2403.06247
• Code: https://github.com/MingyuLee82/TGI_AD_v1

TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models

• Paper: https://arxiv.org/abs/2311.16503
• Code: https://github.com/ModelTC/TFMQ-DM

TokenCompose: Grounding Diffusion with Token-level Supervision

• Paper: https://arxiv.org/abs/2312.03626
• Code: https://github.com/mlpc-ucsd/TokenCompose

Towards Accurate Post-training Quantization for Diffusion Models

• Paper: https://arxiv.org/abs/2305.18723
• Code: https://github.com/ChangyuanWang17/APQ-DM

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

• Paper: https://arxiv.org/abs/2403.05239
• Code:

Towards Memorization-Free Diffusion Models

• Paper: https://arxiv.org/abs/2404.00922
• Code: https://github.com/chenchen-usyd/AMG

Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning

• Paper: https://openaccess.thecvf.com/content/CVPR2024/html/Miao_Training_Diffusion_Models_Towards_Diverse_Image_Generation_with_Reinforcement_Learning_CVPR_2024_paper.html
• Code:

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

• Paper: https://arxiv.org/abs/2311.09257
• Code:

UniGS: Unified Representation for Image Generation and Segmentation

• Paper: https://arxiv.org/abs/2312.01985
• Code: https://github.com/qqlu/Entity

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

• Paper: https://arxiv.org/abs/2311.13231
• Code: https://github.com/yk7333/d3po

U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation

• Paper: https://arxiv.org/abs/2403.20231
• Code: https://github.com/ICTMCG/U-VAP

ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models

• Paper: https://arxiv.org/abs/2403.01807
• Code: https://github.com/facebookresearch/ViewDiff

When StyleGAN Meets Stable Diffusion: a 𝒲+ Adapter for Personalized Image Generation

• Paper: https://arxiv.org/abs/2311.17461
• Code: https://github.com/csxmli2016/w-plus-adapter
