GaussianEditor

Swift and Controllable 3D Editing with Gaussian Splatting (CVPR 2024)

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin

paper :
https://arxiv.org/abs/2311.14521
project website :
https://gaussianeditor.github.io/
code :
https://github.com/buaacyw/GaussianEditor

Paper Review 후기

novelty :
- SAM mask를 GS로 inverse rendering해서 target GS identify
- 기존 GS에서 크게 벗어나지 않도록(stability) anchor loss
3DGS 나오고나서 3DGS 이용한 Editing에 대해 잽싸게 낸 논문이라
비교 대상도 없고
Editing loss도 기존 기법을 그대로 써서 novelty 흐음…?

NeRF-based 3D Editing :
- Instruct-nerf2nerf: Editing 3D scenes with instructions
- Ed-nerf: Efficient text-guided editing of 3D scene using latent space nerf
- Clip-nerf: Text-and-Image driven manipulation of neural radiance fields
- Nerf-art: Text-driven neural radiance fields stylization
- Dreameditor: Text-driven 3D scene editing with neural fields
NeRF-based 3D Editing by MLP의 문제점 :
- specific scene parts를 직접 수정하는 데 제한
- inpainting 및 scene composition 과정이 복잡
- strictly masked area 내에서만 editing 가능
3DGS-based 3D Editing의 문제점 :
- Editing할 Gaussian을 identify(분류)해야 함
- SDS처럼 Diffusion model로 얻은 random generative guidance를 3DGS에 적용할 때
  randomness in loss로 인해
  view마다 non-consistent(random)한 image를 합성(Editing)하므로
  GS is directly affected by randomness,
  so 업데이트 불안정
- 수많은 Gaussian points를 업데이트해야 함
  NeRF-based에서처럼 MLP NN buffering이 불가능하므로 불안정하여
  finely detailed result로 수렴하는 걸 방해

Method

Gaussian Semantic Tracing

전제 : 3DGS가 이미 잘 구성되어 있다고 가정하고, 특정 scene part를 제거 또는 추가하거나 inpainting하는 등 3D Editing 수행
Gaussian Semantic Tracing :
훈련하는 동안 3D Editing할 target을 trace하기 위해 semantic label(mask) 생성

Parameters

\(x, s, q, \alpha , c\) (position, covariance(scale, quaternion), opacity, color) 뿐만 아니라
\(m_{ij}\) (semantic Gaussian mask for i-th Gaussian and j-th semantic label) 추가
densification할 때 clone/split된 points는 parent point의 semantic label를 그대로 물려받음

처음에 inaccurate segmentation mask에서 출발했더라도 Gaussian semantic tracing하는 동안 3DGS 업데이트하면서 semantic Gaussian mask도 알맞게 업데이트됨

Initial Labeling Process

camera pose 하나 골라서 SAM(Segment Anything)으로 2D segmentation 수행한 뒤
inverse rendering으로 2D mask를 3D Gaussian으로 unproject
\(w_i^j = \Sigma o_i (p) \ast T_i^j (p) \ast M^j (p)\)
where \(w_i^j\) : weight of i-th Gaussian for j-th semantic label
where \(o, T, M, p\) : opacity, transmittance, mask, pixel
average weight가 threshold를 넘는 경우에만 해당 i-th Gaussian이 j-th semantic class를 갖는다고 선별

Hierarchical Gaussian Splatting

Hierarchical Gaussian Splatting :
stabilized and fine results 만들기 위해 anchor loss 사용
3D Editing 위해 densification할 때
threshold를 manually 정하는 게 아니라,
3D position gradients가 top k% 안에 드는 3DGS들만 선택적으로 densify
(k값이 점점 증가)
anchor loss :
- 3D Editing 때문에 densification할 때마다 기존의 Gaussian param.을 anchor에 record
- 3D Editing에 따라 변형되는 Gaussian param.가 각 anchor로부터 크게 벗어나지 않도록 함
  \(L_{anchor}^P = \Sigma_{i=0}^n \lambda_{i} (P_i - \hat P_i)^2\)
  where \(P : x, s, q, \alpha , c, m_{ij}\)
  where \(\lambda_{i}\) 값이 점점 증가 (새로 만들어지는 Gaussian param.의 영향이 크도록)
- stable geometry formation under stochastic loss 보장
Edit loss :
- 3DGS model로 rendering한 image와 diffusion model 간의 차이
  \(L_{Edit} = D(\theta ; p, e)\)
  where \(D, \theta , p, e\) : Diffusion model, 3D model, camera pose, prompt
- 2D diffusion model로 3D Editing하는 방법 :
  1) DreamFusion [1] 의 SDS loss처럼 3D model의 rendering과 other conditions를 2D diffusion model에 넣어준 뒤, noise 넣고 denoising하는 과정에서 내놓은 score가 3D model의 업데이트 방향을 guide
  즉, 3D model로 만든 image가 2D diffusion에서의 그럴 듯한 image distribution에 부합하도록 함
  2) Instruct-nerf2nerf처럼 3D model의 rendering과 prompts 이용해서 2D Editing 수행하는 데 초점을 두고, Edited 2D multi-view images를 training target으로 사용하여 3D model에게 guidance 줌
total loss :
\(L = L_{Edit} + \Sigma_{P \in [x, s, q, \alpha , c]} \lambda_{P} L_{anchor}^P\)

3D Inpainting

외부 모델들 사용해서 Efficient 3D Editing 구현
Object Removal (object 제거) :
- semantic label(mask) 가지는 3D Gaussian만 삭제하면 target object와 scene 사이의 interface에서 artifacts 생김
- precise mask를 생성할 필요가 있음
  - 삭제한 3DGS와 가장 가까운 Gaussian을 KNN으로 identify
  - 이를 다양한 view-points로 project하여 mask를 확장 (dilate)
  - hole을 메꿔서 interface area를 정확하게 표현하도록 refined mask를 생성
- Diffusion model을 이용해서 해당 area를 2D inpainting (object 삭제)
- inpainted image를 기반으로 3DGS 업데이트

Object Incorporation by text (object 추가 혹은 수정) :
- editing area에 BB 만듦
- Stable Diffusion XL model [2] 을 이용해서 해당 area에 넣을 image를 생성하고
  fg object is segmented
- Wonder3D model [3] 을 이용해서 fg-segmented image를 3D textured mesh로 변환
- Hierarchical Gaussian Splatting을 이용해서 mesh를 새로운 3DGS로 변환
- DPT [4] 로 depth estimation해서 기존의 3DGS와 생성된 3DGS의 depth를 align해주고 기존의 3DGS와 생성된 3DGS를 concatenate(결합)

Experiments

Implementation :
- view-point (camera-pose) 개수 : 24-96개
- optimization : 3DGS가 이미 구성되었다는 전제 하에
  3D Editing하는 데만 500-1000 steps, 5-10 min.
  (3 min. for Wonder3D mesh 생성 + 2 min. for 3DGS로 변환 및 refine)
Ablation Study :
- w/o Semantic Tracing :
  target object만 Editing되는 게 아니라 image 전체 Editing
- w/o Hierarchical GS :
  uncontrolled densification 및 image blurring 초래

Conclusion

2 strategies
- Gaussian Semantic Tracing
  for precise Gaussian identification of editing areas
- Hierarchical GS
  for balance b.w. fluidity and stability
  to achieve detailed(fine) results under stochastic guidance

Limitation

supervision을 위해 2D diffusion model에 의존하여 3D editing을 수행하는데
현재 2D diffusion model은 특정 복잡한 prompts에 대해서는 effective guidance를 제공하는 데 어려움이 있어 3D editing에도 한계 있음

Question

Q1 : Edit loss에서 사용하는 SDS loss나 Instruct-nerf2nerf 기법은 이미 있는 내용이고,
본 논문에서 볼 건 아래의 두 가지 정도인데 (SAM mask를 GS로 inverse rendering해서 target GS identify하고
기존 GS에서 크게 벗어나지 않도록(stability) anchor loss)
별로 novelty가 없는 것 같다
A1 : ㅇㅈ

GaussianEditor

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin

Paper Review 후기

Related Work

Method

Gaussian Semantic Tracing

Parameters

Initial Labeling Process

Hierarchical Gaussian Splatting

3D Inpainting

Experiments

Conclusion

Limitation

Question