Outline

  1. Section 1 compares videos generated by $\texttt{T2V-Turbo}\text{ (VC2)}$ and $\text{VCM (VC2)} + \mathcal{R}_\text{img}$.
  2. Section 2 presents videos corresponding to Figure 1 of the paper.
  3. Section 3 presents videos corresponding to Figure 4 of the paper.
  4. Section 4 presents videos corresponding to Figure 6 of the paper: an ablation study on the choice of $\mathcal{R}_\text{img}$.
  5. Section 5 presents videos corresponding to Figures 7 & 8 of the paper: qualitative comparison results for our $\texttt{T2V-Turbo}\text{ (VC2)}$.
  6. Section 6 presents videos corresponding to Figures 9 & 10 of the paper: qualitative comparison results for our $\texttt{T2V-Turbo}\text{ (MS)}$.

1. Comparing videos generated by $\texttt{T2V-Turbo}\text{ (VC2)}$ and $\text{VCM (VC2)} + \mathcal{R}_\text{img}$

Prompt: A panda standing on a surfboard in the ocean in sunset.


$\qquad\qquad\quad\texttt{T2V-Turbo}\text{ (VC2)}$


$\qquad\qquad\quad\text{VCM}\text{ (VC2) } + \,\mathcal{R}_\text{img}$

Analysis: In the left video, the panda is standing on the surfboard as the prompt specifies. In the right video, the panda is sitting on the surfboard instead.

Prompt: A raccoon is playing the electronic guitar


$\qquad\qquad\quad\texttt{T2V-Turbo}\text{ (VC2)}$


$\qquad\qquad\quad\text{VCM}\text{ (VC2) } + \,\mathcal{R}_\text{img}$

Analysis: The right video shows a plausible raccoon but fails to depict the raccoon actually playing the electronic guitar.

Prompt: A motorcycle accelerating to gain speed


$\qquad\qquad\quad\texttt{T2V-Turbo}\text{ (VC2)}$


$\qquad\qquad\quad\text{VCM}\text{ (VC2) } + \,\mathcal{R}_\text{img}$

Analysis: In the right video, the motorcycle actually moves backward instead of accelerating forward.

Prompt: A squirrel eating a burger


$\qquad\qquad\quad\texttt{T2V-Turbo}\text{ (VC2)}$


$\qquad\qquad\quad\text{VCM}\text{ (VC2) } + \,\mathcal{R}_\text{img}$

Analysis: Compared to the left video, the squirrel in the right video appears merely to hold the burger, without any eating motion.