My Favorite Publications
Publications (Recent & Selected). The titles below may not match those in the papers exactly; they are worded to highlight the key ideas/contributions. Click [more] for details of a paper, [paper] to download the PDF right away, [video] for the high-quality project video, [youtube] for a moderate-quality video on YouTube, [software] for the associated software, and [press] for related press clippings.
Full Publications by Year
Mononizing Binocular Videos (SIGGRAPH Asia 2020)
Here we present a fully backward-compatible solution that represents a binocular video as an ordinary monocular video. As a result, it can be played back, compressed with standard video codecs, and transmitted just like any ordinary monocular video. The only difference is that it can optionally be restored to its original binocular form whenever stereoscopic playback is needed. We achieve this by employing the InvertibleX Model. In addition, compressing our mononized video even outperforms state-of-the-art multiview video encoding.
[more] [youtube] [paper] [code]
Manga Filling with ScreenVAE (SIGGRAPH Asia 2020)
While automatically converting a color comic to a screened manga is doable, translating a bitonal screened manga to a color comic has never been done automatically before. The major bottleneck lies in the fundamental difference in how a filled region is characterized: a color can be characterized at a single point, whereas a screentone has to be characterized over a region. To enable effective automatic translation between color comics and screened manga, we propose to bridge this fundamental difference by introducing an intermediate representation, the ScreenVAE map, which converts the region-wise screentone to a point-wise representation.
[more] [paper] [code]
Video Snapshot (a real live photo) (IEEE TPAMI 2020)
While the iPhone keeps a short video for each "live photo," we propose a method to embed a short video into a single frame, in other words, a real live photo. We employ the InvertibleX Model to encode the neighboring frames into a single visualizable frame. Whenever the video is needed, a decoding subnetwork expands (restores) it.
[more] [youtube] [paper] [code]
Color-Consistent and Temporal-Coherent Colorblind-Shareable Videos (SIGGRAPH Asia 2019)
Due to the local nature of CNN models, it is hard to ensure that recoloring is consistent over the whole image. To solve this problem in our synthesis of colorblind-shareable videos, we propose to use deep learning to indirectly generate the parameters of a polynomial color model. This guarantees the recoloring is applied globally to the whole image while ensuring temporal coherence.
[more] [paper]
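The global polynomial color model above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the degree-2 RGB basis, the coefficient layout, and the name `polynomial_recolor` are all assumptions; in the actual pipeline, a network would predict the coefficients per frame so that the same transform applies to every pixel.

```python
import numpy as np

def polynomial_recolor(image, coeffs):
    """Apply one global degree-2 polynomial color transform to an RGB image.

    image:  H x W x 3 float array in [0, 1]
    coeffs: 10 x 3 array; one column of polynomial weights per output channel.
    Because a single coefficient set maps every pixel, the recoloring is
    globally consistent by construction (unlike per-pixel CNN outputs).
    """
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    # Degree-2 polynomial basis over (r, g, b): 10 monomial terms per pixel.
    basis = np.stack([np.ones_like(r), r, g, b,
                      r * r, g * g, b * b,
                      r * g, r * b, g * b], axis=-1)   # H x W x 10
    out = basis @ coeffs                                # H x W x 3
    return np.clip(out, 0.0, 1.0)

# Identity coefficients (R<-r, G<-g, B<-b) reproduce the input image.
identity = np.zeros((10, 3))
identity[1, 0] = identity[2, 1] = identity[3, 2] = 1.0
```

Temporal coherence then reduces to keeping the low-dimensional coefficient vectors smooth across frames, which is far easier than constraining millions of per-pixel outputs.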
Invertible Grayscale (SIGGRAPH Asia 2018)
Here we present a method to convert arbitrary color images to grayscale. That, of course, is not interesting. The interesting part is that such grayscale images can be inverted back to the original color images accurately, without any guessing. We present a learning-based model that offers such color-to-gray conversion and grayscale-to-color restoration abilities. This is the first realization of the Invertible Generative Model.
[more] [paper] [video] [code]
Two-Stage Sketch Colorization (SIGGRAPH Asia 2018)
With the advances of neural networks, automatic and semi-automatic colorization of sketches has become feasible and practical. We present state-of-the-art semi-automatic (as well as automatic) colorization from line art. Our improvement comes from a divide-and-conquer scheme: we divide this complex colorization task into two simpler subtasks with clearer goals, drafting and refinement.
[more] [paper] [video]
Deep Unsupervised Pixelization (SIGGRAPH Asia 2018)
Generating pixel art from a given input can be regarded as a kind of style transfer, and seems solvable with existing deep learning techniques. But the real difficulty is the lack of supervised data (thousands of image pairs of high-resolution input and low-resolution pixel art). This is why we need an unsupervised learning framework for pixelization.
[more] [paper]
Deep Extraction of Manga Structural Lines (SIGGRAPH 2017)
Removal of problematic screentones from manga has long been an open problem, but it is strongly needed, as it can significantly simplify the digitization process. Only with the maturing of deep learning are we finally able to remove irregular, regular, arbitrarily scaled, or even pictorial screentones with a single unified solution.
[more] [paper] [software]
Pyramid of Arclength Descriptor (PAD) (SIGGRAPH Asia 2016)
We started with a simple goal: filling space with shapes that exhibit a strongly intercoupled appearance, like what M.C. Escher did. But it turned out to be very hard to make practical (tiling arbitrary shapes within tractable time). After a decade of research, our solution is a brand-new shape descriptor. It is locally supported, scale-invariant, suited for partial-shape matching, and, more importantly, efficient enough to be practical. It will be very useful for many shape recognition problems.
[more] [paper] [video] [video 2]
Visual Sharing with Colorblinds (SIGGRAPH 2016)
Modern TVs are not designed for colorblind viewers, making it hard for them to share the same TV with family members who have normal vision. We propose the first method that allows colorblind and normal-vision audiences to seamlessly share the same display, thanks to the wide availability of binocular TVs.
[more] [paper] [video]
Globally Optimal Toon Tracking (SIGGRAPH 2016)
Tracking corresponding regions throughout a cartoon sequence is necessary for postprocessing, such as colorization and stereoscopization. But such tracking is not available from animators, and vision techniques do not work well on cartoons. By formulating region tracking as a global optimization problem and modeling the region motion trajectories, we can significantly raise the accuracy to a usable level.
[more] [paper]
Closure-aware Sketch Simplification (SIGGRAPH Asia 2015)
Existing methods for simplifying a sketch mainly consider the distance and orientation similarities of strokes. Thresholding on these usually results in unsatisfactory simplification. In fact, humans also rely on the depicted regions to understand whether individual strokes semantically refer to the same stroke. But regions are formed by strokes, and strokes are interpreted through regions. See how we solve this chicken-or-the-egg problem by considering closure gestalts.
[more] [paper]
Stereoscopizing Cel Animations (SIGGRAPH Asia 2013)
While 3D movies are popular nowadays, it is impractical for cel animators to hand-draw stereo frames. Geometrically modeling cartoon characters is not only costly, but also requires highly trained skill to achieve a lively and "organic" presentation. It would be best if cel animators could continue to hand-draw their monocular cels while leaving our system to automatically turn the cel animations into stereo.
[more] [paper] [video] [3D video]