My top six talks from CVPR 2017


Semantic and instance-level segmentation, salient object detection, single- and multi-person pose estimation, face recognition, action recognition, visual question answering, zero-shot learning, and generative adversarial networks (GANs) are just some of the hot topics discussed at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) that captured my interest. Now I cannot resist the urge to share my list of the best talks from the conference.

  1. I had thought that face recognition could be regarded as an almost solved problem, but it seems that it is not. On this subject, there were several papers aiming to reconstruct human faces from occluded or very low-resolution images, and I was quite surprised by their successful results. One example is the work by Xin Yu and Fatih Porikli from the Australian National University, ‘Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders’. During the conference I asked them whether their approach could be applied to human recognition in video surveillance systems. Unfortunately, the researchers had already tried the approach in that context without success, but we may see improvements in the future.

  2. Another interesting paper, entitled ‘Universal Adversarial Perturbations’, was presented by Moosavi-Dezfooli et al. It generated a lot of attention since it was about fooling current state-of-the-art image classification architectures with perturbations that generalize across images: rather than finding a specific perturbation for each image, the authors find universal ones. I had an informative discussion with the author, which helped me understand the topic better. Interestingly, even re-training the network on such perturbed images does not improve its robustness. However, I do not think this is something we have to worry about now when applying deep networks, as such perturbations do not normally appear in images, especially those from controlled sources. Nevertheless, we will definitely consider this topic further and follow any future research in this area.
  3. I really enjoyed the idea presented by Iasonas Kokkinos in his work ‘UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory’. This paper was about training a single network for multiple tasks such as semantic segmentation, object detection, boundary detection, etc.

  4. Huang et al. received the Best Paper Award for their work ‘Densely Connected Convolutional Networks’. Their DenseNet architecture pushes the idea behind ResNet even further: within each block between pooling layers, every layer receives the concatenated outputs of all preceding layers. The basic idea is simple and effective, and DenseNet has the potential to become another well-known and widely used convolutional architecture.

  5. Joseph Redmon’s talk on ‘YOLO9000: Better, Faster, Stronger’ was great, with an engaging presentation including a live demo, and it received the Best Paper Honourable Mention Award. Significant improvements have been achieved compared to the first version of YOLO. I really enjoyed the idea of hierarchical classification: if the system is confident about its prediction, it goes further down the hierarchy; once it is not confident, it outputs the higher-level class. I believe this idea may also be applicable to zero-shot learning problems. I enjoyed their demo on a smartphone, even if I was slightly disappointed by the computational requirements of many of the works presented at the conference. Several of the authors seemed unconcerned about this, as they often remarked, “Yes, it is suitable for real-time application. You can see it on this laptop with two GPUs…” The demo by Redmon and Farhadi was a nice exception to this trend.
  6. The last talk of the main conference was a keynote speech given by the MIT neuroscientist James DiCarlo. He discussed how the problem of reverse engineering the human mind can be tackled by combining the efforts of neuroscience and cognitive science with forward engineering that aims to emulate the human mind (he focused on its capacity for object categorization and detection). DiCarlo showed how recent achievements in deep neural networks have helped us understand the functioning of the human mind. However, the most recent deep neural network architectures (those after AlexNet) deviate from this trend: they achieve better results on ImageNet but do not explain the brain any better than their predecessors did.
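To make the dense-connectivity idea from item 4 concrete, here is a minimal sketch of a dense block in numpy. It is not the paper’s implementation: the 1x1 random projection stands in for the BN-ReLU-Conv composite, and the layer count and growth rate are illustrative. The point it shows is only the wiring, i.e. that each layer consumes the concatenation of the block input and all previous outputs, so channels grow linearly with depth.

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12, rng=None):
    """Toy dense block: each 'layer' sees the concatenation (along the
    channel axis) of the block input and every preceding layer's output.
    The random 1x1 projection is a stand-in for BN-ReLU-Conv."""
    rng = rng or np.random.default_rng(0)
    features = [x]                                   # list of (C, H, W) maps
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)       # dense connectivity
        w = rng.standard_normal((growth_rate, inp.shape[0]))
        out = np.maximum(np.tensordot(w, inp, axes=([1], [0])), 0.0)  # ReLU
        features.append(out)                         # adds growth_rate channels
    return np.concatenate(features, axis=0)

x = np.ones((16, 8, 8))          # 16 input channels
y = dense_block(x)
print(y.shape)                   # channels: 16 + 4 * 12 = 64
```

Running this prints `(64, 8, 8)`: four layers with growth rate 12 add 48 channels to the 16 input channels, which is the linear channel growth that makes DenseNet parameter-efficient.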

In conclusion, the conference was great. I really enjoyed the many interactions and chances for discussion with the authors, and I learned a lot. It is great that more than half of the authors made their code available online, so others can easily build on top of their work to increase the collective knowledge and accelerate achievements in the computer vision area. I am definitely looking forward to exploring several of these methods in our laboratory as part of the development within the Cognitive Hub project, and I hope to take part in the next, 31st IEEE Conference on Computer Vision and Pattern Recognition in Salt Lake City.

Of course, this list of top talks covers only a small portion of the great many superb papers delivered at CVPR 2017, as I did not want to make this post too long. But I am curious to read others’ comments and opinions on the topics and talks presented at CVPR. Mahalo!

Pavel Dvorak
Research Specialist, CV Area