Here at AutonoMusings, we’ve just returned from an exciting week at the annual Conference on Computer Vision and Pattern Recognition (CVPR) in Salt Lake City, which brought together some of the most prominent computer vision and perception researchers.
While Josh offers commentary on a few robotics-related papers from CVPR, I’ll just add one small observation: China, India, and Germany seem to dominate in terms of sheer numbers of women engineers. Almost all of the female grad students I met at the conference were from one of those countries, whereas women from the US seemed sorely underrepresented by comparison. What are they doing differently?
Without further ado, Josh’s favorite papers from CVPR:
This paper focuses on finding an efficient way to use point cloud data in convolutional neural networks for scene segmentation and classification without a lot of pre-processing. What’s really interesting, however, is that the method they describe actually makes it easier to combine LIDAR and camera data in a single network – which could benefit applications in robotics and self-driving vehicles that make heavy use of both.
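To make the LIDAR-plus-camera idea concrete, here’s a toy sketch of the most basic form of fusion: projecting 3D points into a camera image with a pinhole model and attaching the pixel color under each point as a per-point feature. This is a common baseline technique, not the paper’s actual method; the intrinsics (`fx`, `fy`, `cx`, `cy`) and the flat test image are made up for illustration.

```python
import numpy as np

def project_points(points, fx, fy, cx, cy):
    """Project Nx3 camera-frame points to pixel coordinates (pinhole model)."""
    z = points[:, 2]
    u = fx * points[:, 0] / z + cx
    v = fy * points[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def fuse_point_image_features(points, image):
    """Attach the RGB value under each projected point to its XYZ coordinates."""
    h, w, _ = image.shape
    px = project_points(points, fx=500.0, fy=500.0, cx=w / 2, cy=h / 2)
    cols = np.clip(px[:, 0].astype(int), 0, w - 1)
    rows = np.clip(px[:, 1].astype(int), 0, h - 1)
    rgb = image[rows, cols]           # per-point color feature from the camera
    return np.hstack([points, rgb])   # N x 6 array: XYZ + RGB

# Toy data: four points roughly a meter in front of a 480x640 camera,
# and a uniform gray image standing in for a real camera frame.
pts = np.array([[0.0, 0.0, 1.0],
                [0.1, 0.0, 1.0],
                [0.0, 0.1, 1.0],
                [-0.1, -0.1, 1.0]])
img = np.full((480, 640, 3), 128, dtype=np.uint8)
fused = fuse_point_image_features(pts, img)
print(fused.shape)  # (4, 6)
```

A network that takes the resulting N×6 array as input is effectively consuming both sensors at once, which is the appeal of handling raw point clouds without heavy pre-processing.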
Efficient, real-time Simultaneous Localization and Mapping (SLAM) is a key part of how robots use sensor data to determine their location and orientation in space. This research takes advantage of the fact that depth data is highly structured: individual points are usually about the same distance away as the points nearest them and lots of geometries repeat in different scenes. This insight allows the authors to create a simpler, smaller representation of the environment – saving on computation and memory, but still capturing the right details for effective dense SLAM.
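The “nearby points have similar depths” insight is easy to demonstrate with plain delta encoding: store each depth pixel as the difference from its left neighbor, and a generic compressor does much better because the residuals cluster near zero. This is only a sketch of the underlying intuition on synthetic data, not the representation the authors actually learn.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic depth map: a smooth ramp (a wall seen at an angle) plus mild
# sensor noise, quantized to millimeters as 16-bit values.
ramp = np.linspace(1000, 2000, 640)
depth = (np.tile(ramp, (480, 1)) + rng.normal(0, 2, (480, 640))).astype(np.uint16)

# Exploit local structure: encode each pixel as the difference from its left
# neighbor (a real codec would also keep the first column to invert this).
residuals = np.diff(depth.astype(np.int32), axis=1).astype(np.int16)

raw_size = len(zlib.compress(depth.tobytes()))
delta_size = len(zlib.compress(residuals.tobytes()))
print(delta_size < raw_size)  # True: residuals compress far better
```

The same principle, pushed much further with learned representations, is what lets dense SLAM keep the details that matter while shrinking compute and memory.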
As computer vision moves from still images to video, most techniques still require decomposing video streams back into individual images. But working with a series of individual images can require up to two orders of magnitude more storage, bandwidth, etc. – which seems silly in a world where video compression has been so well-optimized. This paper outlines a method for making convolutional neural networks (CNNs) work directly with compressed videos, even outperforming their still-image counterparts.
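The reason the compressed domain is so attractive is visible in how codecs store video in the first place: most frames are just a motion vector plus a small residual on top of a reference frame. The toy sketch below uses a single global, whole-pixel motion vector (real codecs use per-block vectors and sub-pixel motion) to show that the motion-plus-residual pair fully reconstructs the next frame; those two signals are what a compressed-domain CNN can consume directly instead of decoded pixels.

```python
import numpy as np

def warp(frame, dx, dy):
    """Shift a frame by a whole-pixel motion vector (toy global motion)."""
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

# Toy "video": a bright square sliding two pixels right between frames.
frame0 = np.zeros((64, 64), dtype=np.int16)
frame0[20:30, 20:30] = 255
frame1 = warp(frame0, 2, 0)

# A codec stores frame1 as a motion vector plus a residual, rather than
# re-storing every pixel; here the motion fully explains the change.
mv = (2, 0)
residual = frame1 - warp(frame0, *mv)
reconstructed = warp(frame0, *mv) + residual
print(np.abs(residual).sum(), np.array_equal(reconstructed, frame1))  # 0 True
```

Since the residual is zero here, the frame costs almost nothing to store – the same redundancy that makes decoding every frame back into a full image feel so wasteful.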