Solving complicated computer vision tasks with supervised machine learning requires large labelled datasets. Person detection and human pose estimation have come a long way in recent years thanks to such large-scale labelled datasets. However, these datasets offered no guarantees or analyses of the diversity of human activities, poses, or contexts. Furthermore, concerns about privacy, legality, safety, and ethics may limit the ability to acquire more human data.
As a result, a recent study published on arXiv.org offers a human-centric synthetic data generator. The proposed generator supports a wide variety of simulations for sim-to-real domain gap research, such as model training strategies and data hyper-parameter search.
It includes a variety of 3D person models with varied attributes. Distractors and occluders are provided via a set of object primitives. Lighting, camera settings, and post-processing effects are all under the researchers' control. The team set out to lower the community's barrier to entry by helping others create their own versions of a human-centric data generator.
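The randomized lighting, camera, and post-processing controls described above can be sketched as a domain-randomization loop that samples fresh scene parameters for every rendered frame. The parameter names and ranges below are illustrative assumptions, not the generator's actual API:

```python
import random

def sample_scene_parameters(rng: random.Random) -> dict:
    """Sample one randomized scene configuration (hypothetical parameters)."""
    return {
        # Lighting: intensity, colour temperature, and rotation are randomized.
        "light_intensity": rng.uniform(0.1, 5.0),
        "light_temperature_k": rng.uniform(2500.0, 8000.0),
        "light_rotation_deg": tuple(rng.uniform(0.0, 360.0) for _ in range(3)),
        # Camera: field of view and position jitter.
        "camera_fov_deg": rng.uniform(30.0, 90.0),
        "camera_position": tuple(rng.uniform(-5.0, 5.0) for _ in range(3)),
        # Post-processing: simple colour-grading knobs.
        "saturation": rng.uniform(0.5, 1.5),
        "contrast": rng.uniform(0.5, 1.5),
    }

# Each call yields a new scene variant, driving dataset diversity.
params = sample_scene_parameters(random.Random(42))
```

In practice such sampling runs inside the rendering engine; the point is that every label-relevant parameter is drawn from a configurable distribution rather than fixed.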
The data generator contains a simulation-ready camera system, parameterized lighting, and 3D human assets. It produces 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. The team stated they used PeopleSansPeople to perform benchmark synthetic-data training of a Detectron2 Keypoint R-CNN variant.
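The COCO pose labels mentioned above follow a well-known format: each person annotation stores keypoints as a flattened list of (x, y, v) triplets, where the visibility flag v is 0 (unlabeled), 1 (labeled but occluded), or 2 (labeled and visible). A minimal sketch, with made-up coordinate values:

```python
def num_labeled_keypoints(keypoints: list) -> int:
    """Count keypoints with visibility flag v > 0 in a flattened (x, y, v) list."""
    return sum(1 for v in keypoints[2::3] if v > 0)

# Illustrative annotation (COCO uses 17 keypoints per person; three shown here).
annotation = {
    "category_id": 1,                    # the "person" category
    "bbox": [100.0, 50.0, 80.0, 200.0],  # [x, y, width, height]
    "keypoints": [120, 60, 2,            # visible keypoint
                  0, 0, 0,               # unlabeled keypoint
                  130, 80, 1],           # occluded keypoint
}

count = num_labeled_keypoints(annotation["keypoints"])  # 2 labeled keypoints
```

Because the generator emits labels in this standard format, off-the-shelf keypoint models such as Detectron2's Keypoint R-CNN can consume the synthetic data without custom loaders.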
Researchers found that pre-training a network on synthetic data and fine-tuning it on real-world target data resulted in a keypoint AP of 60.37 ± 0.48 (COCO test-dev2017). This outperformed models trained on the same real data alone (keypoint AP of 55.80) and models pre-trained on ImageNet (keypoint AP of 57.50).
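The keypoint AP metric used in these comparisons is built on Object Keypoint Similarity (OKS), which scores a predicted pose against ground truth much as IoU scores boxes. A minimal sketch of the standard OKS formula, with hypothetical inputs:

```python
import math

def oks(pred, gt, vis, area, kappas):
    """Object Keypoint Similarity.

    pred, gt: lists of (x, y) keypoint coordinates.
    vis:      ground-truth visibility flags; only v > 0 keypoints count.
    area:     ground-truth object segment area (s^2 in the formula).
    kappas:   per-keypoint constants controlling falloff tolerance.
    """
    terms = []
    for (px, py), (gx, gy), v, k in zip(pred, gt, vis, kappas):
        if v > 0:
            d2 = (px - gx) ** 2 + (py - gy) ** 2
            terms.append(math.exp(-d2 / (2.0 * area * k ** 2)))
    return sum(terms) / len(terms) if terms else 0.0
```

Keypoint AP then averages precision over OKS thresholds (0.50 to 0.95), so the reported gains mean the synthetic-pretrained model localizes keypoints closer to ground truth across strictness levels.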
The team added that this freely available data generator should enable a wide spectrum of research in the crucial domain of human-centric computer vision, including sim-to-real transfer learning.