Coalescent Mobile Robotics’ robots help supermarket staff by moving trolleys. Specifically, our robots can move, dock trolleys, move docked trolleys, and undock trolleys. Supermarkets change over time as their sections and products change. They also contain many dynamic objects, such as trolleys, other robots, and people. Our robots need to understand these dynamic environments for tasks such as localization and obstacle avoidance. Because generating a dataset of real data to train and validate our robots on these tasks is prohibitively expensive, we need to build a synthetic dataset.


We want to automatically generate high-fidelity virtual scenes of supermarkets based on objects found in the real dataset, e.g., shelves, products, trolleys, staff, and customers. These scenes must be described in a Domain Specific Language (DSL) that supports both deterministic and randomized elements. The scene generated from the DSL is then rendered using the camera’s parameters and pose. From each rendered scene, the solution shall provide segmentation, depth, and pose information for every object in the scene.

Problem Statement

Domain Specific Language

Design a DSL that describes a supermarket scene. The DSL may also use vector graphics to describe the scene. It will start as a small proposal and be extended throughout the project to support additional features and improvements.
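As an illustration of how vector graphics could feed the DSL, the sketch below reads a small SVG floor plan using only the Python standard library. The conventions are assumptions made for this example, not part of the proposal: `<rect>` elements mark deterministic objects such as shelves, `<circle>` elements mark regions in which objects may later be placed at random, the `class` attribute carries the object class, and one SVG unit equals one metre.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical floor plan: two fixed shelves and one trolley spawn region.
FLOOR_PLAN = """<svg xmlns="http://www.w3.org/2000/svg" width="10" height="6">
  <rect id="shelf_1" class="shelf" x="1" y="1" width="4" height="0.5"/>
  <rect id="shelf_2" class="shelf" x="1" y="3" width="4" height="0.5"/>
  <circle id="spawn_1" class="trolley_spawn" cx="7" cy="2" r="1.5"/>
</svg>"""


def read_floor_plan(svg_text):
    """Split an SVG floor plan into fixed objects and random spawn regions."""
    root = ET.fromstring(svg_text)
    fixed, regions = [], []
    # Rectangles: deterministic objects, stored by centre and footprint size.
    for rect in root.iter(SVG_NS + "rect"):
        x, y = float(rect.get("x")), float(rect.get("y"))
        w, h = float(rect.get("width")), float(rect.get("height"))
        fixed.append({
            "id": rect.get("id"),
            "class": rect.get("class"),
            "center": (x + w / 2, y + h / 2),
            "size": (w, h),
        })
    # Circles: regions inside which objects may be sampled later.
    for circle in root.iter(SVG_NS + "circle"):
        regions.append({
            "id": circle.get("id"),
            "class": circle.get("class"),
            "center": (float(circle.get("cx")), float(circle.get("cy"))),
            "radius": float(circle.get("r")),
        })
    return fixed, regions


fixed_objects, spawn_regions = read_floor_plan(FLOOR_PLAN)
```

Keeping the floor plan in SVG means it can be drawn and inspected in any vector graphics editor, while the DSL itself only needs to reference element identifiers and classes.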

Model Dataset

Create a 3D model dataset from commercial, open-source, and other available repositories containing the main objects present in the supermarket. Just like the DSL, the 3D dataset will evolve throughout the project, initially having a small subset of 3D models, and growing as the project progresses.

DSL Parser

Implement a DSL parser that generates a supermarket scene description containing all the objects in the scene, their poses, and any other information necessary to generate the 3D scene (e.g., lights and materials). The scene description can be a YAML file or another custom DSL and shall contain the full description of the scene.
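To make the parser's role concrete, here is a minimal sketch of one possible approach. The DSL syntax is invented for this example: each line reads `class id x y yaw_deg`, and a value written as `lo..hi` is sampled uniformly from that range. A seeded random generator keeps randomized scenes reproducible. The output is a plain dictionary that could be serialized to YAML; the field names are assumptions.

```python
import random


def parse_value(token, rng):
    """Return a literal float ("1.5") or a uniform sample from "lo..hi"."""
    if ".." in token:
        lo, hi = token.split("..")
        return rng.uniform(float(lo), float(hi))
    return float(token)


def parse_scene(text, seed=0):
    """Parse the toy DSL into a fully resolved scene description."""
    rng = random.Random(seed)  # same seed and text give the same scene
    objects = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cls, obj_id, *values = line.split()
        if len(values) != 3:
            raise ValueError(f"line {line_no}: expected x, y and yaw_deg")
        x, y, yaw = (parse_value(v, rng) for v in values)
        objects.append({
            "id": obj_id,
            "class": cls,
            "pose": {"x": x, "y": y, "yaw_deg": yaw},
        })
    return {"seed": seed, "objects": objects}


EXAMPLE = """
# class  id         x         y    yaw_deg
shelf    shelf_1    0.0       2.0  90
trolley  trolley_1  1.0..3.0  0.5  0..360
trolley  trolley_2  1.0..3.0  1.5  0..360
"""

scene = parse_scene(EXAMPLE, seed=42)
```

Recording the seed in the scene description is a deliberate choice in this sketch: it lets any randomized scene be regenerated exactly from its DSL source.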

Scene Generator

Implement a scene generator that uses the scene description to produce an actual 3D scene that can be visualized in 3D software. The scene generator shall be written for the specific software used, such as Unity 3D, Blender, or Unreal Engine (the specific software will be decided during the project). For example, the generator could be a Python script using the Blender Python API or a C# project using Unity 3D.

Sample Generator

Implement a sample generator that, given a scene and the camera pose and parameters in the scene description, generates the following data:

  – A rendered RGB image

  – The renderer information (type of renderer used as well as its parameters)

  – For each (x, y) projected pixel in the image:

      • the object identifier that the pixel belongs to (if there are two trolleys in the same image, they shall have different identifiers to distinguish them),
      • the class identifier, and
      • the depth of the pixel

  – For each object identifier:

      • the object’s pose with respect to the camera,
      • the (x, y) projected points of the eight 3D points of the bounding box that contains the object and, for each point, whether the point is occluded and whether it lies outside the image dimensions, and
      • the eight (x, y, z) 3D points of the bounding box.
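The per-object bounding-box outputs can be sketched with a simple pinhole camera model. The example below is an illustration under stated assumptions, not a prescribed implementation: corners are given in the camera frame with z pointing forward, `fx`, `fy`, `cx`, `cy` are pinhole intrinsics in pixels, the depth map stores z-depth per pixel as `depth_map[row][column]`, and a corner counts as occluded when the rendered depth at its pixel is smaller than the corner's own depth. The box is axis-aligned in the camera frame for brevity, and all function names are hypothetical.

```python
from itertools import product


def box_corners(center, half_extents):
    """Eight (x, y, z) corners of an axis-aligned box in the camera frame."""
    cx, cy, cz = center
    hx, hy, hz = half_extents
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def project_corners(corners, fx, fy, cx, cy, width, height,
                    depth_map=None, tol=1e-3):
    """Project corners to pixels and flag outside-image and occluded points."""
    results = []
    for X, Y, Z in corners:
        if Z <= 0:
            # Behind the camera: no valid projection exists.
            results.append({"xy": None, "outside": True, "occluded": None})
            continue
        u = fx * X / Z + cx
        v = fy * Y / Z + cy
        outside = not (0 <= u < width and 0 <= v < height)
        occluded = None  # unknown without a depth map or outside the image
        if depth_map is not None and not outside:
            # Something was rendered in front of this corner.
            occluded = depth_map[int(v)][int(u)] < Z - tol
        results.append({"xy": (u, v), "outside": outside, "occluded": occluded})
    return results


# Usage: a 2 m cube centred 5 m in front of a 100x100 pixel camera,
# with a flat surface rendered at a depth of 4.5 m everywhere.
corners = box_corners((0.0, 0.0, 5.0), (1.0, 1.0, 1.0))
depth = [[4.5] * 100 for _ in range(100)]
points = project_corners(corners, 100.0, 100.0, 50.0, 50.0, 100, 100, depth)
```

In a full pipeline the object's pose with respect to the camera would first transform the box corners from the object frame into the camera frame; that step is omitted here.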

By producing many distinct virtual samples, one can generate an entire synthetic dataset to develop and, in the case of supervised learning, train computer vision algorithms. 

Deliverables


  1. A set of 3D assets for the objects found in supermarkets, ideally collected from commercially or freely available 3D asset repositories. 
  2. A DSL that can describe supermarket scenes. The DSL can use SVG or another vector graphics format to describe the scene. 
  3. A DSL parser that can generate a scene description. 
  4. A 3D scene generator from the scene description. 
  5. A sample generator that, given a 3D scene and a camera pose and parameters, generates a sample containing the data described in the Sample Generator section. 
For more information about the project, please contact Dawon Park at
