
This module defines the common dataset IO interfaces.


The common interface for importing a dataset.

class ImportHelper

Bases: object

A mixin class that adds helper functions for importing a dataset.

static format_image_data(uri: str, thumb_uri: Optional[str] = None, width: Optional[int] = None, height: Optional[int] = None, id_: Optional[int] = None, metadata: Optional[dict] = None, flag: int = 0, flag_ts: int = 0)

A helper function to format image data.

static format_annotation(category: str, label: str = LabelName.GroundTruth, label_type: str = 'GT', conf: float = 1.0, is_group: bool = False, bbox: Optional[Tuple[int, int, int, int]] = None, segmentation: Optional[List[List[int]]] = None, alpha_uri: Optional[str] = None, keypoints: Optional[List[Union[float, int]]] = None, keypoint_colors: Optional[List[int]] = None, keypoint_skeleton: Optional[List[int]] = None, keypoint_names: Optional[List[str]] = None, caption: Optional[str] = None, confirm_type: int = 0)

A helper function to format annotation data.

class Importer(name: str, id_: Optional[str] = None)

Bases: ImportHelper, ABC

The importer interface. Any subclass of Importer should implement the following methods:

  • __init__: perform the initialization.

  • __iter__: yield an (image, annotation list) tuple on every iteration.

The following methods are optional:
  • pre_run: a hook before the importing process.

  • post_run: a hook after the importing process.


A pre-run hook for subclass importers to prepare data.


A post-run hook for subclass importers to clean up data.

on_error(err: Exception)

A hook for handling errors.


Load existing user-added data from MongoDB so that it is not lost when the dataset is re-imported.


Save the manually added user data back.


The main importing process. It iterates over the dataset and imports every image and its annotations.


The start point of the importing process.
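The Importer contract (a required __iter__ plus optional pre_run/post_run/on_error hooks) can be illustrated with a small standalone sketch. Everything below is hypothetical: SketchImporter is a stand-in for Importer, its run() driver and the re-raising on_error default are assumptions about the hook order rather than the module's actual code, and the plain dicts yielded by __iter__ stand in for the output of format_image_data and format_annotation, whose exact shape is not documented here.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Tuple

# Hypothetical record shapes; the real ones come from format_image_data /
# format_annotation, whose output format is not documented here.
Image = dict
Annotation = dict


class SketchImporter(ABC):
    """Hypothetical stand-in for Importer that models only the hook order."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.imported: List[Tuple[Image, List[Annotation]]] = []

    @abstractmethod
    def __iter__(self) -> Iterator[Tuple[Image, List[Annotation]]]:
        """Required: yield one (image, annotation list) tuple per image."""

    def pre_run(self) -> None:
        """Optional hook; does nothing unless overridden."""

    def post_run(self) -> None:
        """Optional hook; does nothing unless overridden."""

    def on_error(self, err: Exception) -> None:
        # Assumed default: re-raise the error unchanged.
        raise err

    def run(self) -> None:
        # Hypothetical driver: pre_run, then the import loop, then post_run.
        self.pre_run()
        try:
            for image, annotations in self:
                self.imported.append((image, annotations))
        except Exception as err:
            self.on_error(err)
        self.post_run()


class InMemoryImporter(SketchImporter):
    """Example subclass: imports (uri, [category, ...]) records from a list."""

    def __init__(self, name: str, records: List[Tuple[str, List[str]]]) -> None:
        super().__init__(name)
        self.records = records
        self.log: List[str] = []

    def pre_run(self) -> None:
        self.log.append("pre_run")

    def post_run(self) -> None:
        self.log.append("post_run")

    def __iter__(self) -> Iterator[Tuple[Image, List[Annotation]]]:
        for uri, categories in self.records:
            yield {"uri": uri}, [{"category": c} for c in categories]


imp = InMemoryImporter("demo", [("a.jpg", ["cat"]), ("b.jpg", ["dog", "cat"])])
imp.run()
print(imp.log)            # ['pre_run', 'post_run']
print(len(imp.imported))  # 2
```

Keeping the loop in the base class and the data access in __iter__ means a subclass only describes how to read its format; setup, teardown and error handling stay in one place.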

class FileImporter(path: str, name: Optional[str] = None, id_: Optional[str] = None, enforce: bool = False)

Bases: Importer, ABC

The importer interface for file-based datasets. In addition to the abstract methods defined in the base Importer class, any subclass of FileImporter should implement the following method:

  • can_import: a static method that checks whether the given path can be imported by this importer.

The following method is optional:
  • collect_files: collect the files related to this dataset as a {file_tag: file_path} dict.

    By default, this function returns {LabelName.GroundTruth: dataset_file_path}. If there are other related files, such as prediction files, they should be collected here too.
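The FileImporter contract above (a required static can_import, and a collect_files that defaults to the ground-truth file and may add related files) can be sketched standalone as follows. This is an illustration under assumptions: SketchFileImporter is a hypothetical stand-in that models only path handling, GROUND_TRUTH and PREDICTION are hypothetical stand-ins for the real LabelName tags, and the "<name>.pred.json" naming convention is invented for the example.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict

GROUND_TRUTH = "GroundTruth"  # hypothetical stand-in for LabelName.GroundTruth
PREDICTION = "Prediction"     # hypothetical file tag


class SketchFileImporter(ABC):
    """Hypothetical stand-in for FileImporter (only path handling is modeled)."""

    def __init__(self, path: str) -> None:
        self.path = path

    @staticmethod
    @abstractmethod
    def can_import(path: str) -> bool:
        """Required: return True if this importer understands `path`."""

    def collect_files(self) -> Dict[str, str]:
        # Documented default: only the dataset file, tagged as ground truth.
        return {GROUND_TRUTH: self.path}


class JsonWithPredictionsImporter(SketchFileImporter):
    """Hypothetical importer: <name>.json plus an optional <name>.pred.json."""

    @staticmethod
    def can_import(path: str) -> bool:
        return os.path.isfile(path) and path.endswith(".json")

    def collect_files(self) -> Dict[str, str]:
        # Start from the default and add the related prediction file if present.
        files = super().collect_files()
        pred_path = self.path[: -len(".json")] + ".pred.json"
        if os.path.isfile(pred_path):
            files[PREDICTION] = pred_path
        return files


with tempfile.TemporaryDirectory() as tmp:
    gt_path = os.path.join(tmp, "coco.json")
    for p in (gt_path, os.path.join(tmp, "coco.pred.json")):
        with open(p, "w") as f:
            f.write("{}")
    print(JsonWithPredictionsImporter.can_import(gt_path))  # True
    files = JsonWithPredictionsImporter(gt_path).collect_files()

print(sorted(files))  # ['GroundTruth', 'Prediction']
```

Because can_import is static, it can be called on the class before any instance exists, which is what lets a dispatcher probe every importer cheaply.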

collect_files() -> dict

Collect the files related to this dataset, {file_tag: file_path}.

abstract static can_import(path: str)

Check if the given path can be imported by this importer.


The start point of the importing process.

classmethod get_subclasses()

Get all subclasses of this class. This is used together with the can_import method to choose a proper importer for a given path.

choose_importer_cls(target_path: str) -> Optional[Type[FileImporter]]

Choose the proper importer class for target_path: the class for which importer_class.can_import(target_path) returns True.


Parameters:

target_path – the target path to import, either a dataset or a dataset group.
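The get_subclasses / can_import pairing can be sketched as a recursive walk over the class tree followed by a first-match search. The class names below (BaseImporter, JsonImporter, FolderImporter, YoloFolderImporter) and their path checks are hypothetical; the recursion is needed because Python's built-in type.__subclasses__() returns only direct children, so an importer nested under an intermediate base class would otherwise be missed.

```python
from typing import List, Optional, Type


class BaseImporter:
    """Hypothetical stand-in for FileImporter."""

    @staticmethod
    def can_import(path: str) -> bool:
        return False

    @classmethod
    def get_subclasses(cls) -> List[type]:
        # __subclasses__() returns only direct children, so recurse to
        # collect the whole inheritance tree below cls.
        found: List[type] = []
        for sub in cls.__subclasses__():
            found.append(sub)
            found.extend(sub.get_subclasses())
        return found


class JsonImporter(BaseImporter):
    @staticmethod
    def can_import(path: str) -> bool:
        return path.endswith(".json")


class FolderImporter(BaseImporter):
    """Intermediate base; inherits can_import, which returns False."""


class YoloFolderImporter(FolderImporter):
    @staticmethod
    def can_import(path: str) -> bool:
        return path.rstrip("/").endswith("yolo")


def choose_importer_cls(target_path: str) -> Optional[Type[BaseImporter]]:
    # The first subclass whose can_import() accepts the path wins.
    for importer_cls in BaseImporter.get_subclasses():
        if importer_cls.can_import(target_path):
            return importer_cls
    return None


print(choose_importer_cls("data/yolo").__name__)  # YoloFolderImporter
print(choose_importer_cls("notes.txt"))           # None
```

If several importers accept the same path, this sketch returns whichever is found first, so the can_import checks are best kept mutually exclusive.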

import_dataset(target_path: str, enforce: bool = False) -> DataSet

Automatically choose the right importer for target_path and run the import task.

Parameters:

  • target_path – the target path to import, either a dataset or a dataset group.

  • enforce – run the import task even if the dataset has already been imported into MongoDB.
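One way the dispatch and the enforce flag could fit together is sketched below. It is an illustration under stated assumptions, not the module's implementation: CsvImporter, the IMPORTERS registry and the run() method are hypothetical; the explicit db dict stands in for the MongoDB collection and is passed as an extra parameter only to keep the sketch self-contained; and "reuse the stored dataset unless enforce is set" is the behavior assumed from the parameter description above.

```python
from typing import Dict, List, Optional


class CsvImporter:
    """Hypothetical importer standing in for a FileImporter subclass."""

    def __init__(self, path: str, enforce: bool = False) -> None:
        self.path = path
        self.enforce = enforce

    @staticmethod
    def can_import(path: str) -> bool:
        return path.endswith(".csv")

    def run(self, db: Dict[str, dict]) -> dict:
        # Assumption: without enforce, an already-imported dataset is reused.
        if self.path in db and not self.enforce:
            return db[self.path]
        version = db.get(self.path, {}).get("version", 0) + 1
        db[self.path] = {"path": self.path, "version": version}
        return db[self.path]


IMPORTERS: List[type] = [CsvImporter]  # hypothetical registry


def import_dataset(target_path: str, db: Dict[str, dict], enforce: bool = False) -> dict:
    # Pick the first importer that accepts the path, then run it.
    importer_cls: Optional[type] = next(
        (cls for cls in IMPORTERS if cls.can_import(target_path)), None
    )
    if importer_cls is None:
        raise ValueError(f"no importer can handle {target_path!r}")
    return importer_cls(target_path, enforce=enforce).run(db)


db: Dict[str, dict] = {}  # hypothetical stand-in for the MongoDB collection
first = import_dataset("pets.csv", db)
again = import_dataset("pets.csv", db)                 # reused, not re-imported
forced = import_dataset("pets.csv", db, enforce=True)  # re-imported
print(first["version"], again["version"], forced["version"])  # 1 1 2
```

Raising when no importer matches keeps a misspelled or unsupported path from failing silently later in the pipeline.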