process

deepdataspace.process

This module defines the common interface for processing a dataset, along with its implementations.

calculate_fnfp

deepdataspace.process.calculate_fnfp

This module defines the processor that calculates false negative and false positive analytics for a dataset.

class FNFPCalculator(dataset_path: str, enforce: bool = False)[source]

Bases: BaseProcessor

This processor calculates false negative and false positive analytics for a dataset.

classmethod dependencies() → List[str][source]

This processor depends on nothing.

classmethod should_auto_run() → bool[source]

This processor should run automatically on program start-up.

can_process()[source]

This processor can process any dataset.

static calculate_detection_thresh(dataset_id: str)[source]

For each label set,

static calculate_detection_result(dataset_id: str, label_id: str, thresholds: List[Dict])[source]

For a given label set, calculate FN/FP analytics for each image at the given precision thresholds.
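As an illustration of what per-image FN/FP counting at a threshold can look like, here is a minimal sketch. It assumes axis-aligned boxes in (x1, y1, x2, y2) form, greedy matching by IoU, and an IoU cutoff of 0.5. The helper names iou and count_fn_fp are hypothetical and are not part of deepdataspace, whose actual matching rules may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_fn_fp(
    gt_boxes: List[Box],
    predictions: List[Tuple[Box, float]],
    conf_thresh: float,
    iou_thresh: float = 0.5,
) -> Tuple[int, int]:
    """Return (false_negatives, false_positives) for one image.

    predictions is a list of (box, confidence) pairs. Predictions below
    conf_thresh are ignored; the rest are matched greedily, most confident
    first, each to the best still-unmatched ground-truth box.
    """
    kept = sorted(
        (p for p in predictions if p[1] >= conf_thresh),
        key=lambda p: p[1],
        reverse=True,
    )
    unmatched_gt = list(range(len(gt_boxes)))
    false_positives = 0
    for box, _conf in kept:
        best_idx, best_iou = None, iou_thresh
        for idx in unmatched_gt:
            score = iou(box, gt_boxes[idx])
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is None:
            false_positives += 1  # prediction overlaps no free ground truth
        else:
            unmatched_gt.remove(best_idx)  # ground truth is now accounted for
    # Every ground-truth box left unmatched is a false negative.
    return len(unmatched_gt), false_positives
```

Raising conf_thresh discards more predictions, which tends to lower false positives and raise false negatives; that trade-off is why the analytics are computed per threshold.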

process_dataset()[source]

The major steps of calculating FN/FP analytics for the dataset.

task_func = <@task: FNFPCalculator of dds>

processor

deepdataspace.process.processor

The common interface of processing a dataset.

class ProcessorMeta(name, bases, attrs)[source]

Bases: type

Metaclass of a process class. This metaclass will:
  • register all process classes

  • resolve the dependency links between them

  • register the task function of each process class
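The registration step can be pictured with a small, self-contained metaclass. This is only a sketch of the general pattern: RegistryMeta, Base, Alpha, Beta, and the abstract_base marker are hypothetical names, and the real ProcessorMeta may decide what to register differently.

```python
class RegistryMeta(type):
    """Hypothetical metaclass that records every concrete class it creates."""

    processors = []   # registered classes, in creation order
    name2class = {}   # class name -> class

    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        # Assumption: a class that sets abstract_base in its own body is a
        # base class and is skipped. Subclasses inherit the attribute but do
        # not have it in their own attrs, so they are still registered.
        if attrs.get("abstract_base", False):
            return
        RegistryMeta.processors.append(cls)
        RegistryMeta.name2class[name] = cls


class Base(metaclass=RegistryMeta):
    abstract_base = True


class Alpha(Base):
    pass


class Beta(Base):
    pass
```

Because registration happens at class-creation time, importing a module that defines a processor is enough to make it appear in both lookups.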

processors = [<class 'deepdataspace.process.calculate_fnfp.FNFPCalculator'>, <class 'deepdataspace.plugins.tsv.process.RankByFlags'>]
name2class = {'FNFPCalculator': <class 'deepdataspace.process.calculate_fnfp.FNFPCalculator'>, 'RankByFlags': <class 'deepdataspace.plugins.tsv.process.RankByFlags'>}
static register_class(name: str, cls: ProcessorMeta)[source]

Find all processor classes and order them by their dependency links. A processor that others depend on is placed before every processor that depends on it.
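Ordering by dependency links is a topological sort. The sketch below shows one way to do it with a depth-first traversal over a plain name-to-dependencies mapping. The function order_by_dependencies is hypothetical and is not the library's implementation.

```python
from typing import Dict, List


def order_by_dependencies(deps: Dict[str, List[str]]) -> List[str]:
    """Return names so that every dependency precedes its dependents.

    deps maps a processor name to the names it depends on.
    Raises ValueError on an unknown name or a dependency cycle.
    """
    ordered: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" or "done"

    def visit(name: str) -> None:
        if name not in deps:
            raise ValueError(f"unknown processor: {name}")
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle at: {name}")
        state[name] = "visiting"
        for dep in deps[name]:
            visit(dep)  # place all dependencies first
        state[name] = "done"
        ordered.append(name)

    for name in deps:
        visit(name)
    return ordered
```

For example, with deps = {"C": ["B"], "B": ["A"], "A": []}, the function places "A" before "B" and "B" before "C".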

class BaseProcessor(dataset_path: str, enforce: bool = False)[source]

Bases: object

The common interface for processing a dataset. Any subclass should implement all abstract methods.

Processors may be executed asynchronously by Celery. To support this, the processor class should implement the register_task_func class method, which returns a Celery task.
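To show the shape of a subclass, the sketch below uses a stand-in base class built on Python's abc module, so it runs without deepdataspace installed. SketchBaseProcessor and NoOpProcessor are hypothetical; the stand-in mirrors only the method names listed in this reference, and the bool return type of can_process is an assumption.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchBaseProcessor(ABC):
    """Stand-in for BaseProcessor; NOT the real deepdataspace class."""

    def __init__(self, dataset_path: str, enforce: bool = False):
        self.dataset_path = dataset_path
        self.enforce = enforce

    @classmethod
    @abstractmethod
    def dependencies(cls) -> List[str]:
        """Names of the processors this one depends on."""

    @classmethod
    @abstractmethod
    def should_auto_run(cls) -> bool:
        """Whether to run automatically at program start-up."""

    @abstractmethod
    def can_process(self) -> bool:
        """Whether this processor applies to the dataset (assumed bool)."""

    def process_dataset(self) -> None:
        raise NotImplementedError("derived classes implement the work here")


class NoOpProcessor(SketchBaseProcessor):
    """A processor with no dependencies that only records that it ran."""

    @classmethod
    def dependencies(cls) -> List[str]:
        return []

    @classmethod
    def should_auto_run(cls) -> bool:
        return False

    def can_process(self) -> bool:
        return bool(self.dataset_path)

    def process_dataset(self) -> None:
        self.done = True
```

Leaving any abstract method unimplemented makes the subclass impossible to instantiate, which catches an incomplete processor early.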

task_func = None
abstract classmethod dependencies() → List[str][source]

The processors that this processor depends on.

abstract classmethod should_auto_run() → bool[source]

Whether this processor should run automatically at program start-up.

abstract can_process()[source]
property dataset
property is_processed: bool

Check whether the dataset has been processed before.

property should_process

Check whether the processing task should be run.

process_dataset()[source]

Process a subset of this dataset. Derived classes should implement this method accordingly.

update_dataset_status(status)[source]
process_dataset_context()[source]
run() → Union[None, Dict[str, Any]][source]

This function invokes the whole processing procedure. It starts processing the dataset directly in the current thread. To process the dataset asynchronously, use run_async instead.

static update_task_status(task_id, update_data: dict)[source]
static on_async_start(task)[source]

This function is called before the processor is executed by Celery.

Parameters:

task – celery task instance.

static on_async_success(task, retval, task_id, args, kwarg)[source]

This function is called when Celery has executed the processor successfully.

Parameters:

task – celery task instance.

static on_async_fail(task, exc, task_id, args, kwargs, einfo)[source]

This function is called when Celery fails to execute the processor.

Parameters:

task – celery task instance.

classmethod register_task_func()[source]

This function registers the processor class as a Celery task.

classmethod run_async(dataset_path: str, enforce: bool)[source]

Run the processor asynchronously with Celery.

process_dataset(dataset_dir: str, enforce: bool = False, auto_triggered=False)[source]

Process the dataset with all registered Processors.

Parameters:
  • dataset_dir – the dataset dir to be processed.

  • enforce – force the import task to run, even if the dataset has already been processed.

  • auto_triggered – whether this function is being called automatically at program start-up.
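One plausible reading of how these parameters interact is sketched below with stand-in classes. The names run_processors, AutoProc, and ManualProc are hypothetical, and the rule that an automatic start-up run skips processors whose should_auto_run() returns False is an assumption drawn from the descriptions above, not confirmed library behaviour.

```python
from typing import Iterable, List


class AutoProc:
    """Stand-in processor that opts in to automatic runs."""

    def __init__(self, dataset_dir: str, enforce: bool = False):
        self.dataset_dir = dataset_dir
        self.enforce = enforce

    @classmethod
    def should_auto_run(cls) -> bool:
        return True

    def run(self) -> None:
        pass  # real work would happen here


class ManualProc(AutoProc):
    """Stand-in processor that runs only on an explicit request."""

    @classmethod
    def should_auto_run(cls) -> bool:
        return False


def run_processors(
    processor_classes: Iterable[type],
    dataset_dir: str,
    enforce: bool = False,
    auto_triggered: bool = False,
) -> List[str]:
    """Run each processor class in order and return the names that ran."""
    ran = []
    for cls in processor_classes:
        # Assumption: an automatic start-up run skips opted-out processors.
        if auto_triggered and not cls.should_auto_run():
            continue
        cls(dataset_dir, enforce).run()
        ran.append(cls.__name__)
    return ran
```

Under this reading, an explicit call runs every registered processor, while a start-up call runs only those that opted in.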