Performance Evaluation

TRIOSlib offers several methods to measure the performance of trained image transforms. To compute more reliable performance measures, we recommend reserving some input-output pairs exclusively for this task in a test set. Measuring performance on the input-output pairs used during training is known to produce optimistic values that do not reflect performance on unseen images.
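
For instance, if all available input-output pairs are listed in a single trios.Imageset, part of it can be set aside before training. The sketch below is only an illustration: it assumes an Imageset can be built from a list of its entries, and the file names are hypothetical.

import random
import trios

if __name__ == '__main__':
    # hypothetical file listing every available input-output pair
    images = trios.Imageset.read('jung-images/all.set')

    entries = list(images)
    random.shuffle(entries)

    # keep the last 10 pairs exclusively for evaluation
    trainset = trios.Imageset(entries[:-10])
    testset = trios.Imageset(entries[-10:])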

In all the examples on this page we set trios.show_eval_progress = False to hide the progress messages printed by the evaluation code. We also use the trained image transform from Using trained transforms and the jung dataset from the Introduction.

The simplest way of measuring performance is to use the WOperator.eval method. It receives a trios.Imageset as input and returns the pixel error of the image transform, that is, the proportion of pixels that were assigned an incorrect gray level.

With the keyword argument binary=True, eval computes the binary evaluation measures (True Positives, True Negatives, False Positives, False Negatives) and returns them in that order.

WOperator.eval also supports computing the error independently for each image. Pass the keyword argument per_image=True to obtain, along with the overall error, a list with the per-image results. See the code below for an example of these options in use.

# file docs/examples/using_woperator_eval.py
import trios.shortcuts.persistence as p
import trios.shortcuts.evaluation as ev
import trios

trios.show_eval_progress = False

if __name__ == '__main__':
    operator = p.load_gzip('trained-jung.op.gz')
    testset = trios.Imageset.read('jung-images/test.set')

    print('Error:', operator.eval(testset))
    print('Binary:', operator.eval(testset, binary=True))
    print('Error per image:', operator.eval(testset, per_image=True))
Error: 0.005500750102286675
Binary: (11721, 185163, 563, 526)
Error per image: (0.005500750102286675, [(118, 19767), (95, 21930), (115, 18197), (72, 18931), (116, 20463), (67, 19107), (154, 19646), (54, 20293), (164, 20033), (134, 19606)])
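
The three calls above are consistent with each other: the binary counts add up to the total number of evaluated pixels, and the overall error is the proportion of misclassified pixels, (FP + FN) / (TP + TN + FP + FN). The per-image list appears to contain (wrong pixels, evaluated pixels) pairs; this interpretation is inferred from the numbers above rather than stated by the API, and the short check below confirms that they add up to the same overall error.

# sanity check on the output above (the meaning of the pairs is an assumption)
TP, TN, FP, FN = 11721, 185163, 563, 526
total = TP + TN + FP + FN                  # 197973 evaluated pixels
print((FP + FN) / total)                   # 0.00550075... == reported pixel error

per_image = [(118, 19767), (95, 21930), (115, 18197), (72, 18931), (116, 20463),
             (67, 19107), (154, 19646), (54, 20293), (164, 20033), (134, 19606)]
print(sum(w for w, n in per_image))        # 1089 == FP + FN
print(sum(n for w, n in per_image))        # 197973 == total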

In the code above we called eval three times with different parameters, and each time it applied the image transform to the same test images; this is necessary because WOperator.eval does not save its results. Besides the obvious waste of time (especially for large test sets), the result images are discarded, even though visually inspecting them is a very good way of gaining insight into what the image transform is doing and how to improve its performance.

A better way of evaluating performance is to use the functions in the trios.shortcuts.evaluation module. In all examples we import this module as ev, so we refer to its functions using the ev. prefix.

We can use the ev.apply_batch(op, testset, result_folder) function to apply an image transform to all images in a testset and save the results in the specified folder. Then we can call ev.compare_folders(testset, result_folder, window) to compute the same performance measures as WOperator.eval. Do not forget to pass operator.window to ev.compare_folders! Since the estimated image transforms are local, pixels whose neighborhood falls outside the image are not evaluated.
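
To illustrate what this means in practice (the exact border handling is an assumption based on the statement above, not documented behaviour): with a window of shape (h, w), a pixel can only be compared when its whole neighborhood fits inside the image, so roughly h//2 rows and w//2 columns at each border are skipped.

# rough illustration with a hypothetical 9x7 window and a 200x300 image
h, w = 9, 7
rows, cols = 200, 300
evaluated = (rows - (h - 1)) * (cols - (w - 1))   # pixels whose neighborhood fits entirely
print(evaluated, 'of', rows * cols, 'pixels are compared')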

See the code below for a simple example.

# file docs/examples/evaluation_functions.py
import trios
import trios.shortcuts.persistence as p
import trios.shortcuts.evaluation as ev

trios.show_eval_progress = False

if __name__ == '__main__':
    operator = p.load_gzip('trained-jung.op.gz')
    testset = trios.Imageset.read('jung-images/test.set')

    ev.apply_batch(operator, testset, 'jung-result')
    print('Error:', ev.compare_folders(testset, 'jung-result', operator.window))
    print('Binary:', ev.compare_folders(testset, 'jung-result', operator.window, binary=True))
    print('Error per image:', ev.compare_folders(testset, 'jung-result', operator.window, per_image=True))
Error: 0.00550075010229
Binary: [ 11721 185163    563    526]
Error per image: (0.0055007501022866752, [(118, 19767), (95, 21930), (115, 18197), (72, 18931), (116, 20463), (67, 19107), (154, 19646), (54, 20293), (164, 20033), (134, 19606)])
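
Since the results of apply_batch stay on disk, we can now open the images in 'jung-result' and inspect them visually, which eval did not allow. The snippet below is a minimal sketch: it assumes Pillow is installed, and since the file names written by apply_batch are not specified here, it simply lists the folder.

import os
from PIL import Image

for name in sorted(os.listdir('jung-result')):
    img = Image.open(os.path.join('jung-result', name))
    print(name, img.size, img.mode)
    # img.show()  # uncomment to open each result image for visual inspection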

Finally, we can compute Accuracy, Recall, Precision, Specificity, Negative Predictive Value and the F1 measure by calling ev.binary_measures(TP, TN, FP, FN), where TP, TN, FP and FN were obtained by calling WOperator.eval or ev.compare_folders with binary=True. The example below prints a performance report based on these measures for problems with binary output images.

# file docs/examples/performance_report.py
import trios.shortcuts.persistence as p
import trios.shortcuts.evaluation as ev
import trios 

trios.show_eval_progress = False

if __name__ == '__main__':
    operator = p.load_gzip('trained-jung.op.gz')
    testset = trios.Imageset.read('jung-images/test.set')

    mes = ev.compare_folders(testset, 'jung-result', operator.window, binary=True)
    acc, recall, precision, specificity, neg_pred, F1 = ev.binary_measures(*mes)

    print('''
Accuracy: %f
Recall: %f
Precision: %f
Specificity: %f
NPV: %f
F1: %f'''%(acc, recall, precision, specificity, neg_pred, F1))
Accuracy: 0.994369
Recall: 0.954945
Precision: 0.954168
Specificity: 0.996972
NPV: 0.997025
F1: 0.954557
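
For reference, these measures follow the standard definitions below. This is only a sketch of the underlying formulas, assuming ev.binary_measures computes something along these lines; it is not the library implementation.

def binary_report(TP, TN, FP, FN):
    # standard definitions of the binary classification measures
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    recall = TP / (TP + FN)             # sensitivity / true positive rate
    precision = TP / (TP + FP)          # positive predictive value
    specificity = TN / (TN + FP)        # true negative rate
    npv = TN / (TN + FN)                # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, specificity, npv, f1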