• Training Image Augmentation by Color

    "Data augmentation in data analysis are techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data. It acts as a regularizer and helps reduce overfitting when training a machine learning model." (from Wikipedia)

    The images I have do not have a uniform distribution of colors per block type, so I was afraid my networks would learn to recognize the block types by their colors.

    One common piece of advice I got was to augment the data by manipulating the colors in some way.
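    As a rough illustration of what manipulating the colors can mean, here is a minimal sketch that shuffles the RGB channels and rescales brightness. The function name `augment_colors`, the brightness range 0.7 to 1.3, and the random test image are assumptions made for the example, not the method used in the post.

```python
import numpy as np

def augment_colors(image, rng):
    """Return a color-augmented copy of an RGB image (H x W x 3, uint8)."""
    # Randomly reorder the R, G, B channels so that hue is a less reliable cue.
    permuted = image[..., rng.permutation(3)]
    # Scale brightness by a random factor (assumed range), then clip to 0..255.
    factor = rng.uniform(0.7, 1.3)
    scaled = permuted.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Hypothetical input: a small random image standing in for a training photo.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
augmented = augment_colors(image, rng)
```

    The shapes and labels stay the same and only the colors change, so the augmented copies can be added to the training set next to the originals.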

    (more...)

  • Debugging Machine Learning

    Neural networks are close to being black boxes. You train a network and check if it works, but if it doesn't, there is no debugging procedure that leads you to the cause. You can only guess: try different data, or make some other random changes and see if that improves the score.

    I have been training my networks on batches of data, and the accuracy of their predictions was improving. Until it stopped. I did not know why. I suspected that the data was biased and the networks had learned something they shouldn't have (that is probably overfitting), and that the new batches were different, so the networks did not work on them. I tried measuring that bias, as described in Unwanted Correlations, and it proved difficult. But then I noticed that the images that don't work have a special property: they end up in my data pipeline as rectangles, while all the rest are processed into squares.
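    A check of this kind is easy to automate. The sketch below counts square and non-square images from a list of (height, width) pairs; the function name `summarize_shapes` and the sample shapes are hypothetical, and in a real pipeline the pairs would come from the processed images themselves.

```python
from collections import Counter

def summarize_shapes(shapes):
    """Count square vs. rectangular images from (height, width) pairs."""
    return Counter("square" if h == w else "rectangle" for h, w in shapes)

# Hypothetical shapes as they might come out of a preprocessing step.
shapes = [(224, 224), (224, 224), (224, 300), (224, 224)]
summary = summarize_shapes(shapes)

# Indexes of the images that did not end up square, for closer inspection.
odd_ones = [i for i, (h, w) in enumerate(shapes) if h != w]
```

    Running such a summary per batch would show at a glance whether a new batch is processed differently from the earlier ones.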

    (more...)

  • Unwanted Correlations

    I am training neural networks to detect some objects by their shapes. The problem is that the photo sets are not color-balanced: there are correlations between the object color and the object type, so the networks are probably learning the colors instead of recognizing the shapes.

    I would like to have two tools. The first is a tool to measure the bad correlation in the data set. The second is a tool to help choose photos from a photo set to create a more color-balanced data set. The second one might be difficult.

    This looks like a very common and generic problem, but I have trouble finding good solutions for it.
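    One possible way to approximate the first tool is a color-only baseline: classify each photo by its mean color alone and see how far above chance the result is. The sketch below uses a nearest-centroid rule; the function name `color_only_accuracy`, the labels, and the sample colors are all made up for illustration, and scoring on the same samples used to build the centroids gives an optimistic number.

```python
from collections import defaultdict

def color_only_accuracy(samples):
    """samples: list of ((r, g, b) mean color, label) pairs.

    Returns the fraction of samples whose label is predicted correctly
    from the mean color alone, using a nearest-centroid rule.
    """
    # Accumulate per-label color sums and counts: [r_sum, g_sum, b_sum, count].
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for color, label in samples:
        acc = sums[label]
        for i in range(3):
            acc[i] += color[i]
        acc[3] += 1
    centroids = {
        label: tuple(v / acc[3] for v in acc[:3]) for label, acc in sums.items()
    }

    def nearest(color):
        # Label whose mean color has the smallest squared distance to `color`.
        return min(
            centroids,
            key=lambda lab: sum((c - m) ** 2 for c, m in zip(color, centroids[lab])),
        )

    hits = sum(nearest(color) == label for color, label in samples)
    return hits / len(samples)

# Hypothetical, strongly color-biased data: type_a is reddish, type_b is bluish.
samples = [
    ((200, 30, 30), "type_a"),
    ((210, 40, 35), "type_a"),
    ((30, 30, 200), "type_b"),
    ((40, 35, 210), "type_b"),
]
accuracy = color_only_accuracy(samples)
```

    With two equally frequent types, chance level is 0.5, so an accuracy close to 1.0 suggests that color alone is enough to tell the types apart in that data set.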

    (more...)

  • My script for installing Python TensorFlow Object Detection libs

    I had a lot of trouble installing the Python TensorFlow libs required for object detection on Ubuntu, so I wrote a script that works for me and my collaborators (Ubuntu 20.04 and 21.04):

    Python TensorFlow Object Detection installation

    It installs all the system prerequisites (like protobuf), CUDA, and the Python libs, and also downloads one neural network (because I worked with it).

    (more...)