Isidro Jr
Member
You can create a labeling system that takes in video: on the video itself you manually label the objects of interest (by drawing a bounding box) and then select the number of frames (e.g. 20 frames) you would like to use for training. This bypasses manually labeling all 20 frames. The labeling is still done at the frame level in the backend code, but in essence you only draw the box on 1 screenshot (at 20 FPS, that covers the 20 images in 1 second of video).
Wasn't the idea that video would let them do far LESS supervised labeling?
Like, when they did it frame by frame, you had to label an object TRUCK in every frame. But with video, if a human labels a thing TRUCK in frame 1, the system can self-label that same object in future frames for as long as it remains visible, thus saving a ton of human effort?
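To make the "label once, self-label the following frames" idea concrete, here is a rough sketch. It assumes each later frame already comes with candidate boxes from some tracker or detector (not shown), and it simply follows the human-drawn box by overlap (IoU) until nothing matches, i.e. the object is no longer visible. The names `propagate_label`, `candidates_per_frame` and the `min_iou` threshold are made up for illustration, not taken from any particular tool.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate_label(
    seed_box: Box,
    label: str,
    candidates_per_frame: Sequence[Sequence[Box]],
    min_iou: float = 0.5,  # assumed threshold; tune for your footage
) -> List[Tuple[int, str, Box]]:
    """Carry one human-drawn label from frame 0 into the following frames.

    candidates_per_frame[i] holds the candidate boxes for frame i + 1.
    Propagation stops at the first frame with no sufficiently overlapping box.
    """
    labels: List[Tuple[int, str, Box]] = []
    current = seed_box
    for frame_idx, candidates in enumerate(candidates_per_frame, start=1):
        best = max(candidates, key=lambda c: iou(current, c), default=None)
        if best is None or iou(current, best) < min_iou:
            break  # object lost: a human has to label again from here
        labels.append((frame_idx, label, best))
        current = best  # follow the object as it moves
    return labels
```

With 20 frames selected, the human draws one box and the loop produces up to 19 more labeled frames, as long as the object stays in view.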
Now, take everything I wrote above and, instead of labeling manually, have the already-trained system (your model) make inferences about where the object of interest is, and create more training data that way.
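A minimal sketch of that last step, assuming your trained model can be called on a frame and returns `(label, box, confidence)` tuples. The `model` callable, `pseudo_label` and the `min_conf` cutoff are placeholders of mine, not a specific library's API. Only confident predictions are kept as new training labels.

```python
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box, float]       # (label, box, confidence)


def pseudo_label(
    frames: Iterable[Any],
    model: Callable[[Any], List[Detection]],  # stand-in for your trained model
    min_conf: float = 0.8,                    # assumed cutoff; tune per dataset
) -> List[Tuple[int, str, Box]]:
    """Run the model on every frame and keep confident detections as labels."""
    new_training_data: List[Tuple[int, str, Box]] = []
    for frame_idx, frame in enumerate(frames):
        for label, box, conf in model(frame):
            if conf >= min_conf:
                new_training_data.append((frame_idx, label, box))
    return new_training_data
```

The confidence filter matters here: anything the model gets wrong and you keep ends up in the next round of training, so low-confidence frames are better sent back to a human.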