Early computer vision modeling relied on breaking images down into simple shapes and colors. This approach, however, fails to recognize familiar objects when they appear in unfamiliar variations, say, a curled-up cat or a dog wearing a Santa hat.
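The shape-and-color approach meant hand-crafting fixed filters that respond to specific patterns. A minimal sketch of such a hand-crafted feature detector (numpy only, with a toy image invented for illustration) is a Sobel kernel convolved over a grayscale image to find vertical edges:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over a grayscale image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Hand-crafted vertical-edge detector (Sobel kernel)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 1.0

edges = convolve2d(img, sobel_x)
print(edges)  # strong response only where the brightness changes
```

The weakness the text describes follows directly: a filter tuned to one pattern stops responding once the object is rotated, deformed, or partly occluded, which is why a curled-up cat defeats it.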
In her 2015 TED talk, computer scientist Fei-Fei Li highlighted this limitation of artificial intelligence. Animal visual processing took millions of years to evolve the ability to name objects, infer spatial relationships, and detect emotion. Children learn to see by amassing images, acquiring roughly one picture every 200 milliseconds, and so build their visual repertoire through years of intensive real-life training.
Together with her team at the Stanford Vision Lab, she shifted focus to training data akin to what a child sees day to day. To that end, she harnessed online crowdsourcing platforms, enlisting close to 50,000 workers to sort nearly a billion images. This led to a database of 15 million images in 22,000 categories, made free to researchers and consultants using it in image-processing applications.
The team then fed the gathered data to an existing algorithm, the convolutional neural network, which relies on hierarchical layers of neuron-like nodes, leading to significant improvements in processing busy images. They next focused on teaching computers to describe what they see, with mixed success, producing one of the first models able to describe the images it sees in human-like sentences.
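The hierarchy of neuron-like nodes can be sketched in a few lines: each layer convolves learned kernels over the previous layer's output, applies a nonlinearity, and downsamples, so that deeper layers respond to ever-larger, more abstract patterns. This is a toy numpy illustration, not the actual ImageNet-era architecture; the random kernels stand in for weights that would be learned from labelled data.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k):
    """Valid 2-D convolution, stride 1."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)          # nonlinearity between layers

def max_pool(x, s=2):
    """Keep the strongest response in each s-by-s patch (downsampling)."""
    h, w = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

# In a real CNN these kernels are learned from labelled data; random
# values stand in here purely to show the data flow.
k1 = rng.standard_normal((3, 3))   # layer 1: low-level features (edges, blobs)
k2 = rng.standard_normal((3, 3))   # layer 2: combinations of layer-1 features

img = rng.standard_normal((16, 16))       # toy grayscale input
h1 = max_pool(relu(conv2d(img, k1)))      # 16x16 -> 14x14 -> 7x7
h2 = max_pool(relu(conv2d(h1, k2)))       # 7x7  -> 5x5  -> 2x2
print(h2.shape)
```

Each successive layer "sees" a wider patch of the original image, which is what lets trained networks of this kind pick familiar objects out of busy scenes.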
The implications of this ongoing research are not limited to consumer electronics; it will no doubt lead to improvements in robotics, visual-assistance software, surgical optical instruments, and space exploration.