Our subjectivity, the unique tastes, feelings and opinions found within each of us, has long been thought to be what makes us human. As opposed to the visible performances of our outward selves, which could be–and in fact have been–replicated by automated machines, these aspects of our personality are considered to be deeper, impenetrable even to the all-seeing eye of technological progress. Rather than precise mechanisms to be deconstructed and understood by scientists, their elucidation was left to the words of philosophers, poets, and novelists alike.
Times, however, have changed, and Google’s NIMA (short for “Neural Image Assessment”) is just one more reminder that nothing is too sacred to be performed, predicted, and ultimately replaced by complex computing algorithms.
“Our proposed network can be used to not only score images reliably and with high correlation to human perception, but also it is useful for a variety of labor intensive and subjective tasks such as intelligent photo editing…” -Hossein Talebi, Google Software Engineer
Built from Google’s “convolutional neural networks” (CNNs)–artificial intelligence machines taught to recognize objects, such as a car, by being fed millions of images of various cars–NIMA is “trained to predict which images a typical user would rate as looking good.” Rather than restricting itself to recognizing visible qualities, that is, NIMA proposes to extrapolate from an image to what you would think about that image. Unsurprisingly, it has proven to be remarkably accurate.
The above image displays the results of various tests undertaken by NIMA to predict experts’ scoring in a digital photography contest. The number on the left is the score (from 1 to 10) generated by NIMA, while the number in parentheses is the score actually attained by averaging the ratings of 200 people. As you can see, many predictions fall within a tenth of a point, with only one outlier differing by more than a point. For a first iteration, this is truly impressive.
NIMA can also be used to compare different images of the same subject, ranking them according to levels of distortion. Above are various pictures of the same sailboat, which NIMA then ranked according to pictorial accuracy.
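To make the comparison between predicted and human scores concrete, here is a minimal Python sketch of how such a tally might be done. The score pairs are invented purely for illustration and are not NIMA’s actual results; the function name `summarize` and the thresholds are likewise assumptions, not anything taken from Google’s work.

```python
from statistics import mean

# Hypothetical (predicted, human_average) pairs on the 1-10 scale.
# These numbers are made up for illustration; they are NOT NIMA's results.
PAIRS = [(5.5, 5.6), (6.2, 6.2), (4.8, 4.9), (7.1, 5.9)]


def summarize(pairs, tolerance=0.1, outlier_gap=1.0):
    """Summarize how far predicted scores sit from the human averages."""
    # Round to two decimals so floating-point noise does not flip a
    # comparison that is exact on paper (e.g. 4.9 - 4.8 == 0.1).
    gaps = [round(abs(predicted - human), 2) for predicted, human in pairs]
    return {
        "mean_gap": mean(gaps),
        "within_tolerance": sum(gap <= tolerance for gap in gaps),
        "outliers": sum(gap > outlier_gap for gap in gaps),
    }


if __name__ == "__main__":
    print(summarize(PAIRS))
```

With these invented pairs, three predictions land within a tenth of a point and one misses by more than a full point, the same shape of result described above.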
Lastly, NIMA’s scores can be used, when maximized as part of a loss function, to enhance the “perceptual quality of an image.” Given their accuracy in predicting human reactions to an image, the thinking goes, paying close attention to these scores will be a particularly useful tool for automating image enhancement in a way that will be consistently successful and appear “real” to its eventual human recipients’ retinas.
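What “maximized as part of a loss function” might look like can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `toy_score` is a crude stand-in for a learned aesthetic scorer (it is not NIMA), the “image” is a short list of brightness values between 0 and 1, and the enhancement is a single contrast gain chosen by grid search rather than by any method Google describes.

```python
def toy_score(pixels):
    """Hypothetical stand-in for a learned aesthetic scorer (NOT NIMA).

    Rewards a wider spread of brightness values, capped at 10.
    """
    spread = max(pixels) - min(pixels)
    return min(10.0, 10.0 * spread)


def adjust_contrast(pixels, gain):
    """Scale each pixel's distance from mid-gray (0.5), clamped to [0, 1]."""
    return [min(1.0, max(0.0, 0.5 + gain * (p - 0.5))) for p in pixels]


def loss(original, enhanced, score_weight):
    """Fidelity penalty minus weighted score.

    Subtracting the score means that minimizing the loss pushes the
    score up, while the fidelity term keeps the result near the original.
    """
    fidelity = sum((a - b) ** 2 for a, b in zip(original, enhanced)) / len(original)
    return fidelity - score_weight * toy_score(enhanced)


def best_gain(pixels, gains, score_weight=0.01):
    """Pick the contrast gain with the lowest loss from a list of candidates."""
    return min(
        gains,
        key=lambda g: loss(pixels, adjust_contrast(pixels, g), score_weight),
    )


if __name__ == "__main__":
    image = [0.4, 0.5, 0.6]
    candidates = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
    print(best_gain(image, candidates))
```

The point of the sketch is the trade-off: with the score term switched off (`score_weight=0`) the best choice is to leave the image alone, and as the weight grows the chosen gain drifts toward whatever the scorer prefers.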
Now that Google is creating algorithms capable of acting as “reasonable, though imperfect, proxies for human taste in photos and possibly videos,” what’s left for us? Truth be told, not much. NIMA, they tout, can eventually (if not already) “find the best picture among many,” improve picture-taking with “real-time feedback,” and, as far as post-production goes, “guide enhancement operators to produce perceptually superior results.”
It cannot, however, recognize meaning, and that may be the key to remaining relevant. While NIMA works effectively as a tool to curate the aesthetic possibilities of an image, guiding your hand to create the most appealing picture possible, it cannot tell you what to photograph. The context in which we as artists operate is forever shifting, as is the relative importance of our various cultural symbols. Determining what is worthy of being transformed from a flighty, evanescent event into a timeless artifact testifying to its own reality is, when done correctly, a reflection on this context, and something that cannot be made concrete or predicted with accuracy.
Though of course we once said that about subjectivity, too, didn’t we?
Feature Image Courtesy Arthur Osipyan