AI: The Era of Big Integration
Unifying Disciplines within Artificial Intelligence
by Song-Chun Zhu

What is cognitive AI?

Learn about cognitive AI through six disciplines.


The promise of artificial intelligence (AI) with human-level cognition has been beyond our grasp for the past sixty years. Over the past decade, advances in AI have created breakthroughs in many of the field’s most historically intractable challenges: speech-to-text, language translation, and image and pattern recognition. With such advances, how far can we possibly be from surmounting all remaining obstacles and building machines that not only think like humans but also improve our lives by reducing drudgery and solving problems great and small?

In this study, Professor Song-Chun Zhu of UCLA offers his perspective on the history, current state, and path forward for next-generation cognitive AI. He dispels the myth that human-level AI is already solved; in some regards, he argues, we have barely begun. He also imagines and illustrates a future in which humans and machines collaborate with a kind of mutual "understanding." Written in plain language, this study seeks to draw AI newcomers into the field, while including enough technical material to keep an AI insider engaged.

In all, Professor Zhu explains a paradigm shift that moves away from big data for small tasks towards small data for big tasks. In the process, Professor Zhu challenges today’s "ABCD" conception of AI:

AI ≠ Big Data + Computing Power + Deep Learning

To appreciate the differences between human and artificial intelligences, let's, for a moment, compare crows and parrots. Like deep learning networks and other algorithms that use big data, parrots mimic the sounds of the world around them.

But does mimicry, or imitation, suggest authentic learning, particularly the mastery of concepts and the application of those concepts to new contexts? It is doubtful. Crows, on the other hand, are similar to humans: armed with spatio-temporal-causal reasoning and a sense of how things work, they observe the world with singular intent.

In the case of human intelligence, we are capable of imagining the thoughts of others. This capacity gives us the power to reason not only about space, time, and the physics of cause and effect, but also about the intent and values of those around us. Such social reasoning is the basis of communication and the ultimate prize of language.

Human-level AI must be built with all these capabilities.

Raised in rural China, Song-Chun Zhu completed his Ph.D. at Harvard and Brown with Professor David Mumford, a Fields Medalist. Zhu’s unique Sino-American perspective on AI covers as much range culturally as it does scientifically and informs the body of work he has built in AI over twenty-five years, including the last fifteen at UCLA.

Zhu makes a case for integrating the disparate disciplines that comprise the AI research fields and sets a course that may yet bring AI to the next level on the ladder to true human-level cognition.

If you believe human-level AI is already here because your mobile phone answers when you speak, this study will clarify just how far we have to go.

If you believe human-level AI is impossible, this work may just change your mind.

Object and Pattern Recognition, Scene Understanding, Image Processing and Activity Analysis
Vision is the most important source of information for the human brain, and computer vision is the "entrance hall" of AI. My research started here. Computer vision is a vast and complicated discipline, and many of its problems are far from being solved.
Processing physical and social "common sense"
Intellectual dark matter already belongs to the combination of perception and cognition. From there we enter the mind, the inner world of humans and animals. The inner world reflects the external world, similarly impacted and distorted by the motive of tasks.
Voice recognition and synthesis, text and dialogue analysis and generation
The human language center is unique. Interestingly, it is in the vicinity of the action planning area. Why do we talk? Language originated as a way to convey a message from one person’s mind to the mind of another. That message includes the ambient knowledge discussed in the previous section and the intention of the plan, summarized as the figure of the three triangular expressions. The hope is to form a consensus through dialogue and to arrive at common task planning, that is, to act in concert. Therefore, the basis of language is that people are attempting to cooperate.
Interaction, Confrontation and Cooperation of Multi-Agent Systems, Game Theoretic Equilibria and Social Norms
To communicate with people, a robot must understand human values. Philosophy and economics share a basic assumption: a rational person’s behavior and decision-making are driven by the maximization of his own interests. If you rule out the possibility of deceit, observing a rational person’s behavior and choices allows you to reverse engineer his reasoning and learning, and to estimate his values.
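The reverse-engineering idea above can be illustrated with a toy Python sketch. It is not Professor Zhu's method, only a minimal example under stated assumptions: each option is described by a hypothetical feature pair `(money, free_time)`, the person's values are an unknown weight for each feature, and a rational, non-deceptive person always picks the option with the strictly highest weighted sum. The function name `consistent_weights`, the features, and the data are all invented for illustration.

```python
from itertools import product


def consistent_weights(observations, candidates):
    """Keep the candidate value weights that explain every observed choice.

    observations: list of (options, chosen) pairs, where each option is a
                  tuple of feature values (hypothetical: money, free_time).
    candidates:   list of weight tuples to test.
    A candidate survives only if, for every observation, the chosen option
    has strictly higher utility than each alternative (assumed rationality).
    """
    def utility(weights, option):
        return sum(w * x for w, x in zip(weights, option))

    survivors = []
    for weights in candidates:
        explains_all = all(
            utility(weights, chosen) > utility(weights, other)
            for options, chosen in observations
            for other in options
            if other != chosen
        )
        if explains_all:
            survivors.append(weights)
    return survivors


# Hypothetical observations: (available options, option actually chosen).
observations = [
    ([(3, 1), (1, 3)], (1, 3)),  # gave up money for more free time
    ([(4, 0), (2, 1)], (2, 1)),  # again preferred some free time
]

# Candidate values: integer weights 0..4 for (money, free_time).
candidates = list(product(range(5), repeat=2))

estimated_values = consistent_weights(observations, candidates)
print(estimated_values)
```

Every surviving weight pair values free time more than twice as much as money, which is what the two observed choices imply. Each further observation can only shrink the set of surviving candidates, so the estimate sharpens as more behavior is seen.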
Motor control, design, motion planning
Robotics is the platform for large tasks. It not only dispatches tasks such as visual recognition, language communication, and cognitive reasoning, but also expends a lot of effort to change the environment. People and robots need to perform tasks that can be broken down into a series of actions, and each action is meant to change a fluent of their environment.
Statistical modeling and predictive analytics
The other five areas are "problem areas" at various levels, called Domains. We attempt to put these problems in one framework and to seek a unified expression and algorithm. Machine learning, by contrast, studies the "methods": how to fit models and acquire the necessary knowledge.