Academician Li Deyi
The Four Elements of Machine Cognition
Li Deyi¹, Yin Jialun²*, Zhang Tianlei³, Han Wei⁴, Bao Hong⁵
1. Department of Computer Science and Technology, Tsinghua University, Professor, Doctoral Supervisor, Academician of the Chinese Academy of Engineering, Academician of the International Eurasian Academy of Sciences, Honorary Chairman of the Chinese Association for Artificial Intelligence. 2. Department of Computer Science and Technology, Tsinghua University, Doctoral Candidate. 3. Founder and CEO of TRUNK.TECH. 4. CEO of Zhongke Yuandongli Company. 5. School of Robotics, Beijing Union University.
Abstract: Using the method of cognitive physics and starting from the source of cognition, namely the formalization of cognition, the author traces a progression from the "matter theory" of the universe's composition, to the "two-element (matter, structure) theory" of tool composition, to the "three-element (matter, structure, energy) theory" of power machine composition, and finally to the "four-element (matter, structure, energy, time) theory" of machine cognition. The discussion advances step by step, from the automation of thought to the self-growth of cognition. Using the "Four Elements of Cognition Theory," the author explains typical cognitive events, such as interpreting Einstein's mass-energy equivalence equation, discussing the limitations of Simon's "Physical Symbol System Hypothesis," and the contribution of machines to the precise calculation of pi. As ChatGPT gains ever wider use and recognition worldwide, the author predicts that, following the normalization of the conversational Turing test, the next milestone for artificial intelligence will be the normalization of the embodied Turing test.
Detailed Abstract: The widespread trial use of ChatGPT across industries worldwide has realized the normalization of the conversational Turing test. China was the first country in the world to propose the new generation of artificial intelligence as a major national development strategy, and ChatGPT poses a challenge to this strategy. To meet this challenge, it is necessary to answer Turing's 1950 question from the perspective of cognitive physics: Can machines think? Further questions follow: How do machines think? How do they cognize? Whether in carbon-based humans or silicon-based machines, cognition arises from complex interactions among four basic elements: matter, energy, structure, and time. Both depend on negative entropy to sustain themselves, and structure and time are the cornerstones of cognition. Structure and time parasitize on matter and energy in physical space, forming hard constructs; soft constructs in the cognitive space parasitize on hard constructs or on other existing soft constructs, forming rich, hierarchical, multi-scale sensations, concepts, information, and knowledge that reflect the spiritual world. Extending the symbolic school's notion of "abstraction," the connectionist school's concept of "association," and the behaviorist school's idea of "interaction," and standing on the shoulders of scientific giants such as Schrödinger, Turing, and Wiener, we make abstraction, association, and interaction the core of cognition, with soft constructs interacting with hard constructs. Cognitive machines can be built from hard constructs such as Field-Programmable Gate Arrays (FPGAs), Data Processing Units (DPUs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and memory, and can also be implanted with heterogeneous hard constructs, such as an "infant cognition core" that reflects genetic inheritance, to form embodied machines on which rich, multi-scale soft constructs can parasitize.
The order of a machine reflects its ability to maintain itself and to generate orderly events, using soft constructs to precisely control matter and energy and thereby form coordinated, orderly thinking activities. The heterogeneous sensors and the speed of thought configured in machines will no longer be confined by the biochemical limits of carbon-based life. Machines can engage in brute-force thinking through multi-channel, cross-modal perception, maintain the continuity of cognition with memory, and use memory to constrain computation, thereby generating computational intelligence and memory intelligence within the cognitive space. Such machines can bootstrap themselves, reuse themselves, and replicate themselves, enhancing imagination and creativity. The new generation of artificial intelligence will evolve from the mechanization of mathematics to the automation of thought, and then to the self-growth of cognition. Thinking in the cognitive world and action in the physical world will verify each other, embodying the unity of knowledge and action. From the normalization of the conversational Turing test, we will move towards the normalization of the embodied Turing test. Cognitive machines will collaborate with scientists, engineers, and craftsmen to make discoveries, inventions, and creations. Each intelligence will contribute its own wisdom, and wisdom will be shared collectively, becoming a super-accelerator for thought and a super-amplifier for embodied action. Humanity is entering an era of symbiotic co-creation and iterative development in intelligence.
Keywords: Cognitive Physics, Negentropy, Soft Construct, Infant Cognition Core, Brute-Force Thinking, Embodied Turing Test
0 Introduction
Humans possess two spaces. One is the objective, external physical space or world. In this space, humans perceive the universe, recognizing celestial bodies such as the sun, stars, moon, and Earth; natural substances like oceans, rivers, lakes, and mountains; and all life outside of "me," including microorganisms, plants, and animals; as well as artificial objects like tools, tables and chairs, lamps, cars, buildings, books, maps, schools, sculptures, machines, and artificial satellites. All of these are tangible physical existences. We also have a subjective, internal cognitive space or spiritual world, where we need to recognize consciousness, desires, emotions, beliefs, and intelligence, and understand our own feelings, experiences, and perceptions [1].
The consciousness, desires, emotions, beliefs, and intelligence of the internal world combine and evolve into different value systems. When a value system grounded in consciousness, desires, emotions, and beliefs is taken as primary, love, equality, justice, human rights, respect, growth, and creativity become our value standards [1]. If intelligence alone is regarded as the main value when viewing the objective physical world, cognition becomes reified. We use language, tools, art, machines, and even intelligent machines to realize these values, separating intelligence from life, freeing it from consciousness and emotion, and extending it beyond the body to create artificial intelligence that exists in the physical world, is even launched into space, becomes part of the ecosystem of human civilization, and promotes the development of human intelligence. The death of a carbon-based life, that is, the disappearance of an individual spirit and the end of its cognition, is merely the transformation of organic matter into inorganic matter. Humanity continues to reproduce, the universe remains vast, and the stars continue to rotate. The universe is approximately 14 billion years old and the Earth about 4.5 billion years old, while the evolution of human cognition spans at most 5 million years. If the Earth's age were compressed into one year, modern humans would appear only in the last half hour [2]. Descartes' "I think, therefore I am" [3] refers to the spirit that persists within the cognitive space. Yet whether you and I think or not, the Earth still exists and operates. In the face of the objective physical world, we should not exaggerate the subjective spiritual world of humanity. The forces of the universe are inherently indifferent to individual desires and happiness, and whether the physical world is as people perceive it cannot be verified. This is the broader context for understanding the human invention of machine cognition.
Currently, the widespread trial use and recognition of ChatGPT [4] around the world have realized the normalization of the conversational Turing test, and this also poses a challenge to the new generation of artificial intelligence, the major national development strategy that China was the first in the world to propose. A profound understanding of the foundation, connotation, extension, technological characteristics, and development pathways of the new generation of artificial intelligence is an important guarantee for realizing this national strategy. Whether for human cognition or machine cognition, and whether for the global frontier of artificial intelligence or for artificial intelligence with Chinese characteristics, it is necessary to apply the method of cognitive physics [5], starting from the source of cognition, namely the formalization of cognition.
1 Formalization of Cognition
1.1 Definition of Cognition
The entire activity of human cognition concerns how to interpret and solve the practical problems humans encounter in the course of survival and reproduction. Each cognitive activity can be divided into a cycle of perception, thinking, action, and feedback to perception. Perception is the source of cognition. Thinking is an activity that takes place in the self's cognitive space; it is systematic, full of imagination, and can reach deeper than the physical world. Action is the external manifestation and purpose of cognition. Both perception and action occur in the physical space and form embodied intelligence through interaction. The cognition of the spiritual world and the behavior of the physical world are one: cognition spirals continuously between the objective physical space (physical world) and the subjective cognitive space (spiritual world) to answer questions such as "where," "what," "why," and "how to do it."
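The perception, thinking, action, and feedback cycle described above can be sketched as a simple control loop. This is a minimal illustration under loudly stated assumptions: the one-dimensional World, the goal position, and the helper names perceive, think, act, and cognition_cycle are all hypothetical and do not come from the paper, and "thinking" is reduced to choosing a bounded step toward a goal.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Hypothetical physical-space state that the agent can perceive and change."""
    position: float


def perceive(world):
    # Perception: read the current state of the physical world.
    return world.position


def think(observation, goal):
    # Thinking (cognitive space): choose a step toward the goal, bounded to [-1, 1].
    error = goal - observation
    return max(-1.0, min(1.0, error))


def act(world, step):
    # Action: change the physical world.
    world.position += step


def cognition_cycle(world, goal, max_cycles=20, tolerance=1e-6):
    """Run perception -> thinking -> action, feeding each outcome back into perception."""
    history = []
    for _ in range(max_cycles):
        observation = perceive(world)
        history.append(observation)
        if abs(goal - observation) <= tolerance:
            break
        act(world, think(observation, goal))
    return history


print(cognition_cycle(World(0.0), goal=3.5))  # [0.0, 1.0, 2.0, 3.0, 3.5]
```

Each pass through the loop perceives the physical state, decides in the "cognitive" step, acts on the world, and then perceives the result of its own action; that renewed perception is the feedback that closes the cycle.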
The great physicist Albert Einstein once said, "The most incomprehensible thing about the universe is that the universe is understandable at all" [6]. Yet the individual's place in the universe is insignificant, and a lifespan of no more than a hundred years makes it very difficult, if not impossible, to comprehend the infinite. Through the transmission of culture and civilization from generation to generation, humans have formed scientific theories and invented technologies to explain this vast universe, never ceasing to explore, and turning more and more of the unknown into the known, explainable to a certain degree. This bounded rationality is in fact a collective consensus in the process of human cognition. Some things that were incomprehensible a thousand years ago are partially understood today, while others may remain incomprehensible because of gaps in knowledge. Human cognition develops in a spiral with no end in sight: the larger the scope of human cognition, the larger its interface with the unknown.
Despite its twists and turns, human cognition is gradually approaching the truth. The ability to cognize is the ability to learn (that is, the ability to explain and solve preset problems), as well as the ability to explain and solve real-world problems [7]. Preset problems usually come from real-world issues that have already been formalized and have proven, effective solutions, such as the knowledge written in school textbooks. Learning is the process of turning the previously unknown into the known with help from the outside world, and it is the foundation for explaining and solving new problems. Explaining and solving real-world problems is the purpose of learning, and the two promote each other. Embodied intelligence in the physical space includes perceptual intelligence and behavioral intelligence. Perceptual intelligence can be further divided into spatiotemporal recognition intelligence (i.e., the ability to recognize position, direction, and time) and pattern recognition intelligence; driven by the needs of survival and reproduction, some of it has even become perceptual intuition, such as the recognition of faces and voices. In the cognitive space there are computational intelligence and memory intelligence, with memory taking precedence over computation, constraining it, and marking the scope or boundaries of computation. Whether in learning or in explaining and solving real-world problems, cognitive activity in the cognitive space is not enough: it must also interact and be verified repeatedly across the physical and cognitive spaces, be manifested externally through embodied actions including language, and accumulate as memory, thereby achieving the self-growth of cognition. The result of learning is the modification and reshaping of memory, together with its storage, regulation, and retrieval. The purpose of learning is to explain and solve new real-world problems.
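The claim that memory takes precedence over computation and constrains it has a familiar counterpart in programming: memoization, where a stored answer is recalled before anything is recomputed. The sketch below is only an analogy chosen for illustration, not a model proposed by the authors; the functions count_naive and count_memoized are hypothetical, and Fibonacci numbers stand in for an arbitrary problem.

```python
def count_naive(n):
    """Compute Fibonacci(n) by pure computation; return (value, number of computations)."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        return k if k < 2 else fib(k - 1) + fib(k - 2)

    return fib(n), calls


def count_memoized(n):
    """Compute Fibonacci(n), consulting memory first; return (value, number of computations)."""
    memory = {}
    calls = 0

    def fib(k):
        nonlocal calls
        if k in memory:  # recall takes precedence over computation
            return memory[k]
        calls += 1       # compute only when memory has no answer
        memory[k] = k if k < 2 else fib(k - 1) + fib(k - 2)
        return memory[k]

    return fib(n), calls


print(count_naive(20))     # (6765, 21891)
print(count_memoized(20))  # (6765, 21)
```

Both return the same answer, but consulting memory first cuts the number of computations from 21,891 to 21, which is one concrete sense in which memory bounds the scope of computation.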
As the saying goes, "To learn without thinking is labor lost; to think without learning is perilous." The common Chinese renderings of "supervised learning" and "unsupervised learning," which cast the key term as "supervision," are not very accurate, because the guidance involved carries rich connotations, including instruction, explanation, error correction, and supervision. Nature has endowed humans with desires greater even than their wisdom: "seeking truth," "seeking knowledge," and "seeking beauty" are innate human desires, shaped by natural selection and survival of the fittest over the course of human survival and reproduction. The cognitive space is not only a warehouse for human memory and knowledge but also a sky in which imagination can soar. Imagination is hardly constrained by matter; humans can imagine things that do not exist, and this is precisely the greatness of spiritual power.
1.2 Four Basic Modes of Cognition: Deduction, Induction, Creation, and Discovery
When discussing the formalization of cognition, we must analyze the openness of cognition, especially its interactivity; the uncertainty of cognition, especially the fundamental certainty within uncertainty; the hierarchical nature of cognition, especially its recursiveness; the proactivity of cognition, especially the attention mechanism; the complexity of cognition, especially the emergence mechanism; and the holistic nature of cognition, especially the synergy among perception, thought, and action. We analyze how cognition, formed iteratively between the objective, real, external physical space and the subjective, abstract, internal cognitive space of humans or machines, gradually approaches the truth; and how the deductive, inductive, creative, and discovery modes of human cognition, formed iteratively through collective intelligence inherited across generations and through individual intelligence, constitute the four basic patterns of cognition. They correspond, respectively, to the knowledge-driven reasoning mode, such as the proof of mathematical theorems; the memory-driven experiential mode, such as deep learning; the association-driven creative mode, such as the progression from celestial navigation to satellite positioning to the Starlink project; and the hypothesis-driven discovery mode, such as Mendeleev's prediction of new chemical elements. We must also analyze the intelligence formed by combining multiple cognitive modes, which encompasses visual thinking, logical thinking, and insight, with abstraction, association, and interaction supporting, driving, leading, and complementing one another and advancing in a spiral. Individuals differ in imagination and creativity as their cognition develops, and even the same problem may be approached with different cognitive modes in different periods and contexts, which introduces uncertainty.
When discussing the formalization of cognition, we must also explore how artificial intelligence, accumulated as an extension of human intelligence, enhances human cognition; how humans or machines possess diverse cognitions in the process of thinking, learning, and growing, each with its own intelligence, learning from each other, sharing intelligence, and being inclusive; and how the continuous transformation between cognitive modes constitutes an endless cognition tending towards unity.
1.3 Overcoming Entropy Increase and Maintaining Order in Machines through Recursion and Iteration
Neither human cognition nor machine cognition can violate the laws and principles of general physics, the most fundamental of which is the principle of entropy increase [8]. The total disorder, or "entropy," of an isolated system never decreases in natural processes. When entropy reaches its maximum, the system becomes completely disordered and ceases to function; for a human, this means death. To maintain order, the universe, life, and machines all excel at applying simple, repetitive basic operations in cycles or loops to counter entropy increase, maintain order, and exhibit regularity. Human survival and reproduction proceed iteratively from generation to generation, manifesting as genetic inheritance. Many recursive and fractal phenomena exist within the organization of an individual life. Human evolution, including the evolution of human intelligence, can also be regarded as a cyclical process. Both humans and machines exhibit numerous cyclical phenomena in their cognitive activities. In discussing the formalization of cognition, we note that one important cyclical activity in life and cognition is iteration, in which the result of one iteration serves as the initial value of the next, so that development keeps advancing and accumulating. Another important form of cyclical activity is recursion, which differs from iteration. Iteration looks forward and is metaphysical (abstract): for example, the knowledge in the human brain grows through self-reuse and iteration from elementary school to university to adulthood, and the science and technology of human society, especially the mass production of successive generations of intelligent machines, also develop iteratively. Recursion, by contrast, looks back and is physical (concrete): for example, the embodied intelligence of a cognitive machine is ultimately executed through recursive machine instructions in its hard constructs [9].
Another example is autoregressive generation in ChatGPT, which makes full use of both recursion and iteration [32]. Recursion and iteration are particularly important for the self-bootstrapping and self-growth of life and cognition. Carbon-based life is composed of cells. In "What Is Life?" [10], the Nobel laureate Erwin Schrödinger wrote that life carries a code-script that determines the complete pattern of the individual's future development, and that to live is to fight against the law of entropy increase. Humans, like all living beings, obey the most basic physical laws, age, and depend on negative entropy to live. From the perspective of machine cognition, we can understand Charles Darwin's theory of evolution [11], especially the diversity of species; Francis Crick's genetics [12], especially genetic engineering; and Eric Kandel's cell biology [13], especially cognitive neurobiology. It is therefore necessary to understand how machines rely on energy to form order and how they produce negative entropy through interaction with the outside world, and thereby to understand how machines think and how they cognize.
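The distinction between iteration and recursion, and their combination in autoregressive generation, can be made concrete with a short sketch. Everything below is illustrative and assumed: newton_sqrt shows iteration (each result becomes the next initial value), factorial shows recursion (a problem defined by looking back to a smaller instance of itself), and generate, driven by the hypothetical lookup table NEXT_TOKEN, mimics only the shape of an autoregressive loop; it is not how ChatGPT's model computes its predictions.

```python
def newton_sqrt(a, x0=1.0, steps=6):
    """Iteration: each result becomes the initial value of the next step."""
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + a / x)  # Newton's update for solving x * x = a
    return x


def factorial(n):
    """Recursion: the answer is defined by looking back to a smaller case."""
    return 1 if n <= 1 else n * factorial(n - 1)


# Hypothetical toy "model": a fixed next-token table, NOT how ChatGPT works.
NEXT_TOKEN = {"<s>": "the", "the": "machine", "machine": "thinks", "thinks": "</s>"}


def toy_model(tokens):
    """Predict the next token from the sequence generated so far."""
    return NEXT_TOKEN.get(tokens[-1], "</s>")


def generate(model, prompt, max_len=10):
    """Autoregressive loop: every output is fed back in as part of the next input."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        nxt = model(tokens)
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return tokens


print(newton_sqrt(2.0))
print(factorial(5))                    # 120
print(generate(toy_model, ["<s>"]))    # ['<s>', 'the', 'machine', 'thinks']
```

In generate, the while loop is the iteration, and the fact that every new token is predicted from tokens the system itself produced earlier is the self-referential, recursive part.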
The universe is vast, and the Milky Way is in constant motion. The universe is composed of matter, and matter and energy can be converted into each other. Some regard the Big Bang as the emergence of something new, while others argue that the Big Bang ultimately gave rise to the Earth [14]. Long before life appeared on Earth, matter already existed in many forms. Humans have survived and multiplied on Earth for millions of years, as the sun rose and set and the seasons turned from winter to spring. The evolution of human cognition has created ever more artificial objects and has gradually progressed from the matter theory of the universe's composition to today's "Four Elements of Machine Cognition Theory."
2 From the "Tool Two-Element Theory" to the "Cognition Four-Element Theory"
2.1 The Tool Two-Element Theory
Consider first the "Tool Two-Element Theory" of the Stone Age and the Agricultural Age, that is, the theory of matter and structure. Human-made tools are made of matter, and their various structures parasitize directly on matter, that is, on the interrelationships among the components that make up the tools. Structure determines function, forming a hard construct (see Figure 1). "Shaping the first stone into a knife" [15] took hundreds of thousands of years, and in physical space structure and matter are inseparable. As early as 3200 BCE, the Sumerians invented an early wheel [16]. Cut a natural tree trunk with two parallel "planar structures" so that it acquires a "circular structure," and it becomes a "wheel." Without this seemingly simple structure parasitic on matter, which lets "objects roll around an axis," and without the continuous invention of hard constructs built on it, it is hard to imagine how any mechanized tool today could work. This accumulated knowledge spans everything from gears to bicycles, cars, jet engines, and precision instruments. The wheel has a history of more than 5,000 years, and its significance in human history is often compared with that of the mastery of fire. As another example, the earliest known geared calculating device is the ancient Greek Antikythera mechanism, built more than 2,000 years ago. Tracing the mechanization of Chinese mathematics, the earliest abacus, a hard construct parasitic on matter, was invented by Xu Yue, a mathematician of the Eastern Han Dynasty, who wrote in "Shu Shu Ji Yi": "The abacus controls the four seasons, and its vertical and horizontal threads run through the three talents." Clearly, a hard construct is by no means equivalent to matter.
The various complex structures parasitic on matter are a manifestation of human imagination, and their scales can span roughly 18 orders of magnitude, above and below that of the human body. The tools of the Stone Age and the Agricultural Age have no power source of their own, and their structural design does not take energy into account. Tools are not machines, let alone life, but they can greatly extend human physical strength and behavior.
Figure 1: Agricultural Age Tools: Examples of Hard Constructs with Structures Parasitic on Matter
2.2 The Machine Three-Element Theory
To better replace and extend human physical strength with machines, the machines of the Industrial Age added an important element, "energy," to matter and structure, forming the "Machine Three-Element Theory." Structure parasitizes directly on matter and energy, that is, on the interrelationships among the components that make up the machine (see Figure 2), forming a hard construct that replaces the physical body and can operate but cannot think. Pendulums, steam engines, electric vehicles, and the like extend human physical capabilities and behavioral abilities. Human walking speed and muscle strength have changed little over thousands of years, but the power machines invented by humans have greatly extended and expanded human physical strength and behavior. Vehicles, ships, airplanes, and rockets have extended the range of human activity to land, sea, sky, and space. Rockets travel four to five orders of magnitude faster than a person walks. Today, power machines driven by mechanical, radiant, thermal, chemical, electrical, or nuclear energy are widely used and ubiquitous, and energy has even become an important indicator of whether a country is developed. In physical space, the complex hard constructs formed by matter, structure, and energy, whether tools or machines, have fundamentally changed the modes of production, organizational structure, economic prosperity, and lifestyles of human society. Some call the information revolution the fourth industrial revolution, but this may not be accurate. Human history has witnessed, and continues to be driven by, the agricultural revolution, the industrial revolution, and the cognitive revolution, which develop in parallel and alternately rise and fall.
In the Stone Age and the Agricultural Age, humans invented various tools to extend their physical strength. In the era of the Industrial Revolution, humans invented various machines and power machines to extend and expand physical strength and behavior, but power machines still cannot think. A mechanical clock, for example, relies on matter, energy, and structure to keep running, yet time makes no substantial contribution to its operation.
Figure 2: Industrial Age Machines: Examples of Hard Constructs
2.3 Analysis of the Basic Elements of Machine Cognition
Cognitive machines of the intelligent era differ from the power machines of the industrial era. This paper proposes the four-element theory of machine cognition, which adds an important element, time, to matter, energy, and structure. In physics, especially astrophysics, space and time are often treated as the two major dimensions. In physics, the structure of any object must parasitize on matter and energy; structure cannot exist as an isolated entity in the physical world. The movement and change of matter and energy in space are described by time, and their structure also changes over time. Matter and energy can be converted into each other and exist in motion and change, while life exists in growth or aging. We hold that time is a subjective concept created by humans: there is no absolute time in the universe, and time is not an isolated entity in the physical world. Time is not a thing of any kind, and it should not be spatialized or materialized. Humans invented the concept of time, divided into moments and time intervals, to describe the movement and change of matter and energy in the universe. Time has at times been accorded the highest philosophical status [17], whereas we recognize only the movement and change of matter and energy, as represented in the cognitive space, as primary. Structure is an element independent of time and is equally abstract rather than concrete. Structure and time are the cornerstones of human cognition and also of machine cognition. Intelligence originates in the human brain, especially in the complexity arising from interactions among countless types of nerve cells, and the neocortex is the organ in which thinking about structure and time takes shape.
Without memory, we would live forever in the present, and there would be no concept of structure or time. Memory provides continuity between our cognition of the past and of the present, giving humans the biological basis for abstracting structure and inventing time. Just as water, air, and nutrition are necessities for sustaining life, memory, which is how humans cognize structure and time, together with a measure of forgetting, is a necessity for sustaining the life of thought; it is the soul of life and runs through it from beginning to end. Only with normal memory can humans possess and accumulate intelligence, realize value, have a sense of history in their lives, and sense the growth of their cognition. Structure and time permeate the entire human cognitive space. Mathematics is humanity's most abstract soft construct, the most abstract specialized language that humans have built on the basis of natural language. By Einstein's definition, time is simply the reading on a clock's dial. If a batch of identical, synchronized clocks were placed at different positions in the universe, they would come to show different times, which shows that time cannot be completely separated from space but is an attribute used to express it. With the concept of time, matter and energy can be mapped, at different moments (that is, with time frozen), into the topological structures and relationships represented in the cognitive space.
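The statement that identical clocks at different positions show different times can be quantified with the general-relativistic result for a static clock at distance r from a non-rotating mass M, whose proper-time rate relative to a distant observer is sqrt(1 - 2GM/(r c^2)). The sketch below assumes Earth is a non-rotating sphere and ignores the velocity-dependent (special-relativistic) effect; it compares a clock on the ground with one at roughly the GPS orbital radius. The constants are standard approximate values, and the function names are hypothetical.

```python
import math

GM_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
C = 299_792_458.0          # speed of light, m/s
R_SURFACE = 6.371e6        # mean Earth radius, m (assumed ground clock position)
R_GPS = 2.656e7            # approximate GPS orbital radius, m


def clock_rate(r):
    """Proper-time rate of a static clock at radius r (Schwarzschild, non-rotating mass)."""
    return math.sqrt(1.0 - 2.0 * GM_EARTH / (r * C ** 2))


def daily_offset_microseconds(r_low, r_high):
    """Microseconds per day gained by the higher clock relative to the lower one."""
    seconds_per_day = 86_400.0
    return (clock_rate(r_high) - clock_rate(r_low)) * seconds_per_day * 1e6


print(round(daily_offset_microseconds(R_SURFACE, R_GPS), 1))
```

The higher clock gains roughly 45 microseconds per day, in line with the gravitational part of the clock correction commonly quoted for GPS satellites; orbital motion, omitted here, works in the opposite direction.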
Matter and energy are real existences at the physical level, while structure and time are abstract concepts at the cognitive level, the parameters by which humans cognize the existence and change of matter and energy. In the human spiritual world, structure expresses the topology and deformation of matter in space, and time expresses the movement and change of matter, reflecting the transfer and transformation of energy. The many hard constructs formed when structure and time parasitize on matter and energy make up the body of a machine, while the numbers, symbols, and information in the machine's thinking process are a large number of soft constructs, just like the thoughts expressed in the human cognitive space. Soft constructs parasitize on hard constructs or on other existing soft constructs. They can bootstrap themselves (self-lifting), reuse themselves (recursion), be applied repeatedly (iteration), and even replicate or modify themselves, giving rise to imagination. As long as there is a next time cycle, the machine can "think" again. Time gives the machine an active order, and thinking can "come to life." The soft constructs in the human cognitive space are the elements of thinking; they support visual, logical, and intuitive thinking, reflect rich human imagination and creativity, and mirror the spiritual world, with a sense of scale, a sense of time, and a sense of hierarchy. If we were to name the underlying soft constructs, they might be symbols, letters, strokes, numbers, front and back, left and right, up and down, order, speed, and so on; some also call them a mental language, and they can be collectively described as states of association among symbols.
Sensations, concepts, information, and knowledge are all upper-level soft constructs, formed by merging and classification and reflecting different levels of abstraction. They are the mirror image and superstructure of the physical world in the cognitive space, the reality of imagination.
The great success of deep learning today, including ChatGPT, has historical significance: machines are trained on large numbers of hard constructs from the physical world, using annotations in place of memory, a sufficiently large number of cycles to approach infinity, sufficiently large numbers to approach the infinitely large, and sufficiently small numbers to approach the infinitely small, to generate soft constructs of different granularities. At the macro level, deep learning belongs to the memory-driven experiential mode of cognition. It can recognize the differently shaped letters "A" that different people write on paper; these hard constructs are merged, classified, and abstracted into the soft construct of the letter "A," forming memory. Similarly, it can judge and recognize countless entities (hard constructs) in the physical world, and memory forms the abstract concepts of various soft constructs in the cognitive space, such as "mountain," "water," "tree," "grass," "chair," "house," "person," and "pet." At the cognitive level, all human thinking activities are abstract. Soft constructs are the results of abstraction, the "virtual units" of thought, and the mirror images of hard constructs. If deoxyribonucleic acid (DNA), the chemical substance in the cells of animals and plants that carries the genetic code, is matter on which structure has parasitized and is therefore a hard construct, then the genetic code is a soft construct.
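The step in which many differently written letters "A" are merged, classified, and abstracted into a single soft construct can be illustrated with a much simpler technique than deep learning: a nearest-centroid classifier, which literally averages the instances of each class into one prototype. This is a swapped-in stand-in, not the method of deep learning or of the authors, and the two-dimensional feature vectors in SAMPLES are made-up illustrative data.

```python
from statistics import fmean

# Hypothetical 2-D features extracted from handwritten letters (illustrative only).
SAMPLES = [
    ("A", (0.9, 0.1)), ("A", (0.8, 0.2)), ("A", (1.0, 0.0)),
    ("B", (0.1, 0.9)), ("B", (0.2, 0.8)),
]


def build_prototypes(samples):
    """Merge all instances with the same label into one mean vector (a prototype)."""
    grouped = {}
    for label, vec in samples:
        grouped.setdefault(label, []).append(vec)
    return {
        label: tuple(fmean(column) for column in zip(*vecs))
        for label, vecs in grouped.items()
    }


def recognize(prototypes, vec):
    """Return the label whose prototype is closest in squared Euclidean distance."""
    def distance2(label):
        return sum((p - v) ** 2 for p, v in zip(prototypes[label], vec))
    return min(prototypes, key=distance2)


prototypes = build_prototypes(SAMPLES)
print(recognize(prototypes, (0.85, 0.15)))  # A
```

Here each prototype plays the role of the abstract concept formed from many concrete instances, and recognizing a new instance reduces to finding the closest stored concept.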
Take autonomous driving as an example again. The hard constructs in physical space include the car body, chassis, tires, electric motors, sensors, and chips, while the soft constructs in cognitive space include the operating system, the "driving brain" [18] program, driving maps, and traffic rules. Soft and hard constructs interact and complement each other, the virtual and the real together, forming the car's embodied intelligence, so that the embodied behavior of an autonomous vehicle can be indistinguishable from that of a human-driven one. It is important to note that the elements of cognition are not the elements of the composition of the universe. In terms of its composition, the universe has a single element: matter. We start from this single element of the universe, move to the two elements (matter, structure) of tool composition, then to the three elements (matter, structure, energy) of powered machine composition, and finally arrive at the four elements (matter, structure, energy, time) of machine cognition. Although matter and energy are interconvertible in physics, in the cognitive process there can be no thinking activity, and no perception or behavior, without energy. If a machine's energy supply stops, as in a power outage, the machine "dies"; once power is restored, it can bootstrap itself again, start its operating system, and return to a cognitive working state. The hard constructs of a cognitive machine, however, cannot grow, repair, or replicate themselves, which makes them very different from soft constructs. If hard constructs age or fail, the machine can be restarted after repair, and if new hard or soft constructs are added, then, as long as they are compatible, the upgrade can enhance the machine's cognitive ability.
After all, a cognitive machine is not a life form made of cells. It lacks the biological basis of cell division and growth: it cannot reproduce, cannot replicate the machine itself, and cannot switch itself on. With the support of the four elements, however, it can replicate soft constructs, copy and extend thoughts, achieve cognitive self-growth, display embodied and general intelligence, and even put itself into a sleep state to await awakening.
Over the past 4 million years, human evolution has produced genetic advantages: humans broke away from barbarism and, by directly parasitizing structure onto matter, invented tools. Over the past 3 million years, humans developed linguistic advantages. About 6,000 years ago, humans invented writing and education, forming cultural and civilizational advantages; this marks the first cognitive revolution. In the last 500 years, by exploiting matter, structure, and energy, humans invented machines, forming technological advantages that freed human physical strength and greatly expanded the physical space of human activity; this marks the second cognitive revolution. In the last 100 years, the invention of ever more sensors and of thinking machines has freed human intelligence and formed an intellectual advantage, and humanity has entered the third cognitive revolution. Matter, energy, structure, and time are the core elements of human cognition and equally of machine cognition, with cognitive machines incorporating ever more soft constructs.
3 Using the "Cognition Four-Element Theory" to Explain Typical Cognitive Events
3.1 Explaining Einstein's Mass-Energy Equation
The mass-energy equation E = mc^2, proposed by Einstein in 1905, is an understanding, formed in the cognitive space through structure and time, of the order linking matter and energy in the universe. It relies on soft constructs such as "displacement," "meter," "kilogram," "second," "joule," and "velocity" to explain the relationship between matter and energy [19]. The speed of light can be written as c = λM·fM, where λM and fM are the wavelength and frequency corresponding to matter; they represent the unique physical properties of a substance through its spatial wavelength and temporal frequency, and express that energy and mass are interconvertible. Every kilogram of mass is equivalent to about 9 × 10^16 joules of energy. Matter in the universe derives from the energy of the Big Bang [19]. When an object emits energy as radiation, its mass decreases, reflecting the overall conservation and unity of mass and energy in the universe. Without the soft constructs "displacement," "meter," "kilogram," "second," "joule," and "velocity," the relationship between matter and energy could not be explained. The mass-energy equation expresses the four elements of matter, energy, structure, and time, and the laws of their transformation, in a single formula.
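The arithmetic behind the mass-energy statements above can be checked in a few lines. This is a minimal sketch; the function names are illustrative, and the 1 kWh example is an assumed input, not a figure from the text.

```python
# Speed of light in vacuum, in m/s (exact by the SI definition of the meter).
C = 299_792_458.0

def rest_energy(mass_kg: float) -> float:
    """E = m c^2: energy equivalent, in joules, of a rest mass given in kilograms."""
    return mass_kg * C**2

def mass_lost_by_radiating(energy_j: float) -> float:
    """Delta m = E / c^2: mass, in kilograms, an object loses when it radiates energy E."""
    return energy_j / C**2

print(f"1 kg of mass   -> {rest_energy(1.0):.3e} J")
print(f"1 kWh radiated -> {mass_lost_by_radiating(3.6e6):.2e} kg lost")
```

The first line prints about 8.988e+16 J, i.e. roughly 9 × 10^16 joules for each kilogram of mass; the second shows that the mass change from everyday amounts of radiated energy is far too small to notice.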
3.2 Revisiting Simon's "Physical Symbol System Hypothesis"
Consider the "Physical Symbol System" hypothesis proposed in 1976 [20] by the artificial intelligence pioneers Herbert A. Simon and Allen Newell. It was meant to express the capacity for abstraction in cognitive activity, and it held that a physical symbol system is both necessary and sufficient for general intelligent behavior. Mathematics, which originated in early human imitative activity, is one of the means of cultivating abstract thinking. Abstraction is the imaginative and creative activity of thought. Through human cognitive abstraction, physical entities are refined into soft constructs that represent generality and universality in the form of symbols; they are held in memory but have no physical existence of their own. A symbol system consists of a set of abstract "symbols" representing entities, which can be combined into another kind of entity called "expressions" (or symbol structures). Simon later proposed chunking theory [21] in his model of the cognitive system, in which scattered components are combined into meaningful units of information. Seen today, these very limited symbols and expressions, together with their combinations and operations, can all be called soft constructs, existing at multiple levels of abstraction, each at its own scale. Operations on highly abstract soft constructs can be supported and carried out by lower-level soft constructs, and these in turn by still lower levels, with jumps up and down between levels or recursion across them. Simon's Physical Symbol System hypothesis, however, greatly underestimated the richness of human abstraction at the base level. It is too simplistic to constitute a sufficient condition, and soft constructs do not need such strict logical relationships. The success of today's large language models, with ChatGPT having hundreds of billions of parameters, proves this point.
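The point that operations on high-level soft constructs are carried out by lower-level ones, recursively, can be made concrete with a toy three-level hierarchy. This is an illustrative sketch only; the levels and function names are hypothetical, and Python's built-in `+ 1` stands in for the lowest, hardware-supported level.

```python
def succ(n: int) -> int:
    """Lowest level: the only primitive operation, 'add one'."""
    return n + 1

def add(a: int, b: int) -> int:
    """Middle level: addition defined recursively in terms of succ (b must be >= 0)."""
    return a if b == 0 else succ(add(a, b - 1))

def mul(a: int, b: int) -> int:
    """Top level: multiplication defined recursively in terms of add (b must be >= 0)."""
    return 0 if b == 0 else add(mul(a, b - 1), a)

print(mul(6, 7))  # 42, computed without using * at all
```

Each level sees only the one below it: `mul` knows nothing of `succ`, yet every multiplication is ultimately a chain of successor steps. Recursion depth grows with the operands, so this is for illustration, not efficiency.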
Besides abstraction, association and interaction are indispensable, yet the Physical Symbol System hypothesis overlooked them. Creativity stems from imagination; imagination stems from bold abstraction and simple association, deepens through careful analogy, and is ultimately verified by practice. The topological connections between soft constructs are diverse: association leads to correlation, and from there to similarity, analogy [31], and transfer, moving from one thing to another and drawing inferences about related matters, which forms general knowledge and general intelligence. Interaction, in turn, keeps abstraction and association tied to, and verified against, the physical world, ensuring the unity of knowledge and action. Nevertheless, Simon used new symbol structures, equivalent to high-level soft constructs, to capture the iterative development of cognitive activity, allowing machines to complete thinking through recursion; this was a great contribution. Simon received the Turing Award for his foundational contributions to artificial intelligence, cognitive psychology, and list processing; he later received the Nobel Prize in Economics and the Lifetime Achievement Award of the American Psychological Association, and in 1994 he was elected one of the first foreign members of the Chinese Academy of Sciences. His achievements command our admiration.
3.3 Silicon-Based Machines as Supercharged Accelerators of Human Thought
In 1936, Turing published "On Computable Numbers, with an Application to the Entscheidungsproblem" [22], giving a rigorous mathematical definition of computability. The Turing machine, a simple yet extremely powerful model of computation, laid the theoretical foundation for the idea that "computation is intelligence": in principle, a Turing machine can compute every computable number. The famous "Church-Turing thesis" [23] later set out the equivalence of lambda calculus, recursive functions, and Turing computability; that is, the effectively computable, or mechanically programmable, functions are precisely the general recursive functions. "Turing computability" can be viewed as soft constructs approaching infinity through self-reuse, and it opened the era of machine brute-force computation. Take the human cognition of pi as an example. Around 1900 BCE, a Babylonian clay tablet recorded pi as approximately 25/8 (3.125). More than 200 years BCE, Archimedes used regular 96-sided polygons circumscribed about and inscribed in a circle to show that pi lies between 3.140845 and 3.1428571. Around 500 CE, Zu Chongzhi drew a circle 10 feet in diameter on the ground and, working from its inscribed hexagon up to a 12,288-sided polygon, concluded that π lies between 3.1415926 and 3.1415927. Relying only on naturally evolved abilities and simple tools, humans took 1,700 years to improve the precision of π by one decimal place, and another 800 years to improve it by four more. In 1950, however, π was calculated to 2,037 decimal places on the ENIAC computer. In 1954, 3,089 decimal places were calculated in 13 minutes on the NORC computer. In 1989, an IBM-VF supercomputer calculated it to 1.01 billion decimal places. In 2010, a Japanese individual used a self-assembled computer to calculate 5 billion decimal places.
In 2011, computers calculated it to a trillion (10^12) decimal places. If each sheet of A4 paper holds 60 lines of 17 digits, writing these digits out would take about 1 billion sheets, and stacked together they would be about 100,000 meters high! With the help of computers, Turing computability, and the reuse of soft constructs, the precision of π reached 10^12 decimal places in only about 60 years. Silicon-based machines are clearly supercharged accelerators and amplifiers of human thought and intelligent behavior, and the exponential growth of brute-force computation is beyond the reach of carbon-based intelligence (see Figure 3). Humans should take full advantage of machine brute-force thinking and let artificial intelligence serve both the creative mode of engineers, driven by association, and the discovery mode of scientists, driven by hypotheses.
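The polygon method attributed above to Archimedes, and the scale of the A4 comparison, can both be checked with a short script. This is a sketch under stated assumptions: it doubles the sides of inscribed and circumscribed polygons starting from a hexagon, using the classical harmonic-mean and geometric-mean recurrence, and it is not claimed to reconstruct Zu Chongzhi's own procedure, whose details have not survived. The 0.1 mm sheet thickness is an assumed value not given in the text.

```python
import math

def polygon_bounds(doublings: int) -> tuple[int, float, float]:
    """Bound pi by the perimeters of regular polygons around a circle of diameter 1.

    Inscribed n-gon perimeter: b = n*sin(pi/n); circumscribed: a = n*tan(pi/n).
    Starting from the hexagon, each step doubles n using only arithmetic and
    square roots, so the value of pi itself is never used.
    """
    n = 6
    a = 2 * math.sqrt(3)  # circumscribed hexagon: 6*tan(30 degrees)
    b = 3.0               # inscribed hexagon: 6*sin(30 degrees)
    for _ in range(doublings):
        a = 2 * a * b / (a + b)  # harmonic mean -> circumscribed 2n-gon
        b = math.sqrt(a * b)     # geometric mean with the new a -> inscribed 2n-gon
        n *= 2
    return n, b, a

# 4 doublings give Archimedes' 96-gon; 11 give the 12,288-gon mentioned for Zu Chongzhi.
for k in (4, 11):
    n, lower, upper = polygon_bounds(k)
    print(f"{n:>6}-gon: {lower:.9f} < pi < {upper:.9f}")

# Scale of the A4 comparison for 10**12 digits.
DIGITS = 10**12
DIGITS_PER_SHEET = 60 * 17   # 60 lines of 17 digits, as in the text
SHEET_THICKNESS_M = 1e-4     # assumption: roughly 0.1 mm per sheet
sheets = DIGITS / DIGITS_PER_SHEET
stack_height_m = sheets * SHEET_THICKNESS_M
print(f"{sheets:.2e} sheets, stack about {stack_height_m:,.0f} m high")
```

The 96-gon gives roughly 3.14103 < π < 3.14271, which sits inside Archimedes' published bounds 223/71 ≈ 3.140845 and 22/7 ≈ 3.142857 (he rounded outward to convenient fractions). At 12,288 sides the inscribed bound already agrees with π to seven decimal places. The paper arithmetic gives about 9.8 × 10^8 sheets and a stack of roughly 98 km, consistent with the "1 billion sheets" and "100,000 meters" in the text.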
Figure 3: Visualization of the Brute Force of Calculating the Precision of Pi Using Silicon-Based Machines
The brute-force computing power of silicon-based machines has opened new directions in machine animation and virtual reality. Through brute-force calculation, the abstraction and association within cognitive machines can generate virtual realities convincing enough to deceive the human eye, such as simulated tsunamis. Of course, just as excessive imagination can sometimes lead to hallucinations and delusions in people with mental illness, silicon-based machines can sometimes fall into infinite loops of soft constructs, which appear as a hung system that keeps consuming energy.
When discussing biological evolution, the usual timescale is "ten thousand years"; for human civilization and ecology, it is "a thousand years"; and for the progress of human thought and cognition, especially the development of science and technology, it is "a hundred years" or even "ten years." The thinking speed of intelligent machines has now reached the nanosecond (10^-9 s) level and is advancing toward the picosecond (10^-12 s) and even femtosecond (10^-15 s) levels, approaching the infinitely small with ever smaller increments. Human thinking speed, by contrast, has changed little over thousands of years; it relies on the reaction speed of naturally evolved carbon-based life and remains at the millisecond (10^-3 s) level, or perhaps slower. Machine thinking speed has outpaced human thinking speed by seven or eight orders of magnitude. As computer clocks become more precise, CPU operating frequencies rise accordingly, which greatly shortens the execution cycle of machine-language instructions in the infant cognition core [24], whether a complex or a reduced instruction set is used. The arrival of quantum computers will raise computational power further. Today it is no surprise that machine Go programs [25] and protein folding structure prediction [26] far surpass the human brain. More importantly, machine brute-force thinking can in turn stimulate the imagination of the human brain. As Turing wrote in "Computing Machinery and Intelligence" in 1950 [27]: "We do not wish to penalise the machine for its inability to shine in beauty competitions, nor to penalise a man for losing in a race against an aeroplane."
Today we can read this as follows: we do not belittle thinking machines for lacking consciousness and emotion, nor do we belittle biological humans for thinking far more slowly than silicon-based machines. Cognitive machines may well surpass ordinary urban white-collar workers at clerical work, but it remains difficult for them to replace jobs that require high emotional intelligence and direct service to people. Given time, however, this too will improve.
4 Embodied Turing Test Normalization
4.1 The Way Forward for Autonomous Driving: Normalization of Vehicle Embodied Behavior Testing
At present, autonomous driving is often mistaken for an automatic control problem, overly influenced by the L0 to L5 levels of driving automation defined in the Society of Automotive Engineers (SAE) J3016 standard, or seen as a deep learning problem of pre-training plus fine-tuning, overly influenced by the end-to-end deep learning of companies such as NVIDIA. Some approach it from the road environment, emphasizing intelligent connected vehicles and relying on Beidou/GPS high-precision positioning, guidance from roadside and onboard units (RSU/OBU), 5G/6G communication networks, or high-precision navigation maps [28] to meet the needs of driving cognition. Others keep adding sensors to the vehicle: dozens of cameras, seven or eight LiDARs with 64, 128, or even more lines, plus millimeter-wave radars, infrared sensors, and so on. Still others have intelligent vehicles drive millions of kilometers on real roads, trying to cover every kind of situation, including accidents. Meanwhile, traffic management departments are drawing up testing and evaluation standards meant to cover as many driving situations as possible, such as lane changing and overtaking, unprotected left turns, merging into and exiting traffic, turning at intersections, parallel parking, car following, driving on snowy roads, and the prevention of accidents such as rollovers and tire blowouts, so that they can issue driving licenses to intelligent vehicles. More than ten years ago, we proposed developing a driving brain and took the lead in completing unmanned driving on real roads from Beijing to Tianjin and from Zhengzhou to Kaifeng. The bus chassis we used then was made by a well-known bus manufacturer, which recently and reluctantly decided to stop developing autonomous driving in-house and turn to outsourcing.
The public jokes about autonomous driving that "we hear footsteps on the stairs, but no one ever comes down." Industrializing autonomous driving is fraught with difficulty. The automobile has a glorious history of nearly 200 years of manufacturing and development; it is a paragon of the Industrial Revolution and of intelligent manufacturing, and it has made human life mobile. Automotive ergonomics in particular, through the steering wheel, accelerator, and brake, extends people's limbs and physical strength so naturally that the car becomes a controllable part of the body. Vehicle dynamics research is increasingly mature, and the automation of cars has reached its peak. Yet a car that drives itself without a driver will find it hard to win acceptance from human society if it cannot keep learning as humans do, does not yield to pedestrians, is indecisive when changing lanes, does not probe before merging, does not interact with surrounding vehicles or with pedestrians trying to hail a ride, and cannot handle all kinds of edge cases. The core of intelligent driving is the formalization of driving cognition, the development and mass production of the machine driving brain, and ensuring that machine driving is safer, more energy-efficient, and more comfortable than human driving. Asking car manufacturers to develop a driving brain is truly asking too much of them. The "Embodied Turing Test" is a test in which a third party, observing the behavior of a powered machine, cannot tell whether it is controlled by a human or autonomously by the machine. Traffic accidents never end, and beyond every known accident lies another. In the Embodied Turing Test for vehicles, an observer cannot tell whether a benchmark driver or the autonomous machine is at the wheel.
The embodied interactive intelligence of the vehicle is the starting point and destination of unmanned driving.
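One way to make "a third party cannot distinguish" operational is to treat the judge's calls as coin flips under the null hypothesis and ask whether the judge does significantly better than chance. The sketch below assumes this protocol; the trial design, the significance level, and the function names are hypothetical and not taken from the text.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct calls by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def passes_embodied_turing_test(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """Hypothetical pass criterion: the judge's accuracy is not significantly above chance.

    Each trial shows the judge one drive (benchmark driver or machine driving brain,
    chosen at random) and records whether the judge identified the driver correctly.
    """
    p_value = binomial_tail(correct, trials)
    return p_value >= alpha

print(passes_embodied_turing_test(correct=54, trials=100))  # judge near chance
print(passes_embodied_turing_test(correct=70, trials=100))  # judge clearly tells them apart
```

With 54 correct calls out of 100 the judge is not significantly better than guessing, so the machine would pass; with 70 out of 100 it would fail. Failing to reject chance is weaker than proving the two are indistinguishable, so in practice more trials make the test stricter.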
Figure 4: The Learning Process of the Machine Driving Brain
A prominent advantage of the machine driving brain is that its attention never lapses: throughout the vehicle's behavior it stays focused on self-generating the current right-of-way and the driving situation map. Human drivers, who grow fatigued and emotional, cannot achieve this. Autonomous driving is an inevitable trend in technological development. Teaching machines to drive, and training the machine driving brain to take over the work of benchmark drivers, proceeds in three gradual steps. First, the benchmark driver operates the vehicle while the machine driving brain learns: this is supervised learning. Next, the driving brain operates the vehicle while the benchmark driver intervenes: this is semi-supervised learning. Finally, the machine operates and learns autonomously: this is unsupervised learning. The feedback in Figure 4, running from later stages back to earlier ones, is multiple and uncertain, sometimes returning to supervised learning and sometimes to semi-supervised learning. Supervised learning involves preconceptions, task assignment, guidance, resolving doubts, interactive cognition, and supervision, while unsupervised learning is an important step in converting the results of supervised learning into long-term memory. Only through the machine's self-learning and continuous iteration can cognitive self-growth be achieved. The driving brain can also flexibly attach external memory sticks, such as a library of responses to typical situations, an accident prevention library, or a parking library, which turn ever more driving unknowns into knowns during supervised learning, reinforcement learning, and especially unsupervised learning. Keeping attention focused on changes in the current right-of-way, correcting itself, and achieving cognitive self-growth through the normalization of the embodied Turing test: this is the fundamental way forward for autonomous driving.
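The three-step progression and the backward feedback described above can be sketched as a small state machine. The thresholds, the single "error rate" signal (for example, how often the benchmark driver had to correct or take over from the driving brain), and all names are assumptions made for illustration; the text does not specify promotion or fallback criteria.

```python
from enum import Enum

class Stage(Enum):
    SUPERVISED = "benchmark driver operates, driving brain learns"
    SEMI_SUPERVISED = "driving brain operates, benchmark driver intervenes"
    AUTONOMOUS = "driving brain operates and learns on its own"

# Hypothetical thresholds on an observed error rate (fraction of decisions needing correction).
PROMOTE_TO_SEMI = 0.05
PROMOTE_TO_AUTO = 0.005
FALL_BACK_TO_SEMI = 0.01
FALL_BACK_TO_SUPERVISED = 0.10

def next_stage(stage: Stage, error_rate: float) -> Stage:
    """Move forward when errors are rare; fall back to an earlier stage when they are not."""
    if stage is Stage.SUPERVISED:
        return Stage.SEMI_SUPERVISED if error_rate < PROMOTE_TO_SEMI else stage
    if stage is Stage.SEMI_SUPERVISED:
        if error_rate < PROMOTE_TO_AUTO:
            return Stage.AUTONOMOUS
        return Stage.SUPERVISED if error_rate > FALL_BACK_TO_SUPERVISED else stage
    # AUTONOMOUS: feedback may return to either earlier stage.
    if error_rate > FALL_BACK_TO_SUPERVISED:
        return Stage.SUPERVISED
    if error_rate > FALL_BACK_TO_SEMI:
        return Stage.SEMI_SUPERVISED
    return stage

stage = Stage.SUPERVISED
history = []
for rate in [0.20, 0.03, 0.004, 0.02, 0.003]:
    stage = next_stage(stage, rate)
    history.append(stage)
    print(f"error rate {rate:.3f} -> {stage.name}")
```

In this run the driving brain is promoted twice, falls back to semi-supervised operation when its error rate rises, and is promoted again once it recovers, mirroring the back-and-forth feedback of Figure 4.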
4.2 Normalization of the Embodied Turing Test as Another Milestone After the Conversational Turing Test
Linguistic intelligence is the most basic manifestation of human intelligence. At present, ChatGPT is undergoing a Turing test day after day in front of all humanity [29], and its performance is extraordinary. Conversation, whether by listening and speaking, by handwriting, or by typing text on a keyboard, is an embodied behavior that consumes energy. Yet ChatGPT cannot replace a person's multiple intelligences, because it has never acquired any behavioral experience of the physical world beyond text. Over the long course of human evolution, beyond visual and auditory interaction, most interaction with the external environment has been physical, using the limbs and body [30]; this is what we usually call "labor." Powered machines of every kind are widely used, varied, and ubiquitous: tractors, transplanters, and harvesters in the fields; excavators, cranes, and shield machines on construction sites; cars, airplanes, and ships on transport routes; engines, generators, machine tools, and production lines in factories; spacecraft in space; and so on. Machines run continuously, creating new labor positions, especially skilled ones, and training large numbers of excellent operators, craftsmen, and master craftsmen whose skills often amaze people. People revolve around machines and machines revolve around people, generation after generation, until this has become the norm of life; Engels went so far as to say that labor created man himself.
All design, implementation, and evaluation of powered machinery involve human control and human-machine interaction, in which the embodied human body interacts with the behavior of the machine; they are developing in ever more refined, natural, and convenient directions, with humans at the center of real-time control of the entire production activity. Suppose that all the machines humans have invented and used since the Industrial Revolution could be quietly permeated by artificial intelligence and control themselves, working tirelessly around the clock, with behavior that can no longer be distinguished from that of machines controlled by skilled craftsmen. People would then be freed from countless, varied machine-operating jobs (especially arduous ones) and would no longer be tied to machinery for long hours, while the output of industrial and agricultural products keeps growing and economic and social prosperity is maintained; people would instead engage in more creative, free labor. How great a change that would bring to human society! The normalization of the embodied Turing test for unmanned machine control will therefore be another important milestone after the normalization of the conversational Turing test. This process may take less than a hundred years, and vehicle self-control may lead the way. We hope any machine can be versatile, but we do not expect it to be omniscient and omnipotent. We hope intelligent machines can operate autonomously and take over many kinds of labor in human society, especially under harsh conditions and in adverse environments. The world a machine perceives is determined by the heterogeneous sensors it carries.
The types and accuracy of its sensors determine the quality of a machine's perception and the limits of the physical world it can observe, and they affect the machine's cognition and intelligence. Humans can entirely transcend the limits of carbon-based sense organs by equipping machines with a variety of silicon-based sensors and recognition systems: letting them look through microscopes or telescopes, see polarized light or electromagnetic fields, hear ultrasound and infrasound, or carry Beidou positioning receivers. Even specialized forms of language, such as programming languages, artistic languages, chemical languages, and material formula languages, can be given to different individual machines, allowing them to interact with human experts in professional terminology. A machine's behavior, that is, its embodied intelligence, is closely tied to its kinematic properties. For example, the behavior of self-driving cars depends on vehicle dynamics, that of autonomous surface warships on naval and ocean dynamics, that of autonomously piloted aircraft on aerodynamics, that of self-controlled shield machines on their servo systems, and that of autonomous surgical robots on the dexterity of the scalpel. Humans can equip machines with all kinds of powerful or delicate actuators. Because machines are no longer limited by human sense organs, behavioral capabilities, or flesh, it will be entirely normal, as machine cognition grows by itself, for machines to outperform humans in almost any labor position.
Figure 5: Interaction and Collaboration in Machine Learning and Autonomous Machine Operations
Besides the metal and non-metal materials that form their various mechanisms, autonomous cognitive machines can have powerful power systems and complex servo systems. More importantly, they can be built from heterogeneous, silicon-based hard constructs such as Field-Programmable Gate Arrays (FPGAs), Data Processing Units (DPUs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and memory. They can also be implanted with hard constructs that reflect genetic inheritance, the "infant cognition cores," forming the machine's embodiment, on which a rich, multi-scale range of soft constructs can parasitize, capable of bootstrapping and self-reuse. Intelligent machines interact with humans through cross-modal perception, forming an external behavioral loop. Inside the machine brain, instantaneous, short-term, and long-term memory cooperate heterogeneously and in parallel to build memory intelligence. In current machine brains, instantaneous and short-term memory can use parallel processors and circuits such as DPUs, GPUs, TPUs, and FPGAs, depending on system requirements, while computation can be performed by CPUs, GPUs, and other processors. Future machine brains may adopt chips with new system architectures and higher processing efficiency, such as 3D integrated memory-computing systems. In short, the new generation of intelligent machines is made of heterogeneous, even super-heterogeneous, components. These machines interact with the humans who train them during learning and operation to achieve mission alignment (see Figure 5). Through hard constructs, soft constructs obtain feedback from the physical world and make full use of prediction and control to form a closed perception-cognition-behavior loop, which can be verified to yield increasingly correct cognition.
Figure 6: Machine Self-Operation Flowchart
Figure 6 shows the self-operation flowchart of a cognitive machine that can interact, learn, and grow autonomously. The chart shows multi-level nested execution in the control system, with feedback loops for embodied behavior, attention, and sensor perception of the environment, and a reasonable allocation of work between levels. Information passes among long-term memory, instantaneous memory, and working memory; an engine searches for relevant facts and knowledge; actions are decided; and memory can even be modified and rapidly retrieved. Negative feedback loops that correct errors keep the machine aligned with its mission objectives.
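The negative feedback idea at the end of this description can be illustrated with a toy loop in which a machine repeatedly perceives its state, compares it with a mission target, keeps recent errors in a bounded working memory, logs everything to long-term memory, and acts to reduce the error. This is a minimal sketch under simplifying assumptions (a single scalar state, noiseless perception, a correction proportional to the averaged recent error); it is not the architecture of Figure 6, and the names and parameters are hypothetical.

```python
from collections import deque

def run_loop(target: float, state: float = 0.0, gain: float = 0.5,
             steps: int = 20, working_memory_size: int = 3):
    """Toy perceive-compare-act loop with negative feedback toward a mission target."""
    working_memory = deque(maxlen=working_memory_size)  # only the most recent errors
    long_term_memory = []                               # complete error history
    for _ in range(steps):
        error = target - state                 # perceive, then compare with the mission objective
        working_memory.append(error)
        long_term_memory.append(error)
        correction = gain * sum(working_memory) / len(working_memory)
        state += correction                    # act against the deviation (negative feedback)
    return state, long_term_memory

final_state, errors = run_loop(target=1.0)
print(f"final state {final_state:.4f}, first error {errors[0]:.2f}, last error {errors[-1]:.4f}")
```

With the default parameters the state overshoots the target, then settles close to it within 20 steps. Averaging over the working-memory window is a design choice that would smooth noisy readings, at the cost of that overshoot; a plain proportional correction on the latest error alone converges without overshoot in this noiseless setting.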
Conclusion: Machine Intelligence in the Age of Intelligence
Machine intelligence in the age of intelligence has evolved from the mechanization of mathematics to the automation of thought, and further to the self-growth of cognition. Moving from the normalization of conversational Turing tests to the normalization of embodied Turing tests, cognitive machines will work alongside scientists, engineers, and skilled craftsmen to make discoveries, inventions, and creations. Humanity is entering an era of symbiotic co-creation and iterative development of intelligence, which also lays the foundation for the system architecture of a new generation of interactive, learning, and self-growing artificial intelligence. Finally, it should be added that simulating human physical behavior is not the focus of artificial intelligence, as intelligent science and technology are not primarily about creating artificial life or pursuing bionics in human form. To date, we have not seen a single machine with a hint of self-esteem or curiosity, nor any signs of a global robot union.
We should firmly remember that the fundamental purpose of human invention of cognitive machines is to take the physical world as the direct object of cognition, to explain and solve the practical problems encountered by humans in the process of survival and reproduction. Let us embrace the era of a new generation of artificial intelligence that replaces a large number of human labor positions in society and assists humanity in creating more. Humanity will certainly live a wiser, more dignified, and more elegant life!
Acknowledgments
Science is rooted in discussion. During the writing of this paper, I received help from many scholars including Yike Guo, Nick, Guanrong Chen, Yu Wei, Jie Chen, Qionghai Dai, Yun Xie, Bing Li, Liwei Huang, Chenglin Liu, Jingnan Liu, Pengju Ren, Yan Peng, Sheng Jiang, Yuchao Liu, Chengqing Zong, and Yu Hu, as well as graduate students, to whom I extend my sincere gratitude.
References
[1] Sun, R. Complete Growth: The Self-Creation of Children's Lives [M]. 2nd Edition. Beijing: China Women's Publishing House, 2014: 25-41.
[2] Wu, J. The Light of Civilization [M]. Beijing: People's Posts and Telecommunications Publishing House, 2014: 52-54.
[3] Descartes, R. The Philosophical Writings of Descartes [M]. Cambridge: Cambridge University Press, 1985: 65-67.
[4] Sang, J., & Yu, J. From ChatGPT to the Future Trends and Challenges of AI [J]. Computer Research and Development, 2023, 60(06): 1191-1201.
[5] Li, D. Cognitive Physics—The Enlightenment by Schr&oum