The Paper reporter Zhou Ling

Recently, a 14-second "digital human" video used by NVIDIA CEO Jensen Huang in a keynote speech attracted wide attention, bringing the "digital human" and even the "metaverse" out of professional circles and into public view.

The Paper recently spoke with three NVIDIA technical experts — Shi Chengqiu, senior technical marketing manager at NVIDIA China; Song Yiming, senior solution architect at NVIDIA; and He Zhan, manager of the media and entertainment industry at NVIDIA China (referred to collectively below as "NVIDIA technical experts") — about what digital humans and the metaverse are, how far away the metaverse is, and what changes it will bring.

NVIDIA positions itself as a provider of basic, underlying service architecture. Omniverse, its metaverse-concept platform, is like a toolbox and a technology platform base that integrates NVIDIA's technologies, algorithms and standards in AI, HPC and graphics from the past 20 years. NVIDIA hopes to build a technology platform that serves artists, creators and some consumer- and business-side users.

The NVIDIA technical experts said that Omniverse introduces an expression and language system for the "digital human" that is closest to a real "human." Compared with the cumbersome technical pipeline of the past, text now only needs to be turned into rich facial expressions by NVIDIA's conversational AI system, built on deep-neural-network perception, and then superimposed on the virtual human by an AI real-time rendering system. NVIDIA believes that when digital humans can communicate and interact with real humans, and the real and virtual worlds can interact with each other, this will fit the metaverse concept perfectly. That stage will take five to ten years.

The following is an excerpt from the dialogue with the NVIDIA technical experts.

The Paper: A documentary about the GTC keynote speech was shown at the SIGGRAPH 2021 conference last week, revealing new applications of digital generation on and behind the stage. Among them, a 14-second clip was a virtual "digital human" video of Jensen Huang, which attracted wide attention. What technologies are used here? Please introduce the development of digital humans.

NVIDIA technical expert: In his speech, Jensen Huang was replaced for a little more than ten seconds by a virtual double, which can be regarded as a digital human. The concept of the digital human — think cartoon characters or virtual characters — has actually existed for a long time. Virtual idols hold concerts, for example, and the Japanese animation industry has launched several virtual idols with very well-rounded personas. The biggest difference between these and traditional cartoon characters is that they not only use 3D rendering technology to look more like a person, but also use holographic technology to achieve naked-eye 3D, so they can stand on a stage and sing vividly. These are some of the essential elements of the virtual human in the traditional sense.

The OmniSurface system in Omniverse (NVIDIA's virtual collaboration and simulation platform built around the metaverse concept) can render different materials and surfaces, and it also has a series of rendering mechanisms for digital humans.
Using GPU rendering can bring digital humans closer to real humans; that is the first aspect of digital humans.

What matters most is introducing the emotions closest to a real "human" — an expression and language system — to the digital human. Audio2Face needs only a piece of text: NVIDIA's conversational AI system, built on deep-neural-network perception, senses the emotion underlying the language, turns the text into rich facial expressions, and automatically superimposes them on the virtual human through an AI real-time rendering system (a minimal illustrative sketch of this flow appears after this answer). In the past, dubbing and lip-sync adaptation were required, and facial expressions had to be crafted and re-rendered separately with the current voice and context in mind. The whole pipeline was too cumbersome, and the cost in time, manpower and materials was enormous.

As the environment around each digital human expands, and as GPU computing power and the Omniverse platform's capabilities keep improving with each version iteration, we can finally realize the metaverse concept.

The Paper: What is NVIDIA's roadmap for digital humans? When can interaction between virtual humans and real people be achieved?

NVIDIA technical experts: NVIDIA has a dedicated digital human research team in Silicon Valley and has a very clear roadmap.

In the first stage, we need to give the digital human an image. We use CG (computer graphics) technology and real-time ray-traced rendering to create that image — for example, the 14-second digital double of Jensen Huang. This stage is already very mature and can be achieved on many platforms, as long as there are experts and designers who understand CG art to layer on the relevant motion-capture data, camera work and even blood-flow algorithms.

In the second stage, the digital virtual human is driven by some input. At present there are three popular ways of driving a digital virtual human. One is video, similar to short video: the digital virtual human built in the first stage is driven through video to reproduce the actions of the person in the footage. Another is the "person inside" the avatar, an approach that comes from Japan: the most typical form today is an actor in a motion-capture suit, like the digital virtual humans driven by motion-capture performers in blockbuster filmmaking. The third category is speech semantics. NVIDIA's digital human research team has chosen speech semantics as its technical entry point for driving digital humans. Why? Of the three, speech semantics is the easiest to obtain and very convenient: a digital virtual human can be driven by a piece of speech or a piece of text.

The third stage is a step up from these. Each stage takes two to three years, and we think the third stage is when the digital virtual human truly becomes an application-level product, which should be five to ten years from now.

In the third stage we can realize communication and interaction between digital humans and real humans, as well as between digital humans and digital humans. This is also called interaction between the real and virtual worlds, and it in fact fits the metaverse concept perfectly.
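To make the text-driven pipeline described above more concrete — text is analyzed for emotion, converted into facial-expression weights, and then superimposed on a rendered virtual human — here is a minimal, hypothetical Python sketch of those stages. Every name in it (ExpressionFrame, perceive_emotion, text_to_expressions, render_on_avatar) is an illustrative assumption, not NVIDIA's Omniverse or Audio2Face API.

```python
# Hypothetical sketch of a text/speech-driven facial-animation pipeline.
# None of these classes or functions are NVIDIA APIs (Omniverse, Audio2Face);
# they only mirror the stages described in the interview:
# text -> perceived emotion -> facial-expression weights -> real-time render.

from dataclasses import dataclass


@dataclass
class ExpressionFrame:
    """Blendshape-style weights for one animation frame (0.0 .. 1.0)."""
    jaw_open: float
    smile: float
    brow_raise: float


def perceive_emotion(text: str) -> str:
    """Stand-in for a conversational-AI model that senses emotion in text."""
    lowered = text.lower()
    if any(w in lowered for w in ("great", "happy", "amazing")):
        return "joy"
    if any(w in lowered for w in ("sad", "sorry", "unfortunately")):
        return "sorrow"
    return "neutral"


def text_to_expressions(text: str) -> list[ExpressionFrame]:
    """Turn a piece of text into a rough sequence of expression frames.

    A production system would generate per-phoneme mouth shapes from
    synthesized audio; one frame per word is enough to show the flow here.
    """
    emotion = perceive_emotion(text)
    smile = 0.8 if emotion == "joy" else 0.1
    brow = 0.6 if emotion == "sorrow" else 0.3
    frames = []
    for word in text.split():
        # Longer words open the mouth wider, as a crude stand-in for visemes.
        frames.append(ExpressionFrame(jaw_open=min(1.0, len(word) / 8),
                                      smile=smile,
                                      brow_raise=brow))
    return frames


def render_on_avatar(frames: list[ExpressionFrame]) -> None:
    """Stand-in for the real-time renderer that drives the virtual human."""
    for i, f in enumerate(frames):
        print(f"frame {i:03d}: jaw={f.jaw_open:.2f} "
              f"smile={f.smile:.2f} brow={f.brow_raise:.2f}")


if __name__ == "__main__":
    render_on_avatar(text_to_expressions("It is great to see you again"))
```

In the system the experts describe, the emotion-perception and expression-generation steps would be deep neural networks, and the final step would be GPU-based real-time ray-traced rendering of the virtual human rather than printed frame values.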
The Paper: What are the conditions for truly entering the third stage, in which virtual and real people interact?

NVIDIA technical expert: The first stage is actually very mature, and many platforms can do it. The second stage is at an early phase: some of our customers and partners have achieved initial results, and many algorithms have been delivered to the market. The logic behind many voice-driven virtual announcers, for example, is second-stage technology. We believe this stage will reach maturity in two to three years.

In another two to three years after that, it will enter the third stage — its initial, embryonic development phase — which puts it at a time point of roughly five years from now.