
Introduction
Ubiquitous camera networks in smart cities create massive amounts of images and videos across a range of spatial-temporal scales. However, the capabilities of visual processing systems often lag behind the rapid growth of video data and the demands of city-brain systems. To address this challenge, a novel collaborative visual computing framework, termed the digital retina, has been established to align high-efficiency, intelligent perception models with emerging visual coding for machines. Within this framework, the video stream, feature stream, and model stream work collaboratively over an end-edge-cloud platform. Specifically, the compressed video stream targets human vision, the compact feature stream serves machine vision, and the model stream incrementally updates deep learning models to improve the performance of human and machine vision tasks. The digital retina system enables comprehensive, intelligent, and efficient interactions among retina-like cameras, edge servers, and cloud centers through these data streams. It is expected to play a fundamental role in visual big data analysis and retrieval in smart cities. Standardization of digital retina systems brings remarkable benefits, such as efficient utilization and real-time processing of massive visual data, full use of system resources, and competitive performance achieved by processing the original visual signals.
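To make the three-stream idea more concrete, the following minimal Python sketch models the video, feature, and model streams as simple data records exchanged between end, edge, and cloud nodes. All class and field names are illustrative assumptions made for this article, not definitions taken from IEEE Std 3161-2022.

# Illustrative sketch of the three collaborative data streams in a
# digital retina system. All names are hypothetical and are not taken
# from IEEE Std 3161-2022.
from dataclasses import dataclass
from typing import List

@dataclass
class VideoStream:
    """Compressed video targeted at human vision."""
    camera_id: str          # globally unique camera identifier
    timestamp: float        # capture time (seconds since epoch)
    codec: str              # name of the compression format used
    payload: bytes          # compressed bitstream

@dataclass
class FeatureStream:
    """Compact features extracted at the end/edge for machine vision."""
    camera_id: str
    timestamp: float
    feature_type: str       # e.g., a re-identification descriptor
    vector: List[float]

@dataclass
class ModelStream:
    """Incremental model update pushed from the cloud to end/edge devices."""
    model_name: str
    version: int
    weights_delta: bytes    # compressed parameter update

if __name__ == "__main__":
    # A camera might emit a video stream and a feature stream for one frame,
    # while the cloud periodically pushes a model stream back to the device.
    video = VideoStream("cam-001", 1700000000.0, "hevc", b"\x00\x01")
    feats = FeatureStream("cam-001", 1700000000.0, "reid", [0.12, 0.85, 0.33])
    update = ModelStream("pedestrian-detector", 7, b"\x02\x03")
    print(video.camera_id, len(feats.vector), update.version)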

A series of standards for digital retina systems, comprising 12 parts, is planned by the 3161 Working Group of the IEEE Computer Society Data Compression Standards Committee: system architecture, end subsystem, edge subsystem, cloud subsystem, algorithm and model repository, storage system, end-edge-cloud collaboration, security and privacy protection, protocols and interfaces, test specification, measurement and evaluation system, and application guideline. This standard is the first part: system architecture.
Overview of the Standard
IEEE Std 3161-2022 aims to establish a unified architecture for visual computing systems. By standardizing the collaborative framework among the end, edge, and cloud, along with the cooperative characteristics of multiple data streams, it seeks to reduce the data transmission pressure within the system and alleviate the computational burden on the cloud. Ultimately, it enhances both the efficiency and the performance of video data processing in large-scale applications. On this foundation, the standard specifies a biologically inspired visual computing framework, named the digital retina, in which three streams, i.e., the video stream, feature stream, and model stream, work collaboratively for real-time analysis and processing of video big data. It mainly defines the architecture, components, and functional requirements of digital retina systems. In addition, this standard addresses the fields of visual perception systems and visual information processing technologies. It is applicable to various application scenarios in smart cities, such as intelligent transportation, public safety, and intelligent manufacturing.
Key Features and Benefits
IEEE Std 3161-2022 outlines a reference architecture, technical characteristics, components, and functional requirements for digital retina systems. The end-edge-cloud collaboration and multi-stream cooperation mechanisms are established, ensuring efficient interaction across all levels. Specifically, the technical characteristics of digital retina systems are specified, including a globally unified time-space ID, efficient video coding, compact feature representation, model updatability, software definability, attention adjustability, etc. On this foundation, the end subsystem is defined as a subsystem mainly used for the perception of scenario information, with functions such as data acquisition, processing, analysis, and transmission. The edge subsystem is defined as a subsystem that uses edge computing and provides multi-channel data aggregation and forwarding, cooperative resource scheduling, and data computing. The cloud subsystem is defined as a subsystem that uses cloud computing and provides system management and collaborative interaction, data aggregation and storage, and collaborative data analysis, mining, and decision-making at the global level.
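The sketch below illustrates, in hypothetical Python, how the three subsystems described above might divide their responsibilities: the end perceives and analyzes locally, the edge aggregates multiple channels, and the cloud makes global decisions. The function names, return fields, and decision rule are assumptions for illustration only and are not interfaces defined by IEEE Std 3161-2022.

# A minimal, hypothetical sketch of the end-edge-cloud division of labor.
def end_subsystem(raw_frame: bytes) -> dict:
    """Perceive the scene: acquire, process, and analyze data locally."""
    return {
        "video": raw_frame,                           # compressed video for human vision
        "features": [0.1, 0.7, 0.2],                  # compact features for machine vision
        "space_time_id": ("cam-001", 1700000000.0),   # globally unified time-space ID
    }

def edge_subsystem(channel_outputs: list) -> dict:
    """Aggregate multi-channel data, schedule resources, and compute."""
    features = [o["features"] for o in channel_outputs]
    return {"aggregated_features": features, "channels": len(channel_outputs)}

def cloud_subsystem(edge_report: dict) -> dict:
    """Store, analyze, and mine aggregated data; make decisions at the global level."""
    decision = "dispatch" if edge_report["channels"] > 1 else "monitor"
    return {"decision": decision, "model_update": b"\x00"}

if __name__ == "__main__":
    outputs = [end_subsystem(b"frame-a"), end_subsystem(b"frame-b")]
    report = edge_subsystem(outputs)
    print(cloud_subsystem(report))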
This standard provides an advanced end-edge-cloud collaborative computing architecture for massive video acquisition, processing, and transmission, which optimizes resource utilization and enhances video processing efficiency and performance. Moreover, its efficient data processing and transmission mechanisms significantly reduce bandwidth and storage demands, resulting in cost savings for stakeholders. These stakeholders include, but are not limited to, end, edge, and cloud device manufacturers and service providers; AI algorithm providers; and system developers and integrators.
Adoption and Impact
IEEE Std 3161-2022 has been adopted and implemented in internet of video things (IoVT) systems, thereby overcoming bottlenecks in large-scale video data processing, effectively reducing the cost of artificial intelligence applications, and facilitating the establishment of a technology ecosystem for the internet of video things. Within such a system, end devices continuously capture video streams in real time, intelligent analysis is performed either at the end device or at the edge, and the cloud is used for data mining and decision-making. Systems developed based on IEEE Std 3161-2022 have been deployed across industries such as production safety, transportation, and public safety. In intelligent transportation, the system realizes real-time traffic flow monitoring and analysis, and generates real-time traffic congestion heatmaps and accident risk prediction models. It provides fast and accurate decision support for adaptive traffic signal control, thus significantly enhancing traffic management efficiency. In public safety, the system enables rapid and extensive detection of abnormal events, strengthening the overall monitoring and response capabilities of public security.
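As a purely illustrative example of the congestion-heatmap idea mentioned above, the short Python sketch below bins per-camera vehicle counts (which an end or edge device could derive from the feature stream) onto a coarse city grid. The grid resolution, camera positions, and counts are invented for this example and do not come from any deployed system or from the standard.

# Hypothetical congestion-heatmap aggregation from per-camera vehicle counts.
from collections import defaultdict

GRID_CELL_METERS = 500  # resolution of the heatmap grid (assumed)

def to_cell(x_m: float, y_m: float) -> tuple:
    """Map a camera's position (in meters) to a grid cell index."""
    return (int(x_m // GRID_CELL_METERS), int(y_m // GRID_CELL_METERS))

def congestion_heatmap(observations: list) -> dict:
    """Aggregate (x, y, vehicle_count) observations into a cell -> count map."""
    heatmap = defaultdict(int)
    for x_m, y_m, count in observations:
        heatmap[to_cell(x_m, y_m)] += count
    return dict(heatmap)

if __name__ == "__main__":
    obs = [(120.0, 80.0, 14), (480.0, 90.0, 9), (1200.0, 600.0, 31)]
    print(congestion_heatmap(obs))
    # e.g., {(0, 0): 23, (2, 1): 31} -- higher values indicate heavier congestion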
The digital retina represents a complex technological framework. The adoption of this standard facilitates research on key technologies such as video encoding, feature encoding, feature interoperability, model production, and end-edge-cloud collaboration. Furthermore, its implementation in smart-city scenarios such as intelligent transportation and smart manufacturing will drive intelligent upgrades and technological advances in related industries, steering industry practice toward more collaborative and standardized approaches.
Future Developments
Building on the foundation of IEEE Std 3161-2022, IEEE Std 3161.9-2023, “Standard for Protocols and Interfaces of Digital Retina Systems,” was officially published in February 2024. This standard further defines the communication protocols and data interfaces for transmitting videos, features, models, and control information within a digital retina system, facilitating interoperability among end, edge, and cloud devices.
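To give a sense of the kinds of information such protocols carry, the snippet below shows a purely hypothetical message envelope covering videos, features, models, and control information. None of these field names, values, or structures are taken from IEEE Std 3161.9-2023; they are assumptions made solely for illustration.

# Hypothetical message envelope; not the format defined by IEEE Std 3161.9-2023.
import json

message = {
    "stream_type": "feature",          # "video" | "feature" | "model" | "control"
    "source": {"device": "cam-001", "tier": "end"},
    "destination": {"device": "edge-07", "tier": "edge"},
    "space_time_id": {"camera": "cam-001", "timestamp": 1700000000.0},
    "payload": [0.12, 0.85, 0.33],     # a compact feature vector in this case
}

print(json.dumps(message, indent=2))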
At present, seven additional standards are being developed, including end subsystem (P3161.2), edge subsystem (P3161.3), cloud subsystem (P3161.4), algorithm and model repository (P3161.5), and so forth. Moreover, the IEEE C/DC 3161 WG is starting the development of standards for the test specification, the measurement and evaluation system, and the application guideline, aiming to establish a comprehensive and scientific standard system for all aspects of digital retina technology applications. The IEEE C/DC 3161 WG will continuously promote the development and iteration of this series of standards to address the challenges of emerging technologies.
Visit the home page of IEEE C/DC 3161 WG to explore more details:
https://sagroups.ieee.org/3161/
Conclusion
IEEE Std 3161-2022 defines an end-edge-cloud collaborative computing architecture for digital retina systems, in which the video stream, feature stream, model stream, and control flow are generated and transmitted across the system, thereby enhancing the processing efficiency and performance of large-scale video data. The adoption of this standard in domains such as intelligent transportation, public safety, and smart manufacturing has accelerated technological innovation and driven the intelligent transformation of related industries. Readers are encouraged to explore this standard further and consider its application in their respective fields. Detailed information and purchasing options for the standard are available at the following link:
https://store.accuristech.com/ieee/standards/ieee-3161-2022?vendor_id=10879&product_id=2501653
Authors
Yaowei Wang (Chair), Wenwu Zhu (Vice Chair), Wen Ji (Vice Chair), Xinbei Bai, Peng Chen, Hongyu Chi, Lingyu Duan, Dong Feng, Wen Gao, Xuesong Gao, Kui Hou, Jun Li, Pan Li, Changyu Liu, Xiaoxu Luan, Wenqi Ren, Yonghong Tian, Feng Wu, Peng Yang, Jinyu Yuan, Chunhao Zhao, Haojie Zhao, Weishi Zheng, and Yunhong Zhou
Disclaimer: The authors are completely responsible for the content of this article. The opinions expressed are their own and do not represent IEEE’s position nor that of the Computer Society nor its Leadership.