Abstract: Super nodes with large cardinalities remain serious threats to production networks. Identifying super nodes is important for network security and management, including the detection of network attacks such as DDoS attacks and spam email. Because the cardinality distribution changes dynamically, most existing approaches cannot adaptively allocate memory to nodes with small and large cardinalities in order to balance accuracy against memory usage in cardinality estimation. Moreover, because of high computation and memory costs, they cannot simultaneously measure multiple kinds of cardinalities and efficiently recover super nodes from data structures that are constructed only once. To solve these problems, we present a data streaming approach for identifying super nodes based on novel summary data structures. The main idea of our approach is to design a changeable and reversible data structure, which increases its size according to the dynamic cardinality distribution, collects the information associated with cardinalities in a network-wide view, and reconstructs super sources and destinations by simple inverse computation on the aggregated data structure. We perform theoretical analysis and conduct extensive experiments on real network traffic. The experimental results show that the proposed approach can identify up to 96% of super nodes with low memory and computation requirements in comparison with state-of-the-art approaches.
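To make the notion of a super node concrete, the following is a minimal sketch of cardinality estimation over a packet stream, assuming that a source's cardinality is the number of distinct destinations it contacts. It uses plain linear counting with one fixed-size bitmap per source, which is a standard baseline and not the changeable, reversible structure described in the abstract. The names `SourceCardinalitySketch`, `BITMAP_BITS`, and the threshold value are hypothetical and chosen only for illustration.

```python
import hashlib
import math
from collections import defaultdict

BITMAP_BITS = 1024  # assumed fixed bitmap size per source (hypothetical parameter)


def _bit_index(value: str) -> int:
    """Map a string to a bit position using a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BITMAP_BITS


class SourceCardinalitySketch:
    """Per-source linear-counting bitmaps over distinct destinations.

    Illustrative baseline only: the bitmap size is fixed and sources are
    stored explicitly, unlike the approach summarized in the abstract.
    """

    def __init__(self) -> None:
        # One Python integer is used as a bitmap for each source address.
        self.bitmaps = defaultdict(int)

    def update(self, src: str, dst: str) -> None:
        """Record one packet (src, dst); repeated pairs set the same bit."""
        self.bitmaps[src] |= 1 << _bit_index(dst)

    def estimate(self, src: str) -> float:
        """Linear-counting estimate: -m * ln(fraction of zero bits)."""
        ones = bin(self.bitmaps.get(src, 0)).count("1")
        zeros = BITMAP_BITS - ones
        if zeros == 0:
            return float("inf")  # bitmap saturated; no finite estimate
        return -BITMAP_BITS * math.log(zeros / BITMAP_BITS)

    def super_sources(self, threshold: float) -> list:
        """Return the sources whose estimated cardinality exceeds threshold."""
        return sorted(s for s in self.bitmaps if self.estimate(s) > threshold)


if __name__ == "__main__":
    sketch = SourceCardinalitySketch()
    for i in range(200):
        sketch.update("10.0.0.1", f"192.168.{i // 256}.{i % 256}")
    for _ in range(50):
        sketch.update("10.0.0.2", "192.168.0.1")  # same destination repeatedly
    print(sketch.super_sources(threshold=100))
```

A fixed bitmap of this kind saturates for very large cardinalities and wastes memory on small ones, which is the accuracy and memory trade-off the abstract says an adaptive structure is meant to address.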