This paper discusses the results of a study on modeling the processes by which humans perceive and recognize audio signals. Based on an analysis of reliable psychophysical data, a model (conception) is synthesized within the framework of the Neyman-Pearson approach, which is well known in statistical decision theory. This model provides a unified view of the numerous facts and dependencies established to date for the auditory system. A sequential recognition procedure was developed on the basis of solid data on the structure and functions of the auditory system. Optimization of this procedure revealed that the perceptually meaningful feature set representing an audio signal consists of sampled values of the component envelopes taken at certain time instants. These instants depend on the pattern for which the similarity hypothesis is tested. In the general case, the instants are not equidistant, unlike in existing speech recognition techniques. We show how this property is related to the well-known psychophysical Weber-Fechner law. The theoretical study of the recognition procedure is supplemented by a discussion of possible implementation issues. It is shown that implementing the procedure, especially in recursive form, yields fast and efficient numerical algorithms that can naturally be realized on structures similar to neural networks. Finally, the relation between the recursive implementation and well-established recognition techniques such as LPC (PLP) and MFCC is considered.