Abstract: Manually extracting features to capture video anomalies is no longer suitable for large-scale aquaculture because of insufficient feature learning, difficult feature selection, and poor generalization. In this study, computer vision was applied to anomaly detection in fish movement behavior, and an unsupervised method was proposed that combines multilayer memory enhancement with residual spatio-temporal transformers to effectively extract the motion correlations and appearance characteristics of fish. First, a U-Net served as the backbone: its encoder and decoder encoded and decoded the video frames to generate predicted frames, and behavior anomalies were detected from the difference between predicted and real frames. To strengthen the connection between spatio-temporal features of consecutive video frames, a residual temporal transformer module and a residual spatial transformer module were proposed to improve the network's ability to model temporal and spatial information. Because convolutional neural networks generalize well enough to also represent anomalous frames, the memory enhancement module was used in place of the skip connections in the U-Net to weaken the encoder's ability to represent anomalous frames. In addition, a generative adversarial network was used to generate more realistic predicted frames, thereby improving the detection accuracy of the network. The results indicated that the method effectively extracted the motion and appearance characteristics of fish. On two self-built fish datasets, the area under the curve (AUC) reached 0.916 and 0.921, respectively, showing that the method can detect anomalies in fish movement behavior.
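
The scoring step described above (detecting anomalies from the difference between predicted and real frames, then reporting AUC) can be illustrated with a minimal sketch. It does not reproduce the proposed network. The use of PSNR as the frame-difference measure, the min-max normalization, the reading of AUC as the area under the ROC curve, and the function names `psnr`, `anomaly_scores`, and `auc` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def psnr(pred, real, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and a real frame.

    Assumption: PSNR is the difference measure; the abstract only says the
    difference between predicted and real frames is used.
    """
    mse = np.mean((np.asarray(pred, float) - np.asarray(real, float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / (mse + 1e-12))


def anomaly_scores(pred_frames, real_frames):
    """Map per-frame PSNR to a score in [0, 1]; higher means more anomalous."""
    p = np.array([psnr(a, b) for a, b in zip(pred_frames, real_frames)])
    span = p.max() - p.min()
    if span == 0:
        return np.zeros_like(p)
    # Low PSNR (poor prediction) gives a high anomaly score.
    return 1.0 - (p - p.min()) / span


def auc(scores, labels):
    """ROC AUC as the probability that an anomalous frame outscores a normal one."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

In this sketch, a frame that the predictor reproduces poorly receives a low PSNR and therefore a high anomaly score; the AUC then summarizes how well those scores separate labeled anomalous frames from normal ones over a test video.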