Fig. 1 | Surgical Endoscopy

Fig. 1

From: Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy

Overview of our proposed neural network, the Multi-Stage Temporal Convolution Network (MS-TCN) [22]. a The LC video is processed at 1 frame per second (fps). b Each frame is fed into a deep convolutional neural network, ResNet50 [9], which is trained to classify each frame’s surgical phase independently. After training, the final prediction layer of the ResNet50 is removed and all network parameters are frozen (no longer trainable). For each frame, the ResNet50 then produces a “feature vector”: a numerical representation of the frame’s visual content with lower dimensionality than the original frame. c The feature vectors from all frames of the input LC video are combined into a sequence representing the entire video. This sequence is fed into the MS-TCN model, which consists of temporal convolution layers whose dilation rate increases across layers. The temporal convolution layers capture temporal relationships between frames, and the increasing dilation allows the model to capture long-term temporal dependencies. The final layer of the MS-TCN model outputs the surgical phase prediction for each frame in the video
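
The effect of increasing dilation in panel c can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the kernel size of 3, the doubling dilation schedule (1, 2, 4, ...), the fixed all-ones weights, the single feature channel, and the function names `dilated_conv1d` and `receptive_field` are all assumptions made for illustration, since the caption only states that the dilation rate increases across layers.

```python
def dilated_conv1d(seq, kernel, dilation):
    """Same-length 1D convolution over time with zero padding.

    seq: list of floats, one value per frame (a single feature channel).
    kernel: list of odd length; its taps are spaced `dilation` frames apart.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(seq):  # frames outside the video count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def receptive_field(num_layers, kernel_size=3):
    """Frames seen by one output after stacking layers with dilation 2**layer.

    The doubling schedule is an assumption, not taken from the caption.
    """
    rf = 1
    for layer in range(num_layers):
        rf += (kernel_size - 1) * (2 ** layer)
    return rf


# An impulse at frame 8 shows how far information spreads through the stack.
signal = [0.0] * 17
signal[8] = 1.0
for layer in range(3):
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2 ** layer)

# Frames that were influenced by the single nonzero input frame.
reach = [t for t, v in enumerate(signal) if v != 0.0]
print(len(reach), receptive_field(3))
```

Under these assumptions, three layers already connect 15 neighbouring frames, and the span roughly doubles with each added layer: ten such layers would cover 2047 frames, about 34 minutes of video at 1 fps. This is one way to see why stacked dilated convolutions can relate surgical events that are far apart in time.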

