The human depth perception of 3D stereo video depends on the camera separation, the point of convergence, the distance to the object, and familiarity with the object. This paper presents a robotized method for rapid and reliable test-data collection during live open-heart surgery to determine the ideal camera configuration.
Stereo 3D video of surgical procedures can be highly valuable for medical education and can improve clinical communication. However, access to the operating room and the surgical field is restricted: it is a sterile environment, and the physical space is crowded with surgical staff and technical equipment. In this setting, unobscured capture and realistic reproduction of surgical procedures are difficult. This paper presents a method for rapid and reliable collection of stereoscopic 3D video at different camera baseline distances and convergence distances. To collect test data with minimal interference during surgery, and with high precision and repeatability, a camera was attached to each hand of a dual-arm robot that was ceiling-mounted in the operating room. The robot was programmed to perform a timed sequence of synchronized camera movements, stepping through a range of test positions with baseline distances between 50 and 240 mm in increments of 10 mm, at two convergence distances of 1100 mm and 1400 mm. Surgery was paused to allow 40 consecutive 5-s video samples to be recorded. A total of 10 surgical scenarios were recorded.
In surgery, 3D visualization can be used for education, diagnosis, pre-operative planning, and post-operative evaluation1,2. Realistic depth perception can improve understanding3,4,5,6 of normal and abnormal anatomies. Simple 2D video recordings of surgical procedures are a good start. However, the lack of depth perception can make it hard for non-surgical colleagues to fully understand the antero-posterior relationships between different anatomical structures, and thereby introduces a risk of misinterpreting the anatomy7,8,9,10.
The 3D viewing experience is affected by five factors: (1) the camera configuration, which can be either parallel or toed-in, as shown in Figure 1; (2) the baseline distance (the separation between the cameras); (3) the distance to the object of interest and other scene characteristics, such as the background; (4) the characteristics of the viewing devices, such as screen size and viewing position1,11,12,13; and (5) the individual preferences of the viewers14,15.
Designing a 3D camera setup begins with capturing test videos at various camera baseline distances and configurations for subsequent subjective or automatic evaluation16,17,18,19,20. The distance from the cameras to the surgical field must be kept constant to capture sharp images. Fixed focus is preferred because autofocus would otherwise adjust to hands, instruments, or heads that come into view. However, this is not easily achievable when the scene of interest is the surgical field. Operating rooms are restricted-access areas because these facilities must be kept clean and sterile. Technical equipment, surgeons, and scrub nurses are often clustered closely around the patient to secure a good visual overview and an efficient workflow. To compare and evaluate the effect of camera positions on the 3D viewing experience, each complete test range of camera positions should record the same scene, because object characteristics such as shape, size, and color can affect the 3D viewing experience21.
For the same reason, complete test ranges of camera positions should be repeated across different surgical procedures, and the entire sequence of positions must be repeated with high accuracy. In a surgical setting, existing methods that require either manual adjustment of the baseline distance22 or different camera pairs with fixed baseline distances23 are not feasible because of both space and time constraints. The robotized solution presented here was designed to address this challenge.
The data was collected with a dual-arm collaborative industrial robot mounted in the ceiling of the operating room. Cameras were attached to the wrists of the robot and moved along an arc-shaped trajectory with increasing baseline distance, as shown in Figure 2.
To demonstrate the approach, 10 test series were recorded from 4 different patients with 4 different congenital heart defects. Scenes were chosen when a pause in surgery was feasible: with the beating hearts just before and after surgical repair. Series were also made when the hearts were arrested. The surgeries were paused for 3 min and 20 s to collect forty 5-s sequences at different camera convergence distances and baseline distances. The videos were later post-processed and displayed in 3D to the clinical team, who rated how realistic the 3D video appeared on a scale from 0-5.
The convergence point for toed-in stereo cameras is where the center points of both camera images meet. In principle, the convergence point can be placed in front of, behind, or on the object; see Figure 1A–C. When the convergence point is in front of the object, the object is captured and displayed left of the midline in the left camera image and right of the midline in the right camera image (Figure 1A). The opposite applies when the convergence point is behind the object (Figure 1B). When the convergence point is on the object, the object appears at the midline of both camera images (Figure 1C), which presumably yields the most comfortable viewing, since no squinting is required to merge the images. To achieve comfortable stereo 3D video, the convergence point must be located on, or slightly behind, the object of interest; otherwise, the viewer is forced to voluntarily squint outwards (exotropia).
The data was collected using a dual-arm collaborative industrial robot to position the cameras (Figure 2A–B). The robot weighs 38 kg without equipment and is intrinsically safe: when it detects an unexpected impact, it stops moving. The robot was programmed to position the 5-Megapixel cameras with C-mount lenses along an arc-shaped trajectory, stopping at predetermined baseline distances (Figure 2C). The cameras were attached to the robot hands using adaptor plates, as shown in Figure 3. Each camera recorded at 25 frames per second. Lenses were set at f-stop 1/8 with focus fixed on the object of interest (the approximated geometrical center of the heart). Every image frame had a timestamp, which was used to synchronize the two video streams.
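Timestamp-based synchronization of two 25 fps streams can be done by nearest-timestamp matching. The sketch below is illustrative only (the function name and the 20 ms tolerance are assumptions, not taken from the authors' repository); at 25 fps the inter-frame spacing is 40 ms, so a 20 ms skew is the limit for an unambiguous match.

```python
def pair_frames(ts_left, ts_right, max_skew=0.02):
    """Pair frames from two streams by nearest timestamp (in seconds),
    dropping frames with no partner within max_skew.
    Both timestamp lists must be sorted in ascending order."""
    pairs, j = [], 0
    for i, t in enumerate(ts_left):
        # advance j while the next right-stream timestamp is closer to t
        while j + 1 < len(ts_right) and abs(ts_right[j + 1] - t) <= abs(ts_right[j] - t):
            j += 1
        if abs(ts_right[j] - t) <= max_skew:
            pairs.append((i, j))
    return pairs
```

For example, two streams started 1 ms apart pair frame-for-frame, while a frame dropped from one stream simply produces no pair for that index.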
Offsets between the robot wrists and the cameras were calibrated. This can be achieved by aligning the crosshairs of the camera images, as shown in Figure 4. In this setup, the total translational offset between the mounting point on the robot wrist and the center of the camera image sensor was 55.3 mm in the X-direction and 21.2 mm in the Z-direction, as displayed in Figure 5. The rotational offsets were calibrated at a convergence distance of 1100 mm and a baseline distance of 50 mm and were adjusted manually with the joystick on the robot control panel. The robot in this study had a specified accuracy of 0.02 mm in Cartesian space and a rotational resolution of 0.01 degrees24. At a radius of 1100 mm, an angle difference of 0.01 degrees offsets the center point by 0.2 mm. During the full robot motion from 50-240 mm separation, the crosshair for each camera stayed within 2 mm of the ideal center of convergence.
The baseline distance was increased stepwise by symmetrical separation of the cameras around the center of the field of view in increments of 10 mm ranging from 50-240 mm (Figure 2). The cameras were kept at a standstill for 5 s in each position and moved between the positions at a velocity of 50 mm/s. The convergence point could be adjusted in X and Z directions using a graphical user interface (Figure 6). The robot followed accordingly within its working range.
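The stepped arc trajectory can be expressed compactly: for each baseline distance D, the cameras sit on a circle of radius R around the convergence point, toed in by the angle a where sin(a) = D/(2R) (Figure 2C). A minimal sketch, with illustrative names and a simplified 2D pose representation (not the authors' robot program):

```python
import math

def camera_poses(R=1100.0, d_min=50.0, d_max=240.0, step=10.0):
    """Generate the stepped arc positions: for each baseline D the two
    cameras sit symmetrically on a circle of radius R around the
    convergence point, toed in by a with sin(a) = (D/2)/R."""
    poses = []
    D = d_min
    while D <= d_max + 1e-9:
        a = math.asin((D / 2.0) / R)       # toe-in angle, radians
        x = D / 2.0                        # lateral offset of each camera
        z = math.sqrt(R**2 - x**2)         # height above the convergence point
        poses.append({"baseline": D, "toe_in_deg": math.degrees(a),
                      "left": (-x, z), "right": (x, z)})
        D += step
    return poses
```

With the defaults this yields the 20 positions of one test series; the camera height z shrinks only marginally (from about 1100 mm to 1093 mm) as the baseline grows, since the cameras follow the arc rather than a straight rail.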
The accuracy of the convergence point was estimated using the similar triangles and the variable names in Figure 7A and B. The height 'z' was calculated from the convergence distance 'R' with the Pythagorean theorem as

z = √(R² − (D/2)²)

When the real convergence point was closer than the desired point, as shown in Figure 7A, the error distance 'f1' was calculated as

f1 = e·z / (D + e)

Similarly, when the convergence point was distal to the desired point, the error distance 'f2' was calculated as

f2 = e·z / (D − e)
Here, 'e' was the maximum separation between the crosshairs, at most 2 mm at the maximum baseline separation during calibration (D = 240 mm). For R = 1100 mm (z = 1093 mm), the error was less than ±9.2 mm; for R = 1400 mm (z = 1395 mm), it was less than ±11.7 mm. That is, the error in the placement of the convergence point was within 1% of the desired distance, so the two test distances of 1100 mm and 1400 mm were well separated.
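These bounds can be checked numerically. The sketch below (function name illustrative) computes z from the Pythagorean theorem and the two error distances from the similar-triangle relations f1 = e·z/(D + e) and f2 = e·z/(D − e), reproducing the values quoted above and in the Figure 7 caption.

```python
import math

def convergence_error(R, D, e):
    """Convergence-point error bounds; all distances in mm.
    R: convergence distance, D: baseline, e: max crosshair separation."""
    z = math.sqrt(R**2 - (D / 2.0)**2)   # vertical camera-to-target distance
    f1 = e * z / (D + e)                 # real convergence point closer
    f2 = e * z / (D - e)                 # real convergence point farther
    return z, f1, f2
```

For R = 1100 mm, D = 240 mm, e = 2 mm this gives z ≈ 1093 mm, f1 ≈ 9.0 mm, and f2 ≈ 9.2 mm.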
The experiments were approved by the local Ethics Committee in Lund, Sweden. The participation was voluntary, and the patients' legal guardians provided informed written consent.
1. Robot setup and configuration
NOTE: This experiment used a dual-arm collaborative industrial robot and the standard control panel with a touch display. The robot is controlled with RobotWare 6.10.01 controller software and the robot integrated development environment (IDE) RobotStudio 2019.525. Software developed by the authors, including the robot application, recording application, and postprocessing scripts, is available at the GitHub repository26.
CAUTION: Use protective eyeglasses and reduced speed during setup and testing of the robot program.
2. Verify the camera calibration
3. Preparation at the start of the surgery
4. Experiment
CAUTION: All personnel should be informed about the experiment beforehand.
5. Repeat
6. Postprocessing
NOTE: The following steps can be carried out using most video editing software or the provided scripts in the postprocessing folder.
7. Evaluation
An acceptable evaluation video, with the right image placed at the top in top-bottom stereoscopic 3D, is shown in Video 1. A successful sequence should be sharp, focused, and without unsynchronized image frames. Unsynchronized video streams will cause blur, as shown in Video 2. The convergence point should be centered horizontally, independent of the camera separation, as seen in Figure 9A,B. When the robot transitions between positions, there is a small shake in the video, which is to be expected at a transition velocity of 50 mm/s. With too large a separation between the right and left images, the brain cannot fuse them into one 3D image; see Figure 9C and Video 3.
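The top/bottom packing used for these videos (right eye on top, keeping every other pixel row from each camera so the packed frame keeps the original height, see Figure 9) can be sketched as follows; the function name and array layout are assumptions, not the authors' merge application:

```python
import numpy as np

def merge_top_bottom(right, left):
    """Pack two HxWx3 frames into standard top/bottom stereo 3D:
    every other pixel row of the right-eye frame on top, every other
    pixel row of the left-eye frame below."""
    top = right[0::2, :, :]      # even rows of the right camera
    bottom = left[0::2, :, :]    # even rows of the left camera
    return np.concatenate([top, bottom], axis=0)
```

The output has the same height as either input, which is what lets a 3D projector interpret it as a full-resolution top/bottom frame.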
The position of the heart in the images should be centered during the entire video, as shown in Figure 1C. Several issues can cause this to fail: (1) The convergence point is too far away from the heart (see Figure 7). The camera positions relative to the patient can be modified from the robot application settings screen (Figure 6B). (2) The camera tool coordinate system is not properly configured. The robot program simultaneously moves the cameras symmetrically in a radial motion around the convergence point (Figure 2C) and rotates them around the camera tool coordinate system (Figure 5). If the camera adaptor plates (Figure 3) are assembled or mounted incorrectly, the default values will not work. Rerun steps 2.1-2.4 and ensure that the crosshairs in the recording application (Figure 6) point at the same object during the full robot motion. When adjusting the coordinate frames, ensure that the object used for calibration (Figure 4) is centered between the cameras; otherwise, the calibration will result in non-symmetrical coordinate frames.
If the colors are incorrect after debayering with the debayering application (Figure 8), the captured videos have the wrong debayering format. This requires the user to modify the code of the debayering application or to use another debayering tool. Similarly, if the automatic synchronization between the stereo videos fails, the videos can be aligned manually in a video editing program such as Premiere Pro.
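For readers who want to inspect or replace the debayering step, a deliberately simple nearest-neighbor demosaic for an RGGB mosaic is sketched below. The RGGB pattern and the function name are assumptions for illustration; the authors' application and common tools (e.g., OpenCV's demosaicing) use more sophisticated interpolation at full resolution.

```python
import numpy as np

def debayer_rggb_nearest(raw):
    """Nearest-neighbor demosaic of an RGGB Bayer mosaic.
    raw: 2D uint8/uint16 array with even height and width.
    Returns a half-resolution RGB image (one pixel per 2x2 block)."""
    h, w = raw.shape
    r = raw[0:h:2, 0:w:2]                 # red sites
    g1 = raw[0:h:2, 1:w:2]                # green sites, red rows
    g2 = raw[1:h:2, 0:w:2]                # green sites, blue rows
    b = raw[1:h:2, 1:w:2]                 # blue sites
    g = ((g1.astype(np.uint32) + g2) // 2).astype(raw.dtype)
    return np.dstack([r, g, b])
```

Wrong colors after debayering typically mean the assumed pattern (RGGB vs. BGGR, GRBG, GBRG) does not match the sensor readout, which is exactly the failure mode described above.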
To analyze the results, the video should be displayed on a 3D projector for the intended audience. The audience can subjectively rate how well the 3D video corresponds to the real-life situation. The labels added in step 6.3 can be used to score different distances.
Figure 1. Placement of convergence point. Different placements of the convergence point relative to the object of interest (grey dot). (A) Convergence point in front of the object, (B) behind the object, and (C) on the object. The midline of each camera image is shown with a dotted line. The surgeon is shown from above, standing between the cameras. At the top, the resulting positions of the object in the left and right camera images are displayed relative to the midline. Please click here to view a larger version of this figure.
Figure 2: Robot motion. The camera separation was increased from (A) 50 mm to (B) 240 mm in incremental steps of 10 mm. (C) The robot moved the cameras radially, always pointing them toward the convergence point – the heart. Here, D is the distance between the cameras, R is the radius (1100 or 1400 mm), and a is the angle of the cameras, where sin(a) = D/(2R). The right and left cameras were angled a degrees in the negative and positive directions, respectively, around the tool Z-axis. Please click here to view a larger version of this figure.
Figure 3: Mounting cameras on the robot. (A) Exploded view of the components for one camera: lens, camera sensor, camera adaptor plate, circular mounting plate, robot wrist, and screws. The two assembled camera adaptors are shown from (B) the robot side and (C) the front. (D) Adaptors attached to the robot wrist with four M2.5 screws. (E) USB cables connected to the cameras. Please click here to view a larger version of this figure.
Figure 4: Camera calibration with the recording application. A calibration grid and a screw nut were used to calibrate the camera tool coordinate systems relative to the robot wrists. The cameras should be angled so that the nut is in the center of the images. Please click here to view a larger version of this figure.
Figure 5: Camera tool coordinate system. The X-axis (red), Y-axis (green), and Z-axis (blue) of the camera tool coordinate system. Please click here to view a larger version of this figure.
Figure 6: The robot application. (A) Display of the main screen on the touch display for running the experiments. (B) The setup screen for tool calibration and adjustment of the convergence point. Please click here to view a larger version of this figure.
Figure 7: Error estimation. Convergence error (A) above and (B) below the desired convergence point. D = 240 mm is the horizontal baseline distance between the cameras; R = 1100 mm is the distance from the cameras to the convergence point; z = 1093 mm is the vertical distance between the cameras and the convergence point; e = 2 mm is the maximum separation between the image center points (crosshairs); f1 = 9 mm is the vertical error distance when the real convergence point is above the desired convergence position; f2 = 9.2 mm is the vertical error distance when the real convergence point is below the desired convergence position. Figure not drawn to scale. Please click here to view a larger version of this figure.
Figure 8: Postprocessing applications for debayering and merging. (A) Start and (B) Finish screens of the debayer application. (C) Start and (D) Finish screens of the merge application. Please click here to view a larger version of this figure.
Figure 9: Snapshots of finished stereo videos. Only every other pixel row was used from the original images to comply with standard top/bottom 3D stereo format. Upper images are from the right camera and lower from the left camera. (A) 3D stereo image with 50 mm baseline distance and the convergence point on the OR-table behind the heart. (B) 3D stereo image with 240 mm baseline distance and the convergence point at the OR-table behind the heart. (C) 3D stereo image with 240 mm baseline distance and the convergence point 300 mm behind the heart. Please click here to view a larger version of this figure.
Video 1. Stereo 3D video at 1100 mm. The convergence point is on the heart, 1100 mm from the cameras. The video starts with a baseline distance of 50 mm (A) and increases with steps of 10 mm to 240 mm (T). Please click here to download this Video.
Video 2. Unsynchronized stereo 3D video. The right and left videos are not synchronized which causes blur when viewed in 3D. Please click here to download this Video.
Video 3. Stereo 3D video at 1400 mm. The convergence point is behind the heart, 1400 mm from the cameras. Please click here to download this Video.
During live surgery, the total time used for 3D video data collection was limited for patient safety. If the object is unfocused or overexposed, the data cannot be used. The critical steps are camera tool calibration and setup (step 2). The camera aperture and focus cannot be changed once surgery has started; therefore, the same lighting conditions and distance should be used during setup as during surgery. The camera calibration in steps 2.1-2.4 must be carried out carefully to ensure that the heart is centered in the captured video. To troubleshoot the calibration, the values of the camera tool coordinate system can be verified separately by jogging the robot in that coordinate system (step 2.3.3). It is critical to test the full robot program and the cameras together with the recording application before the surgery. The height of the operating table is sometimes adjusted during surgery; the height of the robot cameras can then be modified live in the robot application (step 3.4) to keep the desired distance to the heart. The distances and wait times of the robot program can be modified as described in step 2.5.
One limitation of this technique is that it requires the surgery to be paused; therefore, data collection can only be carried out when pausing is safe for the patient. Another limitation is that it requires physical adaptation of the operating room to mount the robot in the ceiling, and the programmed robot motion assumes that the robot is centered above the heart. Additionally, the cameras are toed-in instead of parallel, which can cause a keystone effect; this can be corrected in postproduction29,30,31.
An array of multiple cameras placed on an arc can be used to collect similar data23. A camera array can capture images from all cameras simultaneously; thus, surgery can be paused for a shorter time. A source of error for a camera array is that the cameras can have slightly different focus, aperture, and calibration, so when videos from different camera pairs are compared, parameters other than the baseline distance can affect image quality and depth perception. Another drawback of a camera array is that the step size between baseline distances is limited by the physical size of the cameras. For example, the lens used in this study has a diameter of 30 mm, which would thus be the minimum possible step size. With the setup presented in this study, step sizes of 10 mm were tested but could be made smaller if necessary. Also, with an array setup, the height and convergence distance cannot be adjusted dynamically.
Another alternative is to manually move the cameras to predefined positions22. This is not feasible during live heart surgery because it would infringe on critical surgical workspace and time.
This method is applicable to many types of open surgery, including orthopedic, vascular, and general surgery, where optimal baseline and convergence distances are yet to be determined.
This method can also be adapted to collect images for purposes other than 3D visualization. Many computer vision applications use the disparity between images to calculate the distance to an object. A precise camera motion can be used to 3D scan stationary objects from multiple directions to create 3D models. For 3D localization, the 3D viewing experience is less important as long as the same points on the object can be identified in different images; this depends on accurate camera positioning, camera calibration, lighting conditions, and frame synchronization.
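For parallel cameras, the disparity-to-distance relation mentioned here is the classic pinhole-stereo formula Z = f·B/d. A minimal sketch (function and parameter names are illustrative):

```python
def depth_from_disparity(f_px, baseline_mm, disparity_px):
    """Pinhole-stereo depth for parallel cameras: Z = f * B / d.
    f_px: focal length in pixels; baseline_mm: camera separation in mm;
    disparity_px: horizontal pixel shift of the same point between the
    left and right images. Returns the distance in mm."""
    return f_px * baseline_mm / disparity_px
```

The formula also makes the trade-off explicit: a larger baseline B gives larger disparities and therefore better depth resolution, which is one motivation for sweeping the baseline range in the first place.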
Robot-controlled camera positioning is both safe and effective for collecting video data for the identification of optimal camera positions for stereoscopic 3D video.
The authors have nothing to disclose.
The research was carried out with funding from Vinnova (2017-03728, 2018-05302 and 2018-03651), Heart-Lung Foundation (20180390), Family Kamprad Foundation (20190194), and Anna-Lisa and Sven Eric Lundgren Foundation (2017 and 2018).
2 C-mount lenses (35 mm F2.1, 5 M pixel) | Tamron | M112FM35 | Rated for 5 Mpixel |
3D glasses (DLP-link active shutter) | Celexon | G1000 | Any compatible 3D glasses can be used |
3D Projector | Viewsonic | X10-4K | Displays 3D in 1080, can be exchanged for other 3D projectors |
6 M2 x 8 screws | To attach the Ximea cameras to the camera adaptor plates | ||
8 M2.5 x 8 screws | To attach the circular mounting plates to the robot wrist | ||
8 M5 x 40 screws | To mount the robot | ||
8 M6 x 10 screws with flat heads | For attaching the circular mounting plate and the camera adaptor plates | ||
Calibration checker board plate (25 by 25 mm) | Any standard checkerboard can be used, including printed, as long as the grid is clearly visible in the cameras | ||
Camera adaptor plates, x2 | Designed by the authors in robot_camera_adaptor_plates.dwg, milled in aluminium. | ||
Circular mounting plates, x2 | Distributed with the permission of the designer Julius Klein and printed with ABS plastic on an FDM 3D printer. License Tecnalia Research & Innovation 2017. Attached as Mountingplate_ROBOT_SIDE_NewDesign_4.stl ||
Fix focus usb cameras, x2 (5 Mpixel) | Ximea | MC050CG-SY-UB | With Sony IMX250LQR sensor |
Flexpendant | ABB | 3HAC028357-001 | robot touch display |
Liveview | recording application | ||
RobotStudio | robot integrated development environment (IDE) | ||
USB3 active cables (10.0 m), x2 | Thumbscrew lock connector, water proofed. | ||
YuMi dual-arm robot | ABB | IRB14000 |