Project: ScreenshotMatcher - Taking Smartphone Photos to Capture Screenshots

Application that allows creating screenshots of a computer screen by photographing it with a smartphone.

Status: ongoing

Runtime: 2020 -

Participants: Andreas Schmid, Thomas Fischer, Alexander Weichart, Alexander Hartmann

Keywords: cross device communication, interaction technique

Even though all widely used desktop operating systems ship with dedicated tools for creating screenshots, people often use a smartphone camera instead to capture the contents of a computer screen.

In a 2020 survey among 66 university students and employees (31 male, age 19-39), we found that 97% regularly took screenshots - mostly of pictures, web pages, text documents, and program code. 52% used only the screenshot function of their device, 6% only ever took photos of their screen, and 42% did both, depending on the situation. Whereas screenshots were often used for personal documentation, screen photos were seen as faster and more convenient when sharing information with others.

There are several reasons for using a smartphone camera instead of taking screenshots:

  • Novice users might not know how to use screenshot tools.
  • Taking a screenshot saves the image on the PC. If users want to share the image with an instant messenger on their smartphone, taking a photograph is faster than transferring the image from PC to smartphone.
  • Camera applications provide a viewfinder which allows for selecting a region of interest before capturing the image.
  • People are used to taking photographs of all kinds of things with their smartphones, therefore this is a rather natural interaction. Why should the form of interaction be different for the contents of screens?

These advantages outweigh the fact that photographs of screens are commonly very low in quality because of reflections, distortions, and moiré patterns (Fig. 1). Furthermore, screen photos have a significantly larger file size than screenshots with the same content because of the high-resolution cameras of modern smartphones.

Figure 1: Degradations that occur in screen photos. From left to right: Moiré pattern, reflections, perspective distortion.


The goal of this project is to provide a straightforward solution to the problem of low-quality screen photos. Users should be able to use the interaction technique of photographing screens but should also be provided with high-quality images of the screen's content. To make the interaction as close as possible to capturing a normal photograph, the whole process should feel responsive and reliable. In case of a failure, there should be a fallback mechanism providing the user with an image at least as good as a normal photograph.

In future versions, the system should not only work on private devices but also on public displays. Additionally, the method of matching camera images to screen content could be used as a foundation for more sophisticated (e.g. real-time) applications.


To combine the advantages of screen photos and screenshots, we developed ScreenshotMatcher, an extensible interaction technique for capturing impeccable screenshots of screen regions by taking a photo with a smartphone camera. ScreenshotMatcher is a two-part application: A smartphone app takes screen photos which are then sent to an application running on the host computer. The host application applies feature matching to find the photographed region within the screen contents, extracts the region of interest and sends it back to the smartphone where it can be shared with others or stored in the gallery – just like with a normal camera application.

All communication between phone and PC takes place via WiFi. Therefore, ScreenshotMatcher can be used with all phone/PC combinations within the same network. PCs running the ScreenshotMatcher desktop application are automatically discovered by the smartphone app, and a connection can be established via a list of available devices.

The ScreenshotMatcher smartphone application resembles a typical camera app (Fig. 2, left). The top right area indicates the connection status and doubles as a button which opens a list of available PCs running the ScreenshotMatcher desktop application. Once the capture button is pressed, the app extracts the current image from the live feed. The image is scaled down in resolution and converted to grayscale to save bandwidth and match the requirements of later processing steps. This image is then sent to the connected PC via HTTP, where the extraction of the photographed region from an actual screenshot happens.
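The two preprocessing steps can be illustrated with a small sketch. It is written in Python for readability and is not the Android app's code; the function names are hypothetical, the default long edge of 512 pixels is taken from the evaluation described further down, and the grayscale weights are the common ITU-R BT.601 coefficients, which is an assumption about how the conversion is done.

```python
from typing import Tuple


def scaled_size(width: int, height: int, long_edge: int = 512) -> Tuple[int, int]:
    """Return (width, height) scaled so that the longer side equals long_edge.

    Images that are already small enough are returned unchanged.
    """
    longest = max(width, height)
    if longest <= long_edge:
        return width, height
    factor = long_edge / longest
    # Keep the aspect ratio and never let a side collapse to zero pixels.
    return max(1, round(width * factor)), max(1, round(height * factor))


def to_gray(r: int, g: int, b: int) -> int:
    """Convert one 8-bit RGB pixel to an 8-bit gray value (BT.601 weights, assumed)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# A 4000x3000 camera frame shrinks to a fraction of its original pixel count.
print(scaled_size(4000, 3000))
```

Scaling by the long edge keeps the aspect ratio intact for both portrait and landscape photos, and a single gray channel carries a third of the data of an RGB image at the same resolution.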

After a result image from the PC has been sent back to the smartphone, one of two result screens (depending on success of the process) is displayed. If the screenshot could be extracted successfully, the result screen (Fig. 2, center) previews the result image and provides options to share the image with other applications, save it to the phone's gallery, or re-capture the image if the user is not satisfied with the result. In case of a failure, users have the option to try again or to request a full screenshot from the PC and manually crop the image.

The ScreenshotMatcher smartphone application is currently only available for Android phones.

Figure 2: The three screens of the Android app:
Main Screen (left): (a) live view of the smartphone's main camera, (b) capture button, (c) change settings, (d) display recently saved screenshots, (e) indicator for the current connection status.
Result Screen (center): (f) result image, (g) buttons to switch between cropped and full screenshot, (h) buttons to share or save the image.
The Failed-Screen (right) is shown if the matching process was unsuccessful.


The desktop counterpart of the smartphone app is a cross-platform daemon written in Python 3.9 using OpenCV 4. Once it receives a screen photo from the smartphone, a screenshot of the computer screen is captured. Keypoints in both images are detected with the ORB keypoint detector and matched with a feature matching algorithm (brute force matcher + Hamming distance and KNN). After bad matches have been discarded with Lowe's ratio test, a homography between screen photo and screenshot is calculated with RANSAC and both images are aligned with a perspective transformation. This way, the photographed region can be extracted from the screenshot by cropping it to an axis-aligned bounding box around the screen photo. To avoid false positives, the result's size and dimensions are validated. In case of a successful match, the result image is sent back to the smartphone.
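The steps that follow feature matching can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: keypoint detection and homography estimation (done with OpenCV in ScreenshotMatcher) are left out, the homography is passed in as a 3x3 nested list, and the function names, the ratio of 0.75, and the minimum side length of 20 pixels are hypothetical values chosen for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix = Sequence[Sequence[float]]  # 3x3 homography, row-major


def ratio_test(knn_distances: Sequence[Tuple[float, float]], ratio: float = 0.75) -> List[int]:
    """Lowe's ratio test: keep a match only if its best distance is clearly
    smaller than the distance to the second-best candidate."""
    return [i for i, (best, second) in enumerate(knn_distances) if best < ratio * second]


def project(h: Matrix, point: Point) -> Point:
    """Map a point from photo coordinates to screenshot coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point is mapped to infinity")
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def crop_box(h: Matrix, photo_w: int, photo_h: int, screen_w: int, screen_h: int,
             min_side: int = 20) -> Optional[Tuple[int, int, int, int]]:
    """Axis-aligned bounding box (left, top, right, bottom) of the projected
    photo, clipped to the screenshot. Returns None for implausible results."""
    corners = [(0, 0), (photo_w, 0), (photo_w, photo_h), (0, photo_h)]
    try:
        projected = [project(h, c) for c in corners]
    except ValueError:
        return None
    xs = [p[0] for p in projected]
    ys = [p[1] for p in projected]
    left, top = max(0, math.floor(min(xs))), max(0, math.floor(min(ys)))
    right, bottom = min(screen_w, math.ceil(max(xs))), min(screen_h, math.ceil(max(ys)))
    # Assumed validation rule: reject boxes that are empty or very small.
    if right - left < min_side or bottom - top < min_side:
        return None
    return left, top, right, bottom
```

Cropping to the bounding box of all four projected corners means that a photo taken at an angle still yields a rectangular result that contains the whole photographed region.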

Figure 3: Processing pipeline of ScreenshotMatcher. A feature matching algorithm searches for the photographed region of interest within a screenshot. This region is then cropped and sent back to the smartphone as a result.


The desktop application can be customized via a tray menu. This way, requesting full screenshots can be disabled for privacy reasons and there is an option to restrict requests from unknown phones.

Especially for regions with lots of text, the matching algorithm is still somewhat unreliable. In this case, it is recommended to request a full screenshot from the PC and crop the result manually.

For ScreenshotMatcher to be usable in practice, the matching algorithm should deliver the correct screenshots reliably and as quickly as if the user had only taken a normal photo. Therefore, we systematically compared keypoint detectors, feature matchers, and associated parameters in terms of recognition rate and computation time. Even though feature extraction with artificial neural networks is commonplace, we restricted ourselves to comparing standard computer vision algorithms as the training set requirements and the computational effort are not justified if an approach with less overhead can deliver sufficiently accurate results.

To compare the different matching algorithms and later evaluate the system, a data set of 68 screenshots in 1080p resolution was captured on three different operating systems (Debian GNU/Linux, Windows 10, MacOS). They were categorized as GUI, icons, text, article (combination of text and images) or image. The data set was split into an optimization data set and an evaluation data set so that the final system was not evaluated on the images it had been tuned on. Two screenshots of each content category were used to compare the different algorithms (Fig. 4); the remaining 58 screenshots were used for the evaluation of the final system. We then asked nine colleagues to display each of the screenshots full-screen on a computer screen and take a photo of an interesting region as if they were about to share the content with a friend or colleague. As some of them owned multiple phones or monitors, 16 data sets of 68 photographs (1088 total) could be collected this way.

Figure 4: Optimization data set. Each category is represented by two images (from left: GUI, Icons, Text, Article, Image).
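The split into optimization and evaluation data can be sketched as follows. This is only an illustration: the function name and the tuple layout are hypothetical, and taking the first two screenshots of each category is an assumption, as the text does not state how the two were picked.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def split_dataset(screenshots: Iterable[Tuple[str, str]],
                  per_category: int = 2) -> Tuple[List[str], List[str]]:
    """Split (name, category) pairs into an optimization and an evaluation set.

    The first per_category screenshots of each category go to the
    optimization set, all others to the evaluation set.
    """
    taken = defaultdict(int)
    optimization: List[str] = []
    evaluation: List[str] = []
    for name, category in screenshots:
        if taken[category] < per_category:
            optimization.append(name)
            taken[category] += 1
        else:
            evaluation.append(name)
    return optimization, evaluation
```

With five categories and 68 screenshots, this yields 10 optimization and 58 evaluation screenshots; multiplied by the 16 phone/screen data sets, the evaluation part comprises 928 photos.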


We included the keypoint detectors SIFT and SURF because of their high accuracy and acceptable speed, as well as BRISK and ORB because of their good balance between computation time and accuracy. The first variable of interest was the image size of the scaled-down photograph, which should be minimized while keeping an acceptable success rate, as the transfer of images between devices is the most time-consuming part of the whole process. Because of the similar aspect ratios of different smartphone cameras, we used the length of an image's long edge in pixels as a measure for image size and compared sizes between 128 and 2048 pixels (Fig. 5). For each keypoint detector, all suitable feature matching algorithms included in OpenCV were compared. Furthermore, different thresholds were tested for each detector/matcher combination.

The success rate and processing time of all matcher/parameter combinations were compared by computing matches between the photographs and screenshots in our test data set (Fig. 4) on an HP EliteBook 850 G4 (Intel i7 CPU with 2.7 GHz, Intel HD Graphics 620, 16 GB RAM).

We found an image size of 512 pixels (long edge) to be the sweet spot as smaller sizes lead to low success rates and larger sizes hardly increase success rate. The fastest matching algorithm (mean: 95 ms, sd: 16 ms) for this image size is an ORB keypoint detector (feature limit: 2000), together with a brute force matcher using Hamming distance. This combination achieves a success rate of 89% which we consider accurate enough for use in an interactive application where users can repeat the process until they get a positive result.
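To illustrate what a brute-force matcher with Hamming distance computes, the following sketch compares binary descriptors given as byte strings (ORB descriptors are 32 bytes, i.e. 256 bits, long). It is a plain-Python stand-in for explanation only; the function names are hypothetical and an optimized library implementation will be far faster.

```python
from typing import List, Sequence, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally long binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def knn_match(query: Sequence[bytes], train: Sequence[bytes],
              k: int = 2) -> List[List[Tuple[int, int]]]:
    """For every query descriptor, return the k nearest train descriptors
    as (distance, train_index) pairs, closest first.

    Brute force: every query descriptor is compared with every train descriptor.
    """
    matches = []
    for q in query:
        ranked = sorted((hamming(q, t), index) for index, t in enumerate(train))
        matches.append(ranked[:k])
    return matches
```

Returning the two nearest neighbours per descriptor (k = 2) is what makes a subsequent ratio test possible, since that test compares the best distance with the second-best one.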

Figure 5: Comparison of keypoint detectors for different image sizes in terms of success rate and computation time. As responsiveness of the system is important, matches with processing times of over 500 ms were counted as unsuccessful.
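The selection rule behind this comparison can be written down as a short sketch: a run only counts as successful if a match was found within the time limit, and among all sufficiently accurate configurations the fastest one is chosen. The `Run` type, the function names, and the minimum success rate of 0.85 are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Run:
    matched: bool    # did the matcher find the photographed region?
    time_ms: float   # processing time of this run


def success_rate(runs: List[Run], time_limit_ms: float = 500.0) -> float:
    """Share of runs that matched AND stayed within the time limit."""
    if not runs:
        return 0.0
    ok = sum(1 for r in runs if r.matched and r.time_ms <= time_limit_ms)
    return ok / len(runs)


def mean_time(runs: List[Run]) -> float:
    return sum(r.time_ms for r in runs) / len(runs)


def fastest_acceptable(results: Dict[str, List[Run]],
                       min_rate: float = 0.85) -> Optional[str]:
    """Name of the fastest configuration whose success rate is high enough."""
    candidates = [name for name, runs in results.items()
                  if runs and success_rate(runs) >= min_rate]
    return min(candidates, key=lambda name: mean_time(results[name]), default=None)
```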


The final matching algorithm (ORB keypoint descriptor and a brute force feature matcher using Hamming distance) was evaluated with the evaluation data set described earlier.

As we excluded the ten images that we had already used for selecting the best algorithm, the data set contains 16 sets of photos of 58 screenshots (928 photos in total). The evaluation was run on the same hardware as the optimization. Before the evaluation, all photographs were converted to grayscale and scaled down so the long side was 512 pixels long. No further pre-processing was applied to the images. Each of those images was then passed to the matching algorithm together with the corresponding original screenshot. We measured how well the matching algorithm performs on a realistic data set (1), and how processing time and success rate are affected by the content of the image (2) and the phone/screen combination used to capture the photo (3).

1. Success Rate and Processing Time.

The system could detect matches between screenshot and photograph for 86.9% of the complete data set. Mean computation time was 90 milliseconds (range: 57 – 336 ms, sd: 24 ms). This confirms the results from the optimization step. While not sufficient for applications such as real-time optical tracking, the computation time of the matching process is short enough to be perceived as responsive.

2. Effect of Image Content.

As the success of keypoint detection algorithms is dependent on the content of the image, we investigated whether the algorithm selected for ScreenshotMatcher is suitable for all real-world use cases. The screenshots in the evaluation data set were divided up into five categories: graphical user interfaces, text, articles (combination of text and images, e.g. most websites), icons (e.g. a file explorer) and images (Fig. 4). Both processing time (mean: 84 – 93 ms, sd: 11 – 22 ms) and success rate (84% – 93%) were in a similar range for all categories. For seven individual screenshots, a success rate below 75% was found. Those screenshots were spread across all categories but had in common that they were either very cluttered or contained very few recognizable elements.


Content Category             | GUI             | Text            | Article        | Icons          | Image
Mean Processing Time (in ms) | 84.2 (sd: 11.2) | 93.4 (sd: 12.2) | 84.1 (sd: 8.3) | 86.1 (sd: 9.2) | 85.6 (sd: 21.8)
Success Rate                 | 85.1%           | 90.1%           | 84.4%          | 92.9%          | 83.8%
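Per-category figures like the ones above can be derived from raw measurements with a few lines of Python. The sketch below assumes hypothetical records of the form (category, matched, time in ms) and uses the sample standard deviation; which variant of the standard deviation was used for the reported values is not stated.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[str, bool, float]]) -> Dict[str, dict]:
    """Group (category, matched, time_ms) records by category and compute
    success rate, mean processing time and its standard deviation."""
    by_category = defaultdict(list)
    for category, matched, time_ms in records:
        by_category[category].append((matched, time_ms))

    summary = {}
    for category, rows in by_category.items():
        times = [t for _, t in rows]
        summary[category] = {
            "n": len(rows),
            "success_rate": sum(1 for m, _ in rows if m) / len(rows),
            "mean_ms": mean(times),
            # stdev needs at least two values; report 0.0 for a single run.
            "sd_ms": stdev(times) if len(times) > 1 else 0.0,
        }
    return summary
```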

3. Effect of Phone and Screen.

The evaluation data set contains photos of nine different screens (laptop and desktop monitors) captured with nine different smartphones (total phone/screen combinations: 16). Mean computation time was similar for all combinations with values between 80 and 90 milliseconds (sd: 9 – 15 ms). For the success rate of the matching process, a bigger influence of the phone/screen combination could be observed. Success rates ranged from 71% (Samsung Galaxy A3 + Sony Vaio 17" laptop with glossy display) to 97% (OnePlus One + Dell 24" matte monitor).

Figure 6: Comparison of different phone/screen combinations. Boxplots represent computation time, barplots represent success rate.


To test ScreenshotMatcher in a real-world context, we asked 19 participants from our computer science department (13 male, 6 female) to use the application over the course of one week however they wanted. During the study, metadata about each screenshot taken with ScreenshotMatcher (participant ID, timestamps, used matching algorithm, and match success) was sent to a log server. No image data was logged to preserve participants' privacy. We also did not collect any feedback during use of the application in order to not affect how participants used it. After the week of use, 14 of the 19 participants answered a questionnaire about their usage of ScreenshotMatcher, their personal assessment of its performance and usability, and which problems occurred during the study.
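A log entry with the metadata listed above might look like the following sketch. The class and field names and the JSON serialization are assumptions made for illustration; the actual log format is not described here. The point of the structure is that it carries no image data.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MatchLogEntry:
    """Metadata about one capture (hypothetical field names)."""
    participant_id: str
    started_at: float    # Unix timestamp: capture button pressed
    finished_at: float   # Unix timestamp: result available
    algorithm: str       # matching algorithm that was used
    success: bool        # whether a match was found

    def to_json(self) -> str:
        """Serialize the entry, e.g. for sending it to a log server."""
        return json.dumps(asdict(self), sort_keys=True)


entry = MatchLogEntry("p07", 1000.0, 1000.9, "ORB", True)
print(entry.to_json())
```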

Eleven participants used it to send screenshots from the PC using the phone's instant messenger. Four participants had problems with an unstable connection between phone and PC. Seven participants reported that the wrong region of the screenshot was extracted on some occasions. Suggested improvements were the possibility to further crop the screenshot within the app, annotating the screenshot, recording animations, and integrating ScreenshotMatcher in the default camera app. Twelve of the 14 participants stated that they would continue using ScreenshotMatcher after the study.

A total of 635 images were captured with ScreenshotMatcher over the course of the study. However, one participant alone captured 326 screenshots whereas ten participants captured fewer than ten screenshots. Mean processing time (from pressing the capture button to the result being displayed on the phone) for successful matches was 878 ms (sd: 806 ms, range: 287 – 6588 ms). Mean processing time of the matching algorithm was 178 ms (sd: 235 ms, range: 41 – 1964 ms), indicating that participants' computers had on average less processing power than our reference hardware. Only 47.4% of screen photos were recognized successfully, much less than in the technical evaluation (85%).

As we did not store the screen photos, we do not know for sure what the reasons are.

ScreenshotMatcher is a two-part application: an Android app and a Python program running on the PC (Windows, MacOS, or Linux). To use ScreenshotMatcher, both phone and PC have to be connected to the same WiFi network.

On Windows, a setup wizard can be used for the installation. On MacOS and Linux, a working Python installation is required.

Windows

Step 1: Download the setup wizard from the latest release.

Step 2: Run the setup wizard to install the application. By default, ScreenshotMatcher is configured to automatically start when the system is turned on. On some systems, a warning due to the installation of software from unknown sources pops up before the installation. This warning can be ignored. If you are unsure, there should be a VirusTotal scan link on the release page.

Step 3: After the installation, ScreenshotMatcher starts automatically and can be accessed via an icon in the system tray (bottom right). In case a warning from Windows Defender pops up, it can be ignored.

Done! You can now continue with installing the Android app.

Linux and MacOS

Requirements:

Step 1: Download the desktop application's source code from the latest release and extract it.

Step 2: Open a terminal and navigate to the directory of the extracted archive. Enter the Screenshotmatcher directory and install requirements by running pip install -r requirements.txt. On some systems, such as Debian-based Linux distributions, it might be necessary to use pip3 instead of pip.

Step 3: Start ScreenshotMatcher by either executing sh ScreenshotMatcher, clicking on the ScreenshotMatcher shortcut in a visual file manager, or running python ./python-server/src/main.py. Debian may require the use of python3 instead of python.

Done! You can now continue with installing the Android app.

Step 1: Download the ScreenshotMatcher APK from the latest release on your smartphone. For ease of use, we recommend using the provided QR-Code there.

Step 2: Open the downloaded screenshotmatcher.apk on your Android phone to start the installation. Make sure the Android setting for “Allow Installation from Unknown Sources” is enabled for your installation source (browser, file manager, etc.). Warnings regarding the app not being certified by Google Play can be ignored.

Step 3: Start the ScreenshotMatcher app and grant the permissions it asks for (camera, file access).

Step 4: In case ScreenshotMatcher is already running on the PC, the app should connect automatically. A successful connection is indicated by a green area in the top right corner of the app. If the area is red, click it to open a list with devices available for connection.

Done! As soon as the connection between phone and PC is established, ScreenshotMatcher can be used.

Use the ScreenshotMatcher app to photograph your PC's screen. The photographed region is then extracted from the screen's content and the result is sent back to the phone. This way, high quality screenshots can be captured by photographing a screen region with the phone.

If the photographed region could be found within the screen's content, the result image can be shared with other applications by clicking Share cropped, or saved to the phone's gallery by clicking Save cropped. In case the extraction failed or you dislike the result, a screenshot of the whole screen can be requested by switching from Cropped to Full with the top button.


Andreas Schmid, Thomas Fischer, Alexander Weichart, Alexander Hartmann, Raphael Wimmer

Proceedings of the Mensch und Computer 2021
