Intelligent User Interfaces

From CS2610 Fall 2015

slides

Readings

Reading Critiques

Matthew Barren 19:32:03 10/30/2015

Summary: The authors of Sikuli: Using GUI Screenshots for Search and Automation examine how targeted screenshots can serve as graphical targets for faster search and automation. Automation through graphical targets has many promising applications. One of the examples noted involves marking a baby with a dot so that a system can tell whether the child has turned over or not. Another potential area is observing a color change, which could signal a state change in an experiment, for example a litmus sticker in an automated titration. When the fluid reaches a critical pH, the state change activates the sticker, and a computer system is notified to handle the action accordingly. Additionally, this could be extended to personal use. For example, an individual may want his or her email to pop up when an important message reaches the inbox. Detecting the user name could pull up the email application. Screenshot search is effective for a particular subset of searches: those that involve a physical or digital representation of the object in question. Frequently this is not the case. Imagine sitting at a computer and searching for a class at Pitt. In this instance, it may be more of a hassle to get an image of the class than to simply type the text. Additionally, queries using a screenshot only address the question of “what”, for example, what does this lasso icon mean? Often, a user wants to do a more expressive search, such as how does the lasso icon work, or how can I use the lasso icon to do a particular action? This expressiveness is lost when the query consists of only one screenshot and some surrounding characteristics. One way to make search more expressive is to use multiple screenshots to search for an item. Using these compound gestures, the user may be able to present their search more clearly.
Although this may defeat the purpose of using screenshots to search, the user could supplement the screenshot with a short text description, such as “how” or “why”. These key search types get lost when only searching a screenshot.

Vineet Raghu 12:23:11 10/31/2015

Sikuli: Using GUI Screenshots for Search and Automation. The authors present Sikuli, a two-fold application that allows users to search for help and documentation about UI elements via a screenshot query, and allows users to write scripts that use screenshots in the scripts themselves to automate UI interactions. Sikuli Search uses three features (surrounding text, visual words, and embedded text) to search over many UI manuals, websites, etc. to find help for UI elements. In addition, screenshots of UI elements can serve as thumbnails for annotation. For example, a user can link a note of IP addresses to the UI element for opening a remote connection. The search system was evaluated in a within-subjects user study comparing a standard keyword text search for UI elements with the screenshot search provided by Sikuli. The experimental design appeared to be sound, and the results showed that query formulation time was significantly shorter for the visual image search, while the relevance of the results (as determined by the users) was not significantly different. More users might have made the study results clearer, as only 12 users were recruited. Overall, though, the users appeared to have a positive sentiment towards the visual search scheme. The technical performance evaluation showed significantly better precision and recall for the image search methodology; however, I could not really see the usefulness of these results, as users would probably use embedded text in their queries. The authors mention this point themselves, so I was unsure why the section was included regardless. Finally, the authors describe the visual scripting language that allows users to use screenshots of UI elements within the scripts themselves, in functions such as find() and pattern().
They provided some very interesting applications of these scripts, such as using google maps interface to determine when a bus is nearing a specific street corner, and monitoring a baby’s movement remotely. Overall, Sikuli is a very interesting concept that utilized advances in NLP and Vision appropriately to create a promising research direction in simplifying query processing and scripting. Google Goggles is a phone application I have used that uses general image search to search google web pages, but I have found this to be faulty, so there doesn’t appear to be a state of the art implementation of such a screenshot image search for general purpose querying. Perhaps, features of Sikuli will be utilized in such an application in the future.

Zihao Zhao 10:45:05 11/2/2015

This paper is novel to me, and it is inspirational. It introduces Sikuli, which uses GUI screenshots for search and automation. The motivation comes from daily life: when people communicate, we communicate not just in words but also with the help of visual objects. The pattern is that we translate actions in the world of atoms into actions in the world of bits. Thus the authors build Sikuli, which helps users search and complements existing search in areas where screenshots could not previously be used. Picture search is certainly not a novel idea in this paper, but we had not realized that a picture, in the form of a screenshot, can be used as a general input method for search. Moreover, screenshots have the obvious advantage that they can be applied to any application on any platform. Another idea the paper presents is automation using screenshots. It adopts Python as the scripting language, which is much more convenient when we want to automate something based on the appearance of an icon or a dialog box. The icons and other items in a screenshot are hard to describe in a programming language, or doing so may require a lot of effort. Currently, programs mostly refer to elements through identifiers such as a directory or a URL. Sikuli Script inspires us with a new approach: referring to the picture itself rather than the directory or the object. I realize that computer vision technology has had great success recently, and we can even analyze the properties or features of a picture. I learned a lesson from the discussion section of this article: we should go one step beyond the idea and the implementation and generalize to the research area. The paper points out that the screenshot approach has some defects, such as invisible icons and theme variations, which indicate future research directions.
Although the authors themselves point out defects that can be addressed in the future, there are also some disadvantages from my point of view. There are many compatibility issues with scripting tasks on the fly. This application could do a huge swath of things that otherwise couldn't be accomplished with anything less than a full, specific software solution. Really though, this is closer to how computer use should have been all along. It's absurd that we can ever get hung up on a technical problem like clicking 500 stars. The computer is supposed to help by automating stuff.

Mahbaneh Eshaghzadeh Torbati 11:48:21 11/3/2015

Critique for Sikuli: Using GUI Screenshots for Search and Automation. This paper can be considered a new research direction, since it uses screenshots as the key to automation and search. Because this visual approach makes search and automation much simpler than before, the paper makes a great contribution to HCI research. In their method, when people want to search, they can just circle their target and the results will pop up. This is a simple approach, since visual presentation is easy to use. People do not have to bother entering their request in words. Using words is hard and sometimes fails to express the user's request. Two factors, wrong descriptions and lack of information, mainly cause such difficulties and failures. Visual presentation can also contribute to writing scripts. By mapping words to pictures, scripts can be written easily and deliberately. A request in words may match different objects. However, pictures are always correct. Writing and reading the scripts is easy, so developers' jobs become easier with Sikuli Script, the scripting language presented in the paper. It also makes automation easy. In today's business world, there are some approaches similar to the authors' idea. For instance, Google provides the ability to search for pictures by supplying a picture as the input query. This is similar to Sikuli Search, the authors' search method. Moreover, search engines nowadays can use descriptions to search for pictures, instead of only searching the words on the webpage where the picture appears. Even a single word returns related pictures. Based on my experience as a user, it works well: speed is satisfactory and results are comprehensive enough. I conclude that, since the research results in this area will make life more convenient for users, more similar products will appear on the market.

Ameya Daphalapurkar 18:35:36 11/3/2015

The paper titled ‘Sikuli: Using GUI Screenshots for Search and Automation’ presents a method for search and automation using screenshots. It allows the user to take a screenshot of a GUI element, such as a menu or toolbar, and then query the system with it. Sikuli also supports automating GUI interactions. The article first explains what it means by search and automation, relating them to the way verbal and visual references are used together. A user can simply take a screenshot of what he or she wants to ask about and submit it as a query. The paper makes contributions in two forms, search and script. Screenshots for search include a screenshot search engine, a user interface for querying, and a user interface for annotating screenshots. The screenshot engine uses the text surrounding the image in the source and visual features. The query interface lets users select a region and submit the screenshot, and annotating involves attaching screenshots, used as queries, to content as generic as common websites. In the user study with diverse participants, the screenshot method also produced the best outcomes. Screenshots for automation include various elements of the scripting API such as Find, Pattern, Region, Action, etc. The examples of Sikuli Script were minimizing active windows, deleting documents of multiple types, tracking bus movement, and responding to message boxes automatically. Theme variations and visibility constraints are the possible limitations of this visual approach to search and automation.

Manali Shimpi 18:46:09 11/3/2015

Sikuli: Using GUI Screenshots for Search and Automation: The paper talks about Sikuli, which is a visual approach to search and automation of graphical user interfaces using screenshots. It allows users or programmers to make direct visual reference to GUI elements. To search a documentation database about a GUI element, a user can draw a rectangle around it and take a screenshot as a query. The paper has two parts. First it describes Sikuli Search, and in the next part it explains Sikuli Script. Sikuli Search is a system for searching GUI documentation by screenshots. Sikuli Script is a visual approach to UI automation by screenshots: a visual scripting API that gives an existing full-featured scripting language a set of image-based interactive capabilities. It has several components. Find: the find() function locates a particular GUI element to interact with. Pattern: the Pattern class is an abstraction for visual patterns. It has four methods: exact(), similar(float similarity), anyColor(), and anySize(). Region: the Region class provides an abstraction for the screen region(s) returned by the find() function matching a given visual pattern. Visual Dictionary: a visual dictionary is a data type for storing key-value pairs using images as keys. The authors then describe two systems that can be enhanced by their image-based interactive techniques, Creating Stencils-based Tutorials and Automating Minimal Graphical Help. Sikuli Script is not applicable to invisible GUI elements.
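As a rough illustration of the find() and Pattern ideas summarized above, here is a minimal Python sketch. It is not the Sikuli API: the screen and the pattern are tiny grids of integers standing in for pixels, and the names match_score, find, and the similarity parameter are assumptions made up for this example. An exact match corresponds loosely to exact(), and a lower threshold to similar(float similarity).

```python
# Toy stand-in for the find()/Pattern/Region idea; NOT the real Sikuli API.
# A "screen" and a "pattern" are small grids of integers (pixel values).

def match_score(screen, pattern, top, left):
    """Fraction of pattern pixels equal to the screen pixels at (top, left)."""
    rows, cols = len(pattern), len(pattern[0])
    same = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if screen[top + r][left + c] == pattern[r][c]
    )
    return same / (rows * cols)


def find(screen, pattern, similarity=1.0):
    """Return (top, left, height, width) regions whose score >= similarity."""
    rows, cols = len(pattern), len(pattern[0])
    regions = []
    for top in range(len(screen) - rows + 1):
        for left in range(len(screen[0]) - cols + 1):
            if match_score(screen, pattern, top, left) >= similarity:
                regions.append((top, left, rows, cols))
    return regions


screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 3, 9],  # same icon with one pixel changed
]
icon = [[1, 2], [3, 4]]

exact_hits = find(screen, icon)                   # only the unchanged copy
fuzzy_hits = find(screen, icon, similarity=0.75)  # also the altered copy
print(exact_hits)
print(fuzzy_hits)
```

Each returned tuple plays the role of a Region: it says where on the screen the pattern was found and how large the match is.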

Long Nguyen 19:51:36 11/3/2015

Sikuli: Using GUI Screenshots for Search and Automation: The paper illustrates an approach to automation and search. I think the most creative part of this paper is the idea that humans can use a graphical language, such as images, to communicate with computers. This was a new approach compared to traditional keyword-based search services, with the advantage of helping users search for things whose names they do not even know. Even though the image search technique is somewhat similar to applications like Google Images, I think this idea was quite new at the time the paper was released. In the evaluation part, first, the authors did not show an evaluation of the scripting API, which leaves a big question for readers. Second, I do not think 12 subjects is enough to establish statistical significance; I wish the evaluation had been performed with a larger number. Otherwise the evaluation was really good, with many graphs showing the effect of search and automation using screenshots.

Shijia Liu 21:59:23 11/3/2015

Sikuli: Using GUI Screenshots for Search and Automation. Sikuli is a visual approach to search and automation of graphical user interfaces using screenshots. First of all, the paper introduces the specifics of screenshots for search, organized under several headings: motivation, system architecture, prototype implementation, the user study, and performance evaluation. In the later part, the paper covers screenshots for automation, which includes motivation, algorithms for matching screenshot patterns, the visual scripting API, an editor for composing visual scripts, and several example scripts. After that, in the section on integration with other systems, we see Creating Stencils-based Tutorials and Automating Minimal Graphical Help as two examples that illustrate how the techniques work. Finally, the authors discuss the future of Sikuli: it will keep its advantages, but in some respects it has limitations.

Ankita Mohapatra 22:03:32 11/3/2015

Reading Critique on Sikuli: Using GUI Screenshots for Search and Automation:- This paper is about using a software that can take screenshots as inputs for search queries and produce relevant output. Thus in contrast to traditional search techniques, such a method can take as input any particular screenshot instead of the traditional way of inputting search keywords. The primary advantage of this system is that it uses a visual input as compared to a text input. This is beneficial because humans are visual in nature and thus the input would be much closer to their intuitive level of working. It also helps because most times we are unable to specify exactly what we are looking for, especially in case of objects represented by icons because then we have to somehow convert this icon into words which can often be a messy process. Such a system can be combined with a web search database such as that of Google’s to achieve fast results on a variety of situations. But the system’s offline capabilities proved even more interesting. Examples like finding out the approximate time the bus will reach in by querying its symbol is definitely something that is very useful. If such a system becomes widespread then users will find it very easy to do things that earlier required a lot of coding. A possible concern for this method, as stated by the author, is that each user can have their own personalization and this could hamper the search process. One way to correct this would be to set the PC to “default standard mode” when taking inputs for this system. But will that mode have to be kept as long as the program is running or just to initialize? Because a code that, for example, tracks the bus could run for a long time and it would not seem logical to keep the user preferences out of their PC for that long. Another concern is the processing that is involved in such a task. 
With a relatively large database an iterative query could take a very long time and thus the processing power for other tasks would be reduced. This would especially be a problem if the task that this program is monitoring is unable to run efficiently because of this processing load. Thus even though there are more factors to be considered before a full scale implementation of the system, such a system has a lot of potential to benefit users and I foresee it to be implemented on a large scale very soon.

Priyanka Walke 23:30:36 11/3/2015

Reading Critique on Sikuli: Using GUI Screenshots for Search and Automation. This paper presents a new way to address search and automation. Its approaches can be compared with the ones given in the textbook and with previous research in this field. Currently we use the classic search approach used by Google, which takes textual keywords from the input to retrieve the most relevant information from its repositories. Google also provides the ability to search by image, to find out what an image shows and to provide related citations. Even though such technology exists, Sikuli extends this standard to other styles of interaction and applies it in different situations. In Sikuli, we use screenshots of GUI elements to find out what they are and what functions they perform. An important point is that beginners usually do not know what these elements are called, and the retrieved text explains their usage. Sikuli gives us a way to perform an efficient as well as simple search. Using Sikuli as a graphical scripting language is not an unusual idea; however, it tries to overcome the drawbacks of previous implementations. Even so, a thorough comparison is still necessary to shed light on its uniqueness, which would help validate the authors’ hypothesis. The authors also claim statistical significance with only 12 subjects, which is mentioned as a golden number; however, this is still not convincing enough, as it does not hold true in all scenarios. An important part of this paper is the set of scenarios in which the scripting language really shows its advantages, from minimizing all windows, to travel maps, and into the physical domain as well.
As mentioned before, this is not always the best way: ordinary Python scripts can also delete multiple files while maintaining greater control. One of the major drawbacks is that the screenshots will not always be accurate. Changing themes will also have bad effects, in addition to the problem of accuracy. A suggested solution to this problem is to have predefined screenshots and allow the interaction through right-clicking the target to perform the operations.

Mingda Zhang 23:56:24 11/3/2015

Sikuli - using GUI screenshots for search and automation. This paper presents a novel approach that uses screenshots as input to search for and automate graphical user interfaces. From my perspective, the authors solved a long-standing problem for users who are less familiar with graphical user interfaces. Notice that the paper was published in 2009. By that time, search engines had become quite popular, and people had become used to searching for the problems they encountered while operating computers. However, graphical user interfaces could be troublesome for search engines because users found it difficult to describe their confusion: most of the time it concerned an icon or a visible element rather than keywords. The authors put screenshots into the search queries and made it possible for users to search directly for solutions to their problems. They also verified their system with a user experience survey. By recruiting participants from Craigslist, they evaluated their search process in a reasonable way. They also built automation on top of the screenshot matching. It significantly reduced the gulf of execution, especially compared with other script-based automation tools back then. To be honest, I agree that the authors were focusing on a very valuable problem, since I myself have encountered it many times, and I believe I am not alone. However, with the rapid progress of image search, I have been able to solve my problem easily with Google almost every time. Therefore, I think this invention was useful and innovative at the time, but it might be less important nowadays.

Zinan Zhang 0:56:19 11/4/2015

For Sikuli: This paper mainly focuses on explaining the authors’ new program, called Sikuli, an application that helps by providing information about GUI tools. To me, it is a new tool that replaces the original help query function. With the help of the new tool, people can get information more quickly and easily than with the original function. In the paper, the authors say that it is very convenient to use, and in the survey results most people think it is a good tool. People are not familiar with this kind of new invention, but it is really easy to start using. In my view, this is as it should be. When people use a brand new GUI application that they have never used before, they are first confused by the large number of tools in the toolbar. Take Photoshop as an example. When I first used Photoshop, I was shocked by the many kinds of tools in the left toolbar and the many menus on the right. They are all useful when I work on a picture in Photoshop, but there was too much for me to learn the first time. So I had to search online for every tool, and I first had to hover my pointer over each tool icon to find out its name in text, because the tools are shown as pictures rather than words. That wastes too much time and greatly lowers my learning efficiency. However, if I could learn with the help of Sikuli, learning would just be a matter of taking a screenshot, and I could get the whole explanation in a few seconds. This means it would take me significantly less time to learn how to use Photoshop, a brand new GUI application for me. In addition, I think the questionnaire in the authors’ experiments is quite good. First of all, the questionnaire is necessary. Experimental data can determine whether the experiment is a success, but cannot determine whether a new design is a success. Whether a design succeeds or not is judged by the users.
So the questionnaire here is used to gather the users’ impressions. If they think the invention is good to use, then the design is successful. Secondly, if I were the author, I would add another question to the questionnaire: are there any other functions that should be added to this system? The users, that is, the subjects in this experiment, know best how it feels to use this system. They may think that if some other function existed, their experience would be much better, and that function is what the system needs most. As the creators of the system, we may think it is perfect, with no weaknesses, and overlook some facts. Other people who did not take part in the invention process can easily pick out these facts. Through the questionnaire, we can collect them easily.

Darshan Balakrishna Shetty 1:21:00 11/4/2015

Sikuli: Using GUI Screenshots for Search and Automation: The authors of this paper present Sikuli, a system that takes a visual approach to search and automation of GUIs using screenshots. In this case, the contextual cues are provided by the screenshots, and the verbal commands are represented as search queries and code. Sikuli is divided into Sikuli Search and Sikuli Script. The former is simply a search engine where the input is a screenshot. The latter addresses automating GUI interactions. Scripting a “help” demonstration is otherwise a very time-consuming task, as it requires knowing exact coordinates on the screen and the dimensions of the area of interest. Sikuli Script uses a novel and intuitive approach to automation that is built on top of the authors’ image processing libraries. The scripting language, based on Python, manipulates two kinds of objects called Patterns and Regions. Patterns simply describe an arrangement of pixels, and Regions are areas the script should work within. Using these two ideas, combined with screenshots, programmers can insert images directly into the code and operate on them. The inserted images can be either patterns or regions. For example, to click “OK” in a dialogue box, the region would be defined as a picture of the dialogue box and the pattern as a picture of the “OK” button.
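The dialogue box example above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Sikuli Script: the screen is a list of text rows instead of pixels, and find_in, its region argument, and the sample screen are hypothetical names and data invented for the illustration. The point is that restricting the search to the dialogue's region picks the intended “OK” button rather than a look-alike elsewhere on the screen.

```python
# Toy model of "find the pattern inside a region"; NOT the real Sikuli API.
# The screen is a list of equal-length strings, one character per "pixel".

def find_in(screen, pattern, region=None):
    """Return (top, left) of the first exact match of pattern inside region.

    region is (top, left, height, width); None means the whole screen.
    Returns None when the pattern does not occur in the region.
    """
    if region is None:
        region = (0, 0, len(screen), len(screen[0]))
    r_top, r_left, r_h, r_w = region
    p_h, p_w = len(pattern), len(pattern[0])
    for top in range(r_top, r_top + r_h - p_h + 1):
        for left in range(r_left, r_left + r_w - p_w + 1):
            if all(screen[top + r][left:left + p_w] == pattern[r]
                   for r in range(p_h)):
                return (top, left)
    return None


screen = [
    "..........[OK]..",   # an unrelated OK button outside the dialogue
    "..+--------+....",
    "..|Save?   |....",
    "..|[OK][No]|....",
    "..+--------+....",
]
dialog_top = ["+--------+", "|Save?   |"]  # enough to recognize the dialogue
ok_button = ["[OK]"]

anywhere = find_in(screen, ok_button)        # hits the stray button first
d_top, d_left = find_in(screen, dialog_top)  # locate the dialogue
dialog_region = (d_top, d_left, 4, 10)       # dialogue is 4 rows by 10 columns
inside = find_in(screen, ok_button, dialog_region)
print(anywhere, inside)
```

A real script would then send a click to the matched location; that step is left out here because it depends on the platform.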

Chi Zhang 1:38:07 11/4/2015

Critiques on “Sikuli: using GUI screenshots for search and automation” by Chi Zhang. This paper introduces “Sikuli”, an interface that lets users search and automate actions based on screenshots. A large part of it is about using screenshots for searching. The system takes a screenshot and matches the selected icon with an icon in the documentation. Once the region of interest is matched, the user can annotate the results. Its prototype used a database of over 100 computer books. The authors also discuss automation, where the implementation tries to identify small patterns that can be matched. The automation is written in a scripting language that takes advantage of this pattern matching. In general, Sikuli is quite an innovative idea. This is a very good paper that presents Sikuli in detail, and it gives us a good view of current research on intelligent user interfaces.

Sudeepthi Manukonda 1:46:05 11/4/2015

Sikuli is an interactive technique that uses screenshots to search and automate graphical user interfaces. Sikuli Search allows the user to search a documentation database using screenshots of interface elements. When the user submits a screenshot to Sikuli, it searches the current database and returns the matching screenshots contained in the database. The user can browse the results, select a page, and read its contents in detail. Sikuli Script allows the user to write simple scripts that automate interaction using screenshots. The user can include screenshots in commands or in the code itself to perform actions. Screenshots can also be used in conditional statements to trigger some other action. The paper “Sikuli: Using GUI Screenshots for Search and Automation” is a very interesting paper that puts forth a revolutionary concept in human interaction. It discusses Sikuli in great detail, along with the experiments and user studies that were conducted. Sikuli Search searches the documentation by screenshot. The paper covers it under five topics: motivation, system architecture, prototype implementation, the user study, and performance evaluation. The motivation is the lack of an efficient and intuitive mechanism to search for documentation about a GUI element. Along with Sikuli Search, Sikuli Script is also very important for getting the complete utility of the software. Various commands are used to carry out operations, to name a few: Action, Find, Pattern, Region, etc. To conclude, this paper presents a visual approach to search and automation and demonstrates its various benefits. It also names two problems that need to be addressed in the future: theme variations and visibility constraints.

Xinyue Huang 2:14:50 11/4/2015

Sikuli: Using GUI Screenshots for Search and Automation. Sikuli is a visual approach to search and automation of graphical user interfaces using screenshots. It allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element’s name. It also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. The development of the screenshot search system is motivated by the lack of an efficient and intuitive mechanism to search for documentation about a GUI element. The architecture of Sikuli Search consists of three components: a screenshot search engine, a user interface for querying the search engine, and a user interface for adding screenshots with custom annotations to the index. The prototype system indexes screenshots extracted from a wide variety of resources such as online tutorials, official documentation, and computer books. Sikuli Search allows a user to select a region of interest on the screen, submit the image in the region as a query to the search engine, and browse the search results. Sikuli Search’s annotation interface allows a user to save screenshots with custom annotations that can be looked up using screenshots. To save a screenshot of a GUI element, the user draws a rectangle around it to capture its screenshot and save it in the visual index. The motivation for UI automation by screenshots is the desire to address the limitations of current automation approaches. At the core of the visual automation approach is an efficient and reliable method for finding a target pattern on the screen. The goal of the visual scripting API is to give an existing full-featured scripting language a set of image-based interactive capabilities. For screenshots, there are two limitations. The first one is theme variations.
For example, many users prefer a personalized appearance theme with different colors, fonts, and desktop backgrounds, which may pose challenges to a screenshot search engine. Possible solutions would be to tinker with the image-matching algorithm to make it robust to theme variation, or to provide a utility that temporarily switches to the default theme whenever users wish to search by screenshot. The second one is visibility constraints. Sikuli Script operates only in the visible screen space and thus is not applicable to invisible GUI elements, such as those hidden underneath other windows. One solution would be to automate scrolling or tab switching actions to bring the GUI elements into view so that they can be interacted with visually. Another solution would resort to platform-specific or application-specific techniques to obtain the full contents of windows and scrolling panes, regardless of their visibility.

Samanvoy Panati 2:56:44 11/4/2015

Critique: Sikuli: using GUI screenshots for search and automation. This paper illustrates a visual approach named Sikuli for searching and automating GUI elements. At present, search engines are widely used for getting help on any GUI module, and this relies on keyword search. But coming up with the right keywords is the more challenging task. The new visual approach uses screenshots to search for help and documentation on any GUI element. It is used for two different tasks: one is searching and the other is automation. Sikuli Search is the system for searching GUI documentation by screenshots. Users take a screenshot by pressing a hot-key and dragging a rubber-band rectangle. When submitted as a query, this screenshot opens a web browser that displays the results. A user study was conducted in which participants were recruited online and asked to use both the keyword and the screenshot systems, followed by a survey. The results are in favor of the new approach. The other component, Sikuli Script, is used for automation with screenshots. Here the screenshots are taken and the user writes commands in a scripting language to be performed when the screenshot's content is displayed on screen. The speed of the template matching depends on the target pattern: if it is small, matching is fast, and if it is large, matching is slow. The user then enters commands through the API to perform the automation. This can be a little tedious for users who have no experience with scripting. However, this seems to be a good approach to automation using GUI elements.
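The remark about matching speed can be made concrete with a naive cost model. The Python sketch below assumes a brute-force sliding-window matcher that compares every pattern pixel at every candidate position; this is an assumption for illustration and not necessarily the matching algorithm used in the paper. The function name worst_case_comparisons and the screen and pattern sizes are made up for the example.

```python
# Naive cost model for brute-force template matching (an assumption for
# illustration, not the algorithm from the paper).

def worst_case_comparisons(screen_h, screen_w, pat_h, pat_w):
    """Pixel comparisons a brute-force sliding-window matcher may need."""
    if pat_h > screen_h or pat_w > screen_w:
        return 0  # the pattern does not fit, so there is nothing to compare
    positions = (screen_h - pat_h + 1) * (screen_w - pat_w + 1)
    return positions * pat_h * pat_w


# Hypothetical sizes: a 1920x1080 screen, a small icon, and a larger panel.
small_icon = worst_case_comparisons(1080, 1920, 16, 16)
large_panel = worst_case_comparisons(1080, 1920, 64, 64)
print(small_icon, large_panel, large_panel / small_icon)
```

Under this model, as long as the pattern is small relative to the screen, the work grows roughly with the pattern's area, which is consistent with small targets being matched faster than large ones.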

Jesse Davis 4:24:32 11/4/2015

Sikuli: Using GUI Screenshots for Search and Automation. At the beginning of the paper I didn’t really understand the idea of, or the desire for, a visual lookup of a specific piece of a GUI in order to find information about it. I figured that it could potentially be faster than doing a keyword search, but I thought that having question marks that give additional information about a particular part of a GUI would be far more helpful (especially if the user could turn them off for specific parts once they’ve familiarized themselves with those parts). However, the interesting part of this paper comes with the Visual Scripting API, with which users can create macros/scripts for GUI automation by querying and selecting GUI elements. The available actions include a generic find, pattern searching, region scanning, and direct actions (e.g., click, type), and the API even has a visual dictionary data type. The paper goes on to showcase some really cool example scripts the authors created, such as deleting multiple types of documents and even tracking the movement of a bus. Overall this was a good paper, and I enjoyed reading it and looking at the potential ideas for future work and for collaboration with other projects.

Matthew Barren 6:31:53 11/4/2015

Summary: The authors of Sikuli: Using GUI Screenshots for Search and Automation examine the use of targeted screenshots to provide faster search and automation through graphical targets. Automation through graphical targets has a lot of promising opportunities. One of the examples noted involves monitoring a baby marked with a dot to inform a system whether the child has turned over or not. Another potential area is observing a color change, which could signal a state change in an experiment, for example a litmus sticker in an automated titration. When the fluid reaches a critical pH, the state change will activate the sticker, and a computer system will be notified to handle the action accordingly. Additionally, this could be extended for personal use. For example, an individual may want his/her email to pop up when an important message reaches the inbox. Detecting the sender's user name could pull up the email application. The screenshot search is effective for a particular subset of searches. These searches involve a physical or digital representation of the object in question. Frequently this is not the case. Imagine sitting at a computer and searching for a class at Pitt. In this instance, it may be more of a hassle to get an image of the class than to simply type the text. Additionally, queries using a screenshot only address the question of “what”, for example, what does this lasso icon mean? Often, a user wants to do a more expressive search, such as how does the lasso icon work, or how can I use the lasso icon to do a particular action? This expressiveness is lost when the query consists of only one screenshot and some surrounding characteristics. One way to make search more expressive is to use multiple screenshots to search for an item. Using these compound queries, the user may be able to present the search more clearly.
Although this may defeat the purpose of using screenshots to search, the user could supplement the screenshot with a short text description, such as “how” or “why”. These key search types get lost when only searching a screenshot.

Adriano Maron 6:52:51 11/4/2015

Sikuli: using GUI screenshots for search and automation: This paper presents a tool to automate user interface actions based on screenshots of the applications, instead of relying on keywords or commands to perform actions and searches. The authors developed a screenshot search engine that queries by image rather than by keyword. The idea of searching with screenshots was also applied to a Python-based scripting language that can be used to automate GUI tasks. Searching for images in the GUI avoids the problem of location changes that arises when GUI tasks are based on mouse coordinates. However, automation via image comparison is susceptible to differences in screen resolution, i.e., an automation script with images captured on a low-resolution monitor might not work properly when executed on a high-resolution monitor. Despite the limitations, Sikuli provides an interesting approach to GUI testing and automation.
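
The resolution concern can be illustrated with a toy example. The sketch below is not Sikuli code; it assumes that the high-resolution display draws the same widget at twice the pixel size (modelled with a hypothetical nearest-neighbour upscale helper) and that matching is exact pixel comparison (a hypothetical contains helper). Under those assumptions, an icon captured on the low-resolution screen is no longer found on the high-resolution one unless the captured image is rescaled as well.

```python
def upscale(img, factor):
    """Nearest-neighbour upscale of a grid of pixels (hypothetical helper)."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def contains(screen, template):
    """True if template appears somewhere in screen, pixel for pixel."""
    tmpl_h, tmpl_w = len(template), len(template[0])
    for r in range(len(screen) - tmpl_h + 1):
        for c in range(len(screen[0]) - tmpl_w + 1):
            if all(
                screen[r + i][c:c + tmpl_w] == template[i]
                for i in range(tmpl_h)
            ):
                return True
    return False


# Toy 5x5 low-resolution "screen" with a 3x3 "icon" in the middle.
low_res_screen = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0],
    [0, 4, 5, 6, 0],
    [0, 7, 8, 9, 0],
    [0, 0, 0, 0, 0],
]
icon = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Assumed model of the same UI on a display with twice the pixel density.
high_res_screen = upscale(low_res_screen, 2)

print(contains(low_res_screen, icon))               # True
print(contains(high_res_screen, icon))              # False
print(contains(high_res_screen, upscale(icon, 2)))  # True
```

A real matcher with a similarity threshold would be more tolerant than exact comparison, but a change in scale shifts every pixel of the target, so the same kind of failure is plausible.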

Kent W. Nixon 8:58:51 11/4/2015

Sikuli: Using GUI Screenshots for Search and Automation. In this paper, the authors discuss the benefits of using image processing techniques to extract elements from a screenshot in order to power intelligent search and automation scripting. In the first part of the paper, the authors describe the underlying image recognition algorithm. They discuss how the algorithm (SIFT) is able to identify objects in images independent of rotation, scale, or coloration. They then describe how such objects can be converted to a textual representation and added to a database, which allows the information to be searched efficiently. They show that for certain tasks, such as getting help on OS dialog boxes, image search is faster than text search and produces results that are just as relevant. The second part of the paper discusses the creation of a scripting automation extension to Python which uses the proposed image recognition algorithm. The authors detail the commands which can be used to identify matching portions of the screen, adjust the similarity of matches, and perform actions on matching results. They don’t really discuss any metrics related to the efficiency or usability of this scripting extension, other than the fact that they created it and that, when it works correctly, it can complete tasks more robustly than existing automation tools. While this is not at all related to my area of research, I thought that the ideas presented in the paper were rather novel. Obviously, they could be easily broken by people who theme their operating system, but those people probably don’t need to use help documentation that much.
While this technology wouldn’t be extremely useful to large companies like Microsoft or Apple, as they could be realistically expected to manually link the appropriate help file with the corresponding dialog boxes, it would certainly help community driven efforts like forums where specific solutions may have been discovered to specific problems that are not directly addressed in the help file.