Intelligent User Interfaces

From CS2610 Fall 2017

slides

Readings


Reading Critiques

Tahereh Arabghalizi 15:55:46 11/7/2017

Sikuli: Using GUI Screenshots for Search and Automation: This paper presents a new visual approach to search and automation and demonstrates its benefits and capabilities. The traditional approach to search uses textual keywords to find the most relevant information; Sikuli extends this paradigm with new modes of interaction and applies it in a different scenario. Given a screenshot of a graphical user interface (GUI) element, the system attaches some semantic meaning to the image and uses that meaning to search the documentation for information about the specified element. In addition, a second system allows users to select a GUI element by screenshot and manipulate future instances of that element through a script. Although using screenshots in a graphical scripting language is not new, the system does try to overcome the drawbacks of previous implementations. I think this work is significant because it makes programming easier, more natural, and more visual. However, since screenshots can sometimes be inaccurate, this approach is not necessarily the best one in this field.

Kadie Clancy 12:50:54 11/8/2017

Sikuli: Using GUI Screenshots for Search and Automation: In this paper, the authors present Sikuli, which uses GUI screenshots for search and automation. Sikuli attempts to make human-computer interaction more like human-human interaction. For example, humans make visual references to tangible objects when giving commands. Search and automation are examples of interfaces that don't allow humans to use these visual references and instead force users to use keywords or explicit element names. Sikuli Search and Sikuli Script allow users to make these visual references in the form of screenshots: programmers can insert a screenshot directly into a script statement, and users can insert screenshots into queries. This gives users a more robust interaction, since they use a screenshot of an element instead of the name of the element in question. A user study was conducted to determine whether screenshot search can simplify query formulation without decreasing the quality of results. Results showed that query time was reduced when using Sikuli, that there was no significant reduction in relevant results, and that the system was easy to learn. I think it is important for future work that the authors also identified limitations. First, background variation can challenge the image-matching algorithm. Second, Sikuli only operates on visible screen space, which makes it hard to detect out-of-view elements.

Spencer Gray 16:50:43 11/8/2017

In the paper Sikuli: Using GUI Screenshots for Search and Automation, the authors created a system that lets users visually search for help and automate tasks using visual scripting. In their study, the researchers found that this system was easier to learn and faster to use than traditional text-based queries. I found this system to be an interesting approach to reducing the gulf of execution in a system. It takes less cognitive effort to search using screenshots of exactly what you need more information about than to try to describe the problem in words. It also reduces the chance that the user does not know what to search for. This is a fairly significant paper in the HCI literature. In many GUI systems, figuring out how to use the system can be challenging and frustrating. Creating a better way to overcome the learning curve in user interfaces is clearly an important topic for research. However, as the Internet of Things and ubiquitous computing become more prevalent, GUIs will eventually not be the most common way to interface with a computer. Thus, this paper is important now, but in a few years its significance may be forgotten as research shifts its focus to ease of use in ubiquitous computing.

Xiaoting Li 20:15:55 11/8/2017

Sikuli: Using GUI Screenshots for Search and Automation: In this paper, the authors present Sikuli, a visual approach to searching and automating GUIs using screenshots. Sikuli searches GUI documentation by screenshot, aiming to solve the problem that search keywords are sometimes not obvious to users. The prototype system indexes screenshots extracted from a wide variety of resources. Based on the user study carried out by the authors, the screenshot method achieved the best results on coverage, recall, and precision. Sikuli also aims to solve problems with current automation approaches, such as GUI element positions becoming invalid when the window is moved. The Sikuli scripting API is efficient for GUI automation, and the authors give six examples to show how it helps. The idea of searching by screenshot is interesting and helpful, since it saves users the time of looking for accurate keywords to reach the target documentation. It is also efficient when users cannot come up with the correct keywords for the documentation they are looking for.

Xingtian Dong 20:27:16 11/8/2017

1. Reading critique for ‘Sikuli: Using GUI screenshots for search and automation’ I think this paper is really interesting. It's the first time I have seen software aimed at helping users learn how to use other software or websites by recognizing their GUI. It can help users find help and documentation. Sikuli uses three visual features, which are surrounding text, visual words, and embedded text, to perform the search and help the users. This technique has some advantages: because it uses screenshots, it can be applied to any GUI-based application. But it might also have some disadvantages: it only recognizes what is in the GUI or website. Sometimes an icon may just be a figure rather than a link, so it might provide wrong guidance. But it definitely inspires me with some new ways to use computer vision. Actually, I want to enroll in computer vision next term, and this paper might inspire the project I do. Sometimes it is really hard to read a map of a shopping mall or find the biggest discount in a discount newspaper. What if we could build an application that reads the map or newspaper and finds the shop you want to go to or the biggest discount you might want? It might be useful and interesting.

Mingzhi Yu 21:14:42 11/8/2017

Sikuli: Using GUI Screenshots for Search and Automation: The first part of this paper proposes a new search system that takes a screenshot as the query and returns the description of the API. The research is still at an early stage; it only has a prototype for users to test. The second part discusses a new automation system that uses screenshots as input. This new system interacts with users at a flexible level. According to the description of the design (currently it is only a design), this system can be used in a broad range of applications, and it applies to many operating systems and frameworks. In general, the authors provide an innovative idea for the use of screenshots. However, my first question after reading the title and abstract was how the system recognizes screenshots and indexes them. The authors did not answer this. It is okay to discuss only the feasibility of the idea without technical details, because the point of this paper is how screenshots can help improve the usability of a system. It may also be because, when this paper was published, image recognition was not as effective as it is today. However, given the difficulty of image recognition, I wonder how the system figures out which part of the image the user truly wants to address. For example, the user wants to address a button in a big window, and the function of this button depends on the content of the window. In this case, the user has to take a screenshot of the whole window that includes the button. How, then, can the system recognize the button and pay attention to it? The authors do not provide any thoughts on this. I view this paper as one that describes a vision and evaluates the concept but does not provide practical technical details.

Ahmed Magooda 21:30:34 11/8/2017

Sikuli: Using GUI Screenshots for Search and Automation: In this paper, the authors present Sikuli, a visual approach to searching and automating GUIs using screenshots. It allows users to search a large collection of online documentation by submitting a screenshot. The authors carried out a study and were able to show that searching with screenshots is easy to learn and faster to specify than keywords. In their system they used SIFT to extract visual descriptors and OCR to extract text from screenshots. They also introduced a scripting language, Sikuli Script. This scripting approach refers to elements by their visual representation instead of their name or position, which lets programmers control GUI elements programmatically through screenshots. It also has an editor interface for writing screenshot-based automation scripts. Finally, the authors presented some scenarios in which the technique could be applied. The first example is creating Stencils-based tutorials, and the other is automating minimal graphical help. In their conclusion the authors acknowledge that the system still suffers from some limitations: different operating system themes can lead to different screenshots and therefore different results, and some elements are not visible at first because of overlapping windows or the need to scroll, so Sikuli cannot find them. However, they are still optimistic about the idea.

Yuhuan Jiang 21:51:55 11/8/2017

Paper Critiques for 11/09/2017 == Sikuli: using GUI screenshots for search and automation == This paper presents two systems that involve visual patterns: (1) a system named Sikuli Search for searching documentation about GUI elements using screenshots, and (2) a scripting system named Sikuli Script, which allows programmers to refer to GUI elements with screenshots. The intuition behind the two systems comes from how humans naturally give commands by pointing to physical objects. For the Sikuli Search system, screenshots from online tutorials, official software documentation, etc. are first indexed by the system. For each new query, the visual features of the input image are extracted and compared against the index. An OCR system is also used to extract text, which is commonly found in UI elements. Experiments find that formulating a query using screenshots is significantly faster than using keywords. For the Sikuli Script system, several functions are implemented to support locating GUI elements using screenshots. For example, the find() function takes an icon and returns the regions containing that icon. The authors also developed an editor that lets programmers easily write these function calls (e.g., a camera button is provided to fill in the parameter of the find function). Examples are given to demonstrate the power and ease of use of the system. The vision-based systems suffer from theme variations and visibility constraints. Users (especially Windows and Linux users) may customize their OS with themes, which will greatly affect the search results. Invisible GUI elements (on another tab, hidden, …) cannot be queried or selected.
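
To make the find() description in the critique above concrete, here is a minimal Python sketch of locating an icon inside a screen image. It is an illustration only, not Sikuli's matcher: the name find_exact, the list-of-lists grayscale image format, and the exact pixel comparison are all assumptions made for this example. A real matcher would need to tolerate small visual differences, which is exactly where the theme-variation limitation mentioned above comes from.

```python
from typing import List, Tuple

# Hypothetical image format for this sketch: grayscale pixels, row-major.
Image = List[List[int]]
Region = Tuple[int, int, int, int]  # (x, y, width, height)


def find_exact(screen: Image, icon: Image) -> List[Region]:
    """Return every region of `screen` whose pixels equal `icon` exactly.

    Naive sliding-window comparison; assumed for illustration, not
    taken from the paper.
    """
    icon_h, icon_w = len(icon), len(icon[0])
    screen_h, screen_w = len(screen), len(screen[0])
    regions: List[Region] = []
    for y in range(screen_h - icon_h + 1):
        for x in range(screen_w - icon_w + 1):
            # Compare the icon row by row against the window at (x, y).
            if all(screen[y + dy][x:x + icon_w] == icon[dy]
                   for dy in range(icon_h)):
                regions.append((x, y, icon_w, icon_h))
    return regions


# A tiny 5x4 "screen" that contains the 2x2 "icon" twice.
screen = [
    [0, 0, 0, 0, 0],
    [0, 7, 8, 0, 0],
    [0, 9, 6, 7, 8],
    [0, 0, 0, 9, 6],
]
icon = [[7, 8], [9, 6]]
matches = find_exact(screen, icon)
print(matches)
```

Changing a single pixel of the icon makes this exact comparison return no regions, which shows in miniature why a customized theme can defeat screenshot-based matching.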

Charles Smith 22:49:55 11/8/2017

The authors of this paper suggest a change to how we search for information. Instead of words, they suggest the more natural approach of using pictures (screenshots) to find what you may need help with. This idea seems great; users even find it easier to use and learn. However, the authors leave out the latency differences between these methods. While great strides have been made in computer vision, the increased work could delay presenting the user with adequate information.

Jonathan Albert 23:18:59 11/8/2017

This paper presents a system designed to improve searching help documents via icon-based queries. It explains the system's performance under testing and some extensions in the same vein, such as visual programming. Regarding the experimental method, I have to wonder how the test subjects conducted conventional text-based searches. Such information was not in the document as far as I could tell; however, if users had to click into another window to perform text searches, this would inflate the time taken (since the initial click would trigger the start of the timer). Having this information would help determine whether the system actually provided the quantitative benefits described. Nevertheless, I am still interested to see how essentially integrating a snipping tool into a search engine would help--though I have to note that controlling the clipboard is not allowed in browsers, which is the only place where this application would really flourish. I likewise balked at the visual scripting application. Even in a perfect world with only one type of image format, the scripting language still breaks completely when program icons change, or in edge cases when a user customizes their layout. Instead of APIs providing a degree of backwards compatibility, this dependency almost makes hue shifts breaking changes. Coupled with this is the prospective file bloat from storing all of those screen snips, in addition to IDE restrictions--if more than one IDE for this language would even exist. In other words, this type of programming unnecessarily demands more resources and tightly couples any program or macro to a specific window configuration in a specific version. While figuring out what dialog ID 1202 means in an AutoCAD application is a befuddling task, at least programmers can know that the ID will not change for that dialog, even if its appearance does.

Mehrnoosh Raoufi 0:20:56 11/9/2017

Sikuli: Using GUI Screenshots for Search and Automation: This paper presents Sikuli, which enables users to search and to automate their tasks through screenshots. For the first application of Sikuli, searching via screenshot, the authors' motivation is that people often have trouble finding out what a specific tool in an application does, and sometimes it is not feasible to search for its name. Thus, screenshots are really helpful for finding documentation about a specific tool, an error dialog, etc. This search covers not only official documentation but also forums and blogs. Sikuli Search has three components: a screenshot search engine, a UI for searching by screenshot, and a UI for annotating screenshots. The third component enables users to save their own annotations on a screenshot they want to search. The only thing a user needs to do for a search is to draw a rectangle on the screen around the wanted object. This selection does not have to be on the exact border of the object. They conducted a user study of Sikuli Search using a within-subject design. The results of that study supported their two hypotheses about Sikuli Search: it is faster than keyword queries, and its search results are as accurate as those of keyword queries from the users' point of view. The second application of Sikuli that they proposed was screenshots for automation. The core of the approach is to find target GUI patterns on the screen. They had different methods for finding small and large objects. Once users take a screenshot of a target, they can invoke mouse or keyboard functions on it. Furthermore, Sikuli provides special functions for finding patterns. For instance, it lets users choose whether to match only the same color as the selected object or any color. Then, they presented six example scripts that run on Sikuli, such as minimizing all active windows and responding to message boxes automatically. As future work, they indicated that theme variation could be supported in a later version. Moreover, the current Sikuli Script has visibility constraints: it can only find patterns that are visible, not those that are in other tabs or scrolled out of view. It was a very interesting paper for me because I have used Sikuli Script before and enjoyed its tangible interface. The innovation of their work is admirable.

MuneebAlvi 0:54:47 11/9/2017

Critique for Sikuli Summary: This reading argues that Sikuli is an efficient and good tool for matching images to other content stored in a database. It also shows applications of Sikuli and its benefits. I really like the approach of Sikuli. A lot of help topics are easier when another person (or program) can see what a user is having trouble with rather than having the user describe the problem in words. As they say, a picture is worth a thousand words. I think that by utilizing computer vision resources (like OpenCV), these pictures could be used to map a variety of different key-value pairs. For example, when I was taking calculus at Pitt, we were forced to use LonCapa. This program required very precise entry of answers. A single character could cause input entry issues. I think a system like Sikuli could help. If the students could take screenshots of the errors, then maybe better help could be provided. Also, maybe the database entries could be crowdsourced, and the students who were able to solve their problems could make a new entry in the database for their screenshot. I also wonder how companies like Google now perform their image searches. This reading was published in 2009. Google now allows pretty advanced image searches using other images as input. I wonder if they followed a similar approach or if their algorithm has advanced far beyond the approaches in this reading. However, from personal experience, even Google can sometimes misread images and report surprising or unexpected results.

Akhil Yendluri 0:58:27 11/9/2017

Sikuli: Using GUI Screenshots for Search and Automation: This method uses screenshots to search for data in a search engine. It is a visual approach that is augmented with automation to improve search efficiency. The project also provides APIs that users can call to perform many scripting operations that automate GUI interactions. Some examples of the APIs provided are find(), exact(), similar(), anyColor(), and anySize(). The authors also demonstrate this by using it in tasks such as map navigation and bus tracking. The authors have also integrated it with other systems to create tools such as Stencils-based tutorials and Automating Minimal Graphical Help. The authors conclude by mentioning two limitations: theme variations and visibility constraints.

Krithika Ganesh 1:10:06 11/9/2017

This paper uses GUI screenshots to present a framework for search and automation. Sikuli Search retrieves annotated screenshots of GUI elements and returns the semantic meaning of their functionality. It does so by initiating a search in an image database, which returns information about the current GUI. The authors evaluate Sikuli with a user study and statistical tests. To automate GUI tasks, this paper proposes a visual scripting language and shows its advantages, from minimizing all windows, to navigating a map, and even into the physical domain, such as monitoring a baby. I like the idea of this paper, as describing GUI elements in words is harder than simply using an image to describe them. It is indeed surprising that there is no commercial system that searches based on images.

Ronian Zhang 2:10:38 11/9/2017

Sikuli: Using GUI Screenshots for Search and Automation: The paper shows a visual approach to search and scripting for graphical user interfaces using screenshots. The system uses the user's image input to search a collection of documents. It mainly consists of three parts: a search engine, a query interface, and an interface for annotations. To extract features from an image, the system applies SIFT. It combines this with OCR, which recognizes text in screenshots, and converts the resulting text into 3-grams. The API is very impressive. Unlike most scripting languages, it uses a visual image as the index rather than the name and location of the element. It has a one-to-one mapping between an icon and an element. Even though the method seems magical, the authors point out two major limitations of the approach: the OS theme can lead to different icons and completely different results, and some elements might not be visible in the current window and require scrolling to find. However, through experimental evaluation, the authors show that the search method presented in the paper is faster and more natural than searching by keywords. In my opinion, the search is actually very limited in its application scenarios, and if the user wants more complex search results, the system could easily fail. There is still much room for improvement in both the feature extraction method and the training of the database.
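
The step of converting OCR text into 3-grams, mentioned in the critique above, can be sketched as follows. This is a minimal illustration that assumes character-level 3-grams with lowercasing and whitespace normalization; the function names char_trigrams and trigram_overlap are hypothetical, and the overlap score (Jaccard similarity) is a stand-in chosen for this example rather than the scoring used in the paper.

```python
from typing import List


def char_trigrams(text: str) -> List[str]:
    """Split text into overlapping character 3-grams.

    Lowercasing and collapsing whitespace are assumptions of this sketch.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + 3] for i in range(len(normalized) - 2)]


def trigram_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the 3-gram sets of two strings."""
    set_a, set_b = set(char_trigrams(a)), set(char_trigrams(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# An OCR misread of one character still shares most 3-grams with the
# true label, so a partial match remains possible.
print(char_trigrams("Save As"))
print(trigram_overlap("Save As", "Save Az"))
```

The point of the design is that a single misrecognized character only damages the few 3-grams that contain it, whereas a whole-word index would miss the label entirely.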

Sanchayan Sarkar 6:07:06 11/9/2017

CRITIQUE (Sikuli: Using GUI Screenshots for Search and Automation) In this paper, the authors propose screenshot-based searching for GUI navigation. The authors assert that visual searching is far superior to keyword-based searching in terms of speed. Along with that, they claim that the search itself does not lose any relevance. For automation of GUI interaction, Sikuli also provides a visual scripting environment. The authors conducted a web-based user study and showed that screenshot-based searching is faster than keyword-based searching for map navigation, bus tracking, etc. The authors conducted a user study with 12 subjects to determine the utility of this system. However, in my opinion, such a user study is quite limited, as the data obtained is too small to support generalization. This paper is relevant to my work on computer vision, where image-based searching and querying is becoming more and more common. On Google and even on online e-commerce platforms, image-based searching is becoming quite common. Therefore, this paper can be a good place to understand the motivation and design process of such a system.

Amanda Crawford 6:39:55 11/9/2017

Sikuli: using GUI screenshots for search and automation, Yeh, T., Chang, T., and Miller, R. C., In Proceedings of UIST 2009. Sikuli is a tool that allows users to search for a GUI element's documentation via screenshot images. It seeks to implement a recognition-over-recall strategy by removing the need for users to memorize the names of elements and instead using an image as the search query. This system would be most ideal for developers. The search engine is an important concept in the work, as it attempts to create a database that is built on a multimedia type of input. Using the image, SIFT transforms the image and uses the data to generate a query. This process is complex, and with more thought, I believe that this tool could be even more powerful.

Ruochen Liu 8:50:35 11/9/2017

Sikuli: Using GUI Screenshots for Search and Automation: For users of graphical user interfaces, the most natural way to interact with the interface is to interact with graphics such as icons. Finding information or issuing commands involving GUI elements can be accomplished naturally by making direct visual reference to them. But in the areas of search and automation, GUI users are forced to use non-visual methods to interact with the interface, which is neither natural nor satisfying. To fix this problem, this paper presents Sikuli, a method that uses GUI screenshots for search and automation. There are two prototype systems: Sikuli Search and Sikuli Script. Sikuli Search consists of three components: a screenshot search engine, a user interface for querying the search engine, and a user interface for adding screenshots with custom annotations to the index. It enables users to use screenshots to search a huge collection of online documents about GUIs. A corresponding demonstration, an online user study, and a performance analysis are presented in the paper. Sikuli Script is a scripting system that enables users to automatically control GUI elements using screenshots. Basically, it is a combination of a scripting language and an interface for writing automation scripts based on screenshots. In conclusion, this paper presents a GUI interaction method that uses GUI screenshots for search and automation. Compared with conventional methods, it is more natural for users, and it has the potential for better application performance.
