Intelligent User Interfaces
- Sikuli: using GUI screenshots for search and automation, Yeh, T., Chang, T., and Miller, R. C., In Proceedings of UIST 2009.
firstname.lastname@example.org 15:48:49 11/5/2014
Katsipoulakis Sikuli: Using GUI Screenshots for Search and Automation :: This paper presents a full-fledged framework for search and automation of actions using GUI screenshots. The authors have implemented and tested a search application that uses users’ screenshots, and a visual scripting language for automating tasks. Sikuli Search is able to retrieve annotated screenshots (annotated either with text or with geometrical shapes) of GUI elements and return information about their functionality. In my view, this functionality is useful, because users often do not know how to describe a GUI situation properly. As a result, the Help feature of modern software becomes useless and the overall user experience deteriorates. Sikuli Search is an alternative approach that enables users to provide screenshots of their current GUI state. The prototype system initiates a search in an image database and returns information about the current GUI situation. Then, the user is ready to proceed with her operation. As far as evaluation is concerned, the authors not only conduct a user study, but also verify their results by running statistical tests. The second contribution of this paper is a visual scripting language for automating tasks. The proposed prototype is able to automate GUI tasks to a high degree. Even though the prototype is limited, it has high potential for different use cases.
nro5 (Nathan Ong) 16:00:59 11/5/2014
Review of “Sikuli: Using GUI Screenshots for Search and Automation” by Tom Yeh, Tsung-Hsiang Chang, and Robert Miller The authors present a system that uses screenshots for image searching and automation. Given a screenshot of some graphical user interface (GUI) element, the system can take the image, attach some semantic meaning to it, and use this meaning to search the documentation for information about the specified element. In addition, another system allows users to select a GUI element by screenshot and manipulate future instances of that element through a script. As the first research paper to seriously discuss searching through images, this is essentially a revolutionary concept that was developed and released within two years (2011 was the year Google released its reverse image search service). I was surprised to read that the first image search system was meant to provide assistance in understanding a GUI element by searching through documentation and finding pages that contained similar images. I find it hard to believe that it would be difficult to describe GUI elements by eventually finding the right keywords, although obviously it would be more convenient to simply use a picture rather than describing it and hoping. It seems that the researchers severely underestimated the possible demand for searching through images, especially since there is demand for location-based image search, object recognition through search, etc. I was also surprised to see that the paper included a system for visual automation. Evaluating a similar-image search algorithm seems significant enough on its own, especially since, based on the paper’s related work section, no one had done it before. There seems to be less demand for the automation system, since there does not seem to be a commercial system that does similar things yet.
Something that may be related is the ability to “inspect” an element in the webpage such that the selected feature points to the correct location in code.
Wenchen Wang 16:30:58 11/5/2014
<Sikuli: Using GUI Screenshots for Search and Automation> <Summary>This paper proposes a visual approach called Sikuli. Sikuli allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element’s name. <Paper Review> The motivation for using screenshots for search is the lack of an efficient and intuitive mechanism to search for documentation about a GUI element. The system indexes screenshots extracted from a wide variety of resources. Each screenshot is indexed by three types of features: surrounding text, visual features, and embedded text obtained by OCR. Sikuli Search is essentially a search for images by image. Another function of Sikuli is automation. The motivation for using screenshots for automation is that there is no direct way to write automation scripts that control GUI elements with low-level keyboard and mouse input. The key part of screen automation is finding a target pattern on the screen, such as a web browser icon or a docWizard icon. We have to recognize the icons first and then manipulate them directly. I think the baby monitoring example is the most interesting one. We could apply this approach to monitor elderly people at home while they are sleeping, too.
Bhavin Modi 20:56:44 11/5/2014
Reading Critique on Sikuli: Using GUI Screenshots for Search and Automation The paper identifies a new approach to search and automation. The approach is compared to the existing textual approach and also to previous research in this field. For searching today, we use the traditional approach of searching on Google, which uses textual keywords from the input to find the most relevant information. Google also provides functionality to search by image, to find out what an image shows and provide relevant citations. Such technology exists, but Sikuli extends this paradigm to new modes of interaction and applies it in a different scenario. As we know, everything is best for something and worse for something else, and Sikuli is another example of this principle. We use screenshots of GUI elements to find out what they are and what functions they perform. An important benefit is that beginners generally do not know what these elements are called, while the documentation text describes their usage. So how can we conduct an efficient yet easy search? Sikuli provides the answer. Using it as a graphical scripting language is not innovative per se, but it does try to overcome the drawbacks of previous implementations. That being said, a detailed comparison that brings out its uniqueness would better help validate the authors’ hypothesis. The authors also report statistical significance with only 12 subjects, which is the golden number, as they say, but this is still not convincing enough, as it does not hold for all cases. A good aspect of the paper was the set of scenarios where the scripting language really shows its advantages, from minimizing all windows, to navigating a map, and into the physical domain as well, monitoring the baby. Sikuli exploits the fact that a picture is worth a thousand words and is easier to relate to, making interaction more natural.
As mentioned before, this is certainly not the best approach for everything: using normal Python scripting, one can also perform the delete operation on multiple files and at the same time maintain more control. One major drawback is that screenshot matching will not always be accurate, as we see from the need to maintain a similarity threshold. Changing themes also has dire effects, and there is a problem of accuracy as to what it will find; addressing this increases complexity back to the original levels. One proposed idea is to have predefined screenshots and let the interaction happen through right-clicking the target to perform the operations.
phuongpham 21:10:22 11/5/2014
Sikuli: Using GUI screenshots for Search and Automation: this paper presents a new way for humans to communicate with computers, i.e. image regions from screenshots. The most interesting thing in the paper is a new communication channel where humans can pass images as part of their commands to computers. The technique of searching for similar images may not be new (at that time, Google Images announced a similar-image search function). However, what I find interesting is the later part, the automation script. As in other HCI papers we have read in this course, the authors introduce their new approach and evaluate it with some applications. Further analysis is given and shows the strengths as well as the weaknesses of the approach. There is a real product, i.e. Sikuli Script, born from this project. I really like the approach and think that if computer vision techniques become better, we can extend the application from images of controls to arbitrary images. However, I think that besides the theme and visibility limitations, the authors should have paid attention to the process in which a user has to take a screenshot and manipulate the image (crop the screenshot to a specific control). The application could do more than that, e.g. let the user press a special key to enable the functionality and just ask the user to mark the control of interest; other information, such as the parent control, could be inferred by the application or obtained with a simple confirmation from the user.
Vivek Punjabi 21:13:06 11/5/2014
Sikuli: Using GUI screenshots for Search and Automation This paper presents a new approach to search and automation techniques by using visual elements as the basic units rather than the usual text elements. It is easier to instruct or perform actions when directly pointing to certain objects rather than referring to them by their names or properties. Using a similar approach when interacting with a computer is the main motivation for this paper. The authors present a system named Sikuli, which allows users to specify GUI elements using screenshots, since screenshots are universally accessible for all kinds of applications and are robust. The first system is Sikuli Search, which allows users to perform image search using screenshots instead of text to search for certain GUI elements. The architecture of the system consists of three components: first, a screenshot search engine, which indexes screenshots based on their surrounding text, visual features and embedded text using some existing image recognition algorithms; second, a user interface for searching with screenshots, which allows the user to enter the search mode, select a region on the screen and click search, with the results shown in a web browser; and third, a user interface for annotating screenshots, which allows the user to add annotations to any GUI element and save them. A prototype was created with a database built from a limited number of books, with around 50k screenshots, and a three-feature indexing scheme in C++. The user study supports their two hypotheses: that querying with screenshots is significantly faster than querying with keywords, and that the results produced in both cases are of similar relevance. The next system uses screenshots for carrying out automation tasks. As normal automation scripts require support from application developers, or are based on absolute positioning of GUI elements or fixed text labels, they may become invalid or difficult to execute in many cases. Sikuli Script is a visual scripting API for carrying out GUI automation.
It provides flexibility in terms of the visual appearance of GUI elements, and scripts are easier to create without any support from the application. The API has several components: find(), to locate a particular element on the screen; a Pattern class, which allows finding patterns with various attributes such as similarity, size and color; a Region class, which restricts the search to certain regions of the screen; action commands, to carry out mouse and keyboard events at the center of the required GUI element; and a visual dictionary, which allows the user to store key-value pairs with images as keys to access the search functionality of Sikuli. The Sikuli Script editor is designed specifically to add visual elements to scripts efficiently, using a number of options such as taking screenshots and importing images from the local disk. A number of examples using Sikuli Script are given, which show its application in common tasks such as minimizing all windows and deleting multiple documents, and also in some complex practical applications such as tracking bus movements and monitoring a baby. The system can also be integrated with other existing systems such as Stencils and Graphstract. Finally, the paper discusses some limitations, such as theme variations and visibility constraints, and some possible solutions, such as normalizing execution environments while writing scripts and automating scrolling and tab switching. The paper provides a very creative idea for search and automation. The screenshot search engine is well designed, using existing resources without making it complex. However, the search time for a pattern search is not considered or compared with keyword search. As images usually require more computing time and high-speed processors, this might increase search times significantly. Also, storing images in databases can have certain limitations and a management overhead compared to easily manageable text databases. The scripting API is well designed and includes the most commonly used functions.
It can be further extended by adding more functions like time() and events like window close and expand which allows waiting for a specific pattern on screen to occur rather than continuously checking for the same. The research field looks worth exploring as its scope in this paper is limited and confined.
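The find(), similarity, and region ideas summarized in this critique can be illustrated with a toy matcher. This is only a minimal sketch under stated assumptions, not Sikuli's implementation: images are small grayscale grids, similarity is one minus the mean absolute pixel difference, and the names `similarity`, `find_all`, `min_similarity`, and `region` are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]           # grayscale image as rows of 0-255 values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def similarity(screen: Grid, pattern: Grid, x: int, y: int) -> float:
    """Return 1.0 for a pixel-perfect match at (x, y), lower as pixels differ."""
    total = 0
    for dy, row in enumerate(pattern):
        for dx, value in enumerate(row):
            total += abs(screen[y + dy][x + dx] - value)
    pixels = len(pattern) * len(pattern[0])
    return 1.0 - total / (255 * pixels)


def find_all(screen: Grid, pattern: Grid, min_similarity: float = 0.9,
             region: Optional[Box] = None) -> List[Tuple[int, int, float]]:
    """Slide the pattern over the screen (or a sub-region) and keep good matches."""
    ph, pw = len(pattern), len(pattern[0])
    rx, ry, rw, rh = region or (0, 0, len(screen[0]), len(screen))
    matches = []
    for y in range(ry, ry + rh - ph + 1):
        for x in range(rx, rx + rw - pw + 1):
            score = similarity(screen, pattern, x, y)
            if score >= min_similarity:
                matches.append((x, y, score))
    return matches
```

Lowering `min_similarity` lets slightly recolored copies of an element match (for example after a theme change), at the cost of more false positives; passing a `region` limits the search to one part of the screen.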
Xiaoyu ge 21:31:38 11/5/2014
SIKULI: USING GUI SCREENSHOTS FOR SEARCH AND AUTOMATION This paper presents an approach called “Sikuli”, used to access help content through screenshot-based search rather than traditional keyword-based search. The main motivation for this work was that traditional keyword search is somewhat unnatural, and it is difficult to come up with keywords that describe parts of a user interface. Thus, rather than remembering or guessing what an icon may be called, Sikuli allows users to simply take a snapshot of the icon and search the visual database for it. In their experiments, the authors found that this method was more intuitive for users and also gave better search results. The main idea of this paper is great. However, the authors did not perform evaluations of the scripting API, so its description did not clearly follow from the first half of the paper. Still, the authors include some level of detail about the system, and the detail was adequate. With such a system I can imagine many opportunities; however, as identified by the authors, one of the main limitations of Sikuli is maintaining the accuracy of the information as the user interface changes. For user interfaces that are fairly standard, with little change over several iterations, there is not much need for updates. However, if a system constantly changes its user interface, then this screenshot approach might not be a good idea. One of the most popular rules for good interface design is to standardize components as much as possible, so that the reuse of controls becomes more likely. In these kinds of situations, performing a screenshot-based search would become easier, since for this type of system, the more visual data submitted in the query, the more accuracy we will have.
Yanbing Xue 21:36:40 11/5/2014
This paper describes a visual search system that provides an easy interface for developing automation scripts. I personally find this paper a bit excessive in some respects. It seems the authors did not focus on their problem enough, which resulted in a system that can meet their need, but does so in a convoluted way. If automating the detection of UI elements (or icons, or windows, etc.) and then passing in commands was their goal, then they should have implemented a system where the user hovers their cursor over an example widget (we assume the user has access to this data, just as in their system). The query is enriched by the image of the component that the user is working with and wants information about. One challenge I see in this approach is that building the database of screenshots has a dual limitation: the size and the level of detail of the database. First, the user may want to query for some component in a panel, or for the whole panel. That means the database may need to contain both the panel as a whole and all its child components. Second, maintaining a huge database may be the most difficult part for a stand-alone application. We may use online storage and cloud computing to resolve this problem. Their experiments showed that users found what they wanted faster through this kind of search, so it would make sense that novices would be more productive if search could be made quicker. Making novices more productive would help productivity overall, so this search could help productivity. The second part of the paper focuses on a scripting language that uses the screenshot as a programmable object. The idea here is that instead of naming an element and describing it, we simply use a screenshot to identify the object. So a screenshot of a button will allow us to script on that button. This is aimed at making scripting faster and easier. This approach is an outstanding resolution of this issue in our everyday environments.
Wei Guo 21:42:52 11/5/2014
Reading Critique for Sikuli: Using GUI Screenshots for Search and Automation This paper introduces Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli Search consists of three components: a screenshot search engine, a user interface for querying the search engine, and a user interface for adding screenshots with custom annotations to the index. The screenshot search engine indexes screenshots in three ways: using the text surrounding them in the source document, using visual features, and using the text embedded in them. There are still some limitations of the system, such as theme variations and visibility constraints. This system gives us a hint that search engines that take images as keywords will come into our lives. In my understanding, the most complicated part of the Sikuli system is how to transform the input image into a query that can be used to search for results. I guess the “visual features” mentioned in this paper are the key search method in this search engine. For each input image, the system computes some visual features. Based on these visual features, the search engine can then find existing similar images and return the information attached to them.
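The guess above, that the engine computes visual features for the query and returns the most similar indexed images, can be made concrete with a toy example. This is a hedged sketch only: it swaps in a simple intensity histogram as the feature, whereas the paper describes feature vectors computed on small image patches, and the names `histogram_features`, `search`, and the index entries are hypothetical.

```python
import math
from typing import Dict, List

Grid = List[List[int]]  # grayscale image as rows of 0-255 values


def histogram_features(image: Grid, bins: int = 4) -> List[float]:
    """Toy visual feature: normalized histogram of pixel intensities."""
    counts = [0] * bins
    for row in image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query: Grid, index: Dict[str, List[float]], top_k: int = 1) -> List[str]:
    """Rank indexed screenshots by feature distance to the query image."""
    q = histogram_features(query)
    ranked = sorted(index, key=lambda name: distance(q, index[name]))
    return ranked[:top_k]


# Hypothetical index: screenshot name -> precomputed feature vector.
index = {
    "dark toolbar icon": histogram_features([[10, 20], [30, 200]]),
    "bright dialog": histogram_features([[250, 240], [230, 60]]),
}
```

A query such as `search([[15, 25], [35, 210]], index)` ranks the dark icon first, because its histogram is closest to the query's; the information attached to that screenshot would then be returned to the user.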
Longhao Li 21:44:20 11/5/2014
Critique for Sikuli: Using GUI Screenshots for Search and Automation In general, this paper introduces a new research direction, which uses screenshots as the key for search and automation. This research opens a new area of research. The paper makes a great contribution to HCI research, since this visual approach makes search and automation much simpler than before. When people want to search, they can just circle what they want to search for, and the result will pop up. Searching this way is easy because a visual presentation is straightforward to use. People don’t need to worry about how to express their problem in words. Expressing it in words is hard, and searching that way sometimes fails to find what the user wants, possibly because of a wrong description or a lack of information. Visual presentation can also be used in writing scripts. By mapping between words and pictures, a script can be written easily and also precisely. A user’s written description may refer to a different object, but a picture will never be wrong. Both writing and reading such a script are easy, so the developer’s job becomes easier with Sikuli Script, the scripting language presented by the authors. Therefore, it is easy to build some great automation. In today’s market, there are some approaches that are similar to the authors’ approach. Google has enabled searching for pictures by providing another picture as input. This is quite similar to Sikuli Search, the authors’ search method. Also, search engines nowadays can use a description to search for pictures instead of searching for words in the webpage where the picture is shown. A single word can directly bring you related pictures. Based on my experience, it works well. It is fast and the results are comprehensive. I think that, based on the research results in this area, more similar products will appear on the market. They will make users’ lives more convenient.
Qihang Chen 23:26:40 11/5/2014
The authors of this paper present Sikuli as a solution to this problem. Sikuli is a system that takes a visual approach to search and automation of GUIs using screenshots. In this case, the contextual clues are provided by the screenshots and the verbal commands are represented as search queries and code. Sikuli is further divided into Sikuli Search and Sikuli Script. The former is simply a search engine where the input is a screenshot; I didn't think this was that novel, so I won't talk about it. The latter is where the novelty of this research is. Automating GUI interactions, or scripting a “help” demonstration, is a very time-consuming task, as it requires knowing exact coordinates on the screen and the dimensions of the area of interest. Sikuli Script uses a very novel, and intuitive, approach to automation that is built on top of their image processing libraries. The scripting language, based on Python, manipulates two kinds of objects called Patterns and Regions. Patterns simply describe an arrangement of pixels, and Regions are areas the script should work within. Using these two ideas, combined with screenshots, programmers can insert images directly into the code and operate on them. The inserted images can be either patterns or regions. For example, to click “OK” in a dialogue box, the region would be defined as a picture of the dialogue box and the pattern as a picture of the “OK” button.
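The dialogue box example above (a region given by a picture of the dialogue, a pattern given by a picture of the “OK” button) can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the Sikuli Script API: images are small integer grids, matching is exact, and `locate`, `center`, and `click_target` are hypothetical helper names.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]           # image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def locate(screen: Grid, image: Grid, within: Optional[Box] = None) -> Optional[Box]:
    """Return the box of the first exact occurrence of image, or None."""
    ih, iw = len(image), len(image[0])
    x0, y0, w, h = within or (0, 0, len(screen[0]), len(screen))
    for y in range(y0, y0 + h - ih + 1):
        for x in range(x0, x0 + w - iw + 1):
            if all(screen[y + dy][x:x + iw] == image[dy] for dy in range(ih)):
                return (x, y, iw, ih)
    return None


def center(box: Box) -> Tuple[int, int]:
    """Point where a click on the matched element would land."""
    x, y, w, h = box
    return (x + w // 2, y + h // 2)


def click_target(screen: Grid, region_image: Grid,
                 pattern_image: Grid) -> Optional[Tuple[int, int]]:
    """Find the region first, then look for the pattern only inside it."""
    region = locate(screen, region_image)
    if region is None:
        return None
    target = locate(screen, pattern_image, within=region)
    return center(target) if target else None
```

Restricting the second search to the region means that a look-alike button elsewhere on the screen is ignored, which is the point of combining the two concepts.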
Mengsi Lou 1:46:43 11/6/2014
Sikuli: Using GUI Screenshots for Search and Automation This paper discusses a system that performs search and automation with a visual approach using screenshots. Users can take a screenshot of a certain element and use it as the key for a search query. The Sikuli system provides a visual approach to searching and automating GUI elements. When searching a documentation database about a GUI element, a user can draw a rectangle around it and take a screenshot as a query. The system consists of three components: a screenshot search engine, a user interface for querying the search engine, and a user interface for adding screenshots with custom annotations to the index. The screenshot search engine has three ways of indexing screenshots for search. First, it uses the text surrounding a screenshot in the source document. Second, it uses visual features, each of which is a vector of values computed to describe the visual properties of a small patch in an image. The third way is to make use of the text contained in the GUI: screenshots can be indexed based on their embedded text. In the user study part, the authors first propose two hypotheses. One is that screenshot queries are faster to specify than keyword queries, and the other is that the results of screenshot and keyword search have roughly the same relevance as judged by users. These are both statements that can be proven true or false. As for the study method, the study was a within-subject design and took place online. The participants had to issue specific queries by entering keywords or by selecting a screen region. Participants were then asked questions about their impressions. Finally, the results were collected and evaluated. The next section presents Sikuli Script, a visual approach to UI automation using screenshots.
The goal of this interface is to give an existing full-featured scripting language a set of image-based interactive capabilities. The scripting interface has several components, including the find function, the Pattern and Region classes, a set of action commands, and the visual dictionary data type, which stores key-value pairs. The authors also developed an editor to help users write visual scripts. Example scripts include minimizing all active windows, deleting documents of multiple types, navigating a map, etc.
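The visual dictionary mentioned above, a key-value store whose keys are images, could look roughly like this. It is a minimal sketch under assumptions rather than the paper's data type: keys are small grayscale grids, lookup returns the value of the most similar stored key above a threshold, and `VisualDict`, `similarity`, and `min_similarity` are hypothetical names.

```python
from typing import Any, List, Optional, Tuple

Grid = List[List[int]]  # grayscale image as rows of 0-255 values


def similarity(a: Grid, b: Grid) -> float:
    """1.0 for identical images, 0.0 for different sizes or maximal difference."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    diff = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return 1.0 - diff / (255 * len(a) * len(a[0]))


class VisualDict:
    """Toy dictionary that is looked up by image similarity instead of equality."""

    def __init__(self, min_similarity: float = 0.9) -> None:
        self.min_similarity = min_similarity
        self._items: List[Tuple[Grid, Any]] = []

    def __setitem__(self, image: Grid, value: Any) -> None:
        # Overwrite an identical key, otherwise store a new pair.
        for i, (key, _) in enumerate(self._items):
            if key == image:
                self._items[i] = (image, value)
                return
        self._items.append((image, value))

    def get(self, image: Grid, default: Optional[Any] = None) -> Any:
        # Return the value whose key image is most similar to the query.
        best_score, best_value = 0.0, default
        for key, value in self._items:
            score = similarity(key, image)
            if score >= self.min_similarity and score > best_score:
                best_score, best_value = score, value
        return best_value
```

For instance, after `d = VisualDict()` and `d[[[0, 255], [255, 0]]] = "close button"`, a slightly different capture such as `d.get([[0, 250], [250, 5]])` still retrieves "close button", while an unrelated image falls back to the default.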
yubo feng 2:08:48 11/6/2014
In the paper, the authors present an approach called Sikuli, which uses screenshots to make search easy. Sikuli is now open source software; everyone can download it and try it. I think the most interesting idea the authors discuss is capturing part of the GUI and recognizing the element that the user is interested in; moreover, the authors compare it with keyboard input. The keyboard is a slower input mode than a screenshot. If users need a long time to think about what to type and then wait, the interest they had before may be gone. This method helps people keep that interest, which is why it is so popular.
Christopher Thomas 2:11:51 11/6/2014
Note: Dr. Wang only provided one required reading in the e-mail last night when the server was down. Thus, I am submitting the review for the reading which he e-mailed us. 2-3 Sentence Summary of Sikuli: Using GUI Screenshots for Search and Automation: The authors design an approach for searching based on screenshots. The authors demonstrate how the approach can be used to write simple scripts and provide help content to users about GUIs. Finally, the authors evaluate their system with a user study and confirm that users find it easier and simpler to use. The authors’ main idea in the paper was to use GUI screenshots in a way that would enhance the user’s experience. It seems like a very simple idea at first, but it is powerful. Users needing help in an interface can take a screenshot of it and search based on that, rather than on words. While it is easy to dismiss this idea as silly and impractical, Google has implemented very similar functionality in Google Image search now. Users are able to go to Google Image search and upload an image of something, and Google will return similar images. Thus, if I take a picture of the Eiffel Tower, Google can identify it for me. In the case of the authors’ paper, it seems like a more specific application than Google’s. Google’s goal is to provide a general functionality, but the authors’ goal is to provide something more specific. I think this demonstrates a research technique that we have discussed numerous times in class. The idea is that even though it seems that an idea has been tried before, if you can find a SPECIFIC USE CASE for that idea and show how the idea can be applied in that domain, you have made a contribution. The authors did not invent the idea of searching by image, which had existed in the computer vision community for some time before that. However, they did apply that technique to GUIs, something that hadn’t been done before.
GUIs are particularly well suited to this technique because they are relatively constant on desktop machines from display to display. In contrast, real images suffer from all sorts of ambiguities, so visual search on them is very ambiguous and often poor. By illustrating how this technique could help search help documentation, the authors provided a good motivation for how their idea was a contribution. Similarly, the authors invented a basic scripting language based on the concept of scripting with small icons. Users with virtually no programming knowledge could quickly learn how to write simple programs in this language using visual icons. As such, users could use the screenshots they made before to automate programs. Thus, we arrive at another important contribution: the use of a metaphor which is extended and realized. For instance, the authors began the paper by explaining how people communicate, stating that we instruct people where to put something by pointing to that particular object. Programming is usually very abstract, and the researchers wished to improve this, to make it more NATURAL and closer to how people actually communicate. Thus, they took the idea and showed how programming could be made more natural and visual and closer to how humans actually communicate. As such, through the use of this simple metaphor, they were able to reduce the GULF OF EXECUTION in programming greatly. Instead of having to write complicated computer vision routines, users could simply represent the object they wanted through a screenshot or image file, and write basic scripts based on that to accomplish basic tasks. In fact, the authors even demonstrated how a baby could be monitored with only a few lines of code, simply by placing a small dot on the baby’s head. The users then simply told the system to look for the small dot (visually represented in the programming language).
The programming language then did the hard part of translating that into Python and executing it, saving the humans all the bother. Thus, I think this is also a major contribution of the researchers’ work. Their contribution was how to make programming more natural and visual, and how we can use these images or screenshots to express what or where we are looking and then what tasks we want to do at that location (using programming). Thus, one can automate tasks using images and a small amount of code.
SenhuaChang 2:28:41 11/6/2014
This article presents a system called Sikuli, which can be used to search for documentation about GUI elements using screenshots and can also be used to write screenshot-based automation scripts. The authors also present the results of a user study, which show that users found screenshot queries easy to use and easy to learn. The screenshot query functionality seems to follow good design principles in HCI. The authors are trying to move the interaction between the user and the computer away from the computer's side and more to the user's side, which is always a good thing, since it can give the user a better understanding of how to interact with the computer and an easier time learning about this interaction. The scripting-using-GUI-elements functionality is interesting. I think that it is a pretty cool idea. As with the screenshot query, this feature seems to follow good HCI design principles. The examples that are given are excellent demonstrations of this tool. To sum up, Sikuli is a very user-friendly system, which tries its best to combine the physical world with the digital one.
zhong zhuang 3:28:20 11/6/2014
This paper is about using screenshots to do GUI-related search and automation. This is a very interesting idea to me. As the authors state, traditional keyword search requires the user to specify the correct keyword, which is sometimes challenging or even impossible. In the physical world, we can specify an object by pointing to it; for example, we can point to a cake and say it looks delicious. But in the digital world, direct visual representation is not the way we operate on objects; instead, we use an indirect way to manipulate the GUI. This paper presents an attempt at direct visual representation. By using a screenshot to specify a GUI component, we can directly manipulate the component’s visual representation. The authors introduce two prototype applications, GUI search and GUI automation. In the search application, the authors use three ways to index a screenshot: surrounding text, visual features and embedded text. Together, these indexing features can represent most screenshots well. To use the search system, a user simply selects a region of interest on the screen and submits the image in the region as a query to the search engine. As shown in the user study, most users found the screenshot search system easier to use than traditional text search. The second application is GUI automation. The current approaches require support from application developers or accessible text labels for GUI elements. These features are platform dependent; by using screenshots of the GUI directly in a script to programmatically control the elements, the application can be not only platform independent but also application independent. The authors use different pattern matching techniques to find a specific GUI element and represent it as an instance of a Region class, which basically contains bitmap information, coordinate information and other metadata. The user can use the find() function to find a specific GUI element, and find() will return a Region object.
Then the user can perform certain actions on the Region, such as a click or a double click. In sum, this screenshot approach is really inspiring; it seems easy to implement and efficient to use.
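The find-then-act pattern described in this review can be illustrated with a toy sketch. This is not Sikuli's implementation: the `Region` fields, the exact-match `find()`, and the event log are all hypothetical stand-ins, and the "screen" is just a small grid of integers rather than real pixels. Sikuli itself relies on computer-vision matching that tolerates small visual differences, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grid = List[List[int]]  # toy stand-in for a bitmap: rows of integer "pixels"


@dataclass
class Region:
    """Hypothetical result of a match: top-left corner plus size."""
    x: int
    y: int
    w: int
    h: int

    def center(self) -> Tuple[int, int]:
        return (self.x + self.w // 2, self.y + self.h // 2)

    def click(self, log: list) -> None:
        # A real tool would synthesize a mouse event; here we only record it.
        log.append(("click", self.center()))


def find(screen: Grid, pattern: Grid) -> Optional[Region]:
    """Return the first exact occurrence of pattern in screen, or None."""
    ph, pw = len(pattern), len(pattern[0])
    sh, sw = len(screen), len(screen[0])
    for y in range(sh - ph + 1):
        for x in range(sw - pw + 1):
            if all(screen[y + dy][x:x + pw] == pattern[dy] for dy in range(ph)):
                return Region(x, y, pw, ph)
    return None


screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 8, 0],
    [0, 0, 9, 6, 0],
    [0, 0, 0, 0, 0],
]
button = [[7, 8], [9, 6]]  # the "screenshot" of the element we want

events = []
region = find(screen, button)
if region is not None:
    region.click(events)
```

The point of the sketch is only the shape of the interaction: the script names its target by appearance, gets back a located region, and then acts on that region.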
Brandon Jennings 4:01:42 11/6/2014
Sikuli: This paper presents an approach to using screenshots to interact with the computer. The system can query help systems and generate mouse and keyboard events. A system like this is important because it makes it easier for users to find solutions to problems with tools. It can be difficult to know the names of buttons and widgets. If users can take a screenshot of the tool they are having problems with, it saves them from spending endless time searching for help results. Working with graphics is much easier than standard coding. It makes programming more universal and allows novices to create their own programs without extensive training. Using screenshots to create mouse and keyboard events is an interesting concept and presents a new way to develop custom methods and events for things that do not have keyboard shortcuts. A lot of what people do on the computer is visual, and at times it can be more efficient to use images than to make a sequence of clicks and keystrokes to execute commands.
Jose Michael Joseph 6:00:56 11/6/2014
Sikuli: Using GUI Screenshots for Search and Automation. This paper is about software that can take screenshots as input for search queries and produce relevant output. In contrast to traditional search techniques, which rely on typed keywords, this method can take any screenshot as input. The primary advantage of the system is that it uses visual input rather than text input. This is beneficial because humans are visual by nature, so the input is much closer to their intuitive way of working. It also helps because we are often unable to specify exactly what we are looking for, especially for objects represented by icons, because we then have to somehow convert the icon into words, which can be a messy process. Such a system could be combined with a web search database such as Google's to achieve fast results in a variety of situations. But the system's offline capabilities proved even more interesting. Examples like estimating when a bus will arrive by querying its symbol are definitely very useful. If such a system becomes widespread, users will find it very easy to do things that earlier required a lot of coding. A possible concern with this method, as stated by the authors, is that each user can have their own personalization, and this could hamper the search process. One way to correct this would be to set the PC to a “default standard mode” when taking inputs for this system. But would that mode have to be kept for as long as the program is running, or just during initialization? A script that, for example, tracks the bus could run for a long time, and it would not seem logical to keep the user's preferences off their PC for that long. Another concern is the processing that is involved in such a task.
With a relatively large database, an iterative query could take a very long time, and the processing power available for other tasks would be reduced. This would especially be a problem if the task that the program is monitoring cannot run efficiently because of this processing load. Thus, even though there are more factors to consider before a full-scale implementation, such a system has a lot of potential to benefit users, and I foresee it being implemented on a large scale very soon.
yeq1 7:39:57 11/6/2014
Yechen Qiao Review for 11/6 Sikuli: Using GUI Screenshots for Search and Automation. The authors of this paper present a new language and toolkit that allow searching and scripting of macros using objects captured through screenshots. The method leverages recent advancements in computer vision to allow fine-grained similarity matching of the objects. The authors evaluated each feature differently. For searching, the authors conducted a formal user study with users recruited from Craigslist, consisting of an experiment and a follow-up survey. The authors used a t-test (n-1 degrees of freedom) to check the statistical significance of the difference in query formulation time between this technique and keyword queries, and they found the difference to be significant. The evaluation of the macro feature is mainly through scenarios showing how the toolkit may be used. From the examples, it is clear that this toolkit is potentially quite useful. Recall that in the direct manipulation paper, the author noted several disadvantages of direct manipulation interfaces. One of them is that repetitive actions are tedious to complete. This is not the first work that tries to address this problem. Previous automation tools I have used on both Windows and Mac address this problem by providing a scripting language. However, the scripting language is usually too abstract for users and has a significant learning curve. This paper is great in that it tries to combine the flexibility offered by scripting languages with the directness offered by GUIs. The language allows descriptions of objects to be replaced with an object taken directly from a screenshot. This provides the theoretical benefit of reducing the gulf of evaluation when reading such code, and provides more directness in interaction. I would be happy to use a tool like this sometime to automate the task of clicking through Google search. There is still room for improvement, however.
I think that, in its current state, the gulf of execution could be further reduced if users did not have to remember the basic language syntax. This may be achievable by providing syntax structure templates, as well as a better visualization of the script than a string.
Xiyao Yin 8:19:31 11/6/2014
‘Sikuli: Using GUI Screenshots for Search and Automation’ presents a new visual approach, Sikuli, to search and automation of graphical user interfaces using screenshots. The authors contribute Sikuli Search, a system that enables users to search a large collection of online documentation about GUI elements using screenshots, and Sikuli Script, a scripting system that enables programmers to use screenshots of GUI elements to control them programmatically. The paper is divided into two parts: describing and evaluating Sikuli Search, and describing Sikuli Script and presenting several example scripts. The evaluation in this paper is good because it uses numerous graphs to show the effect of screenshots on search and automation, although attaching more data to the paper would be better. There are still two limitations of this approach. One is theme variations: users may apply a personalized appearance, which may pose challenges to a screenshot search engine. The other is visibility constraints: Sikuli Script can’t operate on invisible GUI elements. To deal with the second problem, I think it is better to resort to platform- or application-specific techniques to obtain the full contents of windows and scrolling panes, regardless of their visibility.
changsheng liu 8:44:17 11/6/2014
In Sikuli, the authors introduce an interface which allows users to automate actions based on screenshots. The idea behind this paper is using screenshots for searching. For example, when looking for help about an icon, it can be difficult to search since there are no visible words. The system works by taking the image on the screen and trying to match the selected icon with an icon in the documentation. Once the region of interest is matched, the user can annotate the results. The prototype was implemented using a database of over 100 computer books. To test the prototype, they recruited 12 people from Craigslist. Another way that they use screenshots is automation. There are a few automation tools, yet most of them are difficult to use. The implementation tries to identify small patterns which can be matched. The automation is created using a scripting language which can take advantage of this pattern matching. Some of the scripts that can be implemented include minimizing all windows, deleting documents of multiple types, tracking bus movement, navigating a map, and responding to message boxes automatically. I did not really like the work presented. I feel that many of these tasks could be better solved in other ways: a better UI would make the search feature unnecessary, and the automation can in most cases be replaced with command-line scripting. Another issue is that, in scanning the screen for patterns, there may be security issues with sensitive data being leaked.
Qiao Zhang 9:16:07 11/6/2014
Sikuli: using GUI screenshots for search and automation. This paper presents Sikuli, which allows users to take a screenshot of a GUI element and query a help system using the screenshot instead of the element's name. It also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. Sikuli Search can complete tasks similar to real-world ones, such as "find information about this". It allows users or programmers to make direct visual reference to GUI elements by using a screenshot as a query. It extracts three features: the text surrounding an image in the source document, visual features such as SIFT feature descriptors, and OCR text broken into 3-grams (in order to reduce noise). The user study supports their hypotheses that (1) screenshot queries are faster to specify than keyword queries, and (2) results of screenshot and keyword search have roughly the same relevance as judged by users. Sikuli Script is a visual approach to UI automation using screenshots. It can perform complicated tasks by using visual elements as variables in a scripting language. I am highly impressed by this application, because I have tried a similar tool that captures and replays low-level mouse and keyboard events on a GUI element, but I found it clumsy: it could not detect changes in window position, and I had to repeat my actions manually. The paper describes several examples to illustrate how Sikuli Script can be used, some of which are highly desirable compared to current approaches. Despite the two limitations noted by the authors, this system demonstrates various benefits and capabilities for search and automation tasks.
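To make the remark about 3-grams reducing OCR noise concrete, here is a minimal sketch. It assumes character-level 3-grams and uses Jaccard overlap as the comparison; both the helper names and the choice of Jaccard are illustrative assumptions, not the matching scheme from the paper. The idea is that a single misrecognized character only damages the few 3-grams that contain it, so the rest of the word still matches.

```python
def char_trigrams(text: str) -> set:
    """Character 3-grams of text, lowercased, with whitespace removed."""
    s = "".join(text.lower().split())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two 3-gram sets (illustrative measure only)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# "systen" stands in for an OCR misreading of "system".
clean = char_trigrams("system")   # sys, yst, ste, tem
noisy = char_trigrams("systen")   # sys, yst, ste, ten
shared = clean & noisy            # three of the four 3-grams survive
score = trigram_overlap("system", "systen")
```

An exact word match would treat "system" and "systen" as unrelated, whereas the 3-gram sets still share most of their members.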
Yingjie Tang 9:19:08 11/6/2014
This paper is novel to me and it is inspirational. It introduces Sikuli, which uses GUI screenshots for search and automation. The motivation comes from daily life: when people communicate, we communicate not just in words but also with the help of visual objects. The pattern is that we translate actions on atoms into actions on bits. Thus the authors built Sikuli, which helps users search and complements search settings where screenshots could not previously be used. Picture search is certainly not a novel idea in this paper, but we had not realized that a picture, in the form of a screenshot, can be used as a general input method for search. Moreover, screenshots have the obvious advantage that they can be applied to any application and any platform. Another idea the paper comes up with is automation using screenshots. It uses Python as the scripting language, which is much more convenient when we want to automate something based on the properties of an icon or a dialog box. The icons and other things in a screenshot are hard to articulate in a programming language, or doing so may require a lot of effort. Currently, programming languages can refer to many kinds of elements, such as a directory or a URL. Sikuli Script suggests a new approach: referring to the picture itself rather than to a directory or an object. I realize that computer vision technology has had great success recently, and we can even analyze the properties or features of a picture. I learned a lesson from the discussion part of this article: we should go one step beyond the idea and the implementation itself and generalize to the research area. The paper points out that the screenshot approach has some defects, such as invisible icons and theme variations, which indicate future research directions.
Although the authors themselves point out defects that can be addressed in the future, there are also some disadvantages from my point of view. There are many compatibility issues with scripting tasks on the fly. Still, this application could do a huge swath of things that couldn't otherwise be accomplished with anything less than a full, task-specific software solution. Really, though, this is closer to how computer use should have been all along. It's absurd that we can ever get hung up on a technical problem like clicking 500 stars. The computer is supposed to help by automating such things.