States
Introduction
In model-based GUI automation, the environment is represented by a state structure, which provides a complete map of the problem space. States are a fundamental component of this structure, representing conceptually cohesive sections of the graphical user interface. This approach moves away from fragile, sequential scripts and toward a more robust, explicit model of the GUI environment.
From the Paper: "The GUI environment, the set of all possible screens during an activity, is represented in the model by the set of GUI elements (E) organized into GUI states (S). The current scene (Ξ), the screen at a specific time, provides the real-time data necessary for the automation system to perceive, interpret, and interact with the GUI."
Since the GUI environment is conceptual and must be abstracted to finite sets, only its proxies (elements, states, and scenes) are included in the model. This abstraction enables formal reasoning about GUI automation while maintaining practical applicability.
Building on the Overall Model
In overall-model.md, we saw that the State Structure Ω = (E, S, T) represents the complete environment map, as part of the overall six-tuple (Ξ, Ω, a, M, τ, §). This document focuses specifically on S (states) and how they relate to elements (E) and the visible GUI (Ξ).
Key Concept: States are subsets of elements. Each state s ∈ S is defined by which GUI elements it contains from the global element set E.
Navigation Analogy: Think of states as cities on a map, elements as landmarks that identify each city, and the current screen (Ξ) as your current view. You know you're in a particular "city" (state) when you can see its characteristic "landmarks" (elements).
The State Structure (Ω)
The formal model of the GUI environment is defined by the state structure Ω = (E, S, T).
Components
E = {e₁, e₂, ..., eₙ}: The set of all GUI elements selected to model the environment
- Examples: images, regions, locations, text patterns
- Finite set of visual features used to identify states
- These are the "landmarks" that define your GUI map
- In Brobot: StateImage, StateRegion, StateLocation, StateString, StateText
S: The set of all GUI states
- Each state s ∈ S is a subset of E (formally: s ⊆ E)
- A state is a collection of related GUI elements
- Multiple states can be active simultaneously: S_Ξ ⊆ S
- States form a subset of the power set of E: S ⊆ P(E)
- In Brobot: Defined using the @State annotation or State.Builder
T: The set of all transitions between states
- Transitions are sequences of actions that change the set of active states
- Covered in detail in transitions.md
- In Brobot: Defined using @Transition annotations
Visible Elements and Active States
The relationship between what's on screen and which states are active involves two key concepts:
E_Ξ = f(Ξ) ⊆ E: The set of visible GUI elements
- Ξ is the visible GUI (the current pixel output of the screen)
- f(Ξ) is the element extraction function that identifies which elements from E are currently visible
- E_Ξ is the resulting set of visible elements
The Element Extraction Function f(Ξ): In Brobot, this function is implemented through pattern matching and computer vision:
f: Screen → P(E)
f(Ξ) = {e ∈ E | pattern_match(e, Ξ) > threshold}
where pattern_match(e, Ξ) uses image recognition algorithms (OpenCV, SikuliX) to determine whether element e is visible in the current screen Ξ with sufficient confidence (typically 85-95% similarity).
Implementation: BrobotScreenCapture.java captures Ξ, then PatternFinder.java applies f(Ξ) to extract E_Ξ.
S_Ξ ⊆ S: The set of currently active states
- Maintained by the State Management System (M)
- Updated dynamically as elements are found and as transitions activate or deactivate states
- Can contain multiple states simultaneously
- Implementation: StateMemory.java (line 74: private Set<Long> activeStates)
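The extraction rule f(Ξ) = {e ∈ E | pattern_match(e, Ξ) > threshold} can be modeled in a few lines of plain Java. This is a minimal sketch, not Brobot's PatternFinder: it assumes the similarity score of each element against the current screen has already been computed (in Brobot that comes from the OpenCV/SikuliX matching step, which is not reproduced here). The class ElementExtraction, the score map, and the example threshold are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical model of the element extraction function:
 *   f(screen) = { e in E | score(e, screen) > threshold }
 * The scores stand in for pattern-matching results; no image processing happens here.
 */
final class ElementExtraction {

    private ElementExtraction() {
    }

    /**
     * @param elements  the global element set E
     * @param scores    similarity of each element against the current screen, in [0, 1];
     *                  an element without a score is treated as not found
     * @param threshold an element counts as visible only if its score is strictly greater
     * @return the visible elements, always a subset of E
     */
    static Set<String> extract(Set<String> elements, Map<String, Double> scores, double threshold) {
        return elements.stream()
                .filter(e -> scores.getOrDefault(e, 0.0) > threshold)
                .collect(Collectors.toSet());
    }
}
```

With scores of 0.97 for logo and 0.40 for loginForm and a threshold of 0.9, only logo ends up in E_Ξ. Because the filter runs over E itself, the result is a subset of E by construction, matching E_Ξ = f(Ξ) ⊆ E.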
State Activation: The Formal Condition
The most important formal definition for understanding states is the activation condition:
Formal Definition
A state s is active (s ∈ S_Ξ) if and only if at least one of its elements is visible in the GUI:
s ∈ S_Ξ ⟺ s ∩ E_Ξ ≠ ∅
where:
- s ∈ S_Ξ means state s is active
- s ∩ E_Ξ is the intersection of the state's elements with visible elements
- ≠ ∅ means "is not empty" (at least one element is visible)
Equivalently, the set of active states can be expressed as:
S_Ξ = {s ∈ S | s ∩ E_Ξ ≠ ∅}
This reads: "S_Ξ is the set of all states s in S such that s has at least one visible element."
Intuitive Interpretation: If you can see any of a state's defining elements on your screen, that state is currently active. It's like knowing you're in a city because you can see at least one of its famous landmarks.
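The activation condition translates directly into set operations. The sketch below is a hypothetical, framework-free model: states are represented as a map from state name to element names (not Brobot's State class), and the class name ActivationCondition is illustrative. It uses Collections.disjoint to decide whether s ∩ E_Ξ is empty.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, framework-free model of the activation condition.
 * A state is represented by the set of its element names.
 */
final class ActivationCondition {

    private ActivationCondition() {
    }

    /** A state is active iff it shares at least one element with the visible set. */
    static boolean isActive(Set<String> stateElements, Set<String> visibleElements) {
        // Collections.disjoint returns true when the sets have no element in common,
        // so "not disjoint" means the intersection is non-empty.
        return !Collections.disjoint(stateElements, visibleElements);
    }

    /** Returns the names of all active states, sorted alphabetically. */
    static Set<String> activeStates(Map<String, Set<String>> states, Set<String> visibleElements) {
        Set<String> active = new TreeSet<>();
        states.forEach((name, elements) -> {
            if (isActive(elements, visibleElements)) {
                active.add(name);
            }
        });
        return active;
    }
}
```

For the screen in the following diagram, where logo, menu, errorIcon and closeButton are visible, activeStates returns ErrorState and HomeState, while LoginState stays out because loginForm is not visible. An element such as closeButton that belongs to no state simply contributes nothing.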
Visual Representation
Screen (Ξ)                State Structure (Ω)           Active States (S_Ξ)
┌──────────────────┐
│ [Logo]           │ ───→ HomeState = {logo, menu}      HomeState
│ [Menu Button]    │      (logo ∈ E_Ξ)
│                  │
│ [⚠ Error Icon]   │ ───→ ErrorState = {errorIcon}      ErrorState
│ [Close Button]   │      (errorIcon ∈ E_Ξ)
└──────────────────┘
                          LoginState = {loginForm}      (inactive)
                          (loginForm ∉ E_Ξ)
Result: S_Ξ = {HomeState, ErrorState}
Mathematical verification:
HomeState ∩ E_Ξ = {logo, menu} ≠ ∅ ⟹ HomeState ∈ S_Ξ ✓
ErrorState ∩ E_Ξ = {errorIcon} ≠ ∅ ⟹ ErrorState ∈ S_Ξ ✓
LoginState ∩ E_Ξ = ∅ ⟹ LoginState ∉ S_Ξ ✓
Practical Example: State Activation Flow
Scenario: User navigates from Login to Dashboard, then a popup appears.
1. Initial State: LoginPage
   - Screen shows: Login form with username/password fields
   - E_Ξ = {loginButton, usernameField, passwordField}
   - LoginState ∩ E_Ξ = {loginButton, usernameField, passwordField} ≠ ∅
   - S_Ξ = {LoginState} ✓
2. After Login Transition: Dashboard appears
   - Screen shows: Dashboard with logo, menu, user profile
   - E_Ξ = {dashboardLogo, menuBar, userProfile}
   - DashboardState ∩ E_Ξ = {dashboardLogo, menuBar, userProfile} ≠ ∅
   - LoginState ∩ E_Ξ = ∅ (login elements no longer visible; in Brobot, the login transition is what marks LoginState inactive)
   - S_Ξ = {DashboardState} ✓
3. Popup Appears: Error dialog overlays dashboard
   - Screen shows: Error popup over partially visible dashboard
   - E_Ξ = {dashboardLogo, errorIcon, closeButton} (dashboard partially visible)
   - DashboardState ∩ E_Ξ = {dashboardLogo} ≠ ∅ (still active!)
   - ErrorPopupState ∩ E_Ξ = {errorIcon, closeButton} ≠ ∅ (now active!)
   - S_Ξ = {DashboardState, ErrorPopupState} ✓ (both active simultaneously)
4. Close Popup: Transition deactivates popup
   - Screen shows: Full dashboard (popup closed)
   - Transition explicitly marks ErrorPopupState as inactive
   - E_Ξ = {dashboardLogo, menuBar, userProfile}
   - S_Ξ = {DashboardState} ✓
Key Insight: The activation condition s ∈ S_Ξ ⟺ s ∩ E_Ξ ≠ ∅ automatically handles step 3, where two states are active. The framework doesn't need special logic for overlays: the mathematical model naturally supports compositional GUIs.
State Activation and Deactivation Mechanisms
Activation: Element Found → State Active
When an observation action (like action.find()) successfully locates an element, the element's parent state is marked as active.
Formal: ∀ e ∈ E_found : (s_e, True) ∈ S_a
This means: "For every element found, add its state to the state information with value True."
Example:
// Action finds login button on screen
ActionResult result = action.find(loginButton);
if (result.isSuccess()) {
    // loginButton ∈ E_Ξ (element is visible)
    // loginButton ∈ LoginState ∩ E_Ξ, so LoginState ∩ E_Ξ ≠ ∅ (state has visible elements)
    // LoginState ∈ S_Ξ (state becomes active)
}
Implementation Details (StateMemory.java):
// Line 188-203: addActiveState method
public void addActiveState(Long activeState) {
    if (!activeStates.contains(activeState)) {
        activeStates.add(activeState);
        // State becomes part of S_Ξ
    }
}
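The rule ∀ e ∈ E_found : (s_e, True) ∈ S_a can be modeled without any Brobot types. In this hypothetical sketch, a map from element name to owning state plays the role of s_e, and state information is a Map from state name to Boolean; both representations and the class name ActionStateInfo are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of how a successful observation produces state information S_a.
 * ownerState maps each element name to the name of the state it belongs to.
 */
final class ActionStateInfo {

    private ActionStateInfo() {
    }

    static Map<String, Boolean> fromFoundElements(Set<String> foundElements, Map<String, String> ownerState) {
        Map<String, Boolean> stateInfo = new HashMap<>();
        for (String element : foundElements) {
            String state = ownerState.get(element);
            if (state != null) {
                // Every found element contributes (owner state, true)
                stateInfo.put(state, Boolean.TRUE);
            }
        }
        // Elements that were searched for but not found are ignored on purpose:
        // S_a never contains a (state, false) pair.
        return stateInfo;
    }
}
```

Finding loginButton and usernameField yields the single pair (LoginState, True); an element that was searched for but not found, such as dashboardLogo, adds nothing, which anticipates the deactivation rule described next.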
Deactivation: Implementation Choice in Brobot
Important: In Brobot's implementation, states are not automatically deactivated when elements are not found. This is a deliberate design choice, not a theoretical requirement.
From the Paper: "A state is considered active if at least one of its elements is found. It is not marked as inactive based on missing elements... Individual images can reliably signal a state's existence but not its absence; therefore, Brobot relies on transitions to deactivate states."
Rationale: Individual images can reliably signal a state's existence but not its absence. A failed search might mean:
- The element moved to a different location
- Timing issue (element not loaded yet)
- Search region was too narrow
- Element genuinely disappeared
Brobot's Solution: States are deactivated explicitly through transitions, which provides more reliable state management:
Formal: ∀ e ∈ E_a \ E_found : (s_e, False) ∉ S_a
This means: "For elements not found, do NOT add their states to S_a with value False." Only transitions can deactivate.
State Management Function (from Paper Section 6.2):
f_M(S_Ξ, S_a, S_t) = (S_Ξ ∪ {s ∈ S | (s, True) ∈ S_a ∪ S_t}) \ {s ∈ S | (s, False) ∈ S_t}
where:
- States are added to S_Ξ when marked active in either S_a or S_t
- States are only removed from S_Ξ when explicitly marked inactive in S_t (transition-based updates)
- No state is removed based solely on action results (S_a never contains pairs with False)
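The state management function can be sketched the same way. This is a minimal model of f_M under stated assumptions, not Brobot's StateMemoryUpdater: S_a and S_t are represented as a Map from state name to Boolean, and the class name StateManagement is hypothetical. The union is applied before the difference, as in the formula, so a False from a transition always wins.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the state management function:
 *   f_M(active, S_a, S_t) = (active U {s | (s,true) in S_a or S_t}) \ {s | (s,false) in S_t}
 * S_a and S_t are represented as Map<stateName, Boolean>.
 */
final class StateManagement {

    private StateManagement() {
    }

    static Set<String> update(Set<String> active,
                              Map<String, Boolean> fromActions,
                              Map<String, Boolean> fromTransitions) {
        Set<String> next = new HashSet<>(active);
        // Union: states marked true by an action or a transition become active
        fromActions.forEach((state, isActive) -> {
            if (isActive) {
                next.add(state);
            }
        });
        fromTransitions.forEach((state, isActive) -> {
            if (isActive) {
                next.add(state);
            }
        });
        // Difference: only transitions deactivate; false entries in S_a are ignored
        fromTransitions.forEach((state, isActive) -> {
            if (!isActive) {
                next.remove(state);
            }
        });
        return next;
    }
}
```

Applying the login transition's S_t = {(LoginState, False), (DashboardState, True)} to S_Ξ = {LoginState} yields {DashboardState}. A later failed search cannot remove DashboardState, because only the False entries of S_t are subtracted.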
Example:
// Transition from Login to Dashboard
@Transition(fromState = LoginState.class, toState = DashboardState.class)
public boolean login() {
    action.type(usernameField, "user");
    action.type(passwordField, "pass");
    boolean success = action.click(loginButton).isSuccess();
    // If successful, transition explicitly:
    // - Deactivates LoginState: (LoginState, False) ∈ S_t
    // - Activates DashboardState: (DashboardState, True) ∈ S_t
    return success;
}
Theoretical Generalization: The theoretical model doesn't mandate this approach. Alternative implementations could use probabilistic deactivation, timeout-based deactivation, or explicit confirmation searches. Brobot chooses transition-based deactivation for reliability.
Formal State Properties
Given the state structure Ω = (E, S, T), the following formal properties hold in the model. (Because Brobot deactivates states only through transitions, the active set it stores can temporarily include states whose elements are no longer visible; properties 4 and 7 describe the ideal S_Ξ.)
1. State-Element Relationship: Each state is a subset of elements
   - ∀s ∈ S: s ⊆ E
   - Consequence: A state cannot contain elements not in E
   - Example: If E = {logo, button, field}, then s = {logo, button} is valid, but s = {logo, unknownElement} is invalid
2. Power Set Constraint: States form a subset of the power set of E
   - S ⊆ P(E), where P(E) is the power set of E
   - This means: not every possible subset of E needs to be a state
   - Example: With E = {e₁, e₂, e₃}, there are 2³ = 8 possible subsets, but you might define only 3 of them as states
3. Simultaneous Active States: Multiple states may be active at once
   - S_Ξ ⊆ S, where |S_Ξ| ≥ 0
   - This distinguishes model-based automation from traditional finite state machines
   - Enables compositional GUI representation (base + overlays)
4. Activation Condition: State activation depends on element visibility
   - s ∈ S_Ξ ⟺ s ∩ E_Ξ ≠ ∅
   - This is the fundamental theorem of state activation
   - Bidirectional implication: necessary AND sufficient condition
5. Finiteness: Both element and state sets are finite
   - |E| < ∞ and |S| < ∞
   - Practical necessity: infinite sets cannot be implemented
   - Enables algorithmic path finding and state management
6. Active States Subset: Active states are always a subset of all states
   - S_Ξ ⊆ S at all times
   - Invariant: S_Ξ can never contain states not in S
   - Maintained by the State Management System (M)
7. Empty Intersection Property: Inactive states have no visible elements
   - s ∉ S_Ξ ⟹ s ∩ E_Ξ = ∅
   - Contrapositive of the "if" direction of the activation condition
   - If a state is not active, none of its elements are visible
8. Non-Empty States: States must contain at least one element
   - ∀s ∈ S: s ≠ ∅
   - Practical constraint: empty states cannot be activated
   - In Brobot: State.Builder requires at least one element
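Properties 1, 6, and 8 can be checked mechanically on a concrete structure, which catches modeling slips such as a misspelled element name. The checker below is a hypothetical sketch over plain sets and maps; the class name StructureChecks and the Map-based representation of S are assumptions, not Brobot's State classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical checker for three of the formal properties:
 *   (1) every state is a subset of E
 *   (6) every active state is a member of S
 *   (8) no state is empty
 * States are modeled as Map<stateName, Set<elementName>>.
 */
final class StructureChecks {

    private StructureChecks() {
    }

    /** Returns a list of violations; an empty list means all three properties hold. */
    static List<String> violations(Set<String> elements,
                                   Map<String, Set<String>> states,
                                   Set<String> activeStates) {
        List<String> problems = new ArrayList<>();
        states.forEach((name, stateElements) -> {
            if (stateElements.isEmpty()) {
                problems.add(name + " is empty"); // property 8
            }
            for (String element : stateElements) {
                if (!elements.contains(element)) {
                    problems.add(name + " contains " + element + ", which is not in E"); // property 1
                }
            }
        });
        for (String active : activeStates) {
            if (!states.containsKey(active)) {
                problems.add("active state " + active + " is not in S"); // property 6
            }
        }
        return problems;
    }
}
```

Using the example from property 1, a state {logo, unknownElement} over E = {logo, button, field} is reported, as is an empty state or an active state name that is not defined in S.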
States in Brobot
State Definition
From the Paper: "A state in model-based GUI automation is a collection of related GUI elements. State objects often are grouped spatially or appear at the same time. Objects used together in a process are likely candidates for belonging to the same state. However, these configurations are not absolute rules, and the definition of a state is subjective. A state has a meaning within the automated environment that can vary depending on the automation goals, and a state configuration should make sense in the context of the automation application."
In Brobot, a state is a collection of related GUI elements. These elements are often grouped spatially, appear together on screen, or are used together in a process. However, these groupings are not absolute rules: the definition of a state is subjective and should make sense for your specific automation task.
A state has meaning within the automated environment that can vary depending on automation goals. State configurations should be context-driven and aligned with your application's purpose.
Code Example: Defining a State
Updated Example Using Current Brobot API:
import io.github.jspinak.brobot.model.state.State;
import io.github.jspinak.brobot.model.state.StateImage;
import io.github.jspinak.brobot.model.action.ActionRecord;
// Define the element that identifies this state
private StateImage toWorldButton = new StateImage.Builder()
.addPatterns("toWorldButton") // Pattern file to match
.setFixedForAllPatterns(true) // Element doesn't move on screen
.withActionHistory(new ActionRecord(220, 600, 20, 20)) // Expected location
.build();
// Define the state as a collection of elements
// In this case, Home state is defined by a single button
private State homeState = new State.Builder(Name.HOME)
.withImages(toWorldButton) // Element(s) that define this state
.build();
Where Name is a user-defined enum (it must include every constant the examples use, including LOGIN):
public enum Name implements StateEnum {
    HOME, WORLD, ISLAND, SETTINGS, LOGIN
}
Alternative: Multiple Elements Per State:
private StateImage logo = new StateImage.Builder()
.addPatterns("login-logo")
.build();
private StateImage usernameField = new StateImage.Builder()
.addPatterns("username-field")
.build();
private StateImage loginButton = new StateImage.Builder()
.addPatterns("login-button")
.build();
// State defined by multiple elements
private State loginState = new State.Builder(Name.LOGIN)
.withImages(logo, usernameField, loginButton) // s = {logo, usernameField, loginButton} ⊆ E
.build();
Formal Interpretation of the Code
The Java code above defines a state in Brobot. In terms of our formal model:
- E_home = {toWorldButton}: The element set for the Home state
- s_home = E_home ⊆ E: The HOME state is this subset of the global element set E
- Activation: s_home ∈ S_Ξ ⟺ toWorldButton ∈ E_Ξ
When Brobot's visual search finds the toWorldButton pattern on screen (the ActionRecord above gives its expected location as (220, 600) with size 20×20), the element becomes part of E_Ξ, and HOME therefore becomes part of S_Ξ through the activation condition.
Data Flow:
1. Screen capture → Ξ (pixel output)
2. Pattern matching → f(Ξ) extracts visible elements → E_Ξ
3. Element found → toWorldButton ∈ E_Ξ
4. Activation condition → s_home ∩ E_Ξ = {toWorldButton} ≠ ∅
5. State becomes active → s_home ∈ S_Ξ
Practical Example from the Paper: The DoT App
From the Paper Section 5.3.1: "To make the abstract components of the Overall Model (Ξ, Ω, a, M, τ, §) more concrete, I use the DoT app, an experimental application designed to automate tasks in the mobile game Dawn of Titans. This app will serve as a running example to illustrate how the theoretical models are realized in practice."
The DoT (Dawn of Titans) application demonstrates state definition in a real-world automation context:
States Defined:
- Home State: Identified by the toWorldButton element
- World State: Identified by elements in the world map view
- Island State: Identified by island-specific UI elements
State Structure Example:
Ω_DoT = (E_DoT, S_DoT, T_DoT) where:
E_DoT = {toWorldButton, worldMapRegion, islandNameRegion, ...}
S_DoT = {
  Home = {toWorldButton},
  World = {worldMapRegion, ...},
  Island = {islandNameRegion, ...}
}
T_DoT = {HomeToWorld, WorldToIsland, ...}
Key Insight: The DoT app uses a relatively simple state structure (3 main states) for its automation task. The paper notes that a more granular approach (e.g., adding a NewIsland state) might have improved the automation for specific use cases. This demonstrates that state granularity should match automation needs, not GUI complexity.
Connection to State Management (M)
The State Management System (M) is responsible for maintaining S_Ξ, the set of active states. It continuously:
- Processes action results to identify newly visible elements (E_Ξ)
- Applies the activation condition s ∈ S_Ξ ⟺ s ∩ E_Ξ ≠ ∅ to determine which states should be active
- Processes explicit state changes from transitions (S_t)
- Updates S_Ξ using the state management function f_M
Formal Definition (from overall-model.md):
M = (S_Ξ) where S_Ξ ⊆ S is the current set of active states
f_M: (S_Ξ, S_a, S_t) → S'_Ξ
where:
- S_a: state information from actions (which states have visible elements)
- S_t: state information from transitions (which states to activate/deactivate)
- S'_Ξ: the updated set of active states
Implementation Architecture:
Action Execution → ActionResult
        ↓
State Information Extraction → S_a
        ↓
State Management (M) ← S_t (from transitions)
        ↓
Updated Active States → S'_Ξ
Source Files:
- StateMemory.java (lines 74-96): Maintains S_Ξ as Set<Long> activeStates
- StateMemoryUpdater.java: Implements f_M update logic
- StateDetector.java: Extracts S_a from action results
For complete details on the State Management System, see overall-model.md.
Best Practices for State Design
Cohesiveness
Group elements into a state that are logically related, appear together on screen, or are used together in a process.
Example:
// Good: LoginState contains all login-related elements
LoginState = {usernameField, passwordField, loginButton, forgotPasswordLink}
// Poor: Mixing unrelated elements from different screens
MixedState = {usernameField, dashboardLogo, settingsButton}
Cohesion Test: Ask "If I see element X, should I expect to see element Y?" If yes, they likely belong in the same state.
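One way to make the cohesion test concrete, assuming you have logged the visible element sets (E_Ξ) from past runs, is to estimate how often Y is visible when X is. The helper below is hypothetical and not part of Brobot; a value near 1.0 suggests X and Y belong in the same state, a value near 0.0 suggests they do not.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper that quantifies the cohesion test "if I see X, should I expect to see Y?"
 * over a log of observed visible-element sets.
 */
final class CohesionCheck {

    private CohesionCheck() {
    }

    /** Returns the fraction of observations containing x that also contain y, or NaN if x was never seen. */
    static double conditionalCoOccurrence(List<Set<String>> observations, String x, String y) {
        long withX = observations.stream().filter(obs -> obs.contains(x)).count();
        if (withX == 0) {
            return Double.NaN;
        }
        long withBoth = observations.stream().filter(obs -> obs.contains(x) && obs.contains(y)).count();
        return (double) withBoth / withX;
    }
}
```

On a log of two login screens and one dashboard screen, usernameField and loginButton score 1.0, while usernameField and dashboardLogo score 0.0, mirroring the good and poor groupings above.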
Context-Driven
The definition of a state is subjective and should be tailored to your automation goals. The same GUI might be modeled differently depending on what you're trying to automate.
Example: For a spreadsheet application:
- Document editing automation: Might define states by worksheet tabs
  - States: Sheet1State, Sheet2State, Sheet3State
- UI testing automation: Might define states by dialog windows and ribbons
  - States: MainEditorState, FormatDialogState, ChartDialogState
- Data extraction automation: Might define states by visible cell ranges
  - States: HeaderVisibleState, DataRangeVisibleState
Key Principle: States should align with your automation's decision points, not necessarily the application's internal architecture.
Modularity
The state-based approach allows states and transitions to be built, tested, and debugged independently. This modular design helps manage complexity and localize troubleshooting efforts.
Benefit: If LoginState has issues, you can:
- Test it in isolation with mock elements
- Modify its elements without affecting other states
- Update its transitions independently
- Verify activation condition in unit tests
Testing Example (schematic; Element and hasVisibleElements stand in for your own element type and helper):
@Test
public void testLoginStateActivation() {
    // Mock E_Ξ with login elements visible
    Set<Element> visibleElements = Set.of(loginButton, usernameField);
    // Verify activation condition
    assertTrue(loginState.hasVisibleElements(visibleElements));
    // Equivalent to: loginState ∩ E_Ξ ≠ ∅
}
Granularity Balance
Finding the right number of states is key: neither too fine-grained nor too coarse.
Too Fine-Grained ✗:
ButtonVisibleState, ButtonHoverState, ButtonClickedState, ButtonDisabledState
// Problem: Explosion of states, overly complex navigation
// Result: Hundreds of states for simple GUI
Too Coarse-Grained ✗:
EntireApplicationState
// Problem: Can't detect intermediate steps, poor error recovery
// Result: Can't navigate or handle errors effectively
Balanced ✓:
LoginState, DashboardState, SettingsState, ReportState
// Right level: Represents meaningful application sections
// Result: Manageable state count, effective navigation
Granularity Guidelines:
- Start coarse: Begin with main application sections
- Refine as needed: Add states when you need to distinguish scenarios
- Test navigation: If path finding struggles, you may need more states
- Monitor complexity: If you have >50 states, consider whether some can be merged
Paper Example: The DoT app uses a simple design with HOME, WORLD, and ISLAND states. The paper notes that a more granular NewIsland state might have been better for the specific automation taskβdemonstrating that granularity should match your automation needs.
Multiple Active States: A Key Architectural Choice
One of the most important distinctions of model-based GUI automation is that multiple states can be active simultaneously. This sets it apart from traditional finite state machines where only one state can be active at a time.
Why Multiple Active States?
Real GUIs are compositional:
- Base application window (DashboardState)
- Overlay dialogs (ErrorPopupState)
- Floating toolbars (ToolboxState)
- Background processes (LoadingState)
All can be visible and active at the same time.
Real-World Analogy: Think of your computer desktop. You might have:
- Main application window (like a city you're in)
- Notification popups (like street signs you see)
- System tray icons (like landmarks always visible)
- Background processes (like weather you experience)
You're experiencing all of these simultaneously, and model-based GUI automation reflects this reality.
Formal Definition
S_Ξ ⊆ S where |S_Ξ| ≥ 0
This allows S_Ξ to contain:
- Zero states: S_Ξ = ∅ (no states active; rare, usually at startup)
- One state: S_Ξ = {DashboardState}
- Multiple states: S_Ξ = {DashboardState, ErrorPopupState, LoadingState}
Cardinality Bounds:
- Minimum: |S_Ξ| = 0 (empty set)
- Maximum: |S_Ξ| = |S| (all states active; theoretically possible but rare)
- Typical: 1 ≤ |S_Ξ| ≤ 5 (one base state plus a few overlays)
Practical Impact
When finding transitions during path traversal, the framework looks for transitions available from any of the currently active states:
// With S_Ξ = {DashboardState, ErrorPopupState}
// Framework can execute:
// - Transitions from DashboardState (navigate to Settings)
// - Transitions from ErrorPopupState (close error dialog)
// - Transitions from BOTH states (e.g., logout available from anywhere)
Mathematical Formulation:
Available transitions: T_available = ⋃_{s ∈ S_Ξ} T_s
where T_s = {t ∈ T | (s, t) ∈ δ}
This compositional approach enables robust navigation in complex GUIs because:
- Error Recovery: Unexpected popups don't break navigationβthey just become additional active states
- Flexible Paths: More starting points (S_Ξ) means more possible paths
- Natural Modeling: Matches how real GUIs actually work
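The union over active states is a short loop. In this hypothetical sketch, δ is stored grouped by state (each state name maps to the names of the transitions accessible from it, i.e. its T_s), which carries the same information as the pair set δ ⊆ S × T; the class name AvailableTransitions and the transition names are illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model of T_available = union over active states s of T_s.
 * The transition relation delta is stored grouped by state: each state name maps to
 * the names of the transitions accessible from it.
 */
final class AvailableTransitions {

    private AvailableTransitions() {
    }

    static Set<String> available(Map<String, Set<String>> transitionsByState, Set<String> activeStates) {
        Set<String> result = new TreeSet<>();
        for (String state : activeStates) {
            // States with no outgoing transitions contribute the empty set
            result.addAll(transitionsByState.getOrDefault(state, Set.of()));
        }
        return result;
    }
}
```

With S_Ξ = {DashboardState, ErrorPopupState}, the result is {closeError, logout, toSettings}: logout is reachable from both states but appears once, because T_available is a set, and SettingsState contributes nothing because it is not active.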
Comparison with Traditional FSM
Traditional Finite State Machine:
Current State: DashboardState (single state)
Popup Appears → Must transition to PopupState
Dashboard No Longer Active → Lost context
Model-Based GUI Automation:
Current States: {DashboardState} (can be multiple)
Popup Appears → Add PopupState to active set
Current States: {DashboardState, PopupState} (both active)
Dashboard Context Preserved → Can resume after closing popup
Common Mistakes When Defining States
✗ Mistake 1: Defining States by Actions Instead of Elements
Wrong:
ClickingButtonState, TypingTextState, WaitingState
Right:
LoginPageState (identified by login elements visible)
DashboardState (identified by dashboard elements visible)
States represent what's visible, not what you're doing.
Why this matters: Actions are temporary operations; states are persistent GUI configurations. If you define states by actions, you'll create states that can never be reliably detected because "clicking" isn't a visual property.
Correct Thinking: "What elements are on screen?" not "What am I doing?"
✗ Mistake 2: One-to-One Mapping with Screens
Don't assume each state = each screen. Some screens may need multiple states, and some states may span multiple screens.
Example: A settings dialog with tabs might be:
- One state: SettingsState (simple approach)
  - Works if: All tabs share common elements and you don't need to distinguish them
- Multiple states: GeneralSettingsState, PrivacySettingsState, AdvancedSettingsState
  - Works if: Each tab has distinct elements and your automation needs to know which tab is active
Decision Criterion: Do your automation tasks require distinguishing these configurations? If yes, create separate states. If no, use one state.
Example Decision Tree:
Does your automation need to:
- Navigate specifically to the Privacy tab? → Multiple states
- Just open Settings (any tab is fine)? → Single state
- Perform different actions per tab? → Multiple states
- Perform same actions regardless of tab? → Single state
✗ Mistake 3: Ignoring Overlays and Popups
Wrong: Only defining main application states
Right: Include states for:
- Error dialogs (ErrorDialogState)
- Loading screens (LoadingState)
- Tooltips (if relevant to automation) (TooltipState)
- Notification banners (NotificationState)
- Popup menus (ContextMenuState)
- Modal dialogs (ConfirmDialogState)
Why this matters: Popups can appear unexpectedly and block automation. If they're not modeled as states, the framework can't:
- Detect their presence
- Find transitions to close them
- Resume automation after handling them
Remember: Multiple states can be active, so popups are additional active states, not replacements. When a popup appears over the dashboard, both DashboardState and PopupState are active.
Real-World Impact: A common automation failure occurs when an unexpected error dialog appears. With proper state modeling, the framework detects the dialog state, finds the "close dialog" transition, executes it, and resumes the intended path. Without it, the automation fails.
Related Documentation
Theoretical Foundations
- Introduction - Conceptual overview of model-based approach with practical examples
- Academic Foundation - Research background, empirical evidence, and citations
- Overall Model - Complete formal model (Ξ, Ξ©, a, M, Ο, Β§) with State Structure definition
- Transitions - How transitions connect states and manage state changes (T component)
- Testing the Automation - Testing states independently using mock mode
Practical Implementation
- Getting Started - Hands-on tutorials for creating states in Brobot applications
- AI Brobot Project Creation - Complete API reference for State and StateImage classes
Source Code References
- State.java (library/src/main/java/io/github/jspinak/brobot/model/state/State.java) - State model implementation
- StateMemory.java (library/src/main/java/io/github/jspinak/brobot/statemanagement/StateMemory.java) - S_Ξ maintenance
- StateImage.java (library/src/main/java/io/github/jspinak/brobot/model/state/StateImage.java) - Element definition
Mathematical Notation Summary
For quick reference, key symbols used in this document:
| Symbol | Definition | Description |
|---|---|---|
| Ω | State Structure | Tuple (E, S, T) |
| E | Element Set | {e₁, e₂, ..., eₙ}, all GUI elements |
| S | State Set | All possible GUI states |
| T | Transition Set | All transitions between states |
| Ξ | Visible GUI | Current screen pixel output |
| f(Ξ) | Element Extraction | Function extracting visible elements from screen |
| E_Ξ | Visible Elements | f(Ξ) ⊆ E, elements currently on screen |
| S_Ξ | Active States | {s ∈ S : s ∩ E_Ξ ≠ ∅}, currently active states |
| s ⊆ E | State Definition | Each state is a subset of elements |
| s ∩ E_Ξ ≠ ∅ | Activation Condition | State is active iff it has visible elements |
| M | State Management | System that maintains S_Ξ |
| P(E) | Power Set | Set of all subsets of E |
| S_a | Action State Info | State information derived from actions |
| S_t | Transition State Info | State information from transitions |
| δ | Transition Relation | δ ⊆ S × T, which transitions are accessible from which states |
Key Theorems and Properties
- Activation Theorem: s ∈ S_Ξ ⟺ s ∩ E_Ξ ≠ ∅
- Active States Set: S_Ξ = {s ∈ S | s ∩ E_Ξ ≠ ∅}
- State-Element Relationship: ∀s ∈ S: s ⊆ E
- Power Set Constraint: S ⊆ P(E)
- State Management: f_M: (S_Ξ, S_a, S_t) → S'_Ξ
- Finiteness: |E| < ∞ ∧ |S| < ∞
These symbols provide precise mathematical definitions. For conceptual understanding, focus on the core idea: states are collections of elements, and a state is active when at least one of its elements is visible on screen.
Further Reading
For those interested in the theoretical foundations and empirical evidence:
- Paper Section 5.3: Formal definition of State Structure and components
- Paper Section 6.2: Brobot's implementation of state management
- Paper Section 5.3.1: Practical example using the DoT application
- Figure 6 (Paper): Visual example of three simultaneous states in a GUI
For academic citations and empirical evidence of the problems model-based automation solves, see academic-foundation.md.