Vision & OCR

adbflow offers visual automation through template matching, color detection, and OCR. These features require optional extras.

Setup

# Template matching and color detection
pip install "adbflow[vision]"

# OCR (text recognition)
pip install "adbflow[ocr]"

# Both
pip install "adbflow[all]"

(The quotes stop shells such as zsh from treating the square brackets as a glob pattern.)

Template Matching

Find images on screen by matching a template image against the current screenshot.

async def template_examples(device):
    vision = device.vision

    # Find a template on screen
    result = await vision.find_on_screen_async("button.png", threshold=0.8)
    if result:
        print(f"Found at {result.center} with confidence {result.confidence}")

    # Find all occurrences
    results = await vision.find_all_on_screen_async("icon.png", threshold=0.7)
    for r in results:
        print(f"Match at {r.center} ({r.confidence:.2f})")

    # Tap a template
    result = await vision.tap_template_async("play_button.png")

    # Wait for template and tap
    result = await vision.wait_and_tap_async(
        "loading_complete.png",
        timeout=15.0,
        threshold=0.8
    )

Threshold

The threshold parameter (0.0–1.0) controls match sensitivity:

  • 0.9+ — very strict, only near-perfect matches
  • 0.8 — good default for most use cases
  • 0.7 — more lenient, may produce false positives
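
Conceptually, thresholding just discards candidate matches whose confidence falls below the cutoff. A minimal sketch of that idea (illustrative only, not adbflow's internals; `filter_matches` is a hypothetical helper):

```python
def filter_matches(candidates, threshold):
    """Keep only candidate matches whose confidence meets the threshold.

    candidates: list of (center, confidence) tuples, as a template
    matcher might produce them.
    """
    return [c for c in candidates if c[1] >= threshold]

candidates = [((10, 20), 0.95), ((40, 50), 0.75), ((70, 80), 0.82)]
# At 0.8 the middle candidate is dropped; lowering to 0.7 would keep all three.
print(filter_matches(candidates, 0.8))
```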

MatchResult

Each match returns a MatchResult with:

  • center — Point at the center of the match
  • bounds — Rect of the matched region
  • confidence — match score (0.0–1.0)
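
To make the shape concrete, here is an illustrative sketch of these types. The field names come from the list above, but the `Point`/`Rect` definitions are assumptions (this sketch assumes `Rect` stores corner coordinates; adbflow's actual classes may differ):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

@dataclass
class Rect:
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self) -> Point:
        # Midpoint of the rectangle, rounded down to whole pixels.
        return Point((self.left + self.right) // 2, (self.top + self.bottom) // 2)

@dataclass
class MatchResult:
    bounds: Rect        # matched region on screen
    confidence: float   # match score, 0.0-1.0

    @property
    def center(self) -> Point:
        return self.bounds.center

m = MatchResult(Rect(100, 200, 300, 400), 0.91)
print(m.center)  # Point(x=200, y=300)
```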

Color Detection

Find pixels of a specific color on screen:

async def color_examples(device):
    # Find all pixels matching a color (RGB)
    points = await device.vision.find_color_async(
        color_rgb=(255, 0, 0),  # red
        tolerance=20
    )
    print(f"Found {len(points)} red pixels")

    # Search within a specific region
    from adbflow.utils.geometry import Rect
    region = Rect(100, 200, 500, 600)
    points = await device.vision.find_color_async(
        color_rgb=(0, 255, 0),
        tolerance=15,
        region=region
    )
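
The docs don't spell out adbflow's exact matching rule, but a common interpretation of `tolerance` is a per-channel bound: a pixel matches if each RGB channel is within `tolerance` of the target. A self-contained sketch of that rule (`color_matches` is a hypothetical helper):

```python
def color_matches(pixel, target, tolerance):
    """True if every RGB channel of pixel is within tolerance of target."""
    return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))

print(color_matches((250, 10, 5), (255, 0, 0), tolerance=20))  # True
print(color_matches((200, 10, 5), (255, 0, 0), tolerance=20))  # False: red channel off by 55
```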

OCR (Text Recognition)

Read text from the screen using EasyOCR.

async def ocr_examples(device):
    ocr = device.ocr

    # Read all text on screen
    results = await ocr.read_screen_async()
    for r in results:
        print(f"'{r.text}' at {r.bounds} ({r.confidence:.2f})")

    # Find specific text
    result = await ocr.find_text_async("Login")
    if result:
        print(f"Found 'Login' at {result.bounds}")

    # Find text containing a substring
    results = await ocr.find_text_contains_async("Error")

    # Tap on recognized text
    result = await ocr.tap_text_async("Submit")

    # Wait for text to appear and tap
    result = await ocr.wait_for_text_async(
        "Welcome",
        timeout=10.0,
        interval=1.0
    )
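
Wait-style helpers like `wait_for_text_async` typically poll at a fixed interval until a deadline passes. A minimal sketch of that pattern (the `poll_until` and `make_check` names are hypothetical, not adbflow APIs):

```python
import asyncio
import time

async def poll_until(check, timeout=10.0, interval=1.0):
    """Await an async predicate repeatedly until it returns a truthy
    result or the timeout elapses; return the result, or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = await check()
        if result:
            return result
        await asyncio.sleep(interval)
    return None

def make_check(succeed_on):
    """Stand-in predicate that succeeds on the Nth attempt."""
    state = {"n": 0}
    async def check():
        state["n"] += 1
        return "found" if state["n"] >= succeed_on else None
    return check

print(asyncio.run(poll_until(make_check(3), timeout=5.0, interval=0.01)))  # found
```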

Multiple Languages

async def multilingual(device):
    # Read with multiple languages
    results = await device.ocr.read_screen_async(languages=("en", "ja"))

OCRResult

Each result has:

  • text — recognized text string
  • bounds — Rect of the text region
  • confidence — recognition confidence (0.0–1.0)

Low-Level API

For direct image operations without ADB:

from adbflow.vision.template import TemplateMatcher

matcher = TemplateMatcher()

# Match against PIL images directly
result = matcher.match(screenshot_image, template_image, threshold=0.8)
all_results = matcher.match_all(screenshot_image, template_image, threshold=0.7)
color_points = matcher.match_color(screenshot_image, (255, 0, 0), tolerance=20)

from adbflow.ocr.engine import OCREngine

engine = OCREngine()
results = engine.recognize(pil_image, languages=("en",))
result = engine.find_text(pil_image, "Login")

Tips

  • Template images should be cropped tightly around the target element.
  • OCR is slower than UI hierarchy queries — prefer Selector-based approaches when possible.
  • EasyOCR downloads language models on first use (~100MB per language).
  • Vision and OCR features raise VisionError / OCRError if the required extras aren't installed.