Vision & OCR¶
adbflow offers visual automation through template matching, color detection, and OCR. These features require optional extras.
Setup¶
# Template matching and color detection
pip install "adbflow[vision]"
# OCR (text recognition)
pip install "adbflow[ocr]"
# Both
pip install "adbflow[all]"
Template Matching¶
Find images on screen by matching a template image against the current screenshot.
async def template_examples(device):
    vision = device.vision

    # Find a template on screen
    result = await vision.find_on_screen_async("button.png", threshold=0.8)
    if result:
        print(f"Found at {result.center} with confidence {result.confidence}")

    # Find all occurrences
    results = await vision.find_all_on_screen_async("icon.png", threshold=0.7)
    for r in results:
        print(f"Match at {r.center} ({r.confidence:.2f})")

    # Tap a template
    result = await vision.tap_template_async("play_button.png")

    # Wait for template and tap
    result = await vision.wait_and_tap_async(
        "loading_complete.png",
        timeout=15.0,
        threshold=0.8,
    )
Threshold¶
The threshold parameter (0.0–1.0) controls match sensitivity:
- 0.9+ — very strict, only near-perfect matches
- 0.8 — good default for most use cases
- 0.7 — more lenient, may produce false positives
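To make the effect of the threshold concrete, the sketch below filters a handful of candidate scores at the three levels listed above. The candidate data and the `filter_matches` helper are hypothetical and are not part of adbflow; the sketch only assumes that a match is kept when its confidence is at least the threshold.

```python
# Hypothetical candidates: (center_x, center_y, confidence). Illustrative data only.
CANDIDATES = [
    (120, 340, 0.97),
    (480, 340, 0.84),
    (300, 900, 0.72),
    (50, 60, 0.55),
]


def filter_matches(candidates, threshold):
    """Keep candidates whose confidence is at least `threshold` (assumed semantics)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [c for c in candidates if c[2] >= threshold]


for threshold in (0.9, 0.8, 0.7):
    kept = filter_matches(CANDIDATES, threshold)
    print(f"threshold={threshold}: {len(kept)} match(es)")
```

Lowering the threshold never removes a match, it only admits weaker ones, which is why 0.7 is the level where false positives start to appear.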
MatchResult¶
Each match returns a MatchResult with:
- center: Point at the center of the match
- bounds: Rect of the matched region
- confidence: match score (0.0–1.0)
Color Detection¶
Find pixels of a specific color on screen:
from adbflow.utils.geometry import Rect


async def color_examples(device):
    # Find all pixels matching a color (RGB)
    points = await device.vision.find_color_async(
        color_rgb=(255, 0, 0),  # red
        tolerance=20,
    )
    print(f"Found {len(points)} red pixels")

    # Search within a specific region
    region = Rect(100, 200, 500, 600)
    points = await device.vision.find_color_async(
        color_rgb=(0, 255, 0),
        tolerance=15,
        region=region,
    )
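This page does not say how `tolerance` is measured. As a rough illustration, the sketch below assumes a pixel matches when every RGB channel differs from the target by at most `tolerance`; adbflow may use a different metric. The `find_color` function and the tuple-based region here are hypothetical stand-ins that operate on a plain nested list and do not call adbflow.

```python
def find_color(pixels, color_rgb, tolerance=0, region=None):
    """Return (x, y) coordinates whose color is within `tolerance` of `color_rgb`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    `region` is an optional (left, top, right, bottom) tuple, right/bottom exclusive.
    Assumption: a pixel matches when every channel differs by at most `tolerance`.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    left, top, right, bottom = region if region else (0, 0, width, height)
    points = []
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            if all(abs(c - t) <= tolerance for c, t in zip(pixels[y][x], color_rgb)):
                points.append((x, y))
    return points


# A tiny 3x2 "screenshot": pure red, near-red, darker red, and green pixels.
SCREEN = [
    [(255, 0, 0), (240, 10, 5), (0, 255, 0)],
    [(0, 255, 0), (200, 0, 0), (250, 0, 0)],
]

exact = find_color(SCREEN, (255, 0, 0))
loose = find_color(SCREEN, (255, 0, 0), tolerance=20)
bottom_row = find_color(SCREEN, (255, 0, 0), tolerance=20, region=(0, 1, 3, 2))
print(exact, loose, bottom_row)
```

Restricting the search to a region reduces both the work done and the chance of picking up stray pixels elsewhere on screen.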
OCR (Text Recognition)¶
Read text from the screen using EasyOCR.
async def ocr_examples(device):
    ocr = device.ocr

    # Read all text on screen
    results = await ocr.read_screen_async()
    for r in results:
        print(f"'{r.text}' at {r.bounds} ({r.confidence:.2f})")

    # Find specific text
    result = await ocr.find_text_async("Login")
    if result:
        print(f"Found 'Login' at {result.bounds}")

    # Find text containing a substring
    results = await ocr.find_text_contains_async("Error")

    # Tap on recognized text
    result = await ocr.tap_text_async("Submit")

    # Wait for text to appear
    result = await ocr.wait_for_text_async(
        "Welcome",
        timeout=10.0,
        interval=1.0,
    )
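The `timeout` and `interval` parameters suggest a polling loop: check, sleep, and repeat until the text shows up or time runs out. The sketch below shows that general pattern with `asyncio`; it is not adbflow's implementation. The names `wait_for` and `make_fake_finder` are hypothetical, and the fake finder stands in for a real OCR lookup so the example runs without a device.

```python
import asyncio
import time


async def wait_for(check, timeout=10.0, interval=1.0):
    """Call the async `check` every `interval` seconds until it returns a
    truthy value or `timeout` seconds have elapsed. Returns the value or None."""
    deadline = time.monotonic() + timeout
    while True:
        result = await check()
        if result:
            return result
        # Give up if the next attempt would start after the deadline.
        if time.monotonic() + interval > deadline:
            return None
        await asyncio.sleep(interval)


def make_fake_finder(appears_on_call):
    """Build a stand-in for an OCR lookup that succeeds on the Nth call."""
    calls = {"count": 0}

    async def find_text():
        calls["count"] += 1
        return "Welcome" if calls["count"] >= appears_on_call else None

    return find_text


found = asyncio.run(wait_for(make_fake_finder(3), timeout=1.0, interval=0.01))
missed = asyncio.run(wait_for(make_fake_finder(10**6), timeout=0.05, interval=0.01))
print(found, missed)
```

Since each OCR pass is comparatively slow, a longer interval trades responsiveness for fewer recognition passes.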
Multiple Languages¶
async def multilingual(device):
    # Read with multiple languages
    results = await device.ocr.read_screen_async(languages=("en", "ja"))
OCRResult¶
Each result has:
- text: recognized text string
- bounds: Rect of the text region
- confidence: recognition confidence (0.0–1.0)
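OCR output is often noisy, so it can help to filter results by confidence before acting on them. The sketch below uses a stand-in `OcrHit` dataclass that mirrors the three fields listed above; it is not adbflow's `OCRResult`. The `filter_hits` helper and the tuple layout of `bounds` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrHit:
    """Stand-in for an OCR result; mirrors the fields described above."""
    text: str
    bounds: tuple  # (left, top, right, bottom), an assumed layout
    confidence: float


def filter_hits(hits, substring="", min_confidence=0.5):
    """Keep hits containing `substring` (case-insensitive) with enough confidence."""
    needle = substring.lower()
    return [
        h for h in hits
        if h.confidence >= min_confidence and needle in h.text.lower()
    ]


HITS = [
    OcrHit("Login", (40, 100, 200, 150), 0.98),
    OcrHit("Log1n failed", (40, 200, 300, 250), 0.41),
    OcrHit("Error: login required", (40, 300, 400, 350), 0.87),
]

confident_logins = filter_hits(HITS, "login", min_confidence=0.8)
print([h.text for h in confident_logins])
```

The same filter can be applied to the list returned by a full-screen read to discard low-confidence fragments before searching for a label.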
Low-Level API¶
For direct image operations without ADB:
from adbflow.vision.template import TemplateMatcher
matcher = TemplateMatcher()
# Match against PIL images directly
result = matcher.match(screenshot_image, template_image, threshold=0.8)
all_results = matcher.match_all(screenshot_image, template_image, threshold=0.7)
color_points = matcher.match_color(screenshot_image, (255, 0, 0), tolerance=20)
from adbflow.ocr.engine import OCREngine
engine = OCREngine()
results = engine.recognize(pil_image, languages=("en",))
result = engine.find_text(pil_image, "Login")
Tips¶
- Template images should be cropped tightly around the target element.
- OCR is slower than UI hierarchy queries; prefer Selector-based approaches when possible.
- EasyOCR downloads language models on first use (~100MB per language).
- Vision and OCR features raise VisionError / OCRError if the required extras aren't installed.
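One way to fail early, before any vision or OCR call, is a small preflight check that looks for the backing packages. The sketch below is only an assumption about which modules matter: `easyocr` is named on this page, while `cv2` is a guess at what the vision extra installs. The `missing_modules` helper and the `REQUIRED` mapping are hypothetical and should be adjusted to match your installation.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "easyocr" is named in this page; "cv2" is only a guess at what backs the
# vision extra. Adjust both lists to match your installation.
REQUIRED = {"vision": ["cv2"], "ocr": ["easyocr"]}

for extra, modules in REQUIRED.items():
    missing = missing_modules(modules)
    if missing:
        print(f"adbflow[{extra}] looks incomplete, missing: {', '.join(missing)}")
    else:
        print(f"adbflow[{extra}] dependencies found")
```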