AGENTS.md - TreeHaver Development Guide
Project Overview
TreeHaver is a cross-Ruby adapter for AST parsing libraries - think Faraday for parsing. It provides a unified API across 10 different backends (tree-sitter, Prism, Psych, Citrus, Parslet, etc.) that works on MRI, JRuby, and TruffleRuby.
Core Philosophy: Write once, run anywhere. Learn once, write anywhere.
Architecture: The Adapter Pattern
Backend Selection Strategy
TreeHaver uses automatic backend selection with environment-based control:
- Priority Chain: Explicit backend: param → TreeHaver.backend → TREE_HAVER_BACKEND env → Auto-detection
- Auto-detection Order (MRI): :mri → :rust → :ffi → :citrus → :parslet
- Fallback Behavior: If the tree-sitter runtime is missing, auto-detection falls back to Citrus/Parslet
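The priority chain above can be sketched as a small resolver. This is a minimal illustration, not TreeHaver's implementation: the method name `resolve_backend`, its keyword arguments, and the rule that `:auto` at any level defers to the next level are all assumptions made for the example.

```ruby
# Hypothetical sketch of the priority chain; not TreeHaver's actual code.
AUTO_ORDER_MRI = %i[mri rust ffi citrus parslet].freeze

# Returns the first backend named by, in order: the explicit argument,
# the module-level setting, and the environment value. If none of them
# names a concrete backend, falls back to the first available backend
# in the MRI auto-detection order.
def resolve_backend(explicit: nil, configured: nil, env: nil, available: [])
  requested = [explicit, configured, env]
    .map { |value| value.to_s.delete_prefix(":").strip }
    .find { |value| !value.empty? && value != "auto" }
  return requested.to_sym if requested

  AUTO_ORDER_MRI.find { |backend| available.include?(backend) }
end
```

With no tree-sitter runtime present, passing `available: [:citrus, :parslet]` makes the sketch pick `:citrus`, which mirrors the fallback behavior described in the list.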
Engine Exclusivity: Ruby engines (MRI, JRuby, TruffleRuby) never run simultaneously. Auto-detection order adapts to the running engine - JRuby prioritizes :java and :ffi, TruffleRuby uses pure Ruby backends only.
Environment Variables:
- TREE_HAVER_BACKEND - Force a single backend (:auto, :mri, :ffi, :citrus, etc.)
- TREE_HAVER_NATIVE_BACKEND - Restrict native backends (comma-separated or none)
- TREE_HAVER_RUBY_BACKEND - Restrict Ruby backends (comma-separated or none)
# Check what's allowed
TreeHaver.allowed_native_backends # => [:mri, :ffi] or [:auto] or [:none]
TreeHaver.backend_allowed?(:ffi) # => true/false
Wrapping/Unwrapping Architecture
Critical Design Principle: TreeHaver::Parser handles ALL wrapping/unwrapping. Backends work with raw objects only.
Key files:
- WRAPPING-ARCHITECTURE.md - Complete unwrapping contract
- lib/tree_haver/parser.rb - The only place that wraps/unwraps objects
Language Object Flow:
- User passes: TreeHaver::Backends::*::Language wrapper
- Parser unwraps via #unwrap_language (checks backend compatibility)
- Backend receives appropriate raw object (MRI: ::TreeSitter::Language, Rust: String, FFI: wrapper, etc.)
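The language flow above can be illustrated with a toy parser that unwraps a language wrapper and refuses one built for a different backend. The names `LanguageWrapper` and `SketchParser` are hypothetical stand-ins, and raising `ArgumentError` on a mismatch is an assumption; the real contract is in WRAPPING-ARCHITECTURE.md.

```ruby
# Hypothetical stand-ins; the names below are illustrative only.
LanguageWrapper = Struct.new(:backend, :raw)

class SketchParser
  def initialize(backend)
    @backend = backend
  end

  # Returns the raw language object, refusing wrappers that were
  # built for a different backend than this parser uses.
  def unwrap_language(language)
    unless language.backend == @backend
      raise ArgumentError,
        "language built for #{language.backend}, parser uses #{@backend}"
    end

    language.raw
  end
end
```

Keeping the check in one place means a backend only ever sees the raw object it expects, which is the point of the design principle stated above.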
Tree Object Flow:
- Backend returns raw tree → Parser wraps as TreeHaver::Tree
- Incremental parsing: Parser unwraps old_tree.inner_tree before passing to backend
Position API Unification
See POSITION-API-SUMMARY.md for details. All backends expose:
- start_line, end_line (1-based, human-readable)
- source_position (hash with 1-based lines, 0-based columns)
- Inheritance: TreeHaver::Base::Node provides defaults, backends override as needed
Development Workflows
Running Tests
# Full suite (required for coverage thresholds)
bundle exec rspec
# Single file (disable coverage threshold)
K_SOUP_COV_MIN_HARD=false bundle exec rspec spec/tree_haver/parser_spec.rb
# FFI backend isolation (run BEFORE other tests to avoid backend pollution)
bundle exec rake ffi_specs
Note: Always run commands after a standalone cd so direnv can load ENV. Do NOT chain cd with &&.
Example (two separate commands):
cd /home/pboling/src/kettle-rb/tree_haver
bin/rspec spec/tree_haver/parser_spec.rb
Critical: FFI specs run FIRST in a clean environment (:ffi_backend tag triggers isolated mode). See Rakefile lines 66-95 for SimpleCov merging strategy.
Coverage Reports
Use bin/rake coverage - Runs coverage reporting with pre-configured ENV variables
Use bin/rspec - Allows customization of ENV variables for coverage reporting with specific settings
Key env vars:
- K_SOUP_COV_DO=true - Enable coverage (default in .envrc)
- K_SOUP_COV_MIN_LINE=83 - Line coverage threshold
- K_SOUP_COV_MIN_BRANCH=72 - Branch coverage threshold
- K_SOUP_COV_MIN_HARD=true - Fail if thresholds not met
- K_SOUP_COV_FORMATTERS="json,xml,lcov,tty" - Output formats
- K_SOUP_COV_COMMAND_NAME - Unique name for SimpleCov merging
Never review HTML reports - use JSON (preferred), XML, LCOV, or RCOV.
Use kettle-soup-cover -d - Reads SimpleCov output generated by the prior command; prints a human- and AI-digestible report.
Grammar Discovery
GrammarFinder auto-discovers tree-sitter libraries across platforms:
finder = TreeHaver::GrammarFinder.new(:toml)
if finder.available?
  finder.register!
  # Now: TreeHaver::Language.toml
end
Search order: ENV var (TREE_SITTER_TOML_PATH) → extra_paths → base dirs (/usr/lib, /usr/local/lib, etc.)
Security: Path validation rejects ../, validates extensions (.so, .dylib, .dll)
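The security rule above (no parent-directory traversal, only shared-library extensions) can be sketched as a small predicate. The method name `plausible_library_path?` is hypothetical, and GrammarFinder's real checks may be stricter or differ in detail.

```ruby
# Hypothetical sketch of the validation rule; not GrammarFinder's code.
ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze

# Returns true only for non-empty paths that contain no ".." segment
# and end in one of the shared-library extensions listed above.
def plausible_library_path?(path)
  return false if path.nil? || path.empty?
  return false if path.split(%r{[/\\]}).include?("..")

  ALLOWED_EXTENSIONS.include?(File.extname(path).downcase)
end
```

Note that this simple version also rejects versioned names such as `libfoo.so.0`, because `File.extname` returns only the final suffix. A real finder would need to decide how to treat those.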
Working Examples
The examples/ directory contains fully functional scripts demonstrating TreeHaver patterns:
- auto_json.rb - Backend auto-selection with JSON parsing
- backend_selection.rb - Testing environment variable effects on backend availability
- parser_for_citrus.rb - Citrus backend usage patterns
- Real-world usage patterns for grammar registration, language loading, and tree traversal
These are excellent references for understanding how components work together in practice.
Project Conventions
Backend Registry Pattern
External gems register their availability via BackendRegistry:
# In external gem (e.g., commonmarker-merge)
TreeHaver::BackendRegistry.register_tag(
:commonmarker_backend,
category: :backend,
require_path: "commonmarker/merge",
) { Commonmarker::Merge::Backend.available? }
This enables dynamic RSpec tag filtering without hardcoding backend knowledge.
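To show why registering a block is useful, here is a miniature registry that stores availability checks and runs them lazily. `SketchRegistry` is a hypothetical class written for this example. It ignores the `category:` and `require_path:` options, and treating a raising check as "unavailable" is an assumption, not documented BackendRegistry behavior.

```ruby
# Hypothetical miniature registry; BackendRegistry's internals may differ.
class SketchRegistry
  def initialize
    @checks = {}
  end

  # Stores the block so availability is probed only when asked.
  def register_tag(tag, &check)
    @checks[tag] = check
  end

  # Unknown tags, and checks that raise, count as unavailable.
  def available?(tag)
    check = @checks[tag]
    return false unless check

    !!check.call
  rescue LoadError, StandardError
    false
  end
end
```

Because the check runs only when `available?` is called, a spec helper can ask about a tag without loading the external gem up front.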
Kettle-Dev Tooling
This project uses kettle-dev (sister project in kettle-rb org) for gem maintenance automation:
- Templating: Lines between kettle-dev:freeze / kettle-dev:unfreeze comments are preserved during template updates (see tree_haver.gemspec lines 3-5)
- CI Workflows: GitHub Actions and GitLab CI configurations are managed by kettle-dev templates
- Releases: Use the kettle-release command for the automated release process (versioning, changelog, gem publishing)
Version Requirements
- Ruby >= 3.2.0 (gemspec line 19)
- ruby_tree_sitter v2.0+ required (exception hierarchy changed: all inherit from Exception, not StandardError)
- Tree-sitter runtime compatibility: Backend-specific (see README "Backend Requirements")
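The exception hierarchy note matters because a bare `rescue` in Ruby only handles `StandardError` and its subclasses. The sketch below demonstrates this with hypothetical classes (`SketchBackendError`, `SketchNotAvailable`) standing in for a backend's errors. It does not use ruby_tree_sitter's real class names.

```ruby
# Hypothetical error classes standing in for a backend's hierarchy.
class SketchBackendError < Exception; end
class SketchNotAvailable < StandardError; end

# A bare rescue handles StandardError only, so an error that inherits
# directly from Exception passes straight through this method.
def bare_rescue
  yield
rescue => e
  "rescued #{e.class}"
end

# Naming the class explicitly catches it; here it is re-raised as a
# StandardError subclass so callers can use an ordinary rescue.
def mapped_rescue
  yield
rescue SketchBackendError => e
  raise SketchNotAvailable, e.message
end
```

Code that wraps such a backend therefore has to rescue the backend's error classes by name.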
Testing Patterns
Dependency Tag System
RSpec tests use dynamic tags based on backend availability:
RSpec.describe("feature", :toml_parsing, :commonmarker_backend) do
  # Auto-skipped if toml-rb or commonmarker unavailable
end
Tags resolved by lib/tree_haver/rspec/dependency_tags.rb via BackendRegistry.
Backend Conflict Protection
TreeHaver.backend_protect (default: true) prevents incompatible backend combinations (e.g., FFI after MRI). Tests may disable this with TreeHaver.backend_protect = false.
Matrix Testing
spec_matrix/ contains backend compatibility matrix tests with a minimal helper to avoid pre-loading dependencies.
Critical Files
- lib/tree_haver/parser.rb - Main facade, handles all wrapping/unwrapping (439 lines)
- lib/tree_haver/backend_registry.rb - Dynamic backend registration system (458 lines)
- lib/tree_haver/grammar_finder.rb - Platform-aware grammar discovery (375 lines)
- WRAPPING-ARCHITECTURE.md - Unwrapping contracts and design principles (277 lines)
- POSITION-API-SUMMARY.md - Position API unification across backends (136 lines)
- lib/tree_haver.rb - Module-level backend configuration and language registry
Common Tasks
# Run all specs with coverage
bundle exec rake spec
# Generate coverage report and open in browser
bundle exec rake coverage
# Check code quality
bundle exec rake reek
bundle exec rake rubocop_gradual
# Run benchmarks (skipped on CI)
bundle exec rake bench
# Prepare changelog for release, build and release
kettle-changelog && kettle-release
Integration Points
- Backends: 10 backends in lib/tree_haver/backends/ (mri, rust, ffi, java, prism, psych, citrus, parslet, commonmarker, markly)
- External Gems: Uses the *-merge family (toml-merge, commonmarker-merge, etc.) via backend registry
- RSpec: Deep integration via tree_haver/rspec.rb for dependency tagging
- SimpleCov: Custom merging strategy for multi-task coverage (FFI specs + main specs)
Key Insights
- Backend pollution: MRI backend loads native tree-sitter, preventing FFI backend from working. Always run FFI specs first.
- Language caching: LanguageRegistry caches loaded languages. Clear with LanguageRegistry.clear_cache! in tests.
- Backend compatibility: Check TreeHaver.capabilities for backend-specific features (incremental parsing, queries, etc.).
- Grammar registration: Use GrammarFinder for tree-sitter, CitrusGrammarFinder for Citrus, ParsletGrammarFinder for Parslet.
- Exception mapping: TreeHaver catches backend exceptions and converts to TreeHaver::NotAvailable for consistent error handling.
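The language caching insight is the one most likely to cause surprising test failures, so here is a toy cache showing why clearing it between examples matters. `SketchLanguageCache` and its `fetch` method are hypothetical. Only the `clear_cache!` name is borrowed from the text above, and LanguageRegistry's real interface may differ.

```ruby
# Hypothetical cache illustrating the caching insight; not LanguageRegistry.
class SketchLanguageCache
  attr_reader :loads

  def initialize
    @cache = {}
    @loads = 0
  end

  # Runs the loader block only on a cache miss and remembers the result.
  def fetch(name)
    @cache.fetch(name) do
      @loads += 1
      @cache[name] = yield
    end
  end

  # Forgets every cached language so the next fetch loads again.
  def clear_cache!
    @cache.clear
  end
end
```

Without the clear, a language loaded under one backend in an earlier example would be handed back unchanged to a later example that expects a fresh load.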