Class: TreeHaver::GrammarFinder

Inherits:
Object
Defined in:
lib/tree_haver/grammar_finder.rb

Overview

Generic utility for finding tree-sitter grammar shared libraries.

GrammarFinder provides platform-aware discovery of tree-sitter grammar
libraries. Given a language name, it searches common installation paths
and supports environment variable overrides.

This class is designed to be used by language-specific merge gems
(toml-merge, json-merge, bash-merge, etc.) without requiring TreeHaver
to have knowledge of each specific language.

== Security Considerations

Loading shared libraries is inherently dangerous as it executes arbitrary
native code. GrammarFinder performs the following security validations:

  • Language names are validated to contain only safe characters
  • Paths from environment variables are validated before use
  • Path traversal attempts (../) are rejected
  • Only files with expected extensions (.so, .dylib, .dll) are accepted

For additional security, use #find_library_path_safe which only returns
paths from trusted system directories.
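The path checks listed above can be sketched as a small predicate. This is a minimal illustration, assuming a path is rejected if any component is ".." and accepted only with one of the three listed extensions; sketch_safe_library_path? and ALLOWED_LIBRARY_EXTENSIONS are hypothetical names, not TreeHaver's PathValidator, whose exact rules may differ.

```ruby
# Hypothetical sketch of the path checks listed above; this is not
# TreeHaver's actual PathValidator implementation.
ALLOWED_LIBRARY_EXTENSIONS = %w[.so .dylib .dll].freeze

def sketch_safe_library_path?(path)
  return false if path.nil? || path.empty?

  # Reject traversal: any ".." component (split on / or \)
  return false if path.split(%r{[/\\]}).include?("..")

  # Accept only a known shared-library extension
  ALLOWED_LIBRARY_EXTENSIONS.include?(File.extname(path))
end
```

Under this simplified rule a versioned name such as libtree-sitter-toml.so.0 would be rejected, because File.extname returns ".0" for it.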

Examples:

Basic usage

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path
# => "/usr/lib/libtree-sitter-toml.so"

Check availability

finder = TreeHaver::GrammarFinder.new(:json)
if finder.available?
  language = TreeHaver::Language.load(finder.language_name, finder.find_library_path)
end

Register with TreeHaver

finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?
# Now you can use: TreeHaver::Language.bash

With custom search paths

finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: ["/opt/custom/lib"])

Secure mode (trusted directories only)

finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path_safe  # Only returns paths in trusted dirs


Constant Summary

BASE_SEARCH_DIRS =

Common base directories where tree-sitter libraries are installed.
The platform-specific library filename (see #library_filename) is joined to each directory.

[
  "/usr/lib",
  "/usr/lib64",
  "/usr/local/lib",
  "/opt/homebrew/lib",
].freeze
TREE_SITTER_BACKENDS =

Backends that use tree-sitter (require native runtime libraries)
Other backends (Citrus, Prism, Psych, etc.) don’t use tree-sitter

[
  TreeHaver::Backends::MRI,
  TreeHaver::Backends::FFI,
  TreeHaver::Backends::Rust,
  TreeHaver::Backends::Java,
].freeze


Constructor Details

#initialize(language_name, extra_paths: [], validate: true) ⇒ GrammarFinder

Initialize a grammar finder for a specific language

Parameters:

  • language_name (Symbol, String)

    the tree-sitter language name (e.g., :toml, :json, :bash)

  • extra_paths (Array<String>) (defaults to: [])

    additional directories to search, checked after the ENV override and before BASE_SEARCH_DIRS

  • validate (Boolean) (defaults to: true)

    if true, validates the language name

Raises:

  • (ArgumentError)

    if language_name is invalid and validate is true



# File 'lib/tree_haver/grammar_finder.rb', line 75

def initialize(language_name, extra_paths: [], validate: true)
  name_str = language_name.to_s.downcase

  if validate && !PathValidator.safe_language_name?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}. " \
      "Language names must start with a letter and contain only lowercase letters, numbers, and underscores."
  end

  @language_name = name_str.to_sym
  @extra_paths = Array(extra_paths)
end
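The constructor's normalization can be reproduced in isolation. This sketch assumes the rule quoted in the ArgumentError message (a leading lowercase letter, then only lowercase letters, digits, or underscores) and expresses it as a regular expression; normalize_language_name and LANGUAGE_NAME_PATTERN are hypothetical stand-ins for the constructor and PathValidator.safe_language_name?.

```ruby
# Hypothetical stand-in for PathValidator.safe_language_name?, following the
# rule quoted in the constructor's ArgumentError message.
LANGUAGE_NAME_PATTERN = /\A[a-z][a-z0-9_]*\z/

def normalize_language_name(language_name, validate: true)
  # Downcase first, as the constructor does, so :TOML becomes "toml"
  name_str = language_name.to_s.downcase

  if validate && !LANGUAGE_NAME_PATTERN.match?(name_str)
    raise ArgumentError, "Invalid language name: #{language_name.inspect}"
  end

  name_str.to_sym
end
```

Because the name is downcased before validation, :TOML is accepted and stored as :toml, while a value like "../evil" is rejected before it can reach any path or symbol construction.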

Instance Attribute Details

#extra_paths ⇒ Array<String> (readonly)

Returns additional search paths provided at initialization.

Returns:

  • (Array<String>)

    additional search paths provided at initialization



# File 'lib/tree_haver/grammar_finder.rb', line 67

def extra_paths
  @extra_paths
end

#language_name ⇒ Symbol (readonly)

Returns the language identifier.

Returns:

  • (Symbol)

    the language identifier



# File 'lib/tree_haver/grammar_finder.rb', line 64

def language_name
  @language_name
end

Class Method Details

.reset_runtime_check! ⇒ Object

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Reset the cached tree-sitter runtime check (for testing)



# File 'lib/tree_haver/grammar_finder.rb', line 284

def reset_runtime_check!
  remove_instance_variable(:@tree_sitter_runtime_usable) if defined?(@tree_sitter_runtime_usable)
end

.tree_sitter_runtime_usable? ⇒ Boolean

Check if the tree-sitter runtime is usable

Tests whether a tree-sitter parser can actually be created with the current backend.
The result (true or false) is cached at the class level, since the check is expensive and its outcome does not change during runtime.

Returns:

  • (Boolean)

    true if tree-sitter runtime is functional



# File 'lib/tree_haver/grammar_finder.rb', line 259

def tree_sitter_runtime_usable?
  return @tree_sitter_runtime_usable if defined?(@tree_sitter_runtime_usable)

  @tree_sitter_runtime_usable = begin
    # Try to create a parser using the current backend
    mod = TreeHaver.resolve_backend_module(nil)

    # Only tree-sitter backends are relevant here
    # Non-tree-sitter backends (Citrus, Prism, Psych, etc.) don't use grammar files
    if mod.nil? || !TREE_SITTER_BACKENDS.include?(mod)
      false
    else
      # Try to instantiate a parser - this will fail if runtime isn't available
      mod::Parser.new
      true
    end
  rescue NoMethodError, LoadError, NotAvailable => _e
    # Note: FFI::NotFoundError inherits from LoadError, so it's caught here too
    false
  end
end
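The caching above relies on defined? rather than ||=, so a cached false is not recomputed on every call. A minimal sketch of the same pattern follows; RuntimeProbe and its expensive_probe method are hypothetical, with the probe always raising LoadError to stand in for a parser that cannot be created.

```ruby
# Minimal sketch of class-level memoization that also caches false.
class RuntimeProbe
  class << self
    attr_accessor :probe_calls

    def usable?
      # defined? distinguishes "never computed" from a cached false;
      # `@usable ||= ...` would rerun the probe whenever the cache is false.
      return @usable if defined?(@usable)

      @usable = begin
        expensive_probe
      rescue LoadError
        false
      end
    end

    # Mirrors reset_runtime_check!: drop the cache so the next call re-probes
    def reset!
      remove_instance_variable(:@usable) if defined?(@usable)
    end

    private

    # Hypothetical probe that counts its calls and always fails
    def expensive_probe
      self.probe_calls = (probe_calls || 0) + 1
      raise LoadError, "runtime library missing"
    end
  end
end
```

Calling RuntimeProbe.usable? twice runs the probe once; after RuntimeProbe.reset! the next call probes again, which is why the real reset method exists for tests.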

Instance Method Details

#available? ⇒ Boolean

Check if the grammar library is available AND usable

This checks:

  1. The grammar library file exists
  2. The tree-sitter runtime is functional (can create a parser)

This prevents registering grammars when tree-sitter isn’t actually usable,
allowing clean fallback to alternative backends like Citrus.

Returns:

  • (Boolean)

    true if the library can be found AND tree-sitter runtime works



# File 'lib/tree_haver/grammar_finder.rb', line 234

def available?
  path = find_library_path
  return false if path.nil?

  # Check if tree-sitter runtime is actually functional
  # This is cached at the class level since it's the same for all grammars
  self.class.tree_sitter_runtime_usable?
end

#available_safe? ⇒ Boolean

Check if the grammar library is available in a trusted directory

Returns:

  • (Boolean)

    true if the library can be found in a trusted directory

# File 'lib/tree_haver/grammar_finder.rb', line 293

def available_safe?
  !find_library_path_safe.nil?
end

#env_var_name ⇒ String

Get the environment variable name for this language

Returns:

  • (String)

    the ENV var name (e.g., “TREE_SITTER_TOML_PATH”)



# File 'lib/tree_haver/grammar_finder.rb', line 90

def env_var_name
  "TREE_SITTER_#{@language_name.to_s.upcase}_PATH"
end

#find_library_path ⇒ String?

Note:

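Paths from ENV are checked for stray whitespace, validated with
PathValidator.safe_library_path? to block path traversal and other attacks,
and required to exist. An invalid ENV path raises NotAvailable instead of
silently falling back to auto-discovery (principle of least surprise: an
explicitly configured path must work).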

Note:

Setting the ENV variable to an empty string explicitly disables
this grammar. This allows fallback to alternative backends (e.g., Citrus).

Find the grammar library path

Searches in order:

  1. Environment variable override (validated for safety)
  2. Extra paths provided at initialization
  3. Common system installation paths

Returns:

  • (String, nil)

    the path to the library, or nil if not found

Raises:

  • (NotAvailable)

    if the ENV variable is set to a non-empty path that fails validation



# File 'lib/tree_haver/grammar_finder.rb', line 147

def find_library_path
  # Check environment variable first (highest priority)
  # Use key? to distinguish between "not set" and "set to empty"
  env_var = env_var_name
  if ENV[env_var] || ENV.key?(env_var)
    env_path = ENV[env_var]

    # :nocov: defensive - ENV.key? true with nil value is rare edge case
    if env_path.nil?
      @env_rejection_reason = "explicitly disabled (set to nil)"
      return
    end
    # :nocov:

    # Empty string means "explicitly skip this grammar"
    # This allows users to disable tree-sitter for specific languages
    # and fall back to alternative backends like Citrus
    if env_path.empty?
      @env_rejection_reason = "explicitly disabled (set to empty string)"
      return
    end

    # Store why env path was rejected for better error messages
    @env_rejection_reason = validate_env_path(env_path)

    # Principle of Least Surprise: If user explicitly sets an ENV variable
    # to a path, that path MUST work. Don't silently fall back to auto-discovery.
    if @env_rejection_reason
      raise TreeHaver::NotAvailable,
        "#{env_var_name} is set to #{env_path.inspect} but #{@env_rejection_reason}. " \
          "Either fix the path, unset the variable to use auto-discovery, " \
          "or set it to empty string to explicitly disable this grammar."
    end

    return env_path
  end

  # Search all paths (these are constructed from trusted base dirs)
  search_paths.find { |path| File.exist?(path) }
end
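The precedence rules of find_library_path reduce to three cases for the ENV variable: unset (auto-discover), empty (explicitly disabled), or set (validate, then use it or raise). The sketch below assumes a plain Hash in place of ENV and a caller-supplied validator lambda that returns a rejection reason or nil; resolve_grammar_path and GrammarNotAvailable are hypothetical names.

```ruby
# Hypothetical error class standing in for TreeHaver::NotAvailable
class GrammarNotAvailable < StandardError; end

# Sketch of the ENV precedence rules, using a Hash instead of the real ENV.
def resolve_grammar_path(env, var, candidates, validator: ->(_path) { nil })
  if env.key?(var)
    value = env[var]
    # Empty string: grammar explicitly disabled, no auto-discovery
    return nil if value.empty?

    # Explicit path: it must pass validation or we raise
    reason = validator.call(value)
    raise GrammarNotAvailable, "#{var} is set to #{value.inspect} but #{reason}" if reason

    return value
  end

  # Unset: fall back to the first candidate that exists
  candidates.find { |path| File.exist?(path) }
end
```

Passing the environment as a Hash keeps the function testable without mutating the real ENV.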

#find_library_path_safe ⇒ String?

Find the grammar library path with strict security validation

This method only returns paths that are in trusted system directories.
Use this when you want maximum security and don’t need to support
custom installation locations.

Returns:

  • (String, nil)

    the path to the library, or nil if not found

See Also:

  • For the list of trusted directories


# File 'lib/tree_haver/grammar_finder.rb', line 217

def find_library_path_safe
  # Environment variable is NOT checked in safe mode - only trusted system paths
  search_paths.find do |path|
    File.exist?(path) && PathValidator.in_trusted_directory?(path)
  end
end
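Safe mode amounts to filtering the candidate list through a trusted-directory check. This minimal sketch assumes the trusted set equals BASE_SEARCH_DIRS (TreeHaver's actual PathValidator list is not shown here and may differ); sketch_in_trusted_directory?, sketch_find_safe, and SKETCH_TRUSTED_DIRS are hypothetical. Expanding the path first means ".." segments cannot escape a trusted prefix, and appending the separator keeps /usr/lib-evil from matching /usr/lib.

```ruby
# Hypothetical trusted-directory list; the real list lives in PathValidator.
SKETCH_TRUSTED_DIRS = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

def sketch_in_trusted_directory?(path)
  # Resolve ".." before comparing, so traversal cannot fake a prefix
  expanded = File.expand_path(path)
  SKETCH_TRUSTED_DIRS.any? { |dir| expanded.start_with?(dir + File::SEPARATOR) }
end

# The exists: lambda lets callers (and tests) replace the filesystem check
def sketch_find_safe(candidates, exists: ->(p) { File.exist?(p) })
  candidates.find { |path| exists.call(path) && sketch_in_trusted_directory?(path) }
end
```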

#library_filename ⇒ String

Get the library filename for the current platform

Returns:

  • (String)

    the library filename (e.g., “libtree-sitter-toml.so”)



# File 'lib/tree_haver/grammar_finder.rb', line 104

def library_filename
  ext = platform_extension
  "libtree-sitter-#{@language_name}#{ext}"
end
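library_filename depends on a private platform_extension helper that is not shown on this page. The sketch below assumes the conventional mapping (.dylib on macOS, .dll on Windows, .so elsewhere) keyed off RbConfig::CONFIG["host_os"]; both method names are hypothetical, and the host_os parameter exists only so the mapping can be exercised for other platforms.

```ruby
require "rbconfig"

# Hypothetical stand-in for the private platform_extension helper.
def sketch_platform_extension(host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin/ then ".dylib"
  when /mswin|mingw|cygwin/ then ".dll"
  else ".so"
  end
end

# Mirrors the naming scheme of #library_filename
def sketch_library_filename(language_name, host_os = RbConfig::CONFIG["host_os"])
  "libtree-sitter-#{language_name}#{sketch_platform_extension(host_os)}"
end
```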

#not_found_message ⇒ String

Get a human-readable error message for when the library is not found

Returns:

  • (String)

    error message with installation hints



# File 'lib/tree_haver/grammar_finder.rb', line 339

def not_found_message
  msg = "tree-sitter #{@language_name} grammar not found."

  # Check if env var is set but rejected
  env_value = ENV[env_var_name]
  msg += if env_value && @env_rejection_reason
    " #{env_var_name} is set to #{env_value.inspect} but #{@env_rejection_reason}."
  elsif env_value && File.exist?(env_value) && !self.class.tree_sitter_runtime_usable?
    " #{env_var_name} is set and file exists, but no tree-sitter runtime is available. " \
      "Add ruby_tree_sitter, ffi, or tree_stump gem to your Gemfile."
  elsif env_value
    " #{env_var_name} is set but was not used (file may have been removed)."
  else
    " Searched: #{search_paths.join(", ")}."
  end

  msg + " Install tree-sitter-#{@language_name} or set #{env_var_name} to a valid path."
end

#register!(raise_on_missing: false) ⇒ Boolean

Register this language with TreeHaver

After registration, the language can be loaded via dynamic method
(e.g., TreeHaver::Language.toml).

Parameters:

  • raise_on_missing (Boolean) (defaults to: false)

    if true, raises when library not found

Returns:

  • (Boolean)

    true if registration succeeded

Raises:

  • (NotAvailable)

    if library not found and raise_on_missing is true



# File 'lib/tree_haver/grammar_finder.rb', line 305

def register!(raise_on_missing: false)
  path = find_library_path
  unless path
    if raise_on_missing
      raise NotAvailable, not_found_message
    end
    return false
  end

  TreeHaver.register_language(@language_name, path: path, symbol: symbol_name)
  true
end
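register! has two failure modes controlled by raise_on_missing: return false, or raise NotAvailable with a diagnostic message. A minimal sketch, assuming a hypothetical in-memory SKETCH_REGISTRY in place of TreeHaver.register_language, a hypothetical SketchNotAvailable error, and a shortened message in place of not_found_message:

```ruby
# Hypothetical error and registry standing in for TreeHaver's own
class SketchNotAvailable < StandardError; end

SKETCH_REGISTRY = {}

def sketch_register(name, path, raise_on_missing: false)
  unless path
    # Strict callers get an exception; lenient callers get false
    raise SketchNotAvailable, "tree-sitter #{name} grammar not found." if raise_on_missing

    return false
  end

  # Record the library path and the exported symbol (see #symbol_name)
  SKETCH_REGISTRY[name] = { path: path, symbol: "tree_sitter_#{name}" }
  true
end
```

The lenient default suits optional grammars that should fall back quietly, while raise_on_missing: true suits callers that cannot proceed without the grammar.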

#search_info ⇒ Hash

Get debug information about the search

Returns:

  • (Hash)

    diagnostic information



# File 'lib/tree_haver/grammar_finder.rb', line 321

def search_info
  found = find_library_path # This populates @env_rejection_reason
  {
    language: @language_name,
    env_var: env_var_name,
    env_value: ENV[env_var_name],
    env_rejection_reason: @env_rejection_reason,
    symbol: symbol_name,
    library_filename: library_filename,
    search_paths: search_paths,
    found_path: found,
    available: !found.nil?,
  }
end

#search_paths ⇒ Array<String>

Generate the full list of search paths for this language

Order: extra_paths, then BASE_SEARCH_DIRS. The ENV override is not included in this list; #find_library_path checks it before consulting these paths.

Returns:

  • (Array<String>)

    all paths to search



# File 'lib/tree_haver/grammar_finder.rb', line 114

def search_paths
  paths = []

  # Extra paths provided at initialization (searched after ENV)
  @extra_paths.each do |dir|
    paths << File.join(dir, library_filename)
  end

  # Common system paths with platform-appropriate extension
  BASE_SEARCH_DIRS.each do |dir|
    paths << File.join(dir, library_filename)
  end

  paths
end
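The ordering produced by search_paths can be expressed as a single map over the concatenated directory lists. In this sketch the base directories are copied from BASE_SEARCH_DIRS above and sketch_search_paths is a hypothetical name; as in the real method, the ENV override never appears in the result.

```ruby
# Copied from BASE_SEARCH_DIRS for a self-contained example
BASE_DIRS_EXAMPLE = ["/usr/lib", "/usr/lib64", "/usr/local/lib", "/opt/homebrew/lib"].freeze

def sketch_search_paths(filename, extra_paths = [])
  # extra_paths come first, then the system directories;
  # Array() accepts a single string as the constructor does
  (Array(extra_paths) + BASE_DIRS_EXAMPLE).map { |dir| File.join(dir, filename) }
end
```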

#symbol_name ⇒ String

Get the expected symbol name exported by the grammar library

Returns:

  • (String)

    the symbol name (e.g., “tree_sitter_toml”)



# File 'lib/tree_haver/grammar_finder.rb', line 97

def symbol_name
  "tree_sitter_#{@language_name}"
end

#validate_env_path(path) ⇒ String?

Validate a path read from the environment variable and return the rejection reason if it is invalid

Returns:

  • (String, nil)

    rejection reason or nil if valid



# File 'lib/tree_haver/grammar_finder.rb', line 190

def validate_env_path(path)
  # Check for leading/trailing whitespace
  if path != path.strip
    return "contains leading or trailing whitespace (use #{path.strip.inspect})"
  end

  # Check if path is safe
  unless PathValidator.safe_library_path?(path)
    return "failed security validation (may contain path traversal or suspicious characters)"
  end

  # Check if file exists
  unless File.exist?(path)
    return "file does not exist"
  end

  nil # Valid!
end
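validate_env_path returns the reason for the first failing check, so the order of the checks determines which message the user sees: a path with a stray leading space reports the whitespace problem rather than a less helpful "file does not exist". A sketch of that first-failure-wins ordering as a table of checks; the traversal test is a hypothetical simplification of PathValidator.safe_library_path?, and sketch_validate_env_path is an illustrative name.

```ruby
# Sketch of first-failure-wins validation with an ordered table of checks.
def sketch_validate_env_path(path)
  checks = [
    [-> { path != path.strip }, "contains leading or trailing whitespace (use #{path.strip.inspect})"],
    # Hypothetical simplification of PathValidator.safe_library_path?
    [-> { path.split("/").include?("..") }, "failed security validation"],
    [-> { !File.exist?(path) }, "file does not exist"],
  ]

  # Return the reason for the first failing check, or nil when all pass
  failed = checks.find { |check, _reason| check.call }
  failed&.last
end
```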