Testing Fuzzy Search - Quick Reference

Prerequisites

Make sure the migration has been run:

bundle exec rails db:migrate

Verify the extension is enabled:

bundle exec rails runner "puts ActiveRecord::Base.connection.extension_enabled?('pg_trgm')"
# Should output: true

Quick Tests in Rails Console

# Start console
bundle exec rails console

# 1. Test exact match (should work with or without trigram)
Listings::Job.full_text_search("cashier").count

# 2. Test with typo (this is where trigram shines)
Listings::Job.full_text_search("casier").count

# 3. Compare results
exact = Listings::Job.full_text_search("cashier")
typo = Listings::Job.full_text_search("casier")

puts "Exact match results: #{exact.count}"
puts "Typo match results: #{typo.count}"
# Should be similar or identical

# 4. Test similarity threshold
# Calculate how similar the query is to a piece of listing text
ActiveRecord::Base.connection.execute(
  "SELECT word_similarity('cashier', 'full time casier position') AS score"
).first
# => a hash with a "score" key; a score at or above our 0.2 threshold matches

# 5. Test various typos
test_queries = [
  "cashier", # Exact
  "casier",  # Missing 'h'
  "cashir",  # Missing 'e'
  "cahsier", # Transposed letters
  "cshier",  # Missing 'a'
  "casheer", # Different vowel
  "keshier"  # Different first letter (might not match)
]

test_queries.each do |query|
  count = Listings::Job.full_text_search(query).count
  puts "#{query.ljust(15)} => #{count} results"
end
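
To build intuition for the scores in step 4, here is a minimal pure-Ruby sketch of trigram similarity. It approximates how pg_trgm's similarity() compares two strings (shared trigrams divided by all distinct trigrams). It does not reproduce word_similarity(), which scores the best-matching stretch of the second string, and it is not the extension's actual implementation. The method names trigrams and trigram_similarity are made up for this illustration.

```ruby
require 'set'

# Approximation of pg_trgm's trigram extraction: lowercase the text, split it
# into alphanumeric words, pad each word with two leading spaces and one
# trailing space, then collect every 3-character window.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by all distinct trigrams (0.0 to 1.0).
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

# "cashier" and "casier" share 5 trigrams out of 10 distinct ones.
puts trigram_similarity("cashier", "casier") # prints 0.5
```

A single dropped letter only disturbs the trigrams that overlap it, which is why one-character typos usually stay well above a 0.2 threshold.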

Testing Different Thresholds

If you want to experiment with different threshold values:

# Create a test scope with a different threshold
class Listings::Job
  pg_search_scope :strict_search,
    against: [:company_name, :category_name, :title, :description, :description_summary],
    using: {
      tsearch: {
        dictionary: 'english',
        tsvector_column: :search_vector,
        prefix: true
      },
      trigram: {
        threshold: 0.4, # More strict
        word_similarity: true
      }
    }
end

# Compare
Listings::Job.full_text_search("casier").count # Default (0.2)
Listings::Job.strict_search("casier").count # Strict (0.4)
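
Before editing the model, it can help to see how many candidate strings would survive each threshold. The sketch below is plain Ruby and assumes you have already collected word_similarity scores yourself (for example with the SQL query from step 4). matches_by_threshold is a hypothetical helper name, and the at-or-above comparison is an assumption about how the threshold is applied.

```ruby
# For each threshold, list the candidates whose score is at or above it.
# `scores` maps a candidate string to its similarity score (0.0..1.0).
def matches_by_threshold(scores, thresholds)
  thresholds.to_h do |threshold|
    [threshold, scores.select { |_text, score| score >= threshold }.keys]
  end
end
```

Call it with scores you measured, e.g. matches_by_threshold(scores, [0.2, 0.3, 0.4]), and look for the lowest threshold that drops the candidates you consider irrelevant.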

Performance Testing

require 'benchmark'

# Test performance (Benchmark.realtime is plain Ruby and returns seconds,
# so multiply by 1000 to get milliseconds)
Benchmark.realtime { Listings::Job.full_text_search("cashier").to_a } * 1000
# Should be < 100ms for most queries

# Compare with a LIKE query (don't use in production!)
Benchmark.realtime { Listings::Job.where("title LIKE ?", "%cashier%").to_a } * 1000
# Usually slower than full_text_search

# Check query plan
Listings::Job.full_text_search("cashier").explain
# Should show index usage
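
A single timing is noisy, since the first run often pays for cold caches. As a rough sketch, the helper below repeats a block several times and reports the median in milliseconds. median_ms is a hypothetical name, not part of Rails or pg_search.

```ruby
require 'benchmark'

# Run the block `runs` times and return the median wall-clock time in ms.
# The median is less sensitive than the mean to one slow (cold-cache) run.
def median_ms(runs: 5)
  timings = Array.new(runs) { Benchmark.realtime { yield } * 1000 }.sort
  mid = timings.length / 2
  timings.length.odd? ? timings[mid] : (timings[mid - 1] + timings[mid]) / 2.0
end

# In the Rails console:
#   median_ms { Listings::Job.full_text_search("cashier").to_a }
```

Calling .to_a inside the block matters: without it the relation is never loaded and you would only time building the query.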

Real-World Test Scenarios

# Scenario 1: User searches for common job titles with typos
[
  { query: "waiter", typo: "waitr" },
  { query: "waiter", typo: "wiater" },
  { query: "waiter", typo: "waiiter" },
  { query: "barista", typo: "barsta" },
  { query: "barista", typo: "barrista" }
].each do |test|
  exact_count = Listings::Job.full_text_search(test[:query]).count
  typo_count = Listings::Job.full_text_search(test[:typo]).count

  puts "#{test[:query]} -> #{test[:typo]}: #{exact_count} vs #{typo_count} results"
end

# Scenario 2: Multi-word searches
Listings::Job.full_text_search("full time cashier").count
Listings::Job.full_text_search("ful time casier").count # With typos

# Scenario 3: Company name search with typos
Listings::Job.full_text_search("mcdonalds").count
Listings::Job.full_text_search("macdonalds").count # Common typo

# Scenario 4: Category search
Listings::Job.full_text_search("food service").count
Listings::Job.full_text_search("fod servce").count # With typos
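
Similar counts do not guarantee that the typo search found the same jobs. One way to check is to compare the record IDs directly. The following is a small sketch in plain Ruby; recall_of_exact is a hypothetical helper name.

```ruby
# Fraction of the exact-match IDs that also appear in the typo results.
# Returns 1.0 when every exact result is recovered, 0.0 when none are.
def recall_of_exact(exact_ids, typo_ids)
  return 1.0 if exact_ids.empty?

  (exact_ids & typo_ids).size.to_f / exact_ids.uniq.size
end

# In the Rails console:
#   exact = Listings::Job.full_text_search("cashier").map(&:id)
#   typo  = Listings::Job.full_text_search("casier").map(&:id)
#   recall_of_exact(exact, typo)
```

A value close to 1.0 means the typo query recovers nearly everything the correctly spelled query finds, even if it also returns extra fuzzy matches.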

Debugging Tips

Check if extension is working

# Test the similarity function directly
ActiveRecord::Base.connection.execute(
  "SELECT similarity('cashier', 'casier')"
).first
# Should return a hash with a 'similarity' key

# If you get an error, the extension isn't enabled

Check which columns are being searched

# Inspect the SQL that the scope generates
puts Listings::Job.full_text_search("cashier").to_sql
# Should contain both a tsearch condition (on search_vector) and a trigram
# condition on the 'against' columns

Check if results are ranked correctly

# Get results with their ranking scores
# (pg_search_rank is only available after calling with_pg_search_rank)
results = Listings::Job.full_text_search("cashier").with_pg_search_rank

results.each do |job|
  puts "#{job.title} - Rank: #{job.pg_search_rank}"
end

# Higher scores = better matches
# Exact matches should have higher scores than fuzzy matches

Verify search_vector is populated

# Check if the tsvector column has data
job = Listings::Job.first
job.search_vector
# Should show something like: "'cashier':4 'experienc':3 'look':1"

# If nil, the search_vector needs to be populated
# (This should be handled by your data sync process)

Common Issues

Issue: Still getting no results with typos

Check:

# 1. Is the extension enabled?
ActiveRecord::Base.connection.extension_enabled?('pg_trgm')

# 2. Is the threshold too high?
# Try a lower threshold temporarily to test

# 3. What's the similarity score?
ActiveRecord::Base.connection.execute(
  "SELECT word_similarity('casier', 'cashier') AS score"
).first
# If the score is < 0.2, it won't match with the default threshold

# 4. Are you searching the right columns?
# Check if the text exists in the 'against' columns
Listings::Job.where("title ILIKE ?", "%cashier%").count

Issue: Too many irrelevant results

Solution: Increase the threshold in the model:

trigram: {
  threshold: 0.3, # Increase from 0.2 to 0.3
  word_similarity: true
}

Issue: Queries are slow

Check:

# 1. Are indexes being used?
Listings::Job.full_text_search("cashier").explain
# Should show "Index Scan" or "Bitmap Index Scan", not "Seq Scan"

# 2. How many rows are we searching?
Listings::Job.count
# If > 100k rows, consider adding trigram indexes

# 3. How many columns are in 'against'?
# Fewer columns = faster searches
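
Query plans can be long, so it may help to pull out just the sequential scans. This is a rough sketch that scans plan text for "Seq Scan on <table>" lines. seq_scanned_tables is a hypothetical helper, and it assumes you pass it the plain text of the plan printed by .explain.

```ruby
# Return the names of tables that the plan reads with a sequential scan.
# Matches both "Seq Scan on t" and "Parallel Seq Scan on t" lines.
def seq_scanned_tables(plan_text)
  plan_text.scan(/Seq Scan on (\S+)/).flatten.uniq
end
```

An empty array means no sequential scans were found in the plan. On small tables PostgreSQL may choose a sequential scan even when an index exists, so treat a hit as a prompt to look closer, not as proof of a missing index.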

Adjusting Configuration

If you need to adjust the configuration after testing:

  1. Edit app/domains/listings/job.rb (path relative to the jodapp-api repo root)
  2. Change the threshold value
  3. Restart the Rails console
  4. Re-test

# Before
trigram: {
  threshold: 0.2,
  word_similarity: true
}

# After (example: more strict)
trigram: {
  threshold: 0.3,
  word_similarity: true
}

Success Criteria

✅ Your implementation is working if:

  1. Exact matches return results
  2. Typos (1-2 character differences) return similar results
  3. Results are ranked (exact matches higher than typos)
  4. Query time is < 100ms for most searches
  5. No false positives (completely unrelated results)

Next Steps After Testing

  1. Monitor in production: Track which search terms users actually use
  2. Analyze results: Are users getting relevant results?
  3. Tune threshold: Adjust based on user feedback
  4. Add analytics: Track search-to-click conversion
  5. Consider indexes: Add trigram indexes if queries are slow

Quick Reference

# Basic search
Listings::Job.full_text_search("query")

# With additional filters
Listings::Job.full_text_search("cashier").where(status: :open)

# With pagination (page/per is the Kaminari-style API)
Listings::Job.full_text_search("cashier").page(1).per(20)

# Get ranked results
Listings::Job.full_text_search("cashier").with_pg_search_rank

# Order by rank (trailing dots keep the chain pasteable into the console)
Listings::Job.full_text_search("cashier").
  with_pg_search_rank.
  order("pg_search_rank DESC")