SpamScore: AI-Powered NFT Collection Verification at Rarible

Overview

SpamScore is a microservice within Rarible's infrastructure that automatically identifies and flags spam and scam NFT collections. It combines a large language model with embedding-based similarity search to produce a numerical score indicating how likely a collection is to be fraudulent.

How It Works

The Scoring System

SpamScore assigns each NFT collection a numerical value between 0 and 1:

  • < 0.80: Treated as legitimate
  • ≥ 0.80: Flagged as spam/scam

Any collection scoring 0.80 or higher is treated as spam or a scam and may be filtered or flagged on the Rarible platform.
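The threshold rule can be written down as a very small helper. This is only a sketch: the 0.80 cut-off comes from the text above, while the names SPAM_THRESHOLD and is_spam are hypothetical and not taken from Rarible's code.

```python
SPAM_THRESHOLD = 0.80  # cut-off stated in the article; the constant name is hypothetical


def is_spam(score: float) -> bool:
    """Return True when a SpamScore falls in the flagged range (>= 0.80)."""
    if not 0.0 <= score <= 1.0:
        # SpamScore is defined on [0, 1]; anything else is a caller error.
        raise ValueError(f"score must be in [0, 1], got {score}")
    return score >= SPAM_THRESHOLD
```

A score of exactly 0.80 is flagged, which matches the "0.80 or higher" wording.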

Technical Architecture

The system is built as a microservice with the following key components:

1. AI Foundation

  • Model: OpenAI's GPT-4o-mini
  • Purpose: Analyzes collection characteristics and classifies each collection

2. Vector Embeddings

  • Storage: pgvector (PostgreSQL extension)
  • Model: OpenAI's text-embedding-3-small
  • Purpose: Creates semantic representations of collections for similarity matching

3. Data Collection

For each collection, the system analyzes:

  • Collection name
  • Symbol
  • Description
  • Creator address

The RAG Approach

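SpamScore implements a Retrieval-Augmented Generation (RAG) architecture, which combines retrieval of similar labeled collections from a vector store with LLM-based analysis of the retrieved examples.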

Training Dataset

The system is built on 1,000 labeled reference collections, which serve as the retrieval corpus:

  • 500 "good" collections: Verified legitimate collections with SpamScore = 0
  • 500 "bad" collections: Known spam/scam collections with SpamScore ≥ 0.80

Process Flow

  1. Vectorization: All reference collections are converted into embeddings and stored in pgvector

  2. New Collection Evaluation: When a new collection is detected:

    • Extract its name, symbol, description, and creator address
    • Generate embeddings for the collection
    • Retrieve similar collections from the vector database
    • Use GPT-4o-mini to analyze similarities and patterns
    • Assign a spam score based on the analysis

  3. Continuous Monitoring: Every new collection on the platform is automatically evaluated
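The retrieval and prompting steps of this flow can be sketched in plain Python. This is an illustration under stated assumptions, not Rarible's implementation: the article does not name the similarity metric or the number of neighbours retrieved, so cosine similarity and k=3 are assumptions, and Reference, retrieve_similar, build_prompt, and the prompt wording are hypothetical. In the real system pgvector would perform the nearest-neighbour search and GPT-4o-mini would receive the prompt; both are replaced here by in-memory code so the example runs on its own.

```python
import math
from typing import NamedTuple


class Reference(NamedTuple):
    """A labeled reference collection with a precomputed embedding (hypothetical type)."""
    name: str
    label: str  # "good" or "bad"
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query: list[float], references: list[Reference], k: int = 3) -> list[Reference]:
    """Return the k references most similar to the query embedding, best first."""
    ranked = sorted(
        references,
        key=lambda r: cosine_similarity(query, r.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(candidate_text: str, neighbours: list[Reference]) -> str:
    """Assemble the text an LLM would be asked to score (wording is an assumption)."""
    lines = [f"- {r.name} (label: {r.label})" for r in neighbours]
    return (
        "Rate the following NFT collection from 0 (legitimate) to 1 (spam).\n"
        f"Candidate:\n{candidate_text}\n"
        "Most similar reference collections:\n" + "\n".join(lines)
    )
```

Passing the labels of the retrieved neighbours into the prompt is what gives the model the "known good and bad examples" context described in the next section.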

Why RAG?

The RAG approach offers several advantages:

  • Context-Aware: Compares new collections against known good and bad examples
  • Semantic Understanding: Goes beyond keyword matching to understand meaning and intent
  • Scalable: Can handle high volumes of new collections efficiently
  • Adaptive: Can be updated with new examples as spam tactics evolve

Impact

By automating spam detection, SpamScore helps:

  • Protect users from scam collections
  • Maintain platform quality
  • Reduce manual moderation overhead
  • Provide a better user experience on Rarible

Technical Stack Summary

Component | Technology
AI Model | GPT-4o-mini
Embeddings | text-embedding-3-small
Vector Storage | pgvector
Architecture | Microservice + RAG
Evaluation | Real-time, automated

Note: This system represents a practical application of modern AI techniques (LLMs, embeddings, RAG) to solve real-world problems in the Web3 space.