SpamScore: AI-Powered NFT Collection Verification at Rarible
Overview
SpamScore is a microservice within Rarible's infrastructure that automatically identifies and flags spam and scam NFT collections. It combines a large language model with embedding-based similarity search to assign each collection a numeric score indicating how likely it is to be fraudulent.
How It Works
The Scoring System
SpamScore assigns each NFT collection a numerical value between 0 and 1:
- Below 0.80: Legitimate collection
- 0.80 or higher: Flagged as spam/scam
Collections at or above the 0.80 threshold may be filtered or flagged on the Rarible platform.
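The threshold rule can be expressed as a small helper. This is a minimal sketch: the names `SPAM_THRESHOLD` and `is_spam` are illustrative and are not taken from Rarible's codebase.

```python
SPAM_THRESHOLD = 0.80  # cutoff stated in this article


def is_spam(score: float) -> bool:
    """Return True when a score is at or above the spam/scam cutoff."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    return score >= SPAM_THRESHOLD
```

Comparing against a single cutoff, rather than two closed ranges, avoids leaving a gap for scores such as 0.795.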
Technical Architecture
The system is built as a microservice with the following key components:
1. AI Foundation
- Model: OpenAI's GPT-4o-mini
- Purpose: Analyzes collection characteristics and classifies the collection as legitimate or spam
2. Vector Embeddings
- Storage: pgvector (PostgreSQL extension)
- Model: OpenAI's text-embedding-3-small
- Purpose: Creates semantic representations of collections for similarity matching
3. Data Collection
For each collection, the system analyzes:
- Collection name
- Symbol
- Description
- Creator address
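The four fields listed above could be carried in a small record and flattened into one string before embedding. The sketch below is an assumption about how that might be done: the class name `CollectionMetadata`, the function `to_embedding_text`, and the labeled line format are all hypothetical, since the article does not describe the exact input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionMetadata:
    """The four fields the article says are analyzed per collection."""
    name: str
    symbol: str
    description: str
    creator: str  # creator address


def to_embedding_text(collection: CollectionMetadata) -> str:
    """Flatten the fields into one labeled string (assumed format)."""
    return "\n".join([
        f"name: {collection.name.strip()}",
        f"symbol: {collection.symbol.strip()}",
        f"description: {collection.description.strip()}",
        f"creator: {collection.creator.strip()}",
    ])
```

Labeling each field keeps the name, symbol, and description distinguishable inside the single string that gets embedded.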

The RAG Approach
SpamScore implements a Retrieval-Augmented Generation (RAG) architecture, which combines retrieval of similar labeled collections from the vector store with LLM-based classification.
Training Dataset
The system is built on 1,000 labeled reference collections:
- 500 "good" collections: Verified legitimate collections with SpamScore = 0
- 500 "bad" collections: Known spam/scam collections with SpamScore ≥ 0.80
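Storing such labeled references in pgvector might look like the sketch below. The table name `reference_collections`, its columns, and the psycopg-style `%(name)s` placeholders are assumptions for illustration. The `vector` column type and the `<=>` cosine-distance operator are pgvector features, and 1536 is the default output dimension of text-embedding-3-small.

```python
EMBEDDING_DIM = 1536  # default output size of text-embedding-3-small

# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS reference_collections (
    id         bigserial PRIMARY KEY,
    content    text NOT NULL,
    spam_score real NOT NULL,
    embedding  vector({EMBEDDING_DIM}) NOT NULL
);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
NEAREST_SQL = """
SELECT content, spam_score
FROM reference_collections
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(embedding):
    """Format a list of floats as pgvector's text form, e.g. '[0.1,0.2]'."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(embedding)}")
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"
```

The SQL strings are meant to be passed to a PostgreSQL driver of your choice. No connection code is shown because the article does not say which driver or ORM the service uses.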
Process Flow
1. Vectorization: All reference collections are converted into embeddings and stored in pgvector.
2. New Collection Evaluation: When a new collection is detected:
- Extract its name, symbol, description, and creator
- Generate embeddings for the collection
- Retrieve similar collections from the vector database
- Use GPT-4o-mini to analyze similarities and patterns
- Assign a spam score based on the analysis
3. Continuous Monitoring: Every new collection on the platform is automatically evaluated.
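The evaluation flow above can be sketched end to end with the embedding model and the LLM passed in as plain callables. This is an illustrative outline under stated assumptions, not Rarible's implementation: retrieval here is an in-memory cosine search standing in for pgvector, and `embed`, `ask_llm`, `retrieve_similar`, `build_prompt`, `evaluate`, and the prompt wording are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Reference = Tuple[str, Vector, float]  # (text, embedding, known spam score)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_similar(query: Vector, references: List[Reference], k: int = 5) -> List[Reference]:
    """Return the k references most similar to the query embedding."""
    ranked = sorted(references, key=lambda r: cosine_similarity(query, r[1]), reverse=True)
    return ranked[:k]


def build_prompt(text: str, neighbors: List[Reference]) -> str:
    """Assemble the retrieved examples and the new collection into one prompt."""
    lines = ["Known collections (with spam scores):"]
    lines += [f"- score={score:.2f}: {ref_text}" for ref_text, _, score in neighbors]
    lines += ["", "New collection:", text, "", "Reply with a spam score between 0 and 1."]
    return "\n".join(lines)


def evaluate(
    text: str,
    embed: Callable[[str], Vector],   # stand-in for the embedding model call
    ask_llm: Callable[[str], str],    # stand-in for the GPT-4o-mini call
    references: List[Reference],
    k: int = 5,
) -> float:
    """Embed, retrieve neighbors, ask the LLM, and clamp the reply to [0, 1]."""
    neighbors = retrieve_similar(embed(text), references, k)
    score = float(ask_llm(build_prompt(text, neighbors)))
    return min(1.0, max(0.0, score))
```

Injecting `embed` and `ask_llm` keeps the sketch runnable without network access; in a real service they would wrap the embedding and chat model calls, and `retrieve_similar` would be replaced by a pgvector query.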
Why RAG?
The RAG approach offers several advantages:
- Context-Aware: Compares new collections against known good and bad examples
- Semantic Understanding: Goes beyond keyword matching to understand meaning and intent
- Scalable: Can handle high volumes of new collections efficiently
- Adaptive: Can be updated with new examples as spam tactics evolve
Impact
By automating spam detection, SpamScore helps:
- Protect users from scam collections
- Maintain platform quality
- Reduce manual moderation overhead
- Provide a better user experience on Rarible
Technical Stack Summary
| Component | Technology |
|---|---|
| AI Model | GPT-4o-mini |
| Embeddings | text-embedding-3-small |
| Vector Storage | pgvector |
| Architecture | Microservice + RAG |
| Evaluation | Real-time, automated |
Note: This system represents a practical application of modern AI techniques (LLMs, embeddings, RAG) to solve real-world problems in the Web3 space.