SpamScore: AI-Powered NFT Collection Verification at Rarible

Overview

SpamScore is a microservice within Rarible's infrastructure that automatically identifies and flags spam and scam NFT collections. Using a large language model together with vector embeddings, it produces a numerical score indicating whether a collection is legitimate or potentially fraudulent.

How It Works

The Scoring System

SpamScore assigns each NFT collection a numerical value between 0 and 1:

Below 0.80: Legitimate collection
0.80 or higher: Flagged as spam/scam

Any collection scoring 0.80 or higher is considered a scam and may be filtered or flagged on the Rarible platform.
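
The threshold rule above can be expressed in a few lines. This is a minimal sketch, not Rarible's actual code: the names `SPAM_THRESHOLD` and `classify_score` are hypothetical, and only the 0.80 cutoff and the 0 to 1 range come from the description above.

```python
SPAM_THRESHOLD = 0.80  # cutoff stated in the scoring rules above


def classify_score(score: float) -> str:
    """Map a SpamScore in [0, 1] to a label (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # Scores at or above the threshold are flagged; everything else passes.
    return "spam" if score >= SPAM_THRESHOLD else "legitimate"
```

Comparing with `>=` against a single constant avoids leaving a gap between the two bands, so a score such as 0.795 is still treated as legitimate.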

Technical Architecture

The system is built as a microservice with the following key components:

1. AI Foundation

Model: OpenAI's GPT-4o-mini
Purpose: Analyzes collection characteristics and classifies each collection as legitimate or spam/scam

2. Vector Embeddings

Storage: pgvector (PostgreSQL extension)
Model: OpenAI's text-embedding-3-small
Purpose: Creates semantic representations of collections for similarity matching
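
To illustrate what "similarity matching" over embeddings means, here is a small pure-Python sketch of cosine-similarity ranking. It is an illustration only: in the real system the vectors would come from text-embedding-3-small and the nearest-neighbor search would run inside pgvector, and the function names `cosine_similarity` and `top_k` are assumptions made for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          references: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k reference entries most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in references.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With toy two-dimensional vectors, `top_k([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)` ranks `a` first and `c` second. A vector database performs the same ranking, but with an index so it does not have to scan every row.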

3. Data Collection

For each collection, the system analyzes:

Collection name
Symbol
Description
Creator address
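
One way to represent these four fields and turn them into a single string for embedding is sketched below. The `CollectionRecord` type, the `to_embedding_text` helper, and the `key: value` line layout are all assumptions for illustration; the source does not describe how the fields are serialized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRecord:
    """The four fields listed above (hypothetical container type)."""
    name: str
    symbol: str
    description: str
    creator: str


def to_embedding_text(collection: CollectionRecord) -> str:
    """Join the non-empty fields into one labeled text block."""
    fields = [
        ("name", collection.name),
        ("symbol", collection.symbol),
        ("description", collection.description),
        ("creator", collection.creator),
    ]
    # Skip blank fields so empty descriptions do not add noise to the text.
    return "\n".join(f"{key}: {value.strip()}"
                     for key, value in fields if value and value.strip())
```

Labeling each field keeps the name, symbol, and description distinguishable after they are merged into one input string.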

The RAG Approach

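SpamScore implements a Retrieval-Augmented Generation (RAG) architecture, which combines retrieval of similar reference collections from the vector database with LLM-based analysis of the new collection.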

Training Dataset

The system was seeded with 1,000 labeled reference collections:

500 "good" collections: Verified legitimate collections with SpamScore = 0
500 "bad" collections: Known spam/scam collections with SpamScore ≥ 0.80

Process Flow

1. Vectorization: All reference collections are converted into embeddings and stored in pgvector.
2. New Collection Evaluation: When a new collection is detected:
   - Extract its name, symbol, description, and creator
   - Generate embeddings for the collection
   - Retrieve similar collections from the vector database
   - Use GPT-4o-mini to analyze similarities and patterns
   - Assign a spam score based on the analysis
3. Continuous Monitoring: Every new collection on the platform is automatically evaluated.
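
The evaluation steps above can be sketched as a small pipeline in which the embedding, retrieval, and judging steps are passed in as functions. Everything here is hypothetical scaffolding: `evaluate_collection` and its parameters are invented for illustration, and `weighted_label_judge` is a simple numeric stand-in for the GPT-4o-mini analysis step, not a description of what the model does.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# (reference collection name, its known SpamScore, similarity to the query)
Neighbor = Tuple[str, float, float]


def weighted_label_judge(text: str, neighbors: List[Neighbor]) -> float:
    """Stand-in for the LLM step: similarity-weighted mean of neighbor scores."""
    total = sum(similarity for _, _, similarity in neighbors)
    if total <= 0:
        return 0.0  # assumption: no similar references means no spam evidence
    return sum(score * similarity for _, score, similarity in neighbors) / total


def evaluate_collection(
    text: str,
    embed: Callable[[str], Vector],
    retrieve: Callable[[Vector, int], List[Neighbor]],
    judge: Callable[[str, List[Neighbor]], float],
    k: int = 5,
) -> float:
    """Embed the collection text, fetch k neighbors, and score the result."""
    query_vector = embed(text)             # generate the embedding
    neighbors = retrieve(query_vector, k)  # look up similar references
    score = judge(text, neighbors)         # analyze similarities and patterns
    return min(1.0, max(0.0, score))       # keep the score within [0, 1]
```

Injecting the three steps as callables keeps the flow testable without network access: in production `embed` and `judge` would wrap model calls and `retrieve` would query pgvector, while in tests they can be replaced with fixed stubs.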

Why RAG?

The RAG approach offers several advantages:

Context-Aware: Compares new collections against known good and bad examples
Semantic Understanding: Goes beyond keyword matching to understand meaning and intent
Scalable: Can handle high volumes of new collections efficiently
Adaptive: Can be updated with new examples as spam tactics evolve

Impact

By automating spam detection, SpamScore helps:

Protect users from scam collections
Maintain platform quality
Reduce manual moderation overhead
Provide a better user experience on Rarible

Technical Stack Summary

Component	Technology
AI Model	GPT-4o-mini
Embeddings	text-embedding-3-small
Vector Storage	pgvector
Architecture	Microservice + RAG
Evaluation	Real-time, automated

Note: This system represents a practical application of modern AI techniques (LLMs, embeddings, RAG) to solve real-world problems in the Web3 space.
