Internet Domain Database
This project maintains two separate tables for internet domain data:
domain_root: Top-Level Domains (TLDs) from IANAdomain_suffix: Public Suffix List from Mozilla
Database Connection
- Host: l2
- Port: 3306
- User: root
- Database: sp_spider
Table Structure
domain_root
Contains IANA TLD data with unique root domains and soft delete functionality.
CREATE TABLE domain_root (
id INT AUTO_INCREMENT PRIMARY KEY,
root VARCHAR(63) NOT NULL UNIQUE,
removed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_root (root),
INDEX idx_removed (removed)
);
domain_suffix
Contains Public Suffix List data with unique suffixes and soft delete functionality.
CREATE TABLE domain_suffix (
id INT AUTO_INCREMENT PRIMARY KEY,
suffix VARCHAR(255) NOT NULL UNIQUE,
removed BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
INDEX idx_suffix (suffix),
INDEX idx_removed (removed)
);
Setup
- Install dependencies:
pip install -r requirements.txt
- Create the tables:
mariadb -h l2 -u root -p0000 --ssl=FALSE sp_spider < create_separate_tables.sql
- Add the removed column (for existing installations):
mariadb -h l2 -u root -p0000 --ssl=FALSE sp_spider < add_removed_column.sql
- Populate the tables:
python populate_separate_tables.py 0000
- Update tables with soft delete functionality:
python update.py 0000
Data Sources
-
TLD Data: https://data.iana.org/TLD/tlds-alpha-by-domain.txt
- Contains official Top-Level Domains
- Currently: 1,436 TLDs
-
Public Suffix List: https://publicsuffix.org/list/public_suffix_list.dat
- Contains public domain suffixes including TLDs, ccTLDs, and private domain suffixes
- Currently: 10,067 suffixes
Usage
Query active TLDs:
SELECT * FROM domain_root WHERE removed = FALSE AND root = 'com';
Query removed TLDs:
SELECT * FROM domain_root WHERE removed = TRUE;
Query active suffixes:
SELECT * FROM domain_suffix WHERE removed = FALSE AND suffix LIKE '%.com';
Query removed suffixes:
SELECT * FROM domain_suffix WHERE removed = TRUE;
Get statistics:
SELECT
'domain_root' as table_name,
COUNT(*) as total,
SUM(CASE WHEN removed = FALSE THEN 1 ELSE 0 END) as active,
SUM(CASE WHEN removed = TRUE THEN 1 ELSE 0 END) as removed
FROM domain_root
UNION ALL
SELECT
'domain_suffix' as table_name,
COUNT(*) as total,
SUM(CASE WHEN removed = FALSE THEN 1 ELSE 0 END) as active,
SUM(CASE WHEN removed = TRUE THEN 1 ELSE 0 END) as removed
FROM domain_suffix;
Project Files
create_separate_tables.sql- Table creation scriptadd_removed_column.sql- Script to add removed column for soft delete functionalitypopulate_separate_tables.py- Initial data population scriptupdate.py- Update script with soft delete functionalityrequirements.txt- Python dependenciesREADME.md- This documentation
Soft Delete Functionality
The update.py script provides soft delete functionality that:
- Marks entries as removed: Sets
removed = TRUEfor entries no longer found in source data - Adds new entries: Inserts new entries from source with
removed = FALSE - Restores entries: Sets
removed = FALSEfor previously removed entries that reappear in source - Provides statistics: Shows counts of active, removed, new, and restored entries
Update Process
The update script:
- Fetches latest data from IANA TLD and Public Suffix List
- Compares with current database entries
- Performs batch updates for efficiency
- Handles duplicate entries gracefully with
INSERT IGNORE - Updates
updated_attimestamp for all changes
Run the update script periodically to keep the database synchronized with source data.
Description