# Internet Domain Database This project maintains two separate tables for internet domain data: - `domain_root`: Top-Level Domains (TLDs) from IANA - `domain_suffix`: Public Suffix List from Mozilla ## Database Connection - **Host**: l2 - **Port**: 3306 - **User**: root - **Database**: sp_spider ## Table Structure ### domain_root Contains IANA TLD data with unique root domains and soft delete functionality. ```sql CREATE TABLE domain_root ( id INT AUTO_INCREMENT PRIMARY KEY, root VARCHAR(63) NOT NULL UNIQUE, removed BOOLEAN DEFAULT FALSE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, INDEX idx_root (root), INDEX idx_removed (removed) ); ``` ### domain_suffix Contains Public Suffix List data with unique suffixes and soft delete functionality. ```sql CREATE TABLE domain_suffix ( id INT AUTO_INCREMENT PRIMARY KEY, suffix VARCHAR(255) NOT NULL UNIQUE, removed BOOLEAN DEFAULT FALSE, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, INDEX idx_suffix (suffix), INDEX idx_removed (removed) ); ``` ## Setup 1. Install dependencies: ```bash pip install -r requirements.txt ``` 2. Create the tables: ```bash mariadb -h l2 -u root -p0000 --ssl=FALSE sp_spider < create_separate_tables.sql ``` 3. Add the removed column (for existing installations): ```bash mariadb -h l2 -u root -p0000 --ssl=FALSE sp_spider < add_removed_column.sql ``` 4. Populate the tables: ```bash python populate_separate_tables.py 0000 ``` 5. Update tables with soft delete functionality: ```bash python update.py 0000 ``` ## Data Sources - **TLD Data**: https://data.iana.org/TLD/tlds-alpha-by-domain.txt - Contains official Top-Level Domains - Currently: 1,436 TLDs - **Public Suffix List**: https://publicsuffix.org/list/public_suffix_list.dat - Contains public domain suffixes including TLDs, ccTLDs, and private domain suffixes - Currently: 10,067 suffixes ## Usage Query active TLDs: ```sql SELECT * FROM domain_root WHERE removed = FALSE AND root = 'com'; ``` Query removed TLDs: ```sql SELECT * FROM domain_root WHERE removed = TRUE; ``` Query active suffixes: ```sql SELECT * FROM domain_suffix WHERE removed = FALSE AND suffix LIKE '%.com'; ``` Query removed suffixes: ```sql SELECT * FROM domain_suffix WHERE removed = TRUE; ``` Get statistics: ```sql SELECT 'domain_root' as table_name, COUNT(*) as total, SUM(CASE WHEN removed = FALSE THEN 1 ELSE 0 END) as active, SUM(CASE WHEN removed = TRUE THEN 1 ELSE 0 END) as removed FROM domain_root UNION ALL SELECT 'domain_suffix' as table_name, COUNT(*) as total, SUM(CASE WHEN removed = FALSE THEN 1 ELSE 0 END) as active, SUM(CASE WHEN removed = TRUE THEN 1 ELSE 0 END) as removed FROM domain_suffix; ``` ## Project Files - `create_separate_tables.sql` - Table creation script - `add_removed_column.sql` - Script to add removed column for soft delete functionality - `populate_separate_tables.py` - Initial data population script - `update.py` - Update script with soft delete functionality - `requirements.txt` - Python dependencies - `README.md` - This documentation ## Soft Delete Functionality The `update.py` script provides soft delete functionality that: 1. **Marks entries as removed**: Sets `removed = TRUE` for entries no longer found in source data 2. **Adds new entries**: Inserts new entries from source with `removed = FALSE` 3. **Restores entries**: Sets `removed = FALSE` for previously removed entries that reappear in source 4. **Provides statistics**: Shows counts of active, removed, new, and restored entries ### Update Process The update script: - Fetches latest data from IANA TLD and Public Suffix List - Compares with current database entries - Performs batch updates for efficiency - Handles duplicate entries gracefully with `INSERT IGNORE` - Updates `updated_at` timestamp for all changes Run the update script periodically to keep the database synchronized with source data.