1. Data Table Structure Assumption
Assume we have a table named users with a column named email that stores users' email addresses.
2. SQL Query
To count the frequency of each unique domain, we can use the following SQL query:
```sql
SELECT SUBSTRING_INDEX(email, '@', -1) AS domain,
       COUNT(*) AS count
FROM users
GROUP BY domain
ORDER BY count DESC;
```
3. Query Explanation
- `SUBSTRING_INDEX(email, '@', -1)`: Extracts the domain from the `email` field. `@` is the delimiter, and `-1` tells the function to return everything to the right of the last `@`, which isolates the domain portion of each email address.
- `COUNT(*)`: Counts the number of email addresses in each group, that is, for each domain.
- `GROUP BY domain`: Groups the rows by domain, so identical domains are aggregated and `COUNT(*)` tallies the email addresses per domain.
- `ORDER BY count DESC`: Sorts the output by count in descending order, so the most frequently occurring domains appear first.
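To make the behavior of `SUBSTRING_INDEX` concrete, here is a minimal sketch on literal values, followed by a hypothetical variant of the query. The variant is not part of the original answer: it assumes you want to skip values that contain no `@` (for those, `SUBSTRING_INDEX` returns the whole string) and to fold case with `LOWER`, which only makes a difference when the column uses a case-sensitive collation.

```sql
-- How SUBSTRING_INDEX behaves on literal values (MySQL).
SELECT SUBSTRING_INDEX('user1@example.com', '@', -1) AS domain;  -- example.com
SELECT SUBSTRING_INDEX('no-at-sign', '@', -1) AS domain;         -- no-at-sign (delimiter absent, whole string returned)

-- Hypothetical variant: normalize case and ignore values without an '@'.
SELECT LOWER(SUBSTRING_INDEX(email, '@', -1)) AS domain,
       COUNT(*) AS count
FROM users
WHERE email LIKE '%@%'
GROUP BY domain
ORDER BY count DESC;
```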
4. Example
Assume the users table contains the following data:
| email             |
|-------------------|
| user1@example.com |
| user2@example.com |
| user3@test.com    |
| user4@example.com |
| user5@test.com    |
After executing the above query, the result will be:
| domain      | count |
|-------------|-------|
| example.com | 3     |
| test.com    | 2     |
This result indicates that example.com appears 3 times and test.com appears 2 times.
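To reproduce this example on a scratch MySQL database, a setup along the following lines should work. The schema is an assumption: the answer only requires an `email` column, so the `id` column and the column types here are illustrative.

```sql
-- Hypothetical schema: only the email column is required by the query.
CREATE TABLE users (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);

-- The five sample rows from the example.
INSERT INTO users (email) VALUES
    ('user1@example.com'),
    ('user2@example.com'),
    ('user3@test.com'),
    ('user4@example.com'),
    ('user5@test.com');

-- Count addresses per domain, most frequent first.
SELECT SUBSTRING_INDEX(email, '@', -1) AS domain,
       COUNT(*) AS count
FROM users
GROUP BY domain
ORDER BY count DESC;
-- Returns example.com with 3, then test.com with 2.
```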
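With this approach, a single query extracts and counts the unique domains in the email column, which is useful for analyzing how users are distributed across email service providers. Because the domain is computed for every row, the query reads the whole table, so its running time grows with the table size.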
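If the query runs often on a large table, one option is to store the domain in an indexed generated column so it is not recomputed on every run. This is only a sketch under assumptions: it requires MySQL 5.7 or later, and the names `email_domain` and `idx_users_email_domain` are hypothetical.

```sql
-- Hypothetical column and index names; requires MySQL 5.7 or later.
ALTER TABLE users
    ADD COLUMN email_domain VARCHAR(255)
        GENERATED ALWAYS AS (SUBSTRING_INDEX(email, '@', -1)) STORED,
    ADD INDEX idx_users_email_domain (email_domain);

-- Grouping on the indexed column may let the optimizer read the index
-- instead of computing the domain for every row at query time.
SELECT email_domain AS domain,
       COUNT(*) AS count
FROM users
GROUP BY email_domain
ORDER BY count DESC;
```

The trade-off is extra storage and slightly slower writes in exchange for cheaper reads, so it is worth checking with `EXPLAIN` whether the index is actually used.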
June 29, 2024, 12:07