Data Profiler
Upload a file and get an instant health report — column types, missing values, uniqueness, outliers, and more.
What is a Data Profiler?
A data profiler gives you an instant health report on your dataset. It reports column types, missing-value percentages, and uniqueness statistics, detects duplicate rows, and flags outliers. Profiling is the first step in any data scrubbing workflow because it tells you exactly what needs fixing.
Why Profile Your Data First
You cannot fix what you cannot see. A profiler reveals hidden issues that silently corrupt your analysis: columns with 30% missing values, text mixed into number columns, date formats that do not parse, and duplicate rows you never knew existed. Running the Data Profiler before any other tool saves time and prevents mistakes.
How to Use the Data Profiler
- Upload your file — drag and drop or click to browse.
- Review the instant quality score — this gives you a quick snapshot of overall data health.
- Check per-column stats — see types, completeness, uniqueness, and distributions.
- Identify issues — look for mixed types, high missingness, and outliers.
- Jump to the right tool — use the Profiler findings to pick the best cleaning tool.
What the Profiler Detects
- Column types — number, text, date, email, boolean, or mixed.
- Completeness — percentage of missing values per column.
- Uniqueness — how many distinct values exist and how often they repeat.
- Duplicate rows — exact duplicates detected across all columns.
- Numeric stats — min, max, mean, and median for number columns.
- Outliers — values more than 1.5 times the IQR below Q1 or above Q3 are flagged automatically.
- Top value distributions — the most common values in each column.
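To make the checks above concrete, here is a minimal Python sketch of per-column completeness, uniqueness, numeric stats, and top values, plus an exact duplicate-row count. It is an illustration only, not DataScrub's actual code: the `MISSING` marker set and the function names `profile_column` and `count_duplicate_rows` are hypothetical, and values are assumed to arrive as strings, as they would from a CSV file.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical markers treated as "missing"; the real tool's rules may differ.
MISSING = {"", "na", "n/a", "null", "none", "nan"}


def is_missing(value):
    """Return True for None or for a string that matches a missing marker."""
    return value is None or str(value).strip().lower() in MISSING


def profile_column(values, top_n=3):
    """Summarize one column: completeness, uniqueness, top values, numeric stats."""
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    missing = len(values) - len(present)
    stats = {
        "missing_pct": round(100 * missing / len(values), 1) if values else 0.0,
        "distinct": len(counts),
        "top_values": counts.most_common(top_n),
    }
    # Collect the values that parse as numbers.
    numbers = []
    for v in present:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            pass
    # Report numeric stats only when every present value is numeric.
    if numbers and len(numbers) == len(present):
        stats.update(
            min=min(numbers),
            max=max(numbers),
            mean=mean(numbers),
            median=median(numbers),
        )
    return stats


def count_duplicate_rows(rows):
    """Count rows that exactly repeat an earlier row across all columns."""
    seen = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in seen.values() if n > 1)
```

For example, `profile_column(["1", "2", "1", ""])` reports 25.0 percent missing and 2 distinct values, and `count_duplicate_rows` counts only the extra copies of a repeated row, not the first occurrence.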
Tips for Better Profiling
- Always profile before cleaning — you need a baseline before making changes.
- Watch for "mixed" type columns — these almost always need attention before analysis.
- Check the quality score trend — if it drops after a transformation, something went wrong.
Frequently Asked Questions
What does the quality score mean?
The quality score is a number from 0 to 100 that summarizes overall data health. It factors in completeness (missing values), consistency (mixed types), and uniqueness (duplicates). A score above 80 is good, 50 to 80 is fair, and below 50 needs attention.
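As a rough illustration, the bands above can be expressed in a few lines of Python. The `score_band` thresholds come straight from the description; the `quality_score` function is purely hypothetical, since the exact formula and weights are not documented here. It simply assumes each factor is a fraction between 0 and 1 and weights them equally.

```python
def quality_score(completeness, consistency, uniqueness):
    """Hypothetical score: equal-weight average of three fractions in [0, 1].

    The equal weighting is an assumption, not the tool's documented formula.
    """
    parts = (completeness, consistency, uniqueness)
    if any(not 0 <= p <= 1 for p in parts):
        raise ValueError("each factor must be between 0 and 1")
    return round(100 * sum(parts) / 3)


def score_band(score):
    """Map a 0-100 score to the bands described above."""
    if not 0 <= score <= 100:
        raise ValueError("quality score must be between 0 and 100")
    if score > 80:
        return "good"
    if score >= 50:
        return "fair"
    return "needs attention"
```

Under these assumptions, a dataset that is 90% complete, fully consistent, and 80% unique would score 90 and land in the "good" band.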
How are outliers detected?
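Outliers are detected using the interquartile range (IQR) method, where IQR = Q3 - Q1. Values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged. This is a standard statistical rule that works well for roughly symmetric distributions. On heavily skewed columns it can flag many legitimate values, so review flagged values before removing them.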
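The IQR rule can be sketched in Python with the standard library. This is a minimal illustration, not the tool's own implementation: the function name `iqr_outliers` is hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since different tools compute Q1 and Q3 slightly differently and may therefore flag slightly different values.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside Q1 - k*IQR or Q3 + k*IQR."""
    if len(values) < 4:
        # Too few points for meaningful quartiles; flag nothing.
        return []
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

For the column `[10, 12, 11, 13, 12, 11, 95]`, this sketch gives Q1 = 11 and Q3 = 12.5, so the fences are 8.75 and 14.75 and only 95 is flagged.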
Can I profile large files?
Yes. DataScrub processes everything in your browser, so the maximum file size depends on your device's available memory. Files with up to a few hundred thousand rows typically profile without issues on modern hardware.
What does 'mixed' type mean?
A column is labeled mixed when it contains values of different types — for example, numbers and text in the same column. This usually indicates a data entry problem or inconsistent formatting that should be resolved before analysis.
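One way to picture the "mixed" label is to classify each value on its own and then check whether the column's values agree. The Python sketch below assumes string inputs and uses deliberately simple, hypothetical rules: a single `YYYY-MM-DD` date format, a loose email pattern, and a small set of boolean words. The names `value_type` and `column_type` and the `"empty"` fallback are illustrative only, and the real tool's detection rules are likely broader.

```python
import re
from datetime import datetime

# Loose, hypothetical patterns chosen for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
BOOLEANS = {"true", "false", "yes", "no"}


def value_type(value):
    """Classify one string as boolean, number, date, email, or text."""
    text = value.strip()
    if text.lower() in BOOLEANS:
        return "boolean"
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        # Assumes ISO-style dates only; other formats fall through to text.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        pass
    if EMAIL_RE.match(text):
        return "email"
    return "text"


def column_type(values):
    """Return the single type shared by all non-blank values, else 'mixed'."""
    types = {value_type(v) for v in values if v.strip()}
    if not types:
        return "empty"  # hypothetical label for a column with no values
    return types.pop() if len(types) == 1 else "mixed"
```

With these rules, `["1", "2.5", "3"]` is a number column, while `["1", "two", "3"]` is mixed because one value only parses as text.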