I was wondering if there is a way to find out the number of rows or number of files for tables registered in Unity Catalog. Is there a system table or a built-in function that provides this information more reliably?
You can try to Check for Empty Tables or Tables with No Files in Unity Catalog .
Unity Catalog doesn't have a built-in system table that directly shows the number of rows or files in each table. But you can write a notebook or script to get that information by combining metadata from system.information_schema.tables
with details pulled using the DESCRIBE DETAIL
command.
Below is the PySpark script that filters and prints only the tables that have either Zero records or zero fileSize:
I have created three tables, one with records and other two keeping empty to test the python code.
from pyspark.sql.functions import col
from delta.tables import DeltaTable
catalog = "my_catalog"
schema = "my_schema"
tables_df = spark.sql(f"""
SELECT table_name
FROM {catalog}.information_schema.tables
WHERE table_schema = '{schema}'
""")
tables = [row["table_name"] for row in tables_df.collect()]
for table in tables:
full_name = f"{catalog}.{schema}.{table}"
detail = spark.sql(f"DESCRIBE DETAIL {full_name}").collect()[0]
num_files = detail['numFiles']
size_bytes = detail['sizeInBytes']
if (num_files == 0 or num_files == 0):
print(f"{full_name}: numFiles={num_files}, sizeInBytes={size_bytes}")
So this code does the following:
-It lists all tables in a specified Unity Catalog catalog (my_catalog) and schema (my_schema) by querying the information_schema.tables view.
-For each table, it runs DESCRIBE DETAIL to retrieve metadata, specifically looking at the number of files (numFiles) and the total size on disk (sizeInBytes).
-It prints the table name and size details only if the table has zero files (numFiles == 0), which typically indicates that the table has no data stored.
Output: