If you have been following along with our DIY AI project, you should now have a script that regularly scans the Desktop and records its files in a database. This will help us a lot with getting organized, and you may have already started organizing the files yourself, which is a good thing.
In this guide, we'll add two fields to our database table and modify the script to keep track of when we access files and how often. These additions will help with organizing files and will give our AI data it can use to learn about our habits and file usage, which will be helpful later.
Before you begin
It's a good idea to start at the beginning of this guide series because we cover a lot of ground, including installing the main Python libraries, setting up the development environment, and creating the scheduler script.
In the last guide, we created a database table to store file information and a Python script that regularly scans the Desktop (or any folder you choose) and stores any changes in that table. If you find that the scanning process is taking too long, it's a good idea to try moving some files to a better location to lighten the workload.
The new fields
accessed_at
In this guide, we will add two fields to our database table. The first is the accessed_at field, which records the last time you accessed a file and makes it easy to spot recently used files. This information will be useful for distinguishing active files from idle ones. Use the "Date & Time (DATETIME)" data type in Microsoft Access for the accessed_at field.
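As an aside, the operating system keeps its own last-access timestamp, which you can read in Python if you ever want to compare it against our scan-based value. Be aware that Windows often has last-access updates disabled or throttled for performance reasons, which is part of why we track access ourselves. A minimal sketch, using a placeholder path you would substitute with one of your own:
from datetime import datetime
from pathlib import Path

# Read the OS-level last-access timestamp for a file.
# The path below is just a placeholder.
atime = Path(r"C:\Users\yourname\Desktop\example.txt").stat().st_atime
print(datetime.fromtimestamp(atime).strftime("%Y-%m-%d %H:%M:%S"))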
usage_count
The second field we need to add is usage_count. This field will be a "Number (INTEGER)" in Microsoft Access. The usage_count field is a powerful metric that tracks how frequently a file has been accessed or modified, and it can help you (and the AI) understand which files matter to your workflow. Files with a high usage_count are likely critical to your daily tasks or long-term projects, while files with a low or zero usage_count might be candidates for archiving or deletion.
You can prioritize files with high usage_count for deeper analysis, previews, or metadata extraction, and you can use this information to focus your resources better when scanning, training, etc.
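If you prefer to add the fields from code rather than through the Access table designer, a one-off script along these lines should do it. This is a sketch that assumes the same get_connection helper from the earlier guides; the backfill at the end matters because usage_count + 1 on a NULL value stays NULL in Access, so existing rows need a starting value of 0:
from common_utilities import get_connection

# One-off helper: add the two new columns, then give existing rows
# a starting usage_count of 0 so later increments don't stay NULL.
conn = get_connection()
cur = conn.cursor()
cur.execute("ALTER TABLE Your_Database_Table ADD COLUMN accessed_at DATETIME")
cur.execute("ALTER TABLE Your_Database_Table ADD COLUMN usage_count INTEGER")
cur.execute("UPDATE Your_Database_Table SET usage_count = 0")
conn.commit()
conn.close()
Run it once against your database and then delete it.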
Updating your script
Once you have the fields added to your table, updating the system requires changing a few lines of code in your intermittent_scan.py file. You won’t even need to redo a complete scan.
Complete code
Here is the updated code. The changes are concentrated in the UPDATE and INSERT statements in the second half of the script.
import os
from datetime import datetime
from pathlib import Path

import pyodbc

from common_utilities import get_connection, compute_hash, file_metadata, DIRECTORY_TO_SCAN

# Directory to skip
SKIP_DIR = Path(r"C:\Users\yourname\Desktop\Directory_To_Skip")


def sanitize(value: str) -> str:
    # Double up any single quotes to avoid SQL errors
    return value.replace("'", "''")


if __name__ == "__main__":
    conn = get_connection()
    cur = conn.cursor()

    now = datetime.now()
    now_str = now.strftime("%Y-%m-%d %H:%M:%S")
    now_str_sanitized = sanitize(now_str)

    # Get known files
    cur.execute("SELECT path, size_bytes, last_modified, hash FROM Your_Database_Table")
    known_files = {}
    for row in cur.fetchall():
        # row.path, row.size_bytes, row.last_modified, row.hash are TEXT
        # Store them in a dict keyed by path
        known_files[row.path] = (row.size_bytes, row.last_modified, row.hash)

    current_files = []
    for root, dirs, files in os.walk(DIRECTORY_TO_SCAN):
        root_path = Path(root)

        # Remove SKIP_DIR if encountered
        dirs[:] = [d for d in dirs if (root_path / d) != SKIP_DIR]

        for fname in files:
            file_path = root_path / fname
            current_files.append(str(file_path))

            file_name, ext, size, last_mod = file_metadata(file_path)

            # Convert to strings
            file_name_str = sanitize(file_name)
            ext_str = sanitize(ext)
            size_str = str(size)
            last_mod_str = last_mod.strftime("%Y-%m-%d %H:%M:%S")
            last_mod_str_sanitized = sanitize(last_mod_str)

            file_hash = compute_hash(file_path)
            file_hash_str = sanitize(file_hash)
            path_str = sanitize(str(file_path))

            if str(file_path) in known_files:
                old_size, old_last_mod, old_hash = known_files[str(file_path)]

                # Compare as strings
                if (size_str != old_size) or (last_mod_str != old_last_mod):
                    # File changed: update the record, bump usage_count,
                    # and stamp accessed_at with the current time
                    sql = f"""
                        UPDATE Your_Database_Table
                        SET size_bytes='{size_str}',
                            last_modified='{last_mod_str_sanitized}',
                            hash='{file_hash_str}',
                            accessed_at='{now_str_sanitized}',
                            usage_count = usage_count + 1,
                            scanned_at='{now_str_sanitized}'
                        WHERE path='{path_str}'
                    """
                    cur.execute(sql)
                else:
                    # Unchanged, just update scanned_at
                    sql = f"""
                        UPDATE Your_Database_Table
                        SET scanned_at='{now_str_sanitized}'
                        WHERE path='{path_str}'
                    """
                    cur.execute(sql)
            else:
                # New file: insert with usage_count starting at 1
                sql = f"""
                    INSERT INTO Your_Database_Table (
                        path, file_name, extension, size_bytes, last_modified,
                        hash, created_at, scanned_at, accessed_at, usage_count
                    ) VALUES (
                        '{path_str}', '{file_name_str}', '{ext_str}', '{size_str}',
                        '{last_mod_str_sanitized}', '{file_hash_str}',
                        '{now_str_sanitized}', '{now_str_sanitized}', '{now_str_sanitized}', 1
                    )
                """
                try:
                    cur.execute(sql)
                except pyodbc.IntegrityError:
                    # If the file path already exists, skip it
                    pass

    # Optional: handle deleted files if needed
    # missing_files = set(known_files.keys()) - set(current_files)
    # For now, do nothing with deleted files.

    conn.commit()
    conn.close()
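A design note before we move on: the script builds its SQL with f-strings plus the sanitize helper, which works, but pyodbc also supports ? parameter placeholders, which sidestep quoting problems entirely. If you want to experiment, the changed-file UPDATE could look like this instead. This is a sketch of the alternative, not a drop-in rewrite of the whole script, and with placeholders you pass the raw values rather than the sanitized ones:
cur.execute(
    """
    UPDATE Your_Database_Table
    SET size_bytes=?, last_modified=?, hash=?,
        accessed_at=?, usage_count = usage_count + 1, scanned_at=?
    WHERE path=?
    """,
    (size_str, last_mod_str, file_hash, now_str, now_str, str(file_path)),
)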
Summary
After making these changes to the code, let the script run several more times so it can start collecting the information we'll need next.
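Once a few scans have run, you can spot-check that the new fields are filling in. Access SQL uses TOP rather than LIMIT, so a quick check might look like this (again assuming the shared get_connection helper):
from common_utilities import get_connection

# Spot-check: the ten most frequently used files so far.
conn = get_connection()
cur = conn.cursor()
cur.execute(
    "SELECT TOP 10 path, usage_count, accessed_at "
    "FROM Your_Database_Table ORDER BY usage_count DESC"
)
for row in cur.fetchall():
    print(row.path, row.usage_count, row.accessed_at)
conn.close()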