DIY AI Part 9: Train your AI to analyze file activity and boost productivity

Matic Zorman/GettyImages

If you have been following along with our DIY AI project, you’ve come a long way. We’re now ready to add some real artificial intelligence training and put it to use. From there, you can modify the project in whatever ways suit your needs.

Where we’re at

By now, you should have a script running that calls another script to scan your Desktop, note any changes, and store important information about the files.

If you have let the scanning script run several times, there should be enough information to train the AI. In this case, we are going to train it on file activity so it learns when we are actively using our computer. It’s a simple idea that can help with scheduling and more.

What do we do next?

In this last part, we will:

1. Download the table we have been updating

2. Transform the data so the AI can use it

3. Train our AI

4. Have the AI make an informed prediction based on the training

Download the table

First, retrieve the data from the file-tracking table to use as a dataset for training your model.

1. Open your database in Microsoft Access.

2. Go to the Create tab and click on Query Design.

3. Switch to SQL View in the Query Design window.

4. Paste the following SQL query into the SQL editor, replacing Your_Table_Name with the name of your table:

SELECT path, size_bytes, last_modified, accessed_at, usage_count FROM Your_Table_Name WHERE accessed_at IS NOT NULL;

5. Run the query (click the red exclamation mark in the ribbon).

6. Save the results: go to External Data > Export > Text File and save the query result as a CSV file (e.g., file_activity.csv).
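Before moving on, it can help to confirm that the export contains the columns the later scripts rely on. The following is a minimal sketch using only Python's standard library. The helper name missing_columns is hypothetical, and the sketch assumes the export kept the field names in the first row.

```python
import csv
import io

# Columns selected by the export query
EXPECTED_COLUMNS = ["path", "size_bytes", "last_modified", "accessed_at", "usage_count"]

def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])
    found = {name.strip() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in found]

# Small in-memory stand-in for file_activity.csv (the data row is made up)
sample = io.StringIO(
    "path,size_bytes,last_modified,accessed_at,usage_count\n"
    "C:\\Users\\me\\Desktop\\notes.txt,120,2025-01-06 09:00:00,2025-01-06 09:15:00,3\n"
)
print(missing_columns(sample))  # an empty list means every expected column is present
```

To check the real export, open it with open("file_activity.csv", newline="") and pass the file object to missing_columns.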

Transform the data

In this step, we will prepare the exported data for use in a machine-learning model. The following script extracts the hour and the day of the week from the accessed_at field, then labels each record as active (1) if its usage_count is greater than zero and inactive (0) otherwise.

data_transformation.py

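import pandas as pd

# Load the CSV file exported from Access (update with your file path)

data = pd.read_csv(r"Path\to\file_activity.csv")

# Convert accessed_at to datetime

data['accessed_at'] = pd.to_datetime(data['accessed_at'])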


# Extract hour and day features

data['hour'] = data['accessed_at'].dt.hour

data['day'] = data['accessed_at'].dt.weekday


# Define activity labels: 1 = Active, 0 = Inactive

data['active'] = (data['usage_count'] > 0).astype(int)


# Save the transformed data for model training

data.to_csv("transformed_file_activity.csv", index=False)

print("Data transformation complete. Saved to 'transformed_file_activity.csv'.")
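The script above labels individual file records. If you would rather label whole hours, so that hours with no recorded file access count as inactive, one possible approach is to build every day-of-week and hour combination and mark the ones that appear in your data. This is a sketch in plain Python rather than part of the original scripts; the function name label_hours and the sample timestamps are made up for illustration. Note that weekday() counts Monday as 0 and Sunday as 6, the same convention pandas uses for dt.weekday.

```python
from datetime import datetime

def label_hours(access_times):
    """Return one row per (day, hour) slot: 1 if any file access fell in it, else 0."""
    seen = {(t.weekday(), t.hour) for t in access_times}
    rows = []
    for day in range(7):          # 0 = Monday ... 6 = Sunday
        for hour in range(24):
            rows.append({"day": day, "hour": hour, "active": int((day, hour) in seen)})
    return rows

# Made-up access times standing in for the accessed_at column
sample_times = [
    datetime(2025, 1, 6, 9, 15),   # a Monday morning
    datetime(2025, 1, 6, 9, 40),   # same slot as the entry above
    datetime(2025, 1, 8, 14, 5),   # a Wednesday afternoon
]

rows = label_hours(sample_times)
active_slots = [(r["day"], r["hour"]) for r in rows if r["active"]]
print(len(rows), active_slots)
```

The result always has 168 rows (7 days times 24 hours), so the inactive slots are represented explicitly instead of simply being absent from the data.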



Train the model


Next, run the model training script below on the data you just transformed to train the AI on your own information. Before running it, install scikit-learn with the following command:


pip install scikit-learn


model_train.py


from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split

import pandas as pd

import joblib

# Load the transformed data

data = pd.read_csv(r"Path\to\transformed_file_activity.csv")

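# Prepare features (X) and labels (y)

# Note: the 'active' label is derived from usage_count, so with usage_count among the features the test accuracy will look close to perfect

X = data[['hour', 'day', 'usage_count']] # Features: hour, day, usage_count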

y = data['active'] # Label: active (1 = Active, 0 = Inactive)

# Split data into training and testing sets (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Random Forest model

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

# Evaluate the model (optional but recommended)

accuracy = model.score(X_test, y_test)

print(f"Model accuracy on test data: {accuracy:.2f}")

# Save the trained model

joblib.dump(model, "work_activity_predictor.joblib")

print("Model training complete. Model saved as 'work_activity_predictor.joblib'.")
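An accuracy figure is easier to interpret next to a baseline. If most rows in your data are labeled active, a model that always answers "active" would already score well. The sketch below computes that majority-class baseline in plain Python; the function name majority_baseline and the sample labels are illustrative and not part of the training script.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)

# Made-up labels: 8 active, 2 inactive
sample_labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(f"Baseline accuracy: {majority_baseline(sample_labels):.2f}")
```

In model_train.py, you could call majority_baseline(y_test) and compare the result with the printed model accuracy. A model that does not beat the baseline has not learned much from the hour and day features.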

Finally, we can have the AI make its predictions. The following script pulls the records from the database, builds the same hour and day features, and labels each record as active or inactive:

predictions.py

import pandas as pd

import joblib

import pyodbc

from datetime import datetime

# Database connection string

DB_PATH = r"Path\to\database\MyDatabase.accdb" # Update with your file path

CONN_STR = f"DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={DB_PATH};"

def fetch_data_from_db():
    """
    Fetches real data from FileTrack table for prediction.
    """
    conn = pyodbc.connect(CONN_STR)

    query = """
    SELECT size_bytes, last_modified, accessed_at, usage_count
    FROM Your_Table_Name
    WHERE accessed_at IS NOT NULL
    """

    data = pd.read_sql(query, conn)

    conn.close()

    # Convert datetime columns to appropriate formats
    data['accessed_at'] = pd.to_datetime(data['accessed_at'])

    data['hour'] = data['accessed_at'].dt.hour

    data['day'] = data['accessed_at'].dt.weekday

    return data[['hour', 'day', 'usage_count']] # Return features for prediction

# Load the trained model

model = joblib.load(r"Path\to\work_activity_predictor.joblib")

# Fetch data from the database

real_data = fetch_data_from_db()

# Make predictions

real_data['active'] = model.predict(real_data)

# Display the results

print("Predictions on real data:")

print(real_data)

# Save the results to a CSV for review

real_data.to_csv("real_data_predictions.csv", index=False)

print("Predictions saved to 'real_data_predictions.csv'.")
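Once the predictions are saved, a per-hour summary can make them easier to use for scheduling. The following sketch, written with only the standard library, reads rows shaped like real_data_predictions.csv and reports the share of records predicted active for each hour. The function name active_share_by_hour and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

def active_share_by_hour(csv_file):
    """Map each hour to the fraction of its rows predicted active."""
    totals = defaultdict(int)
    actives = defaultdict(int)
    for row in csv.DictReader(csv_file):
        hour = int(row["hour"])
        totals[hour] += 1
        actives[hour] += int(row["active"])
    return {hour: actives[hour] / totals[hour] for hour in sorted(totals)}

# Made-up rows using the same columns the predictions script writes
sample = io.StringIO(
    "hour,day,usage_count,active\n"
    "9,0,3,1\n"
    "9,1,0,0\n"
    "14,2,5,1\n"
)
print(active_share_by_hour(sample))
```

To summarize the real output, pass open("real_data_predictions.csv", newline="") instead of the in-memory sample.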

What’s next

You will want to repeat the entire process every few weeks. Doing so allows your AI to learn about you over time, making it more accurate at predicting when you will be creating and accessing files, which, for many of us, means we are working. While it won’t be as accurate as a time clock, it can provide useful insight into what you do each day and which projects you tend to spend the most time on. It also only scratches the surface of what you can do with the information you are already collecting.
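If you want a reminder of when a retrain is due, one simple option is to check the age of the saved model file. This is only a sketch: the function name needs_retraining is hypothetical, and the 21-day default is an assumed reading of "every few weeks" that you can adjust.

```python
import os
import time

def needs_retraining(model_path, max_age_days=21):
    """Return True if the model file is missing or older than max_age_days."""
    if not os.path.exists(model_path):
        return True
    age_seconds = time.time() - os.path.getmtime(model_path)
    return age_seconds > max_age_days * 24 * 60 * 60

# Prints True when the model file is absent or stale
print(needs_retraining("work_activity_predictor.joblib"))
```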

Hopefully, you will get some ideas on ways to expand the project. We will likely add to it here at GeekSided as well, but since this is a modular project, we won’t need to start at the beginning each time.

We hope you learned something and had some fun.

Follow GeekSided for more great projects in programming.