If you have been following along with our DIY AI project, you’ve come pretty far, and we’re ready to add some real artificial intelligence training and put it to use. From there, you will be able to modify it in many ways to suit your needs.
Where we’re at
Hopefully, you have been following along with the guide, and you have a script running that calls another script to scan your Desktop, note any changes, and store important information about the files.
If you let the scanning script run several times, there should be enough information to train the AI. In this case, we are going to train it on your file activity so it learns when you are actively using your computer. It’s a simple idea that can help with scheduling and more.
What do we do next?
In this last part, we will:
1. Download the table we have been updating.
2. Transform the data so the AI can use it.
3. Train our AI.
4. Have the AI make an informed prediction based on the training.
Download the table
Retrieve data from the file-tracking table to use as a dataset for training your model:
1. Open your database in Microsoft Access.
2. Go to the Create tab and click on Query Design.
3. Switch to SQL View in the Query Design window.
4. Paste the following SQL query into the SQL editor:
SELECT path, size_bytes, last_modified, accessed_at, usage_count FROM Your_Table_Name WHERE accessed_at IS NOT NULL;
5. Run the query (click the red exclamation mark in the ribbon).
6. Save the results: go to External Data > Export > Text File and save the query result as a CSV file (e.g., file_activity.csv).
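If you would rather skip the manual export, you can pull the same query straight into a CSV with Python instead. This is an optional sketch, not part of the main workflow: it assumes the pyodbc driver used later in this guide and reuses the placeholder database path and Your_Table_Name, so adjust both to match your setup. The file name export_table.py is just a suggestion.
export_table.py
import pandas as pd
import pyodbc
# Database connection string (update the path and table name for your setup)
DB_PATH = r"Path\to\database\MyDatabase.accdb"
CONN_STR = f"DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={DB_PATH};"
conn = pyodbc.connect(CONN_STR)
query = """
    SELECT path, size_bytes, last_modified, accessed_at, usage_count
    FROM Your_Table_Name
    WHERE accessed_at IS NOT NULL
"""
data = pd.read_sql(query, conn)
conn.close()
# Write the same CSV the manual export would produce
data.to_csv("file_activity.csv", index=False)
print(f"Exported {len(data)} rows to 'file_activity.csv'.")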
Transform the data
In this step, we will prepare the exported data for use in a machine-learning model. The following script extracts the hour and day of the week from the accessed_at field and labels each record as active (1) when its usage_count shows file activity, or inactive (0) otherwise.
data_transformation.py
import pandas as pd
# Load the exported CSV (update with your file path)
data = pd.read_csv(r"Path\to\file_activity.csv")
# Convert accessed_at to datetime
data['accessed_at'] = pd.to_datetime(data['accessed_at'])
# Extract hour and day features
data['hour'] = data['accessed_at'].dt.hour
data['day'] = data['accessed_at'].dt.weekday
# Define activity labels: 1 = Active, 0 = Inactive
data['active'] = (data['usage_count'] > 0).astype(int)
# Save the transformed data for model training
data.to_csv("transformed_file_activity.csv", index=False)
print("Data transformation complete. Saved to 'transformed_file_activity.csv'.")
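Before moving on, it’s worth a quick sanity check that the transformation produced what you expect. This optional snippet assumes the file name used above; it reloads the transformed CSV, shows the first few rows, and counts how many records were labeled active versus inactive:
import pandas as pd
# Reload the transformed data and inspect the labels
check = pd.read_csv("transformed_file_activity.csv")
print(check[['hour', 'day', 'usage_count', 'active']].head())
print(check['active'].value_counts())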
Train the Model
Next, you will run this model-training script on the data you just transformed to train the AI on your specific information. Before running the script, install scikit-learn in your Python environment with the following command:
pip install scikit-learn
model_train.py
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import joblib
# Load the transformed data
data = pd.read_csv(r"Path\to\transformed_file_activity.csv")
# Prepare features (X) and labels (y)
X = data[['hour', 'day', 'usage_count']] # Features: hour, day, usage_count
y = data['active'] # Label: active (1 = Active, 0 = Inactive)
# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the Random Forest model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Evaluate the model (optional but recommended)
accuracy = model.score(X_test, y_test)
print(f"Model accuracy on test data: {accuracy:.2f}")
# Save the trained model
joblib.dump(model, "work_activity_predictor.joblib")
print("Model training complete. Model saved as 'work_activity_predictor.joblib'.")
Finally, we can put the trained AI to work. The following script pulls the file activity from the database and has the model predict whether each record falls in an active hour:
predictions.py
import pandas as pd
import joblib
import pyodbc
# Database connection string
DB_PATH = r"Path\to\database\MyDatabase.accdb"  # Update with your file path
CONN_STR = f"DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={DB_PATH};"
def fetch_data_from_db():
    """
    Fetches real data from the file-tracking table for prediction.
    """
    conn = pyodbc.connect(CONN_STR)
    query = """
        SELECT size_bytes, last_modified, accessed_at, usage_count
        FROM Your_Table_Name
        WHERE accessed_at IS NOT NULL
    """
    data = pd.read_sql(query, conn)
    conn.close()
    # Convert datetime columns to appropriate formats
    data['accessed_at'] = pd.to_datetime(data['accessed_at'])
    data['hour'] = data['accessed_at'].dt.hour
    data['day'] = data['accessed_at'].dt.weekday
    return data[['hour', 'day', 'usage_count']]  # Return features for prediction
# Load the trained model
model = joblib.load(r"Path\to\work_activity_predictor.joblib")
# Fetch data from the database
real_data = fetch_data_from_db()
# Make predictions
real_data['active'] = model.predict(real_data)
# Display the results
print("Predictions on real data:")
print(real_data)
# Save the results to a CSV for review
real_data.to_csv("real_data_predictions.csv", index=False)
print("Predictions saved to 'real_data_predictions.csv'.")
What’s next?
You will want to repeat the entire process every few weeks. Doing so allows your AI to learn about you over time, making it more accurate at predicting when you will be creating and accessing files, which for many of us means we are working. While it won’t be as accurate as a time clock, it can provide amazing insight into what you do each day and which projects you tend to spend the most time on. And it only scratches the surface of what you can do with the information you are already collecting.
Hopefully, you will get some ideas on ways to expand the project. We will likely add to it here at GeekSided as well, but since this is a modular project, we won’t need to start from the beginning each time.
We hope you learned something and had some fun.