Python + Dataverse Series – #06: Data preprocessing steps before running Machine Learning Algorithms

Hi Folks,

If you were already a Power Platform Consultant and new to working with Python, then I would encourage to start from the beginning of this series.

Now in this series, we entered an interesting part where Machine learning algorithms were run to analyze Dataverse Data and in this post we will understand why to scaling, feature scaling is a critical preprocessing step for many machine learning algorithms because it ensures that all features contribute equally to the model’s outcome, prevents numerical instability, and helps optimization algorithms converge faster to the optimal solution

Primarily before running any Machine Learning Algorithm, we need to do some data preprocessing like scaling the data, in this case we will use a formula which is used to scale using min–max normalization (feature scaling to the [0, 1] range).

#preprocessing step before running machine learning algorithms
from azure.identity import InteractiveBrowserCredential #using Interactive Login
from PowerPlatform.Dataverse.client import DataverseClient #installing Python SDK for Dataverse
import numpy as np #import Numpy Library to perform calculations
# Connect to Dataverse
credential = InteractiveBrowserCredential()
client = DataverseClient("https://ecellorsdev.crm8.dynamics.com", credential) #Creates Dataverse Client
# Fetch account data as paged batches
account_batches = client.get(
"account",
select=["accountid", "revenue"],
top=10,
) #Fetches top 10 accounts with accountid, revenue columns
revenues = []
for batch in account_batches:
for account in batch:
if "revenue" in account and account["revenue"] is not None:
revenues.append(account["revenue"])
revenues = np.array(revenues)
#Normalize the revenue
if len(revenues) > 0:
min_rev = np.min(revenues)
max_rev = np.max(revenues)
normalized_revenues = (revenues – min_rev) / (max_rev – min_rev)
print("Normalized Revenues:", normalized_revenues)
#visualize the result
import matplotlib.pyplot as plt
plt.plot(normalized_revenues, marker='o')
plt.title('Normalized Revenues from Dataverse Accounts')
plt.xlabel('Account Index')
plt.ylabel('Normalized Revenue')
plt.grid()
plt.show()

You can download the Python Notebook below if you want to work with VS Code

https://github.com/pavanmanideep/DataverseSDK_PythonSamples/blob/main/Python-PreProcessingStepBeforeMachineLearning.ipynb

Hope you found this useful…

Cheers,

PMDY