Parsing and manipulating JSON data in Power BI dataflows can be a daunting task: deeply nested levels, evolving schemas, and sometimes large volumes of data. Power BI does ship JSON tooling in the form of Power Query, but those tools tend to struggle as data sizes grow. For proper management of JSON data, the following considerations are recommended:
Power Query First: JSON structures can be difficult to work with, but the Power Query editor built into Power BI makes the basic workflow simple: connect to the JSON source, parse it with Json.Document, and expand the nested records and lists into flat columns. From there, data cleaning and further transformations can be done through either the Power Query GUI or the more advanced functions of the M language, as sketched below. Be aware, however, that this all runs on the local machine and can become resource-expensive for heavy datasets, bogging down Power BI Desktop.
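Here is a minimal M sketch of that flattening pattern. The file path, column names, and nested field names are placeholders for illustration; adapt them to your own JSON shape:

```m
let
    // Hypothetical path; could equally be Web.Contents for an API response
    Source = Json.Document(File.Contents("C:\data\orders.json")),
    // Assumes the top level of the JSON is a list of records
    AsTable = Table.FromRecords(Source),
    // Expand a nested record column into flat columns (field names are assumptions)
    ExpandedCustomer = Table.ExpandRecordColumn(
        AsTable, "customer", {"id", "name"}, {"customer.id", "customer.name"}),
    // Expand a nested list column into one row per element, then flatten each element
    ExpandedItems = Table.ExpandListColumn(ExpandedCustomer, "items"),
    ExpandedItemFields = Table.ExpandRecordColumn(
        ExpandedItems, "items", {"sku", "qty"}, {"item.sku", "item.qty"})
in
    ExpandedItemFields
```

Each expansion step is exactly what the GUI generates when you click the expand icon on a column, so you can start in the editor and refine the generated M afterwards.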
Leverage Azure Data Factory: Rather than trying to fit all transformation logic inside Power BI itself, consider offloading the heavy lifting to an orchestration service such as Azure Data Factory (ADF). ADF is built for orchestrating data movement at scale and can comfortably handle large JSON files. Use it to flatten nested JSON components, apply the required transformations, and absorb schema drift before the cleaned data is landed in a cloud store such as Azure SQL Database, Azure Data Lake Storage, or Blob Storage. Power BI then consumes only pre-processed data already sitting in the cloud, which relieves the local machine of the compute burden.
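On the Power BI side, the dataflow query then simply reads the table that the ADF pipeline has already flattened and landed. A minimal sketch, assuming the output was written to an Azure SQL Database (the server, database, and table names below are hypothetical):

```m
let
    // Hypothetical connection details; the ADF pipeline has already
    // flattened the raw JSON into this relational table
    Source = Sql.Database("myserver.database.windows.net", "analytics"),
    Orders = Source{[Schema = "dbo", Item = "FlattenedOrders"]}[Data]
in
    Orders
```

The design point is that the query contains no JSON parsing at all: all of that work happened upstream in ADF, so the dataflow stays cheap to refresh.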
Make Recurrent Updates Efficient with Incremental Refresh: When building Power BI dataflows over JSON data, set up incremental refresh to limit the amount of data handled in each refresh cycle. This is particularly beneficial for large, frequently updated datasets. Make sure the source carries a timestamp or watermark identifier column for this purpose; an example follows below.
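Power BI's incremental refresh binds two reserved datetime parameters, RangeStart and RangeEnd, and expects the query to filter the source on them. A minimal sketch, reusing the hypothetical Azure SQL table from above with an assumed ModifiedAt timestamp column:

```m
let
    Source = Sql.Database("myserver.database.windows.net", "analytics"),
    Orders = Source{[Schema = "dbo", Item = "FlattenedOrders"]}[Data],
    // RangeStart/RangeEnd are the reserved parameter names Power BI
    // substitutes at refresh time; "ModifiedAt" is an assumed column
    Filtered = Table.SelectRows(
        Orders, each [ModifiedAt] >= RangeStart and [ModifiedAt] < RangeEnd)
in
    Filtered
```

Because this filter folds back to the SQL source, each refresh cycle pulls only the rows that fall inside the current window rather than the entire table.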
In short: rather than overwhelming Power BI Desktop with large, complicated JSON sources, push the heavy ETL to Azure Data Factory or a comparable tool for prep work, and keep the transformations inside Power BI dataflows light.