Data Processing and Mining with Dynamo for Civil 3D
A JOURNEY FROM BEING CONSIDERED THE UGLY DUCKLING TO A SCALABLE DATA PROCESSING AND MINING BEAST
Data Processing and Data Mining are extraction workflows with distinct applications and purposes. Data Processing is the procedure of converting unrefined data into a well-organized and structured format, rendering it suitable for analysis, interpretation, and decision-making. Data Mining, on the other hand, is the process of analyzing data using algorithms and techniques to discover trends, patterns, correlations, relationships, and potential anomalies (see Figure 1).
Figure 1 – Data Mining in Power BI
Over the past several years I’ve seen a lot of folks, myself included, jumping into Dynamo for Civil 3D (D4C3D) headfirst to automate daily workflows and tasks. Looking to increase efficiency and improve production, they have struggled on the widespread adoption front…until recently, that is. Sometimes it takes a methodical approach, with a lot of trial and error and lessons learned, to gain the confidence and trust of others so they start looking at D4C3D as a scalable and sustainable solution for design automation and beyond.
So, what do data processing and mining have to do with D4C3D, you may be asking? In this article, we’ll run through my thought process behind a recent D4C3D script that I initially developed as a Proof of Concept (POC) and that turned into a scalable solution supporting our data processing and mining efforts, which have provided many value-adds to our projects.
Civil 3D Reporting Methods
Data extraction and reporting tools are nothing new to folks in the BIM | CIM world; however, products and integrations that connect to our data continue to evolve, providing much more visually appealing options to us. Microsoft’s Power BI is a perfect example as one of the (relatively) more recent options that the BIM | CIM communities are flocking to. Developing connected solutions between design authoring and reporting tools that allow for seamless data migrations with minimal to no data loss can sometimes be a struggle though. The following breaks down some of the capabilities, limitations, and options we have available to us within Civil 3D specifically:
The Good – Out-of-the-Box Tools and Capabilities
Civil 3D has gone through several iterations and varying levels of reporting capabilities. Some are available through our Toolbox in Toolspace, some one-off capabilities are available through object modeling tools, and most are now gravitating towards using Project Explorer to review, analyze and extract data associated with our modeled objects in our Civil BIM designs (see Figure 2). More options and capabilities to customize extraction reports within Project Explorer can certainly be of extreme benefit to modelers (senders) and reviewers (receivers) of this information.
Figure 2 – Project Explorer Dialog Box
The Bad – Data Points and Customization Limitations
Although we have several great reporting and data extraction tools available to us directly within Civil 3D, we are still somewhat limited by how many data points each tool can access and extract (see Figure 3). Additionally, we’ll need to identify common data points between datasets that will allow us to build relationships and links between datasets for correlations, trends, anomalies, etc. to become visible. In most cases, the data points that these tools are extracting are good enough for normal production, analysis, and reporting purposes. If there are additional data points that you were hoping to touch and extract, in lieu of hoping the next update or release of Civil 3D will include that capability, we will need to start exploring what other options we have available to us.
Figure 3 – Project Explorer Data Points
The Ugly (Duckling) – Customizing Reports with Dynamo for Civil 3D (D4C3D)
Here is where D4C3D comes in to save the day! I jokingly call D4C3D the “Ugly Duckling” because our scripts can look super complicated and ugly as we develop them but certainly provide us with beautiful solutions in the end. D4C3D is a great Proof of Concept (POC) tool if nothing else. We have been clamoring for Autodesk to provide us with more Civil automation tools and capabilities similar to those offered on the Revit side. With Civil 3D 2020 our wish was granted, although many have struggled to find ways to use it consistently across organizations and to build solutions that are scalable.
If you have followed me on LinkedIn or subscribe to my YouTube channel, you have certainly witnessed my efforts in trying to make good use of D4C3D. Through feedback from many of my presentations, demonstrations, videos, blog posts, etc., I continue to engage with folks who struggle mostly with the widespread acceptance and adoption of this tool and of the scripts being developed. Recently, while talking to a colleague of mine, I realized that many folks across the industry view D4C3D, and Dynamo for Revit, as nothing more than a great POC tool due to the lack of sustainability and scalability of the scripts being developed. That said, once many folks are able to prove their concept works, they turn to more complex code-based solutions to build new tools and plug-ins with those capabilities.
We have an opportunity to dispel this perception and develop a reporting solution (see Figure 4) for our POC that is not only customizable and flexible enough to access more data points in our Civil BIM designs, but more importantly, provides us with a scalable solution that we can distribute across an entire organization, without needing to turn to those more complex code-based solutions mentioned!
Figure 4 – Snippet of our D4C3D POC Reporting Solution
Challenge Accepted!
As we decide on the data processing and mining goals we wish to achieve through the development of this POC tool using D4C3D, we’ll want to start by creating an outline (see Figure 5) that answers the following:
- What types of objects do we want to make connections to?
- Which data points do we want to extract and analyze?
- How do we want to organize all data being extracted from our models/files?
- What should our final extraction format be?
- Are there any other products we need to integrate our data with?
- How do we plan to analyze the data points to discover trends, patterns, and relationships to make informed decisions and apply corrective actions on our Civil BIM designs (corrective actions will ultimately improve performance and efficiency across our projects)?
Figure 5 – Data Processing and Mining POC Outline
End Phase (First) – Although the outline (see Figure 5) displays our Start to End process in our expected workflow, we want to start with the End in mind so we can determine how best to get there. In our scalable POC that we’re developing, we’re going to use Microsoft Power BI to connect to our pools of data extracted from our Civil BIM Designs. Additionally, there are several ways we can slice and dice our data to surface valuable information and apply conditional formatting, depending on our use cases. Keeping it high level, the dashboards can provide a multitude of benefits and uses like:
- Improving Production Staff’s Drafting Habits and Trends
- Identifying Skill Development and Software Training Opportunities
- Augmenting Workflows with Automation
- Alternative Design Analysis
- Streamlined Collaboration
- Design QA/QC
- Model Health Analysis
- And the list goes on…
Output Phase (Second) – Continuing to work our way backward from the End Phase, we already know that Power BI can connect to many data sources and formats. One of the most basic formats, and super common whenever data extraction and reporting are being considered, is Microsoft Excel. These Excel files will be created by our D4C3D scripts and used to aggregate and store the data being extracted (see Figure 6).
Figure 6 – Example Excel File that Lists Data Extracted from our Model
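To make the tabular layout concrete, here is a minimal sketch of the shape such an export takes: one header row followed by one row per extracted object. It is an illustration only. It writes CSV text with Python's standard library rather than a true Excel workbook, which is a stand-in for whatever Excel export the D4C3D script actually uses; the function name, column names, and sample values are all hypothetical.

```python
import csv
import io


def rows_to_csv_text(header, rows):
    """Return CSV text: one header row followed by one row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical column names and sample values, for illustration only.
header = ["Object Type", "Name", "Layer"]
rows = [
    ["Alignment", "CL-Main", "C-ROAD-CNTR"],
    ["Surface", "EG", "C-TOPO"],
]

csv_text = rows_to_csv_text(header, rows)
```

Keeping every export in this flat, one-row-per-object shape is what lets a reporting tool read the columns as fields without extra cleanup.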
Development Phase (Third) – With the End and Output identified, we can start looking into developing our scripts. This is a bit trickier as we need to consider all phases, and it will require trial and error along the way. Ultimately, the goal is to develop a D4C3D script that will not only extract data from our Civil BIM Designs and export it to Microsoft Excel, but also organize and clean that data for our downstream uses (see Figure 7).
During this phase, we want to identify all the data points to interrogate and extract from our Civil BIM designs. With the End in mind, we know that to properly analyze trends, correlations, and anomalies within Power BI, we need to identify commonalities in our Excel data sources so relationships can be built. To make our POC scalable, we will want to apply some additional input parameters in our D4C3D script, allowing us to organize the data being extracted. We can add as many layers to this as appropriate, but adding some input parameters like Project Name, Project Number, File Name, and File Location gives us a simple start.
Figure 7 – Organizing and Keeping our Data Clean in D4C3D
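One way to picture how those input parameters create the commonalities needed for relationships is to stamp every extracted row with the same key columns. The sketch below is a minimal illustration in plain Python, not the actual script: the function name, the sample rows, and the project values are hypothetical, and the four key columns are simply the input parameters named above.

```python
# Key columns shared by every export, taken from the input parameters above.
KEY_COLUMNS = ["Project Name", "Project Number", "File Name", "File Location"]


def tag_rows(rows, project_name, project_number, file_name, file_location):
    """Prefix each extracted row with the shared project and file keys."""
    keys = [project_name, project_number, file_name, file_location]
    return [keys + list(row) for row in rows]


# Hypothetical extracted rows and input values, for illustration only.
extracted = [["Alignment", "CL-Main"], ["Surface", "EG"]]
tagged = tag_rows(
    extracted, "Sample Project", "0000", "Design.dwg", "C:/Projects/Sample"
)
header = KEY_COLUMNS + ["Object Type", "Name"]
```

Because every export carries the same leading columns, separate Excel files can later be related to one another on Project Number or File Name.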
Start Phase (Last) – Having our data points identified and our data source/storage configured to connect to our Power BI dashboard, we can now put our POC through its paces and to good use. As we built out our scripts in the Development Phase, we added input parameters to give the folks running these scripts more control over the organization and separation of data being extracted. A couple of the parameters mentioned, like File Name and File Location, can be queried and automatically applied in our D4C3D script, which would eliminate them from the user input category. Things like Project Name, Project Number, and potentially the Excel File and Sheet Naming convention may not always be as straightforward. Figure 8 shows how certain nodes, when set as inputs within our D4C3D scripts, will display in Dynamo Player.
Figure 8 – D4C3D Inputs Displayed in Dynamo Player
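As a rough sketch of why File Name and File Location do not need to be typed in by the user, both can be derived from the drawing's full path. How that path is obtained inside D4C3D is not shown here; the example assumes it arrives as a plain string, and both the function name and the sample path are hypothetical.

```python
import ntpath  # Windows-style path handling, regardless of the host OS


def split_drawing_path(full_path):
    """Return (file_name, file_location) for a full drawing path."""
    file_location, file_name = ntpath.split(full_path)
    return file_name, file_location


# Hypothetical drawing path, for illustration only.
file_name, file_location = split_drawing_path(r"C:\Projects\Sample\Design.dwg")
```

That leaves only values such as Project Name and Project Number as true user inputs.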
Everything is set up and ready to go at this point! With our D4C3D script developed, custom input parameters applied, our data storage format and location established, and our final dashboard configured to let us mine the data being processed, we now have a successful framework in place to distribute and scale up our new automation tool! Scaling up new solutions can often be a labor-intensive marketing effort in its own right. With our tool having the ability to be applied across several different scenarios and uses, we can begin our communication and adoption campaigns by highlighting the flexibility and many benefits our new automation tool can offer.
In Summary
The journey of data processing and mining through D4C3D has transformed it from an initial POC perception into a formidable and scalable tool. This expedition showcases the power of data processing, where raw information is refined for analysis and decision-making, and mining, where data can uncover patterns and correlations. The path from POC to a practical solution underscores the value of frameworks and methodical approaches in gaining trust and confidence for adopting scalable automation solutions. The integration of Microsoft Power BI, the evolution of Civil 3D reporting capabilities, and the role of D4C3D in customizing and scaling reports all contribute to this narrative, emphasizing the importance of flexibility and benefits in enhancing efficiency and collaboration within various project scenarios. In dismantling the notion of D4C3D as a mere POC tool, a new era of adaptable and transformative reporting solutions emerges.