Extracting specific words and numbers from xml files

Firstly, excuse the awkward terminology; I’m no programmer, but I have lots of experience with Grasshopper (gh).
Short description of the end goal:

I want to read an XML file (containing information on a list of materials) with gh and extract the properties of each material, so that gh can automatically create materials and constructions from such files.

In the XML the materials are listed by index, so every material will start with:
<BuildingMaterial Idx="3" Name="insulation">

…then it will contain a list of information, like so:
<EmbodiedCarbon>1.08</EmbodiedCarbon>

The information cannot be identified by its row number, as this will be different for each document.
I’m sure there’s a way to do a python script that will read and extract, and I hope for your help with figuring out how!

Cheers

If you can share part of such a file, or better yet an entire file, we’ll be able to help better.

Here’s the full file, along with an extract of the part in question. I hope this helps.

One composite - only building materials.xml (4.5 KB) One composite.xml (22.4 KB)

The best way to approach this is to use an XML parsing library that lets you search for the <BuildingMaterial> tags and extract the information you need from them.

From the document you’d get all XML nodes called <BuildingMaterial>, then for each node you look at its attributes and child nodes.

Since you are looking to do this in Rhino Python, I suggest using the System.Xml namespace.

Could you give me an example?
I think I understand what you say, but I wouldn’t be able to do that.

Found this searching based on what @nathanletwory suggested:

Brief example, but it should get you started.

-Kevin

You can find docs on the System.Xml namespace here:

Looking at these docs and the example from the link I posted above I came up with this:

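import clr
clr.AddReference("System.Xml")  # load the .NET XML assembly
import System.Xml

# Change this path to match the file location on your system
filename = r"/Users/Kevin/Desktop/xml parsing/One composite.xml"
xmldoc = System.Xml.XmlDocument()
xmldoc.Load(filename)

# Collect every <BuildingMaterial> node, wherever it sits in the file
elemList = xmldoc.GetElementsByTagName("BuildingMaterial")
for elem in elemList:
    # InnerXml is the markup of the node's children, as one string
    print(elem.InnerXml)
    print('=====================')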

This should get you started (you will have to change the file path to match the file location on your system).
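If you want the individual values rather than the raw inner XML, here is a rough sketch of the next step. Note that it swaps in Python's standard xml.etree.ElementTree module instead of System.Xml, on the assumption that it is available in your Python environment. The sample text is made up for illustration: only BuildingMaterial, Idx, Name and EmbodiedCarbon come from this thread, while the Materials root, the Conductivity tag and the read_materials helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only BuildingMaterial, Idx, Name and EmbodiedCarbon
# appear in the thread; the root element and Conductivity are invented.
SAMPLE_XML = """
<Materials>
  <BuildingMaterial Idx="3" Name="insulation">
    <EmbodiedCarbon>1.08</EmbodiedCarbon>
    <Conductivity>0.04</Conductivity>
  </BuildingMaterial>
  <BuildingMaterial Idx="4" Name="concrete">
    <EmbodiedCarbon>0.15</EmbodiedCarbon>
  </BuildingMaterial>
</Materials>
"""

def read_materials(root):
    """Return {material name: {property: text}} for every BuildingMaterial."""
    materials = {}
    # iter() searches the whole tree, so the row a material sits on is irrelevant
    for node in root.iter("BuildingMaterial"):
        props = {"Idx": node.get("Idx")}  # attributes are read with get()
        for child in node:
            # store the raw text; convert with float() where numbers are needed
            props[child.tag] = (child.text or "").strip()
        materials[node.get("Name")] = props
    return materials

# For a file on disk, ET.parse(path).getroot() gives the same kind of root
root = ET.fromstring(SAMPLE_XML.strip())
materials = read_materials(root)
print(float(materials["insulation"]["EmbodiedCarbon"]))
```

Keying the result by Name makes it easy to look up one material in gh. If the real file declares an XML namespace, the tag names would carry a namespace prefix and the search string would need adjusting.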

-Kevin

Thanks very much, Kevin!