Turn text input from a file to chunks (looking for a better way)

I’m working on a definition that imports text from a file, looks for a certain value in it, and splits the text into chunks (new branches) each time that value is found. The value sits on its own line at the start of each chunk, except for the first chunk, since the pattern value doesn’t appear at the start of the file.

I have been looking into this for a couple of days, but I lack the knowledge to do it efficiently. I have found a working way and simplified it to make it clearer on the forum what I’m trying to do and how I think it should be done. This is meant to be a question that I (and hopefully others) can learn from.

There is probably a way easier method to do this but I haven’t found it, so that’s why I’m asking here:
How could this definition be simplified? Are there tricks or techniques to handle this?
Am I looking at the correct components?

File here:
text_split_1.gh (15.2 KB)

PS: I simplified the file so forum users can see more clearly what the desired end result is and how I reached it. I now see I left in a useless Simplify and some simplified outputs that were not necessary; I forgot to remove those, so you can ignore them. Also, in my original script I work with data that is already divided into branches, so normally I would use a Merge instead of an Entwine to keep the branch data intact.

this uses Elefront’s Create Tree component:


text_split_1_Re.gh (22.7 KB)

[edit] no plugins, only vanilla components:


text_split_1_Re_Re.gh (19.0 KB)

There we go. I did check all the plugins I had for this kind of operation, and also checked Elefront, but I didn’t know that Create Tree could do this. I will make happy use of this. Thanks!

Also, very interesting to see the use of Mass Addition; I never realized it could be used to create a path structure. I understand what it does: it tells which indexed line should go to which path. But I do not entirely understand how Partial Results produces the list the way it does. It looks like it turns the first four False values into four 0s, then one True plus two False values into three 1s, etc. But how does it know to differentiate between the first part and the second part? Does it treat the True/False values as ones and zeros?

And did you find the Easter egg (sorry)?

PS: I’m getting strange results using the Elefront solution on my own dataset (which consists of five different branches), but the native solution still works on it. I have other matters to attend to right now, but I will try to create a new dataset that is more representative of what I’m using to see what the issue could be. I’ll upload a new file soon!


It seems to turn the path structure into {X} instead of the desired {X;X} structure.

exactly

True / False are automatically converted to 1s and 0s by Mass Addition.
Partial Results shows the running sum step by step: each item is the sum of the corresponding input item and all the previous ones (taken from the input list of 0s and 1s).
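For anyone who finds it easier to read as code, here is a minimal Python sketch of the same running-sum idea. It is not what Mass Addition does internally, just an equivalent calculation, and the `is_marker` list is made-up sample data matching the pattern described above (four False, then a True followed by two False, and so on).

```python
from itertools import accumulate

# Hypothetical sample: True marks a line that equals the pattern value.
is_marker = [False, False, False, False, True, False, False, True, False]

# Convert each bool to 0 or 1, then take the running sum.
# Every True bumps the counter, so all lines up to the next True share one index.
partial_results = list(accumulate(int(flag) for flag in is_marker))

print(partial_results)  # [0, 0, 0, 0, 1, 1, 1, 2, 2]
```

Each number can then be read as the index of the branch that the corresponding line belongs to.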

:slight_smile:

yes, sorry I didn’t consider that, Elefront’s Create Tree wants full paths so it won’t work on branches

but I think the vanilla version will work nicely even on multiple branches:

Thanks for explaining. I have decided to use the native solution for future projects, as it is the easiest to work with, even though Member Index takes 205 ms to compute on my current file (the current input file is only around 1200 lines long, so I hope it won’t take much longer with a bigger one).

The first solution somehow takes only around 33 ms to compute, so I’m going to use that for this specific project, as I expect the files to be longer than my test file. I think that using Python or C# for this project might eventually speed things up even more, but that will come after I learn more about those languages. I’ll try to remember to update this topic if I discover any other way.
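As a rough idea of what a scripted version could look like, here is a minimal plain-Python sketch (not tested inside a GhPython component). The function name `split_into_chunks`, the marker string `"START"` and the sample text are all hypothetical; the output is a nested list, so turning it into a Grasshopper data tree would still be a separate step.

```python
def split_into_chunks(lines, marker):
    """Group lines into chunks; each line equal to `marker` opens a new chunk."""
    chunks = [[]]
    for line in lines:
        # Open a new chunk at a marker, unless the current chunk is still empty
        # (this avoids an empty first chunk when the text starts with the marker).
        if line.strip() == marker and chunks[-1]:
            chunks.append([])
        chunks[-1].append(line)
    return chunks


# Hypothetical sample text; a real file could be read with
# open(path).read().splitlines() instead.
text = "header a\nheader b\nSTART\nx\ny\nSTART\nz"
chunks = split_into_chunks(text.splitlines(), "START")
for i, chunk in enumerate(chunks):
    print(i, chunk)
```

This is a single pass over the lines, so the cost should grow roughly linearly with the file length.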

Long messy solution in OP:

Native solution with Mass addition and Sets:

Difference between Sets and Elefront Create Tree:

the main thing is to provide the Create Tree with the correct Paths
I think this could work, but maybe there are better ways to get to the same final result:

[here I’ve just entwined the same original list 3 times to create some branching]


text_split_1_Re_Re_Re.gh (12.4 KB)

you’ll notice I didn’t add the closing brace } …looks like Create Tree is happy even without that :slight_smile:
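To illustrate the path idea outside Grasshopper, here is a small Python sketch that builds one {branch;chunk} path string per line, restarting the chunk counter in every input branch. The function name `chunk_paths`, the marker `"M"` and the sample branches are hypothetical, and I am only assuming that path strings of this shape are what Create Tree expects; the sketch writes the closing brace even though it appears to be optional.

```python
from itertools import accumulate

def chunk_paths(branches, marker):
    """Return, per branch, one '{branch;chunk}' path string for each line."""
    paths = []
    for branch_index, lines in enumerate(branches):
        # Running count of markers seen so far in this branch = chunk index.
        chunk_ids = accumulate(int(line == marker) for line in lines)
        paths.append(["{%d;%d}" % (branch_index, c) for c in chunk_ids])
    return paths


# Hypothetical sample: two input branches, "M" is the pattern value.
branches = [["a", "M", "b"], ["c", "d", "M", "e", "M"]]
paths = chunk_paths(branches, "M")
for branch_paths in paths:
    print(branch_paths)
```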

[edit] looks good :slight_smile:
