Datatrees in Grasshopper 2.0

Datatrees are difficult and annoying, everyone agrees on that. Unfortunately they are here to stay (at least until Grasshopper 3) as we don’t know how else to solve certain problems that arise when managing data within a node-based development environment. See this post on the Grasshopper FAQ page if you want to know about our reasons and justifications.

That having been said, there are a lot of things that can be done to make trees better and more accessible while sticking to the paradigm. The main goal of this discussion is to collect feedback regarding improvements and to discuss the new implementation of Datatrees in Grasshopper 2.


The classes have been redesigned from scratch and there’s a fair number of differences in both nomenclature (which was never consistent in GH1) and implementation details, but there are three major differences which need to be mentioned up front:

  • The IGH_Goo constraints will disappear. Instead of data needing to be wrapped up into classes which implement the IGH_Goo interface, trees in GH2 will have no constraints on their types. Rather, the functionality currently provided by IGH_Goo (serialization, preview, formatting, conversion, baking, duplication, …) will have to be provided by separate classes. This also means that there will no longer be a distinction between DataTree<T> and GH_Structure<T>. There will only be a single type in GH2 which will be used by compiled components and script components alike.
  • Grasshopper2 trees will support metadata dictionaries for all individual items. Any number of key/value pairs can be assigned to any datum in any tree. Since we dropped the IGH_Goo constraint this metadata cannot be stored as part of the goo implementation (which would have been the natural place for it), but instead, all trees maintain a separate but synchronised metadata collection.
  • Trees will be immutable, meaning that once constructed, they never change. Any method which modifies a tree in any way always returns a new tree.

Nomenclature for Grasshopper 2.0:

  • Path: (similar to GH_Path) A collection of one or more non-negative integers that identifies twigs within trees. Paths are formatted using curly brackets and semi-colons: {4;7;0;0}
  • Site: A combination of a path and a twig index. A site uniquely identifies a single item in a tree. Sites use path notation with an appended index in square brackets: {4;7;0;0}[2]
  • Pair: An item + metadata combination. Although metadata is internally stored separately from items in trees to avoid wasting memory, accessor methods often use Pairs.
  • Leaf: A combination of pair and site. A leaf identifies a specific item within a tree while also containing all data relevant to that location.
  • MetaData: An immutable dictionary with support for a dozen or so immutable data types. I’ve made two assumptions while designing the metadata API: (1) most items will not have metadata associated with them, (2) metadata will often be shared amongst many different items of many different types.
  • Twig: An indexed and ordered collection of items within a tree. Twig<T> is very similar to List<T>, except it’s immutable, uses marginally less memory, has support for metadata, and is able to store null-ness for value types.
  • Tree: A sorted, indexed and keyed collection of twigs. Trees are just an immutable kind of SortedList<Path, Twig<T>> with a whole bunch of methods and properties to access that collection in meaningful ways.

Many of the above types are generic (i.e. Tree<T>, Twig<T>, Pair<T>, …) but they are all based on non-generic interfaces (i.e. ITree, ITwig, IPair, …) that make it possible to do at least some operations on trees even if you don’t know of what type they are. In some cases, instead of methods with generic arguments, methods with dynamic arguments are provided.


This post is already becoming uncomfortably long, so I’ll type up some example code in responses below.

5 Likes

Example A: How to create a tree containing some data.

Tree<int> tree = new Tree<int>();
Path path0 = new Path(0, 0);
Path path1 = new Path(0, 1);
tree = tree.Add(path0, 14);
tree = tree.Add(path0, 16);
tree = tree.Add(path0, 18);
tree = tree.AddRange(path1, 55, 56, 59, 72, 88);
Rhino.RhinoApp.WriteLine(
  "Tree contains {0} twigs and {1} items", 
  tree.Count, 
  tree.ItemCount);

The above code creates a new, empty tree, then adds three values associated with path {0;0} and five values associated with path {0;1}. Note that since all trees are immutable, adding values to trees always yields new trees, rather than modified existing ones.

Populating a tree this way is fairly inefficient because of all the intermediate trees that are involved. More efficient but more complicated ways do exist for creating big trees from scratch.

The last line will write "Tree contains 2 twigs and 8 items" to the Rhino command history.

3 Likes

Example B: How to get data out of a tree.

This example assumes the tree from Example A.

Tree<int> tree = ...; // See Example A
int valueA = tree.Item[(0,0), 0];
int valueB = tree.Item[new Locus((0,0), 1)];
RhinoApp.WriteLine("A={0}, B={1}", valueA, valueB);

Twig<int> twig = tree.Twig[(0,1)];
int valueD = twig[0].Item;
int valueE = twig.GetItem(4);
RhinoApp.WriteLine("D={0}, E={1}", valueD, valueE);

This code will print "A=14, B=16" and "D=55, E=88" to the command history.

Paths can be instantiated implicitly from individual integers or tuples of integers. Thus, any method F which requires a path as an argument could either be invoked as F(new Path(0, 2)) or as F((0, 2)). These two instructions amount to the same thing.
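For illustration, both of the following lines should retrieve the same twig from the tree built in Example A; the second relies on the implicit conversion from a tuple:

Twig<int> twigA = tree.Twig[new Path(0, 1)]; // explicit path construction
Twig<int> twigB = tree.Twig[(0, 1)];         // implicit conversion from a tuple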

valueD requires twig[0].Item rather than just twig[0], because the indexer returns a Pair<int> which contains the integer along with the metadata and the null state for that value.

3 Likes

Example C: How to remove items from a tree.

This example assumes the tree from Example A.

Tree<int> tree = ...; // See Example A
tree = tree.RemoveAt(new Locus((0,0), 1)); 
// 16 is now gone.

tree = tree.RemoveBetween(
             new Locus((0,1), 0),
             new Locus((0,1), 4), 
             false); 
// 55 to 88 are now gone, {0;1} is an empty twig;

tree = tree.RemoveEmptyTwigs();
// {0;1} is gone, only {0;0} remains with the values 14 and 18.

There are many more removal methods, but they should all be fairly obvious to those who are used to .NET semantics.

1 Like

Example D: How to enumerate over the items in a tree. INCORRECT due to changes in the API

Tree<Polyline> tree = ...;
foreach (Path path in tree.Paths)
  RhinoApp.WriteLine(path.ToString());

foreach (Polyline pline in tree.NonNullItems)
  if (!pline.IsClosed)
    pline.Add(pline[0]);

foreach (Branch<Polyline> branch in tree.Branches)
{
  branch.RemoveNullItems();
  if (branch.Count < 2)
    continue;
  double[] lengths = new double[branch.Count];
  for (int i = 0; i < branch.Count; i++)
    lengths[i] = branch.GetItem(i).Length;
  branch.Sort(lengths);
}

There are lots of enumerators built into Tree<T> and Branch<T> and they all use the yield approach so there’s no up-front cost to foreach.

The Sort() example here is an overload which takes an array of doubles to be used as sorting keys. Overloads also exist for int and string arrays, as well as custom IComparer<T> instances.
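For instance (with the same caveat that this example predates the API changes), the length-based sort above could presumably also be written with a comparer instead of a key array:

// Hedged sketch: sort the polylines in a branch by length using a custom
// IComparer<T>, mirroring the (since superseded) Branch<T>.Sort API above.
branch.Sort(Comparer<Polyline>.Create(
  (a, b) => a.Length.CompareTo(b.Length)));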

Example E: How to duplicate and convert trees. INCORRECT due to changes in the API

Tree<int> tree1 = ...;
Tree<int> tree2 = tree1.ShallowDuplicate();
tree2.Flatten();
tree2.Graft(); // tree1 hasn't changed at all so far.

// Create a tree which has the exact same topology as tree1, 
// but instead of integers it contains doubles one-hundredth the value.
Tree<double> tree3 = tree1.Convert(IntToDouble);
...
private double IntToDouble(int value)
{
  return value * 0.01;
}


// To create full duplicates (deep copies) of trees containing mutable reference types,
// a custom duplication delegate is required.
Tree<Brep> tree4 = ...;
Tree<Brep> tree5 = tree4.Duplicate(DuplicateBrep);
...
private Brep DuplicateBrep(Brep value)
{
  if (value == null)
    return null;
  return value.DuplicateBrep();
}
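
Since Convert and Duplicate simply take delegates, the same conversions could presumably also be written inline with lambdas (same caveat about the superseded API):

// Equivalent conversions written with lambdas instead of named methods.
Tree<double> hundredths = tree1.Convert(i => i * 0.01);
Tree<Brep> copies = tree4.Duplicate(b => b == null ? null : b.DuplicateBrep());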

Example F: How to assign custom data to items in trees. INCORRECT due to changes in the API

Tree<Curve> tree = new Tree<Curve>();
Branch<Curve> branch = tree.Ensure(new Path(0));

CustomKey key = new CustomKey("Rhino", "Object");
foreach (ObjRef objRef in objects)
{
  Curve curve = objRef.Curve();
  if (curve == null)
    continue;

  CustomData data = new CustomData();
  data = data.Add(key, objRef.ObjectId);
  // CustomData is immutable, so adding a key/value to it returns a new instance.
  branch.Add(curve, data);
}

With all the recent Grasshopper 2 rumblings, any news regarding the development of DataTrees? That seems like a pretty core subject for discussion and feedback :evergreen_tree:

I’m not sure what qualifies as ‘news’; I recently redesigned them from scratch again to make them immutable. So the major differences between GH_Structure<T> and DataTree<T> in GH1 vs. Tree<T> in GH2 are now:

  • There’s only one type.
  • It no longer requires an interface like IGH_Goo to store data, data can be stored directly, as is.
  • When ValueTypes are stored in a tree, they can still be null without the use of the Nullable<T> struct.
  • Meta data is supported for every element in a tree.
  • Trees and branches are immutable, although of course you could always choose to put mutable instances in a tree, so some care still needs to be taken.
  • A lot of operations on trees are internally multi-threaded.
  • A lot more functionality is available directly on trees, in a more consistent manner than before.
  • It is much easier to convert from trees to standard .NET collection types (arrays, lists, nested arrays and lists, enumerators, collections, dictionaries)
  • It is also much easier to create trees from standard .NET collection types.
  • (De)serialization will be handled by the trees themselves, provided there are registered type assistants for the specific data types in the tree.
  • A lot of memory optimizations, basically the immutability of trees and branches allowed me to write a dozen or so different types of branches (empty, only containing a single item, only containing null items, no meta data, valuetypes with no nulls, valuetypes with nulls, sparse lists, sealed vs. abstract vs. interface type constraints, and combinations of the above). So I can pick which type is sufficient to store the data and thus won’t waste any memory on meta data or null masks that won’t be needed anyway. All this is hidden behind a single, public IBranch<T> interface so you’ll never experience this complexity as a programmer, but it’s all there.

The only big thing still unfinished is that I’ll have to come up with some sort of TreeBuilder<T> helper type since it can be inefficient to iteratively populate an immutable collection. The builder pattern is reasonably well established, although clearly C# and VB were originally designed with mutable collections in mind.
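
Purely as a sketch of how such a hypothetical TreeBuilder<T> might be used (none of these members exist yet, and the final design may well look different):

TreeBuilder<int> builder = new TreeBuilder<int>(); // hypothetical helper type
builder.Add(new Path(0, 0), 14);
builder.Add(new Path(0, 0), 16);
builder.AddRange(new Path(0, 1), 55, 56, 59);
Tree<int> tree = builder.ToTree(); // the immutable tree is created once, at the end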

5 Likes

I’d say it does, great news at that. As a GHPython’er (primarily) I try not to implement DataTrees too much (except for reading/writing them), but this all sounds good to me. Cheers on the update.

1 Like

A fundamental aspect of the way we work with trees is how they are matched. A lot of people mistakenly assume that branch path indices are somehow considered in the matching of branches, when in fact they are just matched one-by-one blindly. (And I understand why it works this way!)

Without a real sense of where GH2 development stands currently, I’m not sure if it’s the right time to have this conversation - but I think there are some datatree- and matching-related issues that GH2 should probably try to address in order to make the concept more intuitive and direct. Some of these may already be irrelevant given your latest thinking on trees - but wanted to do a brain dump anyway.

My wishlist/suggestion list would include the following:

  1. A mechanism to perform replication at different levels of the tree. This might function something like @dave_stasiuk’s “propagate ancestors” or like Dynamo’s “List@Level”. Given two trees with more than two levels of difference in their hierarchies: {0} (3) and {0;0}{0;1}{1;0}{1;1}{1;2}{2;0}{2;1}{2;2}, for instance, if I want to have the item at i=0 in the first tree affect branches {0;*}, currently I have to use a plug-in or multiple stages of duplicate/longest list + graft. This is something I see beginners and students CONSTANTLY frustrated with - getting the stuff in list A to apply to list B in the right way.
  2. “Shift paths -1” as a first-class tree operator that can be right-click applied to any param. I use this operation more than any other tree component, and as the exact opposite of a “graft” operation it belongs at the level of basic tree concepts like graft + flatten.
  3. A visual way to inspect the history of a data tree. What I mean by this is some graphic mechanism to back-trace the ancestors of a given param - to help understand why it has the structure it has by virtue of the data + trees upstream. Here is my crude sketch:
    This could function like the ctrl-alt click does to quickly give some useful visual feedback as a temporary overlay.
  4. Better visual coding of the “Access” levels of given params. Understanding this is pretty crucial to being able to manipulate trees, and it wasn’t until I was writing my own C# scripts that I began to understand it. The replication behavior of a component is governed by the “Access” its params have - so a color or symbolic code to denote item, list, and tree access on the component itself would be of great help to users (could also be an on/off thing like fancy wires).
  5. “Groups” but for all data types. MetaHopper currently implements a concept of “Wrap” and “Unwrap” data that functions more or less like GeometryGroups except it accepts non-geometric data. This makes an operation like “reverse the order of the lowest level of branches” suddenly easy - you just turn your branches into objects by “wrapping” them, reverse them with a normal list operator, and then unwrap. I think this could be a useful base feature.
  6. This one sounds like an implementation challenge, or potentially confusing, but I can imagine it being immensely useful - the ability to employ grasshopper’s native replication behaviors to clusters as though they were functions, by specifying Access (item/list/tree) on cluster inputs explicitly. Take the function below:

    This function works great with a single curve. If you provide it a list of curves, it does “the wrong thing” - results in a single loft. Now, this particular case is easy to fix - I just graft my input - but if I was able to do this:

    The consequence would be that I could design my reusable functions/clusters for the simplest data tree case instead of the worst case. I could flatten and do all kinds of destructive operations internal to the cluster, with the knowledge that when in use the replication/looping would be handled for me.

Now that you’ve opened the #serengeti:grasshopper2 category you’re probably going to get a lot more demanding users making wishlists :slight_smile: but I hope at least some of the above seems useful or valuable to implement. Data Trees take a lot of flack but I actually think they solve a number of problems in an extremely elegant way, and I look forward to seeing how they evolve!

13 Likes

I’ll have to get back to you on this once I’m fully sober. Although as a quick note to (3), adding debugging features is super high on the list of new things to work on. Organisation/documentation and debugging of files is such a pain in the neck at the moment that it becomes the major impediment to creating larger files. I imagine there will be lots of useful overlay modes both in the Rhino viewports and the GH canvas that could really help with (a) understanding how a file works and (b) understanding why a file doesn’t work.

5 Likes

I do not plan to make this an option again. You’ll see fancy wires all the time and you’ll like it. Although the styles won’t be as visually overtly different since Eto doesn’t allow for complicated pens like the striped+dashed+capped style for tree wires.

I do agree it would be good if there was a visual feedback for the access level per parameter. Ideally something subtle and easily ignored, so I don’t have to make it optional.

6 Likes

Hi David,

While I can’t comment on any of the coding aspects, and I don’t have Andy’s depth of reasoning, I can try to point out some of the issues I struggled with with respect to data trees, and splutter how I’d like to see them dealt with:

-Components adding branches “just in case” the input becomes a list of more than one element is one of those good intentions that turn out to be a nuisance. Now that I have understood how “Simplify” was not my friend here, I speckle my definitions with “Trim tree” components to keep the trees at the minimal branching level when I know that, by design, I will only have lists of singletons as inputs.
I suggest (as I did in a specific thread of the GH forum) that an option be added to the outputs, along with “Graft”, “Flatten”, etc., which could be called “Don’t add branches if not necessary given the current input”… or something much sexier like “Auto-Groom”.

-I often struggle to match lists of objects and their properties, so the idea of adding custom data to an item is excellent.
I hope that it will be possible to “attach” multiple custom data points to each item, and that there will be a simple way to edit them.

-The path mapper looks like the killer tool for path management, but it falls short because it is not dynamic; the “Replace path” component can be made dynamic, but it lacks the “automation” of the path mapper.
Is there a “Mega path manager” tool in the works that would be the best of both worlds?

-Data matching… Not sure how to improve how it works, but it sure could use some visual feedback.
Here’s my silly attempt at trying to solve this:

  1. Right-click a component’s output of your choice
  2. Hit the new “Show Genealogy” (copyright Olivier Suire 2017) option
  3. Panels pop up showing this output and also its “ancestor” inputs, with graphic cues showing how the data was matched.
    Here are two examples:

-The way component “Help” is currently formatted is extremely austere and opaque, which is in stark contrast with the rest of the GH experience. Just providing a few examples and more granular explanations in the “Help” boxes would go a long way. Despite the great support found on the forum, these help boxes should be the first place to find clear answers.

1 Like

Yep, as much custom data as you want. The design right now is that you can attach the following types of data:

  • Booleans
  • 32-bit integers
  • Arbitrary sized integers
  • Dates and times
  • Durations or timespans
  • Double precision floating point numbers*
  • 128-bit globally unique identifiers
  • Text
  • Colours
  • 3d points*
  • 3d vectors*
  • 3d planes*
  • 4x4 transform matrices*

[*] means that the data can be marked ‘transformable’, meaning (if everyone agrees to play nice) it will be transformed along with the base data.

I’m still trying to decide whether it makes sense to also support byte-arrays. It would make storing certain kinds of data easier/more efficient, but byte arrays can always be converted into and from textual strings.
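
For reference, that round-trip is plain .NET; for example:

byte[] payload = { 0x47, 0x48, 0x32 };              // arbitrary binary data
string encoded = Convert.ToBase64String(payload);   // store this as Text metadata
byte[] decoded = Convert.FromBase64String(encoded); // original bytes recovered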

Each entry in a meta data dictionary is stored under a meta key, which is a collection of strings. Think of it as the folders in a file path. This allows you to create deep structure without the need for metadata to become hierarchical.
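
To illustrate, a key built from several strings behaves much like nested folders. A hedged sketch, reusing the CustomKey/CustomData names from Example F (which may since have been renamed):

// Hypothetical: a deep meta key, analogous to the folder levels in a file path.
CustomKey key = new CustomKey("Acme", "Analysis", "Daylight");
CustomData data = new CustomData();
data = data.Add(key, 0.85); // immutable: Add returns a new dictionary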

1 Like

I would like to merge a lot of different components into fewer, better ones. Especially the data management category (and, yes, especially anything to do with trees) is terribly bloated. I’m planning some aggressive mergers and hostile take-overs.

3 Likes

:smile:

This is going to be a huge area of focus, and it’s also the part where two other people in McNeel have already done work. We want a very flexible system that can be easily added to or translated without the need to write any code, and this system must contain useful information written in an intelligible fashion. I would not be surprised if this part takes the most time and effort out of all aspects of GH2.

3 Likes

This is great!

Byte array support would be fantastic, too…certainly from a development point of view.

Would it be possible to add helper functions for converting:
arrays of arrays, nested lists, IEnumerables of IEnumerables,
arrays of nested lists,
arrays of arrays of arrays of…
to trees?

Currently, for C# libraries I always have utility methods for that.

In short: extension methods for converting C# collections to trees would be very helpful.
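
For what it’s worth, here is a minimal sketch of what such an extension method could look like against the Tree<T> API shown in Example A (Add and Path); it builds the tree item by item, so the TreeBuilder<T> mentioned above would presumably be the more efficient basis once it exists:

using System.Collections.Generic;

public static class TreeConversionExtensions
{
  // Hypothetical helper: turn a nested enumerable into a tree with one
  // top-level twig per inner collection ({0}, {1}, {2}, ...).
  public static Tree<T> ToTree<T>(this IEnumerable<IEnumerable<T>> source)
  {
    Tree<T> tree = new Tree<T>();
    int index = 0;
    foreach (IEnumerable<T> items in source)
    {
      Path path = new Path(index++);
      foreach (T item in items)
        tree = tree.Add(path, item); // immutable: each Add yields a new tree
    }
    return tree;
  }
}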