Python variables and arguments best practices

I’ve gotten my fair share of the error: var referenced before assignment, so I wanted to know what is the best way to deal with global and local variables as well as when and when not to use arguments. FYI, I use GhPython for the most part.

  1. Are variables initiated outside functions automatically global variables? I don’t have an example right now, but I’ve found myself in situations where the “referenced before assignment” error is given when I have a function that references global variables (these global variables are often defined at the bottom of the script just before all the functions are executed). This however is often fixed when I pass that variable through as an argument and overwrite the original global with the returned value.

  2. Am I always supposed to provide arguments? I prefer not having so many variations of the same variable name, especially when I have functions in functions that use the same variables. In the case of functions in functions, can/should an argument have the same name as the local variable you are passing through (to avoid having so many variable names)?

  3. The reason why I am interested in using global variables is that it makes using ghpythonlib.parallel.run() easier when you have many lists. If we don’t use global variables, how do we pass through multiple lists (like a list of curves you want to project onto a list of surfaces)?

I’ve got an example of the problem detailed in the first part:

#case1
var1=10

def func1():
    var1=20

def func2():
    print(var1)

func1()
func2()

It does not print 20, but prints 10.

#case2:
global var1
var1=10

def func1():
    var1=20

def func2():
    print(var1)

func1()
func2()

Declaring global at the module level does nothing: it still prints 10.

#case3
var1=10

def func1():
    global var1
    var1=20

def func2():
    print(var1)

func1()
func2()

It finally prints 20.

#case4
var1=10

def func1():
    global var1,var2
    var1=20
    var2=30

def func2():
    var1+=1
    print(var1)
    print(var2)

func1()
func2()

var2 prints 30, but var1 is “referenced before assignment.”

It appears global allows the function to read and write to the global variable. And global seems to create global variables when inside a function. But global variables cannot be manipulated inside a function with the assumption that it exists with some value (var1+=1).
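
A note on case 4, as a minimal sketch: the error there comes from the assignment in func2 itself, not from anything func1 did. Any assignment to a name inside a function (including var1+=1) makes that name local to the whole function unless it is declared global in that same function, so the read half of += finds no local value yet. The function names broken and fixed are made up for illustration.

```python
var1 = 10

def broken():
    # The augmented assignment makes var1 local to this function,
    # so reading it on the same line fails with UnboundLocalError.
    var1 += 1
    return var1

def fixed():
    # Declaring the name global in THIS function binds it to the
    # module-level variable, so += can read and write it.
    global var1
    var1 += 1
    return var1

try:
    broken()
    broken_error = None
except UnboundLocalError as err:
    broken_error = type(err).__name__

fixed_result = fixed()
print(broken_error)
print(fixed_result)
```

So a global statement in func1 has no effect on func2; each function that assigns to the name needs its own declaration, or better, takes the value as an argument and returns the result.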

I am not a programmer by trade, but what I have heard is that global variables are to be avoided whenever possible - the main reason being that it’s very easy to make a mess of them, and hard to debug when any line of code anywhere in the program can alter the value of a global.

I prefer to work in the following way: If there are many subroutine definitions, I create a “wrapper” main routine that first collects the inputs from the user or document, manages the lists to output, plus all the calls to the subroutines. The main routine sends out input arguments to the various subroutines and receives the return values to continue to process and fill any output lists… In that way, variables only have scope of their particular definition and debugging/tracking the data flow becomes much easier.

#subroutine call example
import rhinoscriptsyntax as rs

def SubA(a):
    return a+1

def SubB(b,x):
    return b+x

def SubC(c,y):
    return c*y

def Main():
    mylist=[]
    seed=rs.GetInteger("Pick a number from one to ten",minimum=1,maximum=10)
    if seed is None: return  # user cancelled the prompt
    s_a=SubA(seed)
    mylist.append(s_a)
    s_b=SubB(s_a,seed)
    mylist.append(s_b)
    s_c=SubC(s_b,seed)
    mylist.append(s_c)
    print "You chose the number {}".format(seed)
    print "Output values are: {}, {}, {}".format(mylist[0],mylist[1],mylist[2])

Main()

I’m also not a professional programmer, but just to reiterate, I would minimize the number of global variables. Here are some reactions to your points:

  1. Your examples are very interesting, but I think your problems will disappear once you start passing arguments and returning values. The solution in #case3, where a global is declared and modified inside a function, seems like a dangerous practice that is likely to produce errors down the road.

  2. In terms of the naming, I’d say that it’s perfectly fine to have many different functions that accept an argument with the same name. So, yes, you definitely CAN pass through the same name, and in fact I do that all the time. I would say that, in that case, you also SHOULD, since it’s the same argument that’s being passed.

  3. Having a list with all the stuff I have created is one of my main uses of global variables. The other one is for parameters. Parameters I never modify inside the script, and the lists with stuff I only modify on the “global” level (with arguments returned by functions). That way, I tend to have no problems with unassigned variables etc.

In terms of your example, here’s how I would do it (without any global):

def func1():
    var1 = 20
    var2 = 30
    return var1, var2

def func2(var1, var2):
    var1 += 1
    print(var1)
    print(var2)

var1, var2 = func1()
func2(var1, var2)

Hi all,

A possible workflow for these issues (passing variables and tracking results) would be to create objects that hold or store the information.


def SettingsMethod():
    #store settings in a dictionary and return it
    settings_dictionary = {}
    settings_dictionary['height'] = 12
    settings_dictionary['offset'] = 0.1
    return settings_dictionary

class SettingsClass(object):
    #store settings in a class object
    def __init__(self):
        self.height = 12
        self.offset = 0.1
        

def DoStuff():
    
    settings_dict = SettingsMethod()
    
    height = settings_dict['height']
    offset = settings_dict['offset']
    print height,offset
    
    settings_object = SettingsClass()
    
    height = settings_object.height
    offset = settings_object.offset
    print height,offset
    
 
DoStuff()

To elaborate on classes a little, they are great for passing results around as well:

import Rhino
import rhinoscriptsyntax as rs

class MyCurve(object):

    # a class that is initialized with a curve_id
    # all in __init__ is run at the creation/initialization of a new instance
    def __init__(self,curve_id):
        self.curve_id = curve_id   #curve_id is stored in the class
        self.length = rs.CurveLength(curve_id )  #length is stored in the class
       
    
    # methods inside a class can be called via class_instance.Method(args)
    # note that self is referring to the class instance itself and is not passed as an argument when run outside the class
    def SetToLength(self,new_length):
        #method to set the curve to a new length
        extension_length = new_length - self.length
        rs.ExtendCurveLength(self.curve_id,2,1,extension_length)
        #update the length stored in the class instance
        self.length = rs.CurveLength(self.curve_id)


def SetCurvesToMeanLength():
    
    curves = rs.GetObjects('get curves to mean length of', 4)
    if not curves: return  #nothing selected or command cancelled
    
    #create a list of MyCurve class instances based on the selected curves
    my_curves = [MyCurve(curve) for curve in curves]
    
    
    for my_curve in my_curves:
        print my_curve.length
    
    #get all lengths in a list via a python list comprehension
    lengths = [my_curve.length for my_curve in my_curves]
    
    #calculate mean length from the list of lengths
    mean_length = sum(lengths) / len(lengths )
    print 'mean :',mean_length
    
    #set all curves to the mean length
    for my_curve in my_curves:
        my_curve.SetToLength(mean_length)
        print my_curve.length
    

SetCurvesToMeanLength()

I’ve always wanted to post something about the use of classes on Discourse, as they are very useful once you get your head around them. Hope this can serve as a useful introductory example in the Rhino context.

More on classes here:
https://docs.python.org/2.7/tutorial/classes.html
https://en.wikibooks.org/wiki/A_Beginner's_Python_Tutorial/Classes
http://www.jesshamrick.com/2011/05/18/an-introduction-to-classes-and-inheritance-in-python/

-Willem


it’s not printing 20 because you don’t have print inside func1… when you call func1, the var1 inside it does become 20 since you assigned 20 to it… it’s just that you don’t have feedback coming out of the function, which i think is leading you to wrongly assume the variable isn’t changing to 20.

in all of your examples, stuff is happening in func1… but you’re not printing/returning any values and have no feedback as to what’s happening… you’re only watching what happens in func2.

to see that the variable is 20 in func1, do this:



var1 = 10

def func1():
    var1 = 20
    print (var1)

def func2():
    print (var1)

func1()
func2()

or this:

var1 = 10

def func1():
    var1 = 20
    return var1

def func2():
    return var1

print func1()
print func2()

both of these will print:
20
10


(not meant as an example of ‘best way’ etc… just saying why 20 isn’t printing in your examples)

Hi Jeff, all

In other words:
var1 inside func1() is a local variable, and it’s not the same variable as var1 outside func1.

Trying this

var1 = 10

def func1():
    var1 = 20
    print (var1)
    print globals()[ 'var1' ]

func1()

we can see that two var1 variables exist: one local and one global, but obviously the global variable is hidden by the local one.

Without the global statement, the var1 being assigned to is local.
With global, the var1 being assigned to is global.

Thanks for all the responses.

I guess the convenience of having global variables is that you don’t have to pass through so many arguments- especially when the function references many variables. I have been using classes more and it definitely has helped to reduce the need for separate lists that contain attributes (PersonHeight, PersonAge, PersonGender).

If I don’t use global variables, how do I use ghpythonlib.parallel.run() in cases where I have to reference multiple variables? Doesn’t this method just take one list of items that get passed on individually as an argument? Let’s say I have two lists of points (list_a and list_b) and I want to find the closest point in list_b for each point in list_a. How do I pass both lists through here? More specifically, I need to pass one point from list_a and all of list_b for each iteration.

I like to have global variables as constants that I can reference. It just seems like a lot of work to have to list out every variable you may reference in a function.
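
For the constants case, a small sketch of two options, assuming the globals are assigned once and never rebound afterwards. Reading a module-level name inside a function needs no global statement; only assignment does. The names HEIGHT, OFFSET and both functions are hypothetical, with the values borrowed from the settings example earlier in the thread.

```python
# Module-level "constants": assigned once, only read afterwards.
HEIGHT = 12
OFFSET = 0.1

def offset_height():
    # Reading works without a global statement because nothing
    # in this function assigns to HEIGHT or OFFSET.
    return HEIGHT + OFFSET

def offset_height_explicit(height=HEIGHT, offset=OFFSET):
    # Default values are captured once, when the def statement runs.
    # The dependencies are visible in the signature and can be
    # overridden per call without touching the globals.
    return height + offset

result_global = offset_height()
result_default = offset_height_explicit()
result_override = offset_height_explicit(height=20)
print(result_global)
print(result_override)
```

The second form costs a little typing in the signature but not at the call sites, which may be a reasonable middle ground between globals and long argument lists.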

Hi Lawrence

I think that here (inside a GhPython component) using so-called ‘global’ variables is fine.
Actually I don’t know if ‘global’ is the right word, since here we are inside a component that gets its input from the component parameters … anyway.
Generally speaking, if you have to avoid global variables, I think you could use a list of lists, something like:

arg = []
for pnt_a in list_a:
    arg.append( [ pnt_a, list_b ] )

and then pass arg as the list argument to ghpythonlib.parallel.run()

Thanks for the response. If you compress pnt_a and list_b into a list, doesn’t that mean you’ll have to unpack it later in the function? Or by doing so can you do:

def myfunc(local_pnt_a,local_list_b):

so you don’t have to unpack the contents of the list of lists in the function?

I wrote a small script just to test out global variables. This script is just a test and is not part of anything larger. Each point searches for its closest neighbor and curve-closest-point, and draws a line to whichever is closer.



If I don’t Duplicate() the ptcloud, the global ptcloud seems to be directly affected by the RemoveAt(selfindex) command. You can see how the _ptcloud.Count goes down as it is printed in the second screenshot of the script.

In case #3 from my second post, var1’s value only carried out of the function because I declared global var1 inside the function. In this case I did not declare global ptcloud, so why was its global version affected? My assumption was that without global ptcloud, ptcloud would be read-only inside the function, so I did not expect RemoveAt() to write to the global version of ptcloud.

160419 Test 1.gh (3.9 KB)
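
A likely explanation, sketched without access to the script itself: global only controls which variable a name is bound to when you assign to it. It does not make the object read-only. Calling a method that changes the object in place (such as RemoveAt()) modifies the one shared object, whichever name was used to reach it. In the sketch below a plain Python list stands in for ptcloud, list.pop(0) for RemoveAt(), and list(points) for Duplicate(); all names are made up for illustration.

```python
points = [1, 2, 3, 4]

def remove_first_in_place():
    # No assignment to the name "points" here, so it resolves to the
    # module-level list, and pop() mutates that very object.
    points.pop(0)

def remove_first_from_copy():
    # Work on a copy, so the module-level list is left alone.
    local_points = list(points)
    local_points.pop(0)
    return local_points

def rebind():
    # Assignment creates a new LOCAL name; the global list is untouched.
    points = []
    return points

remove_first_in_place()
after_mutation = list(points)           # global list lost one item

copy_result = remove_first_from_copy()
after_copy = list(points)               # global list unchanged by the copy

rebind()
after_rebind = list(points)             # global list unchanged by rebinding

print(after_mutation)
print(copy_result)
print(after_rebind)
```

So duplicating before removing items is the right fix when the original collection must survive.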

Yes, I think you have to do that.
ghpythonlib.parallel.run() expects a function with one argument, not two.
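
A minimal sketch of the pack-and-unpack pattern, runnable outside Rhino. Points are plain (x, y, z) tuples here, and an ordinary list comprehension stands in for the parallel call; inside GhPython that line would presumably become ghpythonlib.parallel.run(closest_in_b, args). The names closest_in_b and dist_squared are hypothetical.

```python
def dist_squared(p, q):
    # Squared distance is enough for comparing which point is closer.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def closest_in_b(packed):
    # The function takes ONE argument and unpacks it on the first line.
    pnt_a, list_b = packed
    return min(list_b, key=lambda pnt_b: dist_squared(pnt_a, pnt_b))

list_a = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
list_b = [(1.0, 0.0, 0.0), (4.0, 4.0, 0.0), (10.0, 0.0, 0.0)]

# One packed item per point in list_a. Every item refers to the same
# list_b object, so list_b is not copied.
args = [(pnt_a, list_b) for pnt_a in list_a]

# Serial stand-in for the parallel call (assumption, see above).
results = [closest_in_b(packed) for packed in args]
print(results)
```

The unpacking costs one line per function, and list_b itself is shared rather than duplicated for each item.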

Trying to get my head around classes…

Does this my_curve.length…

…reference back to this in the class initialisation?

So there is no need to create a “length” function within the class, self.length covers it?

Currently, self.length is initialised with the length of the curve when the my_curve object is first created.
This fixed value is no problem, as long as you are sure that the length of the curve will not change down the road. (If it does change, my_curve.length will still return the old value.) Generally, it would be safer to write a function that always returns the correct value, and not to initialise a length at object creation.
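
To make the difference concrete, here is a small sketch with a hypothetical Segment class in plain Python (no Rhino needed). It keeps both a value stored at creation and a method that recomputes on demand.

```python
import math

class Segment(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        # Computed once at creation, like self.length in MyCurve.__init__.
        self.cached_length = self.compute_length()

    def compute_length(self):
        # Recomputed from the current end points on every call.
        return math.hypot(self.end[0] - self.start[0],
                          self.end[1] - self.start[1])

seg = Segment((0.0, 0.0), (3.0, 4.0))
seg.end = (6.0, 8.0)            # the geometry changes after creation

stale = seg.cached_length       # still the length at creation time
fresh = seg.compute_length()    # reflects the new end point
print(stale)
print(fresh)
```

MyCurve avoids the stale value only because SetToLength remembers to refresh self.length after extending the curve; any other change to the curve would leave it out of date.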


Thank you for the explanation, very helpful.

Hey @Ncik,

a helpful feature for properties like length is the @property decorator. Find more about it here.

You can write a method within the MyCurve class that calculates the length of the curve and returns it:

@property
def length(self):
    return rs.CurveLength(self.curve_id)

this way my_curve.length will always yield the correct length.
Just a helpful trick when working a lot with derived properties.
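
One thing to watch for, shown as a sketch with a hypothetical Polyline class in plain Python: a property defined with only a getter is read-only, so any leftover assignment such as self.length = ... raises AttributeError. This applies to new-style classes, i.e. those deriving from object.

```python
class Polyline(object):
    def __init__(self, points):
        self.points = list(points)
        # Note: no "self.length = ..." here. With a getter-only
        # property of that name, the assignment would fail.

    @property
    def length(self):
        # Recomputed from the current points on every access.
        total = 0.0
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        return total

line = Polyline([(0.0, 0.0), (3.0, 4.0)])
length_before = line.length         # accessed like an attribute, no ()

line.points.append((3.0, 10.0))     # geometry changes
length_after = line.length          # picks up the change automatically

try:
    line.length = 1.0               # no setter defined
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(length_before)
print(length_after)
print(assignment_failed)
```

Applied to MyCurve, that would mean dropping the self.length = ... lines in __init__ and SetToLength; reading self.length inside SetToLength keeps working unchanged.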

Cheers,
Max
