Help with omp


#1

Hi there,
I need help with the following problem:
Multiplication of a matrix by a vector: Ap = q.
This may sound trivial but in my case it is not simply because the dimensions of A are 250,000,000 by 250,000,000. The variables in my algorithm are:
dataMatrix (50,000,000 x 30) is a matrix that contains indices (addresses) for the matrix A
dataRecord is a row of dataMatrix
nrow is the index to the corresponding Rinv matrix, e.g Rinv(5,9,9)
the call to the function getASddress creates two matrices address containing the indices of A, say A(3,5) and coefficient contains the corresponding values of A(3,5).
p is an input vector and q is the output vector
My attempt to parallelise the algorithm failed with a crash of the program. Any help to fix the problem and improve the speed of the program will be greatly appreciated. Below is the code (note that I am using Blitz arrays):

int numOfThreads = 5;
#pragma omp parallel for num_threads(numOfThreads)
for (int ir = 0; ir < dataMatrix.rows(); ir++) {
  dataRecord(Range::all()) = dataMatrix(ir, Range::all());
  nrow = getRowForRinv(dataRecord);
// find address and design matrix coefficients
  getAddress();
// calculate contributions  	
  t1 = 0.0;
  for (int i = 0; i < newNumFactors; i++) {
    for (int k = 0; k < numberOfTraits; k++) {
      if (address(i,k) == missingValue) {continue;}
      t1(k) += coefficient(i,k)*p(address(i,k));
    }
  }
  t2 = 0.0;
  for (int k = 0; k < numberOfTraits; k++) {
    for (int l = 0; l < numberOfTraits; l++) {
        t2(k) += Rinv(nrow,k,l)*t1(l);
    }
  }
  for (int i = 0; i < newNumFactors; i++) {
    for (int j = 0; j < numberOfTraits; j++) {
      if (address(i,j) == missingValue) {continue;}
      q(address(i,j)) += coefficient(i,j)*t2(j);
    }
  }
}     // end of data loop

(Dale Fugier) #2

It does not appear that you are making any calls into Rhino or openNURBS. Can you make this work in a standalone application?

Just curious, why the big matrix?


#3

Rinv is a global matrix so no need for calls. Why the big matrix? We are
in the business of prediction gene effects. In this case the matrix is of
moderate size. As I mentioned
in this example we are trying to multiply a matrix by vector as a part of
PCG algorithm to solve huge linear system of equations. This step we call
iteration on data. There is no other way to solve
the equations even if we use sparse matrix techniques because of the size
of the problem.

kony


(Dale Fugier) #4

Can you make this work in a standalone application?


#5

It works without OMP but takes 220 seconds per iteration which is a lot
given that the number of iterations to reach convergence is about 2,500 -
3,000.
The idea is to reduce the time by 1/2. Stand-alone application is not
impossible but will require some work to substitute function getAddress()
with something that calculates the address
and coefficients within the main loop and make the matrix Rinv local (read
it from a file or something). Lots of work! What is the reason for that?

kony


(Steve Baer) #6

Is this Rhino related? It seems like a question that you may get a better answer for on StackOverflow.


#7

what do you mean by “related”. Rinv is a global matrix that is needed for
the calculations. The function works without OMP directives. So my
conclusion is that OMP
directives caused the problem of StackOverflow.

kony


(Steve Baer) #8

I meant that this seems like it is not a programming problem specific to Rhino and may be better answered on the site http://stackoverflow.com/


#9

thanks, I’ll try that. But still I would like to know how to parallelise
this problem. Conceptually.the task is very simple:
we have a huge matrix (huge loop). We want to chop the loop into peaces
and distribute the peaces to different processors
and within each perform three small matrix vector products. Can you give
some clue how to do that.


(Andrew le Bihan) #10

Kony

Since it’s impossible to run your code, and since the data structures are not specified it’s virtually impossible to answer your question except in the most general terms. I would suggest that you write a completely standalone version the function that can be run and understood by any developer so that we can give you an answer.

We don’t mind helping with general coding questions if they are clearly and concisely stated. Obviously we prefer to spend our time dealing with problems related to the Rhino SDK, but even so we will try to help with general coding questions too.

The most likely cause is that the data the loop is operating on is dependent on non-invariant data elsewhere in the data structure. In other words, if you are calculating a value for array[x1][y1] and the answer is dependant on array[x2][y2], then parallelization will not work…because it is possible that another thread can operate on array[x2][y2] either before or after you work on array[x1][y1].

Parallization is not easy. You can’t just slap OMP pragmas everywhere - you do, unfortunately, still have to know what’s going on.

  • Andy

#11

Andy

I appreciate everything you’ve said. Here is a new code that mimics what I
need to do. It has the same problem. If you can have a look at the code
would be great.
If not possible so be it… Thanks.

Kony

#include
#include
#include
#include
#include
#include
#include
#include <blitz/array.h>
#include
#include
#include
#include
#include
#include
#include
#include
#include “gestimer.h”

using namespace std;
using namespace blitz;

// const char* delimiters = " \t\n;()"<>:{}[]±=&#.,/\~?";
const char
delimiters = " \t\n";
const char* numbers = “0123456789”;

struct Parents {
int sire;
int dam;
int knownParent;
};

class Count {
int i;
public:
Count() : i(0) {}
void operator++(int) { i++; } // Post-increment
int& val() { return i; }
};

typedef map<int, Count> WordMap;
typedef WordMap::iterator WMIter;

void
getTokens(string& stringToModify, vector& vec) {
// Skip delimiters at beginning.
string::size_type lastPos =
stringToModify.find_first_not_of(delimiters, 0);
// Find first “non-delimiter”.
string::size_type pos = stringToModify.find_first_of(delimiters,
lastPos);

while (string::npos != pos || string::npos != lastPos)
{
    // Found a token, add it to the vector.
    vec.push_back(stringToModify.substr(lastPos, pos - lastPos));
    // Skip delimiters.  Note the "not_of"
    lastPos = stringToModify.find_first_not_of(delimiters, pos);
    // Find next "non-delimiter"
    pos = stringToModify.find_first_of(delimiters, lastPos);
}

}

void
rewind(ifstream& dataFile, string dataName) {
//rewind the data file - close and open the file
dataFile.close();
dataFile.clear();
dataFile.open(dataName.c_str());
}

//=====================================================================
// File: gestimer.h
// Purpose: Header file for the GESTimer class.
//=====================================================================

// Adapted from Rogue Wave’s RWTimer class
//
// Author: Martin Schweitzer

#ifndef GES_TIMER_H
#define GES_TIMER_H

class GESTimer {

double startTime_;
double stopTime_;
bool isStopped_;

static double absoluteTime();

public:
GESTimer();

double elapsedTime() const;
void reset();
void start();
void stop();
};

#endif /* GES_TIMER_H */

//=====================================================================
// File: gestimer.cpp
// Purpose: Member function definitions for the GESTimer class.
//=====================================================================

#include
#include
//#include “gestimer.h”

GESTimer::GESTimer() :
startTime_(0),
stopTime_(0),
// isStopped_(TRUE)
isStopped_(true) // changed by KK
{
}

double
GESTimer::elapsedTime() const
{
return (isStopped_ ? stopTime_ : absoluteTime()) - startTime_;
}

void
GESTimer::reset()
{
startTime_ = 0;
stopTime_ = 0;
// isStopped_ = TRUE;
isStopped_ = true;
}

void
GESTimer::start()
{
startTime_ = absoluteTime() - elapsedTime();
// isStopped_ = FALSE;
isStopped_ = false;
}

void
GESTimer::stop()
{
stopTime_ = absoluteTime();
// isStopped_ = TRUE;
isStopped_ = true;
}

double
GESTimer::absoluteTime()
{
return (double)clock() / CLOCKS_PER_SEC;
}

int main() {

Array<int,2> dataMatrix, address;
Array<int,1> dataRecord;
Array<double,2> coefficient;
Array<double,3> Rinv;
Array<double,1> p, q, t1, t2;
dataMatrix.resize(5000000,30);
dataRecord.resize(30);
int newNumFactors = 20;
int missingValue = -1;
int numberOfTraits = 9;
int nrow;
address.resize(newNumFactors,numberOfTraits);
coefficient.resize(newNumFactors,numberOfTraits);
Rinv.resize(1000,numberOfTraits,numberOfTraits);
p.resize(10000000);
q.resize(10000000);
t1.resize(numberOfTraits);
t2.resize(numberOfTraits);
for (int i = 0; i < 1000; i++) {
Rinv(i,Range::all(),Range::all()) = 5.0;
}
GESTimer stopwatch;

    stopwatch.reset();
    stopwatch.start();

    dataMatrix = 1;
    p = 4.0;
    q = 0.0;
      int numOfThreads = 5;
      //#pragma omp parallel for num_threads(numOfThreads)

#pragma omp parallel
{
#pragma omp for
for (int ir = 0; ir < dataMatrix.rows(); ir++) {
dataRecord(Range::all()) = dataMatrix(ir, Range::all());
nrow = 63;
// find address and design matrix coefficients
// getAddress();
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
address(i,k) = i + k + 100;
coefficient(i,k) = 0.707;
}
}

// calculate contributions
t1 = 0.0;
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
if (address(i,k) == missingValue) {continue;}
t1(k) += coefficient(i,k)*p(address(i,k));
}
}

    t2 = 0.0;
    for (int k = 0; k < numberOfTraits; k++) {
    for (int l = 0; l < numberOfTraits; l++) {
    t2(k) += Rinv(nrow,k,l)*t1(l);
    }
    }
    for (int i = 0; i < newNumFactors; i++) {
    for (int j = 0; j < numberOfTraits; j++) {
    if (address(i,j) == missingValue) {continue;}
    q(address(i,j)) += coefficient(i,j)*t2(j);
    }
    }
    }     // end of data loop
        }


    stopwatch.stop();
    cout << "Time taken to calculate A*p = q is: " <<

stopwatch.elapsedTime() << endl;

return EXIT_SUCCESS;

}

#include
#include
#include
#include
#include
#include
#include
#include <blitz/array.h>
#include
#include
#include
#include
#include
#include
#include
#include
#include “gestimer.h”

using namespace std;
using namespace blitz;

// const char* delimiters = " \t\n;()"<>:{}[]±=&#.,/\~?";
const char
delimiters = " \t\n";
const char* numbers = “0123456789”;

struct Parents {
int sire;
int dam;
int knownParent;
};

class Count {
int i;
public:
Count() : i(0) {}
void operator++(int) { i++; } // Post-increment
int& val() { return i; }
};

typedef map<int, Count> WordMap;
typedef WordMap::iterator WMIter;

void
getTokens(string& stringToModify, vector& vec) {
// Skip delimiters at beginning.
string::size_type lastPos =
stringToModify.find_first_not_of(delimiters, 0);
// Find first “non-delimiter”.
string::size_type pos = stringToModify.find_first_of(delimiters,
lastPos);

while (string::npos != pos || string::npos != lastPos)
{
    // Found a token, add it to the vector.
    vec.push_back(stringToModify.substr(lastPos, pos - lastPos));
    // Skip delimiters.  Note the "not_of"
    lastPos = stringToModify.find_first_not_of(delimiters, pos);
    // Find next "non-delimiter"
    pos = stringToModify.find_first_of(delimiters, lastPos);
}

}

void
rewind(ifstream& dataFile, string dataName) {
//rewind the data file - close and open the file
dataFile.close();
dataFile.clear();
dataFile.open(dataName.c_str());
}

//=====================================================================
// File: gestimer.h
// Purpose: Header file for the GESTimer class.
//=====================================================================

// Adapted from Rogue Wave’s RWTimer class
//
// Author: Martin Schweitzer

#ifndef GES_TIMER_H
#define GES_TIMER_H

class GESTimer {

double startTime_;
double stopTime_;
bool isStopped_;

static double absoluteTime();

public:
GESTimer();

double elapsedTime() const;
void reset();
void start();
void stop();
};

#endif /* GES_TIMER_H */

//=====================================================================
// File: gestimer.cpp
// Purpose: Member function definitions for the GESTimer class.
//=====================================================================

#include
#include
//#include “gestimer.h”

GESTimer::GESTimer() :
startTime_(0),
stopTime_(0),
// isStopped_(TRUE)
isStopped_(true) // changed by KK
{
}

double
GESTimer::elapsedTime() const
{
return (isStopped_ ? stopTime_ : absoluteTime()) - startTime_;
}

void
GESTimer::reset()
{
startTime_ = 0;
stopTime_ = 0;
// isStopped_ = TRUE;
isStopped_ = true;
}

void
GESTimer::start()
{
startTime_ = absoluteTime() - elapsedTime();
// isStopped_ = FALSE;
isStopped_ = false;
}

void
GESTimer::stop()
{
stopTime_ = absoluteTime();
// isStopped_ = TRUE;
isStopped_ = true;
}

double
GESTimer::absoluteTime()
{
return (double)clock() / CLOCKS_PER_SEC;
}

int main() {

Array<int,2> dataMatrix, address;
Array<int,1> dataRecord;
Array<double,2> coefficient;
Array<double,3> Rinv;
Array<double,1> p, q, t1, t2;
dataMatrix.resize(5000000,30);
dataRecord.resize(30);
int newNumFactors = 20;
int missingValue = -1;
int numberOfTraits = 9;
int nrow;
address.resize(newNumFactors,numberOfTraits);
coefficient.resize(newNumFactors,numberOfTraits);
Rinv.resize(1000,numberOfTraits,numberOfTraits);
p.resize(10000000);
q.resize(10000000);
t1.resize(numberOfTraits);
t2.resize(numberOfTraits);
for (int i = 0; i < 1000; i++) {
Rinv(i,Range::all(),Range::all()) = 5.0;
}
GESTimer stopwatch;

    stopwatch.reset();
    stopwatch.start();

    dataMatrix = 1;
    p = 4.0;
    q = 0.0;
      int numOfThreads = 5;
      //#pragma omp parallel for num_threads(numOfThreads)

#pragma omp parallel
{
#pragma omp for
for (int ir = 0; ir < dataMatrix.rows(); ir++) {
dataRecord(Range::all()) = dataMatrix(ir, Range::all());
nrow = 63;
// find address and design matrix coefficients
// getAddress();
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
address(i,k) = i + k + 100;
coefficient(i,k) = 0.707;
}
}

// calculate contributions
t1 = 0.0;
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
if (address(i,k) == missingValue) {continue;}
t1(k) += coefficient(i,k)*p(address(i,k));
}
}

    t2 = 0.0;
    for (int k = 0; k < numberOfTraits; k++) {
    for (int l = 0; l < numberOfTraits; l++) {
    t2(k) += Rinv(nrow,k,l)*t1(l);
    }
    }
    for (int i = 0; i < newNumFactors; i++) {
    for (int j = 0; j < numberOfTraits; j++) {
    if (address(i,j) == missingValue) {continue;}
    q(address(i,j)) += coefficient(i,j)*t2(j);
    }
    }
    }     // end of data loop
        }


    stopwatch.stop();
    cout << "Time taken to calculate A*p = q is: " <<

stopwatch.elapsedTime() << endl;

return EXIT_SUCCESS;

}

#include
#include
#include
#include
#include
#include
#include
#include <blitz/array.h>
#include
#include
#include
#include
#include
#include
#include
#include
#include “gestimer.h”

using namespace std;
using namespace blitz;

// const char* delimiters = " \t\n;()"<>:{}[]±=&#.,/\~?";
const char
delimiters = " \t\n";
const char* numbers = “0123456789”;

struct Parents {
int sire;
int dam;
int knownParent;
};

class Count {
int i;
public:
Count() : i(0) {}
void operator++(int) { i++; } // Post-increment
int& val() { return i; }
};

typedef map<int, Count> WordMap;
typedef WordMap::iterator WMIter;

void
getTokens(string& stringToModify, vector& vec) {
// Skip delimiters at beginning.
string::size_type lastPos =
stringToModify.find_first_not_of(delimiters, 0);
// Find first “non-delimiter”.
string::size_type pos = stringToModify.find_first_of(delimiters,
lastPos);

while (string::npos != pos || string::npos != lastPos)
{
    // Found a token, add it to the vector.
    vec.push_back(stringToModify.substr(lastPos, pos - lastPos));
    // Skip delimiters.  Note the "not_of"
    lastPos = stringToModify.find_first_not_of(delimiters, pos);
    // Find next "non-delimiter"
    pos = stringToModify.find_first_of(delimiters, lastPos);
}

}

void
rewind(ifstream& dataFile, string dataName) {
//rewind the data file - close and open the file
dataFile.close();
dataFile.clear();
dataFile.open(dataName.c_str());
}

//=====================================================================
// File: gestimer.h
// Purpose: Header file for the GESTimer class.
//=====================================================================

// Adapted from Rogue Wave’s RWTimer class
//
// Author: Martin Schweitzer

#ifndef GES_TIMER_H
#define GES_TIMER_H

class GESTimer {

double startTime_;
double stopTime_;
bool isStopped_;

static double absoluteTime();

public:
GESTimer();

double elapsedTime() const;
void reset();
void start();
void stop();
};

#endif /* GES_TIMER_H */

//=====================================================================
// File: gestimer.cpp
// Purpose: Member function definitions for the GESTimer class.
//=====================================================================

#include
#include
//#include “gestimer.h”

GESTimer::GESTimer() :
startTime_(0),
stopTime_(0),
// isStopped_(TRUE)
isStopped_(true) // changed by KK
{
}

double
GESTimer::elapsedTime() const
{
return (isStopped_ ? stopTime_ : absoluteTime()) - startTime_;
}

void
GESTimer::reset()
{
startTime_ = 0;
stopTime_ = 0;
// isStopped_ = TRUE;
isStopped_ = true;
}

void
GESTimer::start()
{
startTime_ = absoluteTime() - elapsedTime();
// isStopped_ = FALSE;
isStopped_ = false;
}

void
GESTimer::stop()
{
stopTime_ = absoluteTime();
// isStopped_ = TRUE;
isStopped_ = true;
}

double
GESTimer::absoluteTime()
{
return (double)clock() / CLOCKS_PER_SEC;
}

int main() {

Array<int,2> dataMatrix, address;
Array<int,1> dataRecord;
Array<double,2> coefficient;
Array<double,3> Rinv;
Array<double,1> p, q, t1, t2;
dataMatrix.resize(5000000,30);
dataRecord.resize(30);
int newNumFactors = 20;
int missingValue = -1;
int numberOfTraits = 9;
int nrow;
address.resize(newNumFactors,numberOfTraits);
coefficient.resize(newNumFactors,numberOfTraits);
Rinv.resize(1000,numberOfTraits,numberOfTraits);
p.resize(10000000);
q.resize(10000000);
t1.resize(numberOfTraits);
t2.resize(numberOfTraits);
for (int i = 0; i < 1000; i++) {
Rinv(i,Range::all(),Range::all()) = 5.0;
}
GESTimer stopwatch;

    stopwatch.reset();
    stopwatch.start();

    dataMatrix = 1;
    p = 4.0;
    q = 0.0;
      int numOfThreads = 5;
      //#pragma omp parallel for num_threads(numOfThreads)

#pragma omp parallel
{
#pragma omp for
for (int ir = 0; ir < dataMatrix.rows(); ir++) {
dataRecord(Range::all()) = dataMatrix(ir, Range::all());
nrow = 63;
// find address and design matrix coefficients
// getAddress();
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
address(i,k) = i + k + 100;
coefficient(i,k) = 0.707;
}
}

// calculate contributions
t1 = 0.0;
for (int i = 0; i < newNumFactors; i++) {
for (int k = 0; k < numberOfTraits; k++) {
if (address(i,k) == missingValue) {continue;}
t1(k) += coefficient(i,k)*p(address(i,k));
}
}

    t2 = 0.0;
    for (int k = 0; k < numberOfTraits; k++) {
    for (int l = 0; l < numberOfTraits; l++) {
    t2(k) += Rinv(nrow,k,l)*t1(l);
    }
    }
    for (int i = 0; i < newNumFactors; i++) {
    for (int j = 0; j < numberOfTraits; j++) {
    if (address(i,j) == missingValue) {continue;}
    q(address(i,j)) += coefficient(i,j)*t2(j);
    }
    }
    }     // end of data loop
        }


    stopwatch.stop();
    cout << "Time taken to calculate A*p = q is: " <<

stopwatch.elapsedTime() << endl;

return EXIT_SUCCESS;

}


(Andrew le Bihan) #12

I think you can see for yourself that it is not going to be possible to deal with this. If I’m going to look into the problem for you, I will need:

a) Something I can actually load into developer studio and run.
b) Something that has already been pared down to a minimum set of instructions that repeats the problem.

You will probably find that doing b) will actually resolve the problem.

  • Andy

#13

OK Andy, Thanks for your support.

kony

Notice:

This email and any attachments may contain information that is personal,
confidential, legally privileged and/or copyright. No part of it should be
reproduced, adapted or communicated without the prior written consent of the
copyright owner.

It is the responsibility of the recipient to check for and remove viruses.

If you have received this email in error, please notify the sender by return
email, delete it from your system and destroy any copies. You are not authorised
to use, communicate or rely on the information contained in this email.

Please consider the environment before printing this email.