Matrix

Reference info for type taMatrix: Wiki | Emergent Help Browser

A matrix is a multidimensional data structure that holds data items of a single type (floating point numbers, integers, etc.). Matrices store the data in each column of a DataTable, and the Math libraries provide a large number of commonly used functions that operate on matrix data (statistics, linear algebra, etc.).

See Matrix css for powerful syntax for matrix access in css, and Matrix from DataTable for information on how to access/modify Matrix information within DataTable objects.

Operating on Matrix objects in a Program

Here is some very important information for working with Matrix objects inside a Program:

  • If you want a persistent (saved, loaded, always available) Matrix object, add it to the objs. This creates the Matrix object there, along with a vars variable that points to it, which you can use in your program code to refer to the actual object.
  • Otherwise, if you are reading Matrix values from a DataTable or another source, or are creating temporary Matrix objects, you should use a LocalVars local variable, which automatically manages the memory allocation and deallocation properly. In contrast, the global args/vars variables persist as long as the project is open, meaning that they can hold onto a Matrix* pointer even after the column it belongs to has been deleted or modified, which leaves a dangling pointer.
  • Keep in mind that all "Object*" program variables (e.g., for Matrix objects) are pointers -- they refer to an actual Matrix object, but they are not actually the object itself. Thus, any matrix program variable must be initialized either by pointing to a cell in a datatable (as explained in Matrix from DataTable), or by creating a new matrix object from scratch (see code example below, or use the NEW_OBJ flag).
  • The best practice is to put the LocalVars (loc vars) containing the Matrix* object pointer inside the same set of code where you will be using the matrix. For example, if this is happening within a loop, put the loc vars at the top of the loop_code. This way, the memory management will all be resolved once the loop or other relevant scoping is done, and you won't end up with any "dangling pointers."

Example Code: Using a DataTable

args
  DataTable* data
vars
  int i
  float cell_avg
prog_code
  for(i=0;i<data.rows;i++) {
    ProgVars:
      float_Matrix cell_mat
    cell_mat = data->GetValAsMatrixColName("MyData", i)   // meth() -- i is row number -- gets data for cell from column "MyData"
    cell_avg = taMath_float::vec_mean(cell_mat)           // math() -- mean of all values in the cell
    data->SetVal(cell_avg, "Means", i)                    // meth() -- store into another column named "Means"
  }

Example Code: Creating from Scratch, Applying Matrix Math, etc

This is the listing of the code from the data_proc/matrix_demo.proj Demo project, which you are encouraged to load and explore (see link for location).

[Image: matrix demo prog code.png, the program listing from the demo project]

Matrix syntax in Css and Programs

See Matrix css for full syntax for writing expressions involving Matrix objects in the css scripting language, which is the language used in Program code.

Basic DataTable access with css is explained in DataTable_css

It is very important to remember that a LocalVars matrix variable is a pointer to a Matrix object, not an actual Matrix itself. Therefore, if you want to use it as a freestanding object (instead of e.g., pointing to a matrix in a DataTable or something like that), you must click on the NEW_OBJ flag on the local ProgVar. For example, if you have this code:

  LocalVars:
    int_Matrix mymat; // read this as int_Matrix* mymat = NULL -- a pointer to an int_Matrix
  …
  mymat = [1,2,3]; // this will set the mymat pointer to point to a temporary matrix representing the [1,2,3] values

you will get error messages about reference counts being off, and likely a crash. The reason is that when mymat goes out of scope, it deletes the temporary matrix that it was pointing to (the one representing [1,2,3]), and this causes problems.

If you instead click the NEW_OBJ flag, then you get this:

  LocalVars:
    int_Matrix mymat; // read this as int_Matrix* mymat = NULL -- a pointer to an int_Matrix
    mymat = new int_Matrix; // this is what NEW_OBJ does (you can see this by looking at the css listing of the program)
  …
  mymat = [1,2,3]; // this is now a copy of the 1,2,3 values from the temporary matrix into the mymat matrix -- not a pointer assignment
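
The two cases above differ in the same way that rebinding a pointer differs from copying into an existing object. As a loose analogy only, here is the distinction in plain Python rather than css. The names temp, points_at_temp and owns_copy are made up for illustration, and because Python is garbage-collected it does not reproduce the reference-count errors or the crash:

```python
# Analogy only: Python names behave like the Matrix* pointers described above.
temp = [1, 2, 3]                 # stands in for the temporary matrix built from [1,2,3]

# Case 1: no NEW_OBJ. The variable just points at the temporary.
mymat = temp
points_at_temp = mymat is temp   # both names refer to one and the same object

# Case 2: NEW_OBJ. The variable has its own object, and assignment copies values.
mymat = []                       # stands in for "mymat = new int_Matrix"
mymat[:] = temp                  # copy the values into the existing object
owns_copy = (mymat == temp) and (mymat is not temp)

temp[0] = 99                     # changing the temporary no longer affects mymat
```

After this runs, mymat still holds 1, 2, 3 even though temp has changed, which is the behavior you want from a freestanding matrix.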

Copying and Replicating Sub-Parts of Matrices

The Matrix css syntax provides some very powerful ways of copying sub-matrices, which can also be used in C++. There are also some fairly awkward methods in taMatrix and DataTable for doing this, but the newer "slicing" way of specifying subsets of matrix elements is much more expressive.

For example, imagine you have an input data DataTable that has an "Input" column that has a cell type with a 4D geometry (for unit groups in a layer), with the unit group (inner 2 dimensions) being 2x3, and the outer 2 dimensions (arrangement of unit groups in the layer) are 4x5. You can set the values of one unit group in one row with the following css commands (at the console -- similar code can be used in Programs):

emergent> print .StdInputData["Input"][:,:,0,0,0]
[ [0,0,0,0,0]: 0, [1,0,0,0,0]: 0, [0,1,0,0,0]: 0, [1,1,0,0,0]: 0, [0,2,0,0,0]: 0, [1,2,0,0,0]: 0 ]
emergent> .StdInputData["Input"][:,:,0,0,0] = [.1, .2, .3];
emergent> print .StdInputData["Input"][:,:,0,0,0]
[ [0,0,0,0,0]: 0.1, [1,0,0,0,0]: 0.2, [0,1,0,0,0]: 0.3, [1,1,0,0,0]: 0.1, [0,2,0,0,0]: 0.2, [1,2,0,0,0]: 0.3 ]

So we're selecting the Input column from the data table, and the :,: specifies all of the first two dimensions (2x3) and only the first (0) of the remainder (last one is the row number).

Note that we only specified 3 values on the right-hand side of the assignment, while the target has 6 elements. The copy process wraps around, so the sequence is repeated twice. Taking further advantage of this wrap-around process, we can now replicate this one unit-group pattern to all other unit groups:

emergent> .StdInputData["Input"][:,:,:,:,0] = .StdInputData["Input"][:,:,0,0,0];

All the other unit groups will now copy the values from that first unit group. As you can see, the possibilities are endless -- you can replicate one unit group across rows, etc.
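
The wrap-around copy described above can be sketched in a few lines of plain Python (not css). This is a minimal model, assuming the target elements are visited in flat storage order; wrap_assign is a hypothetical helper, not an emergent function:

```python
def wrap_assign(dest, src):
    """Overwrite every element of dest, cycling through src as often as needed."""
    if not src:
        raise ValueError("src must contain at least one value")
    for i in range(len(dest)):
        dest[i] = src[i % len(src)]
    return dest

# One 2x3 unit group filled from three values: the sequence repeats twice.
unit_group = wrap_assign([0.0] * 6, [0.1, 0.2, 0.3])

# All 4x5 unit groups of one row filled from that single unit group.
layer = wrap_assign([0.0] * (6 * 4 * 5), unit_group)
```

Here unit_group ends up as [0.1, 0.2, 0.3, 0.1, 0.2, 0.3], matching the console output above, and every consecutive block of 6 values in layer is a copy of it.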

Furthermore, you can use the conditional filtering functionality to selectively assign values based on their current value:

emergent> .StdInputData["Input"].ar[(.StdInputData["Input"][:,:,:,:,0]>.2)] = [.7, .8, .9];

This assigns the values .7, .8, .9, as a repeating sequence, to all elements of the first row that are greater than .2.
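
As a rough illustration of conditional assignment, the following plain Python sketch builds a boolean mask and then assigns a repeating sequence to the selected elements. It assumes the sequence advances once per selected element, which the text above does not spell out; assign_where is a hypothetical helper:

```python
def assign_where(values, predicate, src):
    """Replace each element that satisfies predicate, cycling through src."""
    mask = [predicate(v) for v in values]   # boolean mask, like the (... > .2) expression
    k = 0                                   # number of selected elements seen so far
    for i, selected in enumerate(mask):
        if selected:
            values[i] = src[k % len(src)]
            k += 1
    return values

row = [0.1, 0.2, 0.3, 0.1, 0.2, 0.3]
assign_where(row, lambda v: v > 0.2, [0.7, 0.8, 0.9])
# row is now [0.1, 0.2, 0.7, 0.1, 0.2, 0.8]
```

Only the two elements that were greater than 0.2 change, and they receive the first two values of the replacement sequence.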

Reference Information for Sub-Types

Subtypes are: int_Matrix, float_Matrix, double_Matrix, String_Matrix, Variant_Matrix, byte_Matrix, and slice_Matrix.

Reference info for type int_Matrix: Wiki | Emergent Help Browser
Reference info for type float_Matrix: Wiki | Emergent Help Browser
Reference info for type double_Matrix: Wiki | Emergent Help Browser
Reference info for type String_Matrix: Wiki | Emergent Help Browser
Reference info for type Variant_Matrix: Wiki | Emergent Help Browser
Reference info for type byte_Matrix: Wiki | Emergent Help Browser
Reference info for type slice_Matrix: Wiki | Emergent Help Browser

Overview of API (somewhat more technical)

Frames

A "frame" is a set of data comprising the outermost dimension; for example, for a 2-d matrix, each frame is one row (for 1-d, a frame is the same as a cell). Frames are used for DataTable columns, where the frame = row of datatable.

Accessors

Most routines provide three accessor variants:

  • Xxx(d0, d1, d2, d3, d4, d5, d6) -- this is the most common way to access the data -- d's higher than the actual dimension of the matrix are ignored
  • XxxN(const MatrixIndex&) -- for any dimensionality -- it is unspecified whether the dims may be higher, but there must be at least the correct number
  • Xxx_Flat(int idx) -- treats the elements as a flat 1-d array -- storage is in *row-major order*, i.e., the innermost dimension (d0, the first index) changes most rapidly

Accessors also come in "Safe" and "Fast" forms:

  • "Safe" accessors do bounds checks on the individual indices, as well as the final flat index -- a blank value is returned for out-of-bound values -- it is acceptable (and expected) for out-of-bounds indexes to occur
  • "Fast" accessors do not check bounds and may not check flat indexes -- they must only be used in "guaranteed" index-safe code (i.e., where index values are being driven directly from the matrix itself.)
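
To make the flat layout and the Safe/Fast distinction concrete, here is a small plain Python model. It is a sketch under the assumption that the first index changes fastest, as in the console listings on this page; flat_index, safe_el and fast_el are illustrative names, not the taMatrix API:

```python
def flat_index(geom, idx):
    """Flat offset of idx, with the first index (d0) changing fastest."""
    flat, stride = 0, 1
    for size, i in zip(geom, idx):   # entries of idx beyond len(geom) are ignored
        flat += i * stride
        stride *= size
    return flat

def safe_el(data, geom, idx, blank=0.0):
    """Bounds-checked access: returns blank for any out-of-range index."""
    if len(idx) < len(geom):
        return blank
    if not all(0 <= i < size for size, i in zip(geom, idx)):
        return blank
    return data[flat_index(geom, idx)]

def fast_el(data, geom, idx):
    """Unchecked access: the caller guarantees that idx is valid."""
    return data[flat_index(geom, idx)]

geom = (2, 3)                              # d0 has size 2, d1 has size 3
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # flat storage
```

With this data, fast_el(data, geom, (0, 1)) returns 2.0. The invalid index (5, 0) shows why the per-index check matters: it maps to flat offset 5, which is inside the storage, so fast_el silently returns the wrong element (5.0) while safe_el returns the blank value.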

Slicing

A "slice" is a reference to one frame of a matrix. Slices are used for things like viewing the cell content of a Matrix Column in a DataTable, passing a single data pattern as an event to a network, and so on. A slice is created by making a new instance of the parent matrix type, and initializing it with a fixed data pointer to the parent data, and the appropriate geometry (which is always 1-d less than the parent.) Each slice adds one to the ref count of its parent, so as long as correct ref semantics are used, it is not possible to delete a parent prior to its slice children.

Slices are updated when a parent matrix is resized or redimensioned. HOWEVER, it is important that slice clients are aware of when parent resizing may occur, and ensure they are not in the process of iterating the data that is being replaced.
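
The idea of a slice as a view onto one frame of the parent's data, rather than a copy, can be modelled in plain Python with a memoryview over a flat array. This is only an analogy: frame_size and frame_slice are hypothetical helpers, and the sketch does not model ref counting or the updating of slices on resize:

```python
from array import array

def frame_size(geom):
    """Elements per frame: the product of all but the outermost dimension."""
    size = 1
    for dim in geom[:-1]:
        size *= dim
    return size

def frame_slice(data, geom, frame):
    """Return a view (not a copy) of one frame of the flat parent data."""
    if not 0 <= frame < geom[-1]:
        raise IndexError("frame out of range")
    n = frame_size(geom)
    return memoryview(data)[frame * n:(frame + 1) * n]

geom = (2, 3, 4)                      # 4 frames, each a 2x3 cell
parent = array("d", range(24))        # flat storage with 24 elements
cell = frame_slice(parent, geom, 1)   # covers elements 6..11 of the parent
cell[0] = -1.0                        # writing through the slice changes the parent
```

The slice has one dimension fewer than the parent (6 elements of a 2x3 cell), and the write through cell is visible as parent[6]. For a 1-d geometry such as (5,), frame_size returns 1, matching the statement that a frame is then the same as a cell.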

Sub-matrix or filtered access via el_view and el_view_mode

The el_view and el_view_mode members specify how Matrix objects support the system-wide subset access "filtering" functionality that is supported by other container classes such as List and Array. These provide the support for the fancy python and matlab-style functionality documented in Matrix css. They use another matrix object (the el_view) to provide one level of indirection in accessing the contents of the Matrix -- primarily either a list of coordinates to show or a boolean mask.

IMPORTANT: only the iterator-based TA_FOREACH_INDEX macro and foreach keyword in css support the el_view access modes in general -- the direct access functions do not, except for the IDX_FRAMES (see below). Thus, it is generally recommended that all general processing of Matrix data use the iterator system. If you want to make sure you're using the view'd access in the most efficient way possible, use Flatten on a view'd matrix to flatten out the view to raw data.

  • IDX_COORDS -- el_view is an int_Matrix of coordinates -- most general view mode (can specify any arbitrary subset of elements), but is somewhat expensive for high-dimensional arrays compared to a mask
  • IDX_MASK -- el_view is a byte_Matrix of boolean 0's and 1's -- 1's indicate where data should be visible, and 0's where it is hidden. This is very fast and useful for logical expression filtering, but the one place it is slow is in computing how many items are visible (requires summing over the entire mask). In contrast, the count is directly available in the other modes.
  • IDX_FRAMES -- el_view is a 1d int_Matrix of frame numbers -- these correspond to row numbers in the data table columns, where this is used. All of the underlying Frame-based code automatically uses this form of indirection when it is set, including SafeElIndex -- the _Flat interfaces do NOT use the frame view.
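
The three view modes can be pictured as three kinds of indirection over the same flat data. The plain Python sketch below is illustrative only; the function names are made up and the real el_view machinery is not modelled:

```python
def visible_by_coords(data, geom, coords):
    """IDX_COORDS-style view: return the elements at the listed coordinates."""
    out = []
    for idx in coords:
        flat, stride = 0, 1
        for size, i in zip(geom, idx):   # first index changes fastest
            flat += i * stride
            stride *= size
        out.append(data[flat])
    return out

def visible_by_mask(data, mask):
    """IDX_MASK-style view: return the elements whose mask entry is 1."""
    return [v for v, m in zip(data, mask) if m]

def visible_by_frames(data, frame_len, frames):
    """IDX_FRAMES-style view: return whole frames in the listed order."""
    return [data[f * frame_len:(f + 1) * frame_len] for f in frames]

geom = (2, 3)                        # 3 frames of 2 elements each
data = [10, 11, 20, 21, 30, 31]
mask = [0, 1, 0, 1, 0, 1]
n_visible = sum(mask)                # a mask must be summed to count visible items
```

With these inputs, the coordinate list [(1, 0), (0, 2)] selects [11, 30], the mask selects [11, 21, 31], and the frame list [2, 0] selects [[30, 31], [10, 11]]. For the coordinate and frame views the number of visible items is simply the length of the list, whereas the mask has to be summed.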

Resizing

A matrix can be expanded or shrunk in units of frames.

A matrix object can be redimensioned, but this is discouraged -- the supported paradigm is that a matrix should retain a specific geometry for its lifetime. If a matrix is redimensioned, all slices will be "collapsed", meaning they will be set to 0 size.

Generic vs. Strongly Typed Access

Matrix objects are always strongly typed, and can be accessed using strongly typed accessor functions -- the "Fast" versions of these are particularly fast, and "_Flat" can be extremely efficient.

All matrix objects can also use Variant and String accessors to access values generically, or polymorphically. Variant values will use the underlying type where possible (e.g., int_Matrix::GetVar returns an int Variant).

The String value is used for streaming and file save/load.

Notifications

Matrix objects maintain two parallel notification mechanisms:

  • the standard taBase siglink-based notification
  • Qt AbstractItemModel notifications

Changes to data do *not* automatically cause data notifications (this would add an unacceptable penalty to most code.) However, data changes that are mediated by the Qt model do, so other grid views will automatically stay updated.

We also use Struct/Data Begin/End to communicate changes. When a matrix has slices, we recursively propagate those notifications to the slices. Note that these are almost invariably gui-driven, so they don't entail overhead for low-level data processing. But if you want your low-level updates to cause gui changes, then you must wrap them in a Begin/End block.
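
The Begin/End pattern can be illustrated with a toy container in plain Python. Everything here is hypothetical (the class, the method names, and the rule that only the outermost End notifies); it is meant only to show why raw writes are cheap and wrapped writes reach listeners:

```python
from contextlib import contextmanager

class NotifyingMatrix:
    """Toy container that notifies listeners only when a batch of changes ends."""

    def __init__(self, values):
        self.values = list(values)
        self.listeners = []          # callables invoked when a batch ends
        self._depth = 0              # nesting depth of open Begin/End blocks

    def data_update_begin(self):
        self._depth += 1

    def data_update_end(self):
        self._depth -= 1
        if self._depth == 0:         # assumption: only the outermost End notifies
            for listener in self.listeners:
                listener(self)

    @contextmanager
    def data_update(self):
        self.data_update_begin()
        try:
            yield self
        finally:
            self.data_update_end()

events = []
mat = NotifyingMatrix([0, 0, 0])
mat.listeners.append(lambda m: events.append(list(m.values)))

mat.values[0] = 1                    # raw write: no notification is sent
with mat.data_update():              # wrapped writes: one notification at the end
    mat.values[1] = 2
    mat.values[2] = 3
```

After this runs, events contains a single snapshot, [1, 2, 3]: the raw write produced no notification, and the two wrapped writes produced exactly one.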

Matrix vs. Array

  • 'Array' classes are 1-d vectors, and don't have any explicit support for higher-dimensional access.
  • Array supports dynamic operations, like inserting, sorting, etc.
  • Matrix is ref-counted, and intended for sharing/moving raw data around.
  • Matrix explicitly supports dimensionality and dimensional access.
  • Matrix supports advanced tabular editing operations.
  • Matrix forms the basis for most emergent data processing constructs.