OpenDX - Documentation

Categorize

Category

Transformation

Function

Categorizes components of a field

Syntax

output = Categorize(input, name, sort);

Inputs
Name    Type                    Default   Description
input   field                   none      field to categorize
name    string or string list   "data"    component to categorize
sort    flag                    1         0: don't sort the added lookup component
                                          1: sort the added lookup component

Outputs
Name     Type    Description
output   field   field with additional lookup components

Functional Details

input

is the field containing the components to categorize

name

is the name or names of the components to categorize

sort

specifies whether the newly created lookup component is sorted or left in insertion order

The Categorize module converts a component of any type to an integer array that references a newly created "lookup" component, which is a list of the unique values in the original component. This serves to

  1. reduce the size of a component that contains duplicate values,
  2. allow conversion of string or vector data to "categorical" data,
  3. detect repeated values in a component, and
  4. create a sorted list of the unique values in a component for inspection.

Each component that is categorized will yield its own lookup component named "compname lookup", where compname is the name of the categorized component.

For example, if the component name is "state" and its values are {"MO", "CA", "MO", "NH", "AK", "NH"}, then

  Categorize(field, "state", 1)

converts component "state" to {2, 1, 2, 3, 0, 3} and produces a new component, "state lookup", containing the values {"AK", "CA", "MO", "NH"}, whereas

  Categorize(field, "state", 0)

converts component "state" to {0, 1, 0, 2, 3, 2} and produces a new component, "state lookup", containing the values {"MO", "CA", "NH", "AK"}.
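The mapping in this example can be sketched outside of Data Explorer. The Python below is only an illustration of the behavior described above, not the module's implementation; the function name categorize and its arguments are hypothetical.

```python
def categorize(values, sort=True):
    """Return (indices, lookup) for a list of hashable values.

    Hypothetical helper that mimics the behavior described for Categorize;
    it is not part of OpenDX.
    """
    if sort:
        # Sorted list of the unique values.
        lookup = sorted(set(values))
    else:
        # Unique values in order of first appearance (insertion order).
        lookup = list(dict.fromkeys(values))
    position = {value: i for i, value in enumerate(lookup)}
    indices = [position[value] for value in values]
    return indices, lookup


states = ["MO", "CA", "MO", "NH", "AK", "NH"]

sorted_indices, sorted_lookup = categorize(states, sort=True)
print(sorted_indices, sorted_lookup)
# [2, 1, 2, 3, 0, 3] ['AK', 'CA', 'MO', 'NH']

unsorted_indices, unsorted_lookup = categorize(states, sort=False)
print(unsorted_indices, unsorted_lookup)
# [0, 1, 0, 2, 3, 2] ['MO', 'CA', 'NH', 'AK']
```

In both cases the index list has one entry per original value, and the lookup list has one entry per unique value.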

Notes:

  1. Categorize works on scalars, strings, and vectors of any type. For vector data, the lookup component is sorted by x first, then y, then z, and so on. If the lookup component has fewer items than the original component, the original component contains duplicate values. If the lookup component has 256 or fewer items, the categorized component will be of type unsigned byte; otherwise it will be of type int.

  2. Categorical data can be converted back to its original values using either the Lookup module or Map. If the lookup component is of type string, it can be input as the labels parameter of Plot, ColorBar, or AutoAxes to label the values 0, 1, ..., n-1 with the corresponding strings. This helps automate the labelling of categorical plots. Data imported by ImportSpreadsheet can be categorized directly on import by specifying the components to categorize. Statistics on the categorized component, and on another associated component, can be found with CategoryStatistics. Include can be used to remove data by category.
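The points in these notes (vector ordering, duplicate detection, index type, and conversion back to the original values) can be illustrated with a short Python sketch. It assumes vectors are represented as tuples and uses a hypothetical helper, index_type; none of these names come from OpenDX.

```python
def index_type(lookup_size):
    """Name the index type the notes describe for a given lookup size."""
    # 256 or fewer lookup items means indices 0..255, which fit in one byte.
    return "unsigned byte" if lookup_size <= 256 else "int"


# 2-vectors as (x, y) tuples; one value is repeated.
vectors = [(1.0, 2.0), (0.0, 5.0), (1.0, 0.5), (0.0, 5.0)]

# Python compares tuples element by element, so this sorts by x, then y.
lookup = sorted(set(vectors))
position = {vector: i for i, vector in enumerate(lookup)}
indices = [position[vector] for vector in vectors]

# A shorter lookup list means the original data contained duplicates.
has_duplicates = len(lookup) < len(vectors)

# Indexing the lookup list recovers the original values.
restored = [lookup[i] for i in indices]

print(lookup)                   # [(0.0, 5.0), (1.0, 0.5), (1.0, 2.0)]
print(indices)                  # [2, 0, 1, 0]
print(has_duplicates)           # True
print(index_type(len(lookup)))  # unsigned byte
print(restored == vectors)      # True
```

The last step plays the role that the Lookup module or Map plays in a visual program: each index is replaced by the value stored at that position in the lookup component.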

Components

Modifies each component specified by name, replacing its values with a list of indices. For each such component, adds a new component named "name lookup", which is a lookup table for component name.
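As a rough sketch of the component changes, a field can be pictured as a mapping from component names to value lists. The Python below is an assumption-laden illustration: the dictionary representation and the function categorize_field are hypothetical and do not reflect how Data Explorer stores fields.

```python
def categorize_field(field, names="data", sort=True):
    """Return a copy of `field` with the named components categorized.

    `field` is a hypothetical dict mapping component name -> list of values.
    `names` may be a single name or a list of names, as with the name input.
    """
    if isinstance(names, str):
        names = [names]
    result = dict(field)
    for name in names:
        values = field[name]
        if sort:
            lookup = sorted(set(values))
        else:
            lookup = list(dict.fromkeys(values))  # first-appearance order
        position = {value: i for i, value in enumerate(lookup)}
        # The named component becomes a list of indices ...
        result[name] = [position[value] for value in values]
        # ... and a companion "<name> lookup" component is added.
        result[name + " lookup"] = lookup
    return result


field = {
    "state": ["MO", "CA", "MO"],
    "year": [1999, 1998, 1999],
    "positions": [0.0, 1.0, 2.0],
}
out = categorize_field(field, ["state", "year"])
print(sorted(out))
# ['positions', 'state', 'state lookup', 'year', 'year lookup']
```

Components that are not named, such as "positions" here, are left unchanged.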

Example Visual Programs

Duplicates.net
Categorical.net          (Categorize is called on import by ImportSpreadsheet)

See Also

CategoryStatistics, ImportSpreadsheet

