Real Time Analytics Engine

My current project at work is to architect a solution to capturing network packet data sent over UDP, aggregate it, analyze it and report it back up to our end users. At least, that’s the vision. It’s a big undertaking and there are several avenues of approach that we can take. To that end, I’ve started prototyping to create a test harness to see the write performance/read performance of various software at different levels in our software stack.

Our criteria are as follows (taken from a slide I created):

Our Criteria
Our Criteria

Based on these four requirements pillars, I’ve narrowed down the platform choices to the following:

SERVER

nodejsStrengths

      • Single-threaded, event loop model
      • Callbacks executed when request made (async), can handle 1000s more req than tomcat
      • No locks, no function in node directly performs I/O
      • Great at handling lots of small dynamic requests
      • Logo is cool

Weaknesses

      • Taking advantage of multi-core CPUs
      • Starting processes via child_process.fork()
      • APIs still in development
      • Not so good at handling large buffers

logo-white-big

Strengths

        • Allows writing code as single-threaded (Uses Distributed Event Bus, distributed peer-to-peer messaging system)
        • Can write “verticles” in any language
        • Built on JVM and scales over multiple cores w/o needing to fork multiple servers
        • Can be embedded in existing Java apps (more easily)

Weaknesses

        • Independent project (very new), maybe not well supported?
        • Verticles allow blocking of the event loop, which will cause blocking in other verticles. Distribution of work product makes it more likely to happen.
        • Multiple languages = debugging hell?
        • Is there overhead in scaling over multiple nodes?

PERSISTENCE

For our data, we are considering several NoSQL databases. Our data integrity is not that important, because our data is not make-or-break for a company. However, it is essential to be highly performant, with upwards of 10-100k, fixed-format data writes per second but much fewer reads.

mongo

Architecture

        • Maps in memory space directly to disk
        • B-tree indexing guarantee logarithmic performance

Data Storage

        • “Documents” akin to objects
        • Uses BSON (binary JSON)
        • Fields are key-value pairs
        • “Collections” akin to tables
        • No conversion necessary from object in data to object in programming language

Data-Replication

        • Mongo handles sharding for you
        • Uses a primary and secondary hierarchy (primary receives all writes)
        • Automatic failover

Strengths

        • Document BSON structure is very flexible, intuitive and appropriate for certain types of data
        • Easily query-able using mongo’s query language
        • Built-in MapReduce and aggregation
        • BSON maps easily onto JSON, makes it easy to consume in front end for charting/analytics/etc
        • Seems hip

Weaknesses

        • Questionable scalability and write performance for high volume bursts
        • Balanced B-Tree indices maybe not best choice for our type of data (more columnar/row based on timestamp)
        • NoSQL but our devs are familiar with SQL
        • High disk space/memory usage

 

cassandra
Architecture

        • Peer to peer distributed system, Cassandra partitions for you
        • No master node, so you can read-write anywhere

Data Storage

        • Row-oriented
        • Keyspace akin to database
        • Column family akin to table
        • Rows make up column families

Data Replication

        • Uses a “gossip” protocol
        • Data written to a commit log, then to an in-memory database (memtable) then to disk using a sorted-strings table
        • Customizable by user (allows tunable data redundancy)

Strengths

        • Highly scalable, adding new nodes give linear performance increases
        • No single point of failure
        • Read-write from anywhere
        • No need for caching software (handled by the database cluster)
        • Tunable data consistency (depends on use case, but can enforce transaction)
        • Flexible schema
        • Data compression built in (no perf penalty)

Weaknesses

        • Data modeling is difficult
        • Another query language to learn
        • Best stuff is used by Facebook, but perhaps not released to the public?

 

467px-Redis_Logo.svg

            TBD

 

        FRONT-END/REST LAYER

For our REST Layer/Front-end, I’ve built apps in JQuery, Angular, PHP, JSP and hands-down my favorite is Angular. So that seems like the obvious choice here.

AngularJS-large

Strengths

      • No DOM Manipulation! Can’t say how valuable this is …
      • Built in testing framework
      • Intuitive organization of MVC architecture (directives, services, controllers, html bindings)
      • Built by Google, so trustworthy software

Weaknesses

      • Higher level than JQuery, so DOM manipulation is unadvised and more difficult to do
      • Fewer 3rd party packages released for angular
      • Who knows if it will be the winning javascript framework out there
      • Learning curve

Finally, for the REST API, I’ve also pretty much decided on express (if we go with node.js):

express

Strengths

      • Lightweight
      • Easy integration into node.js
      • Flexible routing and callback structure, authorization & middleware

Weaknesses

      • Yet another javascript package
      • Can’t think of any, really (compared to what we are using in our other app – java spring

These are my thoughts so far. In the following posts, I’ll begin describing how I ended up prototyping our real time analytics engine, creating a test harness, and providing modularity so that we can make design decisions without too much downtime.

Advertisements

Mapping Google Datatable JSON format to Sencha GXT ListStores for charting

By now, if you’ve been keeping up with my blog, you’ll notice that we make extensive use of charting in our application. We decided to go with Sencha GXT charting because we were under time pressure to finish our product for a release. However, we weren’t so crunched for time that I decided to model the data in an unintelligent way. Since charts essentially graphically display tabular data, I wanted to ensure that if we were to move to a different graphing package (which we are, btw) in the future, our data was set up for easy consumption.

After tooling around a bit, I decided on google datatable JSON formatting for our response data from queries made to our server. Google datatable formatting essentially just mimics tabular data in JSON formatting. It’s also easily consumable by google or angular charts. Having used angular charts in the past, I knew that the data format would map nicely to charts. However, the immediate problem still loomed, which is how to convert from google datatable JSON format to consumable ListStores for consumption by GXT charts?

Solution: In the end, I ended up writing an adapter with specific implementations for consuming our backend charting data. I needed to write a Sencha/GXT specific one. Here’s an example of the data we receive from our REST query.

//Note: some data omitted for readability
{
    "offset": 0,
    "totalCount": 1,
    "items": [
        {
            "title": "Registration Summary",
            "data": {
                "cols": [
                    {
                        "id": "NotRegistered",
                        "label": "Not Registered",
                        "type": "number"
                    },
                    {
                        "id": "Registered",
                        "label": "Registered",
                        "type": "number"
                    }
                ],
                "rows": [
                    {
                        "l": "9608",
                        "c": [
                            {
                                "v": "5",
                                "f": null
                            },
                            {
                                "v": "4",
                                "f": null
                            }
                        ]
                    },
                    {
                        "l": "9641",
                        "c": [
                            {
                                "v": "2",
                                "f": null
                            },
                            {
                                "v": "5",
                                "f": null
                            }
                        ]
                    },
                    {
                        "l": "9650SIP",
                        "c": [
                            {
                                "v": "3",
                                "f": null
                            },
                            {
                                "v": "0",
                                "f": null
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

And here’s an example of how it’s converted into a Sencha ListStore:

[
    {
        v1=4,
        label0=NotRegistered,
        v0=5,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9608,
        f0=null
    },
    {
        v1=5,
        label0=NotRegistered,
        v0=2,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9641,
        f0=null
    },
    {
        v1=0,
        label0=NotRegistered,
        v0=3,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9650SIP,
        f0=null
    }
...
]

Step 1: Define the ChartModel I/F to map to the JSON datatable format.

Here’s an example of a ChartModel java object that maps to JSON object.

public abstract class ChartModel {
	
	private static ChartFactory factory = GWT.create(ChartFactory.class);
	private static JsonListReader<ChartObjects, ChartObject> reader = new JsonListReader<ChartObjects, ChartObject>(factory, ChartObjects.class );
	
	/**
	 * These constants are fixed per the value retrieved from JSON
	 */
	public static final String POINT = "point";
	public static final String COL_LABEL = "label";
	public static final String COL_TYPE = "type";
	public static final String COL_ID = "id";
	public static final String CELL_VALUE = "v";
	public static final String CELL_FORMATTED_VALUE = "f";
	public static final String ROW_LABEL = "l";
	
	/*
	 * Define the data model
	 */
	
	public interface ChartObjects extends ModelList<ChartObject>{}
	
	public interface ChartObject {
		String getTitle();
		ChartData getData();
	}
	
	public interface ChartData {
		List<ChartCol> getCols();	//represents # series
		List<ChartRow> getRows();	//an array of array of cells, i.e. all the data
	}
	
	public interface ChartCol {
		String getId();
		String getLabel();
		String getType();
	}
	
	public interface ChartRow {
		List<ChartCell> getC();	//an array of chart cells, one per column
		String getL();	//label for this group of chart cells (e.g. a timestamp)
	}
	
	public interface ChartCell {
		String getV();	//get value
		String getF();	//get formatted value
	}

	public abstract ListStore<DataPoint> getListStore(ChartObject chartObject) throws Exception;
...
}

You can see that I also added an abstract getListStore method that requires any consumer of the ChartObject to define how the ChartObject gets modeled to the ListStore.

Step 2. Create an Adapter that extends the ChartModel and implements the function getListStore.

Here’s an example of a StackedBarChart model (the models turned out to be different for multiple series in a stacked bar chart and for example, a pie chart. Line charts and stacked bar charts turned out to be the same.


public class StackedBarChartModel extends ChartModel {
	@Override
	public ListStore<DataPoint> getListStore(ChartObject chartObject) throws Exception {
		return new GXTChartModelAdapterImpl(chartObject).getSeries(ChartType.STACKED_BAR);
	}
}

You can see that this method instantiates a new adapter instance for GXT charts, passes in the chart object (modeled to the datatable JSON format) and returns the appropriate chart series (in this case a BarSeries GXT object).

Step 3. Map the chart object to a ListStore.

public class GXTChartModelAdapterImpl implements GXTChartModelAdapter {


	@Override
	public ListStore<DataPoint> getSeries(ChartType type) throws Exception {
		switch(type) {
			case LINE:
				return getLineSeriesListStore();
			case STACKED_BAR:
				return getStackedBarSeriesListStore();
			case BAR:
				return getLineSeriesListStore();
			case PIE:
				return getPieSeriesListStore();
			default:
				throw new Exception("Unknown data series type.");
		}
	}

/**
	 * Returns a ListStore for a BarSeries GXT object to add to a chart
	 * @return
	 */
	private ListStore<DataPoint> getStackedBarSeriesListStore() {
		ListStore<DataPoint> _listStore = new ListStore<DataPoint>(new ModelKeyProvider<DataPoint>() {
			@Override
			public String getKey(DataPoint item) {
				return String.valueOf(item.getKey());
			}
		});
		
		int numPoints = this.getNumPoints();  //The number of rows in the rows obj in ChartObject
		int numSeries = this.getNumSeries();  //The number of rows in the col obj in ChartObject
		
		for (int i = 0; i < numPoints; i++) {	
			//This key must be unique per DataPoint in the store
			DataPoint d = new DataPoint(indexStr(ChartModel.POINT,Random.nextInt()));
			
			for (int index = 0; index < numSeries; index++) {
				d.put(indexStr(ChartModel.COL_LABEL, index), chartObject.getData().getCols().get(index).getLabel());
				d.put(indexStr(ChartModel.COL_TYPE, index), chartObject.getData().getCols().get(index).getType());
				d.put(indexStr(ChartModel.COL_ID, index), chartObject.getData().getCols().get(index).getId());
			}
			
			ChartRow row = chartObject.getData().getRows().get(i);	//get the i-th point
			d.put(ChartModel.ROW_LABEL, row.getL());
			for (int j = 0; j < numSeries; j++) {
				if (row.getC().get(j) != null) {	//if ith-point is not blank
					d.put(indexStr(ChartModel.CELL_VALUE, j), row.getC().get(j).getV());
					d.put(indexStr(ChartModel.CELL_FORMATTED_VALUE, j), row.getC().get(j).getF());
				} else {	//otherwise, assume 0 value
					d.put(indexStr(ChartModel.CELL_VALUE, j), "0");
					d.put(indexStr(ChartModel.CELL_FORMATTED_VALUE, j), "");
				}
			}
			_listStore.add(d);
		}
		
		return _listStore;
	}
...
}

A few things to note here.

  • A DataPoint is basically just a map of keys and value pairs. Since the DataPoint object is specific to the ListStore, the implementation is in the interface GXTChartModelAdapter. To do this, I used the instructions here: https://www.sencha.com/blog/building-gxt-charts/
  • Next, you can see that for each bar I have (not each stack), I create a new DataPoint object that contains a the corresponding value of the specific row in the ChartObject.rows object. I know that each column corresponds to a stacked bar, so I increment the “v” value by 1 and use this as the key / value pair to the DataPoint.
  • Finally, because the number of columns defines the number of stacks I expect in each bar, if there isn’t a value (because our backend didn’t provide one), I assume the value is 0. This enforces that the v(n) in the first bar will correspond to v(n) in any subsequent bars.
  • I added an “l” value to each row in the ChartObject.rows obj which corresponds to a label for the entire stacked bar. I know that in the datatables JSON format you can specify the first row in each row obj to be a String and correspond to the label for each bar. However, it’s harder to enforce this type of thing in java and we expected numeric data back each time.


Step 4. Pass in the data into a Sencha GXT Chart!

Now the easy part. Essentially the data is transformed from the datatables format to a ListStore with each DataPoint key: v0 – vN corresponding to a stack in a bar.

Thus when instantiating the chart, this is all I had to do:

private BarSeries<DataPoint> createStackedBar() {

		BarSeries<DataPoint> series = new BarSeries<DataPoint>();
		series.setYAxisPosition(Position.LEFT);
		series.setColumn(true);
		series.setStacked(true);
		
		//numSeries is the number of bars you want stacked, it conforms to the chartObject.cols().length
		for (int i = 0; i < numSeries; i++) {
			MapValueProvider valueProvider = new MapValueProvider(ChartModel.CELL_VALUE + i);
			series.addYField(valueProvider);
		}
		
		return series;
	}

When I instantiate the BarSeries, I just pass in a key for each the number of stacked bars in my response data.

Finally, the finished product.

Screen Shot 2014-04-09 at 5.03.44 PM

Custom Resized Portlets using Sencha GXT 3.1 Beta

Sencha makes a container called the PortalLayoutContainer available through its API which allows the developer to add portlets to the container. The container extends the VerticalLayoutContainer and works by the user adding portlet widgets to the container in predefined column widths. The PLC doesn’t allow users to specify a grid with a predefined portlet height so that portlets dropped in will fit that size. Hence, we had some portlets that didn’t render nicely because they were too big for screens with low vertical resolutions. Also, because we don’t know what kind of screen resolution our users are going to use, we couldn’t just set a fixed height and forget about it.

In other words, sometimes one just wants to add a portlet that say, takes up 50% of the vertical screen real-estate rather than a fixed 450px. How did I solve this problem? Disclaimer: I admit that this solution is a bit of a hack, but without any other exposed API I did what I had to do.

Where's the rest of my portlet?
Where’s the rest of my portlet?

Solution: I had to resort to event handlers that listened for a resize of the PortalLayoutContainer. Essentially what this method does is traverse up the widget hierarchy until it finds a PortalLayoutContainer, then binds a resize handler for the specific portlet to the PLC. When the resize event is triggered on the PLC, each individual portlet is resized based on the initial parameter of how much screen real estate the portlet should take up. I also set a minimum size so that the portlets would not continue to resize if the overall PLC got too small (where the data started looking bad). You could easily set a maximum size as well so the portlets would not continue resizing after a certain point.

/**
	 * Attaches a resizeHandler to the parent (which is assumed to be a PortalLayoutContainer
	 * You need to call this method after the _portlet is added to the container
	 * @param width - the percentage of width of the window to resize to
	 * @param height - the percentage of height of the window to resize to
	 */
	public void bindResizeHandler(final double width, final double height) {
		if (_portlet != null) {
			_config.setResizable(Boolean.TRUE);
			
			//traverse up widget hierarchy until we reach PortalLayoutContainer
			Widget w = _portlet.getParent();
			while (w != null && !(w instanceof PortalLayoutContainer)) {
				w = w.getParent();
			}
			
			//Resize
			PortalLayoutContainer plc = ((PortalLayoutContainer) w);
			if (plc != null) {
				plc.addResizeHandler(new ResizeHandler () {

					@Override
					public void onResize(ResizeEvent event) {
						int newWidth = Math.max((int) (event.getWidth() * width) - 10, _config.getMinWidth());
						int newHeight = Math.max((int) (event.getHeight() * height) - 10, _config.getMinHeight());
						
						//Only resize if width or height > minWidth or minHeight - note min width doesn't quite work right
						//because GXT widgets still try to autoresize
						if (_config.getWidth() != newWidth || _config.getHeight() != newHeight) {
							_config.setWidth(newWidth);
							_config.setHeight(newHeight);
							resize(_config.getWidth(), _config.getHeight());
						}
					}
					
				});
			}
		}
	}
	
	@Override
	public void resize(int width, int height) {
		if (_config.getResizable()) {
			_portlet.setWidth(width);
			_portlet.setHeight(height);
			_portlet.forceLayout();
		}
	}

I know that this turns the portal container into more of a “dashboard,” but that’s what our requirements called for. If you want a portlet to have a fixed size, simply don’t bind the handler to that specific portlet and the height will remain fixed.

Here’s how I bound each portlet to the PLC during instantiation.

//a PLC is made with two columns, each 50% and one column that's 100% (a hack to get a 50/50/100 split
			_portalContainer = new PortalLayoutContainer(3);
			_portalContainer.setColumnWidth(0, .5);
			_portalContainer.setColumnWidth(1, .5);
			_portalContainer.setColumnWidth(2, 1);
			_updateList = new ArrayList<UpdateablePortlet>();

//A portlet config is my own object that has parameters like whether the portlet is resizable and options specific to the data used to populate the portlet
			PortletConfig _tjpConfig = new PortletConfig();
			_tjp = new TestJobsPortlet(_tjpConfig);
			_portalContainer.add(_tjp.asWidget(), 1);

//This will add the portlet to the top right corner of the PLC and bind a resize handler to take up 50% of the width and 50% of the height (i.e. 1/4 of the screen) whenever the PLC is resized
			_tjp.bindResizeHandler(.5, .5);

And here’s the result:

much better!
much better!

Custom axes with Sencha 3.1 GXT Charts

Here’s a short idiom for today. If you have a Sencha chart with multiple BarSeries, you can click on the legend by default, which will “exclude” certain values from the recalculation of the y-axis minimum and maximum. We found that there are certain times when Sencha will compute the max range of a chart as a value less than 10, while the axis has 10 tick marks. This leaves some ugly decimal points in the y-axis labels, which just doesn’t make sense in some instances (e.g. you can’t have a fraction of a whole object).

Bad! You can't have a fraction of a device!
Bad! You can’t have a fraction of a device!

In order to get whole numbers here, I added a legend handler which recomputes the axis and sets a minimum value if Sencha would compute a < 10 max for the y-axis. Here's the code snippet:

			legend.addLegendSelectionHandler(new LegendSelectionHandler() {

				@Override
				public void onLegendSelection(LegendSelectionEvent event) {
					if (axis.getTo() < 10) {
						axis.setMaximum(10.0);
					} else {
						axis.setMaximum(Double.NaN);
						axis.setInterval(Double.NaN);
					}
					axis.calcEnds();
					axis.drawAxis(false);
					_chart.redrawSurface();
				}
				
			});

And here's the result:

Screen Shot 2014-04-01 at 2.04.52 PM
Ahh, much better.