How to recover lost rides on Strava

I recently started biking, and the fact that I’ve kept at it is due in no small part to Strava.  I really like tracking my progress, seeing how I compare to others and myself.  I’ll admit on more than one occasion, I’ve chosen to ride just to see those weekly numbers tick up.

My obsession with stats can obviously only be sated with more stats. After putting in the time on long rides, I pore over each segment to see how my performance stacks up from week to week. Today, I went on a long ride (~37 miles), going up Montebello Road and then down Page Mill. I huffed and puffed, facing cold rain atop Black Mountain, a sore elbow, and unhealthy wobbles descending the dirt road that almost ended with me eating shit. Yet I was happy to do it all so I could see my stats.


Then the terrible occurred – when I went to upload the file, nothing appeared in my feed. I waited … then waited some more … and then some more.  It still wouldn’t appear!  I tried recording another activity just to make sure I was able to upload files and it recorded just fine.  Mortified, I thought my three hour suffer-fest would end with nothing to show for it (because in reality, what’s the point of riding without obsessing over times?).

Disclaimer: To be fair, I’m not sure I waited the entire time for my file to upload (it was pretty big, around 2.9 MB; speaking of which, Strava, I highly suggest compressing files before upload, since it would make uploads much faster), but I still thought it was just taking the Strava servers some time to process.
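Here is a minimal Python sketch that makes the compression suggestion concrete. It gzips synthetic waypoint lines modeled on the `wp:` format shown further down in this post. It only illustrates the idea and is not how Strava's app works. The generated values and the `fake_waypoints` and `compression_ratio` helpers are hypothetical, and a real ride file will compress differently.

```python
import gzip


def fake_waypoints(n: int) -> str:
    """Build n synthetic 'wp:' lines in the format of the on-phone activity file.

    The values are made up (hypothetical): coordinates, altitude, timestamps
    and distance simply change a little on each line.
    """
    lines = []
    for i in range(n):
        t = 1463844846.142122 + i  # one fix per second (assumed rate)
        lines.append(
            f"wp: lat:{37.337964 + i * 1e-5:.6f}; long:{-121.979434 - i * 1e-5:.6f}; "
            f"hacc:5.000000; vacc:3.000000; alt:{38.173744 + (i % 50) * 0.1:.6f}; "
            f"speed:3.720000; course:268.593750; t:{t:.6f}; dt:{t:.6f}; "
            f"dist:{7.644818 + i * 0.004:.6f}\n"
        )
    return "".join(lines)


def compression_ratio(text: str) -> float:
    """Return gzip-compressed size divided by original size (UTF-8 bytes)."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    data = fake_waypoints(10_000)  # about 2.8 hours at the assumed rate
    print(f"original: {len(data.encode('utf-8')):,} bytes")
    print(f"gzip ratio: {compression_ratio(data):.2f}")
```

Most of each line repeats: the field names, the constant accuracy values and the constant course. So a ratio well below 1 is expected for text like this, and a smaller upload should finish sooner.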

However, after a little digging, I was able to recover the file and successfully upload it to Strava’s servers. Here’s how to do it, so your precious activity doesn’t go uncollected.

First some context:

I have an iPhone 6 Plus running iOS 9.3.2 and Strava v.4.16.0 (4015).

Steps:

1. First, make sure the file is still on your mobile device.  On the iPhone, you can connect your phone, open up iTunes, and navigate to Apps/Strava where you can see all of your activities.  Here’s what it looks like: 

Note: If you don’t have this file anymore, unfortunately I think you’re hosed.  You can always try to contact Strava to see if your file is on their server somewhere.

[Screenshot: iTunes Strava app data view]

2. Next, locate the file that failed to upload to Strava. This may take a bit of trial and error. However, if it’s your latest activity, it should be at the end, since the files are sorted by timestamp:

[Screenshot: Strava activity files in iTunes, sorted by timestamp]

3. The next part is a little tricky. You can save the file to your local machine by right-clicking it and choosing “Save to…”. Then open the file in a text editor. I used Sublime Text (hey, I’m a dev). The file will be huge and will look something like this:

Sample file:

strt: t:485449851.286994
lvseg: t:485537646.134324 on:1
wp: lat:37.337964; long:-121.979434; hacc:5.000000; vacc:3.000000; alt:38.173744; speed:3.720000; course:268.593750; t:1463844846.142122; dt:1463844846.142122; dist:7.644818
wp: lat:37.337964; long:-121.979434; hacc:5.000000; vacc:3.000000; alt:38.173744; speed:3.720000; course:268.593750; t:1463844846.142135; dt:1463844846.142135; dist:7.644818
relv: v:0.977640; t:1463844846.825438;

Side note on understanding the file.  I don’t work for Strava, so I’m just spitballing here, but the file is obviously appended to as your activity takes place.  Here are some guesses as to what each datapoint means:

strt: looks like start

  • t: unsure.  At first I thought timestamp, but the value is way too low (resolved to somewhere in 1985).  So still a mystery.

lvseg: looks like live segment

  • given that Strava just added this, I’m going to guess that this means you’re doing a live segment, and on:1 means it’s recording

wp: looks like these are the main data points

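  • lat: latitude
  • long: longitude
  • hacc/vacc: probably horizontal/vertical accuracy in meters rather than acceleration (iOS Core Location reports horizontalAccuracy and verticalAccuracy for each GPS fix)
  • alt: altitude
  • speed: self-explanatory
  • course: probably direction of travel in degrees from true north; it shows up as -1 in the short test recording further down, where speed is near zero, which matches how Core Location marks an invalid course
  • t: start timestamp
  • dt: end timestamp
  • dist: distance (in kilometers, I think)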

relv: relative velocity (maybe)

So essentially, by looking at the latitude, longitude and timestamp, you can figure out where you were at what time. This is a little tedious if you’re trying to recover an old file, but you can skip it if it’s your most recent activity.
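Here is a minimal Python sketch of reading the file this way. It assumes the line format in the sample above (`tag: key:value; key:value ...`). The helper names and the Apple-epoch interpretation are assumptions for illustration, not anything Strava documents.

On that last point, one plausible explanation for the mysterious `strt`/`lvseg` `t` values is that they count seconds from Apple's reference date (2001-01-01 UTC) instead of the Unix epoch. Adding 978,307,200, which is that date as a Unix timestamp, to the sample's `lvseg` value gives 1463844846.13. That is within a hundredth of a second of the first `wp` timestamp.

```python
from datetime import datetime, timezone

# Assumption: strt/lvseg "t" values are seconds since Apple's reference date
# (2001-01-01 00:00:00 UTC); this constant is that date as a Unix timestamp.
APPLE_EPOCH_OFFSET = 978_307_200

# Lines copied from the sample file above.
SAMPLE = """\
strt: t:485449851.286994
lvseg: t:485537646.134324 on:1
wp: lat:37.337964; long:-121.979434; hacc:5.000000; vacc:3.000000; alt:38.173744; speed:3.720000; course:268.593750; t:1463844846.142122; dt:1463844846.142122; dist:7.644818
relv: v:0.977640; t:1463844846.825438;
"""


def parse_line(line: str) -> tuple[str, dict[str, str]]:
    """Split 'wp: lat:37.3; long:-121.9' into ('wp', {'lat': '37.3', 'long': '-121.9'})."""
    tag, _, rest = line.partition(":")
    fields: dict[str, str] = {}
    # wp/relv fields are separated by ';', strt/lvseg fields by spaces.
    for chunk in rest.replace(";", " ").split():
        key, sep, value = chunk.partition(":")
        if sep:
            fields[key] = value
    return tag.strip(), fields


def waypoints(text: str) -> list[tuple[datetime, float, float]]:
    """Return (UTC time, latitude, longitude) for every 'wp' line."""
    points = []
    for line in text.splitlines():
        tag, fields = parse_line(line)
        if tag == "wp":
            # wp timestamps look like ordinary Unix time (seconds since 1970).
            when = datetime.fromtimestamp(float(fields["t"]), tz=timezone.utc)
            points.append((when, float(fields["lat"]), float(fields["long"])))
    return points


def apple_time(seconds: float) -> datetime:
    """Interpret a strt/lvseg 't' value as seconds since 2001-01-01 UTC (assumption)."""
    return datetime.fromtimestamp(seconds + APPLE_EPOCH_OFFSET, tz=timezone.utc)


if __name__ == "__main__":
    for when, lat, lon in waypoints(SAMPLE):
        print(when.isoformat(), lat, lon)
    _, lvseg = parse_line(SAMPLE.splitlines()[1])
    print("lvseg as Apple time:", apple_time(float(lvseg["t"])).isoformat())
```

Each `wp` line becomes a (UTC time, latitude, longitude) tuple. That is usually enough to match a recovered file to a particular ride.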

4. My Strava app refused to upload this file (also, it had a timestamp in the past), so I’m guessing that somewhere on your mobile device, Strava marks activities that you chose not to upload and skips over them when the app boots up.

To work around this limitation, I simply recorded a new activity and brought myself to the screen before the “save” screen.  

Don’t hit “save” yet.

You can see in iTunes that a new file was generated with the correct timestamp. Next, I clicked “Save to …” to copy it to my local machine and opened it in Sublime again. Note the UUID-looking filename (DF5…857A831ECFD) so you can find the file after you copy it locally.

[Screenshot: the new activity file in iTunes]

The file looks like this. Note how small it is, since I only recorded about 9 seconds’ worth. Also possibly interesting: if Strava fills up your phone at roughly 0.5–1 KB/second, each hour of activity comes to very roughly a few MB, probably depending on the number of breaks you take:

lvseg: t:485562945.972006 on:1
strt: t:485562945.983277
wp: lat:37.338599; long:-121.978195; hacc:10.000000; vacc:32.000000; alt:54.916534; speed:0.620000; course:-1.000000; t:1463870147.006819; dt:1463870147.006819; dist:0.000000
wp: lat:37.338570; long:-121.978124; hacc:10.000000; vacc:12.000000; alt:42.931915; speed:0.000000; course:-1.000000; t:1463870148.002843; dt:1463870148.002843; dist:0.000000
relv: v:0.000000; t:1463870148.671240;
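Here is a rough sanity check of the storage estimate above. It treats 1 MB as 1,000 KB and takes the earlier ride as about three hours and 2.9 MB:

```latex
\begin{aligned}
0.5\,\tfrac{\mathrm{KB}}{\mathrm{s}} \times 3600\,\mathrm{s} &= 1800\,\mathrm{KB} \approx 1.8\,\mathrm{MB\ per\ hour} \\
1\,\tfrac{\mathrm{KB}}{\mathrm{s}} \times 3600\,\mathrm{s} &= 3600\,\mathrm{KB} \approx 3.6\,\mathrm{MB\ per\ hour} \\
\frac{2900\,\mathrm{KB}}{3 \times 3600\,\mathrm{s}} &\approx 0.27\,\tfrac{\mathrm{KB}}{\mathrm{s}}
\end{aligned}
```

So the long ride logged at roughly half the low end of that range. That fits the guess that the rate varies, for example with breaks.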

Next, I copied the contents of the original file into this file, deleting everything in this file after the “strt: t:485562945.983277” line. I’m not sure this is strictly necessary, but at any rate, it probably doesn’t hurt.
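If you’d rather script this step than edit by hand, here is a minimal Python sketch. It assumes both files have been saved locally as plain text, and keeping copies of both originals before overwriting anything is a good idea.

`merge_activity` and `merge_files` are hypothetical helpers. Dropping the old file’s own `strt:`/`lvseg:` lines is also an assumption: it makes the merged file keep a single header from the new recording. Whether Strava cares either way is unknown.

```python
from pathlib import Path


def merge_activity(new_text: str, old_text: str) -> str:
    """Keep the new file through its 'strt:' line, then append the old ride.

    Assumption: the old file's own 'strt:'/'lvseg:' header lines are dropped,
    so the result has a single header taken from the new recording.
    """
    header: list[str] = []
    for line in new_text.splitlines():
        header.append(line)
        if line.startswith("strt:"):
            break
    else:  # loop finished without finding a strt line
        raise ValueError("new file has no 'strt:' line")

    body = [
        line
        for line in old_text.splitlines()
        if not line.startswith(("strt:", "lvseg:"))
    ]
    return "\n".join(header + body) + "\n"


def merge_files(new_path: Path, old_path: Path) -> None:
    """Overwrite the new (UUID-named) file in place with the merged contents."""
    merged = merge_activity(new_path.read_text(), old_path.read_text())
    new_path.write_text(merged)
```

Call `merge_files` with the local copy of the new recording and the recovered file. Then copy the result back to iTunes under the same filename, as in step 5.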

5. Next, copy the modified file (with the same filename) back to iTunes. iTunes will ask whether you want to overwrite the file. Hit “Replace”.

[Screenshot: iTunes asking whether to replace the file]

Now you’ll see that the file changes size on your iPhone.

[Screenshot: the file’s new size in iTunes]

6. Now you can hit “save” in your Strava app. If you did everything right, you’ll see the activity get uploaded and parsed by Strava’s servers. If it’s a large file, it may take some time.

That’s it!  Your precious stats are back.

Note to Strava developers: you can probably encrypt these files rather than store them in plaintext.  I’m not saying I would ever mess with the GPS coordinates or timestamps to hack my way to King of the Mountain status or anything, but others might. 😉

Hooray! I have a reason to ride again!

 

Week 2 of biglawrefuge, what have we learned?

Week 2 Stats

Two weeks have now passed since the launch of biglawrefuge. It’s been really hectic trying to market the website through various channels. Here are a few of the methods I’ve tried, along with how effective they’ve been:

1.  Emailing law journals/tabloids (e.g. Abovethelaw)/legal blogs

Effectiveness: Low

Description: I’ve tried emailing various online content providers like Abovethelaw, the National Law Journal, the WSJ Law Blog, and other legal bloggers. So far, I’ve heard a “we’ll get back to you if we’re interested” from ATL, nothing from the NLJ or WSJ, and only one response from a legal blogger. I’m not sure how to really grab their attention. Maybe biglawrefuge needs more traffic before it’s worthy of national attention.

2.  Direct referrals (e.g. posting in http://www.top-law-schools.com or http://www.jdunderground.com)

Effectiveness: High

Description: I started off by announcing biglawrefuge on TLS. So far the community has been awesome: super supportive (mostly), and I’ve gotten both great feedback and really positive responses. In many ways, that community is exactly the one I’m marketing to. Biglaw firms hire disproportionately from the top law schools, and TLS is where the largest concentration of students from those schools gathers. Jdunderground is a MUCH smaller community. Using Google Analytics as a guide, TLS drives 100x more traffic.

3. Social Networking (e.g. reddit posts, facebook announcements, tweets)

Effectiveness: Medium

Description: Every time I have something worth sharing, I’ve posted it to http://www.reddit.com/r/lawschool. So far, I’ve made three self-referencing posts, which some users have not been too happy about. I’m not sure why it’s that big of a deal; they’re not duplicative in any sense (except maybe in that the articles are hosted on the same website). In any case, the reddit community is large, and the fact that everyone sees what’s on the front page makes it in many ways more effective than TLS. The last two times I’ve posted an article, my average views have tripled or quadrupled. Facebook, on the other hand, has been disappointing. Most likely my friends are tired of hearing about biglawrefuge, so it doesn’t drive many clicks. Twitter has been useless, but that’s mainly because I have so few followers.

4.  Direct emailing (e.g. weekly emailing of users)

Effectiveness: Medium

Description: Each week so far, I’ve sent an email to all the users who are signed up for biglawrefuge, and as I send it, I watch Google Analytics to see how many users come to the site. It definitely brings people back. I’ve heard from others that email campaigns are the way to go, but a fundamental problem with emailing users is getting the users in the first place. I understand that it’s a way to keep the momentum going, but it obviously doesn’t help get the ball rolling.

Surprises, so far …. 

1. Content rules

Last week (or was it the first week? I don’t remember), I added the ability to post articles to biglawrefuge. So far, I’ve kept this functionality locked down, but I just opened it up so that select users can contribute articles, subject to admin approval. What’s been extremely surprising is how popular the few articles I’ve written have been:

http://www.biglawrefuge.com/articles/2-a-day-in-the-life-of-a-biglaw-attorney

http://www.biglawrefuge.com/articles/3-how-i-left-biglaw-part-i

When I post them to reddit or TLS, they generate an enormous (for me) amount of traffic. I know the articles are popular. There’s a bad comment here and there, but by and large people have said that they really like the content. I think part of the reason for this reaction is my approach to writing the articles. I’ve been 100% transparent when writing them, not hiding the ball, even at the risk of tarnishing my own reputation. That doesn’t matter. I want to bring greater transparency to the legal profession, so what better way to do so than to lead by example?

2. Law students are shy

Despite having close to 600 users now, probably 300–400 of whom are going through OCI, there have only been 159 job postings. I don’t know why law students are so secretive about this information. I guarantee that very few people care one way or another. So despite the thousands of views of the job postings and law firm reviews, law students are reluctant to contribute themselves. I may eventually introduce some gating to encourage more contribution, but I won’t do that at first. C’mon people, contribute! I built this website for you.

3. Marketing is hard, as usual

It’s a constant battle to market the website. I feel like my welcome is already wearing thin at TLS and reddit/r/lawschool. I’m not sure what other channels there are besides contacting law schools’ career services offices directly, and that’s uncertain, because I don’t know what the reception will be like. On the days I publish articles, traffic jumps sharply, but of course, it’s just a temporary spike. Soon after, traffic mostly goes back to normal. Hopefully, by the time I run out of content, the community will be mostly self-sustaining.

4.  Be honest with your community, and they’ll treat you well

I’ve been completely honest with the communities I’ve posted to. I don’t plan on hiding the ball from them or suddenly changing what I’m doing. I’ve been honest about the struggles of running biglawrefuge myself, dealing with unfriendly folks, and getting users to contribute. Thank you to ALL of you who have reached out to encourage me in the process. It means a ton, and it motivates me to keep writing and improving biglawrefuge for you guys.

How I left biglaw (Part III)

… Dejected, I made the long trek back to my office, where I closed my door and contemplated my future as a lawyer. I looked at the spreadsheet I had made, counting down the months until my net worth was $0. Still over two years away. I knew I wouldn’t make it. I abandoned my plan to stay in law and began to draft a new resume…

This is the final part of my story, and it follows the trek from abandoning my plan to go in-house to the decision to leave the law entirely. On Above the Law, you often come across stories of biglaw associates who burn out and leave the law to do something else entirely. For example, there’s the Lego guy and the cupcake girl. If you’ve been reading along, you’ll know my story isn’t as glamorous as theirs. It’s just an ordinary tale of one biglaw associate who, in desperation, decided to leave.

Part II of this post: How I left biglaw (Part II)

Part I of this post: How I left biglaw (Part I)

———

It turns out that no one outside the legal community cares that you work for a V5 law firm. Lawyers, however, and law students in particular (myself included), are obsessed with prestige: “Where do you work?”, “How much were your bonuses?”, “Where’d you go to law school?”, “Where’d you clerk?”, “What’s your class rank?” While in law school, I adopted this attitude, and it heavily influenced my decisions when applying to outside jobs. After abandoning my plan to go in-house, I was left with the alternative of reapplying for engineering positions. I had never stopped considering going back to engineering (I submitted my first application during Thanksgiving of the previous year), though I thought it would be more rational to try to get another job in a legal capacity first. But after a string of failures, I reconsidered my reasons for wanting to stay in the law. In retrospect, they were very similar to the reasons I went to law school in the first place. I wanted to keep telling others that I was a lawyer, I wanted to make my parents proud, and I wanted the financial security and social stature that came with being an attorney. Compounding that was the harrowing sunk cost effect: hundreds of thousands of dollars and years of my life devoted to law, plus the mental effect of giving up and of facing my friends and family afterward. Conspicuously absent from my reasons to stay, however, was any desire to actually practice law. Law school had been an isolated experience, free from the pressures of looming bills, demanding partners and a professional career.

By March, I had fully turned my attention from going in-house to going back to engineering.   When I began applying for software engineering jobs, my resume was still structured as a legal resume.  At the top, I listed my accolades: law school, legal journals, published papers, leadership experience, law firm.  Towards the bottom, I broke out my work experience into two sections, “Legal Experience” and “Software Engineering Experience.”  With each application submitted, I attached my cover letter, which I refined multiple times over the next several months of applications:

Dear [Company X],

I am a software engineer and attorney who has been working on coding projects for the [Y] since 2011 and was previously employed with [Z] in 2009 as a developer for our web portal on the __ team. I spent the last three years furthering my studies through attending law school and received my Juris Doctor degree from Stanford Law School in 2012. Since graduating and becoming an attorney, I have come to realize that my passion remains in software engineering. Though studying law is intellectually demanding and stimulating, I find software engineering to be a better outlet for my career goals. In short, I want to create products that help others and software engineering gives me that opportunity.

To that end, while I was in law school, I kept up with my software fundamentals through coding projects on the side. I am eager to more deeply engage in my software engineering career again, and I hope to be able to continue learning and doing so with [Company X]. Thank you for considering my application.

Sincerely,

Codeandcodes

I spent the next two months applying without any success. At first, I targeted only well-known companies, thinking that a bigger company would have the resources and financial security to take a chance on an ex-software engineer to do their coding. But application after application was rejected or ignored. By this point, I had made my desire to leave biglaw public among many of my non-lawyer friends, many of whom worked in software engineering. Some of them, to whom I’m still grateful, put their faith in me and referred me to positions at their companies. I found, however, that even with their endorsement, I couldn’t get past the resume screen. During this time, I kept an ace up my sleeve. I had spent several years in the software industry before law school, and one of my former companies had an office in my town. My old boss still worked there, and I had left on good terms (for law school). I was fairly confident that if I reached out to him, he would be able to get me through the initial screen. However, I waffled on whether to reach out to him again. Part of me knew that by doing so, I had to commit to actually leaving the law, or else burn bridges and perhaps my best hope of getting out. It felt final, and I kept asking myself if I was really prepared to leave everything I had spent the last four years working for. But after all my mental debate, I eventually decided to make the call. I was going to do this.

Once I had contacted my old boss, the mental burden of leaving was lifted. I decided to widen my net. I redoubled my efforts, and by this point, 100% of my free time was devoted to applying to jobs or preparing for interviews. I would come home weary from work each day, spend an hour or two looking at relevant job postings on any site I could find, and then spend a few hours brushing up on algorithms and data structures and catching up on the last several years of advancements in software. It was grueling. Balancing individual study with a biglaw job is obviously not easy, but the allure of freedom kept my morale up. At first, the challenge seemed insurmountable. With time, however, my efforts began paying off. A few weeks after I started applying to startups, I began hearing back from recruiters. Some were willing to talk with me; many others were simply curious about my background. I still had many rejections. However, even getting the opportunity to speak with a recruiter was a step in the right direction. I even had a few phone screens early on (all of which I failed miserably). The software world had changed dramatically in the four years I had been out of the game. Though I had kept up with programming on the side, I was woefully unprepared to jump straight back in.

Then one day, which I consider the turning point of my application process, a contract recruiter contacted me about one of the many applications I had submitted. She saw my background, was intrigued, and decided to submit my application to her client because she thought I was a good fit (that client eventually rejected me without an interview). However, the most valuable thing she did for me was suggest that I remove references to my legal accolades from my resume, something I had never considered. The idea was foreign to me. After all, those accolades are how lawyers are distinguished. I was skeptical about this advice, but since I wasn’t having much luck, I prepared a second resume (I still have it; I called it [codeandcodes] SE.pdf). I widened my net still further and began to apply to every job I could find, some with my SE.pdf resume and others with my hybrid law/engineering resume. At the peak of my application cycle, I was applying to thirty jobs a day. And I started getting hits back. Previously, with only my hybrid resume, I had a response rate from recruiters of about 5%: for every 20 resumes and cover letters I sent, I would receive about one response. Out of all of those responses, I received only one phone screen. With my new resume, which left off every tie to law (except my law school), my response rate from recruiters jumped to 20%. And because my resume made no reference to the law firm where I was currently employed, these recruiters viewed me without bias (except maybe as an unemployed software engineer with a very large gap on my resume).

Meanwhile, my late-night study sessions were also paying dividends. The concepts I had used every day as a software engineer started coming back to me, and I grew sharper and more familiar with the current technical buzzwords. I also bought several software engineering interview books and attempted each problem, no matter how long it took me. Slowly, I started getting more phone screens, and they started to go better. I was still far from a callback, but my efforts showed promise. This process continued for months. By July, I had applied to hundreds of jobs and was speaking with recruiters regularly. Finally, the day came when one of my phone screens went well enough that the interviewer wanted to bring me in for an onsite interview. The CEO of this startup told me that he would have his Director of Engineering contact me first to verify the interviewer’s feedback. Long story short, after a conversation with the Director of Engineering, which fortunately was less technical in nature, the CEO and the director agreed to bring me in. The director gave me a homework assignment before my onsite interview, which I still have to this day. He was experimenting with Scala (a functional programming language that runs on the JVM) and wanted me to research the pros and cons of Scala versus Java. I had never heard of Scala or functional programming before, but I attacked this assignment with incredible vigor. Before my onsite interview, I read dozens of articles on Scala, Java and functional programming languages. I put all my notes into a digestible chart in Microsoft OneNote, ironically my favorite note-taking application in law school. When the day of the onsite arrived, I was well prepared.

The rest is mostly history. Once I finished the onsite interview, I had a strong suspicion that an offer was coming. But when it finally came, I didn’t celebrate or rejoice. It wasn’t the compensation or the company. Again, it came down to the decision to leave. Leaving the law was no longer just a fantasy; it came down to the simple choice I had debated all this time. All my work in the past half decade, from studying for the LSAT, to applying to law school, to my 1L year, my hours at the law firm, and passing the bar, was about to become my past. All of my fears were coming to a head. I wasn’t sure whether I was making the biggest mistake of my life by leaving the law behind, and once again the fears and doubts that I was just being immature about my job situation resurfaced. I began reaching out to everyone I knew (and several people I didn’t know) who had left biglaw to do something else. Over the next two weeks, before accepting, I sought the advice and counsel of many of these people. All of them told me to take the leap of faith and leave. In the end, their reassurances were unnecessary, because I already knew what I wanted. I finally realized that my life was being spent doing something I didn’t want to do. I called up the recruiter and accepted.

Aftermath:

After accepting the position at the startup, I told my partner I was leaving. He was respectful and supportive and wished me well. It was done. That moment, the moment I gave notice, remains one of the happiest of my life. It’s hard to describe the relief, or joy, or whatever feelings accompanied that decision. It’s been two years since I departed, and I haven’t regretted my decision for a day.

48 hours … checking in.

So it’s been roughly 48 hours since the launch of biglawrefuge.  Here are some stats:  Ready to be blown away?

Stats

  • 130 total sign ups
  • 3 reviews! (one is mine)
  • 15 job listings (13 are mine)
This is me.

So … not very impressive at all so far. That’s ok. It’s still early in the launch, so there’s definitely room to grow. Also, OCI season hasn’t kicked into full swing yet for law students, and that’s when I expect they’d find the site most useful. However, it does raise the question, “How does one market a new webapp?” Admittedly, that’s something I don’t know the answer to.

What me, market?

My plan was to target legal forums (the top being http://www.top-law-schools.com), and actually it’s the only place I’ve announced biglawrefuge.  I considered also reaching out to various career services departments at law schools around the country, but don’t know whether that’s a good idea or a bad idea.  On the one hand, they might be friendly and circulate the website to their students, seeing how it’s a valuable resource for them.  On the other (more likely) hand, I think they would outright reject the idea for being in “competition with them” or bringing their clandestine statistics to light, or (most likely) they would just ignore me completely.

I’ve also thought about reaching out to some old law professors at SLS to see what they thought.  Actually, while their thoughts would be useful, it would be more useful for them to just tell their students about the site.  I also don’t want it to feel like I’m attacking the law school or any law school in particular.  That’s definitely not my intention.  My intention is to let the stats speak for themselves, and to let law students and lawyers make the judgments.  After all, biglawrefuge is only the platform.

So in the end, I’m pretty much reliant on word of mouth, for now at least. Hopefully it starts picking up some steam and law students start seeing the value. In the meantime, I’ll stay positive and keep adding features. I just added the ability to post a company review anonymously. Cool stuff!

Joining the flock


I’ve decided to join the flock. That probably doesn’t mean anything to a whole lot of people except perhaps those in the engineering community, but “joining the flock” is Twitter parlance for joining the company. A little background about me for new readers: back in 2013, I took a leap of faith and left my life as a lawyer and a highly profitable law firm to go back into software engineering at a small Silicon Valley startup (see here). By the time I left, my life at the law firm had, ironically, increased my tolerance for risk because of how unenjoyable the work was. I honestly felt that any non-law opportunity, even one with dubious upside, would be better than grinding it out for several more years only to land an in-house counsel gig. In retrospect, I didn’t even want to be in-house counsel; I think I was just looking for meaningful work and better work-life balance without being electronically tethered to my phone.

In the months after leaving biglaw, I drank from the proverbial firehose. It’s hard to overstate how much the software engineering community and its tools had changed in the past few years. I had missed a lot. While it felt good to be programming again, I constantly battled feeling overwhelmed by relearning old coding techniques, digesting new paradigms and finding my place in the software community again. I remember picking up JavaScript for the first time in years and thinking how foreign everything looked. Playing around in jQuery, seeing all the dollar signs, variable names without associated types, and nested callbacks and closures made my head spin. However, after I overcame that initial shock, things quickly started becoming familiar again. Coding eight to ten hours a day fast-tracked my improvement as a software engineer in ways that coding on my own after hours at the law firm couldn’t.

Not long after I joined the startup, I began looking at other opportunities. There was no real impetus for my search; it was exploratory in nature. There was nothing I particularly disliked about the startup, but I knew it wasn’t a place where I could really thrive. It was a small shop weighed down by past encumbrances it couldn’t shake, and it simply wasn’t in a position to grow. Eventually, I landed at a large hardware/software company in the networking market, where I set to work on more interesting and difficult engineering problems. The work I was doing at this company is essentially what I’ve been blogging about for the past several months. The software issues I dealt with were engaging. I built an analytics stack from the ground up, spent a good bit of time figuring out how to scale the analytics processing and increase node.js and MongoDB performance, and had fun developing software frameworks and architecture to improve my software’s maintainability and scalability. In short, it was a great experience, and I was thriving at the company.

So why did I leave? There are a few reasons. First, Sallie Mae. As all you lawyers know, law school is insanely expensive. Second, my company only deals in the enterprise space, which isn’t very exciting. I used to joke with my coworker that on the day we had a single user of the software suite, we would have cause for celebration. Third, this past year and a half has been … awkward for me. I’ve gone from being deeply invested in the law to being deeply invested in software engineering. I’ve grown faster in software engineering ability and competence this past year than at any other point in my career (and maybe life), partly because of my dedication to becoming a better engineer, but largely because I was simply remembering old techniques, design patterns, abstract data types, and software methodology that I had forgotten. Spending most of my free time coding has also certainly propelled me much further than my day job alone would have. The crux is that it’s been a struggle to find where I belong in the community of software engineers. My resume remains a source of questions and curiosity for recruiters and interviewers alike, and it probably will for the long run. Thus, at least in this hot economy for software engineers, it’s been difficult to estimate my market value when dealing with so many variables. I don’t blame prospective employers for having questions when reviewing a candidate like me. As I’ve come back up to speed, though, my market value has risen, and I’ve been hungry for new challenges.

I know that I’ll get left behind again if I don’t continue pushing myself in my free time. While it’s my belief that there will always be a great market for great software engineers, I recognize that dropping from great to good to mediocre can happen quickly through complacency. And apparently, the reverse is true through hard work and natural ability. In essence, software engineering is the path I’ve chosen and I’m pot-committed. I threw away a promising career in law that I didn’t love to do what I do love now, and I intend to see it through to my maximum ability. And that means I want to be at a company where I can push and be pushed in engineering quality and knowledge, where my weekend work and exploration helps me develop my edge and promotes my future, and where excellence is recognized by the company. I think I’ve found it with Twitter. Twitter’s engineering team is outstanding; their employees are young, hungry and eager to learn and do great things. Twitter as a platform has also not peaked (in my opinion), so there are great opportunities for growth within the company. Although Twitter is much bigger than some of the other startups that I’ve considered, there is still the opportunity to dive in and do impactful things at the company. I’m definitely looking forward to the first hack week.

Lastly, now that I’m fully integrated back into the software community, I have perhaps a greater penchant for financial conservatism. I’m getting older, and my debt hangs like an albatross around my neck, so it’ll be nice to have some financial security for a change. I should also say that many startups in the valley have excellent engineering teams and hiring standards and are solving interesting problems. Yet many of those startups don’t enjoy public recognition of that fact, so recruiters can’t easily gauge the caliber of their engineers. At a large company like Twitter, Google or Facebook, software engineers enjoy the reputation of having survived the interview gauntlet; in other words, they are generally of high quality and known as such. While I still want to do things on my own, such as exploring the intersections of law and software engineering, I see no reason why that goal should be mutually exclusive with working at a large software company. I’ll continue doing what I’ve always done: spending my time improving my skills, building tools that may or may not have merit, and always and ceaselessly bettering myself.

MongoDB performance enhancements and tweaks

In my work building a real-time analytics engine, I’ve formed some opinions on how well mongoDB is suited for scalability and how to tweak queries and my node.js code to extract some extra performance. Here are some of my findings, from several standpoints (optimizations to the mongoose driver for Node, mongoDB itself, and node.js itself).

Mongoose Driver
1. Query optimization

A. Instead of using Model.findOne, or Model.find and then iterating, try Model.find().limit(). I saw a severalfold speedup doing this, and it’s discussed in several other places online.

B. If your application server has CPU to spare, you can return a bigger chunk of documents and process them on your server instead, freeing up some cycles for MongoDB.

Improvement: Large (mongotop showed read peaks of 1500ms in one collection; afterwards, these dropped to 200ms)

Example:

//Before:
Collection.findOne(query3, function(err, doc) {
  //Returns 1 mongoose document
});

//After
Collection.find(query3).limit(1).exec(function(err, docs) {
  //docs is an array of mongoose documents; the match, if any, is docs[0]
});

See these links for some more information: Checking if a document exists – MongoDB slow findOne vs find

2. Use lean()
According to the docs, if you query a collection with lean(), plain javascript objects are returned and not mongoose.Document. I’ve found that in many instances, where I was just reading the data and presenting it to the user via REST or a visual interface, there was no need for the mongoose document because there was no manipulation after the read query.

Additionally, for relational data, say you have a Schema that contains an array of refs (e.g. friends: [{ type: mongoose.Schema.Types.ObjectId, ref: 'User' }]) and you only need to return the first N friends to the user. With lean(), you can trim the friends array on the returned javascript objects first and only then populate, instead of populating the entire array of friends.

Improvement: Large (depending on how much data is returned)

Example:

//Before:
User.find(query, function(err, users) {
  //users are mongoose Documents, so you can't freely add fields outside the Schema (unless the path is Mixed)
  var options = {
     path: 'friends',
     model: 'User',
     select: 'first last'
  };
  User.populate(users, options, function(err, populated) {
     //populates ALL friends in each user's array
  });
});

//After
User.find(query).lean().exec(function(err, users) {
  //users are plain javascript objects, so you can reshape them to fit what you need to return
  users.forEach(function(user) {
     user.friends = user.friends.slice(0, 10);  //keep only the first ten friend ids, or whatever
  });
  var options = {
     path: 'friends',
     model: 'User',
     select: 'first last'
  };
  User.populate(users, options, function(err, populated) {
     //Model.populate now fills in a potentially much smaller array
  });
});
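To make the payoff concrete, here is a minimal in-memory sketch in plain JavaScript (no Mongoose, no database). The `populate` function is a hypothetical stand-in for Model.populate that resolves ids against a Map and counts lookups; it models only the idea of populating a trimmed array, not how Mongoose actually issues its queries.

```javascript
// Hypothetical stand-in for Model.populate: replaces each friend id with
// { first, last } from an in-memory map and counts the lookups it performs.
function populate(users, userById) {
  let lookups = 0;
  const populated = users.map(function (user) {
    const friends = user.friends.map(function (id) {
      lookups += 1;
      const friend = userById.get(id);
      return friend ? { first: friend.first, last: friend.last } : null;
    });
    return Object.assign({}, user, { friends: friends });
  });
  return { populated: populated, lookups: lookups };
}

// With lean() the results are plain objects, so the array can be trimmed
// before populating. Copies are made so the input objects are left untouched.
function trimThenPopulate(users, userById, limit) {
  const trimmed = users.map(function (user) {
    return Object.assign({}, user, { friends: user.friends.slice(0, limit) });
  });
  return populate(trimmed, userById);
}

// Example data: one user with 50 friends (ids 0..49).
const userById = new Map();
for (let i = 0; i < 50; i++) {
  userById.set(i, { first: "First" + i, last: "Last" + i });
}
const users = [{ name: "alice", friends: Array.from({ length: 50 }, (_, i) => i) }];

const full = populate(users, userById);
const trimmed = trimThenPopulate(users, userById, 10);
console.log(full.lookups, trimmed.lookups); // 50 10
```

In practice Mongoose collects the ids and queries for them rather than doing one lookup per id, so the saving shows up as a smaller query and less data transferred, but the proportion works the same way.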

Results (example on my node.js server, measured with mongotop; load in ms):

Seconds    No lean()    lean()
5          561          524
10         371          303
15         310          295
20         573          563
25         292          291
30         302          291
35         544          520
40         316          307
45         289          286
50         537          503
Average    409.5        388.3

% improvement: (409.5 - 388.3) / 409.5 ≈ 0.05177 = 5.177%
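The average and percentage in the table can be reproduced from the raw samples; this short check just recomputes them.

```javascript
// Per-5-second mongotop read load (ms) from the table above.
const noLean = [561, 371, 310, 573, 292, 302, 544, 316, 289, 537];
const withLean = [524, 303, 295, 563, 291, 291, 520, 307, 286, 503];

function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

const before = mean(noLean);   // 409.5
const after = mean(withLean);  // 388.3
const improvement = (before - after) / before;
console.log((improvement * 100).toFixed(3) + "%"); // 5.177%
```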

3. Keep mongoDB “warm”.
MongoDB keeps recently used data and indexes in memory quite effectively. You can see this by running a query several times in quick succession: in my experience, the query time decreases, sometimes dramatically. For instance, a query can go from 50ms to 10ms after running twice. We have one collection that is constantly queried, about 500 times per second for reads and 500 times per second for writes. Keeping this collection “warm”, i.e. periodically running the query that will be called at some point in the future, can help keep the call responsive when Mongo starts to slow down.

Improvement: Untested
Example:

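//Re-runs the hot query every 500ms so the data it touches stays in memory
function keepwarm() {
   setTimeout(function() {
      //a mongoose query only runs once exec() (or a callback) is supplied
      User.find(query).exec(function(err) {
         if (err) { console.error('[keepwarm]', err); }
      });
      keepwarm();
   }, 500);
}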

Mongo Native
1. Compound indexing
For heavy-duty queries that run often, I created compound indices covering all the fields that made up the query. Intuitively, it didn’t jump out at me that indexing by timestamp, for instance, would make a difference, but it does. According to the mongoDB documentation, if your query sorts on timestamp (which ours did), an index that includes the timestamp can also be used for the sort, so MongoDB doesn’t have to sort the results in memory.

Improvement: Large (depending on how many documents your collection holds and how efficiently mongoDB can make use of your indices)
Example:

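//in mongo shell (createIndex; older shells used ensureIndex)
db.collection.createIndex({'timestamp': 1, 'user': 1});

//in the mongoose schema definition (index() is defined on the Schema, not the Model)
ModelSchema.index({'timestamp': 1, 'user': 1});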

Alternative? Aggregating documents into larger documents, such as time slices. Intuitively, that would mean queries don’t have to traverse as large an index to reach the targeted documents. You may ask what the difference is between creating a compound index and rolling documents up into aggregates such as a day’s or an hour’s slice. Here are a few possibilities:

  A. MongoDB tries to match queries with indices or compound indices, but there’s no guarantee that this match will occur. Supposedly the algorithm used to pick an index is pretty good, but I question how good it is if, for instance, your query includes an additional parameter to search for. If MongoDB doesn’t see all the parameters in the index, will it still know to use a compound index or a combination of compound indices?
  B. Using aggregates could actually be slower if it requires traversing the document for the relevant data (which might not afford fast reads).
  C. If writes to an aggregate are very heavy (e.g. your aggregate document is too large in scope), constantly reading and rewriting the document may cause delays because mongoDB needs to lock the collection/document.
  D. Aggregates could make indexing more difficult.
  E. Aggregates could make aggregation/mapreduce more difficult because a document no longer represents a single instance of an “event” (or is not granular enough).
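For comparison, here is a minimal sketch of the time-slice alternative, assuming hourly buckets keyed by user. The field names (`user`, `timestamp`, `bucket`, `count`, `events`, `bytes`) are hypothetical, and the function only shows how individual event documents would roll up into one document per user per hour in application code; it is not a tested schema.

```javascript
// Truncate a timestamp to the start of its UTC hour; this is the bucket key.
function hourBucket(date) {
  const d = new Date(date.getTime());
  d.setUTCMinutes(0, 0, 0);
  return d;
}

// Roll individual event documents up into one document per user per hour.
// All field names here are hypothetical.
function toHourlyBuckets(events) {
  const buckets = new Map();
  events.forEach(function (event) {
    const bucket = hourBucket(event.timestamp);
    const key = event.user + "|" + bucket.toISOString();
    if (!buckets.has(key)) {
      buckets.set(key, { user: event.user, bucket: bucket, count: 0, events: [] });
    }
    const doc = buckets.get(key);
    doc.count += 1;
    doc.events.push({ timestamp: event.timestamp, bytes: event.bytes });
  });
  return Array.from(buckets.values());
}

const events = [
  { user: 1, timestamp: new Date("2014-07-31T17:02:06Z"), bytes: 450 },
  { user: 1, timestamp: new Date("2014-07-31T17:45:00Z"), bytes: 300 },
  { user: 1, timestamp: new Date("2014-07-31T18:01:00Z"), bytes: 200 },
];
const docs = toHourlyBuckets(events);
console.log(docs.length); // 2 (the 17:00 and 18:00 buckets)
```

Note that the `events` array in each bucket grows with every write, which is exactly the heavy-write concern in point C.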

2. Use Mongotop to determine where your bottlenecks are.
Mongotop shows each collection in your database and the amount of time spent on reads and writes. By default, it updates every second. Bad things happen when the total query time exceeds a second. For instance, in Node, pending requests and callbacks begin to pile up because mongo is taking too long to respond.

Example:

//example output
                            ns       total        read       write		2014-07-31T17:02:06
              mean-dev.packets       282ms       282ms         0ms
             mean-dev.sessions         0ms         0ms         0ms
               mean-dev.series         0ms         0ms         0ms
              mean-dev.reduces         0ms         0ms         0ms
             mean-dev.projects         0ms         0ms         0ms
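Reading mongotop by eye works, but a small script can flag the collections that cross a threshold. This sketch assumes the plain-text column layout shown above (ns, total, read, write, with times printed like `282ms`); other mongotop versions may format their output differently, so treat the parsing as an assumption.

```javascript
// Parse mongotop's plain-text output into { ns, total, read, write } rows.
// Assumes the layout above: data rows have exactly four fields, times like "282ms".
function parseMongotop(output) {
  const rows = [];
  output.split("\n").forEach(function (line) {
    const parts = line.trim().split(/\s+/);
    if (parts.length === 4 && /^\d+ms$/.test(parts[1])) {
      rows.push({
        ns: parts[0],
        total: parseInt(parts[1], 10),
        read: parseInt(parts[2], 10),
        write: parseInt(parts[3], 10),
      });
    }
  });
  return rows;
}

// Collections whose total time in one sample meets or exceeds the threshold.
function hotCollections(rows, thresholdMs) {
  return rows.filter(function (row) { return row.total >= thresholdMs; });
}

const sample = [
  "                            ns       total        read       write    2014-07-31T17:02:06",
  "              mean-dev.packets       282ms       282ms         0ms",
  "             mean-dev.sessions         0ms         0ms         0ms",
  "               mean-dev.series         0ms         0ms         0ms",
  "              mean-dev.reduces         0ms         0ms         0ms",
  "             mean-dev.projects         0ms         0ms         0ms",
].join("\n");
const rows = parseMongotop(sample);
console.log(hotCollections(rows, 1000).length); // 0
```

With the sample above nothing crosses the one-second mark; lowering the threshold to 250ms would flag `mean-dev.packets`.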

3. Use explain()… sparingly
I’ve found that explain is useful initially, because it will show you the number of scanned documents to reach the result of the query. However, when trying to optimize queries further, I found that it was not that useful. If I’ve already created my compound indices and MongoDB is using them, how can I extract further performance using explain() when explain() may already show a 0 – 1ms duration?

Example:

//in mongo shell
db.collection.find({
        $and: [{
            'from.ID': 956481854
        }, {
            'to.ID': 1038472857
        }, {
            'metadata.searchable': false
        }, {
            'to.IP_ADDRESS': '127.0.0.1'
        }, {
            'from.timestamp': {
                $lt: new Date(ISODate().getTime() - 1000 * 60 * 18)
            }
        }]
    }).explain()

4. For fast inserts for a collection of limited size, consider using a capped collection.

A capped collection in mongoDB is a fixed-size collection that behaves like a first-in, first-out queue: once it reaches its size limit, the oldest documents are overwritten first. According to the mongoDB docs, capped collections maintain insertion order, which makes them a natural fit for time series. You just have to specify the maximum size of the collection in bytes. I based mine on the average document size reported by db.collection.stats() (avgObjSize), which showed each record was about 450 bytes.

To convert an existing collection into a capped one, you can run this in the mongoDB shell:

db.runCommand({"convertToCapped": "mycoll", size: 100000}); //size in bytes

See the mongoDB docs on capped collections for details.
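One thing worth checking in the command above: with records averaging about 450 bytes, `size: 100000` holds only roughly 222 documents. The sketch below computes the size from the number of documents you want to keep, and includes a hypothetical in-memory `CappedBuffer` class that mimics capped-collection behavior (insertion order kept, oldest evicted first). It illustrates the semantics only; it is not how mongoDB stores data.

```javascript
// Bytes to pass as `size`, given the average document size
// (e.g. avgObjSize from db.collection.stats()) and how many documents to keep.
function cappedSizeBytes(avgObjSize, docsToKeep) {
  return Math.ceil(avgObjSize * docsToKeep);
}

// Roughly how many documents a capped collection of `sizeBytes` will retain.
function approxDocsRetained(sizeBytes, avgObjSize) {
  return Math.floor(sizeBytes / avgObjSize);
}

// In-memory illustration: insertion order is preserved and, once full,
// the oldest entry is evicted first.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    if (this.docs.length === this.maxDocs) {
      this.docs.shift(); // evict the oldest document
    }
    this.docs.push(doc);
  }
  toArray() {
    return this.docs.slice(); // oldest to newest
  }
}

console.log(approxDocsRetained(100000, 450)); // 222
const buf = new CappedBuffer(3);
[1, 2, 3, 4, 5].forEach(function (n) { buf.insert(n); });
console.log(buf.toArray()); // [ 3, 4, 5 ]
```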

Node.js
1. Implement pacing for large updates.
I’ve found that when a periodic update touches a large subset of a collection while many other updates are going on, the large update can cause work in Node to back up as mongoDB tries to keep up. By throttling the number of records updated per pass based on the average time per update, I could adapt to the server’s current load. The philosophy is that if node/mongoDB have extra cycles, we can dial up the pace of backfilling/updates a bit, whereas when node/mongoDB are overloaded, we can back off.

Example:


//Runs periodically. updateStatisticsPace, updateStatisticsPace_min, MAX_PACE, _count and callback come from the enclosing scope
    _aggregator.updateStatistics(undefined, updateStatisticsPace, function(result) {
          console.log('[AGGREGATOR] updateStatistics() complete.  Result: [Num Updated: %d, Duration: %d, Average (ms) per update: %d]', result.updated, result.duration, result.average);
          if (result.average < 5) {  //< 5ms per update: speed up by 10%
            updateStatisticsPace = Math.min(MAX_PACE, Math.floor(updateStatisticsPace * 1.1));    //MAX_PACE = all records updated
          } else if (result.average < 10) { //5ms <= average < 10ms: maintain pace
            updateStatisticsPace = Math.min(MAX_PACE, updateStatisticsPace);
          } else {  //>= 10ms: cut the pace to about 2/3, but never below updateStatisticsPace_min
            updateStatisticsPace = Math.min(MAX_PACE, Math.max(updateStatisticsPace_min, Math.floor(updateStatisticsPace * 0.66)));
          }

          if (MAX_PACE === updateStatisticsPace) { console.log('[AGGREGATOR] updateStatistics() - Max pace reached: ' + _count); }
          console.log('[AGGREGATOR] updateStatistics() Setting new pace: %d', updateStatisticsPace);
          callback(null, result);
    });
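The pacing rule is easier to reason about (and test) as a pure function. This is a minimal sketch that mirrors the thresholds in the snippet above; the name `nextPace` and its parameters are mine, not part of the aggregator.

```javascript
// Speed up ~10% when updates are fast (< 5ms), hold steady between 5ms and 10ms,
// and cut to ~2/3 (never below minPace) when slow. Never exceeds maxPace.
function nextPace(pace, averageMs, minPace, maxPace) {
  if (averageMs < 5) {
    return Math.min(maxPace, Math.floor(pace * 1.1));
  } else if (averageMs < 10) {
    return Math.min(maxPace, pace);
  }
  return Math.min(maxPace, Math.max(minPace, Math.floor(pace * 0.66)));
}

console.log(nextPace(100, 3, 10, 1000));  // 110
console.log(nextPace(100, 12, 10, 1000)); // 66
```

One consequence of flooring: below a pace of 10, a 10% increase rounds back down to the same value (9 × 1.1 = 9.9 floors to 9), so the pace can never climb out of single digits. Keeping the minimum pace at 10 or more, as the original comment suggests, avoids that stall.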

Custom tags directive in angular.js

To finish phase 1 of Dockumo development (letting users tag and categorize articles), I had to let users add custom tags to their articles. If an article is made public, its tags are indexed and searchable through the search interface. This lets other users sift through content by tag and could eventually give some good insight into which content is most popular.

Why I created it: I’ve used angular-tag before, but I didn’t like the way it hooked into the “Delete” key, which on my MacBook sometimes navigates to the previous page in history (like clicking the “Back” button). I also found its CSS a bit wonky and difficult to work with: when I clicked on the input box, the box would expand and didn’t play nicely with my CSS. I’ve grown more reluctant these days to use untested third-party libraries (even small ones). Sometimes they save me lots of time, but other times they only cause headaches. Delving into someone else’s source code, modifying their CSS, and figuring out how to mash my code and theirs together takes a lot of time, and sometimes it’s not until I’ve done all that that I realize the library doesn’t really do what I want. Hence the frustration.

What it does: My angular directive is very simple. It lets the user bind a variable to the directive through ngModel. The variable is an array of strings (a set). The directive renders a text input that lets the user enter a comma-separated list of tags. When the user clicks “Add”, the input is split on commas, each tag is trimmed, and its lowercase value is added to the bound set if it isn’t already there. It works similarly to WordPress’s tagging system. That’s it: no fuss, no muss.


Without further ado, here’s the source:

angular.module('mean.articles')

.directive("mTag", function() {
  return {
    restrict: 'E',
    controller: 'TagController',
    scope: {
      read: '=',  //true = read-only (only show the tags); falsy = also show the input for adding tags
      tags: '=ngModel'  //the ng-model binding (an array of tag strings)
    },
    templateUrl: 'public/system/views/directives/tag.html'
  };
});

Here’s the controller code:

'use strict';

angular.module('mean.system').controller('TagController', 
    ['$scope', function ($scope) {

    //only allows letters, commas and whitespace (no numbers or special chars); empty input is valid
    $scope.validate = function(string) {
        if (!string || string.length === 0) {
            return true;
        }
        return /^[A-Z,\s]+$/i.test(string);
    };

    //Adds the lowercased tag to the set if it isn't already present
    function addTag(string) {
        var tag = string.toLowerCase();
        if (!$scope.tags) {
            $scope.tags = [];  //the ng-model value may start out undefined
        }
        if ($scope.tags.indexOf(tag) === -1) {
            $scope.tags.push(tag);
        }
    }

    //When the user clicks "Add", all unique tags will be added
    $scope.appendTags = function(string) {
        if (string && $scope.validate(string)) {
            string.split(',').forEach(function(tag) {
                if (tag.trim().length > 0) {
                    addTag(tag.trim());
                }
            });
            $scope.temptags = "";
        }
    };

    //When the user wants to delete a tag
    $scope.deleteTag = function(tag) {
        if (tag && $scope.tags) {
            var idx = $scope.tags.indexOf(tag);
            if (idx > -1) {
                $scope.tags.splice(idx, 1);
            }
        }
    };

}]);
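The parsing inside appendTags and addTag can be pulled out into plain functions that are testable without Angular. A minimal sketch; `isValidTagInput` and `mergeTags` are hypothetical helper names, and unlike the controller they return a new array instead of mutating the bound one.

```javascript
// Same rule as the directive: letters, commas and whitespace only; empty is valid.
function isValidTagInput(input) {
  return !input || /^[a-z,\s]+$/i.test(input);
}

// Split on commas, trim, lowercase, drop empties, and merge into the existing
// set without duplicates. Invalid or empty input leaves the set unchanged.
function mergeTags(existing, input) {
  const result = existing.slice();
  if (!input || !isValidTagInput(input)) {
    return result;
  }
  input.split(",").forEach(function (raw) {
    const tag = raw.trim().toLowerCase();
    if (tag.length > 0 && result.indexOf(tag) === -1) {
      result.push(tag);
    }
  });
  return result;
}

console.log(mergeTags(["law"], "Node, MongoDB,,law , angular"));
// [ 'law', 'node', 'mongodb', 'angular' ]
```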

And lastly, the HTML template:

<div class="input-group" ng-show="!read">
    <input type="text" name="temptags" ng-model="temptags" class="form-control"  placeholder="Enter comma separated tags" ng-class="{'haserror':!validate(temptags)}" style="margin-bottom:0px"/>
    <span ng-show="!validate(temptags)" class="label label-danger">Please only use regular characters for tags (a-z)</span>
    <div class="input-group-btn" style="vertical-align:top">
        <button class="btn btn-inline" ng-click="appendTags(temptags)">Add</button>
    </div>
</div>

<div class="margintopten">
    <span class="label label-default normal tag marginrightten" ng-repeat="tag in tags"><a ng-click="deleteTag(tag)" ng-if="!read"><i class="fa fa-times-circle"></i></a>  {{tag}}</span>
</div>

Here’s a working jsfiddle.

Custom tag directive with error handling

As always, let me know comments or feedback.

Real Time Analytics Engine

My current project at work is to architect a solution for capturing network packet data sent over UDP, aggregating it, analyzing it and reporting it back to our end users. At least, that’s the vision. It’s a big undertaking, and there are several avenues of approach we can take. To that end, I’ve started prototyping a test harness to measure the read and write performance of various software at different levels of our software stack.

Our criteria are as follows (taken from a slide I created):

Our Criteria

Based on these four requirements pillars, I’ve narrowed down the platform choices to the following:

SERVER

node.js

Strengths

      • Single-threaded, event loop model
      • Callbacks executed asynchronously when a request completes, so it can handle thousands more concurrent requests than Tomcat
      • No locks; almost no function in Node directly performs blocking I/O
      • Great at handling lots of small dynamic requests
      • Logo is cool

Weaknesses

      • Taking advantage of multi-core CPUs
      • Starting processes via child_process.fork()
      • APIs still in development
      • Not so good at handling large buffers

vert.x

Strengths

        • Allows writing code as single-threaded (Uses Distributed Event Bus, distributed peer-to-peer messaging system)
        • Can write “verticles” in any language
        • Built on JVM and scales over multiple cores w/o needing to fork multiple servers
        • Can be embedded in existing Java apps (more easily)

Weaknesses

        • Independent project (very new), maybe not well supported?
        • Verticles allow blocking of the event loop, which will cause blocking in other verticles. Distribution of work product makes it more likely to happen.
        • Multiple languages = debugging hell?
        • Is there overhead in scaling over multiple nodes?

PERSISTENCE

For our data, we are considering several NoSQL databases. Data integrity is not that important for us, because our data is not make-or-break for a company. However, the database must be highly performant, handling upwards of 10k to 100k fixed-format writes per second but far fewer reads.

mongo

Architecture

        • Maps memory directly onto data files on disk (memory-mapped files)
        • B-tree indexing guarantees logarithmic performance

Data Storage

        • “Documents” akin to objects
        • Uses BSON (binary JSON)
        • Fields are key-value pairs
        • “Collections” akin to tables
        • No conversion necessary from object in data to object in programming language

Data-Replication

        • Mongo handles sharding for you
        • Uses a primary and secondary hierarchy (primary receives all writes)
        • Automatic failover

Strengths

        • Document BSON structure is very flexible, intuitive and appropriate for certain types of data
        • Easily query-able using mongo’s query language
        • Built-in MapReduce and aggregation
        • BSON maps easily onto JSON, makes it easy to consume in front end for charting/analytics/etc
        • Seems hip

Weaknesses

        • Questionable scalability and write performance for high volume bursts
        • Balanced B-Tree indices maybe not best choice for our type of data (more columnar/row based on timestamp)
        • NoSQL but our devs are familiar with SQL
        • High disk space/memory usage

cassandra
Architecture

        • Peer to peer distributed system, Cassandra partitions for you
        • No master node, so you can read-write anywhere

Data Storage

        • Row-oriented
        • Keyspace akin to database
        • Column family akin to table
        • Rows make up column families

Data Replication

        • Uses a “gossip” protocol
        • Data is written to a commit log, then to an in-memory table (memtable), then flushed to disk as a sorted-strings table (SSTable)
        • Customizable by user (allows tunable data redundancy)
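The commit log, memtable and SSTable path can be illustrated with a toy, single-node store. This is a sketch for intuition only: `ToyLsmStore` is a hypothetical class that ignores compaction, partitioning, replication and gossip, and it is not how Cassandra is implemented.

```javascript
// Binary search over an SSTable (an array of [key, value] sorted by key).
function findInSSTable(entries, key) {
  let lo = 0;
  let hi = entries.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const midKey = entries[mid][0];
    if (midKey === key) return entries[mid];
    if (midKey < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

class ToyLsmStore {
  constructor(memtableLimit) {
    this.memtableLimit = memtableLimit;
    this.commitLog = [];   // every write is appended here first
    this.memtable = new Map();
    this.sstables = [];    // immutable, key-sorted flushes, oldest first
  }

  write(key, value) {
    this.commitLog.push([key, value]);
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) {
      this.flush();
    }
  }

  // Write the memtable out as a sorted table; its log entries are no longer needed.
  flush() {
    const sorted = Array.from(this.memtable.entries()).sort(function (a, b) {
      return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;
    });
    this.sstables.push(sorted);
    this.memtable = new Map();
    this.commitLog = [];
  }

  // Newest data wins: memtable first, then SSTables from newest to oldest.
  read(key) {
    if (this.memtable.has(key)) {
      return this.memtable.get(key);
    }
    for (let i = this.sstables.length - 1; i >= 0; i--) {
      const entry = findInSSTable(this.sstables[i], key);
      if (entry) return entry[1];
    }
    return undefined;
  }
}

const store = new ToyLsmStore(2);
store.write("b", 1);
store.write("a", 2); // memtable full: flushed as [["a", 2], ["b", 1]]
store.write("b", 3); // the newer value lives in the memtable
console.log(store.read("b"), store.read("a")); // 3 2
```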

Strengths

        • Highly scalable; adding new nodes gives linear performance increases
        • No single point of failure
        • Read-write from anywhere
        • No need for caching software (handled by the database cluster)
        • Tunable data consistency (depends on use case, but can enforce transaction)
        • Flexible schema
        • Data compression built in (no perf penalty)

Weaknesses

        • Data modeling is difficult
        • Another query language to learn
        • Best stuff is used by Facebook, but perhaps not released to the public?

redis

TBD

FRONT-END/REST LAYER

For our REST layer/front-end, I’ve built apps in jQuery, Angular, PHP and JSP, and hands down my favorite is Angular. So that seems like the obvious choice here.

AngularJS

Strengths

      • No DOM Manipulation! Can’t say how valuable this is …
      • Built in testing framework
      • Intuitive organization of MVC architecture (directives, services, controllers, html bindings)
      • Built by Google, so trustworthy software

Weaknesses

      • Higher level than jQuery, so DOM manipulation is discouraged and more difficult to do
      • Fewer 3rd party packages released for angular
      • Who knows if it will be the winning javascript framework out there
      • Learning curve

Finally, for the REST API, I’ve also pretty much decided on express (if we go with node.js):

express

Strengths

      • Lightweight
      • Easy integration into node.js
      • Flexible routing and callback structure, authorization & middleware

Weaknesses

      • Yet another javascript package
      • Can’t think of any, really (compared to what we are using in our other app, Java Spring)

These are my thoughts so far. In the following posts, I’ll begin describing how I ended up prototyping our real time analytics engine, creating a test harness, and providing modularity so that we can make design decisions without too much downtime.

Mapping Google Datatable JSON format to Sencha GXT ListStores for charting

By now, if you’ve been keeping up with my blog, you’ll notice that we make extensive use of charting in our application. We decided to go with Sencha GXT charting because we were under time pressure to finish our product for a release. However, we weren’t so crunched for time that I decided to model the data in an unintelligent way. Since charts essentially graphically display tabular data, I wanted to ensure that if we were to move to a different graphing package (which we are, btw) in the future, our data was set up for easy consumption.

After tooling around a bit, I decided on the Google DataTable JSON format for the response data from queries made to our server. The Google DataTable format essentially mimics tabular data in JSON. It’s also easily consumable by Google or Angular charts; having used Angular charts in the past, I knew the format would map nicely to charts. However, the immediate problem still loomed: how to convert from the Google DataTable JSON format to ListStores that GXT charts can consume?

Solution: In the end, I wrote an adapter with specific implementations for consuming our backend charting data; for now, I needed a Sencha/GXT-specific one. Here’s an example of the data we receive from our REST query.

//Note: some data omitted for readability
{
    "offset": 0,
    "totalCount": 1,
    "items": [
        {
            "title": "Registration Summary",
            "data": {
                "cols": [
                    {
                        "id": "NotRegistered",
                        "label": "Not Registered",
                        "type": "number"
                    },
                    {
                        "id": "Registered",
                        "label": "Registered",
                        "type": "number"
                    }
                ],
                "rows": [
                    {
                        "l": "9608",
                        "c": [
                            {
                                "v": "5",
                                "f": null
                            },
                            {
                                "v": "4",
                                "f": null
                            }
                        ]
                    },
                    {
                        "l": "9641",
                        "c": [
                            {
                                "v": "2",
                                "f": null
                            },
                            {
                                "v": "5",
                                "f": null
                            }
                        ]
                    },
                    {
                        "l": "9650SIP",
                        "c": [
                            {
                                "v": "3",
                                "f": null
                            },
                            {
                                "v": "0",
                                "f": null
                            }
                        ]
                    }
                ]
            }
        }
    ]
}

And here’s an example of how it’s converted into a Sencha ListStore:

[
    {
        v1=4,
        label0=NotRegistered,
        v0=5,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9608,
        f0=null
    },
    {
        v1=5,
        label0=NotRegistered,
        v0=2,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9641,
        f0=null
    },
    {
        v1=0,
        label0=NotRegistered,
        v0=3,
        label1=Registered,
        id1=Registered,
        type1=number,
        id0=NotRegistered,
        type0=number,
        f1=null,
        l=9650SIP,
        f0=null
    }
...
]

Step 1. Define the ChartModel interfaces that map to the JSON DataTable format.

Here’s an example of a ChartModel Java class whose nested interfaces map to the JSON object.

public abstract class ChartModel {
	
	private static ChartFactory factory = GWT.create(ChartFactory.class);
	private static JsonListReader<ChartObjects, ChartObject> reader = new JsonListReader<ChartObjects, ChartObject>(factory, ChartObjects.class );
	
	/**
	 * These constants are fixed per the value retrieved from JSON
	 */
	public static final String POINT = "point";
	public static final String COL_LABEL = "label";
	public static final String COL_TYPE = "type";
	public static final String COL_ID = "id";
	public static final String CELL_VALUE = "v";
	public static final String CELL_FORMATTED_VALUE = "f";
	public static final String ROW_LABEL = "l";
	
	/*
	 * Define the data model
	 */
	
	public interface ChartObjects extends ModelList<ChartObject>{}
	
	public interface ChartObject {
		String getTitle();
		ChartData getData();
	}
	
	public interface ChartData {
		List<ChartCol> getCols();	//represents # series
		List<ChartRow> getRows();	//an array of array of cells, i.e. all the data
	}
	
	public interface ChartCol {
		String getId();
		String getLabel();
		String getType();
	}
	
	public interface ChartRow {
		List<ChartCell> getC();	//an array of chart cells, one per column
		String getL();	//label for this group of chart cells (e.g. a timestamp)
	}
	
	public interface ChartCell {
		String getV();	//get value
		String getF();	//get formatted value
	}

	public abstract ListStore<DataPoint> getListStore(ChartObject chartObject) throws Exception;
...
}

You can see that I also added an abstract getListStore method, which requires each concrete chart model to define how a ChartObject gets mapped to a ListStore.

Step 2. Create a chart model that extends ChartModel and implements getListStore by delegating to an adapter.

Here’s an example of a StackedBarChart model. (The models turned out to differ between, for example, a multi-series stacked bar chart and a pie chart; line charts and stacked bar charts turned out to be the same.)


public class StackedBarChartModel extends ChartModel {
	@Override
	public ListStore<DataPoint> getListStore(ChartObject chartObject) throws Exception {
		return new GXTChartModelAdapterImpl(chartObject).getSeries(ChartType.STACKED_BAR);
	}
}

You can see that this method instantiates a new GXT chart adapter, passes in the chart object (modeled on the DataTable JSON format), and returns the ListStore for the appropriate chart series (in this case, a GXT BarSeries).

Step 3. Map the chart object to a ListStore.

public class GXTChartModelAdapterImpl implements GXTChartModelAdapter {


	@Override
	public ListStore<DataPoint> getSeries(ChartType type) throws Exception {
		switch(type) {
			case LINE:
				return getLineSeriesListStore();
			case STACKED_BAR:
				return getStackedBarSeriesListStore();
			case BAR:
				return getLineSeriesListStore();
			case PIE:
				return getPieSeriesListStore();
			default:
				throw new Exception("Unknown data series type.");
		}
	}

/**
	 * Returns a ListStore for a BarSeries GXT object to add to a chart
	 * @return
	 */
	private ListStore<DataPoint> getStackedBarSeriesListStore() {
		ListStore<DataPoint> _listStore = new ListStore<DataPoint>(new ModelKeyProvider<DataPoint>() {
			@Override
			public String getKey(DataPoint item) {
				return String.valueOf(item.getKey());
			}
		});
		
		int numPoints = this.getNumPoints();  //The number of entries in ChartObject.data.rows
		int numSeries = this.getNumSeries();  //The number of entries in ChartObject.data.cols
		
		for (int i = 0; i < numPoints; i++) {	
			//This key must be unique per DataPoint in the store; the row index guarantees that (a random int does not)
			DataPoint d = new DataPoint(indexStr(ChartModel.POINT, i));
			
			for (int index = 0; index < numSeries; index++) {
				d.put(indexStr(ChartModel.COL_LABEL, index), chartObject.getData().getCols().get(index).getLabel());
				d.put(indexStr(ChartModel.COL_TYPE, index), chartObject.getData().getCols().get(index).getType());
				d.put(indexStr(ChartModel.COL_ID, index), chartObject.getData().getCols().get(index).getId());
			}
			
			ChartRow row = chartObject.getData().getRows().get(i);	//get the i-th point
			d.put(ChartModel.ROW_LABEL, row.getL());
			List<ChartCell> cells = row.getC();
			for (int j = 0; j < numSeries; j++) {
				//guard the index too: the backend may send fewer cells than columns
				ChartCell cell = (cells != null && j < cells.size()) ? cells.get(j) : null;
				if (cell != null) {	//if the j-th stack of this point is not blank
					d.put(indexStr(ChartModel.CELL_VALUE, j), cell.getV());
					d.put(indexStr(ChartModel.CELL_FORMATTED_VALUE, j), cell.getF());
				} else {	//otherwise, assume 0 value
					d.put(indexStr(ChartModel.CELL_VALUE, j), "0");
					d.put(indexStr(ChartModel.CELL_FORMATTED_VALUE, j), "");
				}
			}
			_listStore.add(d);
		}
		
		return _listStore;
	}
...
}

A few things to note here.

  • A DataPoint is basically just a map of key/value pairs. Because the DataPoint type is specific to the ListStore, its implementation lives in the GXTChartModelAdapter interface. I followed the instructions here: https://www.sencha.com/blog/building-gxt-charts/
  • For each bar (not each stack), I create a new DataPoint that holds the corresponding values from that row of the ChartObject.rows object. Each column corresponds to one stack within a bar, so I append the column index to the “v” key (v0, v1, v2, …) and store each cell under that key in the DataPoint.
  • Because the number of columns defines the number of stacks I expect in each bar, I assume a value of 0 whenever a cell is missing (because our backend didn’t provide one). This guarantees that v(n) in the first bar corresponds to v(n) in every subsequent bar.
  • I added an “l” value to each row in the ChartObject.rows object, which holds the label for the entire stacked bar. In the datatables JSON format you can instead make the first cell of each row a String that serves as the bar’s label. However, that kind of mixed typing is harder to enforce in Java, and we expect numeric data back every time.
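To make the key scheme concrete, here is a small plain-Java sketch that flattens datatables-style rows into one map per bar, filling missing cells with "0". It is an illustration under assumptions, not the adapter itself: plain Map<String, String> objects stand in for DataPoint, and the "label" and "v" key names are hypothetical, since the exact strings produced by indexStr() and ChartModel aren’t shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DataPoint/ListStore pair: each bar becomes a plain
// Map<String, String> keyed by a row label plus "v0".."v(n-1)", one key per stack.
final class DataTableFlattener {

    static final String ROW_LABEL = "label"; // assumed key name, not the real ChartModel.ROW_LABEL value
    static final String VALUE_PREFIX = "v";   // assumed prefix, mirrors the v0..vN scheme

    private DataTableFlattener() {}

    /**
     * @param rowLabels one label per bar (the "l" value on each row)
     * @param rowCells  one list of cell values per bar; a cell may be null and a row may be short
     * @param numSeries number of columns, i.e. stacks per bar
     */
    static List<Map<String, String>> flatten(List<String> rowLabels,
                                             List<List<String>> rowCells,
                                             int numSeries) {
        List<Map<String, String>> points = new ArrayList<>();
        for (int i = 0; i < rowCells.size(); i++) {
            Map<String, String> point = new HashMap<>();
            point.put(ROW_LABEL, rowLabels.get(i));
            List<String> cells = rowCells.get(i);
            for (int j = 0; j < numSeries; j++) {
                String value = (j < cells.size()) ? cells.get(j) : null;
                // Missing cells become "0" so v(j) always refers to the same stack in every bar
                point.put(VALUE_PREFIX + j, value != null ? value : "0");
            }
            points.add(point);
        }
        return points;
    }
}
```

For rows ["3", null, "5"] and ["1"] with three columns, the second bar ends up with v1 and v2 set to "0", so v2 still lines up with the third stack in every bar.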


Step 4. Pass the data into a Sencha GXT Chart!

Now for the easy part. The data has been transformed from the datatables format into a ListStore, where each DataPoint key from v0 to vN corresponds to one stack in a bar.

Thus when instantiating the chart, this is all I had to do:

private BarSeries<DataPoint> createStackedBar() {

		BarSeries<DataPoint> series = new BarSeries<DataPoint>();
		series.setYAxisPosition(Position.LEFT);
		series.setColumn(true);
		series.setStacked(true);
		
		//numSeries is the number of stacks per bar; it matches the number of columns in chartObject's cols
		for (int i = 0; i < numSeries; i++) {
			MapValueProvider valueProvider = new MapValueProvider(ChartModel.CELL_VALUE + i);
			series.addYField(valueProvider);
		}
		
		return series;
	}

When I instantiate the BarSeries, I just add one y-field key for each stack in my response data.
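Conceptually, each y field is just a function that reads one stack’s key from a bar, and a stacked bar’s height is the sum of those fields. The sketch below models that with java.util.function.Function in place of MapValueProvider; the "v" + i key scheme is an assumption standing in for ChartModel.CELL_VALUE + i, not GXT’s API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java stand-in for a stacked BarSeries: one "y field" per stack, each a
// function that reads key "v" + i from a bar's map (roughly what MapValueProvider does).
final class StackedYFields {

    private StackedYFields() {}

    // Builds one value provider per stack, in stack order (bottom to top).
    static List<Function<Map<String, String>, Double>> yFields(int numSeries) {
        List<Function<Map<String, String>, Double>> fields = new ArrayList<>();
        for (int i = 0; i < numSeries; i++) {
            final String key = "v" + i; // assumed key scheme
            fields.add(point -> Double.parseDouble(point.getOrDefault(key, "0")));
        }
        return fields;
    }

    // The height of a stacked bar is the sum of all of its y fields.
    static double barHeight(Map<String, String> point,
                            List<Function<Map<String, String>, Double>> fields) {
        double total = 0;
        for (Function<Map<String, String>, Double> f : fields) {
            total += f.apply(point);
        }
        return total;
    }
}
```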

Finally, the finished product.

[Screenshot: the finished stacked bar chart]

Drill-down charting with Sencha GXT 3.1

My latest assignment at work was to use charting in GXT and build a widget that has the ability to render one chart and drill down into charts of greater detail when the user clicks on an object in the parent chart. To complicate matters, the data used to render the drill-down chart is variable depending on the parameters passed in through our REST API. Finally, I wanted to encapsulate all this behavior within a portlet that could be added to a portal container by the user. How did I solve this?

When I started out, I knew that there were already handlers that could be attached to a series within a chart, but as with all things Sencha, nothing is as easy as it seems at first glance. Furthermore, I was battling with Sencha’s currently buggy beta release of GXT 3.1, which led me more than once to waste several hours thinking that I was doing something wrong when in fact, there was a problem on Sencha’s end. At any rate, this is how I solved the problem:

Step 1: Define a chart object whose store can be manipulated through a configuration

For the drill-down chart, I decided to implement my own chart class, which takes a configuration object called ChartConfig. ChartConfig is just a container for the options that control how the chart renders, including which optional parameters to pass into my REST call.

import java.util.HashMap;
import java.util.List;

import com.sencha.gxt.chart.client.draw.Color;

public class ChartConfig {

	public int height;
	public int width;
	public Boolean initiallyInvisible;
	public List<Color> seriesColors; //Color of series in order
	public HashMap<String, String> options;	//REST options

}
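The later snippets call accessors such as getOptions(), setOption(), getOption() and getInitiallyInvisible() rather than touching public fields. A minimal sketch of ChartConfig with those accessors might look like this; the option key strings (TYPE, FILTER and so on) are hypothetical values, and seriesColors is left out so the example compiles without GXT.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ChartConfig with the accessors used elsewhere in this post.
// The option key strings are assumptions; series colors are omitted to avoid GXT types.
class ChartConfig {

    public static final String TYPE = "type";
    public static final String FILTER = "filter";
    public static final String LIMIT = "limit";
    public static final String ORDER_BY = "orderBy";
    public static final String ORDER_DIR = "orderDir";

    private int height;
    private int width;
    private boolean initiallyInvisible;                          // primitive, so it can never be null
    private final Map<String, String> options = new HashMap<>(); // REST options

    public int getHeight() { return height; }
    public void setHeight(int height) { this.height = height; }

    public int getWidth() { return width; }
    public void setWidth(int width) { this.width = width; }

    public boolean getInitiallyInvisible() { return initiallyInvisible; }
    public void setInitiallyInvisible(boolean invisible) { this.initiallyInvisible = invisible; }

    public Map<String, String> getOptions() { return options; }
    public String getOption(String key) { return options.get(key); }
    public void setOption(String key, String value) { options.put(key, value); }
}
```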

I then defined my custom chart object to take this configuration into account when rendering itself.

public class MyBarChart extends UpdateableChart implements IsWidget {	
	
	public MyBarChart() {
		super();
		_model = new StackedBarChartModel();
	}
	
	public MyBarChart(ChartConfig _config) {
		super(_config);
		_model = new StackedBarChartModel();
	}
	
    @Override
	public Widget asWidget() 
	{
		if (_chart == null) 
		{	
			try {
				buildChart();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}

		return _chart;
	}
}

buildChart() is simply a method that reads the configuration, instantiates a new Chart, and binds the ListStore used to render it. Don't worry about StackedBarChartModel() for now. It is a custom data model and adapter I had to implement to convert JSON (following Google's datatable format here: https://developers.google.com/chart/interactive/docs/reference#dataparam) into a ListStore that Sencha can consume. That was an entirely different battle, to be described in a follow-up post.

In any case, UpdateableChart is an abstract class that executes the REST API call based on the ChartConfig and regenerates the ListStore. Since I figured all of our charts would follow the same logic, I made it abstract so that subclasses only need to implement one method: buildSeries();

import java.util.List;

import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;
import com.google.gwt.http.client.URL;

public abstract class UpdateableChart {
	/**
	 * Abstract Methods
	 */
	public abstract void buildSeries();

	/**
	 * Refreshes the current chart with parameters passed in through the ChartConfig
	 * These parameters, in the HashMap<String,String> options map, get serialized
	 * and appended to the base REST URL.  
	 * 
	 * If c or its options are null or empty, the default url is used.
	 * 
	 * Upon success, the function will build a new store and will call
	 * 	updateStore(newStore)
	 * @param c the chart configuration to apply (may be null)
	 */
	public void refresh(ChartConfig c) {
		boolean noOptions = (c == null || c.getOptions() == null || c.getOptions().isEmpty());
		String url = noOptions ? _model.getListUrl() : serializeOptions(_model.getListUrl(), c.getOptions());
		
		_config = c;	//update config upon refresh
		
		RequestBuilder restReq = new RequestBuilder(RequestBuilder.GET,
				URL.encode(url)); //Use your flavor of request builder
		
		try {
			//Plain GWT RequestCallback; substitute your own callback type if you wrap REST calls
			restReq.sendRequest(null, new RequestCallback() {

				@Override
				public void onResponseReceived(Request request, Response response) {
					if (response.getStatusCode() != 200) {
						//logger.error("Rest error: HTTP " + response.getStatusCode());
						return;
					}
					JsonListReader<ChartObjects, ChartObject> reader = _model.getListReader();
					ListLoadResult<ChartObject> charts = reader.read(null, response.getText());
					List<ChartObject> chartList = charts.getData();
					
					try {
						ChartObject chartObject = chartList.get(0);	//right now, only render first chart
						ListStore<DataPoint> newStore = _model.getListStore(chartObject);
						updateStore(newStore);
					} catch (Exception e) {
						e.printStackTrace();
					}
				}
				
				@Override
				public void onError(Request request, Throwable exception) {
					//logger.error("Rest error", exception);
				}
			});
		} catch (RequestException e) {
			e.printStackTrace();	//the request could not be sent
		}
	}
}
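serializeOptions() isn’t shown in the post. Here is one plausible, hypothetical version in plain Java: it appends each option as key=value and uses "&" if the base URL already carries a query string. It leaves escaping to the later URL.encode(url) call; note that URL.encode does not escape "&" or "=", so option values containing those characters would need per-value encoding (for example GWT’s URL.encodeQueryString) instead.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical serializeOptions: appends each option as key=value to the base URL.
// Values are not percent-encoded here because the whole URL is encoded afterwards;
// encoding twice would turn "%" into "%25".
final class OptionSerializer {

    private OptionSerializer() {}

    static String serializeOptions(String baseUrl, Map<String, String> options) {
        if (options == null || options.isEmpty()) {
            return baseUrl;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : options.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        // Respect a query string the base URL may already carry
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + query;
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order, which makes the generated URLs easier to compare in logs.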

In the method above, after the REST call returns, the JSON response is parsed into a ChartObject (basically a schema defined to work with Sencha's JsonReader). I then use the adapter to convert the ChartObject into a ListStore, which updates the store that renders the chart. Here's my bar chart's implementation of buildSeries().

	@Override
	public void buildSeries() {
		//Remove from the end so each removal doesn't shift series we haven't visited yet
		for (int x = _chart.getSeries().size() - 1; x >= 0; x--) {
			_chart.removeSeries(_chart.getSeries(x));
		}
		
		BarSeries<DataPoint> series = createStackedBar();
		_chart.addSeries(series);
		
		refreshChart();  //regenerates chart axes
		setVisible(true);  //turn visible - after REST call returns
	}
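When clearing series by index, the loop direction matters. Each removal shifts the remaining series one slot to the left, so an ascending index skips every other series, while a descending index visits each one. A plain-Java illustration, with a list of distinct strings standing in for the chart’s series:

```java
import java.util.ArrayList;
import java.util.List;

// Why removing by ascending index misses elements: each removal shifts the
// remaining items left while the index keeps moving right.
final class RemovalOrder {

    private RemovalOrder() {}

    // Mirrors "for (x = 0; x < size(); x++) remove(get(x))"
    static List<String> removeForward(List<String> input) {
        List<String> series = new ArrayList<>(input);
        for (int x = 0; x < series.size(); x++) {
            series.remove(series.get(x));
        }
        return series; // whatever survived
    }

    // Iterating from the end never shifts an element that has not been visited yet
    static List<String> removeBackward(List<String> input) {
        List<String> series = new ArrayList<>(input);
        for (int x = series.size() - 1; x >= 0; x--) {
            series.remove(series.get(x));
        }
        return series;
    }
}
```

With four series a, b, c, d, the ascending loop leaves [b, d] behind; the descending loop empties the list.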

In a nutshell, the workflow looks like this:

  • Create a model to represent a ListStore which is used in a chart.
  • Instantiate a chart with a dummy store, which models this ListStore
  • Implement buildSeries() which will take the new ListStore object (in the class) and use it to regenerate the series
  • Re-render the chart
  • With this framework, any chart we add to our portal can be updated simply by calling the “refresh(ChartConfig config)” method.
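This workflow is essentially the template method pattern: the base class fixes the refresh steps and subclasses fill in buildSeries(). The following plain-Java sketch models it without GXT; the fetcher function is a hypothetical stand-in for the REST call plus adapter, and a list of maps stands in for the ListStore.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of the refresh workflow: the base class owns fetching and
// store replacement; subclasses only decide how to (re)build their series.
abstract class UpdateableChartSketch {

    // Hypothetical stand-in for "REST call + adapter": options in, new store out
    private final Function<Map<String, String>, List<Map<String, String>>> fetcher;
    protected List<Map<String, String>> store = new ArrayList<>(); // starts as the blank "dummy" store
    protected boolean visible = false;

    protected UpdateableChartSketch(Function<Map<String, String>, List<Map<String, String>>> fetcher) {
        this.fetcher = fetcher;
    }

    // Template method: the steps are fixed here, only buildSeries() varies.
    public final void refresh(Map<String, String> options) {
        Map<String, String> opts = (options == null) ? Collections.<String, String>emptyMap() : options;
        updateStore(fetcher.apply(opts));
    }

    private void updateStore(List<Map<String, String>> newStore) {
        store = newStore;
        buildSeries(); // subclass regenerates (or keeps) its series and makes itself visible
    }

    protected abstract void buildSeries();
}
```

A bar chart subclass would rebuild its series inside buildSeries(), while a subclass that must keep its series (like the pie chart) would only update visibility there.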

Step 2: From your entry point, add a method that re-renders a chart based on some variable parameter

In our portlet, we have a pie chart that drills down into bar charts depending on which slice you pick. Before binding the drill-down event to the actual pie chart, I wanted to make sure the charts could re-render properly.

I did this by creating an entry point to the refresh(ChartConfig config) method on my bar chart object. I added a selector to our portlet widget and bound it to a “toggleViews()” method, which generates the ChartConfig based on the selected value and calls refresh on the correct chart. Notice that I call setVisible(false) on both charts so they don’t appear to the user while loading. In my buildSeries() implementations for both the pie and bar charts, I make the chart visible again after the REST call returns.

    
    private void toggleViews(SeriesSelectionEvent<DataPoint> event) {
    		displayParent = !displayParent;
    		
    		//Show Pie Chart
    		if (displayParent) {	
    			_pieChart.setVisible(false);
    			_pieChart.refresh(_pieChartConfig);
    			_barChart.setVisible(false);
    			_backButton.disable();
    		}
    		
    		//Show Bar Chart (Drill Down)
    		else {
    			_pieChart.setVisible(false);
    			_barChart.setVisible(false);  //set invisible until rest call loads, buildSeries() will make the chart visible again
    			
    			//refresh registration bar chart depending on what slice is selected
    			_barChartConfig = _barChart.getChartConfig();
    			_barChartConfig.setOption(ChartConfig.TYPE, event.getItem().get(ChartModel.COL_LABEL));
    			_barChartConfig.setOption(ChartConfig.FILTER, "registrationState(EQ)"+event.getItem().get(ChartModel.COL_LABEL));
    			_barChartConfig.setOption(ChartConfig.LIMIT, "10");
    			_barChartConfig.setOption(ChartConfig.ORDER_BY, "count");
    			_barChartConfig.setOption(ChartConfig.ORDER_DIR, "DESC");
    			
    			if (_barChartConfig.getOption(ChartConfig.TYPE).equalsIgnoreCase(pc.registered())) {
    			   List<Color> c = new ArrayList<Color> ();
    			   c.add(new RGB(148, 174, 10));
    			   _barChartConfig.setSeriesColors(c);
    			} else if (_barChartConfig.getOption(ChartConfig.TYPE).equalsIgnoreCase(pc.unregistered())) {
    			   List<Color> c = new ArrayList<Color> ();
    			   c.add(new RGB(17, 95, 166));
    			   _barChartConfig.setSeriesColors(c);
    			} else if (_barChartConfig.getOption(ChartConfig.TYPE).equalsIgnoreCase(pc.partially_registered())) {
    			   List<Color> c = new ArrayList<Color> ();
    			   c.add(new RGB(166, 17, 32));
    			   _barChartConfig.setSeriesColors(c);
    			}
    			_barChart.refresh(_barChartConfig);
    			_backButton.enable();
    		}
    	}
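The if/else chain that picks a slice color can also be expressed as a lookup table. This sketch builds the same drill-down options for a clicked slice and looks up its color. The option keys, the lowercase slice labels (standing in for pc.registered() and friends) and the int[] RGB triples are all hypothetical stand-ins for the GXT and ChartConfig types above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of the drill-down query built in toggleViews(): given the label of the
// clicked slice, produce the REST options and the bar color.
final class DrillDownOptions {

    // Hypothetical slice labels mapped to the RGB values used above
    private static final Map<String, int[]> SLICE_COLORS = new HashMap<>();
    static {
        SLICE_COLORS.put("registered", new int[] {148, 174, 10});
        SLICE_COLORS.put("unregistered", new int[] {17, 95, 166});
        SLICE_COLORS.put("partially registered", new int[] {166, 17, 32});
    }

    private DrillDownOptions() {}

    static Map<String, String> forSlice(String sliceLabel) {
        Map<String, String> options = new HashMap<>();
        options.put("type", sliceLabel);
        options.put("filter", "registrationState(EQ)" + sliceLabel);
        options.put("limit", "10");
        options.put("orderBy", "count");
        options.put("orderDir", "DESC");
        return options;
    }

    // Case-insensitive lookup, like the equalsIgnoreCase chain; null means "keep default colors"
    static int[] colorFor(String sliceLabel) {
        return SLICE_COLORS.get(sliceLabel.toLowerCase(Locale.ROOT));
    }
}
```

Adding a new slice type then means adding one table entry instead of another else-if branch.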
    

The last step was to bind the handler to the actual slices of my pie chart. One thing to keep in mind: because the handler is bound to the pie chart’s series, I couldn’t rebuild the pie series, or the event handler would fall off along with the old series. So after building the series the first time, I bind the handler once and leave it to the ListStore to regenerate the pie chart. To make that work, I bind an empty ListStore to the pie chart when I instantiate it. My buildSeries implementation for the pie chart is included below, so you can see that it doesn’t rebuild the series at all.

    public class MyPieChart extends UpdateableChart implements IsWidget {
    	
    	private PieSeries<DataPoint> _series;
    	
        @Override
    	public Widget asWidget() 
    	{
    		if (_chart == null) 
    		{	
    			try {
    				buildChart();
    			} catch (Exception e) {
    				e.printStackTrace();
    			}
    			
    	    	//refresh(_config);	//refresh the chart store	
    		}
    
    		return _chart;
    	}
    	
    	/**
    	 * Builds the chart, binds a blank store and creates the single pie series.
    	 * This is done only once, so the selection handler bound to the series survives refreshes.
    	 * @throws Exception 
    	 */
        @Override
    	public void buildChart() throws Exception {
    		_chart = new Chart<DataPoint>();
    		_chart.setDefaultInsets(10);
    		_store = _model.getBlankStore(numSeries, 1);   //This is where I bind the blank store to the chart
    		_chart.bindStore(_store);
    		_chart.setShadowChart(false);
    		_chart.setAnimated(true);		
    		_chart.setVisible(!_config.getInitiallyInvisible());
    		
    	    _series = createPieSeries();   //Note that this is only created once.
    	    _series.addColor(slice1);
    	    _series.addColor(slice2);
    	    _series.addColor(slice3);
    		_chart.addSeries(_series);
    		
    		final Legend<DataPoint> legend = new Legend<DataPoint>();
    		legend.setPosition(Position.RIGHT);
    		legend.setItemHighlighting(true);
    		legend.setItemHiding(true);
    		legend.getBorderConfig().setStrokeWidth(0);
    		_chart.setLegend(legend);
    		
    
    	}
     
        @Override
    	public void buildSeries() {
    		//Only build the Series the first time so we don't lose the selection handler binding
    		if (_store.size() > 0)
    			setVisible(true);
    	}
    }
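The reason for building the pie series only once can be shown without GXT: handlers are attached to a series object, so replacing that object silently discards them, whereas swapping the data behind it does not. The classes here are hypothetical, minimal stand-ins, not GXT’s Chart or PieSeries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal model of why the pie series is built only once: selection handlers live
// on the series object, so replacing the series drops them, while swapping the
// data behind an existing series keeps them. Not GXT code; all names are hypothetical.
final class HandlerRetention {

    static final class Series {
        final List<Consumer<String>> handlers = new ArrayList<>();
        void addSelectionHandler(Consumer<String> h) { handlers.add(h); }
        void select(String slice) { handlers.forEach(h -> h.accept(slice)); }
    }

    static final class Chart {
        Series series = new Series();           // created once, like _series in buildChart()
        List<String> store = new ArrayList<>(); // starts blank, like the blank store

        void rebuildSeries() { series = new Series(); }      // the approach that loses handlers
        void updateStore(List<String> data) { store = data; } // the approach the pie chart uses
    }

    private HandlerRetention() {}
}
```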
    
Step 3: Bind the drill-down event handler to the series of the parent chart

In the portlet widget, which, as described above, holds both a pieChart and a barChart, I bound the SeriesSelectionHandler to the pieChart right after instantiating it to enable the drill-down. I didn’t have to wait for the REST call to return data, either, because the PieSeries was already created against my dummy ListStore.

This part I’m not that proud of. Because I bind the SeriesSelectionHandler outside the actual MyPieChart object, the implementation violates encapsulation. I could have implemented the SeriesSelectionHandler directly in MyPieChart, but I felt it would leave later developers wondering why a PieChart contained a BarChart object. It would also set a bad precedent, suggesting that we need to nest objects within objects whenever we want to create a drill-down. My implementation, though perhaps not ideal, is more flexible: it lets me create PieCharts that don’t contain barChart objects yet still support drilling down.

In the future, we could create a data model with a tree-like hierarchy of parent/child drill-downs, where the back button always re-renders the parent and a click always renders the selected child. Something to think about. In any case, here is the binding of the toggleViews() method to the SeriesSelectionEvent.

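    	@Override
    	public Widget asWidget() {
    		//Bind once; the pie series is never rebuilt, so this handler stays attached
    		_pieChart.addSeriesHandlers(new SeriesSelectionHandler<DataPoint> () {
    
    			@Override
    			public void onSeriesSelection(SeriesSelectionEvent<DataPoint> event) {
    				toggleViews(event);
    			}
    			
    		});
    		//... rest of the portlet layout, ending by returning the portlet widget
    	}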
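The tree-like drill-down model mentioned above could look roughly like this: each node maps slice labels to child charts, and a stack of visited nodes makes the back button always return to the parent. This is a speculative sketch of that idea with hypothetical node names; it is not part of the portlet as built.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a tree-shaped drill-down model: each node knows its children by
// slice label, and a stack of visited nodes drives the back button.
final class DrillDownTree {

    static final class Node {
        final String name;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String name) { this.name = name; }
        // Returns this node so children can be chained while building the tree
        Node child(String sliceLabel, Node node) { children.put(sliceLabel, node); return this; }
    }

    private final Deque<Node> path = new ArrayDeque<>();

    DrillDownTree(Node root) { path.push(root); }

    Node current() { return path.peek(); }

    // Returns false when the clicked slice has no child chart to drill into
    boolean drillDown(String sliceLabel) {
        Node next = current().children.get(sliceLabel);
        if (next == null) return false;
        path.push(next);
        return true;
    }

    // Back button: re-render the parent; does nothing (returns false) at the root
    boolean back() {
        if (path.size() == 1) return false;
        path.pop();
        return true;
    }

    // Mirrors enabling/disabling _backButton
    boolean canGoBack() { return path.size() > 1; }
}
```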
    

That’s it for now! Let me know if you have any questions or comments. In an upcoming post, I’ll describe how I modeled the Google datatables JSON format as a ChartObject and how GXT charts consume it.

    The end result:

    The top level pie chart

    Drilling down by clicking on a pie slice.