LitJSON and UserStreamParser

Feb 24, 2013 at 5:35 PM
I've been updating the code in UserStreamParser (or userstreamex, or UserStreamsParser; it has something of an identity crisis) to remove the CreateType calls and replace them with the constructor that takes the JsonData instance. However, while doing that, I've had to look into the guts of LitJSON, and some of the stuff in there is a little odd. While questioning why he didn't implement this or that (but went totally nuts implementing these other things), I visited the project page and noticed that the last update was Oct 04, 2007.

So that leads me to my question: any plans to replace LitJSON? You could even go all out and move to using JSON serialization for your entities and rip out huge chunks of code. You seem to already have code written to shim some of its deficiencies (e.g. your GetValue extensions). I suppose, though, as jobs go in this codebase, it's a big one and maybe not one that would pass the "is it broke?" check.

Some of the stuff that has annoyed me the most:
  • Why doesn't it have a Contains(string key) method? Why make me do data.Select(kv => kv.Key).Contains("name")?
  • Why is nothing commented? I don't even understand how you can conscionably release a library without, at the very least, commenting all public classes and members.
  • Why did he prefer all of those explicit cast overloads to just writing a generic method? You obviously agree with me here :).
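To make the first and third complaints concrete, here is a minimal sketch that assumes a plain Dictionary<string, object> as a stand-in for a parsed JSON object. It is not LitJSON's JsonData API, and GetValue here is a hypothetical helper, not the GetValue extensions in LINQ to Twitter. It shows a one-call key check and a single generic accessor doing the work a family of explicit cast overloads would otherwise do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a parsed JSON object; this is NOT LitJSON's JsonData API.
var data = new Dictionary<string, object>
{
    ["name"] = "shibumi",
    ["followers_count"] = 42L, // JSON integers often come back as long
    ["verified"] = false,
};

// A direct key check instead of data.Select(kv => kv.Key).Contains("name").
bool hasName = data.ContainsKey("name");

// One generic accessor in place of a set of explicit cast overloads (hypothetical helper).
// Returns the fallback when the key is missing or null; otherwise converts the stored
// value to T (Convert.ChangeType covers conversions such as long -> int).
T GetValue<T>(IDictionary<string, object> obj, string key, T fallback)
{
    if (!obj.TryGetValue(key, out var raw) || raw is null)
        return fallback;
    if (raw is T typed)
        return typed;
    return (T)Convert.ChangeType(raw, typeof(T));
}

string name = GetValue(data, "name", "");
int followers = GetValue(data, "followers_count", 0);
bool verified = GetValue(data, "verified", true);
string location = GetValue(data, "location", "unknown");

Console.WriteLine($"{hasName} {name} {followers} {verified} {location}");
```

The fallback parameter doubles as the type hint, so callers rarely need to spell out T.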
P.S. How do I write the markdown for an italic word followed by a comma that renders with no space after the word? e.g. "the italic, now" but with the word "italic" in italics and the same amount of spacing between it and the comma (none).
Coordinator
Feb 25, 2013 at 9:26 AM
Hi,

A lot of people will appreciate an update on UserStreamParser. I switched to JSON because Twitter API v1.1 requires it, so it's really the only option.

I don't know what drove the original author's design decisions, but I felt the licensing (public domain) made it a good choice to pull into LINQ to Twitter. My design goals are to have a single assembly with only .NET Framework dependencies, supporting multiple versions on multiple platforms. The difficulty is in accommodating the differences in each .NET profile. At the time I began converting to JSON, some versions of .NET didn't include all the functionality I needed. Sometimes Twitter exposes JSON that's difficult to parse, e.g. property names that were dates in the Trends API, so I needed more flexible options. LitJSON was small enough to pull in without a lot of extra baggage and let me parse the pathological formats that existed at the time. Since integrating the code, I've modified it to handle more types, and you've seen the strongly-typed convenience methods. It's worked, and today I added a new API and parsed the JSON without much effort. I'm not inclined to replace it in the near future; there are so many other things I have planned.

I'm still getting used to the new editor. I recall that italics are underscores in normal wiki pages.

@JoeMayo
Feb 26, 2013 at 2:59 AM
Yeah, having it already integrated and also having already done your own work on it is a big plus. The licensing thing is definitely something I didn't consider when I was busy being annoyed with its quirks :). I need to remember to use "pathological" to describe code in the future. That's brilliant.

As you've sort of taken ownership of the LitJSON thing, I suppose the version of it in your code could just be polished up, no? Looking through the author's code, it does look like he had a good grasp of programming linguistics, just not so good with writing public interfaces. I guess that's fairly common.

Do you have some kind of a future roadmap for your plans for the LTT library?
Coordinator
Feb 26, 2013 at 3:57 AM
Polishing is welcome. I haven't written a roadmap yet, but off the top of my head, there are some items that are frequently asked for and/or other features that I think would add value:
  1. Full preparation for Twitter API v1.1. Twitter API v1.0 is deprecated and Twitter will start rolling blackouts next month. I'm still combing over parameters, entity properties, new APIs, and/or anything else I might have missed to maintain 100% coverage. I won't do anything big until we're past this phase because a lot of people will be doing emergency upgrades (most people wait until the last minute or are still blissfully unaware that this is happening) and I want to reduce as much risk as possible.
  2. True C# 5.0 async support. I have some ideas that I think will be pretty cool and fun to implement.
  3. Improve streaming support. There are various odds-and-ends here that would make streaming a better experience, such as moving to OAuth on public streams, better instrumentation, and cleaning up some complex back-off/error recovery code.
  4. Go full speed on Mono support. I've tested binary support on desktop Linux and everything works fine. I'd like to see if I can achieve code compatibility as well, so that the Mono community can have the same open-source experience as the Windows world. I have a barrier to cross here because, last time I looked, MonoTouch didn't fully support generic type reflection, particularly in the area that LINQ to Twitter uses. Either I have to change my code, wait for MonoTouch to support this, or contribute the code myself. My ultimate goal is to have LINQ to Twitter running on every CLR-compatible platform possible, and having nearly covered the entire Microsoft stack, Mono is a natural next destination.
  5. I've started a Portable Class Library, but ran into a few bumps that have slowed me down. I need IQueryable support on all .NET profiles (which hasn't always been the case) and MS might be close to that. This would give me access to Xbox too.
  6. There's more I'd like to do with OAuth. My current implementation is borrowed from another open source project, which is more generic than I need. I've wrapped all kinds of code around it and it's not as clean as I would like. Because of the complexity of the code around it, that's probably a library I'd be inclined to rewrite one day. I'd also like to provide more documentation and more sophisticated samples, because it's a particularly difficult area for most people to adapt to. There are also new libraries in the .NET Framework that might provide opportunities for more authorizers; e.g. the WebAuthenticationBroker totally transformed authentication in Windows 8 applications via the WinRtAuthorizer. There are new libraries for ASP.NET that I need to look at too.
  7. The Issues list on this site has some interesting problems to solve too, some that dovetail with what I'm talking about here. Items like GZip support are tough because some .NET profiles have this support and others don't. Occasionally, someone will try to build a query in a different way, exposing an edge case that I haven't covered yet; the workaround is typically Expression Trees, but my goal is to make the experience easier for developers.
Just a few thoughts. So, between support, fixing bugs, and keeping up with a continually evolving Twitter API, I do have a few longer-range goals. If you or anyone else would like to contribute, let me know.

@JoeMayo
Feb 26, 2013 at 4:22 AM
Edited Feb 26, 2013 at 4:59 AM
Yeah, I know I've had some trouble properly grokking how to set up the authorizers and which one to use where (also what the differences between the credentials are and what applies to what). My first test project used the PinAuthorizer, but then I needed to switch to one that let me specify all four keys from a static source. I think that's SingleUserAuthorizer, but the name threw me at first. They're all authorizing a single user, right? As far as I understood it, that's the idea: it's always the app plus the acting user, so at first I wasn't sure that was the right one.

WRT the entities themselves, the thing I found most confusing is the separation of input and output properties: sometimes a property that, given its name, looks like it should have a value doesn't, and the value is actually elsewhere. The LINQ-like wrapper is really a brilliant piece of abstraction, but I wonder if a separation of Criteria and Result entities might be/might have been a bit clearer to understand.

That sort of ties into another thing... I get that causing the "collection" to be "enumerated" is what fires off the queries, so that's anything like ToList(), ToArray(), First(), or whatever. But if the two entity types were separate, you could make your own custom method, intended to be the "final" one, that would both cause the enumeration and project the sequence of criteria objects into a sequence of result objects. Of course, this is just the result of a weekend's worth of thought and nothing more than me throwing ideas at you to share my experience using this stuff :).

That's funny you should say...
> [...] such as moving to OAuth on public streams [...]

Does that mean it's not set up that way? Is that why it's giving me a 401 Unauthorized while the REST stuff is working? I'm still trying to figure that out.

As for Mono/portable... the Expression code is DLR stuff, right? Is that even implemented in Mono or the portable versions? It seems like that kind of metaprogramming would be a non-trivial bit of code to implement in a runtime.

Supporting async would be pretty cool. I've been looking for a reason to use it at work and I just haven't found the right hole for that shaped peg. That would almost require you to fork the code, though, right (for the purposes of compatibility with previous framework versions)? Unless you could somehow pull the execution code out into modules where one does it the old way and one does it the new way, I guess.

Oh, one other thing, why do you have the repository folder structure set up so differently from the solution's folder structure? Just curiosity, no judgment :).

EDIT: I just looked at the top of the streaming example. So that means I have to type a literal user/password for the streaming to work? If that's the case, couldn't LinqToTwitter give a bit more of a warning that that's what I was missing? :)

EDIT2: Okay, I also just loaded up the LinqToTwitterDemo project. Now I see where all the documentation examples come from and why none of them show how the twitterCtx variable is initialized. Looking through this now...

EDIT3: So, I think I need to remember to read more of the source before I complain about a problem :). I got it to connect with the user/password. I definitely agree that getting OAuth to work for that would be good, though. Kind of odd to have to do it this way.
Coordinator
Feb 26, 2013 at 5:06 AM
That's the crux of why you need deferred execution - so you can build up the query in different pieces and then execute when ready. A typical example is implementing Search, where some filters are optional.
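As a rough illustration, the following sketch uses LINQ to Objects over an in-memory list of hypothetical tweets rather than LINQ to Twitter's IQueryable provider, but the deferred-execution behavior is the same idea: optional filters are appended only when supplied, and nothing runs until ToList() enumerates the query. The counter exists only to show when the source is actually read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for a remote source.
var tweets = new List<(string User, string Text, int Retweets)>
{
    ("shibumi", "LINQ to Twitter rocks", 5),
    ("JoeMayo", "v1.1 migration tips", 12),
    ("shibumi", "Trying user streams", 0),
};

int sourceReads = 0;

// Iterator that counts every element it hands out, so we can see when the query runs.
IEnumerable<(string User, string Text, int Retweets)> Source()
{
    foreach (var t in tweets)
    {
        sourceReads++;
        yield return t;
    }
}

// Optional search filters; null or 0 means "not supplied".
string? user = "shibumi";
string? keyword = null;
int minRetweets = 1;

// Build the query in pieces. Where only composes the pipeline; nothing is read yet.
var query = Source();
if (user != null)
    query = query.Where(t => t.User == user);
if (keyword != null)
    query = query.Where(t => t.Text.Contains(keyword));
if (minRetweets > 0)
    query = query.Where(t => t.Retweets >= minRetweets);

int readsBeforeExecution = sourceReads;

// Enumerating (here via ToList) is what finally executes the query.
var results = query.ToList();

Console.WriteLine($"{readsBeforeExecution} {sourceReads} {results.Count}");
```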

My first implementation of streams was with username/password for public streams - I don't even recall if OAuth was available for them back then. The more I talk about it, the more I realize that converting those to OAuth needs to be higher on the list.

By Expressions, I mean the ones that support the LINQ provider. The engine for converting a query relies on the Visitor pattern over Expression trees. Lambdas and Expressions can be used together: the Expression is the data representation of the code, and the Lambda is the executable representation of the code. You can switch back and forth, which gives you the power to create complex queries that the basic syntax doesn't allow. E.g. chaining the standard Where operator combines filters with AND, but not OR, and you can use Expressions to build a Where OR query. Again, a typical use case for this is supporting searches. There's an example of Expressions in the SearchDemos in the downloadable source code. I don't see an association with the DLR, but Mono does support Expressions.
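Here's a minimal sketch of the Where OR idea, assuming two predicates that were built separately. It combines them with Expression.Invoke, the approach used by the well-known PredicateBuilder pattern; some LINQ providers don't translate InvocationExpression nodes, and in that case rewriting the parameter with an ExpressionVisitor is the usual alternative. This is an illustration only, not the code in SearchDemos.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Two filters built independently, e.g. optional search terms.
Expression<Func<string, bool>> startsWithL = s => s.StartsWith("L");
Expression<Func<string, bool>> endsWithX = s => s.EndsWith("x");

// Combine two predicates with OR. The right-hand lambda is applied to the left-hand
// parameter via Expression.Invoke, so the result is a single one-parameter lambda.
Expression<Func<T, bool>> Or<T>(Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
{
    ParameterExpression param = left.Parameters[0];
    Expression rightBody = Expression.Invoke(right, param);
    return Expression.Lambda<Func<T, bool>>(Expression.OrElse(left.Body, rightBody), param);
}

var combined = Or(startsWithL, endsWithX);

// The expression is data: a provider's visitor can walk and inspect its nodes.
Console.WriteLine(combined.Body.NodeType); // OrElse

// Compiling turns the same expression into executable code.
Func<string, bool> isMatch = combined.Compile();
Console.WriteLine(isMatch("Mono")); // False

// AsQueryable() routes Where through the IQueryable (expression-tree) overload.
var words = new[] { "LINQ", "Mono", "Xbox", "Twitter", "Lambda" };
string[] matches = words.AsQueryable().Where(combined).ToArray();
Console.WriteLine(string.Join(", ", matches)); // LINQ, Xbox, Lambda
```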

I have an MSBuild project that I use to build and deploy releases and NuGet packages. My thoughts were to separate the code via pre-processing directives and let the build select the appropriate code, depending on whether the feature was supported in a given version. This has worked pretty well so far to separate various .NET profiles - a little tedious at times, but still keeps me in a single assembly.
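A small sketch of that approach, assuming a hypothetical HAS_GZIP symbol (GZip being one of the features that varies by profile). The build would define the symbol only for profiles that support the feature, for example through MSBuild's DefineConstants property, so the same source yields the right single assembly for each profile. This is not LINQ to Twitter's actual build setup.

```csharp
using System;
using System.IO;
#if HAS_GZIP
using System.IO.Compression;
#endif

// HAS_GZIP is a hypothetical symbol defined per profile by the build.
// With it undefined, the #else branches below are what get compiled.
string Profile() =>
#if HAS_GZIP
    "gzip";
#else
    "identity";
#endif

// Same call site in every build; only the body differs per profile.
Stream WrapResponse(Stream raw)
{
#if HAS_GZIP
    return new GZipStream(raw, CompressionMode.Decompress);
#else
    return raw;
#endif
}

var body = new MemoryStream(new byte[] { 1, 2, 3 });
Stream wrapped = WrapResponse(body);
Console.WriteLine($"{Profile()} {ReferenceEquals(wrapped, body)}");
```

Callers never see the directives, which keeps per-profile differences out of the public surface.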

@JoeMayo
Feb 26, 2013 at 5:35 AM
JoeMayo wrote:
> That's the crux of why you need deferred execution - so you can build up the query in different pieces and then execute when ready. A typical example is implementing Search, where some filters are optional.

Yeah, I get the deferred execution thing, I just don't think I like the dual-purpose entities. Having the clear boundary between query building and execution which also "projects" the criteria objects into the result objects I described would separate the two "phases" better, I think. In a normal LINQ scenario, you wouldn't do...
```csharp
List<User> users = new List<User>
{
  new User
  {
    Name = "shibumi"
  }
};
```
...and then send that to the server expecting to find the user with the name "shibumi" in the list with all his information. So, here it seems a little odd, given that the LINQ "stuff" is what this library is intended to imitate. If you make the concession that it's not really exactly like LINQ, then you could end up with something like...
```csharp
var statuses = context.Status // IEnumerable<StatusCriteria>
    .Where(s =>
        s.Type == StatusType.User &&
        s.ScreenName == "shibumi" &&
        s.Count == 50
    )
    .ToResponse() // or whatever name; enumerates and projects to IEnumerable<Status>
    .Where(u =>
        u.Entities.UrlMentions.Any()
    );
```
That would get any of the last 50 tweets that had any URLs in them. Maybe it's just a preference thing.
.
> My first implementation of streams was with username/password for public streams - I don't even recall if OAuth was available for them back then. The more I talk about it, the more I realize that converting those to OAuth needs to be higher on the list.

Is it that different to use OAuth for streams than for the REST stuff? Can the OAuth library you're using do what needs to be done?
.
> What I mean by Expressions are in support of the LINQ provider. The engine for converting a query relies on the Visitor pattern on Expression trees. Lambdas and Expressions can be used together where the Expression is the data representation of the code, but the Lambda is the executable representation of the code. You can switch back and forth, which gives you the power to create complex queries where the basic syntax doesn't allow. i.e. the Standard Where operator performs AND operations, but not OR, and you can use Expressions to build a Where OR query. Again, a typical use case for this is in supporting searches. There's an example of Expressions in the SearchDemos in the downloadable source code. I don't see an association with DLR, but Mono does support Expressions.

I was under the impression that the Expression stuff was part of the Dynamic Language Runtime, which has the stuff to do the at-runtime manipulation of those expression trees. I have written a lot of code that uses them at work; it's pretty great stuff. Like I said, the LINQ abstraction idea was great. Converting the Where clause into parameters was a very sneaky and good use of that stuff, IMO.
.
> I have an MSBuild project that I use to build and deploy releases and NuGet packages. My thoughts were to separate the code via pre-processing directives and let the build select the appropriate code, depending on whether the feature was supported in a given version. This has worked pretty well so far to separate various .NET profiles - a little tedious at times, but still keeps me in a single assembly.

Ahh, yeah, that sounds like a good reason.

Agghh, how do I get those quotes spaced out without the periods... Every time I use Markdown I find something annoying I can't format properly.
Mar 3, 2013 at 3:56 PM
I've thought about this over the week and I can see that putting that boundary there would break the fantasy of the IQueryable sets. I think an improvement that would really go a long way is to make sure that, whenever the data is available, it gets put into the "query" fields when the API call comes back.

For example, a User query by UserID should come back with not only the Identifier properties filled out, but also the "outside" ScreenName and the ID property (I'm not entirely sure what that one is for).

I guess in the ideal form of these magical LINQ entities there would be no duplication, because the idea is that you're supposed to treat the set as if it were already filled out with data. That said... Where(u => u.Identifier.ScreenName == "Person") is longer than Where(u => u.ScreenName == "Person"), and there are definitely some discoverability issues if the fields that can be used as filters are just scattered all around. Also, the filter fields that aren't data fields still have to go somewhere. Meh. It's an interesting puzzle.
Coordinator
Mar 3, 2013 at 5:07 PM
Yes, it's definitely a trade-off. You can see that I created the Identifier property to keep what Twitter returns separate from the query filters of the same name. BTW, the ID field has historical significance. In the early days of the Twitter API, there was only ID, not UserID and ScreenName, and ID could hold either a user ID or a screen name. Twitter saw the ambiguity in that and created UserID and ScreenName, but ID was left in place and is populated with the UserID so existing code won't break.