Stream parsing broken on Linux

description

I'm trying to use LinqToTwitter to consume Twitter's sample and filter streams. Everything works fine on Mono/Windows, but when I run the program on Mono/Ubuntu the status messages returned by LinqToTwitter are broken. Specifically, the JSON strings appear to be split incorrectly on Linux.

Test case:
using System;
using System.Linq;
using LinqToTwitter;

namespace StreamConsumerTest
{
    class Program
    {
        static bool isRunning = false;
        static void Main(string[] args)
        {
            while (true)
            {
                ConsumeStream();
                while (isRunning)
                    System.Threading.Thread.Sleep(1000);
            }
        }

        static TwitterContext twitterContext = null;
        static void CreateContext()
        {
            // oauth application keys
            var consumerKey = "*";
            var consumerSecret = "*";
            var accessToken = "*";
            var accessTokenSecret = "*";

            InMemoryCredentials credentials = new InMemoryCredentials()
            {
                ConsumerKey = consumerKey,
                ConsumerSecret = consumerSecret,
                AccessToken = accessTokenSecret,
                OAuthToken = accessToken
            };
            SingleUserAuthorizer authorizer = new SingleUserAuthorizer()
            {
                Credentials = credentials
            };
            twitterContext = new TwitterContext(authorizer)
            {
                Log = Console.Out
            };
        }

        static IQueryable<Streaming> GetStreamQuery()
        {
            var selection = from strm in twitterContext.Streaming
                            where strm.Type == StreamingType.Filter &&
                                strm.Track == "twitter"
                            select strm;
            return selection;
        }

        static void ConsumeStream()
        {
            isRunning = true;

            if (twitterContext == null)
                CreateContext();

            Console.WriteLine("Consuming Stream");

            int count = 0;
            var selection = GetStreamQuery();
            selection.StreamingCallback(stream =>
            {
                if (stream.Content.StartsWith("{") && stream.Content.EndsWith("}"))
                    Console.WriteLine("ok");
                else
                    Console.WriteLine("error");

                if (++count > 10)
                {
                    stream.CloseStream();
                    // let the wait loop in Main exit so the stream can be restarted
                    isRunning = false;
                }
            })
            .SingleOrDefault();
        }
    }
}
Closed Jul 9, 2013 at 4:38 PM by JoeMayo
Fix released with v2.1.07

comments

JoeMayo wrote Jun 16, 2013 at 7:03 PM

Thanks for letting me know.

jakobrog wrote Jun 17, 2013 at 7:38 AM

Hi Joe,

I used ILSpy to inspect the source code of the "net40-client" assembly (the one I use) and I believe I have found the problem. In TwitterExecute.ExecuteTwitterStream(object), there is a section that reads
if (list.Contains(10))
{
    byte[] array3 = list.Take(list.LastIndexOf(10) + 1).ToArray<byte>();
    string @string = Encoding.UTF8.GetString(array3, 0, array3.Length);
    list.RemoveRange(0, array3.Length);
    string[] array4 = @string.Split(new string[]
    {
        Environment.NewLine
    }, StringSplitOptions.None);
    for (int i = 0; i < array4.Length - 1; i++)
    {
        this.DoAsyncCallback(array4[i]);
    }
    array = new byte[8192];
    array2 = new byte[8192];
    list = new List<byte>();
    memoryStream.SetLength(0L);
}
According to Twitter, messages are separated in the stream by \n\r, and newlines inside messages are represented by \n (byte value 10). In the quoted code, Environment.NewLine will correctly split messages on Windows, but on Linux messages will also be split at line breaks inside messages, resulting in the error I observed.

The other split on byte 10 looks a little weird, given that the character can also occur within messages, but I can't think of any case where it would break the parsing.

It would be awesome if you can update the latest binaries with this fix.
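
The platform difference described in this post can be reproduced without a Twitter connection. The following is a minimal sketch, not LinqToTwitter code. It assumes messages are delimited by "\r\n" (carriage return, then line feed), uses a made-up two-message payload, and hard-codes "\n", the value Environment.NewLine has on Linux. DelimiterDemo and SplitMessages are hypothetical names.

```csharp
using System;

static class DelimiterDemo
{
    // Splits a chunk of stream text on the given delimiter, keeping empty entries.
    public static string[] SplitMessages(string chunk, string delimiter)
    {
        return chunk.Split(new[] { delimiter }, StringSplitOptions.None);
    }

    static void Main()
    {
        // Hypothetical payload: two messages, the first with an embedded "\n".
        string chunk = "{\"text\":\"line1\nline2\"}\r\n{\"text\":\"second\"}\r\n";

        // "\n" is what Environment.NewLine evaluates to on Linux.
        string[] unixSplit = SplitMessages(chunk, "\n");

        // Explicit delimiter, independent of the platform.
        string[] crlfSplit = SplitMessages(chunk, "\r\n");

        Console.WriteLine(unixSplit.Length); // 4
        Console.WriteLine(crlfSplit.Length); // 3
    }
}
```

Splitting on "\n" breaks the first message in two and leaves a trailing "\r" on every piece that ends a message, so a check such as EndsWith("}") fails. Splitting on "\r\n" keeps both messages intact, followed by one empty trailing entry.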

jakobrog wrote Jun 17, 2013 at 10:17 AM

Update: I downloaded the source and tested the fix proposed above, i.e. delimiting messages on "\r\n" (my previous post incorrectly says "\n\r") instead of Environment.NewLine. This improved things, but there seem to be two more platform-specific bugs whose cause I can't find.

First, even when splitting messages on the correct delimiter, the lines array (line 515 in TwitterExecute.cs) ends up containing garbage entries on Linux. Some are empty lines, and some are the last 2-3 characters in the message, duplicated as a separate line.

Second, messages still sometimes get cut off. I believe this happens when the stream contains more than one message, but somehow this also happens only on Linux and not on Windows.

For now I have replaced lines 515-519 with the ugly hack below, but you probably want to look into the problem in more detail.
string[] lines = outputString.Split(new string[] { "\r\n" }, StringSplitOptions.None);
for (int i = 0; i < (lines.Length - 1); i++)
{
    if (lines[i].StartsWith("{") && lines[i].EndsWith("}"))
        DoAsyncCallback(lines[i]);
}
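
One way to sidestep partial and empty entries is to frame messages at the byte level and keep any incomplete tail buffered until the rest arrives. The sketch below illustrates that general technique under the assumption that "\r\n" is the delimiter. It is not the code in TwitterExecute.cs and not the fix that was eventually released. CrLfFramer and Append are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of LinqToTwitter.
public class CrLfFramer
{
    // Bytes received so far that do not yet form a complete message.
    private readonly List<byte> pending = new List<byte>();

    // Appends newly read bytes and returns every complete "\r\n"-terminated
    // message. An incomplete tail stays buffered for the next call.
    public List<string> Append(byte[] data, int count)
    {
        for (int i = 0; i < count; i++)
            pending.Add(data[i]);

        var messages = new List<string>();
        int start = 0;
        for (int i = 1; i < pending.Count; i++)
        {
            // 13 = '\r', 10 = '\n'; a lone '\n' inside a message is ignored.
            if (pending[i - 1] == 13 && pending[i] == 10)
            {
                int length = i - 1 - start;
                if (length > 0) // skip empty frames
                    messages.Add(Encoding.UTF8.GetString(
                        pending.GetRange(start, length).ToArray()));
                start = i + 1;
            }
        }

        // Drop everything that was consumed, keep the unfinished tail.
        pending.RemoveRange(0, start);
        return messages;
    }
}
```

Each read from the response stream would be passed to Append, and each returned string handed to the callback. Decoding only after framing also means a multi-byte UTF-8 character is never cut at a buffer boundary, since bytes 13 and 10 cannot occur inside a multi-byte sequence.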

JoeMayo wrote Jun 17, 2013 at 3:12 PM

I thought that might be the problem. Thanks for the research. I'll have to set up a Linux/Ubuntu machine to test. Maybe another day or so.

jakobrog wrote Jun 26, 2013 at 11:36 AM

Have you been able to replicate this issue and/or had time to look into what the problem might be?

JoeMayo wrote Jun 29, 2013 at 8:42 PM

Hi @jakobrog,

I ended up doing something similar to what you suggest. I did notice that I'm getting some garbage on Linux in the form of extra lines that seem to be the last 2 or 3 characters of the previous post. That must be why you suggested the extra check before invoking DoAsyncCallback.

I'm running into multiple issues getting Ubuntu set up, which is slowing down the fix. I'm at the point now where I can deploy and run binaries, but I still can't do a git clone.

I've checked this in and would be interested in seeing if it solves your problem.

@JoeMayo