Pulling bulk tweets from a list

May 9, 2012 at 3:31 PM
Edited May 9, 2012 at 3:43 PM

I'm trying to pull a large number of statuses from a list. I looked in the demo source code and couldn't find any examples.

I've created the following:

using TwitterList = LinqToTwitter.List;
// ...

public static List<Status> GetTwitterListStatuses(TwitterContext twitterCtx, 
                                                  string slug, 
                                                  string ownerScreenName, 
                                                  ulong sinceID)        
{       
    TwitterList statusList = null;
    try
    {
        statusList = (twitterCtx.List.Where(list => list.Type == ListType.Statuses &&
                                                    list.OwnerScreenName == slug &&
                                                    list.Slug == ownerScreenName &&
                                                    list.SinceID == sinceID))
                     .First();
    }
    catch (TwitterQueryException ex)
    {
        Console.WriteLine("TwitterQueryException occurred: {0}", ex.Message);
        Console.WriteLine("Request sent to Twitter: {0}", ex.Response.Request);
        Console.WriteLine("Response from Twitter: {0}", ex.Response.Error);
    }
    
    List<Status> statuses = statusList != null ? statusList.Statuses : new List<Status>();            
    return statuses;
}

My problem is that I keep getting 404 errors, which I believe have to do with trying to pull too many statuses at a time. Is there some way to page through a list's statuses in chunks? It's not obvious to me how to do it.

Here's a sample output I get from the catch clause:

Calling GetTwitterListStatuses slug=indianapolis, ownerScreenName=jwpalmer ...
TwitterQueryException occurred: Error while querying Twitter.
Request sent to Twitter: /1/lists/statuses.xml?owner_screen_name=indianapolis&slug=jwpalmer&since_id=200208522003234816
Response from Twitter: Not found

Thanks in advance for any help.

Dean

May 10, 2012 at 11:32 AM

After checking for a response to my request, I copied and pasted the twitter requests shown in my output to my browser. I got the not found response as expected. However, I did a double take and noted that I have the slug and ownerScreenName attributes reversed. I quickly swapped them in the URL of my browser and Voila! I got a real output response from Twitter.

<list>
<id>1498148</id>
<name>indianapolis</name>
<full_name>@jwpalmer/indianapolis</full_name>
<slug>indianapolis</slug>
<description/>
<subscriber_count>17</subscriber_count>
<member_count>500</member_count>
<uri>/jwpalmer/indianapolis</uri>
<created_at>Sat Oct 31 19:19:05 +0000 2009</created_at>
<following>false</following>
<mode>public</mode>
<user>
<id>14574182</id>
<name>John Palmer</name>
<screen_name>jwpalmer</screen_name>
<location>Indianapolis</location>
<description>
I'm too busy marketing you to manage me. Oh, I did find time to co-found @IndySM.
</description>
<profile_image_url>
http://a0.twimg.com/profile_images/1776033070/jwpicon_normal.png
</profile_image_url>
<profile_image_url_https>
https://si0.twimg.com/profile_images/1776033070/jwpicon_normal.png
</profile_image_url_https>
<url>http://www.johnwpalmer.com</url>
<protected>false</protected>
<followers_count>2125</followers_count>
<profile_background_color>e8e8e8</profile_background_color>
<profile_text_color>000000</profile_text_color>
<profile_link_color>034569</profile_link_color>
<profile_sidebar_fill_color>63AAD1</profile_sidebar_fill_color>
<profile_sidebar_border_color>ffffff</profile_sidebar_border_color>
<friends_count>458</friends_count>
<created_at>Mon Apr 28 19:46:20 +0000 2008</created_at>
<favourites_count>14</favourites_count>
<utc_offset>-18000</utc_offset>
<time_zone>Indiana (East)</time_zone>
<profile_background_image_url>
http://a0.twimg.com/profile_background_images/287469670/bg-tiler.png
</profile_background_image_url>
<profile_background_image_url_https>
https://si0.twimg.com/profile_background_images/287469670/bg-tiler.png
</profile_background_image_url_https>
<profile_background_tile>true</profile_background_tile>
<profile_use_background_image>true</profile_use_background_image>
<notifications>false</notifications>
<geo_enabled>false</geo_enabled>
<verified>false</verified>
<following>false</following>
<statuses_count>2496</statuses_count>
<lang>en</lang>
<contributors_enabled>false</contributors_enabled>
<follow_request_sent>false</follow_request_sent>
<listed_count>88</listed_count>
<show_all_inline_media>true</show_all_inline_media>
<default_profile>false</default_profile>
<default_profile_image>false</default_profile_image>
<is_translator>false</is_translator>
</user>
</list>
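Because the slug and the owner are both plain strings, nothing catches the swap at compile time. As a hedged illustration (a hypothetical helper, not part of LINQ to Twitter), the corrected v1 request path from the error output can be built with each value passed by name, so the mapping is visible at the call site:

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of LINQ to Twitter: it only rebuilds the v1
// lists/statuses request path seen in the error output, taking each value by name.
public static class ListStatusesPath
{
    public static string Build(string ownerScreenName, string slug, ulong sinceId)
    {
        if (string.IsNullOrEmpty(ownerScreenName)) throw new ArgumentException("Owner screen name is required.", "ownerScreenName");
        if (string.IsNullOrEmpty(slug)) throw new ArgumentException("List slug is required.", "slug");

        // owner_screen_name = the account that owns the list (e.g. jwpalmer);
        // slug = the list's URL-friendly name (e.g. indianapolis).
        return "/1/lists/statuses.xml"
            + "?owner_screen_name=" + Uri.EscapeDataString(ownerScreenName)
            + "&slug=" + Uri.EscapeDataString(slug)
            + "&since_id=" + sinceId.ToString(CultureInfo.InvariantCulture);
    }
}
```

Calling it as `ListStatusesPath.Build(ownerScreenName: "jwpalmer", slug: "indianapolis", sinceId: 200208522003234816)` produces the working request, with owner and slug in the right places.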

This is all well and good and puts me further down the road to what I am trying to do. 

However, my original question still remains: how do I pull statuses (tweets) from a list in bulk? If I expect a large number of statuses, I assume each response page holds up to 200. (This assumption comes from my recollection that LinqToTwitter caps statuses at 200 but Twitter caps them at 3200, or was it 3500?) In any case, is there a decent example of paging through a large number of statuses to retrieve up to the maximum that Twitter allows for an authenticated client application, given Twitter's limit of 350 requests per hour?
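Taking the figures in this question at face value (up to 200 statuses per page, 3200 in total, 350 requests per hour; none of them verified here), the request budget is just ceiling division. A minimal sketch with a hypothetical helper:

```csharp
using System;

// Back-of-the-envelope request budget for paging through a timeline.
// The per-page, total, and per-window numbers are inputs, not values checked against Twitter.
public static class RequestBudget
{
    // Requests needed to fetch totalTweets when each request returns up to perPage.
    public static int RequestsNeeded(int totalTweets, int perPage)
    {
        if (totalTweets < 0) throw new ArgumentOutOfRangeException("totalTweets");
        if (perPage < 1) throw new ArgumentOutOfRangeException("perPage");
        return (totalTweets + perPage - 1) / perPage; // ceiling division
    }

    // True if the whole pull fits inside one rate-limit window.
    public static bool FitsInWindow(int totalTweets, int perPage, int requestsPerWindow)
    {
        return RequestsNeeded(totalTweets, perPage) <= requestsPerWindow;
    }
}
```

`RequestsNeeded(3200, 200)` is 16, a small fraction of 350 per hour, provided every page comes back full; short pages mean more requests.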

Coordinator
May 14, 2012 at 2:35 AM

Hi Dean,

You should use cursors.  Here's an example:

        private static void ShowFollowersWithCursorDemo(TwitterContext twitterCtx)
        {
            int pageNumber = 1;

            // "-1" means to begin on the first page
            string nextCursor = "-1";

            // cursor will be "0" when no more pages
            // notice that I'm checking for null/empty - don't trust data
            while (!string.IsNullOrEmpty(nextCursor) && nextCursor != "0")
            {
                var followers =
                    (from follower in twitterCtx.SocialGraph
                     where follower.Type == SocialGraphType.Followers &&
                           follower.ID == "15411837" &&
                           follower.Cursor == nextCursor // <-- set this to use cursors
                     select follower)
                     .FirstOrDefault();

                Console.WriteLine(
                    "Page #" + pageNumber + " has " + followers.IDs.Count + " IDs.");

                // use the cursor for the next page
                // this is not a page number, but a marker (cursor)
                // to tell Twitter which page to return
                nextCursor = followers.CursorMovement.Next;
                pageNumber++;
            }
        }

This example uses SocialGraph, but the cursor implementation is relatively consistent across all APIs that support them.
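The same loop can be factored into a reusable helper. This is a hedged sketch: `fetchPage` stands in for whichever query returns one page plus the next cursor, and `CursorPage<T>` is an illustrative type, not a LINQ to Twitter class. It also caps the number of pages so a misbehaving cursor cannot loop forever.

```csharp
using System;
using System.Collections.Generic;

// Illustrative page type: the items from one request plus the cursor for the next.
public sealed class CursorPage<T>
{
    public CursorPage(IList<T> items, string nextCursor) { Items = items; NextCursor = nextCursor; }
    public IList<T> Items { get; private set; }
    public string NextCursor { get; private set; }
}

public static class CursorPaging
{
    // Start at "-1", stop when the cursor is "0" or missing, or when maxPages is reached.
    public static List<T> CollectAll<T>(Func<string, CursorPage<T>> fetchPage, int maxPages)
    {
        var all = new List<T>();
        string cursor = "-1";          // "-1" asks for the first page
        int pages = 0;

        while (!string.IsNullOrEmpty(cursor) && cursor != "0" && pages < maxPages)
        {
            CursorPage<T> page = fetchPage(cursor);
            if (page == null) break;   // don't trust the data
            if (page.Items != null) all.AddRange(page.Items);
            cursor = page.NextCursor;  // a marker, not a page number
            pages++;
        }
        return all;
    }
}
```

If an endpoint never returns a cursor, `NextCursor` comes back empty and the helper stops after the first page.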

Joe

May 14, 2012 at 10:10 PM
Edited May 18, 2012 at 9:46 PM

I implemented pulling bulk tweets similar to your example:

public static List<Status> GetTwitterListStatuses(TwitterContext twitterCtx, string slug, string ownerScreenName, UInt64 sinceID = minSinceID, int batchTweets = maxtweetsPerPage, int maxTweets = maxTweetsPerRequest)
{
    if (sinceID < minSinceID) throw new ArgumentOutOfRangeException("sinceID");
    if (batchTweets > maxtweetsPerPage) throw new ArgumentOutOfRangeException("batchTweets");
    if (maxTweets > maxTweetsPerRequest) throw new ArgumentOutOfRangeException("maxTweets");

    var statuses  = new List<Status>();
    try
    {
        string nextCursor = "-1";
        while (!string.IsNullOrEmpty(nextCursor) && nextCursor != "0")
        {
            List statusList = (twitterCtx.List.Where(list => list.Type == ListType.Statuses &&
                                                                list.OwnerScreenName == ownerScreenName &&
                                                                list.Slug == slug &&
                                                                list.SinceID == sinceID &&
                                                                list.Count == maxTweets &&
                                                                list.Cursor == nextCursor
                                                        )).First();

            if (statusList == null)
                break;

            statuses.AddRange(statusList.Statuses);
            nextCursor = statusList.CursorMovement.Next;
        }
    }
    catch (TwitterQueryException ex)
    {
        ex.LogError();
    }

    return statuses;
}


I get a different number of tweets each time I run this with the same sinceID, and some runs produce a list with fewer statuses than a previous run: for instance, three runs gave me 728, 738, and 720 tweets. I also noticed that I get all the tweets in one pull request; when I track it in the debugger, the while loop never iterates more than once.

What could be causing this? Does anyone see anything wrong with the code?

Yes, I know that I'm not using the batchTweets parameter passed into the method. I had tried to use it as a mechanism to pull 200 tweets at a time, up to the maximum of 3200, but I couldn't get it working. Sometimes I would get TwitterQueryExceptions with different responses: one time it would be a slew of XML information, another time a Bad Request error. Heck, I even got the message that I had reached my maximum of 150 requests per client, even though I had successfully used DoSingleUserAuthorization without any problems getting authenticated.

Thanks

Dean

May 20, 2012 at 5:45 PM

Well, it looks like I am going to have to dump my use of Linqtotwitter. No one has been able to help me do something as simple as pulling bulk tweets from a list. I've perused all the documentation and source code that I could find, and there is not one useful example of grabbing all the tweets in a list in bulk. I've spent more time than I should trying to make Linqtotwitter do something this simple, and the example code I provided has gotten no response. I take it that no one is really using Linqtotwitter this way. I thought using a cursor would get me what I needed, but it got me nowhere: the cursor never moves, and the while loop executes only once. If I specify an extremely large count for the number of tweets to pull, I get mixed results. Sometimes I get a TwitterQueryException; other times I get Bad Request errors; sometimes I get statuses, but the number varies. One would expect that if I request 3200 tweets and only 750 exist, I would get all 750. Nope. Sometimes I get 720 or 738. With everything hidden from me, I cannot debug what is actually happening. I had big hopes of bypassing the large learning curve of the Twitter API by using Linqtotwitter. However, it appears that the simple task of pulling all the tweets from a certain tweet ID up to the current moment is beyond its capabilities. If such a capability does exist in Linqtotwitter, it certainly is not documented where it can be easily found.

Coordinator
May 20, 2012 at 6:51 PM

Hi Dean,

I was looking at this as your post arrived.  Between normal work, the Windows 8 release, a certificate bug that's plaguing Windows Phone, doing code-camp presentations this weekend, and whatever else happens, free time has been rare.  I totally agree that this should not be so hard to do and that documentation and examples in this area should be improved.  As I tweeted a couple days ago, I'm still going to look into this.

Joe

May 20, 2012 at 8:05 PM

Joe,

Thanks for your reply. I appreciate your diligence. I understand about being swamped. As I dig further into Linqtotwitter I am impressed by all that you've done. Being a big fan of LINQ was what drew me to the project in the first place.

Currently, I'm in a bit of a panic mode because of my work tasks. All I am trying to do is authenticate to twitter via SingleUserAuthorizer and access a twitter list from our company owned account. I then need to pull all the tweets from that list since the tweet Id of the last time I attempted to pull them. Initially, I am hard-coding a tweet id value for my sinceID. Once the "pump is primed" so to speak, I will obtain the sinceID from our database. Basically, I only need to create a simple console tweet dumper to pull tweets from our twitter list and dump them into our database. The console app will be run periodically to grab new tweets and shove them into our DB. By doing this, our web application won't have to deal with interacting with Twitter and can just pull content from our local DB.
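The bookkeeping described here (prime the pump with a hard-coded ID, then read sinceID from the database on later runs) comes down to remembering the highest status ID stored. A minimal sketch, with the database read/write left out and the helper name hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for a periodic tweet dumper: the next run's sinceID is the
// highest status ID stored by this run. IDs are strings here, so they are parsed numerically.
public static class SinceIdTracker
{
    public static ulong NextSinceId(ulong currentSinceId, IEnumerable<string> fetchedStatusIds)
    {
        ulong next = currentSinceId;
        foreach (string id in fetchedStatusIds)
        {
            ulong value = ulong.Parse(id, CultureInfo.InvariantCulture);
            if (value > next) next = value;
        }
        // If nothing new arrived (or the run failed), keep the old value so the
        // next scheduled run retries the same window.
        return next;
    }
}
```

Saving the returned value only after the statuses are safely in the database keeps a failed run from skipping tweets.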

I would love to get past this bump on pulling tweets from a list. I do apologize for sounding frustrated. I've got work tasks piling up on me and this one is my current sticking point. It's frustrating because I feel like I'm on the verge of figuring it out but am not getting traction. I am sure there is a simple solution to the problem. If I were more knowledgeable of the inner workings of Linqtotwitter, I would probably figure out the solution. I tried pulling the source code from CodePlex but the folders show up empty. There's no code to peruse other than the demos which happen to not provide much more help in this area. Unfortunately, I can only glean so much from the Object Browser in VS2010. The rest I've learned from interacting with VS debugger.

I just pulled down the recent beta zips, hoping there might be more help available. Unfortunately, the new DLL broke my code; apparently there were some minor changes. I cleaned up most of my problems from that. I just need to know how Friend and Follower map to Contributor and Contributee. It's not apparent to me which is which.

Anyway, I would like to help contribute to Linqtotwitter as soon as I get past my learning curve and this current stuck task. I'm delving into parallel computing with the TPL and am using it in my current task wherever possible to increase performance on our server. I'm not sure whether you are using PLINQ in Linqtotwitter, but if not, it might be something to consider.

Here's an update of the current method I'm using that is giving me headaches:

public static List<Status> GetTwitterListStatuses(TwitterContext twitterCtx, string slug, string ownerScreenName, ulong sinceID = minSinceID, int batchTweets = TwitterLists.maxtweetsPerPage, int maxTweets = TwitterLists.maxTweetsPerRequest)
{
    if (sinceID < TwitterLists.minSinceID) throw new ArgumentOutOfRangeException("sinceID");
    if (batchTweets > TwitterLists.maxtweetsPerPage) throw new ArgumentOutOfRangeException("batchTweets");
    if (maxTweets > TwitterLists.maxTweetsPerRequest) throw new ArgumentOutOfRangeException("maxTweets");

    var statuses  = new List<Status>();
    try
    {
        int pageNumber = 1;
        string nextCursor = "-1";
        while (!string.IsNullOrEmpty(nextCursor) && nextCursor != "0")
        {
            List statusList = (twitterCtx.List.Where(list => list.Type == ListType.Statuses 
                                                                && list.OwnerScreenName == ownerScreenName 
                                                                && list.Slug == slug 
                                                                && list.SinceID == sinceID 
                                                                //&& list.Page == pageNumber 
                                                                && list.Count == maxTweets 
                                                                && list.Cursor == nextCursor 
                                                        )).First();

            if (statusList == null)
                break;

            statuses.AddRange(statusList.Statuses);
            pageNumber++;
            nextCursor = statusList.CursorMovement.Next;
        }
    }
    catch (TwitterQueryException ex)
    {
        ex.LogError();
    }

    return statuses;
}

I thought the cursor was used to pull tweets in page chunks determined by the list.Count and list.Page properties. I've tried setting list.Count to 200 and specifying the page by assigning pageNumber to list.Page. This pulls some number of tweets (fewer than 200, about 179), but it seems to keep pulling the same set of tweets, even when I jump back to set nextCursor in the loop via the debugger. When nextCursor is updated with nextCursor = statusList.CursorMovement.Next;, it always ends up as an empty string (""), which terminates the while loop.

The most luck I've had with pulling bulk tweets is by specifying a large value such as 3200 for list.Count. However, when I do that, I get varying success. As I've mentioned, sometimes I get TwitterQueryExceptions and other times I successfully get tweets. When I do get tweets, though, I get a different number each time: I can pull 738 one time and 728 the next. I could understand if the numbers increased slightly due to new or recent tweets, but with the same sinceID I wouldn't expect a later pull to return fewer tweets than an earlier run. This part was baffling. I can expect Twitter to have occasional network glitches that cause bad requests, resulting in a TwitterQueryException being thrown. However, our scheduled runs of the console app would handle this, because an occasional glitch would result in no update to the latest tweet ID, and the next run would just pick up where it left off.

The calling code is this:

            Console.WriteLine("Calling GetTwitterListStatuses slug={0}, ownerScreenName={1} ... ", slug, ownerScreenName);
            var statuses = TwitterLists.GetTwitterListStatuses(TwitterCtx, slug, ownerScreenName, sinceID, TwitterLists.maxtweetsPerPage, maxTweets);
            statuses.ForEach(item => item.Display());

            //Convert statuses to Tweets
            List<Tweet> tweets = statuses.ConvertAll(status => status.ToTweet());
            return tweets;

My code uses a few extension methods:

public static Tweet ToTweet(this LinqToTwitter.Status status);
public static void Display(this LinqToTwitter.Status status);
public static void Display(this List twitterList);
public static string AsString(this TwitterQueryException ex);
public static void Display(this TwitterQueryException ex);
public static void LogError(this TwitterQueryException ex);

The bulk of the issue really comes down to getting this method to work.

public static List<Status> GetTwitterListStatuses(TwitterContext twitterCtx, string slug, string ownerScreenName, ulong sinceID = minSinceID, int batchTweets = TwitterLists.maxtweetsPerPage, int maxTweets = TwitterLists.maxTweetsPerRequest);

If there is anything you can do to help me push past this bump let me know. I think whatever solution is determined will ultimately benefit all Linqtotwitter users. 

Thanks

Dean

Coordinator
May 20, 2012 at 8:41 PM

Hi Dean,

Here's the problem: ListType.Statuses doesn't use cursors. Some of the other ListTypes do, but this one doesn't. APIs have different ways to page, and I didn't catch the difference on this one; I apologize. Instead, you can use SinceID and MaxID to page through results. Here's an example of the proper way to page through the results:

        /// <summary>
        /// Gets a list of statuses for specified list
        /// </summary>
        /// <param name="twitterCtx">TwitterContext</param>
        private static void GetListStatusesDemo(TwitterContext twitterCtx)
        {
            int maxStatuses = 30;
            int lastStatusCount = 0;
            ulong sinceID = 204251866668871681; // last tweet processed on previous query
            ulong maxID;
            int count = 10;
            var statusList = new List<Status>();

            // only count
            var listResponse =
                (from list in twitterCtx.List
                 where list.Type == ListType.Statuses &&
                       list.OwnerScreenName == "JoeMayo" &&
                       list.Slug == "dotnettwittterdevs" &&
                       list.IncludeRetweets == true &&
                       list.Count == count
                 select list)
                .First();

            List<Status> newStatuses = listResponse.Statuses;
            maxID = newStatuses.Min(status => ulong.Parse(status.StatusID)) - 1; // first tweet processed on current query
            statusList.AddRange(newStatuses);

            do
            {
                // now add sinceID and maxID
                listResponse =
                    (from list in twitterCtx.List
                     where list.Type == ListType.Statuses &&
                           list.OwnerScreenName == "JoeMayo" &&
                           list.Slug == "dotnettwittterdevs" &&
                           list.IncludeRetweets == true &&
                           list.Count == count &&
                           list.SinceID == sinceID &&
                           list.MaxID == maxID
                     select list)
                    .First();

                newStatuses = listResponse.Statuses;
                maxID = newStatuses.Min(status => ulong.Parse(status.StatusID)) - 1; // first tweet processed on current query
                statusList.AddRange(newStatuses);

                lastStatusCount = newStatuses.Count;
            }
            while (lastStatusCount != 0 && statusList.Count < maxStatuses);

            for (int i = 0; i < statusList.Count; i++)
            {
                Status status = statusList[i];

                Console.WriteLine("{0, 4}. [{1}] User: {2}\nStatus: {3}",
                    i + 1, status.StatusID, status.User.Name, status.Text);
            }
        }

A quick overview of key points:

  1. Set SinceID to the last ID processed. If it has never been set, use an ID low enough that you get all the tweets you need.
  2. Perform your first query with only Count; don't include SinceID or MaxID. You might experiment with including MaxID to avoid pulling duplicates when the number of new statuses in the list since the last query is less than Count.
  3. If you don't specify true for IncludeRetweets, you'll receive Count or fewer statuses; with true, you'll receive Count statuses.

For reference, here are the technical specs for the lists/statuses endpoint:

https://dev.twitter.com/docs/api/1/get/lists/statuses

Here's some guidance that explains Twitter paging in more detail:

https://dev.twitter.com/docs/working-with-timelines

I apologize for accidentally pointing you in the wrong direction.  Hopefully, this will help.

Joe

May 21, 2012 at 1:56 AM
Edited May 21, 2012 at 2:09 AM

Joe

My sincerest thanks for your last reply. It was just what I needed to push me past my block and get me to doing batch status pulls from my list. It took me a little bit to understand how to effectively use the SinceID and MaxID values with the list. Each iteration of the loop ends up getting older and older statuses until I've reached the SinceID of the status from my last run.

Since I upgraded to the latest Beta of Linqtotwitter, I found that IncludeRetweets was deprecated and is effectively set to true. I ended up removing it from my list pull, and it didn't seem to have any adverse effect on what I wanted. I also added some conditional parameters to prevent duplicates from being pulled into my List<Status>. Finally, I refactored the code into a new method that encapsulates the duplicated parts of the loop and obtains the new MaxID from the latest batch of statuses.

Here's what I ended up with in case you or others may find it a useful example.

public class TwitterLists
{
    public const ulong minSinceID = 193818258279895040UL;
    public const int maxTweetsPerRequest = 3200;
    public const int maxtweetsPerBatch = 200;
}

/// <summary>
/// Gets a list of statuses for specified list
/// </summary>
/// <param name="twitterCtx">TwitterContext</param>
/// <param name="slug">The slug.</param>
/// <param name="ownerScreenName">ScreenName of the list owner.</param>
/// <param name="sinceID">The since ID.</param>
/// <param name="batchTweetCount">The # tweets per batch pull request.</param>
/// <param name="maxTweetCount">The maximum # of tweets to pull.</param>
/// <returns></returns>
public static List<Status> GetTwitterListStatuses(TwitterContext twitterCtx, string slug, string ownerScreenName, ulong sinceID = minSinceID, int batchTweetCount = TwitterLists.maxtweetsPerBatch, int maxTweetCount = TwitterLists.maxTweetsPerRequest)
{
    if (twitterCtx == null) throw new ArgumentNullException("twitterCtx");
    if (String.IsNullOrEmpty(slug)) throw new ArgumentNullException("slug");
    if (String.IsNullOrEmpty(ownerScreenName)) throw new ArgumentNullException("ownerScreenName");
    if (sinceID < TwitterLists.minSinceID) throw new ArgumentOutOfRangeException("sinceID");
    if (batchTweetCount > TwitterLists.maxtweetsPerBatch || batchTweetCount < 1) throw new ArgumentOutOfRangeException("batchTweetCount");
    if (maxTweetCount > TwitterLists.maxTweetsPerRequest || maxTweetCount < 1) throw new ArgumentOutOfRangeException("maxTweetCount");

    var statuses  = new List<Status>();
    //var statusList = new List<Status>();
    try
    {
        int count = batchTweetCount;
        List<Status> batchStatuses;

        List listResponse = (twitterCtx.List.Where(list => list.Type == ListType.Statuses 
                                                            && list.OwnerScreenName == ownerScreenName 
                                                            && list.Slug == slug 
                                                            && list.Count == count
                                                    )).First();

        // newMaxID is one less than the oldest status ID in this batch (or sinceID if the batch was empty)
        var newMaxID = GetBatchedStatuses(statuses, listResponse, sinceID, out batchStatuses);

        do
        {
            // now use sinceID and maxID
            listResponse = (twitterCtx.List.Where(list => list.Type == ListType.Statuses
                                                            && list.OwnerScreenName == ownerScreenName
                                                            && list.Slug == slug
                                                            && list.Count == count
                                                            && list.SinceID == sinceID
                                                            && list.MaxID == newMaxID
                                                    )).First();

            newMaxID = GetBatchedStatuses(statuses, listResponse, sinceID, out batchStatuses);
        }
        while (batchStatuses != null  && batchStatuses.Count > 0 
                && statuses.Count < maxTweetCount  && newMaxID > sinceID);

    }
    catch (TwitterQueryException ex)
    {
        ex.LogError();
    }
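    // De-duplicate by StatusID: Distinct() on Status objects depends on how Status defines equality.
    return statuses.GroupBy(s => s.StatusID).Select(g => g.First()).ToList();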
}

private static ulong GetBatchedStatuses(List<Status> statuses, List listResponse, ulong sinceID, out List<Status> batchStatuses)
{
    if (sinceID < TwitterLists.minSinceID) throw new ArgumentOutOfRangeException("sinceID");
    if (statuses == null) throw new ArgumentNullException("statuses");
    if (listResponse == null) throw new ArgumentNullException("listResponse");
    ulong newMaxID = sinceID;
           
    if ((batchStatuses = listResponse.Statuses) != null && batchStatuses.Count > 0)
    {  
        newMaxID = batchStatuses.Min(status => ulong.Parse(status.StatusID)) - 1;
        // Compare IDs numerically (ordinal string comparison misorders IDs of different lengths);
        // since_id is exclusive, so keep only statuses strictly newer than sinceID.
        var range = batchStatuses.Where(s => ulong.Parse(s.StatusID) > sinceID);
        statuses.AddRange(range);
    }
    return newMaxID;
}
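Since status IDs come back as strings, filtering and de-duplicating them deserves care: an ordinal string comparison only orders numeric IDs correctly when both have the same number of digits ("99" sorts after "100"), and `Distinct()` on status objects depends on how the status type defines equality. A hedged sketch that works on the parsed numeric ID instead; `StatusIds` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class StatusIds
{
    // Digits only; throws FormatException for anything else.
    public static ulong ToNumber(string statusId)
    {
        return ulong.Parse(statusId, NumberStyles.None, CultureInfo.InvariantCulture);
    }

    // since_id semantics: strictly newer than sinceId.
    public static bool IsNewerThan(string statusId, ulong sinceId)
    {
        return ToNumber(statusId) > sinceId;
    }

    // Keeps the first item seen for each numeric ID, preserving the original order.
    public static List<T> DistinctById<T>(IEnumerable<T> items, Func<T, string> idOf)
    {
        var seen = new HashSet<ulong>();
        var result = new List<T>();
        foreach (T item in items)
        {
            if (seen.Add(ToNumber(idOf(item))))
                result.Add(item);
        }
        return result;
    }
}
```

With LINQ to Twitter statuses, the selector would be something like `s => s.StatusID`.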

Thanks again for your help.

Dean

Nov 14, 2012 at 12:04 AM
Edited Nov 14, 2012 at 1:03 AM

Hi Dean & Joe,

I have read both of your discussions, and I have to admit it is a wonderful piece of work. It is exactly what I am looking for. I am writing a console application to retrieve bulk tweets from a public page's timeline. I tried Joe's code from the other source, which uses since_id and max_id. I tried setting since_id to the ID of the 400th tweet and maxStatuses to 3200, but it does not work for me. I always get this exception: Sequence contains no elements

 void UserStatusQueryDemo(TwitterContext twitterCtx)
        {
            int maxStatuses = 3200;
            int lastStatusCount = 0;
            ulong sinceID = 224798808125607937; // last tweet processed on previous query
            ulong maxID;
            int count = 10;
            var statusList = new List<Status>();

            // only count
            var userStatusResponse =
                (from tweet in twitterCtx.Status
                 where tweet.Type == StatusType.User &&
                       tweet.ScreenName == "abc" &&
                       tweet.Count == count
                 select tweet)
                .ToList();

            maxID = userStatusResponse.Min(status => ulong.Parse(status.StatusID)) - 1; // first tweet processed on current query
            statusList.AddRange(userStatusResponse);

            do
            {
                // now add sinceID and maxID
                userStatusResponse =
                    (from tweet in twitterCtx.Status
                     where tweet.Type == StatusType.User &&
                           tweet.ScreenName == "abc" &&
                           tweet.Count == count &&
                           tweet.SinceID == sinceID &&
                           tweet.MaxID == maxID
                     select tweet)
                    .ToList();

                maxID = userStatusResponse.Min(status => ulong.Parse(status.StatusID)) - 1; // first tweet processed on current query
                statusList.AddRange(userStatusResponse);

                lastStatusCount = userStatusResponse.Count;
            }
            while (lastStatusCount != 0 && statusList.Count < maxStatuses);

            for (int i = 0; i < statusList.Count; i++)
            {
                Status status = statusList[i];

                Console.WriteLine("{0, 4}. [{1}] User: {2}\nStatus: {3}",
                    i + 1, status.StatusID, status.User.Name, status.Text);
            }
        }

This returns posts 301 to 400, but what I want is all the tweets. I have found that the page currently has an estimated 1900 tweets. I would like to retrieve all of them (let's say the latest 3200 tweets) and write them to a file. How can I do that? I do not know the ID of the last post.

I'd appreciate any help.

Thanks,

10e5x
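"Sequence contains no elements" is the InvalidOperationException that LINQ's `First()` and `Min()` throw on an empty sequence. In paging code like the above, that happens when a query returns no statuses at all, or when the loop reaches the end of the timeline and `Min()` is called on the final, empty batch. A minimal guard, with a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SafePaging
{
    // Returns the max_id for the next request, or null when paging should stop.
    public static ulong? NextMaxId(IList<string> batchStatusIds)
    {
        if (batchStatusIds == null || batchStatusIds.Count == 0)
            return null; // nothing came back: stop instead of calling Min() on nothing

        ulong oldest = batchStatusIds
            .Select(id => ulong.Parse(id, CultureInfo.InvariantCulture))
            .Min();
        if (oldest == 0) return null; // nothing is older than ID 0 (and 0 - 1 would wrap around)
        return oldest - 1;
    }
}
```

A loop would call this after each query and exit when it returns null, rather than relying on a count check that runs only after `Min()`.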

Coordinator
Nov 14, 2012 at 2:09 AM

Hi,

Set your SinceID to at least 1 lower than the first tweet you want to get back. Count defaults to 20, but since you want many more tweets, set it to the maximum, which is 200. Getting this many tweets, you're likely to run into rate limits, which you can check by querying HelpType.RateLimits from the Help entity. I believe the rate limit is 15 queries every 15 minutes, which means you'll have to time your queries so that you don't exceed it. Here are the Twitter docs on rate limiting:

https://dev.twitter.com/docs/rate-limiting/1.1

@JoeMayo
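If the limit really is 15 requests per 15-minute window (the estimate above; querying HelpType.RateLimits gives the actual figure), a conservative approach is to space requests evenly. A hedged sketch; `RateLimitPlanner` is a hypothetical helper:

```csharp
using System;

public static class RateLimitPlanner
{
    // Minimum gap between requests that keeps requestsPerWindow within the window.
    public static TimeSpan MinimumSpacing(int requestsPerWindow, TimeSpan window)
    {
        if (requestsPerWindow < 1) throw new ArgumentOutOfRangeException("requestsPerWindow");
        return TimeSpan.FromTicks(window.Ticks / requestsPerWindow);
    }

    // Lower bound on wall-clock time for the whole pull with even spacing
    // (the first request goes out immediately).
    public static TimeSpan EstimatedDuration(int requests, int requestsPerWindow, TimeSpan window)
    {
        if (requests <= 1) return TimeSpan.Zero;
        return TimeSpan.FromTicks(MinimumSpacing(requestsPerWindow, window).Ticks * (requests - 1));
    }
}
```

With those numbers, `MinimumSpacing(15, TimeSpan.FromMinutes(15))` is one minute, so sixteen 200-status requests (3200 tweets) take at least 15 minutes; in a console app, `Thread.Sleep(spacing)` between queries is one way to apply it.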

Nov 14, 2012 at 2:14 AM
Edited Nov 14, 2012 at 2:51 AM

Hi joe,

Thanks for your reply; you are being very helpful. By "first tweet", do you mean the latest or the oldest? Say my latest tweet ID is 123 and the oldest, which I don't know, is 345. What should I set my SinceID to? I have tried numerous SinceID values, but most of them return the result "Sequence contains no elements". Is that because of the rate limit?

I asked something similar in one of your other discussions:

What should I fill in for these variables? int maxStatuses, int lastStatusCount, ulong sinceID, int count

I would appreciate it if you could explain these four variables to me.

Coordinator
Nov 14, 2012 at 11:56 AM

It would help if you studied this, which is Twitter's guidance on how to page through queries on various timelines:

Working with Timelines

Joe

Nov 14, 2012 at 3:12 PM

Hi joe,
Thanks for your help. I read through the links three times. I won't pretend I understand; in fact, I don't understand how it works, though I got the rough idea. MaxID is the first tweet the request will process, which is the latest tweet, and SinceID is the last tweet of the request, which is the oldest, the ID the request begins processing from thereafter. With these two figured out, I tried many times changing my count, my int maxStatuses, and my SinceID.

Can you please guide me further? No matter what I change, it still returns "Sequence contains no elements", or sometimes it returns tweets 303 to 400, and I don't know why :(

Please help me. If you don't mind, point out what I should change and I will learn from there. Is it possible to get all of the latest 1900 or 3200 tweets in one request (meaning run the program once)? Or can I write the LINQ output to a Notepad file for easier reference?

Thanks,

10e5x

Coordinator
Nov 15, 2012 at 3:50 AM

Think about how Twitter works. The first tweet probably had an ID of 0, the next tweet 1, then 2, and so on. Every time someone tweets, the next number in the sequence becomes the new tweet ID (or status_id). Earlier tweet IDs are smaller numbers than later tweet IDs.

Now think about what happens when you query a timeline without any parameters. The query grabs the latest tweet available plus 19 more, because Count defaults to 20. So you have twenty tweets that were current at the time you made the query. However, if someone has a busy timeline, the next query will pick up the new tweets. What if there were two new tweets? Then you get those two new tweets, plus 18 more that you already queried. Now you have duplicate tweets, which is a waste of time.

You can avoid reading tweets that you've already received by using SinceID. After the first query, set SinceID to the highest (most recent) tweet ID that you received. Save SinceID in a database and read it back from the database for your next query. By saving SinceID, you will always know the ID of the last tweet you already read. Another way to know SinceID is to query your database of tweets you've already received and take the Max from that query. The technique you use isn't as important as always knowing SinceID before performing a query, so you don't read tweets that you've already queried. The very first time (ever) that you query, set SinceID to one less than the lowest numbered tweet that you want.
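A minimal sketch of keeping SinceID between runs, assuming a plain text file instead of a database. SinceIdStore and the file path are hypothetical, not part of LINQ to Twitter. The next SinceID is simply the largest ID received, which is the "take the Max" approach described above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical store: keeps the highest tweet ID read so far in a text file.
// A database column would work the same way.
public static class SinceIdStore
{
    // Returns the saved SinceID, or `initial` if nothing has been saved yet.
    public static ulong Load(string path, ulong initial)
    {
        if (!File.Exists(path)) return initial;
        return ulong.Parse(File.ReadAllText(path).Trim(), CultureInfo.InvariantCulture);
    }

    public static void Save(string path, ulong sinceId) =>
        File.WriteAllText(path, sinceId.ToString(CultureInfo.InvariantCulture));

    // Next SinceID: the newest ID received, never lower than the current one.
    public static ulong Next(ulong current, IEnumerable<ulong> receivedIds) =>
        receivedIds.Aggregate(current, (max, id) => Math.Max(max, id));
}
```

An empty batch leaves SinceID unchanged, so a run that finds no new tweets does not move it backwards.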

That said, you might not be able to get that many tweets from the timeline because Twitter doesn't provide the entire history of tweets. So, the first tweet you get on the first query might not be the tweet with a tweet ID matching SinceID - it might be later.

That's how you manage SinceID. However, the first time you query to get the entire backlog of tweets, you'll also need to use MaxID. On this very first run you'll be grabbing blocks of tweets, starting from the most recent, until you reach the oldest tweet you want. That oldest point is defined by SinceID because, as described above, SinceID marks the earliest tweet you want on your very first run. The size of each block is defined by Count. Count defaults to 20, but in this case you want the maximum, which is 200, because you're getting as many tweets as you can.

For example, suppose you want to grab the last 500 tweets. Your first query only uses Count, to get as many tweets as possible. After you get those tweets, you don't want to read them again; you want the next older block. The way to make sure you don't get the newer tweets you've already read is to set MaxID, which says "Don't give me tweets newer than this ID". So the first query gets tweets 500 down to 301, because Count is 200.

You don't want 500 through 301 again, because you just read them. To say so, set MaxID to one lower than the oldest tweet you received, which is 300. You already have 500 through 301, but you don't have 300 yet. Count stays at 200 because you still want the maximum number of tweets on subsequent queries. Since this is the first time you're getting the list, SinceID marks the lowest tweet you're looking for; in this case you would set SinceID to 0. On the second query, you set MaxID to 300, Count to 200, and SinceID to 0. This returns 200 tweets, numbered 300 through 101.

Since you don't have all the tweets you're looking for yet, you need another iteration. Adjust MaxID to the next older ID, which is 100 (101 - 1). Again, leave Count at 200, or do the math so you only get the exact number of tweets you're looking for: the remaining number of tweets is 100 (MaxID - SinceID), so you can set Count to 100. Leaving Count at 200 doesn't matter, because SinceID keeps you from getting earlier tweets. SinceID is still 0, because that marks the earliest tweet you want. With MaxID at 100, Count at 100 (or larger), and SinceID at 0, do another query. This gives you tweets 100 through 1.

Now you're done and have the entire backlog of tweets. Set SinceID to the most recent tweet from all of the iterations, which is 500 in this example, and save it for all subsequent queries. Since later tweets will be numbered 501, 502, 503, and so on, setting SinceID to 500 makes sure that you don't read the old tweets again.
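The walkthrough above can be checked against an in-memory stand-in for a timeline holding tweet IDs 1 through 500. FakeTimeline and FetchAll are hypothetical helpers, not LINQ to Twitter APIs; FakeTimeline assumes SinceID is exclusive and MaxID inclusive, as the example describes. Against a real timeline you would put the same Count, SinceID and MaxID values into the Status query instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Backfill
{
    // Stand-in for one timeline query: IDs newer than sinceId and no newer
    // than maxId (if given), newest first, at most `count` of them.
    public static List<ulong> FakeTimeline(IEnumerable<ulong> ids, int count, ulong sinceId, ulong? maxId) =>
        ids.Where(id => id > sinceId && (maxId == null || id <= maxId.Value))
           .OrderByDescending(id => id)
           .Take(count)
           .ToList();

    // Pages backwards from the newest tweet until nothing older than MaxID
    // (and newer than SinceID) is left. Returns the IDs and the query count.
    public static (List<ulong> All, int Queries) FetchAll(IEnumerable<ulong> timeline, ulong sinceId, int count = 200)
    {
        var all = new List<ulong>();
        ulong? maxId = null; // first query: no MaxID, just the newest `count`
        int queries = 0;
        while (true)
        {
            var page = FakeTimeline(timeline, count, sinceId, maxId);
            queries++;
            if (page.Count == 0) break;          // nothing older is left
            all.AddRange(page);
            ulong oldest = page.Min();
            if (oldest <= sinceId + 1) break;    // reached the oldest tweet we want
            maxId = oldest - 1;                  // next page: strictly older than what we have
        }
        return (all, queries);
    }
}
```

With SinceID 0 this makes the same three queries as the example (500 to 301, 300 to 101, 100 to 1). Real tweet IDs are not contiguous, so the early stop rarely fires against Twitter; the loop then ends when a query comes back empty, at the cost of one extra request.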

BTW, maxStatuses is just something I added to the demo to prevent it from getting too many tweets. You should remove it. Also, remember what I said in a previous post about rate limiting. Good luck.

@JoeMayo

Nov 15, 2012 at 5:39 AM

Thanks, Joe. I am still digesting this. I will make sure I fully understand what you are trying to teach before I try again. Maybe because of my requirements, I keep confusing myself: I need to write all the tweets to a log file, while your demo just writes lines to the command prompt.

Thank you so much for being so helpful. I hope you will be here when I run into difficulties again.

Greatly appreciated,

10e5x

Nov 15, 2012 at 6:15 AM
Edited Nov 15, 2012 at 6:18 AM

Oh, I got it! I think I really do. I know how MaxID, SinceID and Count work :) Now I need to understand your demo code. I have two questions.

As you suggested, I request the latest 200 tweets using this query:

var statusTweets =
    from tweet in twitterCtx.Status
    where tweet.Type == StatusType.User &&
          tweet.Count == 200 &&
          tweet.ScreenName == "abc"
    select tweet;

//PrintTweetsResults(statusTweets);
foreach (var tweet in statusTweets)
{
    Console.WriteLine(
        "(" + tweet.StatusID + ")" +
        "[" + tweet.User.ID + "]" +
        tweet.User.Name + ", " +
        tweet.Text + ", " +
        tweet.CreatedAt);
}

This gets me the latest 200 tweets. Now it's time to query the next batch of 200 older tweets. This time I know I need to add SinceID and MaxID as parameters. From what I understand, MaxID should be one lower than the ID of the 200th (oldest) tweet in the first batch. But how can I get it? In the log I only have the 1900th to 1701st IDs. As for SinceID, I know I should give it the oldest tweet ID, which is, let's say, 312357.
So I will use a loop: for the first query I will run the query above, and from the second query onward I will run the query below.

var statusTweets =
    from tweet in twitterCtx.Status
    where tweet.Type == StatusType.User &&
          tweet.ScreenName == "abc" &&
          tweet.Count == 200 &&
          tweet.SinceID == 312357 &&
          tweet.MaxID == maxID
    select tweet;

Am I on the right track? I have a feeling I'm wrong. If I am, I hope you can give my code some
magic touch. I will learn from it and understand it.

Coordinator
Nov 16, 2012 at 2:09 AM

Getting MaxID is where this LINQ query comes in:

           maxID = statusTweets.Min(status => ulong.Parse(status.StatusID)) - 1;

Notice that the lambda for Min pulls out the StatusID of each status. After you get the lowest number from the group of tweets you just queried, statusTweets, subtract 1.

You're correct that you do the first query exactly as you have it. Then calculate MaxID, like I did above, and iterate through the rest of the queries. Remember to re-calculate maxID after every query.
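One thing to watch: if a query returns no tweets, Min has nothing to compare, so the one-liner above throws InvalidOperationException ("Sequence contains no elements"). A slightly more defensive sketch returns null instead, which tells the paging loop to stop. It assumes StatusID is a string, as the ulong.Parse call above implies; Paging.NextMaxId is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class Paging
{
    // Computes MaxID for the next (older) page from the StatusID strings of
    // the page just received. Returns null when the page is empty, meaning
    // there is nothing older left to ask for.
    public static ulong? NextMaxId(IEnumerable<string> statusIds)
    {
        var ids = statusIds.Select(s => ulong.Parse(s, CultureInfo.InvariantCulture)).ToList();
        if (ids.Count == 0) return null;
        ulong oldest = ids.Min();
        return oldest == 0 ? (ulong?)null : oldest - 1; // avoid ulong underflow
    }
}
```

In the loop you would call it as Paging.NextMaxId(statusTweets.Select(s => s.StatusID)) after each query and stop when it returns null.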

@JoeMayo

 

Nov 16, 2012 at 2:31 AM
Hi joe,

Thanks for your help. I used what you suggested in combination with my own method. For now I get what I wanted: 1800 tweets and increasing. However, I need to format the logs to make them look nicer.
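For writing the tweets to a log file rather than the console, a sketch like this one formats each tweet as a single tab-separated line and appends it with File.AppendAllLines. TweetRow and TweetLog are hypothetical; you would fill a TweetRow from the StatusID, User.Name, Text and CreatedAt values printed in the loop above, assuming CreatedAt is a DateTime.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical record holding just the fields printed in the thread's loop.
public sealed class TweetRow
{
    public string StatusId { get; set; }
    public string UserName { get; set; }
    public string Text { get; set; }
    public DateTime CreatedAt { get; set; }
}

public static class TweetLog
{
    // One tweet per line, tab-separated, so the log opens cleanly in a spreadsheet.
    public static string Format(TweetRow t) =>
        string.Join("\t",
            t.StatusId,
            t.CreatedAt.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture),
            t.UserName,
            // Tabs or line breaks inside a tweet would break the one-line layout.
            (t.Text ?? string.Empty).Replace('\t', ' ').Replace('\r', ' ').Replace('\n', ' '));

    // Appends the formatted lines, creating the file if it does not exist.
    public static void Append(string path, IEnumerable<TweetRow> tweets) =>
        File.AppendAllLines(path, tweets.Select(t => Format(t)));
}
```

Appending rather than overwriting lets each paging iteration add its batch to the same file.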