Visualizing 15k Instagram Posts with TrelliscopeJS

Dec 8, 2016

This post shows a simple example of creating an interactive display that lets you navigate thousands of Instagram posts with just a few lines of code using TrelliscopeJS. The example comes from a hackathon in the DARPA XDATA program earlier this year.

If you missed the announcement about TrelliscopeJS, see here for more background.

Background

For the past few years I’ve been involved in the DARPA XDATA program, which has sponsored the development of TrelliscopeJS and other projects in the DeltaRho organization. XDATA held several hackathons this year. At one of them, the goal was to use several data sources, including social media, to look for evidence of violations of a ceasefire agreement in Yemen. This provided a great use case for TrelliscopeJS as a tool for quickly creating an interactive display for exploring social media posts.

Data

One of the datasets provided was metadata for public Instagram posts made in Yemen. The initial set of posts was large, but it was whittled down to posts whose caption or comments contained certain keywords, including (the Arabic equivalents of) ‘advance’, ‘al-Qaeda’, ‘artillery’, ‘attack’, ‘battalion’, ‘bomb’, ‘brigade’, ‘Brigadier General’, ‘camp’, ‘car bomb’, ‘clashes’, ‘colonel’, ‘company’, ‘Houthis’, ‘ISIS’, ‘Major General’, ‘massacre’, ‘mortar’, ‘operation’, ‘plane’, ‘regiment’, ‘Saudi’, ‘Saudi Arabia’, ‘soldier’, ‘violent’, ‘warplane’, and ‘Yemen’. After filtering to the time range of interest and to public posts containing these words, we ended up with a data frame that looks like this:

dplyr::glimpse(instadf)

Observations: 14,909
Variables: 15
$ image_id       <chr> "1175063814800865121_839721980", "11349722586818...
$ caption        <chr> "#صباح_الورد #صباح_الخير #صباح_النور #صباحكم_سعا...
$ created_time   <dttm> 2016-01-31 19:49:04, 2015-12-07 12:14:18, 2015-...
$ username       <chr> "my_blbl", "ma7fouz7akimi", "tareqmusaw", "tareq...
$ userid         <chr> "839721980", "1380532560", "333728029", "3337280...
$ lat            <dbl> 13.83330, 15.30576, 15.35472, 15.31457, 17.49170...
$ lon            <dbl> 44.68330, 44.19210, 44.20667, 44.18173, 44.13220...
$ location_name  <chr> "Sanaa, Yemen", "مدينة عدن / ツ ADEN City", "ACA...
$ likes_count    <int> 23, 23, 196, 200, 45, 300, 5, 6, 68, 546, 36, 11...
$ comments_count <int> 4, 1, 10, 9, 3, 2, 1, 1, 5, 55, 1, 0, 32, 15, 6,...
$ all_comments   <chr> "يسعد صبااااحك بكل خير | @hh_a18 وصباحك اسعد اخت...
$ image_link     <chr> "https://scontent.cdninstagram.com/t51.2885-15/s...
$ post_link      <chr> "https://www.instagram.com/p/BBOqlBVnvdh/", "htt...
$ keywords       <chr> "Yemen", "Yemen", "Yemen", "Yemen", "company", "...
$ n_keywords     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...

The variable names are pretty self-explanatory. We have data for nearly 15k posts.
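The keyword filtering described above happened before this data frame was built, and the code for it isn't shown here. As a rough illustration only, here is a base R sketch of how a keywords / n_keywords pair like the one in instadf could be derived. It assumes a tiny made-up data frame (toy_posts), English keywords in place of the Arabic equivalents, case-insensitive literal substring matching, and a comma-separated keywords string; none of these details come from the original pipeline.

```r
# Hypothetical sketch: derive keyword columns by substring matching.
# toy_posts, the English keywords, and the comma separator are
# illustrative assumptions, not the original XDATA pipeline.
keyword_list <- c("attack", "bomb", "car bomb", "Saudi", "Yemen")

toy_posts <- data.frame(
  caption = c("Sunset in Yemen", "Car bomb reported near the camp", "Lunch with friends"),
  all_comments = c("beautiful", "stay safe", "looks tasty"),
  stringsAsFactors = FALSE
)

# Search caption and comments together, ignoring case
post_text <- tolower(paste(toy_posts$caption, toy_posts$all_comments))

# For each post, find which keywords appear (fixed = TRUE: literal match, no regex)
matched <- lapply(post_text, function(txt) {
  hits <- vapply(tolower(keyword_list), grepl, logical(1), x = txt, fixed = TRUE)
  keyword_list[hits]
})

# Store matched keywords as one comma-separated string plus a count
toy_posts$keywords <- vapply(matched, paste, character(1), collapse = ",")
toy_posts$n_keywords <- lengths(matched)

# Keep only posts that matched at least one keyword
filtered <- toy_posts[toy_posts$n_keywords > 0, ]
print(filtered[, c("caption", "keywords", "n_keywords")])
```

With plain substring matching, a phrase like "car bomb" also matches "bomb", so overlapping keywords (the original list has pairs such as ‘Saudi’ and ‘Saudi Arabia’) are counted twice. Whether the real n_keywords counts behave this way is not known.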

Making a display of posts

If you are familiar with TrelliscopeJS, you might realize that this data is already in good shape for creating a Trelliscope display of Instagram posts: each row represents one post, and each variable can be used in different ways to navigate the space of posts. All we need is a variable denoting what to plot for each post. Here we have image_link, a URL pointing to the media shared with each Instagram post, which is the perfect and logical thing to “visualize” in this case.

TrelliscopeJS has a function img_panel(), which casts image_link as a panel variable so that the viewer displays the image found at each URL.

We can create a simple display with the following:

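library(trelliscopejs)
library(dplyr)

instadf %>%
  # treat each image URL as the panel to display
  mutate(image_link = img_panel(image_link)) %>%
  # most-liked posts first (sets the display's default sort order)
  arrange(-likes_count) %>%
  # 320x320 panels in a 3 x 6 grid, with default labels under each panel
  trelliscope(name = "posts", width = 320, height = 320, nrow = 3, ncol = 6,
    state = list(labels = c("caption", "post_link", "likes_count")))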

In the mutate() call we update the image_link variable to be an image panel, then sort the data in decreasing order of number of “likes” (which sets the default sort order of the display), and then pass the resulting data frame to trelliscope(). It’s as simple as that. The state argument to trelliscope() tells the viewer which labels to show under the panels by default.

Note that this runs very quickly, in under 3 seconds on my machine. This is because we do not need to generate the actual panels in this case – we are simply pointing to where the panel images reside elsewhere on the web.

The resulting display can be interacted with below, and a dedicated link to the application is available here.

Go ahead and try interacting with the display and see if you can find anything noteworthy.

Some things to try / investigate:

Filter on the keyword “bomb” and look around: do any posts appear to legitimately be talking about a bombing that happened at or around the time of posting?
Which users post the most, and what do their posts look like?
Are there commonalities across posts at specific location_names?
Use the post_link label to open the original Instagram post in a new window. There you can more easily see the full content of comments, etc., and can even use Google Translate on the page (although in my experience the translations are terrible).

Filtering based on the caption or comments is also potentially very interesting if you happen to know Arabic (I don’t).
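The viewer is the intended way to explore these questions, but a couple of them can also be checked directly on the data frame. Below is a minimal base R sketch; posts is a hypothetical toy stand-in for instadf with invented values, and it assumes that multiple keywords are stored as a comma-separated string.

```r
# Hypothetical toy stand-in for instadf; all values are made up
posts <- data.frame(
  username = c("user_a", "user_b", "user_a", "user_c", "user_a", "user_b"),
  keywords = c("Yemen", "bomb", "Saudi,Yemen", "bomb,car bomb", "Yemen", "attack"),
  likes_count = c(23L, 5L, 196L, 45L, 300L, 6L),
  stringsAsFactors = FALSE
)

# Which users post the most? Count posts per username, largest first
posts_per_user <- sort(table(posts$username), decreasing = TRUE)
print(posts_per_user)

# Posts tagged with exactly "bomb" (split the assumed comma-separated
# string, so a post tagged only "car bomb" would not match)
has_bomb <- vapply(strsplit(posts$keywords, ",", fixed = TRUE),
                   function(k) "bomb" %in% k, logical(1))
bomb_posts <- posts[has_bomb, ]

# Order the matches by likes, mirroring the display's default sort
bomb_posts <- bomb_posts[order(-bomb_posts$likes_count), ]
print(bomb_posts)
```

With dplyr, which the display code above already loads, the per-user count could instead be written as count(instadf, username, sort = TRUE).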

Notes

First of all, please note that all the data and social media used in this example are entirely public. The goal of the analyses done with these data was to pursue the peace process, not anything nefarious or creepy. Also note that while Instagram is supposed to root out offensive content, it doesn’t catch everything, and since I haven’t looked at every post, there may be some surprises.

This display is interesting in that it is larger than the other TrelliscopeJS examples I’ve shown so far, yet the application handles this many panels smoothly. Some of the variables (particularly caption and all_comments) can be very long, which makes this 15k-row dataset larger than a more traditional dataset with the same number of rows.

This is just a single plot. It is not intended to illustrate any major finding or result with respect to the hackathon goal, but like all exploratory visualizations, it provides a nice basis for further exploration and a reference when studying other aspects of the data. Many potentially interesting analyses could be done on the original data frame, producing new variables or enabling new uses of the display. For example, one might run object or context detection on the images using something like Google Vision, and then use the results in a display to visually verify the accuracy of the machine learning outputs.

Things to do

This example highlights the need for some obvious additional filter tools in TrelliscopeJS, namely date/time filters and map filters for geographic coordinates. All in due time…

Ryan Hafen

Data Scientist, Statistical Consultant
