Penetrating the Fog

Phil Long and I have an article in EDUCAUSE Review on learning analytics in education and learning. From the article:

Higher education, a field that gathers an astonishing array of data about its “customers,” has traditionally been inefficient in its data use, often operating with substantial delays in analyzing readily evident data and feedback. Evaluating student dropouts on an annual basis leaves gaping holes of delayed action and opportunities for intervention. Organizational processes—such as planning and resource allocation—often fail to utilize large amounts of data on effective learning practices, student profiles, and needed interventions.

The article is intended to be an introduction to learning analytics, rather than a detailed technical analysis. For those of you interested in the latter, I recommend the LAK12 conference in Vancouver :).

Learning and Academic Analytics

Analytics in education take on at least three distinct terms:
1. Educational data mining – this is a fairly developed community, having run conferences for over four years. They also run their own journal. From their website: “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.”
2. Learning analytics – after a successful conference this year, we are planning our second conference in Vancouver next year (call for papers is now open). This community is still developing, but interest in learning analytics is high in various government and educational reform settings. Learning analytics is defined as the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs.
3. Academic analytics – this term has been around for about a decade, based on early work by Diana Oblinger and John Campbell. As initially presented, the concept addresses a mix of administrative and learning analytics. For clarity’s sake, this concept is now closest to what is called business intelligence in corporate settings.

How are these three items related? Educational data mining has a role in both learning analytics and academic analytics. The table below gets at that relationship. I don’t see the relationship as starkly or as clearly demarcated as the table indicates. I’m trying to get at the distinction between learning analytics as focusing on activity at the learner-educator level, academic analytics as focusing on organizational efficiency, and data mining as having some role in both spaces.

Type of Analytics | Level or Object of Analysis | Who Benefits?
Learning Analytics | Course-level: social networks, conceptual development, discourse analysis, “intelligent curriculum” | Learners, faculty
Learning Analytics | Departmental: predictive modeling, patterns of success/failure | Learners, faculty
Academic Analytics | Institutional: learner profiles, performance of academics, knowledge flow | Administrators, funders, marketing
Academic Analytics | Regional (state/provincial): comparisons between systems | Funders, administrators
Academic Analytics | National and International | National governments, education authorities

Educational data mining is listed alongside these types in the table, with a role in both learning analytics and academic analytics.


An introduction to learning analytics

The presentation I delivered to ED-MEDIA this week on learning analytics:

After the presentation, an individual approached me to emphasize the limitations of analytics – particularly in the stock market (2008 and Long-Term Capital Management). Of course analytics aren’t the salvation for the problems of education. They are one (significant) approach to understanding the complex ecosystem of teaching, learning, research, and knowledge generation. Numerous ethical concerns exist. And we haven’t yet fully defined situations and contexts of use – i.e., the different settings in which different approaches or analytics models are required.

Knewton – the future of education?

During the learning analytics conference in Banff, several presenters mentioned the speed at which analytics are moving into policy level decisions in universities and schools. Malcolm Brown, from EDUCAUSE, made the statement that learning analytics “are moving faster than any of us realize”.

This rapid development is due to a variety of factors: data mining focus in the technology sector, business intelligence, increased calls for accountability of the publicly funded education sector, organizations and foundations targeting analytics in research projects (Digging into Data, Gates Foundation, EDUCAUSE), and increased entrepreneurial activity in the educational technology sector.

Taken together, these trends produce compelling pressure for change in education and learning (compelling enough that learning analytics might even survive the coming “death by hype and consultants” wave).

A few years ago I wrote a post on technology externalized knowledge and learning (TEKL), arguing that classrooms need to give way to wearable, adaptive, personal, ubiquitous learning systems composed of agents that track our activity and automatically provide information based on context, knowledge needs, and our previous learning activity. Earlier this year, I wrote a short article on the role of learning analytics as an alternative to traditional curriculum design and teaching methods. The core message of both TEKL and learning analytics: the traditional model of education is positioned for dramatic transformation – a transformation that will invalidate many of our current assumptions of classrooms, learning content, teaching, and schools/universities.

While planning the Learning Analytics 2011 conference, the steering committee spent quite a bit of time debating the question “what are learning analytics?”. We didn’t come to a full agreement, but settled on the following broad definition: “learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimising learning and the environments in which it occurs”. This is an effective definition for the conference, but it only alludes to an important component: content. Learning analytics should lead to alteration of course content. This is hardly a new idea – in the late 1990s we were talking about content personalization under the umbrella of computer-based training. What has changed, however, is the growth of semantic content, ontologies (knowledge domain structures), educational data mining models/algorithms, and a greater amount of digitally-captured learner activity.

The core concept of learning analytics has existed for several decades in education theory, computer science, human-computer interaction, and what’s now called web sciences. University research centres and labs, however, often make poor use of their intellectual resources for commercial purposes or broad application.

Over the last week, I’ve been exploring various companies to profile on FutureLearn. I kept encountering companies in the test prep (SAT, GRE, GMAT, etc) space. The lower-hanging fruit of innovation is to take an existing system (such as test prep) and make it more effective/efficient. After all, it’s easier to get schools/universities/individuals to pay for a product that will help students do well on existing test models than it is to get someone to buy something vague like learning analytics.

One company kept surfacing: Knewton. Knewton is often cited as an innovative test prep company using advanced algorithms to provide personalized learning. They collect up to 150,000 data points on a single student studying for one afternoon. This data is then used to adapt the learning experience to each student. As far as I can tell, they are doing quite well with this test prep model. I’m far more interested, however, in their adaptive learning platform.

Educational technology and content providers have an uneasy relationship. Large companies like Pearson provide their own learning platform, but also offer content for LMS providers like Blackboard and D2L. LMS providers, recognizing that Moodle and other open systems have commodified their systems, have started to adopt a tightly-integrated ecosystem of services in order to maintain value points for customers. In fulfilling this vision, Blackboard has acquired synchronous tools like Wimba and Elluminate as well as analytics platforms like iStrategy. Content providers, like Pearson and Thomson Reuters, are actively trying to reposition as learning or intelligence companies. Pearson, for example, is (will be) offering degrees in the UK. These three industry segments (LMS, content, intelligence platforms) are heading for a convergence/collision. They are battling over the multi-billion dollar education market of content, teaching, and testing. Universities may well be reduced to degree granting systems as they simply have not been capable, as a whole, of adapting to the digital world (in terms of teaching, at least – they’ve adapted quite well in terms of research).

Knewton, with their adaptive learning platform, sits a bit outside of the LMS/content space. They now offer universities, organizations, and content publishers the opportunity to use their platform for providing customized, personalized learning. I’ve signed up for the service in order to learn more about how it works, but haven’t received a response yet. From the videos on the site, the best I can glean is that learning content from publishers and universities is repurposed into Knewton’s platform and the platform then algorithmically personalizes content (in real time) based on each student’s activities.

I’m curious to find out how they do this – do they scan and automatically classify content according to an ontology? Do developers have to code content by various levels (basic, intermediate, advanced) so that the system can deliver customized content to learners (though this model wouldn’t be true personalization – it would be more about classifying the student at a certain level and then matching content to the classification scheme, not personalized to the actual student)? Or do they computationally generate content for each learner? I suspect it’s a combination of the first two approaches (i.e. ontology with learner classification model). The Wolfram computational vision of learning content and adaptivity is still a bit in the future.
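To make the second option concrete, here is a minimal sketch of level-based matching. It is not Knewton’s method (I don’t know how their platform works); the `ContentItem` class, the score thresholds, and the sample catalog are all hypothetical, chosen only to show why classifying a student and then matching content to that class falls short of true personalization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    title: str
    topic: str
    level: str  # "basic", "intermediate", or "advanced", coded by a developer


def classify_learner(quiz_score: float) -> str:
    """Map a 0-1 quiz score to a coarse level (thresholds are arbitrary)."""
    if quiz_score < 0.4:
        return "basic"
    if quiz_score < 0.75:
        return "intermediate"
    return "advanced"


def match_content(items, topic, quiz_score):
    """Return the items on a topic that are coded at the learner's level."""
    level = classify_learner(quiz_score)
    return [item for item in items if item.topic == topic and item.level == level]


# Hypothetical catalog, hand-coded by level.
catalog = [
    ContentItem("Fractions refresher", "algebra", "basic"),
    ContentItem("Solving linear equations", "algebra", "intermediate"),
    ContentItem("Systems of equations", "algebra", "advanced"),
    ContentItem("Reading comprehension drills", "verbal", "basic"),
]

# Two different learners land in the same bucket and get identical content.
learner_a = match_content(catalog, "algebra", 0.50)
learner_b = match_content(catalog, "algebra", 0.70)
```

Both learners receive exactly the same list, even though their scores differ. The content is matched to the classification scheme, not to the actual student.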

Regardless of the model used, Knewton has effectively positioned itself short term as a partner to publishers, schools, universities, and organizations who recognize the value of analytics, but don’t know how to start. Long term, Knewton is a takeover target for Pearson, Thomson, or possibly Blackboard. If they can resist that temptation, they may well create an entirely separate category (learning and knowledge analytics) in the learning space.

Call for papers: Journal of Educational Technology & Society

I’m pleased to announce an upcoming special issue of the Journal of Educational Technology & Society on the topics of Learning and Knowledge Analytics. Details are below…or if you prefer, a Word doc version of call for papers is available here.

Call for Papers

Journal of Educational Technology & Society

(ISSN: 1436-4522 (online) and 1176-3647 (print))

Special Issue on

“Learning and Knowledge Analytics”

The growth of digital data creates unprecedented opportunities for analysis. This is particularly evident in teaching, training and development, and learning. Learning institutions and corporations make little use of the data learners generate in the process of accessing and using learning materials, interacting with educators and peers, and creating new content. Learning analytics are an important lens through which to view and plan for change at individual learning paths and educational institutions’ courses. Furthermore, in corporate settings, learning analytics can play a role in highlighting the development of employees through their learning. Information flow and social interactions can yield novel insights into organizational effectiveness and capacity to address new challenges or adapt rapidly when unanticipated events arise.

Advances in knowledge modeling and representation, the semantic web, data mining, analytics, and open data and processes form a foundation for new models of knowledge development and analysis. The advances also create new opportunities for interaction, collaboration, and sharing in learning, but those advances need to be used in a pedagogically sound manner. Pedagogical and social impact of the advances can only be understood if new and/or appropriate research methods and instruments are used. These technical, pedagogical, and social domains of analytics and interventions must be brought into dialogue with each other to ensure that interventions and organizational systems serve the needs of all stakeholders.

Learning analytics is the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs. Knowledge analytics is the utilization of advanced approaches (e.g., text/data mining, information retrieval, machine learning, or linked data) for processing data to provide representations in forms from which conclusions can be drawn in an automated and domain-aware way. When integrated, learning and knowledge analytics form the foundation for adaptive and personal learning by providing learners with relevant learning choices to address gaps between existing knowledge and knowledge needed within a field or domain. Through the use of analytics, organizations also stand to gain new insights into how the work of individuals contributes to organizational capacity for change and innovation.

Although the importance of learning analytics is increasingly recognized by governments, educators, and funding agencies, research into practical applications in learning settings, needed software, and methods of deployment at a systems level is still largely lacking. This special issue is dedicated to trends and innovations in learning and knowledge analytics. We invite original RESEARCH articles in relevant topics, which include but are not limited to:


  • Software development for use in analytics
  • The role of knowledge representation and ontologies in learning analytics
  • The semantic web and linked data: meaning in connections
  • Data mining in education
  • Artificial intelligence & tutors in learning analytics
  • Internet of things (sensors) and learning analytics
  • “Big Data” applications and opportunities in learning and education
  • Latent semantic analysis/natural language processing
  • Attention metadata
  • Software needed to advance adoption of learning analytics


  • Visualization: data, learner networks, conceptual knowledge
  • Predictive applications of data
  • Interventions based on analytics
  • Social and technical systems to manage information abundance
  • Personalization and adaptivity in the learning process
  • Corporate and higher education case studies of learning analytics
  • Learning analytics for intelligent tutoring systems
  • Open data: data access for learners
  • Harmonizing individual learning with organizational learning
  • Importing insights for existing analytics
  • Use of learning analytics in centralized (learning management systems) and decentralized (personal learning environments) settings
  • Models of corporate adoption of learning and knowledge analytics
  • The benefit and impact of organizational adoption of learning and knowledge analytics

Conceptual & Pedagogical:

  • The relationship between learning analytics and existing theories and approaches (such as pedagogical models and learning sciences)
  • Social network analysis
  • Harnessing the power of context and location aware systems
  • Informal learning: integrating learning and knowledge systems
  • Privacy & ethics in learning analytics
  • The influence of analytics on designing for learning
  • The influence of analytics on delivery and support of learning
  • New research instruments and methods for use in analytics-intensive learning environments

Special Issue Guest Editors

George Siemens, Technology Enhanced Knowledge Research Institute, Athabasca University

Dragan Gašević, School of Computing and Information Systems, Athabasca University

Important dates

Submissions due: 10 August 2011

First decision: 10 October 2011

Revised manuscripts due: 15 November 2011

Feedback on revised manuscripts: 20 December 2011

Final manuscript due: 30 January 2012

Submission guidelines

The manuscripts should be original, unpublished, and not in consideration for publication elsewhere at the time of submission to Educational Technology & Society and during the review process.

The manuscripts must be within 7000 words (including everything – title, author names, affiliations, abstract, keywords, main body, references, appendices – everything).

Please carefully follow the journal’s author guidelines while preparing your manuscript. To become familiar with the style of the journal, please see a previous issue.

All manuscripts will be subject to the usual high standards of peer review at ETS Journal. Each paper will undergo double blind review.

All manuscripts should be in WORD format and submitted via email to the Guest Editors (George Siemens and Dragan Gašević).

The Educational Technology & Society Journal is included in the Thomson Scientific Social Sciences Citation Index (SSCI) with an impact factor of 1.067 according to the Thomson Scientific 2009 Journal Citation Reports.

Where do we find good critiques of learning analytics?

Concepts need strong critiques. In discussing connectivism, for example, I’ve found critical comments from Plon Verhagen, Bill Kerr, Rita Kop, and Frances Bell, among others, to be very valuable.

Analytics in technology fields are growing in prominence, as revealed by distributed computing (Hadoop, MapReduce), techniques and methods (social network analysis, pattern detection, clustering, classification approaches, language analysis), tools (Gephi, R), open data (World Bank, open government, OECD), and big data. These developments are increasing the attention devoted to analytics for use in business settings (business intelligence) and in learning and knowledge (the focus of our conference and this course).

Jacques Ellul’s The Technological Society is the most effective critique I’ve encountered on the shortcomings (and dangers) of a technique-driven society. Technology instantiates technique. And technique rapidly encroaches on all areas of life: standards-based testing in education, citation analytics in higher education (see the Thomson Reuters InCites initiative). If it can’t be measured, it doesn’t get attention or funding (yes, feel free to insert “that” Einstein quote about measurement and what counts into the comments).

Where are the critiques of analytics – particularly in relation to learning and education? This missing element was brought to the forefront in our Friday wrap-up conversation in Elluminate…and in a blog post by Martin Weller on the upcoming conference. Scott Leslie initiated a thread in the comments section dismissing learning analytics. I would prefer a substantive critique, rather than a general reaction to the concept, but Scott’s comment served to raise the profile (in my mind at least) of why we need to critique, not only explore favorably, learning and knowledge analytics. I’ve started a thread in the Moodle forum – please drop in and voice your concerns with the concept of analytics as applied to learning.

Artifacts of sensemaking

Now that we are nearing the end of week 1 in LAK11, we’re starting to see a few attempts at making sense of the flow of activity in different forums. These sensemaking attempts include: blog posts, summary Moodle forum posts, images, analysis of discussion forum activity, social network analysis, etc. As we progress in the course, we’ll encounter numerous tools for playing with data and text. Creating and sharing artifacts of sensemaking is an important activity in open online courses.


Higher education generally homogenizes learners through pre-requisites or subject streams (programs). Most learners in a course will be at a roughly similar stage – or so the program structure suggests. In reality, learners are a diverse group, even in reasonably small classes. They come to a course with different beliefs, life experiences, knowledge, aspirations, and learning habits. The uniformity of university program tracks masks the differences of learners.

In an open course, participants aren’t filtered in the same way. Participants range from “absolutely new to the topic” to “have written many books on the topic”. As a result, filtering (or forming sub-networks/groups/discussion clusters) happens once the course is underway. The first few weeks are a bit tumultuous – it’s really a sociological and psychological process of identifying yourself to others and positioning yourself meaningfully in the conversation. It’s not unlike attending a conference or a large social gathering – we reveal aspects of ourselves/our knowledge, we offer tentative views/positions to see if they will resonate with others, we begin to connect with those who respond favorably, we gravitate toward those who we find interesting (but not so interesting that we feel no connection), and so on.

One of the primary ways of connecting with others in an open course is through creating and sharing artifacts of sensemaking. These artifacts are resources produced by individual learners (diagrams, summary posts, podcasts, videos) that reflect their attempts to make sense of the course from their own perspective. Given the diversity of participants, each learner plays a dual teaching-learning role. When our learning is transparent, we become teachers. We have over 600 participants in this course, which means you will connect with others. You will find people at a similar stage of knowledge. You might even find people in your own community. Essentially, we form small sub-networks that connect (lattice-like) to other sub-networks. Novices engage with novices…but simultaneously, they move into expert networks when a topic warrants. This fluidity of interaction across novice-intermediate-expert networks is one of the main points of value in open courses. And one of the main differentiators from traditional courses.

Reflections on Open Courses: Curation, Ombuds, and Concierges

Part of the focus in LAK11 is to explore how we can better use data to make sense of complex topics such as:

  • How students interact with social and technological systems, information, and each other
  • Which patterns of activity on the part of the learner produce the best performance (still largely defined by grades)
  • How knowledge is “grown” as individuals interact with others
  • How individual learners develop their conceptual understanding of a topic
  • How teams solve complex problems (stages of development and group formation)
  • The tools and activities that are most effective in solving a particular problem in a particular context
  • How individual learners “eliminate” unneeded or irrelevant ideas and concepts
  • How learners orient themselves in complex environments – wayfinding and social sensemaking

This list could go on for quite a while – essentially, any activity that involves information exchange and communication in the process of solving a problem or expanding knowledge in a domain is worthy of analysis through the data trails that are left by individuals.

In this course, we will explore various methods for analyzing data produced by learners and numerous tools that aid that analysis. However, it’s worth considering some of the limitations of an algorithmically-defined world of education. Google’s search and automated news service are poster children of what an algorithm can achieve. The task for Google has not been easy – as soon as any service becomes popular, marketers and spammers aren’t far behind (Twitter/Quora are starting to experience this too, but since they control who accesses their sites, they are more successful at eliminating noise). Many of the companies that rely on Google for customers find that a change in Google’s search algorithms can have a devastating impact on sales.

“Google’s search algorithm has been ruined” argues that:

What has happened is that Google’s ranking algorithm, like any trading algorithm, has lost its alpha. It no longer has lists to draw and, on its own, it no longer generates the same outperformance — in part because it is, for practical purposes, reverse-engineered, well-understood and operating in an adaptive content landscape. Search results in many categories are now honey pots embedded in ruined landscapes — traps for the unwary. It has turned search back into something like it was in the dying days of first-generation algorithmic search, like Excite and Altavista: results so polluted by spam that you often started looking at results only on the second or third page — the first page was a smoking hulk of algo-optimized awfulness.

What’s the solution? Well, a return to curation, of course. We trust people more than technology. I have a friend who is a pilot for a major airline. Apparently, it is possible – in certain airports around the world – for certain planes to land automatically without pilot involvement. Most people, however, would likely find it unnerving to get into a plane without a pilot. A fully automated flight?? Never! But…most of what happens during a flight is already automated. The huge number of adjustments that an airplane (autopilot) makes to compensate for turbulence are largely automated. Why do we still like to see a person in the cockpit?

When Google first announced their automated news site, many journalists and news sites ridiculed the idea of non-curated news. Overall, it has worked fairly well. But over the last five years, social networks and social media have taken over the web. Google is driven by the mission to organize the world’s information. Facebook is driven by the mission to “help you connect and share with the people in your life”. The two companies are on a collision course: is the future informationally or socially based? Eventually, social bleeds into informational. And vice versa.

Political discussion is a great example of how this works: pundits within the US political spectrum often express surprise at how “the other party’s followers” defy logic as they follow Beck, Olbermann, Stewart, Limbaugh, Maddow, etc. We trust people we like, people with whom we feel a connection or shared concern/identity. Beck or Olbermann are curators – they present their views and spin existing stories within the framework of their beliefs. All social interactions are information. Many information interactions are social.

What does this have to do with LAK11?

Since Stephen Downes, Dave Cormier, and I first started offering open online courses, we’ve used a variety of techniques to provide “temporary centres”. Social and technological networks don’t have a centre. When we learn in a classroom or in a learning management system (LMS), a central place exists where we can go for readings and discussions. When a course is distributed – such as LAK11 – across Google Groups, Netvibes, blogs, Moodle, Elluminate, Facebook, Twitter – we encounter the problem of how to create temporary centres that will help us to understand what’s happening in the course. I addressed this concern in Activity Streams: splicing information and social relations.

In the open courses I’ve taught with Stephen, we’ve used his OLDaily and gRSShopper software (in addition to Moodle, Twitter, Facebook, Second Life, etc). When participants start the course, they provide their blog feed, and any course-related updates on their blog are automatically included in the Daily. Similarly, Moodle discussion topics and uses of the course tag on Twitter are also included in the email. Stephen and I provide some commentary or facilitator posts to the Daily as well. Basically, for course participants, the Daily is a temporary centre, tying together the many strands of activity in the course. Additionally, and one of the biggest benefits of this model, a full archive of course activity – by date – is available for future analysis. Have a look at the archive of CCK08, CCK09, Critical Literacies, and PLENK10. These course artifacts are ripe for analysis. We just need to decide what types of questions we want answered!
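For readers who want a feel for what such a “temporary centre” involves, here is a rough sketch of the idea: keep only items carrying the course tag, file them by date, and render one day’s digest. This is not how gRSShopper is implemented. The item format (plain dictionaries), the function names, and the sample posts are all made up for illustration.

```python
from collections import defaultdict
from datetime import date

COURSE_TAG = "lak11"  # assumed tag; any course tag would work


def build_daily_archive(items, course_tag=COURSE_TAG):
    """Keep items carrying the course tag and file them under their date."""
    archive = defaultdict(list)
    for item in items:
        tags = {tag.lower() for tag in item["tags"]}
        if course_tag in tags:
            archive[item["published"]].append(item)
    return dict(archive)


def render_daily(archive, day):
    """Render one day's digest as plain text, grouped by source."""
    lines = [f"The Daily: {day.isoformat()}"]
    entries = sorted(archive.get(day, []), key=lambda i: (i["source"], i["title"]))
    for item in entries:
        lines.append(f"[{item['source']}] {item['title']}")
    return "\n".join(lines)


# Hypothetical items, as they might arrive from blog feeds, Twitter, and Moodle.
items = [
    {"source": "blog", "title": "Week 1 reflections",
     "published": date(2011, 1, 12), "tags": ["LAK11", "analytics"]},
    {"source": "twitter", "title": "Great session today",
     "published": date(2011, 1, 12), "tags": ["lak11"]},
    {"source": "blog", "title": "Holiday photos",
     "published": date(2011, 1, 12), "tags": ["travel"]},
    {"source": "moodle", "title": "Critiques of analytics",
     "published": date(2011, 1, 13), "tags": ["lak11"]},
]

archive = build_daily_archive(items)
digest = render_daily(archive, date(2011, 1, 12))
```

Because the archive is keyed by date, any past day can be rendered again later, which is what makes the course activity available for analysis after the fact.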

In LAK11, we’ve taken a different approach. We’ve retained similar course design elements to previous open online courses (OOCs – I’m starting to think that M=Massive part of MOOCs is misleading or even off-putting. Plus someone mentioned that in Catalan, MOOC means mucus :)). In LAK11 we have (course links can be accessed here):

  • The Daily (Google Groups)
  • Moodle Forums
  • Course blog
  • Elluminate sessions
  • Facebook groups
  • Conversations on Twitter/Diigo/Delicious – tied together with the LAK11 course tag

…and so on. For those participants who have taken an open course with me/Stephen/Dave in the past, this format will be familiar. What will be less familiar, and a model that Dave and I experimented with in our EdFutures course, is the lack of archive and integration of conversations in other spaces in the Daily email. There are two primary reasons for this:

  1. We want to demonstrate that if someone wants to offer an open online course, they do not need to run their own server or write their own software.
  2. We didn’t ask Stephen if we could run this course on his site.

What we gain in our decision to run this course on various sites, using more or less accessible tools, is the demonstration that anyone with an interesting topic/idea and a willingness to experiment can open up a course for a broader audience. About half of the courses that we’ve run over the last three years have been in for-credit programs in a public university. The other half have been focused on professional development without formal university credit. We’re trying to model that open online courses are broadly accessible – for both learning and teaching.

What we lose – and I’m still uneasy about this trade-off – is the integrated archive of activity in the course. I still send out a Daily email to the Google group. I aggregate blogs/Delicious/Diigo/Twitter links and commentary on my Netvibes page. The problem, however, is that Netvibes is rather dumb. It just leaves the content on the page until something new is posted. If you’re tracking activity on Netvibes, you’ll likely encounter much of the same content until it has been updated with new content. Activity is not archived by date.

Making some improvements

With each offering of an OOC, we try to play with one aspect of the course format. In the second year of offering the connectivism and connective knowledge course, we decided to offer a series of “mini-conferences” within the course, in addition to improving the Daily with better distinctions between facilitator posts, Twitter course tag use, and blogs/Moodle. In PLENK, we were more focused on tracking learner activity: how many new participants joined each week, activity in the Moodle forum, number of blogs being aggregated, attendees at the live sessions, and so on. For CCK11 (starting on Monday), we’re going to experiment with running the course without an LMS and using only gRSShopper for interaction. In LAK11, one of the key additions seems to be the role of “course ombud” that Dave Cormier is serving.
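As a small illustration of the kind of tracking we did in PLENK, the sketch below counts how many participants show up for the first time in each week of a course. It is an assumption-laden toy, not the code we used: the event format, the function name, the names, and the dates are invented, and it assumes no activity is recorded before the course start date.

```python
from collections import Counter
from datetime import date


def new_participants_per_week(events, course_start):
    """Count participants by the course week of their first recorded activity.

    `events` is an iterable of (participant, day) pairs; week 1 begins on
    `course_start`.
    """
    first_seen = {}
    for person, day in events:
        if person not in first_seen or day < first_seen[person]:
            first_seen[person] = day
    weeks = Counter(
        (day - course_start).days // 7 + 1 for day in first_seen.values()
    )
    return dict(sorted(weeks.items()))


# Hypothetical activity log: forum posts, blog posts, session attendance, etc.
events = [
    ("ana", date(2010, 9, 13)),
    ("ben", date(2010, 9, 15)),
    ("ana", date(2010, 9, 21)),  # repeat activity, not a new participant
    ("cai", date(2010, 9, 22)),
]

weekly = new_participants_per_week(events, course_start=date(2010, 9, 13))
```

The same first-seen approach could be pointed at forum posts, aggregated blogs, or live session attendance, as long as each record carries a participant and a date.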

Dave is basically acting as a course curator. He takes elements of the course that resonate with him and shares his experiences with other participants. Tony Searl is starting to play a similar role by aggregating the course blogs and content based on his interests. Fascinating stuff. In open courses so far, we’ve tried to open up content and interaction. We’re starting to see people create their own Daily and sharing those artifacts with each other.

Back to where we started

Complexity cannot be understood solely through algorithms. Algorithms don’t instill within us the sense of trust that we require to make decisions and act meaningfully. Curation is an important component in the process. Through social distributed networks, as we each try and make sense of that part of the course that resonates with us, the multi-narrative begins to serve as a sensemaking agent. Curation is important – yes, it’s biased, yes it misses contributions, but it’s personal. (I addressed curation briefly in this paper (.pdf), in Curatorial Teaching, and in this short presentation).

While information is growing in abundance and tools and algorithms (data mining, visualization) are being developed as solutions, we can’t overlook the importance of wayfinding and sensemaking in social systems. As we progress in this course – especially next week when our topic is “big data” – I think we need to also focus on the human aspect of data, sensemaking, curation, and trust.

Course Syllabus

The course syllabus for the open online course Learning and Knowledge Analytics is now available:

If you haven’t done so yet, join this Google group: LAK11. All updates will be sent to this group.

The Moodle discussion forum (free to create an account):

Important links for LAK11

Bookmark this post; it includes important links for the course:

Netvibes Aggregation Page

Elluminate (for Live Sessions)

Moodle (for asynchronous interactions)

Google Groups (The Daily email)