Article

Seek for Success: A Visualization Approach for Understanding the Dynamics of Academic Careers

Authors:

Abstract

How to achieve academic career success has been a long-standing research question in social science research. With the growing availability of large-scale, well-documented academic profiles and career trajectories, scholarly interest in career success has been reinvigorated and has grown into an active research domain called the Science of Science (SciSci). In this study, we adopt an innovative dynamic perspective to examine how individual and social factors influence career success over time. We propose ACSeeker, an interactive visual analytics approach to explore the potential factors of success and how the influence of multiple factors changes at different stages of academic careers. We first applied a Multi-factor Impact Analysis framework to estimate the effect of different factors on academic career success over time. We then developed a visual analytics system to understand the dynamic effects interactively. A novel timeline is designed to reveal and compare the factor impacts based on the whole population. A customized career line showing the individual career development is provided to allow a detailed inspection. To validate the effectiveness and usability of ACSeeker, we report two case studies and interviews with a social scientist and general researchers.


... Another line of research focuses on surveying visualization researchers to gather insights on trending topics such as immersive analytics [17] and big data visual analytics [2]. Furthermore, some visualization approaches have been developed to understand the research profiles [34] and career paths [78] of visualization researchers. ...
... Visual representation 10 New visualization methods were developed to assist users to further understand critical factor data [42]. Design study (Problem abstraction) 9 We characterize the problem domain of visual analytics of time-varying effects of multiple factors on academic career success [78]. Data mining algorithm 8 We propose a dynamic clustering algorithm to enable the efficient clustering of fast-paced incoming streaming data [30]. ...
... According to Sedlmair et al. [62], methodology is like a recipe describing "strategy, plan of action, process, or design lying behind the choice and use of particular methods" and methods are like ingredients. The common methodology in our surveyed papers includes design study (e.g., [12,15,69]), user-centered design and its extension (e.g., [21,55,76]), and the nested model of visualization design [78]. Methods for understanding the domain problems are primarily interviews (discussions and meetings) and literature review. ...
Preprint
The last decade has witnessed many visual analytics (VA) systems that make successful applications to wide-ranging domains such as urban analytics and explainable AI. However, those systems are often designed, developed, and evaluated on an ad-hoc basis, provoking and spotlighting criticisms about the research rigor and contributions within the visualization community. We come in defence of VA systems by contributing two interview studies with VA researchers to gather criticisms and replies to those criticisms. First, we interview 24 researchers about criticisms for VA systems they have received from peers. Through an iterative coding and refinement process, we summarize the interview data into a list of 36 common criticisms. Second, we interview 17 researchers to validate our list and collect replies to those criticisms. We conclude by discussing eight important problems and future research opportunities to advance the theoretical and practical underpinnings of VA systems. We highlight that the presented knowledge is deep, extensive, but also imperfect, provocative, and controversial, and thus recommend reading with an inclusive and critical eye. We hope our work can provide solid foundations and spark discussions to move the research field forward more rigorously and vibrantly.
... Visualization for scientific data. Many visualization techniques have been developed to reveal insights from scientific databases, such as dynamic and heterogeneous networks [25,76], sequences and time series [70,75], multi-dimension measures [54], and texts [26,32]. These studies introduce general visualization techniques for specific data structures and use scientific datasets as illustrative examples. ...
Article
Science has long been viewed as a key driver of economic growth and rising standards of living. Knowledge about how scientific advances support marketplace inventions is therefore essential for understanding the role of science in propelling real-world applications and technological progress. The increasing availability of large-scale datasets tracing scientific publications and patented inventions and the complex interactions among them offers us new opportunities to explore the evolving dual frontiers of science and technology at an unprecedented level of scale and detail. However, we lack suitable visual analytics approaches to analyze such complex interactions effectively. Here we introduce InnovationInsights, an interactive visual analysis system for researchers, research institutions, and policymakers to explore the complex linkages between science and technology, and to identify critical innovations, inventors, and potential partners. The system first identifies important associations between scientific papers and patented inventions through a set of statistical measures introduced by our experts from the field of the Science of Science. A series of visualization views are then used to present these associations in the data context. In particular, we introduce the Interplay Graph to visualize patterns and insights derived from the data, helping users effectively navigate citation relationships between papers and patents. This visualization thereby helps them identify the origins of technical inventions and the impact of scientific research. We evaluate the system through two case studies with experts followed by expert interviews. We further engage a premier research institution to test-run the system, helping its institution leaders to extract new insights for innovation. Through both the case studies and the engagement project, we find that our system not only meets our original goals of design, allowing users to better identify the sources of technical inventions and to understand the broad impact of scientific research; it also goes beyond these purposes to enable an array of new applications for researchers and research institutions, ranging from identifying untapped innovation potential within an institution to forging new collaboration opportunities between science and industry.
... Clicking on a gray bubble can highlight the nodes in the cluster. Alternative Design: We have also considered utilizing chronological order as the primary horizontal axis to layout the nodes and representing similarities with edges between generations, which is similar to the CareerLine design [65]. Nonetheless, when the number of generations is large, such edges may result in severe visual clutter. ...
Preprint
Evolutionary multi-objective optimization (EMO) algorithms have been demonstrated to be effective in solving multi-criteria decision-making problems. In real-world applications, analysts often employ several algorithms concurrently and compare their solution sets to gain insight into the characteristics of different algorithms and explore a broader range of feasible solutions. However, EMO algorithms are typically treated as black boxes, leading to difficulties in performing detailed analysis and comparisons between the internal evolutionary processes. Inspired by the successful application of visual analytics tools in explainable AI, we argue that interactive visualization can significantly enhance the comparative analysis between multiple EMO algorithms. In this paper, we present a visual analytics framework that enables the exploration and comparison of evolutionary processes in EMO algorithms. Guided by a literature review and expert interviews, the proposed framework addresses various analytical tasks and establishes a multi-faceted visualization design to support the comparative analysis of intermediate generations in the evolution as well as solution sets. We demonstrate the effectiveness of our framework through case studies on benchmarking and real-world multi-objective optimization problems to elucidate how analysts can leverage our framework to inspect and compare diverse algorithms.
... Therefore, the tour view follows the design space of visual analytics techniques for event sequence data (Guo et al. 2021b). Guo et al. (2021b) summarized five types of representations for event sequences, namely chart-, timeline- (Wang et al. 2022), hierarchy- (Wongsuphasawat et al. 2011), sankey- (Wang et al. 2021b; Wu et al. 2022), and matrix-based visualizations. We choose the timeline-based representation since it is the most intuitive visualization. ...
Article
Planning an ideal tour schedule is a tedious process, where the attractions to visit and the order of visits need to be carefully determined. In this paper, we propose a novel interactive approach for tour planning. We first extract prior tourists’ experiences from the crowdsourcing tour data on the Web using frequent substring mining. We then design and implement a planning tool equipped with interactive visualizations, enabling users to learn the extracted experiences and plan their own tours. Our approach is evaluated with two usage scenarios on real-world tour data in two cities. Compared with previous methods, our approach strikes a balance between efficiency and reliability. On the one hand, we support the interactive manipulation of attraction sequence (i.e., multiple attractions at a time), thereby ensuring efficiency. On the other hand, we keep humans in the loop of the tour planning process via interactive visualizations. This paper shows the value of tour data published by tourists on the Web for personalized tour planning.
... Khulusi et al. [30] used network analysis and a novel visualization design to define groups interactively for musicians' biography. CareerLens [46] and ACSeeker [47] utilize time-series analysis to explore career trajectories. ...
Preprint
In history research, cohort analysis seeks to identify social structures and figure mobilities by studying the group-based behavior of historical figures. Prior works mainly employ automatic data mining approaches, lacking effective visual explanation. In this paper, we present CohortVA, an interactive visual analytic approach that enables historians to incorporate expertise and insight into the iterative exploration process. The kernel of CohortVA is a novel identification model that generates candidate cohorts and constructs cohort features by means of pre-built knowledge graphs constructed from large-scale history databases. We propose a set of coordinated views to illustrate identified cohorts and features coupled with historical events and figure profiles. Two case studies and interviews with historians demonstrate that CohortVA can greatly enhance the capabilities of cohort identifications, figure authentications, and hypothesis generation.
... This view contains a flow chart that represents the variation in Player A's scoring rate after he or she adjusts a return during his or her tactics (Fig. 3C). Flow charts are widely used to visualize temporal variations in variables [50], [51], [52]. In the accumulation view, the flow chart intuitively illustrates the process through which impacts accumulate gradually over the course of different tactics and finally aggregate into the total impact. ...
Article
We propose SimuExplorer, a visualization system to help analysts explore how player behavior impacts scoring rates in table tennis. Such analysis is indispensable for analysts and coaches, who aim to formulate training plans that can help players improve. However, it is challenging to identify the impacts of individual behaviors, as well as to understand how these impacts are generated and accumulated gradually over the course of a game. To address these challenges, we worked closely with domain analysts who used to work for a top national table tennis team to design SimuExplorer. The SimuExplorer system integrates a Markov chain model to simulate individual and cumulative impacts of particular behaviors. It then provides flow and matrix views to help users visualize and interpret these impacts. We demonstrate the usefulness of the system with three case studies. The domain analysts think highly of the system and have identified insights using it.
... Multiple graphs in different time windows constitute a dynamic graph where the structure can change over time [8]. The experts need to relate the causal graphs to the time for effective time-oriented exploration [69,70,77]. They also want to learn the temporal variations of the causal structures, such as periodicity and stability. ...
Article
The spatial time series generated by city sensors allow us to observe urban phenomena like environmental pollution and traffic congestion at an unprecedented scale. However, recovering causal relations from these observations to explain the sources of urban phenomena remains a challenging task because these causal relations tend to be time-varying and demand proper time series partitioning for effective analyses. The prior approaches extract one causal graph given long-time observations, which cannot be directly applied to capturing, interpreting, and validating dynamic urban causality. This paper presents Compass, a novel visual analytics approach for in-depth analyses of the dynamic causality in urban time series. To develop Compass, we identify and address three challenges: detecting urban causality, interpreting dynamic causal relations, and unveiling suspicious causal relations. First, multiple causal graphs over time among urban time series are obtained with a causal detection framework extended from the Granger causality test. Then, a dynamic causal graph visualization is designed to reveal the time-varying causal relations across these causal graphs and facilitate the exploration of the graphs along the time. Finally, a tailored multi-dimensional visualization is developed to support the identification of spurious causal relations, thereby improving the reliability of causal analyses. The effectiveness of Compass is evaluated with two case studies conducted on the real-world urban datasets, including the air pollution and traffic speed datasets, and positive feedback was received from domain experts.
Article
This paper considers the career paths of four academics at different stages in their career, examining key aspects in the trajectory of their journey to where they are now. The paper considers a range of key issues, pitfalls and barriers, and challenges they have faced in order to provide an insight into the differing journeys that academics may take. The research uses a combined auto-ethnographic and reflective approach to gather and interpret the experiences of the four individuals, in essence developing a reflective account on their personal journeys. The four academics were specifically chosen based on their different career paths, providing important opportunities to develop more in-depth reflective accounts of their stories. While they have all taken different trajectories, findings suggest significant overlap exists around issues such as imposter syndrome, psychological contract and identity. These issues, it would appear, have an interrelated impact upon the individual and, as such, cannot be separated effectively. The paper contributes to understandings of how academic careers progress, and may provide invaluable guidance to new entrants, or those considering entry into the world of academia.
Article
We conduct two in-lab experiments (N=93) to evaluate the effectiveness of Gantt charts, extended Gantt charts, and stringline charts for visualizing fixed-order event sequence data. We first formulate five types of event sequences and define three types of sequence elements: point events, interval events, and the temporal gaps between them. Our two experiments focus on event sequences with a pre-defined, fixed order, and measure task error rates and completion time. The first experiment shows single sequences and assesses the three charts' performance in comparing event duration or gap. The second experiment shows multiple sequences and evaluates how well the charts reveal temporal patterns. The results suggest that when visualizing single fixed-order event sequences, 1) Gantt and extended Gantt charts lead to comparable error rates in the duration-comparing task; 2) Gantt charts exhibit either shorter or equal completion time than extended Gantt charts; 3) both Gantt and extended Gantt charts demonstrate shorter completion times than stringline charts; 4) however, stringline charts outperform the other two charts with fewer errors in the comparing task when event type counts are high. Additionally, when visualizing multiple point-based fixed-order event sequences, stringline charts require less time than Gantt charts for people to find temporal patterns. Based on these findings, we discuss design opportunities for visualizing fixed-order event sequences and discuss future avenues for optimizing these charts.
Article
Goods turnover is the core of digital warehouse operation, including many processes, such as receiving, picking, and packing of goods. Analyzing goods turnover data can generate valuable insights for optimizing warehouse management, thereby improving operation efficiency. However, most existing methods focus on partial processes, making it hard for warehouse managers to understand the operation state and the goods turnover patterns, which often require the analysis of the interrelated processes of goods turnover. In this paper, we abstract six types of goods turnover events to describe the warehouse operation workflow and present WarehouseLens, a visual analytics system to analyze goods turnover from an overall perspective. To understand the warehouse operation state, we propose a temporal visualization method consisting of a novel state calendar view and an improved circular heat map to reflect the trend and periodicity pattern of the operation state. To explore the goods turnover patterns, we provide an improved parallel coordinate plot for users to view the attribute distribution of goods to filter key goods and a tailored mode circle view to discover the frequent outbound mode of goods. Three case studies and expert interviews on a real-world warehouse dataset demonstrate the usefulness and effectiveness of WarehouseLens in revealing the warehouse operation state and goods turnover patterns.
Article
Given a large number of applications and complex processing procedures, how to efficiently shift and schedule tax officers to provide good services to taxpayers is now receiving more attention from tax authorities. The availability of historical application data makes it possible for tax managers to shift and schedule staff with data support, but it is unclear how to properly leverage the historical data. To investigate the problem, this study adopts a user-centered design approach. We first collect user requirements by conducting interviews with tax managers and characterize their requirements of shifting and scheduling into time series prediction and resource scheduling problems. Then, we propose Tax-Scheduler, an interactive visualization system with a time-series prediction algorithm and genetic algorithm to support staff shifting and scheduling in the tax scenarios. To evaluate the effectiveness of the system and understand how non-technical tax managers react to the system with advanced algorithms and visualizations, we conduct user interviews with tax managers and distill several implications for future system design.
Article
The last decade has witnessed many visual analytics (VA) systems that make successful applications to wide-ranging domains like urban analytics and explainable AI. However, their research rigor and contributions have been extensively challenged within the visualization community. We come in defence of VA systems by contributing two interview studies for gathering criticisms and responses to those criticisms. First, we interview 24 researchers to collect criticisms from the review comments on their VA work. Through an iterative coding and refinement process, the interview feedback is summarized into a list of 36 common criticisms. Second, we interview 17 researchers to validate our list and collect their responses, thereby discussing implications for defending and improving the scientific values and rigor of VA systems. We highlight that the presented knowledge is deep, extensive, but also imperfect, provocative, and controversial, and thus recommend reading with an inclusive and critical eye. We hope our work can provide thoughts and foundations for conducting VA research and spark discussions to move the research field forward more rigorously and vibrantly.
Article
The precise prevention and control of air pollution is a great challenge faced by environmental experts in recent years. Understanding the air quality evolution in the urban agglomeration is important for coordinated control of air pollution. However, the complex pollutant interactions between different cities lead to the collaborative evolution of air quality. The existing statistical and machine learning methods cannot well support the comprehensive analysis of the dynamic air quality evolution. In this study, we propose AirLens, an interactive visual analytics system that can help domain experts explore and understand the air quality evolution in the urban agglomeration from multiple levels and multiple aspects. To facilitate the cognition of the complex multivariate spatiotemporal data, we first propose a multi‐run clustering strategy with a novel glyph design for summarizing and understanding the typical pollutant patterns effectively. On this basis, the system supports the multi‐level exploration of air quality evolution, namely, the overall level, stage level and detail level. Frequent pattern mining, city community extraction and useful filters are integrated into the system for discovering significant information comprehensively. The case study and positive feedback from domain experts demonstrate the effectiveness and usability of AirLens.
Article
With the growing popularity of visualizations in various fields, visualization comprehension has gained considerable attention. In this work, we focus on the effect of data size and pattern salience on the comprehension of scatterplots, a popular visualization type. We began with a preliminary study in which we interviewed 50 people about the comprehension difficulties of 90 different visualizations. The results reveal that data size is one of the top three factors affecting visualization comprehension. Besides, the effect of data size probably depends on the pattern salience within the data. Therefore, we carried out our experiment on the effect of data size and data-related pattern salience on three intermediate-level comprehension tasks, namely finding anomalies, judging correlation, and identifying clusters. The tasks were conducted on the scatterplot due to its familiarity to users and ability to support diverse tasks. Through the experiment, we found a significant interaction effect of data size and pattern salience on the comprehension of the trends in scatterplots. In specific conditions of pattern salience, data size impacts the judgment of anomalies and cluster centers. We discussed the findings in our experiment and further summarized the factors in visualization comprehension.
Article
Event sequence mining is often used to summarize patterns from hundreds of sequences but faces special challenges when handling racket sports data. In racket sports (e.g., tennis and badminton), a player hitting the ball is considered a multivariate event consisting of multiple attributes (e.g., hit technique and ball position). A rally (i.e., a series of consecutive hits beginning with one player serving the ball and ending with one player winning a point) thereby can be viewed as a multivariate event sequence. Mining frequent patterns and depicting how patterns change over time is instructive and meaningful to players who want to learn more short-term competitive strategies (i.e., tactics) that encompass multiple hits. However, players in racket sports usually change their tactics rapidly according to the opponent's reaction, resulting in ever-changing tactic progression. In this work, we introduce a tailored visualization system built on a novel multivariate sequence pattern mining algorithm to facilitate explorative identification and analysis of various tactics and tactic progression. The algorithm can mine multiple non-overlapping multivariate patterns from hundreds of sequences effectively. Based on the mined results, we propose a glyph-based Sankey diagram to visualize the ever-changing tactic progression and support interactive data exploration. Through two case studies with four domain experts in tennis and badminton, we demonstrate that our system can effectively obtain insights about tactic progression in most racket sports. We further discuss the strengths and the limitations of our system based on domain experts' feedback.
Article
In table tennis, tactics specified by three consecutive strokes represent the high-level competition strategies in matches. Effective detection and analysis of tactics can reveal the playing styles of players, as well as their strengths and weaknesses. However, tactical analysis in table tennis is challenging as the analysts can often be overwhelmed by the large quantity and high dimension of the data. Statistical charts have been extensively used by researchers to explore and visualize table tennis data. However, these charts cannot support efficient comparative and correlation analysis of complicated tactic attributes. Besides, existing studies are limited to the analysis of one match. However, one player's strategy can change along with his/her opponents in different matches. Therefore, the data of multiple matches can support a more comprehensive tactical analysis. To address these issues, we introduced a visual analytics system called Tac-Miner to allow analysts to effectively analyze, explore, and compare tactics of multiple matches based on the advanced embedding and dimension reduction algorithms along with an interactive glyph. We evaluate our glyph's usability through a user study and demonstrate the system's usefulness through a case study with insights approved by coaches and domain experts.
Article
Full-text available
Many spatiotemporal events can be viewed as contagions. These events implicitly propagate across space and time by following cascading patterns, expanding their influence, and generating event cascades that involve multiple locations. Analyzing such cascading processes presents valuable implications in various urban applications, such as traffic planning and pollution diagnostics. Motivated by the limited capability of the existing approaches in mining and interpreting cascading patterns, we propose a visual analytics system called VisCas. VisCas combines an inference model with interactive visualizations and empowers analysts to infer and interpret the latent cascading patterns in the spatiotemporal context. To develop VisCas, we address three major challenges, 1) generalized pattern inference, 2) implicit influence visualization, and 3) multifaceted cascade analysis. For the first challenge, we adapt the state-of-the-art cascading network inference technique to general urban scenarios, where cascading patterns can be reliably inferred from large-scale spatiotemporal data. For the second and third challenges, we assemble a set of effective visualizations to support location navigation, influence inspection, and cascading exploration, and facilitate the in-depth cascade analysis. We design a novel influence view based on a three-fold optimization strategy for analyzing the implicit influences of the inferred patterns. We demonstrate the capability and effectiveness of VisCas with two case studies conducted on real-world traffic congestion and air pollution datasets with domain experts.
Article
Full-text available
The increased availability of quantitative historical datasets has provided new research opportunities for multiple disciplines in social science. In this article, we work closely with the constructors of a new dataset, CGED-Q (China Government Employee Database-Qing), that records the career trajectories of over 340,000 government officials in the Qing bureaucracy in China from 1760 to 1912. We use these data to study career mobility from a historical perspective and understand social mobility and inequality. However, existing statistical approaches are inadequate for analyzing career mobility in this historical dataset with its fine-grained attributes and long time span, since they are mostly hypothesis-driven and require substantial effort. We propose CareerLens, an interactive visual analytics system for assisting experts in exploring, understanding, and reasoning from historical career data. With CareerLens, experts examine mobility patterns at three levels of detail, namely, the macro-level providing a summary of overall mobility, the meso-level extracting latent group mobility patterns, and the micro-level revealing social relationships of individuals. We demonstrate the effectiveness and usability of CareerLens through two case studies and receive encouraging feedback from follow-up interviews with domain experts.
Article
Full-text available
The observation that a socioeconomic agent with a high reputation gets disproportionately higher recognition for the same work than an agent with a lower reputation is typical in career development and wealth. This phenomenon, which is known as the Matthew effect in the literature, leads to increasing inequality over time. The present paper employs an optimal control model to study the implications of the Matthew effect for the optimal effort a scientist invests in reputation. The solution of the model exhibits, for sufficiently low effort costs, a new type of unstable equilibrium at which effort is at its upper bound. This equilibrium, which we denote as the Stalling Equilibrium, serves as a threshold level separating success and failure in academia. In addition, we show that at the Stalling Equilibrium the solution can be abnormal. We provide a clear economic interpretation for this solution characteristic.
Article
Full-text available
Temporal event sequence alignment has been used in many domains to visualize nuanced changes and interactions over time. Existing approaches align one or two sentinel events. Overview tasks require examining all alignments of interest using interaction and time or juxtaposition of many visualizations. Furthermore, any event attribute overviews are not closely tied to sequence visualizations. We present SEQUENCE BRAIDING, a novel overview visualization for temporal event sequences and attributes using a layered directed acyclic network. SEQUENCE BRAIDING visually aligns many temporal events and attribute groups simultaneously and supports arbitrary ordering, absence, and duplication of events. In a controlled experiment we compare SEQUENCE BRAIDING and IDMVis on user task completion time, correctness, error, and confidence. Our results provide good evidence that users of SEQUENCE BRAIDING can understand high-level patterns and trends faster and with similar error. A full version of this paper with all appendices; the evaluation stimuli, data, and analysis code; and source code are available at osf.io/mq2wt.
Article
Full-text available
With the rapid development of sensing technologies, massive spatiotemporal data have been acquired from the urban space with respect to different domains, such as transportation and environment. Numerous co-occurrence patterns (e.g., traffic speed < 10km/h, weather = foggy, and air quality = unhealthy) between the transportation data and other types of data can be obtained with given spatiotemporal constraints (e.g., within 3 kilometers and lasting for 2 hours) from these heterogeneous data sources. Such patterns present valuable implications for many urban applications, such as traffic management, pollution diagnosis, and transportation planning. However, extracting and understanding these patterns is beyond manual capability because of the scale, diversity, and heterogeneity of the data. To address this issue, a novel visual analytics system called CorVizor is proposed to identify and interpret these co-occurrence patterns. CorVizor comprises two major components. The first component is a co-occurrence mining framework involving three steps, namely, spatiotemporal indexing, co-occurring instance generation, and pattern mining. The second component is a visualization technique called CorView that implements a level-of-detail mechanism by integrating tailored visualizations to depict the extracted spatiotemporal co-occurrence patterns. The case studies and expert interviews are conducted to demonstrate the effectiveness of CorVizor.
Article
Full-text available
Recently we have witnessed growing adoption of deep sequence models (e.g. LSTMs) in many application domains, including predictive health care, natural language processing, and log analysis. However, the intricate working mechanism of these models confines their accessibility to the domain experts. Their black-box nature also makes it a challenging task to incorporate domain-specific knowledge of the experts into the model. In ProtoSteer (Prototype Steering), we tackle the challenge of directly involving the domain experts to steer a deep sequence model without relying on model developers as intermediaries. Our approach originates in case-based reasoning, which imitates the common human problem-solving process of consulting past experiences to solve new problems. We utilize ProSeNet (Prototype Sequence Network), which learns a small set of exemplar cases (i.e., prototypes) from historical data. In ProtoSteer they serve both as an efficient visual summary of the original data and explanations of model decisions. With ProtoSteer the domain experts can inspect, critique, and revise the prototypes interactively. The system then incorporates user-specified prototypes and incrementally updates the model. We conduct extensive case studies and expert interviews in application domains including sentiment analysis on texts and predictive diagnostics based on vehicle fault logs. The results demonstrate that involvements of domain users can help obtain more interpretable models with concise prototypes while retaining similar accuracy.
Article
Full-text available
Simulative analysis in competitive sports can provide prospective insights, which can help improve the performance of players in future matches. However, adequately simulating the complex competition process and effectively explaining the simulation result to domain experts are typically challenging. This work presents a design study to address these challenges in table tennis. We propose a well-established hybrid second-order Markov chain model to characterize and simulate the competition process in table tennis. Compared with existing methods, our approach is the first to support the effective simulation of tactics, which represent high-level competition strategies in table tennis. Furthermore, we introduce a visual analytics system called Tac-Simur based on the proposed model for simulative visual analytics. Tac-Simur enables users to easily navigate different players and their tactics based on their respective performance in matches to identify the player and the tactics of interest for further analysis. Then, users can utilize the system to interactively explore diverse simulation tasks and visually explain the simulation results. The effectiveness and usefulness of this work are demonstrated by two case studies, in which domain experts utilize Tac-Simur to find interesting and valuable insights. The domain experts also provide positive feedback on the usability of Tac-Simur. Our work can be extended to other similar sports such as tennis and badminton.
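As a rough illustration of the modeling idea in the abstract above, the following sketch estimates a plain second-order Markov chain from stroke sequences, so that the next stroke depends on the previous two. It is not the hybrid model the authors propose; the function name `fit_second_order` and the stroke labels are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def fit_second_order(sequences):
    """Estimate P(next | previous two states) from observed sequences.

    Illustrative only: a plain second-order chain, not the hybrid model
    described in the abstract.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Slide a window of three states: (a, b) is the context, c the outcome.
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            counts[(a, b)][c] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        model[context] = {state: n / total for state, n in next_counts.items()}
    return model


# Hypothetical rallies encoded as stroke labels.
rallies = [
    ["serve", "push", "attack"],
    ["serve", "push", "push"],
    ["serve", "push", "attack"],
]
model = fit_second_order(rallies)
print(model[("serve", "push")])
```

A simulation would then repeatedly sample the next stroke from `model[(prev2, prev1)]`; how the real system handles unseen contexts or scoring is not specified in the abstract.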
Article
Full-text available
Emotions play a key role in human communication and public presentations. Human emotions are usually expressed through multiple modalities. Therefore, exploring multimodal emotions and their coherence is of great value for understanding emotional expressions in presentations and improving presentation skills. However, manually watching and studying presentation videos is often tedious and time-consuming. There is a lack of tool support to help conduct an efficient and in-depth multi-level analysis. Thus, in this paper, we introduce EmoCo, an interactive visual analytics system to facilitate efficient analysis of emotion coherence across facial, text, and audio modalities in presentation videos. Our visualization system features a channel coherence view and a sentence clustering view that together enable users to obtain a quick overview of emotion coherence and its temporal evolution. In addition, a detail view and word view enable detailed exploration and comparison from the sentence level and word level, respectively. We thoroughly evaluate the proposed system and visualization techniques through two usage scenarios based on TED Talk videos and interviews with two domain experts. The results demonstrate the effectiveness of our system in gaining insights into emotion coherence in presentations.
Conference Paper
Full-text available
The understanding of job mobility can benefit talent management operations in a number of ways, such as talent recruitment, talent development, and talent retention. While there is extensive literature showing the predictability of the organization-level job mobility patterns (e.g., in terms of the employee turnover rate), there are no effective solutions for supporting the understanding of job mobility at an individual level. To this end, in this paper, we propose a hierarchical career-path-aware neural network for learning individual-level job mobility. Specifically, we aim at answering two questions related to individuals in their career paths: 1) who will be the next employer? 2) how long will the individual work in the new position? Specifically, our model exploits a hierarchical neural network structure with embedded attention mechanism for characterizing the internal and external job mobility. Also, it takes personal profile information into consideration in the learning process. Finally, the extensive results on real-world data show that the proposed model can lead to significant improvements in prediction accuracy for the two aforementioned prediction problems. Moreover, we show that the above two questions are well addressed by our model with a certain level of interpretability. For the case studies, we provide data-driven evidence showing interesting patterns associated with various factors (e.g., job duration, firm type, etc.) in the job mobility prediction process.
Article
Full-text available
The Curriculum Vitae (CV, also referred to as “résumé”) is an established representation of a person's academic and professional history. A typical CV is comprised of multiple sections associated with spatio‐temporal, nominal, hierarchical, and ordinal data. The main task of a recruiter is, given a job application with specific requirements, to compare and assess CVs in order to build a short list of promising candidates to interview. Commonly, this is done by viewing CVs in a side‐by‐side fashion. This becomes challenging when comparing more than two CVs, because the reader is required to switch attention between them. Furthermore, there is no guarantee that the CVs are structured similarly, thus making the overview cluttered and significantly slowing down the comparison process. In order to address these challenges, in this paper we propose “CV3”, an interactive exploration environment offering users a new way to explore, assess, and compare multiple CVs, to suggest suitable candidates for specific job requirements. We validate our system by means of domain expert feedback whose results highlight both the efficacy of our approach and its limitations. We learned that CV3 eases the overall burden of recruiters thereby assisting them in the selection process.
Article
Full-text available
We survey state-of-the-art approaches to study trajectories in their entirety, adopting a holistic perspective, and discuss their strengths and weaknesses. We begin by considering sequence analysis (SA), one of the most established holistic approaches. We discuss the inherent problems arising in SA, particularly in the study of the relationship between trajectories and covariates. We describe some recent developments combining SA and Event History Analysis, and illustrate how weakening the holistic perspective—focusing on sub-trajectories—might result in a more flexible analysis of life courses. We then move to some model-based approaches (included in the broad classes of multistate and of mixture latent Markov models) that further weaken the holistic perspective, assuming that the difficult task of predicting and explaining trajectories can be simplified by focusing on the collection of observed transitions. Our goal is twofold. On one hand, we aim to provide social scientists with indications for informed methodological choices and to emphasize issues that require consideration for proper application of the described approaches. On the other hand, by identifying relevant and open methodological challenges, we highlight and encourage promising directions for future research.
Chapter
Full-text available
This introductory chapter briefly describes the development of sequence analysis in social sciences from the pioneering contributions made by Andrew Abbott to date. We then discuss the future of sequence analysis, which, from our point of view, calls for a tighter interaction with other methods for longitudinal data. Lastly, we show how the papers in this bundle set the foundation towards this future.
Article
Full-text available
Academic career development refers to the process by which employers as well as scholars working in research, teaching, and/or administrative roles in academic and higher education contexts manage various tasks, behaviors, and experiences within and across jobs and organizations over time, with implications for scholars' work-related identity. In this review article, we address the question: to what extent has conceptual and empirical research on academic career development captured central constructs and processes outlined by two important and comprehensive career development theories? Using social cognitive career theory and life-span, life-space theory as guiding frameworks, we categorized relevant articles published in academic journals into five thematic clusters: (a) individual characteristics, (b) contextual factors, (c) active regulation of behavior, (d) career stages, and (e) work and nonwork roles. Within these thematic clusters, major topics in the existing literature on academic career development include gender differences and women's experiences, mentoring and other career development interventions, and career development in the field of medicine. In contrast, social and cognitive processes, action regulation, later career stages, and the work-nonwork interface have been neglected in the literature on academic career development. We conclude by outlining an agenda for future research, including theoretical and methodological considerations.
Article
Full-text available
Research impact plays a critical role in evaluating the research quality and influence of a scholar, a journal, or a conference. Many researchers have attempted to quantify research impact by introducing different types of metrics based on citation data, such as h-index, citation count, and impact factor. These metrics are widely used in the academic community. However, quantitative metrics are highly aggregated in most cases and sometimes biased, which can result in the loss of impact details that are important for comprehensively understanding research impact. For example, in which research area does a researcher have great impact? How does the research impact change over time? How do collaborators affect the research impact of an individual? Simple quantitative metrics can hardly answer such questions, since a more detailed exploration of the citation data is needed. Previous work on visualizing citation data usually shows only limited aspects of research impact and may suffer from other problems, including visual clutter and scalability issues. To fill this gap, we propose an interactive visualization tool, ImpactVis, for better exploration of research impact through citation data. Case studies and in-depth expert interviews are conducted to demonstrate the effectiveness of ImpactVis.
Article
Full-text available
The whys and wherefores of SciSci. The science of science (SciSci) is based on a transdisciplinary approach that uses large data sets to study the mechanisms underlying the doing of science, from the choice of a research problem to career trajectories and progress within a field. In a Review, Fortunato et al. explain that the underlying rationale is that with a deeper understanding of the precursors of impactful science, it will be possible to develop systems and policies that improve each scientist's ability to succeed and enhance the prospects of science as a whole. Science, this issue p. eaao0185
Conference Paper
Full-text available
This paper presents a comparative study on personal visualizations of bibliographic data. We consider three designs for egocentric visualization (node-link diagrams, adjacency matrices, and botanical trees) to depict one's academic career in terms of his/her publication records. Case studies are conducted to compare the effectiveness of the resulting visualizations for conveying particular aspects of a researcher's bibliographic records. Based on our study, we find that node-link diagrams are better at revealing the overall distribution of certain attributes; adjacency matrices can convey more information with less clutter; and botanical trees are visually attractive and provide the best at-a-glance characterization of the mapped data, although mapping data to tree features must be done carefully to derive expressive visualizations.
Conference Paper
Full-text available
The egocentric analysis of dynamic networks focuses on discovering the temporal patterns of a subnetwork around a specific central actor (i.e., an ego-network). These types of analyses are useful in many application domains, such as social science and business intelligence, providing insights about how the central actor interacts with the outside world. We present EgoLines, an interactive visualization to support the egocentric analysis of dynamic networks. Using a "subway map" metaphor, a user can trace an individual actor over the evolution of the ego-network. The design of EgoLines is grounded in a set of key analytical questions pertinent to egocentric analysis, derived from our interviews with three domain experts and general network analysis tasks. We demonstrate the effectiveness of EgoLines in egocentric analysis tasks through a controlled experiment with 18 participants and a use-case developed with a domain expert.
Article
Full-text available
To detect outliers in hydrological time series data, and thereby improve data quality and decision-making related to the design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that identifies data deviating from historical patterns. The method first builds a forecasting model on the historical data and then uses it to predict future values. Anomalies are assumed to occur if the observed values fall outside a given prediction confidence interval (PCI), which is calculated from the predicted value and a confidence coefficient. PCI is used as the threshold mainly because it accounts for the uncertainty in the data series parameters of the forecasting model, which addresses the problem of selecting a suitable threshold. The method performs fast, incremental evaluation of data as it becomes available, scales to large quantities of data, and requires no preclassification of anomalies. Experiments with different real-world hydrologic time series showed that the proposed method is fast, correctly identifies abnormal data, and can be used for hydrologic time series analysis.
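The forecast-then-threshold idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the forecaster here is a moving average of the last `window` accepted values, and the interval half-width is `z` times their sample standard deviation. The function name `detect_outliers` and both parameters are hypothetical.

```python
from statistics import mean, stdev


def detect_outliers(series, window=5, z=1.96):
    """Return indices of points falling outside a prediction interval.

    Assumptions (not from the source): moving-average forecaster,
    interval = forecast +/- z * sample standard deviation of the window.
    """
    history = list(series[:window])  # warm-up values are trusted as-is
    flagged = []
    for i in range(window, len(series)):
        recent = history[-window:]
        predicted = mean(recent)
        half_width = z * stdev(recent)
        observed = series[i]
        if abs(observed - predicted) > half_width:
            flagged.append(i)
            # Keep the anomaly out of the history by storing the forecast.
            history.append(predicted)
        else:
            history.append(observed)
    return flagged


# Made-up readings with one obvious spike at index 6.
readings = [10, 11, 10, 12, 11, 10, 50, 11, 12, 10]
print(detect_outliers(readings))
```

Each point is evaluated as it arrives, using only earlier values, which mirrors the incremental evaluation the abstract describes. Replacing a flagged value with its forecast is one possible design choice; the abstract does not say how the original method treats flagged points.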
Article
Full-text available
Ego-network, which represents relationships between a specific individual, i.e., the ego, and people connected to it, i.e., alters, is a critical target to study in social network analysis. Evolutionary patterns of ego-networks along time provide huge insights to many domains such as sociology, anthropology, and psychology. However, the analysis of dynamic ego-networks remains challenging due to its complicated time-varying graph structures, for example: alters come and leave, ties grow stronger and fade away, and alter communities merge and split. Most of the existing dynamic graph visualization techniques mainly focus on topological changes of the entire network, which is not adequate for egocentric analytical tasks. In this paper, we present egoSlider, a visual analysis system for exploring and comparing dynamic ego-networks. egoSlider provides a holistic picture of the data through multiple interactively coordinated views, revealing ego-network evolutionary patterns at three different layers: a macroscopic level for summarizing the entire ego-network data, a mesoscopic level for overviewing specific individuals' ego-network evolutions, and a microscopic level for displaying detailed temporal information of egos and their alters. We demonstrate the effectiveness of egoSlider with a usage scenario with the DBLP publication records. Also, a controlled user study indicates that in general egoSlider outperforms a baseline visualization of dynamic networks for completing egocentric analytical tasks.
Article
Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are typically large-scale, high-dimensional, and heterogeneous. This high complexity makes it difficult for analysts to manually explore event sequence data and find patterns, resulting in an ever-increasing need for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications.
Article
In soccer, passing is the most frequent interaction between players and plays a significant role in creating scoring chances. Experts are interested in analyzing players' passing behavior to learn passing tactics, i.e., how players build up an attack with passing. Various approaches have been proposed to facilitate the analysis of passing tactics. However, the dynamic changes of a team's employed tactics over a match have not been comprehensively investigated. To address the problem, we closely collaborate with domain experts and characterize requirements to analyze the dynamic changes of a team's passing tactics. To characterize the passing tactic employed for each attack, we propose a topic-based approach that provides a high-level abstraction of complex passing behaviors. Based on the model, we propose a glyph-based design to reveal the multi-variate information of passing tactics within different phases of attacks, including player identity, spatial context, and formation. We further design and develop PassVizor, a visual analytics system, to support the comprehensive analysis of passing dynamics. With the system, users can detect the changing patterns of passing tactics and examine the detailed passing process for evaluating passing tactics. We invite experts to conduct analysis with PassVizor and demonstrate the usability of the system through an expert interview.
Article
Storyline visualizations are an effective means to present the evolution of plots and reveal the scenic interactions among characters. However, the design of storyline visualizations is a difficult task, as users need to balance aesthetic goals against narrative constraints. Although optimization-based methods have improved significantly in producing aesthetic and legible layouts, the existing (semi-)automatic methods are still limited regarding 1) efficient exploration of the storyline design space and 2) flexible customization of storyline layouts. In this work, we propose a reinforcement learning framework to train an AI agent that assists users in exploring the design space efficiently and generating well-optimized storylines. Based on the framework, we introduce PlotThread, an authoring tool that integrates a set of flexible interactions to support easy customization of storyline visualizations. To seamlessly integrate the AI agent into the authoring process, we employ a mixed-initiative approach where both the agent and designers work on the same canvas to boost the collaborative design of storylines. We evaluate the reinforcement learning model through qualitative and quantitative experiments and demonstrate the usage of PlotThread using a collection of use cases.
Article
Using causal relations to guide decision making has become an essential analytical task across various domains, from marketing and medicine to education and social science. While powerful statistical models have been developed for inferring causal relations from data, domain practitioners still lack effective visual interfaces for interpreting the causal relations and applying them in their decision-making process. Through interview studies with domain experts, we characterize their current decision-making workflows, challenges, and needs. Through an iterative design process, we developed a visualization tool that allows analysts to explore, validate, and apply causal relations in real-world decision-making scenarios. The tool provides an uncertainty-aware causal graph visualization for presenting a large set of causal relations inferred from high-dimensional data. On top of the causal graph, it supports a set of intuitive user controls for performing what-if analyses and making action plans. We report on two case studies in marketing and student advising to demonstrate that users can effectively explore causal relations and design action plans for reaching their goals.
Article
Ordinary users fill some intervals on the time continuum by engaging in an online behavior and leave other intervals empty by disengaging from the behavior. Existing time-based measurements of online behaviors exclusively focus on characterizing filled time intervals and completely ignore the information embedded in empty time intervals. Empty time intervals, referring to time gaps between consecutive behaviors, carry important information on how time is organized for online behaviors. By analyzing two behavioral log files on webpage browsing and mobile application use, the study evaluates whether online behaviors characterized by empty time intervals differ from or accord with online behaviors characterized by filled time intervals. Behavioral burstiness, which measures the distribution of empty time intervals in consecutive online behaviors, is found to unveil behavioral patterns that are distinct from temporal duration that measures the overall length of the filled time intervals of online behaviors. Temporal duration is much more extended in mobile use compared with web surfing, whereas behavioral burstiness in mobile use is lower than that in web surfing. Marked circadian rhythms are observed in behavioral burstiness in web surfing and mobile use, whereas circadian rhythms are vague in temporal duration in web surfing and mobile use.
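To make the two measurements in the abstract above concrete, the sketch below computes temporal duration (total filled time) and a burstiness score over empty intervals for a list of sessions. The abstract does not give its formula; the score used here is the widely used coefficient B = (σ − μ) / (σ + μ) of the gap distribution, which is an assumption, as are the function names and the `(start, end)` session format.

```python
from statistics import mean, pstdev


def empty_intervals(sessions):
    """Gaps between the end of one session and the start of the next."""
    return [nxt[0] - cur[1] for cur, nxt in zip(sessions, sessions[1:])]


def temporal_duration(sessions):
    """Total length of the filled intervals."""
    return sum(end - start for start, end in sessions)


def burstiness(sessions):
    """Assumed coefficient B = (sigma - mu) / (sigma + mu) over the gaps.

    B is -1 for perfectly regular gaps and approaches 1 for very bursty ones.
    """
    gaps = empty_intervals(sessions)
    if not gaps:
        raise ValueError("need at least two sessions")
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        return 0.0  # all gaps are zero; the score is undefined, report neutral
    return (sigma - mu) / (sigma + mu)


# Hypothetical sessions as (start, end) times sorted by start.
regular = [(0, 5), (10, 15), (20, 25), (30, 35)]
bursty = [(0, 1), (2, 3), (4, 5), (100, 101)]
print(temporal_duration(regular), burstiness(regular))
print(temporal_duration(bursty), burstiness(bursty))
```

The two example logs show how the measures can diverge: the regular log has the longer filled time but the lowest possible burstiness, while the bursty log has short sessions separated by very uneven gaps.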
Article
We introduce a novel event-driven continuous time Bayesian network (ECTBN) representation to model situations where a system's state variables could be influenced by occurrences of events of various types. In this way, the model parameters and graphical structure capture not only potential “causal” dynamics of system evolution but also the influence of event occurrences that may be interventions. We propose a greedy search procedure for structure learning based on the BIC score for a special class of ECTBNs, showing that it is asymptotically consistent and also effective for limited data. We demonstrate the power of the representation by applying it to model paths out of poverty for clients of CityLink Center, an integrated social service provider in Cincinnati, USA. Here the ECTBN formulation captures the effect of classes/counseling sessions on an individual's life outcome areas such as education, transportation, employment and financial education.
Chapter
In this article, we propose an innovative method that combines Sequence Analysis and Event History Analysis. We call this method Sequence History Analysis (SHA). We start by identifying typical past trajectories of individuals over time by using Sequence Analysis. We then estimate the effect of these typical past trajectories on the event under study using discrete-time models. The aim of this approach is to estimate the effect of past trajectories on the chances of experiencing an event. We apply the proposed methodological approach to an original study of the effect of past childhood co-residence structures on the chances of leaving the parental home in Switzerland. The empirical research was based on the LIVES Cohort study, a panel survey that started in autumn 2013 in Switzerland. Analyses show that it is not only the occurrence of an event that increases the risk of experiencing another event, but also the order in which various states occurred. What is more, it seems that two features have a significant influence on departure from the parental home: the co-residence structures and the arrival or departure of siblings from the parental home.
Article
Event sequence data is common to a broad range of application domains, from security to health care to scholarly communication. This form of data captures information about the progression of events for an individual entity (e.g., a computer network device; a patient; an author) in the form of a series of time-stamped observations. Moreover, each event is associated with an event type (e.g., a computer login attempt, or a hospital discharge). Analyses of event sequence data have been shown to help reveal important temporal patterns, such as clinical paths resulting in improved outcomes, or an understanding of common career trajectories for scholars. Moreover, recent research has demonstrated a variety of techniques designed to overcome methodological challenges such as large volumes of data and high dimensionality. However, the effective identification and analysis of latent stages of progression, which can allow for variation within different but similarly evolving event sequences, remain a significant challenge with important real-world motivations. In this paper, we propose an unsupervised stage analysis algorithm to identify semantically meaningful progression stages as well as the critical events which help define those stages. The algorithm follows three key steps: (1) event representation estimation, (2) event sequence warping and alignment, and (3) sequence segmentation. We also present a novel visualization system, ET2, which interactively illustrates the results of the stage analysis algorithm to help reveal evolution patterns across stages. Finally, we report three forms of evaluation for ET2: (1) case studies with two real-world datasets, (2) interviews with domain expert users, and (3) a performance evaluation on the progression analysis algorithm and the visualization design.
Article
Storyline visualization techniques have progressed significantly to generate illustrations of complex stories automatically. However, the visual layouts of storylines have not improved accordingly, despite gains in performance and the extension of their application areas. Existing methods attempt to achieve several shared optimization goals, such as reducing empty space and minimizing line crossings and wiggles. However, these goals do not always produce optimal results when compared to hand-drawn storylines. We conducted a preliminary study to learn how users translate a narrative into a hand-drawn storyline and check whether the visual elements in hand-drawn illustrations can be mapped back to appropriate narrative contexts. We also compared the hand-drawn storylines with storylines generated by the state-of-the-art methods and found that they differ significantly. Our findings led to a design space that summarizes 1) how artists utilize narrative elements and 2) the sequence of actions artists follow to portray expressive and attractive storylines. We developed iStoryline, an authoring tool for integrating high-level user interactions into optimization algorithms and achieving a balance between hand-drawn storylines and automatic layouts. iStoryline allows users to create novel storyline visualizations easily according to their preferences by modifying the automatically generated layouts. The effectiveness and usability of iStoryline are studied with qualitative evaluations.
Article
This paper presents an approach for the interactive visualization, exploration and interpretation of large multivariate time series. Interesting patterns in such datasets usually appear as periodic or recurrent behavior often caused by the interaction between variables. To identify such patterns, we summarize the data as conceptual states, modeling temporal dynamics as transitions between the states. This representation can visualize large datasets with potentially billions of examples. We extend the representation to multiple spatial granularities allowing the user to find patterns on multiple scales. The result is an interactive web-based tool called StreamStory. StreamStory couples the abstraction with several tools that map the abstractions back to domain-specific concepts using techniques from statistics and machine learning. It is aimed at users who are not experts in data analytics, minimizing the number of parameters to configure out-of-the-box. We use three real-world datasets to demonstrate how StreamStory can be used to perform three main visual analytics tasks: identify the main states of a complex system and map them back to data-specific concepts, find high-level and long-term periodic behavior and traverse the scales to identify which scales exhibit interesting phenomena. We find and interpret several known, as well as previously unknown patterns in these datasets.
Article
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Article
The rapid development of information technology paved the way for the recording of fine-grained data, such as stroke techniques and stroke placements, during a table tennis match. This data recording creates opportunities to analyze and evaluate matches from new perspectives. Nevertheless, the increasingly complex data is difficult to make sense of and to gain insights from. Analysts usually employ tedious and cumbersome methods which are limited to watching videos and reading statistical tables. However, existing sports visualization methods cannot be applied to visualizing table tennis competitions due to different competition rules and particular data attributes. In this work, we collaborate with data analysts to understand and characterize the sophisticated domain problem of analysis of table tennis data. We propose iTTVis, a novel interactive table tennis visualization system, which, to our knowledge, is the first visual analysis system for analyzing and exploring table tennis data. iTTVis provides a holistic visualization of an entire match from three main perspectives, namely, time-oriented, statistical, and tactical analyses. The proposed system with several well-coordinated views not only supports correlation identification through statistics and pattern detection of tactics with a score timeline but also allows cross analysis to gain insights. Data analysts have obtained several new insights by using iTTVis. The effectiveness and usability of the proposed system are demonstrated with four case studies.
Article
This study uses a longitudinal dataset extracted from a mobile news application and adopts a multilevel design to examine the evolution of diversity of individuals’ news consumption and to identify the factors that underlie such evolution. A decreasing trend in news consumption diversity is observed among users. The news consumption diversity of individuals is positively related to global information diversity. Furthermore, the news consumption diversity of males exhibits a stronger tendency to be influenced by global information diversity than that of females. The decreasing trend in news consumption diversity is less remarkable among males than among females. This study complements traditional motivation-driven perspectives of news consumption by mining the structural antecedents of news consumption diversity and further emphasizes the social implications of mobile news technology. Lastly, practical implications and limitations are discussed.
Article
Event sequence datasets with high event cardinality and long sequences are difficult to visualize and analyze. In particular, it is hard to generate a high level visual summary of paths and volume of flow. Existing approaches of mining and visualizing frequent sequential patterns look promising, but have limitations in terms of scalability, interpretability and utility. We propose CoreFlow, a technique that automatically extracts and visualizes branching patterns in event sequences. CoreFlow constructs a tree by recursively applying a three-step procedure: rank events, divide sequences into groups, and trim sequences by the chosen event. The resulting tree contains key events as nodes, and links represent aggregated flows between key events. Based on CoreFlow, we have developed an interactive system for event sequence analysis. Our approach can compute branching patterns for millions of events in a few seconds, with improved interpretability of extracted patterns compared to previous work. We also present case studies of using the system in three different domains and discuss success and failure cases of applying CoreFlow to real-world analytic problems. These case studies call forth future research on metrics and models to evaluate the quality of visual summaries of event sequences.
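The recursive rank, divide, and trim procedure described above can be sketched as follows. This is a simplified illustration, not the published CoreFlow implementation: ranking events purely by the number of sequences containing them, the alphabetical tie-break, and the `min_support` stopping threshold are all assumptions made for the example.

```python
from collections import Counter

def coreflow(sequences, min_support=2):
    """Return a list of tree nodes built by recursive rank / divide / trim."""
    # Rank: count how many sequences contain each event (assumed ranking rule).
    counts = Counter(e for seq in sequences for e in set(seq))
    if not counts:
        return []
    # Most frequent event first; ties broken alphabetically for determinism.
    event, support = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if support < min_support:
        return []
    # Divide: sequences that contain the chosen event versus those that do not.
    with_event = [s for s in sequences if event in s]
    without_event = [s for s in sequences if event not in s]
    # Trim: keep only what follows the first occurrence of the chosen event.
    trimmed = [s[s.index(event) + 1:] for s in with_event]
    node = {
        "event": event,
        "count": support,
        "children": coreflow(trimmed, min_support),
    }
    # Sibling branches come from the sequences lacking the chosen event.
    return [node] + coreflow(without_event, min_support)

# Hypothetical toy input: each sequence is an ordered list of event names.
sequences = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
tree = coreflow(sequences)
```

Each node holds a key event and the number of sequences flowing through it, which mirrors the abstract's description of nodes as key events and links as aggregated flows.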
Article
Massive public resume data emerging on the internet indicates individual-related characteristics in terms of profile and career experiences. Resume Analysis (RA) provides opportunities for many applications, such as recruitment trend prediction, talent seeking and evaluation. Existing RA studies either largely rely on the knowledge of domain experts, or leverage classic statistical or data mining models to identify and filter explicit attributes based on pre-defined rules. However, they fail to discover the latent semantic information in semi-structured resume text, i.e., individual career progress trajectories and social relations, which are vital to a comprehensive understanding of how people's careers evolve. Besides, when dealing with large numbers of resumes, it is also challenging to visualize such semantic information in a way that reduces the information load and supports human cognition. To tackle these issues, we propose a visual analytics system called ResumeVis to mine and visualize resume data. First, a text mining-based approach is presented to extract semantic information. Then, a set of visualizations is devised to represent the semantic information from multiple perspectives. Through interactive exploration of ResumeVis by domain experts, the following tasks can be accomplished: tracing an individual's career trajectory; mining latent social relations among individuals; and capturing the full picture of the collective mobility in massive resume collections. Case studies with over 2,500 government officer resumes demonstrate the effectiveness of our system.
Article
Although basketball games have received broad attention, the forms of game reports and webcasts are purely content-based cross-media: texts, videos, snapshots, and performance figures. Analytical narrations of games that seek to compose a complete game from heterogeneous datasets are challenging for general media producers because such a composition is time-consuming and heavily depends on domain experts. In particular, an appropriate analytical commentary of basketball games requires two factors, namely, rich context and domain knowledge, which includes game events, player locations, player profiles, and team profiles, among others. This type of analytical commentary calls for a timely and effective basketball game data visualization made up of different sources of media. Existing visualizations of basketball games mainly profile a particular aspect of the game. Therefore, this paper presents an expressive visualization scheme that comprehensively illustrates NBA games with three levels of detail: a season level, a game level and a session level. We reorganize a basketball game as a sequence of sessions to depict the game states and heated confrontations. We design and implement a live system that integrates multi-media NBA datasets: play-by-play text data, box score data, game video data, and action area data. We demonstrate the effectiveness of this scheme with case studies and user feedback.
Article
We have created and made available to all a dataset with information about every paper that has appeared at the IEEE Visualization (VIS) set of conferences: InfoVis, SciVis, VAST, and Vis. The information about each paper includes its title, abstract, authors, and citations to other papers in the conference series, among many other attributes. This article describes the motivation for creating the dataset, as well as our process of coalescing and cleaning the data, and a set of three visualizations we created to facilitate exploration of the data. This data is meant to be useful to the broad data visualization community to help understand the evolution of the field and as an example document collection for text data visualization research.