Social Network Analysis | Chapter 9 | Graph Representation Learning | Part 2

Graph Information Learning: Overview and Techniques

Introduction to Graph Information Learning

  • The discussion continues on graph information learning, focusing on mapping various entities of a graph (nodes, edges, subgraphs) into an embedding space.
  • The embedding space can vary; it may be non-Euclidean, such as spherical or hyperbolic spaces. This flexibility is crucial due to the large size of graphs making computations on adjacency matrices challenging.
  • A condensed representation is sought to facilitate easier computation and understanding of graph structures.

Challenges in Graph Representation

  • Earlier naive approaches such as matrix factorization have limitations, particularly the high computational cost of factorizing large adjacency matrices.
  • Graph dynamics are also a concern; social networks evolve over time with nodes and edges being added or removed frequently.

Paradigms in Graph Learning

  • Two paradigms will be discussed: representation learning on static graphs and leveraging random walks for better context understanding.
  • Random walk techniques previously covered in link prediction chapters will be revisited to explore their application in node context analysis.

Understanding Node Context through Random Walk

Importance of Context in Nodes

  • Random walks are essential for grasping the context surrounding a node. The first-order neighborhood structure provides some context but may not be sufficient alone.
  • A combination of first-order and second-order neighbors is suggested for a more comprehensive understanding of node relationships.

Addressing Similarities Between Distant Nodes

  • Capturing similarities between nodes that are far apart yet share similar neighborhood structures poses a challenge for standard techniques like LINE.

DeepWalk: A Baseline Approach

Overview of DeepWalk

  • DeepWalk was proposed in 2014 as a foundational method for graph representation learning. It focuses on generating embeddings for nodes (e.g., z_v , z_u ).

Embedding Similarity Measurement

  • The task involves ensuring that similar nodes have close embeddings within the vector space. Proximity should reflect actual network closeness.

Probability Estimation via Random Walk

Understanding Random Walks and Node Embeddings

Introduction to Random Strategies

  • The discussion begins with the concept of estimating the probability of reaching a node v from a starting node u under a random-walk strategy.
  • Different types of random walks are introduced, including uniform random walks where all neighbors have equal probabilities for movement.

Types of Random Walks

  • A biased random walk is described, where movement preferences can be established (e.g., favoring certain neighbors).
  • The idea of exploring depth versus breadth in random walks is mentioned as an alternative strategy.

Probability Estimation and Similarity

  • The goal is to compute probabilities that reflect the similarity between embeddings in an embedding space, focusing on cosine similarity or dot products.
  • An emphasis is placed on creating a d-dimensional vector representation that preserves node proximity in the original network.

Neighborhood Context and Multi-Sets

  • The context around nodes is captured through their neighborhoods, defined as multi-sets allowing repeated entries due to randomness in walks.
  • Various strategies (uniform or biased random walks) are employed to sample neighborhood structures effectively.
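As an illustration, here is a minimal Python sketch of sampling a neighborhood multi-set with a fixed-length uniform random walk (the function name and the toy adjacency list are invented for this example):

```python
import random

def uniform_random_walk(adj, start, length, seed=0):
    """Fixed-length uniform random walk: each step picks a neighbor
    uniformly at random. Returns the multi-set N_R(u) of visited nodes
    (repeats are kept, since the walk may revisit nodes)."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:          # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk[1:]                # context multi-set around the start node

# Toy graph given as adjacency lists
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(uniform_random_walk(adj, start=0, length=5))
```

Running several walks per node (with different seeds) yields the sampled neighborhood structures described above.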

Maximizing Probabilities for Node Representations

  • For each node u , the probability of encountering its neighbors v is measured, aiming to maximize this likelihood.
  • This maximization translates into learning embeddings represented by vectors, which should reflect high similarity among neighboring nodes.

Optimization Process

  • Log-likelihood functions are utilized for optimization; maximizing probabilities equates to minimizing negative objectives.
  • A fixed-length random walk process is proposed as part of the methodology, with specific parameters determining walk length (e.g., 10 hops).
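In the notation of this section (embedding z_u, random-walk neighborhood N_R(u)), the maximization described above can be written compactly; maximizing the log-likelihood is equivalent to minimizing its negative:

```latex
\max_{f}\;\sum_{u \in V} \log P\big(N_R(u)\mid \mathbf{z}_u\big)
\;\;\Longleftrightarrow\;\;
\min\;\mathcal{L} \;=\; \sum_{u \in V}\,\sum_{v \in N_R(u)} -\log P\big(v \mid \mathbf{z}_u\big)
```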

Conclusion on Embedding Learning

Understanding Softmax and Probability Measurement

Introduction to Softmax

  • The discussion begins with the softmax, used to express the probability of v given z_u as an exponential of the score between their embeddings, normalized over all nodes.
  • This is a simple way of measuring the probability; the parameterization can be customized, but the presented approach is straightforward.

Neighborhood Set and Optimization Function

  • For each node u , the neighborhood set, denoted N_R(u) , is examined to optimize the function.
  • The negative log of the softmax expression is computed for every neighbor v ; the softmax normalizes scores into probabilities between 0 and 1.
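A minimal NumPy sketch of this softmax probability (the embedding matrix Z and the function name are invented for illustration):

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u . z_v) / sum over all nodes n of exp(z_u . z_n).
    Z holds one d-dimensional embedding per row."""
    scores = Z @ Z[u]                       # dot product of z_u with every node
    scores -= scores.max()                  # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[v]

Z = np.random.default_rng(0).normal(size=(5, 3))   # 5 nodes, d = 3
p = np.array([softmax_prob(Z, 0, v) for v in range(5)])
print(p.sum())   # the probabilities over all nodes sum to 1
```

Note that every call normalizes over all nodes, which is exactly the O(|V|) cost per term discussed next.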

Complexity Challenges

  • A significant challenge arises because the softmax normalization sums over all nodes; doing this for every node leads to a complexity of O(|V|^2), and fetching the embeddings of all nodes is time-consuming.

Introduction to Negative Sampling

  • The concept of negative sampling, borrowed from NLP embedding techniques such as Word2Vec, is introduced as a solution to reduce computational complexity.
  • Instead of considering all nodes in the graph, negative sampling allows for approximating by selecting a sample from node set V .

Sampling Distribution

  • The distribution used for sampling can vary; it may be based on degree distribution or other topological properties.
  • By selecting k sample nodes (where k > |N_R(u)| ), more evidence about non-contextual nodes can be gathered without needing all nodes.

Objective Function Analysis

Logarithmic Transformation

  • The objective function focuses on transforming logarithmic expressions into manageable components: specifically, separating them into log terms.
  • Scalar values obtained from dot products are passed through a sigmoid function, yielding outputs between 0 and 1 while introducing non-linearity.

Balancing Contextual and Non-contextual Nodes

  • Instead of summing over all nodes, only sampled nodes are considered. As k → |V|, the result approaches that obtained by considering all nodes.
  • If no weightage is given to negative examples (i.e., if k = 0), only contextual similarities are evaluated. Thus, finding an optimal balance for k becomes crucial.
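The negative-sampling objective sketched above, for one (node, context) pair with k negatives, can be illustrated as follows (all names and the random vectors are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(z_u, z_v, Z_neg):
    """Negative-sampling approximation of -log P(v | z_u):
    pull the true context node v close to u, push k sampled
    negatives away. Z_neg has shape (k, d)."""
    pos = np.log(sigmoid(z_u @ z_v))                 # contextual term
    neg = np.log(sigmoid(-(Z_neg @ z_u))).sum()      # k non-contextual terms
    return -(pos + neg)

rng = np.random.default_rng(0)
z_u, z_v = rng.normal(size=3), rng.normal(size=3)
Z_neg = rng.normal(size=(5, 3))                      # k = 5 negatives
print(neg_sampling_loss(z_u, z_v, Z_neg))
```

Setting k = 0 drops the negative term entirely, which is the degenerate case mentioned above.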

Final Expression and Random Walk Process

Final Expression Overview

  • The final expression incorporates the negative sampling strategy discussed above; sample sizes between 5 and 20 are suggested, depending on neighborhood size.

Unbiased Random Walk Challenges

Improved DeepWalk: node2vec

Introduction to node2vec

  • The discussion introduces an improved version of DeepWalk called node2vec, which focuses on a more flexible random-walk process.
  • This method was published in KDD 2016 and aims to optimize the extraction of context by incorporating both local and global contexts.

Local vs. Global Context

  • The idea emphasizes that nodes close together with similar neighborhood structures should be mapped into the same embedding space.
  • Local context refers to immediate neighbors, while global context includes nodes that are farther away, highlighting the need to incorporate both systematically in the random-walk process.

Random Walk Process

  • An example illustrates starting a random walk from node u with a neighborhood size |N_R(u)| of three, distinguishing between local context (first neighbors) and global context (nodes farther away).
  • To capture local information, breadth-first search (BFS) is used, while depth-first search (DFS) captures global information.

Biased Random Walk

  • Instead of unbiased random walks, a biased approach is proposed in which parameters control the preference between microscopic (local) and macroscopic (global) exploration.
  • Two key parameters are introduced:
  • p: the return parameter, controlling the likelihood of immediately revisiting the previous node (a low p makes returning more likely).
  • q: the in-out parameter, determining the preference for BFS-like versus DFS-like exploration based on its value.

Control Parameters in Exploration

  • The transition probabilities from one node to the next depend on the p and q values: a high q biases the walk toward BFS-like local exploration, while a low q biases it toward DFS-like global exploration.
  • Likewise, p governs the probability of stepping back to the previous node, so together the two parameters shape how the walk moves through the graph structure.
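A sketch of this second-order bias, assuming the standard node2vec weighting of 1/p, 1, and 1/q by distance from the previous node (function and variable names are invented):

```python
def node2vec_weight(prev, curr, nxt, adj, p, q):
    """Unnormalized transition weight for moving from curr to nxt,
    given the walk arrived at curr from prev (a second-order walk)."""
    if nxt == prev:              # distance 0: return to the previous node
        return 1.0 / p
    if nxt in adj[prev]:         # distance 1: stays close (BFS-like step)
        return 1.0
    return 1.0 / q               # distance 2: moves away (DFS-like step)

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
# The walk arrived at node 1 from node 0; weight each candidate next step:
for nxt in sorted(adj[1]):
    print(nxt, node2vec_weight(0, 1, nxt, adj, p=1.0, q=0.5))
```

Normalizing these weights over the candidates gives the actual transition probabilities; varying p and q interpolates between BFS-like and DFS-like behavior.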

Summary of node2vec Methodology

Understanding Graph Neural Networks and Their Applications

Optimization Techniques in Graph Representation

  • The objective function in graph representation learning is optimized with standard (stochastic) gradient descent on the negative log-likelihood, equivalently maximizing the log-likelihood.
  • These techniques can be applied to various tasks such as clustering and community detection, allowing for node representations that facilitate simple clustering methods like k-means.

Node Classification and Link Prediction

  • With known embeddings and labels of some nodes, models like logistic regression or SVM can be trained to classify other nodes based on their features.
  • Link prediction is possible by utilizing embeddings of two nodes (z_i and z_j), where functions combine these embeddings to represent potential links between them.
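For example, a minimal sketch of combining two node embeddings into an edge representation for link prediction (the operator set mirrors common choices such as the Hadamard product; all names are illustrative):

```python
import numpy as np

def edge_features(z_i, z_j, op="hadamard"):
    """Combine the embeddings of nodes i and j into one edge feature
    vector that a downstream classifier can score as link / no-link."""
    if op == "hadamard":
        return z_i * z_j               # element-wise product
    if op == "average":
        return (z_i + z_j) / 2.0
    if op == "l1":
        return np.abs(z_i - z_j)       # element-wise L1 distance
    raise ValueError(f"unknown operator: {op}")

z_i = np.array([1.0, -2.0, 0.5])
z_j = np.array([2.0, 1.0, 4.0])
print(edge_features(z_i, z_j))         # [ 2. -2.  2.]
```

The resulting vector can be fed to a logistic regression or SVM, just as with node classification.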

Limitations of Static Network Embeddings

  • Current embedding techniques are primarily suitable for static networks; they require re-running when network structures change over time.
  • The embeddings generated are task-independent, making it challenging to tune parameters based on specific downstream tasks like link prediction.

Transitioning to End-to-End Learning

  • To enhance the utility of embeddings, there is a need for an end-to-end training mechanism that aligns embedding parameters with final task losses.
  • Initial representations from techniques like DeepWalk can serve as preprocessing tools before applying more complex neural network strategies.

Introduction to Graph Neural Networks (GNN)

  • The discussion shifts towards graph neural networks (GNN), highlighting its growing importance in research within machine learning programs.
  • GNN incorporates advanced mechanisms inspired by fields such as computer vision and NLP, indicating a rich area for exploration in current literature.

Understanding Embedding Spaces

  • The goal of embedding is to map graphs into an embedding space where similar nodes remain close together, facilitating easier problem-solving for downstream tasks.
  • An effective encoder must be developed that accurately represents node similarities within this embedding space.

Challenges with Existing Techniques

Understanding Graph Learning Challenges

Limitations of Current Methods

  • The discussed methods are limited as they require prior knowledge of the entire graph structure, making them ineffective for dynamic networks where nodes and edges change over time.
  • All previously mentioned methods are transductive, meaning they rely on knowing all data points and their labels during training, which restricts their applicability in real-time scenarios.

Transductive vs. Inductive Learning

  • In transductive learning, the model is trained with complete data points but only some labeled; it can utilize unlabeled data during training.
  • Inductive learning differs as it involves training on a subset of data while testing on an unknown dataset, emphasizing the need for models that can generalize beyond seen examples.

Incorporating Node Features

  • Current embedding methods do not explicitly consider node features (e.g., job or location), limiting their effectiveness in capturing relevant information.
  • Future discussions will focus on integrating node features into embedding techniques to create more robust inductive learning paradigms.

Introduction to Deep Graph Embeddings

  • A foundational understanding of deep learning concepts is recommended before delving into advanced topics like deep graph embeddings; starting with basic architectures such as perceptrons and CNNs is suggested.

Overview of Graph Convolutional Networks (GCN)

  • The first technique to be explored is Graph Convolutional Networks (GCNs), which apply convolutional operations similar to those used in image processing.
  • GCN utilizes kernels or filters that analyze local parts of graphs to produce condensed representations, akin to how images are processed through convolution layers.

Challenges with Graph Structures

  • Unlike images or sequences that have structured formats (like grids), graphs lack this uniformity, complicating the application of traditional convolutional techniques.

Graph Convolution Operations and Message Passing Paradigms

Understanding Convolution in Graphs

  • The concept of convolution operations is easily applicable to structured data, but it becomes complex when dealing with graphs. In graphs, a similar convolution operation is performed through neighborhood aggregation.
  • In image processing, a kernel focuses on a specific region (e.g., 3x3 matrix), aggregating information from surrounding cells to produce a single output.
  • The center of the region in images serves as the focal point for aggregation, combining information from both central and peripheral regions.

Central Nodes and Peripheral Regions in Graphs

  • In graph structures, every node can be considered as a center. The peripheral regions consist of neighboring nodes that provide additional context.
  • A defined region in graphs includes the central node and its neighbors, allowing for an aggregation of their representations to form a comprehensive understanding.

Message Passing Mechanism

  • The process resembles message passing: each node aggregates messages (embeddings) from its neighbors along with its own knowledge at time t .
  • At time t , nodes update their knowledge based on their embeddings and those received from neighbors. This iterative updating continues over time.
  • Each node independently undergoes this message-passing process, leading to updated aggregated knowledge representations across all nodes.

Challenges with Traditional Neural Networks

  • One might question why traditional neural networks like multi-layer perceptrons (MLPs) are not used. However, an MLP over a flattened graph requires O(|V|) parameters per node and a fixed input size, which graphs of varying size cannot provide.
  • For instance, if one graph has 100 nodes while another has 1000, the feature dimensions differ significantly, complicating direct comparisons or applications across different graphs.

Limitations of Adjacency Matrices

  • Using adjacency matrices poses challenges; swapping rows/columns alters input positions without changing the underlying graph structure—leading neural networks to misinterpret them as different inputs.
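A small NumPy demonstration of this point (the star graph and the permutation are invented for illustration): relabeling nodes changes the flattened adjacency matrix an MLP would see, even though the graph is unchanged, while a permutation-invariant readout is unaffected.

```python
import numpy as np

# Star graph: node 0 connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
P = np.array([[0, 1, 0],           # permutation matrix swapping nodes 0 and 1
              [1, 0, 0],
              [0, 0, 1]])
A_perm = P @ A @ P.T               # same graph, nodes renumbered

# The flattened adjacency (an MLP's input) changes under relabeling:
print(np.array_equal(A.ravel(), A_perm.ravel()))                            # False
# A permutation-invariant readout (sorted degree sequence) does not:
print(np.array_equal(np.sort(A.sum(axis=1)), np.sort(A_perm.sum(axis=1)))) # True
```

This is why GNN aggregation functions are built from permutation-invariant operations such as sums and means.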

Introduction to Graph Convolutional Networks (GCNs)

Understanding Node Embeddings and Aggregation in Graphs

Initial Node Features

  • At time t = 0 , the embeddings can be represented as one-hot encodings or existing node features, establishing a baseline for further processing.

Message Passing and Aggregation

  • The process of updating embeddings involves aggregating node features. For a specific node (e.g., node A), its embedding is influenced by neighboring nodes (B, C, D).
  • Each node's embedding depends on its neighbors; for instance, A's embedding relies on B, C, and D while B’s depends on A and C.

Computational Graph Structure

  • The concept of a computational graph emerges where each node has a unique graph structure based on its neighbors. This allows for tailored aggregation processes.
  • Different nodes have distinct computational graphs; for example, the graph for A includes B, C, and D while the graph for B focuses on A and C.

Limitations of Multi-hop Aggregation

  • While multi-hop aggregation is possible, it becomes less effective as distance from the original node increases due to diminishing returns in relevant information.
  • The small-world ("six degrees of separation") property suggests that within two or three hops most of the network is already reachable, so deeper aggregation adds little new information.

Aggregation Operations Overview

  • The black boxes in the computational graph represent aggregation operations where representations from neighboring nodes are combined to form new embeddings.
  • At different levels (0 to 2), only one type of operation is needed per layer—typically summation—which can be parameterized similarly to feedforward networks or CNN architectures.

Implementation of GCN Operations

  • In Graph Convolutional Networks (GCNs), an aggregation step involves summing up neighbor representations. For example, if considering neighbors A and C for node B.

Graph Convolutional Networks Explained

Understanding Linear Transformations in Graphs

  • A linear transformation is described as a one-layer feed-forward network, where a vector is multiplied by a trainable parameter matrix W_k .
  • The matrix W_k serves to project the neighborhood of nodes, while another trainable parameter matrix B_k is used for previous embeddings.
  • Keeping the weights for previous encodings and neighbor aggregations separate allows for more flexibility in emphasizing prior information over aggregated data.

Non-Linearity and Node Embeddings

  • After summing the transformed vectors, they are passed through a non-linearity (e.g., sigmoid), resulting in the embedding of node v at time k .
  • The matrices W_k and B_k are layer-dependent but shared across all nodes, maintaining fixed dimensions based on neighbors.
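A minimal NumPy sketch of one such layer update, assuming mean aggregation over neighbors and the sigmoid non-linearity mentioned above (all names and the toy graph are invented):

```python
import numpy as np

def gcn_layer(H, adj, W, B):
    """One GCN-style update: h_v(k) = sigmoid(W_k . mean of neighbor
    embeddings + B_k . h_v(k-1)).
    H: (n, d_in) embeddings at layer k-1; adj: neighbor lists;
    W, B: (d_in, d_out) trainable matrices shared across all nodes."""
    n = H.shape[0]
    H_new = np.empty((n, W.shape[1]))
    for v in range(n):
        neigh = adj[v]
        agg = H[neigh].mean(axis=0) if neigh else np.zeros(H.shape[1])
        H_new[v] = agg @ W + H[v] @ B        # neighbor term + self term
    return 1.0 / (1.0 + np.exp(-H_new))      # sigmoid non-linearity

rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
H0 = np.eye(4)                               # one-hot initial features
W, B = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
H1 = gcn_layer(H0, adj, W, B)
print(H1.shape)                              # (4, 2)
```

Stacking such layers (with fresh W_k, B_k per layer) realizes the multi-hop message passing described earlier; in practice the loop is replaced by the matrix formulation using the adjacency and degree matrices.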

Matrix Representation and Neighborhood Information

  • The compact representation of node embeddings can be achieved through matrix formulations that include both updated embeddings and degree information.
  • To obtain neighborhood information, an adjacency matrix A , along with degree matrices, is necessary to encode relationships effectively.

Initialization Techniques

  • Initialization can use one-hot encodings or existing node feature vectors to set up node embeddings before training begins.

Supervised Learning in Node Classification

  • The final task involves classifying nodes (e.g., toxic vs. non-toxic), with a loss function applied at the last layer to guide updates during backpropagation.
  • Cross entropy loss functions are commonly used; they measure discrepancies between predicted probabilities and actual labels during training.

Summary of Graph Convolutional Networks' Effectiveness

Understanding Graph Neural Networks

Introduction to Graph Representation in Learning

  • The discussion begins with the effectiveness of graph representations in transductive settings, highlighting their applicability to inductive learning. A specific method called GraphSAGE is introduced, which will be elaborated on later.

Mechanisms of Information Aggregation

  • The current approach involves nodes aggregating information from all neighboring nodes with equal weightage, leading to a uniform and unbiased aggregation process.

Importance of Attention Mechanisms

  • It is proposed that not all neighbors hold equal importance for every task; thus, an attention mechanism could be developed to assign varying weights to different sets of neighbors based on their relevance.
Video description

This is supplementary material for the book Social Network Analysis by Dr. Tanmoy Chakraborty. Book Website: https://social-network-analysis.in/ Available for purchase at: https://www.amazon.in/Social-Network-Analysis-Tanmoy-Chakraborty/dp/9354247830