当前位置:文档之家› 16 社区发现算法工作简介by_@sumnous_t

16 社区发现算法工作简介by_@sumnous_t


小小世界⺴网网络 Small World

1998,邓肯·瓦瓦茨(Duncan Watts)和斯蒂文文·斯特罗加茨 (Steven Strogatz),瓦瓦茨-斯特罗加茨模型(WS模型) 特征路径⻓长度短(两个节点的路径⻓长度的平均值) 高高集聚系数(一一个节点的集聚系数等于与它相连的节点中相 互连接的点对数与总点对数的比比值) 六度分割理论,一一百五十十法则
0 Old Methods
• • •
Clustering (Node similarity) Graph Cut (M groups, less edges intra-community) Modularity-Based Method(is NP-hard to optimize) [Newman, 2006] - Greedy - Simulated Annealing


什么
是社区?

A precise definition of what a “community” really is does not exist yet. One of the most widely accepted and used definitions is that given by Newman and Girvan (2004):
2 Local Expansion and Op5miza5on -­‐ OSLOM

OSLOM (Order Statistics Local Optimization Method) optimizes locally the statistical significance information of a cluster with respect to random fluctuation with Extreme and Order Statistics. It tests the statistical significance of a cluster with respect to a global null model. It can deal with weighted, directed edges, overlapping communities, hierarchies and dynamic communities. [Lancichinetti, 2011] worst-case complexity: O(n2)n and Op5miza5on -­‐ GCE
• •
GCE (Greedy Clique Expansion)! takes all maximum cliques as initial seeds to greedily expand the fitness function to find overlapping communities. [Lee, 2010] Greedy expansion complexity: O(|E|M), M is the number of cliques to be expanded. merge complexity: O(2(|C1|+|C2|)-1)(not sure) 最大大团问题(Maximum Clique Problem, MCP) NPcomplete
Web b We Web Web


研究背景与研究意义
研究背景: 复杂⺴网网络是复杂系统的抽象,现实中许多复杂系统都可以用用复杂⺴网网络的相关特性进行行 描述和分析。 图,⺴网网络中的节点表示示系统中的个体,边表示示个体之间的关系。 如,社会关系⺴网网络,万维⺴网网,食食物链,基因⺴网网,城市交通⺴网网络,电力力⺴网网,电路⺴网网。 对复杂⺴网网络的研究一一直是许多领域的研究热点,其中社区结构是复杂⺴网网络中得一一个普 遍特征,整个⺴网网络是由许多个社区组成的。

同一一社区内的节点与节点之间的连接很紧密,而而社区与社区之间的连接比比较稀疏。
图片片来源于⺴网网络
图片片来源于⺴网网络
• • • • • •
0 Old Methods! 1 Clique Percolation! 2 Local Expansion and Optimization! 3 Dynamical Algorithm! 4 Label Propagation Algorithm! 5 Other
/assets/publications/mapequationtutorial.pdf
4 Label Propaga5on Algorithm
• •
SLPA! is a general speaker-listener based information propagation process. [Xie, 2012] - set a memory for each node to store history labels - each neighbor of selected node(listener) randomly selects a label with probability proportional to the occurrence frequency of this label in its memory and sends the selected label to the listener - the listener adds the most popular label received to its memory - use threshold r to delete lower frequency seeing labels, and output communities

• •

凯文文⻉贝肯游戏(平均的“⻉贝肯数”是2.981,最大大的也仅仅是 8) Facebook六度分隔理论变为「四度」(4.74,7.21亿)

无无标度⺴网网络 Scale-free

一一个⺴网网络的度分布,是当随机地从⺴网网络中抽取一一个 节点时,与这个节点相连的节点数(叫做这个节点 的度)d 的概率分布。 无无尺度⺴网网络的度分布满足足幂律分布,也就是说d = k 的概率正比比于k 的某个幂次(一一般是负的):

• •
2 Local Expansion and Op5miza5on -­‐ EAGLE
• •
EAGLE! All maximal cliques is as initial communities, merged by maximum similarity -> dendrogram. The optimal cut on the dendrogram is determined by the extended modularity with a weight based on the number of overlapping memberships. [Shen, 2008] Extended Modularity:

• •
2 Local Expansion and Op5miza5on -­‐ LFM
• •
LFM! expands a community from a random seed node to form a natural community until fitness function is locally maximal. [Lancichinetti, 2009, New J. Phys.] fitness function:
社区发现算法工工作简介
!
- 机器学习算法班
@sumnous_t 2014.12.14
主要内容
! !
• •
社区发现算法的发展、简介 我的社区发现算法相关工工作
Web 1.0
Web 2.0
Web 3.0

⺴网网络的社交与信息连接度
WWW-以信息为中心心 Social Web-以人人为中心心 Semantic Web-让机器去理解⺴网网络上一一切数据、信息、内容的含义。
!

• •
where Oi is the number of communities to which node i belongs. O(n2+(h+n)s), where s is the number of maximal cliques, h is the number of pairs of maximal cliques which are neighbors.
• •
Girvan-Newman Algorithm (Betweenness, split) Spectral Method (在同一一个社区内的节点,它在拉普拉斯矩阵中的特征向量近似。将节
点对应的矩阵特征向量(与特征值和特征向量有关的都叫谱)看成空间坐标,将⺴网网络节点映射 到多维向量空间去,然后就可以运用用传统的聚类算法将它们聚集成社团。)
!


O(ncs2), where nc is the number of communities, s is the average size of communities, computation complexity is depended on parameter \alpha. worst-case complexity: O(n2)
! !

3 Dynamical Algorithm
• •
InfoMAP! The map equation framework
!

random walk: optimal compressing the information on the structure of the graph by optimizing a quality function, Minimum Description Length.[Rosvall, 2009]
相关主题