
Data-Driven Approaches for Common Sense Understanding

Yanghua Xiao
Fudan University
Knowledge Works at Fudan

Research Outline

Natural Language Understanding by KG
1. Understanding bag of words (IJCAI 2015)
2. Understanding a set of entities
3. Understanding verb phrase (AAAI 2016)
4. Understanding a concept (IJCAI 2016)
5. Understanding short text (EMNLP 2016)
6. Understanding natural languages (IJCAI 2016, VLDB 2017)

Knowledge Graph Construction
1. IsA taxonomy completion (TKDE 2017)
2. Implicit isA relation inference (AAAI 2017)
3. Error isA correction (AAAI 2017)
4. Cross-lingual type inference (DASFAA 2016)
5. End-to-end knowledge harvesting
6. Domain-specific knowledge harvesting

Knowledgeable Search/Recommendation
1. Recommendation by KG (WWW 2014, DASFAA 2015)
2. User profiling by KG (ICDM 2015, CIKM 2015)
3. Categorization by KG (CIKM 2015)
4. Entity suggestion with conceptual explanation
5. Entity search by long concept query

Big Graph Management
1. Big graph systems (SIGMOD 2012)
2. Overlapping community search (SIGMOD 2013)
3. Local community search (SIGMOD 2014)
4. Big graph partitioning (ICDE 2014)
5. Shortest distance query (VLDB 2014)
6. Fast graph exploration (VLDB 2016)

Graph Analytics
1. Models for symmetry (Physical Review E 2008)
2. Graph simplification (Physical Review E 2008)
3. Complexity/distance measurement (Pattern Recognition 2008, Physica A 2008)
4. Graph index compression (EDBT 2009)
5. Graph anonymization (EDBT 2010)

Knowledge Graph Service

1. CN-DBpedia
CN-DBpedia is an effort to extract structured information from Chinese encyclopedia sites, such as Baidu Baike, and make this information available on the Web. CN-DBpedia allows you to ask sophisticated queries against Chinese encyclopedia sites, and to link the different data sets on the Web to Chinese encyclopedia site data.

2. Probase Plus
Probase is a web-scale taxonomy that contains 10 million concepts/entities and 16 million isA relations. ProbasePlus is an updated taxonomy that has more isA relations, inferred from the original Probase. Both are useful for conceptualization, reasoning, etc.

3. Verb Base
A verb pattern is a probabilistic semantic representation of verbs. We introduce verb patterns to represent verbs' semantics, such that each pattern corresponds to a single sense of the verb. We constructed verb patterns with consideration of their generality and specificity.

Common Sense Understanding

Lily will hold a birthday party. Mary wonders if Lily likes a kite. Mary shakes her piggy bank. There is no sound.

[Figure: a sentence-based HMM conceptualizes the sentences into Person 1, Person 2, Activity, Gift, Money, and Sound, which are linked by common sense relations: Activity requires Gift; Person 1 is a friend of Person 2; Gift requires Money; Money makes Sound.]

• Common sense understanding is critical for language understanding
• Common sense knowledge
  • humans cannot fly
  • the sun rises in the east
  • an object will fall to the ground without any support

Challenge of common sense understanding
• Common sense knowledge is implicit
  • No one will mention it explicitly in texts or other media
• Common sense knowledge is sparse
  • No source to extract from
  • Huge human cost in hand-crafted KBs
• Language understanding relies extensively on common sense knowledge (the iceberg)
• AI-complete

Our new opportunity
• Big data
  • Common sense extraction
  • Common sense inference
  • Common sense determination
  • Common sense reasoning

Inference by Collaborative Filtering
• "car" and "automobile" are synonyms
• They should share hypernyms
• "automobile" should be a "wheelbase vehicle"
• Missing isA relations hurt the understanding of the concepts of entities
• Is Lincoln Zephyr a car?

Solution and Results
• Concepts with similar meanings tend to share hypernyms/hyponyms in an isA taxonomy
• To find missing hypernyms for a concept c
  • First find c's synonyms and siblings
  • Then transport their hypernyms to c

Inference by Transitivity
• We can use transitivity to find many common sense facts
  • Example 1
• But it is not trivial; there are wrong cases
  • Examples 2 and 3
• If we can determine in which cases transitivity holds, we can generate many missing isA relations

[Excerpt from the paper shown on the slide]

... taxonomy is taken for granted, that is, given hyponym(A, B) and hyponym(B, C), we know hyponym(A, C) (Sang 2007), as shown in Example 1. Transitivity is thus one of the cornerstones in knowledge-based inferencing, and many applications rely on transitivity (e.g., finding all the super concepts of an instance).

Example 1. Is Einstein a scientist?
  hyponym(einstein, physicist)
  hyponym(physicist, scientist)
  ⇒ hyponym(einstein, scientist)

Unfortunately, transitivity does not always hold in data-driven lexical taxonomies. Let us consider the following two examples:

Example 2. Is Einstein a profession?
  hyponym(einstein, scientist)
  hyponym(scientist, profession)
  ⇏ hyponym(einstein, profession)

Example 3. Is a car seat a piece of furniture?
  hyponym(car seat, chair)
  hyponym(chair, furniture)
  ⇏ hyponym(car seat, furniture)

It is obvious that Einstein is not a profession. However, in a data-driven lexical taxonomy such as Probase, we have strong evidence that hyponym(einstein, scientist) and hyponym(scientist, profession). If transitivity holds, we will draw a conclusion that conflicts with common sense. As for car seat and furniture, we are trapped in a similar situation. Thus, it is clear that transitivity does not always hold in data-driven lexical taxonomies.
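
To make the pitfall in Examples 1 to 3 concrete, here is a minimal sketch of what happens when transitivity is applied blindly. It assumes a toy edge list built only from the three examples (it is not Probase data), and the helper name `naive_hypernyms` is hypothetical.

```python
from collections import deque

# Toy isA edges copied from Examples 1-3 (illustrative only, not Probase data).
ISA = {
    "einstein": {"physicist", "scientist"},
    "physicist": {"scientist"},
    "scientist": {"profession"},
    "car seat": {"chair"},
    "chair": {"furniture"},
}


def naive_hypernyms(term, isa=ISA):
    """Return every ancestor reachable by chaining isA edges (hypothetical helper)."""
    seen, queue = set(), deque([term])
    while queue:
        node = queue.popleft()
        for parent in isa.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(naive_hypernyms("einstein")))  # ['physicist', 'profession', 'scientist']
print(sorted(naive_hypernyms("car seat")))  # ['chair', 'furniture']
```

The closure recovers the valid fact that Einstein is a scientist, but it also asserts "profession" and "furniture", the two conclusions that conflict with common sense. Deciding which hops may be chained is exactly the open question the slide raises.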
One way out of this dilemma is to enforce word sense disambiguation, just as WordNet does. For example, w...

Common Sense Inference for Long-Tail or Emerging Entities
• Objective: for an entity E, gather knowledge from existing knowledge bases, and enrich it with information from related entities or categories.

[Figure: example entity iPhone 7. PV pairs: Developer = Apple Inc; Type = Device software. Categories: IOS (Apple), IPhone. CSK: None.]

Common sense fact determination
• Goal
  § Can penguin fly?
• Feature 1
  § Model 1: Full Paths (paths with entities): x -[p1]-> c -[p2]-> y
  § Model 2: Meta Paths (paths without entities): x -[p1]-> [p2]-> y
• Feature 2
  § Why can a penguin not fly?
  § If a penguin could fly...

Behavior    #Instances   Model 1   Model 2
fly         103          0.73      0.75
sleep       21           0.6       0.6
die         84           0.9       0.9
eat         36           1         1
burn        32           0.83      0.83
walk        28           0.5       0.5
grow        22           0.67      0.67
run         23           1         1
work        25           1         1
breathe     24           0.75      0.75
swim        26           0.67      0.67
think       25           1         0.75
drive_car   21           1         0
read        18           1         0.33
sing        17           0.5       0.5
SUM/AVG     505          0.81      0.68

Human-like Common Sense Reasoning
• Beijing is the capital of China, which implies that
  • Area: Beijing < China
  • Population: Beijing < China
  • LocatedIn: Beijing LocatedIn China
• We formalize the problem as a classification problem: whether a binary relation holds
• To understand different sentence representations with similar semantics, we leverage an RNN-LSTM model
  • Input layer: context c
  • Embedding layer
  • LSTM layer
  • Output layer: binary relation label (R)

Conclusion
• Common sense knowledge is not mentioned explicitly in texts or other media
• Common sense understanding is critical for language understanding and cognition of the world
• We have a new opportunity because we have big data now
• Data-driven approaches are promising in common sense understanding

Thanks.
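
The two path models on the "Common sense fact determination" slide can be illustrated with a minimal sketch. It assumes the knowledge graph is a list of (head, predicate, tail) triples. The triples, the predicate names, and the helper `two_hop_features` are hypothetical and chosen only for illustration. Model 1 keeps the intermediate entity c, while Model 2 keeps only the predicate sequence.

```python
from collections import defaultdict

# Hypothetical toy triples (head, predicate, tail); not from any real knowledge base.
TRIPLES = [
    ("penguin", "isA", "bird"),
    ("penguin", "isA", "flightless bird"),
    ("sparrow", "isA", "bird"),
    ("bird", "capableOf", "fly"),
    ("flightless bird", "notCapableOf", "fly"),
]


def two_hop_features(x, y, triples=TRIPLES):
    """Enumerate paths x -[p1]-> c -[p2]-> y and return (full_paths, meta_paths)."""
    out_edges = defaultdict(list)
    for head, pred, tail in triples:
        out_edges[head].append((pred, tail))

    full_paths, meta_paths = set(), set()
    for p1, c in out_edges[x]:
        for p2, target in out_edges[c]:
            if target == y:
                full_paths.add((p1, c, p2))  # Model 1: path with the entity c
                meta_paths.add((p1, p2))     # Model 2: path without entities
    return full_paths, meta_paths


full, meta = two_hop_features("penguin", "fly")
print(sorted(full))
print(sorted(meta))
```

The slides do not say how the models consume these features, so the sketch stops at feature extraction.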
