Applying Markov Logics for Controlling Abox Abduction

Vom Promotionsausschuss der Technischen Universität Hamburg-Harburg zur Erlangung des akademischen Grades Doktor-Ingenieurin genehmigte Dissertation

von Dipl.-Ing. Anahita Nafissi, aus Shiraz, Iran

2013

Reviewers: Prof. Dr. Ralf Möller, Prof. Dr. Bernd Neumann
Day of the defense: 10.10.2013

Abstract

Manually annotating multimedia documents is a time-consuming and cost-intensive task. In this work, we define a media interpretation agent for automatically generating annotations for multimedia documents. Observations of the agent are given as surface-level information extracted by state-of-the-art media analysis tools. Based on background knowledge, the agent interprets observations by computing high-level explanations. Observations and their explanations constitute the annotations of a media document. For this purpose, we investigate an abduction algorithm which computes explanations using a logic-based knowledge representation formalism. Multiple explanations might be possible for certain media content. Since the agent's resources for computing explanations are limited, we need to control the abduction procedure in terms of branching of the computation process and "depth" of computed results, while still producing acceptable annotations. To control the abduction procedure, we employ a first-order probabilistic formalism.

Kurzfassung

Die manuelle Erstellung von Anmerkungen zu Multimedia-Dokumenten ist eine zeit- und kostenintensive Aufgabe. In dieser Arbeit definieren wir einen Agenten zur Medieninterpretation, der automatisch Anmerkungen zu Multimedia-Dokumenten generiert. Die Beobachtungen des Agenten werden durch oberflächliche Informationen gegeben, die mit Hilfe von modernen Medien-Analyse-Werkzeugen extrahiert wurden. Auf der Basis von Hintergrundwissen interpretiert der Agent diese Beobachtungen durch die Herleitung von Erklärungen auf einer höheren Ebene. Beobachtungen und ihre Erklärungen bilden die Anmerkungen zu einem Multimedia-Dokument. Zu diesem Zweck untersuchen wir einen Abduktionsalgorithmus, der Erklärungen durch die Verwendung eines logikbasierten Wissensrepräsentationsformalismus bestimmt. Für bestimmte Medieninhalte können auch mehrere Erklärungen möglich sein. Da dem Agenten zur Bestimmung der Erklärungen nur begrenzte Ressourcen zur Verfügung stehen, müssen wir die Abduktionsprozedur in Bezug auf die Verzweigung des Rechenverfahrens und die Tiefe der berechneten Ergebnisse beschränken, wobei immer noch akzeptable Anmerkungen generiert werden sollen. Um die Abduktionsprozedur zu beschränken, benutzen wir einen probabilistischen Formalismus erster Ordnung.

To my dear parents

Acknowledgements

It is a pleasure to pay tribute to those who made the completion of this thesis possible: I would like to express my deep and sincere gratitude to my supervisor Prof. Dr. Ralf Möller for giving me the opportunity to do research in this exciting field and for supporting me throughout this work. I would also like to thank Prof. Dr. Bernd Neumann for reviewing my thesis. Special thanks go to Maurice Rosenfeld and Björn Hass for evaluating the results. I am also very grateful to my colleagues at the Institute for Software Systems: Dr. Michael Wessel, Dr. Atila Kaya, Oliver Gries, Kamil Sokolski, Dr. Sofia Espinosa, Dr. Özgür Özçep, Tobias Näth, and Karsten Martiny. I would also like to thank all colleagues in the CASAM and PRESINT projects.
Last but not least, sincere thanks to my family whose support, patience and encouragement enabled me to complete this thesis.

Contents

1 Introduction
1.1 Motivation and Objectives
1.2 Contributions
1.3 Outline of the Dissertation

2 Preliminaries
2.1 Description Logics, Queries, and Rules
2.1.1 Syntax, Semantics, and Decision Problems
2.1.2 Extensions: Queries, and Rules
2.2 Probabilistic Representation Formalisms
2.2.1 Basic Notions of Graph and Probability Theory
2.2.2 Bayesian Networks
2.2.3 Markov Logic Networks

3 Abduction for Data Interpretation: Related Work
3.1 Perception as Abduction: The General Picture
3.2 Probabilistic Horn Abduction and Independent Choice Logic
3.3 Computing Explanations as Markov Logic Inference
3.4 Preference Models Using Bayesian Compositional Hierarchies
3.5 DLLP-Abduction Based Interpretation
3.5.1 DLLP Abduction
3.5.2 Ranking Simple Explanations
3.6 The Need for Well-Founded Control for Sequential Abduction Steps

4 Probabilistic Control for DLLP-Abduction Based Interpretation
4.1 Abduction-based Interpretation Inside an Agent
4.2 Controlling the Interpretation Process
4.2.1 Controlling Branching
4.2.2 Controlling Abduction Depth
4.2.3 Controlling Reactivity
4.3 Comparison with Independent Choice Logic
4.4 Conversion of the Knowledge Base into ML Notation

5 Evaluation
5.1 Optimization Techniques
5.2 Case Study: CASAM
5.3 Hypotheses and Results
5.4 Quality of the Interpretation Results

6 Conclusions

A Alchemy Knowledge Representation System and Language
A.1 Alchemy Knowledge Representation Language
A.2 Interfaces to Alchemy
A.3 Inference Services
A.4 Peculiarities of the Inference Engine Alchemy

B RacerPro and its Module for DLLP Abduction

C The DLLP-Abduction Based Interpretation System

Bibliography

Chapter 1

Introduction

1.1 Motivation and Objectives

The world wide web contains a huge amount of documents, and their number grows rapidly over time. To simplify retrieval processes for human users, in recent years several content-based search engines have been developed.
Despite the fact that many irrelevant documents are shown, these engines are successful because humans can quite easily eliminate irrelevant entries in the retrieval result set. Nevertheless, contentbased retrieval is pushed to its limits. To further improve retrieval results, in particular also for automatic processes accessing multimedia documents, the semantics of the multimedia documents must be considered by retrieval processes. This is the issue discussed in the field of the semantic web. According to the idea of the semantic web, multimedia documents are analyzed by state-of-the-art analysis tools and subsequently are automatically interpreted in order to produce annotations to be attached to media documents. Semantics-based retrieval of multimedia documents is then based on these annotations. In previous work, interpretation of multimedia documents has been formalized by defining an interpretation agent which, in order to compute interpretations, uses an abduction algorithm for computing explanations for “observations” [MN08, EKM09]. Observations are seen as surface-level information extracted by state-of-the-art analysis tools. Using the abduction algorithm, the agent interprets observations by computing high-level explanations for (lower-level) observations. Observations and their explanations constitute the annotations of a media document. Usually, multiple explanations might be possible for certain media content. As a knowledge representation language a combination of description logic [BCM+ 03] and Horn logic [Llo87] has been used for defining the space of possible interpretations of media content in an operational way as in [MN08, Kay11, Esp11]. Although declarative abduction approaches have been proposed [DK02, LY02], fixing the number of ground literals that can be abduced, which is common to these approaches, is seen as a limitation because guessing ground literals in beforehand is considered to be hardly possible in our media interpretation context. 1 1.2 Contributions Since the agent’s resources for computing explanations are limited, we need to control the abduction procedure in terms of branching limitations as well as depth restrictions, while still producing acceptable annotations. The general procedure for information processing is an implementation of the architecture described in Robert Kowalski’s influential book on Computational Logic and Human Thinking [Kow11]. The focus of the work discussed in this doctoral thesis is on how to control abduction as part of Kowalski’s framework. In particular, besides computing new assertions about the world that explain the observations (or help realizing achievement goals in Kowalski’s framework), the main idea about why an agent should actually apply abduction is to reduce uncertainty about incoming observations (or about the realizability of achievement goals), and newly incoming assertions representing new knowledge available to the agent should increase uncertainty again, which, in turn, is then reduced by abductive reasoning [FK00, Pau93]. In this spirit, data interpretation, is seen as reduction of uncertainty. In this work, we investigate a first-order probabilistic formalism, Markov logic [DR07], for controlling abduction as an inference task used to formalize data interpretation. Note that Kowalski [Kow11] does not consider uncertainty in this way. 
We investigate how the interpretation process can deal with input data that are uncertain and may be even inconsistent, and explore how ranking of possible interpretations can be accomplished in terms of a probabilistic scoring function. Extending the approach described in [Kay11], in this work we would like to increase the expressivity of the knowledge representation language by supporting recursive Horn rules. Furthermore, we investigate ways to incrementally process input data, thereby coming up with interpretation results as early as possible. The main requirement of this work is that by explaining the observations successively, the ranks of the interpretation alternatives increase monotonically. Consequently, the idea is that interpretation increases the belief of an agent in its observations, and thus, interpretation provides a better basis for computing appropriate actions. 1.2 Contributions The scientific contributions of the thesis are summarized as follows: the thesis provides a formalization of first-order abduction-based media interpretation in terms of a probabilistic ranking principle for interpretation alternatives. In this context, interpretation is formalized using a set of weighted Horn rules in combination with a description logic knowledge base. The semantics for weights is defined in terms of Markov logic [DR07]. Interpretations are defined as explanations of ground formulas modeling observations of an agent. Explanations (and observations) are stored as annotations for media documents in order to better support semantics-based retrieval. The task of the interpretation engine is to generate deep-level information of multimedia documents according to the given surface-level information. The surface-level information extracted by state-of2 Introduction the-art analysis tools is supplied as incremental input to the probabilistic interpretation engine. We present solutions for branching and depth control problems in abductionbased interpretation engines, and we evaluate our probabilistic interpretation engine in a practical scenario. Besides media interpretation, the thesis also contributes to the understanding of software agents which incrementally process multimodal data in an incremental way. We propose the use of Markov logic [DR07] to define the motivation for the agent to generate interpretations, namely to reduce uncertainty of its observations. This principle can be seen as an implicit utility measure built into the agent’s architecture. Given a set of observations about which the agent might be uncertain or which might be even inconsistent given the agent’s knowledge base, we investigate how Markov logic can help to interpret the observations based on a maximal subset of observations that the agent assumes most-probably to be true. Interpretation of the agent’s observations is formalized as a first-order abduction process for computing explanations based on a set of interpretation rules (weighted Horn formulas) and an ontology (description logic axioms). Using a Markov logic based ranking principle, the agent architecture supports incremental processing of new observations such that by generating interpretations (explanations) over time the agent reduces its uncertainty, until then new observations are provided as input and the process starts over again. As new data becomes available, running interpretation processes are possibly be interrupted such that the agent can focus on most recent observations. 
This work has been supported by the European Commission as part of the CASAM project (Contract FP7-217061) and by the German Science Foundation as part of the PRESINT project (DFG MO 801/1-1). The following papers have been published up to now as part of this thesis: • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, K. Sokolski, and M. Wessel. A Probabilistic Abduction Engine for Media Interpretation based on Ontologies. In Pascal Hitzler and Thomas Lukasiewicz, editors, Proceedings of 4th International Conference on Web Reasoning and Rule Systems (RR-2010), September 2010. • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, K. Sokolski, and M. Wessel. A Probabilistic Abduction Engine for Media Interpretation based on Ontologies. In Thomas Lukasiewicz, Rafael Penaloza, and Anni-Yasmin Turhan, editors, Proceedings of International Workshop on Uncertainty in Description Logics (UnIDL2010), 2010. Project deliverable to which thesis has contributed have been submitted to the European Commission: • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, and K. Sokolski. Formalisms Supporting First-order Probabilistic Structures. CASAM project deliverable D3.1. Institute for Software Systems (STS), Hamburg University of Technology, 2008. 3 1.3 Outline of the Dissertation • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, and K. Sokolski. CASAM Domain Ontology. CASAM project deliverable D6.2. Institute for Software Systems (STS), Hamburg University of Technology, 2009. • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, K. Sokolski, and M. Wessel. Basic Reasoning Engine: Report on Optimization Techniques for First-Order Probabilistic Reasoning. CASAM project deliverable D3.2. Institute for Software Systems (STS), Hamburg University of Technology, 2009. • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, K. Sokolski, and M. Wessel. Probabilistic Abduction Engine: Report on Algorithms and the Optimization Techniques used in the Implementation. CASAM project deliverable D3.3. Institute for Software Systems (STS), Hamburg University of Technology, 2010. • O. Gries, R. Möller, A. Nafissi, M. Rosenfeld, K. Sokolski, and M. Wessel. MetaLevel Reasoning Engine, Report on Meta-Level Reasoning for Disambiguation and Preference Elicitation. CASAM project deliverable D3.4. Institute for Software Systems (STS), Hamburg University of Technology, 2010. 1.3 Outline of the Dissertation The remaining parts of the thesis are structured as follows. Chapter 2 explains the syntax and semantics of the Description Logic language ALHf − [BCM+ 03] as well as the signature of the knowledge base used in this work. Basic definitions of the probability theory used in this thesis are introduced. Moreover, probabilistic formalisms applied in this work are described and examples are given. In Chapter 3, we discuss related work on (probabilistic) abduction. Chapter 4 presents the probabilistic interpretation engine. We discuss how the interpretation procedure is performed, possibly based on inconsistency and uncertainty in the input data. We also introduce an approach for probabilistically ranking interpretation alternatives. For this purpose, we define a probabilistic scoring function according to the Markov logic formalism [DR07]. Furthermore, we present a media interpretation agent which generates annotations for multimedia documents. We define an approach for improving abduction in terms of branching and depth control. 
Chapter 5 evaluates the proposed approach in a practical scenario, showing that abduction control based on probabilistic ranking interpretations results in plausible agent behavior in general, and provides for sensible media annotations to be generated in the particular application scenario. The last chapter, Chapter 6, summarizes this thesis and provides an outlook to the future work. 4 Chapter 2 Preliminaries In this chapter, we explain the syntax and semantics of the description logic knowledge representation language ALHf − . We also define basic definitions of probability theory and describe probabilistic formalisms applied in this work. 2.1 Description Logics, Queries, and Rules Description logics (DLs) [BCM+ 03] are defined as decidable fragments of first-order logic [RN03] with varying levels of expressivity. DLs provide well understood means to establish ontologies. Therefore they are also used as representation languages for the semantic web [BHS05]. Section 2.1 is mostly taken from [GMN+ 10b, GMN+ 10c, EKM09]. For specifying the ontology used to describe surface-level analysis results as well as deep-level interpretation results, in this work a less expressive description logic is applied to facilitate fast computations for typical-case inputs. We decided to represent the domain knowledge with the DL ALHf − (a restricted attributive concept language with role hierarchies, functional roles and a string-based concrete domain) [BCM+ 03]. The motivation for supporting only restricted existential restrictions is to provide a well-founded integration of the description logic part of the knowledge base with the probabilistic part (based on Markov logic networks [DR07], see Section 2.2.3) in the context of abduction using Horn rules (see below for details). In addition, the description logic inference system used in this thesis (RacerPro [HM01]) is known to empirically perform well for inference problems w.r.t. this fragment. 2.1.1 Syntax, Semantics, and Decision Problems In logic-based approaches, atomic representation units have to be specified. The atomic representation units are fixed using a so-called signature. A DL signature is a tuple S = (CN, RN, IN), where CN = {A1 , ..., An } is the set of concept names (denoting sets of domain objects) and RN = {R1 , ..., Rm } is the set of role names (denoting relations 5 2.1 Description Logics, Queries, and Rules between domain objects). The signature also contains a component IN indicating a set of individuals (names for domain objects). At this point, we present the signature S whose notions will be used later as an ontology in several examples. The set of the concept names CN is given as follows: CN = {CarEntry, CarExit, EnvConference, EnvP rot, M ovement, HumanHealth, Car, DoorSlam, Building, V ehicle, Audio, Applause, Rain, F lood, AirPurification, WaterPollution, AirP ollution, NoisePollution, TrafficJam, OilPollution, CityW ithAirP ollution, Env, EnvWorkshop, Energy, W inds, CityWithIndustry, CityWithTrafficJam, RenewableEnergy} where EnvConference, EnvP rot, Env, and EnvWorkshop indicate environmental conference, environmental protection, environment, and environmental workshop, respectively. Additionally, the set of the role names RN and the set of the individual names IN (also called individuals for brevity) are represented as follows: RN = {causes, hasObject, hasEffect, occursAt, hasT opic, adjacent, hasEvent, hasT heme, hasSubEvent, hasP art, hasLocation, EnergyT oW inds} IN = {c1 , c2 , ds1 , ds2 , ind42 , . . . 
, ind45 , hamburg, berlin} Just a fragment of the signature can be listed here for reasons of brevity. In order to relate concept names and role names to each other (terminological knowledge) and to talk about specific individuals (assertional knowledge), a knowledge base has to be specified. An ALHf − knowledge base ΣS = (T , A), defined with respect to a signature S, comprises a terminological component T (called Tbox ) and an assertional component A (called Abox ). In the following, we just write Σ if the signature is clear from context. A Tbox is a set of so-called axioms, which are restricted to the following form in ALHf − : 1. Subsumption A1 v A2 , R1 v R2 2. Disjointness A1 v ¬A2 3. Domain and range restrictions for roles ∃R.> v A, > v ∀R.A 4. Functional restriction on roles > v (≤ 1 R) 5. Local range restrictions for roles A1 v ∀R.A2 6. Definitions with value restrictions A ≡ A0 u ∀R1 .A1 u ... u ∀Rn .An 6 Preliminaries With axioms of the form shown in Item 1, concept (role) names can be declared to be subconcepts (subroles) of each other. An example for a concept subsumption axiom is: carEntry v M ovement With axioms of the form shown in Item 6., so-called definitions (with necessary and sufficient conditions) can be specified for concept names found on the left-hand side of the ≡ sign. In axioms, so-called complex concepts are used. Complex concepts are concept names or expressions of the form > (anything), ⊥ (nothing), ¬A (atomic negation), (≤ 1 R) (role functionality), ∃R.> (limited existential restriction), ∀R.A (value restriction) and (C1 u ... u Cn ) (concept conjunction) with Ci being complex concepts. Concept names and complex concepts are also called concepts, and role names are also called roles for brevity. At this point, a Tbox T is given which, among subsumption axioms, contains disjointness axioms as well as domain and range restrictions. This Tbox will be applied later in this work: Car v V ehicle DoorSlam v Audio CarEntry v M ovement CarExit v M ovement Car v ¬DoorSlam Car v ¬Audio V ehicle v ¬DoorSlam V ehicle v ¬Audio CarEntry v ¬CarExit ∃causes.> v Car > v ∀causes.DoorSlam Table 2.1: An example for a Tbox T Knowledge about individuals is represented in the Abox part of Σ. An Abox A is a set of expressions of the form A(a) or R(a, b) (concept assertions and role assertions, respectively) where A stands for a concept name, R stands for a role name, and a, b stand for individuals. Aboxes can also contain equality (a = b) and inequality assertions (a 6= b). We say that the unique name assumption (UNA) is applied, if a 6= b is assumed for all pairs of individuals a and b. In the following, an example for an Abox is given: 7 2.1 Description Logics, Queries, and Rules Car(c1 ) DoorSlam(ds1 ) causes(c1 , ds1 ) CarEntry(ind42 ) hasObject(ind42 , c1 ) hasEffect(ind42 , ds1 ) Car(c2 ) DoorSlam(ds2 ) causes(c2 , ds2 ) CarExit(ind43 ) hasObject(ind43 , c2 ) hasEffect(ind43 , ds2 ) Table 2.2: An example for an Abox A In order to understand the notion of logical entailment, we introduce the semantics of ALHf − . In DLs such as ALHf − , the semantics is defined with interpretations I = (4I , ·I ), where 4I is a non-empty countable set of domain objects (called the domain of I) and ·I is an interpretation function which maps individuals to objects of the domain (aI ∈ 4I ), atomic concepts to subsets of the domain (AI ⊆ 4I ) and roles to subsets of the cartesian product of the domain (RI ⊆ 4I × 4I ). 
The interpretation of arbitrary ALHf − concepts is then defined by extending ·I to all ALHf − concept constructors: >I ⊥I (¬A)I (≤ 1 R)I (∃R.>)I (∀R.C)I (C1 u ... u Cn )I = = = = = = = 4I ∅ 4I \ AI {u ∈ 4I | (∀v1 , v2 ) [((u, v1 ) ∈ RI ∧ (u, v2 ) ∈ RI ) → v1 = v2 ] {u ∈ 4I | (∃v) [(u, v) ∈ RI ]} {u ∈ 4I | (∀v) [(u, v) ∈ RI → v ∈ C I ]} C1I ∩ ... ∩ CnI A concept C is satisfied by an interpretation I if C I 6= ∅, analogously for roles. I is called a model for C in this case. A concept inclusion C v D (concept definition C ≡ D) is satisfied in I, if C I ⊆ DI (resp. C I = DI ) and a role inclusion R v S (role definition R ≡ S), if RI ⊆ S I (resp. RI = S I ). Similarly, assertions C(a) and R(a, b) are satisfied in I, if aI ∈ C I resp. (a, b)I ∈ RI . If an interpretation I satisfies all axioms of T resp. assertions in A, it is called a model of T resp. A. If it satisfies both T and A, it is called a model of Σ. Finally, if there is a model of Σ (i.e., a model for T and A), then Σ is called satisfiable. We are now able to define the entailment relation |=. A DL knowledge base Σ logically entails an assertion α (symbolically Σ |= α) if α is satisfied by all models of 8 Preliminaries Σ. For an Abox A, we say Σ |= A if Σ |= α for all α ∈ A. If α is of the form C(i) then i is said to be an instance of C. Closed-World Assumption (CWA): Databases employ the closed-world assumption where only the positive facts are given and consequently, negative facts are implicit. In addition, based on the closed-world-assumption, a database is assumed to be complete with respect to the constants of the domain [Rei87]. The same idea can also be applied to knowledge bases. Let us consider a knowledge base Σ. An assertion ¬C(i) is considered as satisfied in all models if C(i) is not entailed by Σ. Σ |=CWA ¬C(i)if Σ 2CWA C(i) (2.1) Thus, i is found to be an instance of ¬C under the closed-world assumption. This is relevant for query answering, which is discussed in the next subsection. Before discussing this, we mention the open-world assumption. Open-World Assumption (OWA): Description logics employ the open-world assumption [BCM+ 03]: Given a knowledge base Σ, answering queries in description logics only involves axioms/assertions which are entailed by Σ, and the number of domain objects is unbounded. The open-world assumption is useful in order to avoid problems of non-monotonicity in description logics. 2.1.2 Extensions: Queries, and Rules In order to define queries and rules, a few definitions are required. Section 2.1.2 is mostly taken from [EKM09]. A variable is a string of characters of the form {A. . .Z}{a. . .z}∗ . In the following definitions, we denote variables or places where variables can appear with uppercase letters. Let V be a set of variables, and let X, Y1 , . . . , Yn be sequences h. . .i of variables from V . The notation z denotes a sequence of individuals. We consider sequences of length 1 or 2 only if not indicated otherwise, and assume that (hXi) is to be read as (X) and (hX, Y i) is to be read as (X, Y ) etc. Furthermore, we assume that sequences are automatically flattened. A function as set turns a sequence into a set in the obvious way. A variable substitution σ = [X ← i, Y ← j, . . .] w.r.t. a specified Abox is a mapping from variables to individuals mentioned in the Abox. The application of a variable substitution σ to a sequence of variables hXi or hX, Y i is defined as hσ(X)i or hσ(X), σ(Y )i, respectively, with σ(X) = i and σ(Y ) = j. In this case, a sequence of individuals is derived. 
If a substitution is applied to a variable X for which there is no mapping X ← k in σ then the result is undefined. A substitution for which all required mappings are defined is called admissible (w.r.t. the context). 9 2.1 Description Logics, Queries, and Rules Grounded Conjunctive Queries Let X, Y1 , . . . , Yn be sequences of variables, and let Q1 , . . . , Qn denote concept or role names. A query is defined by the following syntax: {(X) | Q1 (Y1 ), . . . , Qn (Yn )}. The sequence X may be of arbitrary length, but all variables mentioned in X must also appear in at least one of the Y1 , · · · , Yn : as set(X) ⊆ as set(Y1 ) ∪ · · · ∪ as set(Yn ). Informally speaking, Q1 (Y1 ), . . . , Qn (Yn ) defines a conjunction of so-called query atoms Qi (Yi ). The list of variables to the left of the sign | is called the head and the atoms to the right are called the query body. Answering a query with respect to a knowledge base Σ means finding admissible variable substitutions σ such that Σ |= {σ(Q1 (Y1 )), . . . , σ(Qn (Yn ))} We say that a variable substitution σ = [X ← i, Y ← j, . . .] introduces bindings i, j, . . . for variables X, Y, . . .. Substitutions are defined w.r.t. a specified Abox. Given all possible variable substitutions σ, the result of a query is defined as {(z) | z = σ(X)}. Note that variables range over named domain objects only, and thus the queries used here are called grounded conjunctive queries. Let us consider Abox A given in Table 2.2. Additionally, let us assume the next conjunctive query which is used to retrieve the causes relation among Car and DoorSlam instances: q1 = {(X, Y ) | Car(X), causes(X, Y ), DoorSlam(Y )} The answer to the query q1 is: {(c1 , ds1 ), (c2 , ds2 )} For answering the query q1 , we have the following substitutions which define the bindings for X, and Y : X ← c1 , Y ← ds1 X ← c2 , Y ← ds2 A Boolean query is a query with X being of length zero. If for a Boolean query there is a variable substitution σ such that Σ |= {σ(Q1 (Y1 )), . . . , σ(Qn (Yn ))} holds, we say that the query is answered with true, otherwise the answer is false. Let us consider again Abox A given in Table 2.2. An example for a Boolean query is: q2 = {() | CarEntry(X), hasEffect(X, Y ), DoorSlam(Y )} For answering the query q2 , we have the following substitutions which define the bindings for X, and Y : X ← ind42 , Y ← ds1 10 Preliminaries Since for the query q2 the variable substitution σ exists, the query is answered with true. Later on, we will have to convert query atoms into Abox assertions. This is done with the function transform. The function transform applied to a set of query atoms {γ1 , . . . γn } is defined as: {transform(γ1 , σ), . . . , transform(γn , σ)} where transform(P (X), σ) := P (σ(X)). Answering grounded conjunctive queries is supported by DL reasoners, e.g., RacerPro [HM01], KAON2 [HMS04], and Pellet [SP06]. Rules A rule r has the following form P (X) ← Q1 (Y1 ), . . . , Qn (Yn ) (2.2) where P, Q1 , . . . , Qn denote concept or role names with the additional restriction (safety condition) that as set(X) ⊆ as set(Y1 ) ∪ · · · ∪ as set(Yn ). Rules are used to derive new Abox assertions, and we say that a rule r is applied to an Abox A. Let S be the set of substitutions σ such that the answer to the query {() | Q1 (σ(Y1 )), . . . , Qn (σ(Yn ))} is true with respect to Σ∪A.1 The function call apply(Σ, P (X) ← Q1 (Y1 ), . . . , Qn (Yn ), A) returns a set of Abox assertions [ {σ(P (X))} σ∈S for all σ ∈ S. The application of a set of rules R = {r1 , . . . 
rn } to an Abox is defined as follows: [ apply(Σ, R, A) = apply(Σ, r, A) r∈R The result of forward chain(Σ, R, A) is defined to be ∅ if apply(Σ, R, A) ∪ A = A holds. Otherwise the result of forward chain is determined by the recursive call apply(Σ, R, A) ∪ forward chain(Σ, R, A ∪ apply(Σ, R, A)). For some set of rules R we extend the entailment relation by specifying: (T , A) |=R A0 iff (T , A ∪ forward chain((T , ∅), R, A)) |= A0 1 (2.3) We slightly misuse notation in assuming (T , A) ∪ ∆ = (T , A ∪ ∆). If Σ ∪ A is inconsistent the result is well-defined but useless. 11 2.2 Probabilistic Representation Formalisms 2.2 Probabilistic Representation Formalisms 2.2.1 Basic Notions of Graph and Probability Theory In this section, we present basic notions of graph and probability theory in order to fix the nomenclature required to explain probabilistic knowledge representation formalisms based on graphs. The following definitions of graphs are based on the notations used in [Kna11]. Section 2.2.1 is taken from [GMN+ 08, GMN+ 09a, GMN+ 10c]. Graph: A graph G = (V , E ) is composed of a finite set of vertices V and a finite set of edges E ⊆ V × V . Thus, E = {(vi , vj ) | for some vi , vj ∈ V }, representing the fact that there are edges from vi to vj , respectively. G is said to be directed (Figure 2.1a), if for each vi , vj it holds that (vi , vj ) ∈ E does not imply (vj , vi ) ∈ E and, analogously, G is said to be undirected (Figure 2.1b), if for each vi , vj it holds that (vi , vj ) ∈ E implies (vj , vi ) ∈ E . Directed edges are depicted with arrows and undirected edges with simple lines. Figure 2.1a shows an example of a directed graph and its undirected counterpart. V = {v0 , . . . , v5 }, E = {(v0 , v2 ), (v1 , v0 ), (v1 , v3 ), (v2 , v4 ), (v2 , v5 ), (v4 , v1 )} for the directed graph, and the set of edges for the corresponding undirected graph results by adding all tuples to E needed in order to ensure symmetry). v0 v1 v3 v0 v2 v4 v1 v5 v3 (a) Directed graph v2 v4 v5 (b) Undirected graph Figure 2.1: Example of a directed and an undirected graph A path P in a directed graph G = (V, E) is a sequence v0 , ; v1 ; . . . ; vn of vertices from V where (vi , vi+1 ) ∈ E for all 0 ≤ i < n. An example for a path in Figure 2.1a is v0 ; v2 ; v4 ; v1 ; v3 . A cycle C is a path P = vi ; vj ; . . . ; vi which begins and ends with the same vertex. In Figure 2.1 in both graphs there is the cycle C1 = v0 ; v2 ; v4 ; v1 ; v0 . In the undirected graph there is the additional cycle C2 = v0 ; v1 ; v4 ; v2 ; v0 . A graph G is called cyclic if it contains at least one cycle, otherwise it is called acyclic. A function P arents : V → V maps a vertex v ∈ V to a set W = {w | (w, v) ∈ E}. Clearly W ⊆ V . In Figure 2.1a, P arents(v2 ) = {v0 }. 12 Preliminaries The basic notion of probabilistic knowledge representation formalisms is the socalled random experiment. In the following, we define the basic probability notations based on the notation used in [RN03]. Random Variable: A random variable X is a function assigning a value to the result of a random experiment. A random experiment is represented by a so-called sample space. Random variables are functions without arguments, and they return different values at different points of time. There are two types of random variables: discrete and continuous ones. Unlike continuous random variables, which provide a map to real numbers, discrete random variables provide a map to a finite number of distinct values. In this work, we consider only discrete random variables. 
Possible values of a random variable comprise the so-called domain of the random variable. A random variable X is called binary if dom(X) contains 2 values. It is called Boolean if dom(X) = {true, false}. Otherwise, it is called a multi-valued random variable. For instance, the domain of a discrete random variable W aterP ollution has the following values which indicate the water pollution level: dom(W aterP ollution) = {low, medium, high}. Event: An event is a set of possible outcomes of a random experiment, i.e., a subset ~ = {X1 , ..., Xn } be the ordered set of all random variables of the sample space. Let X ~ = ~x, is an assignment, i.e., of a random experiment. An atomic event, denoted by X a mapping definition, X1 = x1 , ..., Xn = xn for all random variables. Atomic events represent single outcomes for each random variable {X1 , ..., Xn }, and are also called (possible) worlds. For example, an atomic event from the above example using only the random variable W aterP ollution is W aterP ollution = low. With the obvious meaning (complex) events are denoted using propositional logic formulae involving multiple atomic events, possibly for different random variables. In case of an event with a Boolean random variable X, we write x as an abbreviation for X = true and ¬x as an abbreviation for X = false. Probability: A possible world can be associated with a probability value or probability for short. A (prior) probability or unconditional probability of an event X = x is the chance that random variable X takes value x in a random experiment. The probability value of an event is denoted as P (event) Mappings of events to probabilities (or assignment of probabilities to events) are specified with so-called probability assertions using the syntax P (event) = p where p is a real value between 0 and 1. Probability values must be assigned such that the Kolmogorov axioms [Kol50] hold. For instance, the probability of the event 13 2.2 Probabilistic Representation Formalisms X = x is denoted as P (X = x). Analogously for complex events. We use the term prior probability if no other information about random variable X is given. A set of probabilistic assertions is called a probabilistic knowledge base (with signa~ if X ~ contains all random variables mentioned in the assertions). For example, ture X P (OilP ollution = true) = 0.1 indicates that the probability of the event OilP ollution = true is 0.1. Probability Distribution: A total mapping from the domain of a random variable X to probability values [0, 1] is called a distribution. For distributions we use the notation P(X) or P(X1 , . . . , Xn ) if distributions are to be denoted for (ordered) sets of random variables. For specifying a distribution, probability assertions for all domain values must be specified, and the values pi (1 ≤ i ≤ m) with m = |dom(X)| must sum up to 1. P(X) = hp1 , . . . , pm iT For joint distributions, we have the following P(X1 , . . . , Xn ) = hp1 , . . . , pm iT with m = |dom(X1 ) × . . . × dom(Xn )| in this case. Let us consider an example. Assume the probability distribution: P(W aterP ollution) = h0.5, 0.3, 0.2i This means, the associated probabilities to the events are, respectively: P (W aterP ollution = low) = 0.5 P (W aterP ollution = medium) = 0.3 P (W aterP ollution = high) = 0.2 In other words, the latter three probability assertions state the same knowledge. 
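As a small illustration (our own, not part of the thesis), the distribution form and the individual probability assertions can be represented in a few lines of Python and checked against the Kolmogorov requirement that the values sum to 1:

```python
# Illustrative sketch (not from the thesis): a discrete distribution as a mapping
# from domain values to probabilities, with a basic Kolmogorov-axiom check.
from math import isclose

# P(WaterPollution) = <0.5, 0.3, 0.2> over dom(WaterPollution) = {low, medium, high}
water_pollution = {"low": 0.5, "medium": 0.3, "high": 0.2}

# All probabilities must lie in [0, 1] and sum to 1.
assert all(0.0 <= p <= 1.0 for p in water_pollution.values())
assert isclose(sum(water_pollution.values()), 1.0)

# The distribution form and the three probability assertions state the same knowledge:
for value, p in water_pollution.items():
    print(f"P(WaterPollution = {value}) = {p}")
```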
Full Joint Probability Distribution: In case all random variables of a random experiment are involved in a distribution, we speak of a full joint probability distribution (JPD), otherwise the expression is said to denote a joint distribution or a marginal distribution (projection of the n-dimensional space of probability values to a lower-dimensional space with m dimensions). Let us assume n random variables X1 , . . . , Xn are specified in the signature of a probabilistic knowledge base. Consequently P(X1 , . . . , Xn ) is a full joint probability distribution. The expression P(X1 , . . . , Xm , Xm+1 = xm+1 , . . . , Xl = xl ) denotes an m-dimensional distribution with known values xm+1 , . . . , xl for random vari~ = ~e or even just ~e ables Xm+1 . . . Xl . In slight misuse of notation, we sometimes write E for these events (e stands for evidence). The fragment ~e need not necessarily be written at the end in the parameter list of P. 14 Preliminaries Conditional Probability: A conditional probability or posterior probability P (X = x|Y = y) is the probability of event X = x under the condition that event Y = y might take place P (Y = y) > 0. This is defined as follows: P (X = x|Y = y) = P (X = x ∧ Y = y) P (Y = y) (2.4) In distribution form we have P(X, Y ) P(Y ) P(X|Y ) = where a faction of distributions is to be read as a vector of fractions for all values from dom(X) × dom(X) computed as indicated above. The distribution is called conditional probability distribution. Continuing the environmental example, we consider the probability of F lood = true given Rain = true, which is indicated by: P (F lood = true|Rain = true) where F lood and Rain are binary random variables. This probability is determined by P (F lood = true|Rain = true) = P (F lood = true ∧ Rain = true) P (Rain = true) Combined forms are also possible for conditional probabilities: P(X1 , ..., Xm | ~e) is defined as: ~ ~e) P(X, (2.5) P (~e) The semantics and algebra of these expressions should be obvious given what is explained above. Probabilistic Inference Problems and their Algorithms: For a probabilistic knowledge base, formal inference problems are defined. We restrict our attention to the conditional probability query. A conditional probability query is a denotation for the joint distribution of a set of m random variables out of a set X of n random variables conditioned on ~e and is denoted with PX (x1 ∧ . . . ∧ xm | ~e) where vars(x1 , . . . , xm ) ∩ vars(~e) = ∅ and vars(x1 , . . . , xm ) ∪ vars(~e) ⊆ X with vars specified in the obvious way. In this context xi indicates Xi = xi . We also have the distribution form of the above query: PX (X1 , . . . , Xm | ~e). If the set of random variables X is known from the context, the subscript X is often omitted. For solving problems of this kind, mostly two approaches are discussed in the literature (e.g. [RN03]), exact inference and approximate inference. Exact algorithms for solving this problem work by summing out all hidden random variables [RN03]. Therefore, exact inference is known to be highly intractable even in 15 2.2 Probabilistic Representation Formalisms the case of large numbers of independence assumptions (see the next section). Since probability distributions applied to problems in the real world can be very complex, with probabilities varying greatly over a high-dimensional space, there may be no way to sensibly characterize such distributions analytically [Nea93]. 
Thus, the combinatorial combination of probability values provides for long runtimes in practice. With sampling algorithms it is possible to mitigate these kinds of problems. Instead of summing out all hidden random variables, the primitive element in any sampling algorithm is the generation of samples from a known probability distribution [RN03]. For example, instead of computing all possible outcomes of a complex experiment with coin throws, the idea is to ”flip” the coin itself a number of times. Even with sampling algorithms specifying a full joint distribution requires an exponential number of probabilistic assertions. In the next section, this problem is solved by exploiting another form of knowledge, namely probabilistic independence assertions, which are specified as P(Xi |Xi1 , . . . , Xik ) = P(Xi |Xj1 , . . . , Xjm ) with {i1 , . . . , ik } ⊆ {j1 , . . . , jm } ⊆ N \ {i} Even exact inference can be dramatically improved if independence assumptions are considered for solving inference problems. While this can be exploited for summingout techniques as well, it can also be exploited for reducing the number of probability values to be specified in a knowledge base. This is discussed in the next section in the context of so-called probabilistic knowledge representation formalisms. We will also discuss sampling techniques in this context. In following, we introduce two probabilistic knowledge representation formalisms, namely Bayesian networks [Pea88] and Markov logic [DR07]. Using several examples from the environmental domain, advantages and disadvantages of each formalism are discussed. 2.2.2 Bayesian Networks Bayesian networks [Pea88] are one of the frameworks for effectively answering queries with respect to probabilistic knowledge bases. They are used in many real-world applications including diagnosis, forecasting, automated vision, sensor fusion and manufacturing control. In the next sections, syntax and semantics of Bayesian networks are discussed. Section 2.2.2 is taken from [GMN+ 08]. The following discussion and notations are according to [RN03]. Syntax: A Bayesian network BN = (G, γ) is defined by a directed acyclic graph G = (V, E) and a function γ : V → T which maps a vertex v ∈ V to a conditional probability distribution Ti ∈ T . Note that a vertex v indicates a random variable Xi and an edge eij = (vi , vj ) represents a direct influence of a parent vertex on a child vertex vi . We use the function P arentsE (·) to denote the set {Y | (Y, X) ∈ E}, 16 Preliminaries and omit the subscript if the graph is clear from context. A conditional probability distribution Ti is given as P(Xi |P arents(Xi )). If P arents(Xi ) = ∅, then Ti specifies a prior probability. P(Xi |P arents(Xi )) is usually specified as a table γ(Xi ) (see below for examples). P(Xi |P arents(Xi )) is also called conditional probability table (CPT). The structure of a Bayesian network is determined by the insertion order of the vertices [RN03]. During the network construction, vertices are added to the network individually. After adding a vertex, conditional dependencies of the new vertex to the vertices of the current network are to be determined. If there are dependencies, incoming edges to the new vertex are added accordingly. Each incoming edge comes from a previously added vertex which has conditional dependency to the new vertex. On the other hand, for instance, two sibling nodes are conditionally independent given the value of the father node is known (for other examples, see below). 
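A minimal sketch of the structural part of this definition may be helpful: the directed acyclic graph of a Bayesian network together with the Parents function and an acyclicity check. The following Python fragment is our own illustration (the vertex names anticipate Example 1 below); it is not part of the thesis software.

```python
# Illustrative sketch: the graph part of a Bayesian network BN = (G, gamma),
# with the Parents function of Section 2.2.1 and an acyclicity check.
V = {"TJ", "AP", "NP", "HH"}
E = {("TJ", "AP"), ("TJ", "NP"), ("AP", "HH"), ("NP", "HH")}

def parents(v, edges=E):
    """Parents(v) = {w | (w, v) in E}."""
    return {w for (w, u) in edges if u == v}

def is_acyclic(vertices, edges):
    """Repeatedly remove parentless vertices; if none can be removed, a cycle remains."""
    remaining, edges = set(vertices), set(edges)
    while remaining:
        roots = {v for v in remaining if not any(u == v for (_, u) in edges)}
        if not roots:
            return False  # every remaining vertex has a parent -> cycle
        remaining -= roots
        edges = {(w, u) for (w, u) in edges if w in remaining and u in remaining}
    return True

print(parents("HH"))     # {'AP', 'NP'}
print(is_acyclic(V, E))  # True: a legal Bayesian network structure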
Semantics: The semantics of a Bayesian network BN = (({X1, . . . , Xn}, E), γ) can be seen in two different ways [RN03]:

1. The first semantics of a Bayesian network indicates that the structure of a Bayesian network shows the conditional independence relationships which hold among the variables in the domain.

2. Bayesian networks are representations of full joint probability distributions of the domain variables {X1, . . . , Xn} (Equation 2.6):

P_BN(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | Parents_E(Xi))    (2.6)

where P(Xi | Parents_E(Xi)) = γ(Xi). The joint probability distribution of a set of variables is the product of their conditional probability distributions. But, in contrast to what the chain rule defines, the result simplifies dramatically in the presence of conditional independence assumptions.

Example 1 In this example we consider two events which influence human health, namely air pollution and noise pollution. One of the factors which affects air and noise pollution is traffic jam. These relationships are modelled by the Bayesian network graph in Figure 2.2 with the vertices TrafficJam, AirPollution, NoisePollution, and HumanHealth.

Figure 2.2: Example of a Bayesian network. TrafficJam and HumanHealth are conditionally independent given AirPollution and NoisePollution, therefore there is no link between them. Similarly, AirPollution and NoisePollution are conditionally independent given TrafficJam.

Figure 2.3 depicts the above Bayesian network with conditional probability distributions. The variables TJ, AP, NP and HH stand for TrafficJam, AirPollution, NoisePollution and HumanHealth, respectively. All random variables are binary. The associated CPTs are:

P(TJ = true) = 0.3

TJ | P(AP = true | TJ)
T  | 0.6
F  | 0.1

TJ | P(NP = true | TJ)
T  | 0.7
F  | 0.1

AP  NP | P(HH = true | AP, NP)
T   T  | 0.1
T   F  | 0.3
F   T  | 0.2
F   F  | 0.6

Figure 2.3: Example of a Bayesian network with CPTs γ(X) associated.

Since TJ has no parents, only a prior probability is assigned to it. The probability of TJ=true is 0.3. Consequently, the probability of TJ=false is 0.7. Since AP has only one parent, its conditional probability distribution has two rows. For example, the first row means that the probability of AP=true is 0.6 if TJ is true, and analogously P(AP = true) is 0.1 in case TJ=false. HH has two parents, consequently its conditional probability table has four rows. The first row means that the probability of HH=true is 0.1 if AP and NP are both true, and so on. An example for an entry in the full joint distribution is P(¬hh ∧ ap ∧ np ∧ tj), which is computed based on Equation 2.6 as follows:

P(¬hh ∧ ap ∧ np ∧ tj) = P(tj) P(ap|tj) P(np|tj) P(¬hh|ap ∧ np) = 0.3 × 0.6 × 0.7 × 0.9 = 0.1134

This example shows how a Bayesian network is constructed. Additionally, it demonstrates how a full joint probability is determined according to the conditional probability tables.

Inference in Bayesian networks

In this section, we discuss the problem of computing the (posterior) probability distribution P(X | ~E = ~e) for a so-called query variable X given a set of evidence variables and their values, i.e., ~E = {E1, E2, ...}, where ~e is a tuple of particular observed values. In addition to X and ~E, there is a set of non-evidence variables ~Y = {Y1, Y2, ...} which are considered in the solution of the inference problem. There are two main solutions for the inference problem, namely exact inference and approximate inference. Our discussion is taken from [RN03].
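Before turning to these two approaches, the following short Python sketch (our own illustration, not part of the thesis software) recomputes the full-joint entry of Example 1 directly as the product of Equation 2.6, using the CPTs of Figure 2.3:

```python
# Illustrative sketch: computing P(~hh, ap, np, tj) of Example 1 as the
# product of Equation 2.6, using the CPTs of Figure 2.3.

# gamma(X): P(X = true | parent assignment).
p_tj = 0.3
p_ap_given_tj = {True: 0.6, False: 0.1}
p_np_given_tj = {True: 0.7, False: 0.1}
p_hh_given_ap_np = {(True, True): 0.1, (True, False): 0.3,
                    (False, True): 0.2, (False, False): 0.6}

def p(value, p_true):
    """P(X = value) obtained from P(X = true)."""
    return p_true if value else 1.0 - p_true

def joint(tj, ap, np, hh):
    """P(TJ, AP, NP, HH) = P(TJ) P(AP|TJ) P(NP|TJ) P(HH|AP,NP)  (Equation 2.6)."""
    return (p(tj, p_tj)
            * p(ap, p_ap_given_tj[tj])
            * p(np, p_np_given_tj[tj])
            * p(hh, p_hh_given_ap_np[(ap, np)]))

# P(~hh ∧ ap ∧ np ∧ tj) = 0.3 * 0.6 * 0.7 * 0.9 = 0.1134
print(joint(tj=True, ap=True, np=True, hh=False))
```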
Exact inference: Exact inference solves the inference problem via the full joint distribution:

P(X | ~E = ~e) = α P(X, ~E = ~e) = α Σ_{~y ∈ dom(~Y)} P(X, ~E = ~e, ~Y = ~y)    (2.7)

where the summation is over all possible values of the non-evidence (hidden) variables ~Y. In the above equation, α denotes a normalization constant. With this notation, the above equation can be written as a full joint distribution. This simple algorithm for exact query answering for a Bayesian network with n Boolean variables is exponential in the number of (hidden) variables. In general, probabilistic inference in Bayesian networks is NP-hard [Coo90], and this result shows that, in general, exact inference for large networks is intractable.

In the following, an example for exact inference with the simple algorithm specified in Formula 2.7 is given for illustration purposes. Figure 2.4 depicts a Bayesian network where the relationships between rain, flood and air purification are given. In the conditional probability distributions, R, F and AP stand for the corresponding random variables Rain, Flood and AirPurification, respectively, which are all binary variables. Since Flood and AirPurification are conditionally independent, there is no edge between them. The associated CPTs are:

P(R = true) = 0.6

R | P(F = true | R)
T | 0.6
F | 0.1

R | P(AP = true | R)
T | 0.8
F | 0.4

Figure 2.4: Example of a Bayesian network

The next table depicts the full joint distribution for the three Boolean random variables, where airPurif indicates AirPurification:

         flood                    ¬flood
         airPurif   ¬airPurif     airPurif   ¬airPurif
rain     0.288      0.072         0.192      0.048
¬rain    0.016      0.024         0.144      0.216

Table 2.3: A full joint distribution for Rain, Flood and AirPurification

In Equation 2.8, we compute the probability of Rain = true given Flood = true, where Rain is a query variable and Flood is an evidence variable:

P(rain | flood) = P(rain ∧ flood) / P(flood) = (0.288 + 0.072) / (0.288 + 0.072 + 0.016 + 0.024) = 0.9    (2.8)

Similarly, we compute the probability of Rain = false given Flood = true:

P(¬rain | flood) = P(¬rain ∧ flood) / P(flood) = (0.016 + 0.024) / (0.288 + 0.072 + 0.016 + 0.024) = 0.1    (2.9)

The term 1/P(Flood = true) in Equations 2.8 and 2.9 is a normalization constant, which causes the sum of the above probabilities to be one. The non-evidence variable in the above inference is AirPurification. If we use probability distributions, the above equations can be written as a single equation:

P(Rain | flood) = α P(Rain, flood)
               = α [P(Rain, flood, airPurification) + P(Rain, flood, ¬airPurification)]
               = α [⟨0.288, 0.016⟩ + ⟨0.072, 0.024⟩] = α ⟨0.36, 0.04⟩ = ⟨0.9, 0.1⟩

This example shows how the conditional probabilities in a Bayesian network are determined according to exact inference.

Approximate Inference: Since the complexity of exact inference for large networks is very high, approximate inference methods have been developed. These methods utilize the generation of samples for the considered random variables. The accuracy of sampling methods depends on the number of samples: generating more samples leads to higher accuracy, and consequently the result converges to the result of exact inference. In the literature (e.g. [RN03]), many approximate inference methods for Bayesian networks are defined. Due to space constraints, we do not describe them in this work.
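The enumeration prescribed by Formula 2.7 is easy to spell out for the small network of Figure 2.4. The following sketch (again our own illustration) sums out the hidden variable AirPurification and normalizes with α, reproducing P(Rain | flood) = ⟨0.9, 0.1⟩ from Equations 2.8 and 2.9:

```python
# Illustrative sketch: exact inference by enumeration (Formula 2.7) for the
# Rain / Flood / AirPurification network of Figure 2.4.
p_r = 0.6
p_f_given_r = {True: 0.6, False: 0.1}
p_ap_given_r = {True: 0.8, False: 0.4}

def p(value, p_true):
    return p_true if value else 1.0 - p_true

def joint(r, f, ap):
    """Full joint entry via Equation 2.6 (these are the entries of Table 2.3)."""
    return p(r, p_r) * p(f, p_f_given_r[r]) * p(ap, p_ap_given_r[r])

def query_rain_given_flood():
    """P(Rain | Flood = true): sum out AirPurification, then normalize (alpha)."""
    unnormalized = {}
    for rain in (True, False):
        unnormalized[rain] = sum(joint(rain, True, ap) for ap in (True, False))
    alpha = 1.0 / sum(unnormalized.values())
    return {rain: alpha * v for rain, v in unnormalized.items()}

print(query_rain_given_flood())  # {True: 0.9, False: 0.1} (up to floating point)
```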
Advantages and Disadvantages of Bayesian Networks Bayesian networks [Pea88] are one of the best-understood models for effectively representing the joint probability distribution of a domain. Using acyclic graphs, Bayesian networks provide a compact representation for conditional independence assumptions. Despite these interesting properties, Bayesian networks have the following disadvantages. If Bayesian networks are used for modeling causality, humans can understand local conditional probability distributions quite well. However, due to the fact that Bayesian networks are acyclic, modeling arbitrary joint probability distributions also poses quite substantial challenges for providing sensible values for (non-causal) dependencies and resulting CPTs. In Bayesian networks, it is not easily possible to refer to objects and their relations. For example, we cannot refer to the TrafficJam in a particular city like Hamburg or a Flood in Berlin. Bayesian networks of the kind discussed so far are inherently propositional. Many extensions of Bayesian networks have been proposed in the literature (see also [RN03] for entry points). Rather than introducing first-order Bayesian network modeling approaches (e.g. [KP97, Las08]), we now directly introduce an approach that avoids acyclicity restrictions and provides means for first-order modeling. 2.2.3 Markov Logic Networks The formalism of Markov logic [DR07] provides a means to combine the expressivity of first-order logic [RN03] augmented with the formalism of Markov networks [Pea88]. 21 2.2 Probabilistic Representation Formalisms The Markov logic formalism uses first-order logic to define “templates” for constructing Markov networks. Section 2.2.3 is mostly taken from [GMN+ 08,GMN+ 09a,GMN+ 10c]. A Markov logic network MLN = (FMLN , WMLN ) [DR07] consists of a sequence of first-order formulas FM LN = hF1 , ..., Fm i and a sequence of real number weights WM LN = hw1 , ..., wm i. The association of a formula to its weight is by position in the sequence. For a formula F ∈ FM LN with associated weight w ∈ WM LN we also write w F (weighted formula). Thus, a Markov logic network can also be defined as a set of weighted formulas. Both views can be used interchangeably. As a notational ~ Y~ instead of X ~ ∩ Y~ . convenience, for ordered sets we nevertheless sometimes write X, In contrast to standard first-order logics such as predicate logic, relational structures not satisfying a formula Fi are not ruled out as models. If a relational structure does not satisfy a formula associated with a large weight it is just considered to be quite unlikely the intended one. Let us consider the universally quantified formula: ∀x CityWithTrafficJam(x) → CityW ithAirP ollution(x) The above formula might be true in some relational structures, but might be false in others. By assigning a reasonable weight to a formula, a formula can become a “soft constraint” allowing some relational structures not satisfying this formula to be still considered as possible models (possible worlds). In other words, the relational structure in question corresponds to a world associated with a non-zero probability. Let C = {c1 , ..., cm } be the set of all constants mentioned in FM LN . A grounding of a formula Fi ∈ FM LN is a substitution of all variables in the matrix of Fi with constants from C. From all groundings, the (finite) set of grounded atomic formulas (also referred to as ground atoms) can be obtained. Grounding corresponds to a domain closure assumption. 
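To make the notion of grounding concrete, the sketch below (our own illustration, with a hypothetical list-based formula representation) enumerates all groundings of the CityWithTrafficJam formula above over the constants hamburg and berlin, and collects the resulting ground atoms:

```python
# Illustrative sketch: grounding a first-order formula over a finite set of constants
# (domain closure), yielding the ground atoms used to build the Markov network.
from itertools import product

constants = ["hamburg", "berlin"]

# forall x: CityWithTrafficJam(x) => CityWithAirPollution(x), represented abstractly
# as the list of its predicates together with their variable arguments.
formula = [("CityWithTrafficJam", ("x",)), ("CityWithAirPollution", ("x",))]
variables = ["x"]

groundings = []
for binding in product(constants, repeat=len(variables)):
    sigma = dict(zip(variables, binding))
    groundings.append([(pred, tuple(sigma[v] for v in args)) for pred, args in formula])

ground_atoms = {atom for g in groundings for atom in g}

for g in groundings:
    print(g)          # one grounded implication per constant substitution
print(ground_atoms)   # the ground atoms, i.e. the Boolean random variables
```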
The motivation is to get rid of the quantifiers and reduce inference problems to the propositional case. The ground Markov network of a Markov logic network [DR07] is composed of a set of nodes and edges which are defined based on the formulas Fi and the constants in the knowledge base. In the following, it is explained how these nodes and edges are defined [DR07]:

• One node for each possible grounding of each predicate appearing in MLN

• One edge between two nodes if and only if the corresponding ground predicates appear together in a grounding of a formula Fi in MLN

Since a ground atom can either be true or false in an interpretation (or world), it can be considered as a Boolean random variable X. Consequently, for each MLN with associated random variables ~X, there is a set of possible worlds ~x. In this view, sets of ground atoms are sometimes used to denote worlds. In this context, negated ground atoms correspond to false and non-negated ones to true. We denote worlds using a sequence of (possibly negated) atoms. Let us assume the constant hamburg. An example world specification using this convention is:

⟨cityWithTrafficJam(hamburg), ¬cityWithAirPollution(hamburg)⟩

So, there is a traffic jam in Hamburg, but no air pollution. When a world ~x violates a weighted formula (does not satisfy the formula), the idea is to ensure that this world is less probable rather than impossible as in predicate logic. Note that weights do not directly correspond to probabilities (see [DR07] for details). For each possible world of a Markov logic network MLN = (F_MLN, W_MLN) there is a probability for its occurrence. The weights associated with the formulas define probabilistic knowledge, i.e., they induce a probability distribution over the derived ground atoms. In the formalism of Markov networks, the full joint probability distribution of a Markov logic network MLN is specified in symbolic form as:

P_MLN(~X) = (P(~X = ~x_1), . . . , P(~X = ~x_n))    (2.10)

and for every possible ~x_i ∈ {true, false}^n, n = |~X|:

P(~X = ~x) := log lin_MLN(~x)    (2.11)

For a motivation of the log-linear form, see, e.g., [DR07], where log lin is defined as

log lin_MLN(~x) = (1/Z) exp( Σ_{i=1}^{|F_MLN|} w_i n_i(~x) )    (2.12)

According to this definition, the probability of a possible world ~x is determined by the exponential of the sum of the numbers of true groundings (computed with the function n_i) of the formulas F_i ∈ F_MLN in ~x, multiplied with their corresponding weights w_i ∈ W_MLN, and finally normalized with

Z = Σ_{~x} exp( Σ_{i=1}^{|F_MLN|} w_i n_i(~x) ),    (2.13)

the sum of the corresponding exponential terms over all possible worlds. Thus, rather than specifying the full joint distribution directly in symbolic form as we have discussed before, in the Markov logic formalism the probabilistic knowledge is specified implicitly by the weights associated with formulas. Determining these formulas and their weights in a practical context is all but obvious, such that machine learning techniques [LD07, DLK+08] are usually employed for knowledge acquisition.

A conditional probability query for a Markov logic network MLN is the computation of the joint distribution of a set of m events involving random variables conditioned on ~e and is denoted by:

P_MLN(x_1 ∧ . . . ∧ x_m | ~e)    (2.14)

where ~e indicates the evidence vector which contains a set of weighted and/or strict ground atoms of the predicates in the knowledge base.
Note that the absence of a ground atom in the evidence vector means that the weight of this ground atom is zero. Consequently, the knowledge base of a Markov logic network [DR07] is not based on the closed world assumption. The semantics of this query is given as:

P_{rand_vars(MLN)}(x_1 ∧ ... ∧ x_m | e⃗) w.r.t. P_MLN(rand_vars(MLN))

where vars(x_1, ..., x_m) ∩ vars(e⃗) = ∅ and vars(x_1, ..., x_m) ⊆ rand_vars(MLN). The function rand_vars is defined as follows:

rand_vars((F, W)) := {A(C) | A(C) is mentioned in some grounded formula F ∈ F}

Grounding is accomplished w.r.t. all constants that appear in F, where A denotes an atomic concept or an atomic role. An algorithm for answering queries of the above form is investigated in [GM10].

Example 2 Let us consider the following weighted formulas and a constant hamburg:

0.5 ∀x CityWithTrafficJam(x) ⇒ CityWithAirPollution(x)
0.5 ∀x CityWithIndustry(x) ⇒ CityWithAirPollution(x)

Thus, rand_vars((F, W)) is:

rand_vars((F, W)) = {CityWithTrafficJam(hamburg), CityWithAirPollution(hamburg), CityWithIndustry(hamburg)}

Furthermore, let us assume that the evidence vector is:

e⃗ = {CityWithIndustry(hamburg) = true}

Thus, vars(e⃗) is:

vars(e⃗) = {CityWithIndustry(hamburg)}

Let us assume that the ground atoms denote Boolean random variables. The value of CityWithIndustry(hamburg) is fixed by the evidence. Thus, there are four possible worlds W_1, ..., W_4 left. In the next table, the probability of each world is determined according to the Markov logic formalism, where CWTJ, CWAP, CWI, and h indicate CityWithTrafficJam, CityWithAirPollution, CityWithIndustry, and hamburg, respectively:

World W_i | CWTJ(h) | CWAP(h) | CWI(h) | P(W_i)
W_1       | 0       | 0       | 1      | exp(0.5)/Z = 0.20
W_2       | 0       | 1       | 1      | exp(1)/Z   = 0.34
W_3       | 1       | 0       | 1      | 1/Z        = 0.12
W_4       | 1       | 1       | 1      | exp(1)/Z   = 0.34

Table 2.4: The possible worlds and their probabilities according to the Markov logic formalism.

The normalization constant Z is determined according to Equation 2.13:

Z = 1 + 2 × exp(1) + exp(0.5)

As can be seen in the above table, the sum of the worlds' probabilities is equal to 1:

P(W_1) + ... + P(W_4) = 1

In the following, we extend the above example with the next formula:

1 ∀x [CityWithAirPollution(x) ⇒ ∃y [Adjacent(x, y) ∧ CityWithAirPollution(y)]]    (2.15)

Additionally, we consider the new constant berlin. Thus, the possible ground atoms, which are the nodes of the Markov network, are defined as follows:

rand_vars((F, W)) = {CityWithTrafficJam(h), CityWithAirPollution(h), CityWithTrafficJam(b), CityWithAirPollution(b), CityWithIndustry(h), CityWithIndustry(b), Adjacent(h, h), Adjacent(h, b), Adjacent(b, h), Adjacent(b, b)}

where h and b denote hamburg and berlin, respectively. The possible groundings of Formula 2.15 are as follows:

CityWithAirPollution(h) ⇒ [Adjacent(h, h) ∧ CityWithAirPollution(h)]
CityWithAirPollution(b) ⇒ [Adjacent(b, h) ∧ CityWithAirPollution(h)]
CityWithAirPollution(h) ⇒ [Adjacent(h, b) ∧ CityWithAirPollution(b)]
CityWithAirPollution(b) ⇒ [Adjacent(b, b) ∧ CityWithAirPollution(b)]

which leads to the following Markov network subgraph:

Figure 2.5: The Markov network subgraph of Example 2.

The next figure depicts the complete Markov network graph of this example:

Figure 2.6: The Markov network graph of Example 2.

The above Markov network contains 10 binary ground atoms, and hence there are 2^10 possible worlds.
Since the truth value of the ground atom CityWithIndustry(h) is known, the number of possible worlds is reduced to 2^9. A possible world for this example is:

x⃗ = ⟨cityWithTrafficJam(h), ¬cityWithAirPollution(h), ¬cityWithTrafficJam(b), cityWithAirPollution(b), cityWithIndustry(h), cityWithIndustry(b), ¬adjacent(h, h), adjacent(h, b), adjacent(b, h), adjacent(b, b)⟩

This example shows how a Markov logic network is constructed from a set of formulas and a set of constants. Additionally, it demonstrates how the probability of a world is determined.

MAP Inference Problem in MLN

The Maximum A Posteriori (MAP) inference [SD05] returns the most likely state of the query atoms given the evidence. Based on MAP inference, the "most probable world" given the evidence is determined as a set of events. The MAP inference problem given a distribution MLN for a set of random variables X = rand_vars(MLN) is formalized as follows:

MAP_X(\vec{e}) := \vec{e} \cup \operatorname{argmax}_{\vec{y}} P_{MLN}(\vec{y} \mid \vec{e})    (2.16)

where:

vars(y⃗) ∩ vars(e⃗) = ∅
vars(y⃗) ∪ vars(e⃗) = X

For MLNs, the definition of the MAP problem [DLK+08] is rewritten as follows. The conditional probability term P(y⃗ | e⃗) is replaced with the Markovian formula:

MAP_{MLN}(\vec{e}) := \vec{e} \cup \operatorname{argmax}_{\vec{y}} \frac{1}{Z_e} \exp\left( \sum_i w_i\, n_i(\vec{y}, \vec{e}) \right)    (2.17)

Thus, for describing the most probable world, MAP returns a set of events, one for each random variable used in the Markov network derived from the MLN. In the above equation, y⃗ denotes the hidden variables, and Z_e denotes the normalization constant, which indicates that the normalization is performed over the possible worlds consistent with the evidence e⃗. In the next equation, Z_e is removed since it is constant and does not affect the argmax operation. Similarly, in order to optimize the MAP computation, the exp function is left out since it is monotonic and only its argument has to be maximized:

MAP_{MLN}(\vec{e}) := \vec{e} \cup \operatorname{argmax}_{\vec{y}} \sum_i w_i\, n_i(\vec{y}, \vec{e})    (2.18)

The above equation shows that the MAP problem in the Markov logic formalism is reduced to a new problem which maximizes the sum of the weights of the satisfied clauses [DLK+08]. Since MAP determination in Markov networks is an NP-hard problem [Rot96], it is usually performed by approximate solvers in realistic domains. The most commonly used approximate solver is the MaxWalkSAT algorithm [KSJ97], a weighted variant of the WalkSAT [SKC96] local-search satisfiability solver. The MaxWalkSAT algorithm attempts to satisfy clauses with positive weights and keeps clauses with negative weights unsatisfied [KSR+10].

In this work, we apply the MAP operation [DLK+08] in order to remove inconsistencies in the observation Abox. By applying MAP to the weighted observation Abox, we map the probabilistic logic to classical logic, i.e., the weighted Abox is changed to a non-weighted consistent Abox which describes several models. The next example shows the effect of applying MAP to an inconsistent Abox.

Example 3 Let us consider the following Tbox axioms:

DoorSlam ⊑ Audio
Applause ⊑ Audio
Car ⊑ ¬Audio
Car ⊑ ¬DoorSlam
Car ⊑ ¬Applause
DoorSlam ⊑ ¬Applause

Furthermore, let us assume the following inconsistent Abox, where ds1 is an instance of the disjoint concepts DoorSlam and Applause:

A = {1.3 Car(c1), 1.2 DoorSlam(ds1), causes(c1, ds1), 0.3 Applause(ds1)}

In order to determine the most probable world according to the Tbox, the MAP operation is applied to the Abox A.
The result is the bit vector W:

W = ⟨1, 1, 1, 0, 1⟩

where each component of W corresponds to a ground atom of a vector G with

G = ⟨Car(c1), DoorSlam(ds1), causes(c1, ds1), Applause(ds1), Audio(ds1)⟩

W indicates that there are four positive ground atoms and one negative ground atom, denoted by one and zero, respectively. The selected world with the highest probability is the following:

⟨car(c1), doorSlam(ds1), causes(c1, ds1), ¬applause(ds1), audio(ds1)⟩

By considering the positive and negative ground atoms of the most probable world in terms of A, the following assertions remain:

{car(c1), doorSlam(ds1), causes(c1, ds1), ¬applause(ds1)}

The ground atom audio(ds1) is not in the above set since it is not in A. Since an Abox contains only positive ground atoms, we remove ¬applause(ds1):

{car(c1), doorSlam(ds1), causes(c1, ds1)}

This example shows how the inconsistency and the uncertainty in the input data are eliminated by the MAP operation [DLK+08]. Note that an Abox is not a model, since an Abox describes many worlds and only some of these worlds are models. The above Tbox does not contain any existential restrictions. In ALHf−, we have existentials only for the domain restriction of a role, which are located on the left-hand side of the ⊑ operator. Thus, the determination of the most probable world is not affected.

Relation Between a Weight and a Probability in MLNs: Based on the Markov logic network, the probability of an arbitrary weighted atomic concept assertion w A(ind) is determined as follows:

p = \frac{e^w}{e^w + e^0}    (2.19)

In other words, P(A(ind)) = p is entailed. The above equation holds for weighted role assertions w R(ind1, ind2) as well. For w = −∞, the corresponding probability is p = 0, and similarly for w = ∞, the corresponding probability is p = 1. In order to determine the weight of a probabilistic assertion given the probability of the associated event, Equation 2.19 has to be solved for w, and the result is:

w = \ln\left(\frac{p}{1 - p}\right)    (2.20)

Note that the above equations hold only if we do not consider the Tbox and any weighted formulas.
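The two conversions of Equations 2.19 and 2.20 can be sketched as follows. This is a minimal illustration, and, as stated above, it is only valid for an assertion considered in isolation (no Tbox and no weighted formulas); the numeric values are illustrative.

```python
import math

def weight_to_probability(w):
    """Probability of an isolated weighted ground assertion w A(ind) (Eq. 2.19)."""
    return math.exp(w) / (math.exp(w) + math.exp(0))

def probability_to_weight(p):
    """Inverse direction: the weight encoding probability p (Eq. 2.20, log-odds)."""
    return math.log(p / (1.0 - p))

print(weight_to_probability(1.3))    # ~0.79, e.g. for the assertion 1.3 Car(c1)
print(probability_to_weight(0.79))   # ~1.32, recovering roughly the same weight
print(weight_to_probability(0.0))    # 0.5: a zero weight carries no bias
```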
Chapter 3
Abduction for Data Interpretation: Related Work

A general introduction to abduction for multimedia data interpretation has been given in [EKM11, Kay11, Esp11]. Those insights are not repeated here. However, we discuss related work in the context of probabilistic logic-based abduction in general and introduce description logic Abox abduction in particular.

3.1 Perception as Abduction: The General Picture

In [Sha05], abduction is applied in the context of robotics, where several sensors are used to discover the physical environment of a robot. Let us assume that Γ indicates the surface-level sensor data. For example, Γ might contain a set of low-level observations generated by an edge detector during image processing. Furthermore, Σ denotes the background knowledge, which is composed of two different sets of formulas. The first set describes the effects of the robot's interaction on the world, and the second set describes how changes in the world operate on the robot's sensors. The objective of abduction is to find one or more explanations ∆ such that

Σ ∪ ∆ |= Γ    (3.1)

In addition to the basic abductive framework defined above, Shanahan describes the following important challenges in the context of robotics:

• Incompleteness and uncertainty: We have to consider the fact that sensor data is incomplete and uncertain. Incompleteness is due to sensor limitations, since no sensor can observe its environment completely; there are limitations in the angle, range, etc. Furthermore, there is uncertainty in the data because of noise effects.

• Top-down information flow: The union of the explanation ∆ and the background knowledge Σ entails not only the observations Γ but possibly also a set of new assertions called expectations, which might not be explicit in Γ. According to these "expectations", the surface-level sensor data could be investigated again, possibly by applying different approaches (e.g., different edge detectors), in order to verify whether the expectations are actually explanations for observations which were just not made explicit before.

• Active perception: Active perception describes actions taken to gain more information. Shanahan distinguishes between surface-level and deep-level actions. Examples of surface-level actions are adjusting the threshold of an edge detection routine or rotating the robot's camera. An example of a deep-level action is changing the robot's position in order to observe its environment from a different viewpoint.

• Sensor fusion: The applied sensors have different accuracy rates. Some sensors are more accurate, and consequently their sensor data is more reliable. However, data generated by different sensors might be conflicting. In robotics, for instance, decisions about actions are made according to sensor data. If the sensor data is conflicting, the decisions might not be reliable. In order to make decisions more accurate, sensor fusion can be applied. Sensor fusion is the process of combining different sensor data into a single model of some aspect of the world [CY90].

Let us assume n hypotheses ∆_1, ..., ∆_n for the sensor data Γ. We assume that only one of the hypotheses can be true at the same time, indicated by:

∆_1 ⊕ ... ⊕ ∆_n = true    (3.2)

Moreover, none of the hypotheses is entailed by any other. Let us consider a hypothesis ∆_k. In the following, we define R, which does not contain ∆_k:

R = ∆_1 ⊕ ... ⊕ ∆_{k−1} ⊕ ∆_{k+1} ⊕ ... ⊕ ∆_n    (3.3)

The explanatory value of ∆_k is the conditional probability of ∆_k given ∆_k ⊕ R:

P(\Delta_k \mid \Delta_k \oplus R) = \frac{P(\Delta_k)}{P(\Delta_k) + P(R)}    (3.4)

where the prior probability of R is:

P(R) = \sum_{i=1, i \neq k}^{n} P(\Delta_i)    (3.5)

Thus, it follows that (cf. [Sha05])

P(\Delta_k \mid \Delta_k \oplus R) = \frac{P(\Delta_k)}{\sum_{i=1}^{n} P(\Delta_i)}    (3.6)

Let us assume the hypothesis ∆_i = {Ψ_1, ..., Ψ_m}. Assuming the assertions of ∆_i are independent, we have:

P(\Delta_i) = \prod_{j=1}^{m} P(\Psi_j)    (3.7)

and we can determine the explanatory value of each explanation ∆_i. In the following, we summarize the abductive framework approach defined above for explaining surface-level sensor data Γ (a small numeric sketch is given after the list):

1. Determine explanations ∆_i for the sensor data Γ.

2. Calculate the explanatory value of each explanation ∆_i according to Equation 3.6. Then select the best explanations based on the explanatory values, following the idea that the best explanations have the highest explanatory values.

3. Determine the expectations of the best explanations. Using different approaches, analyze the images again and check whether the expectations are fulfilled. Accept the explanation in case of fulfillment; otherwise, reject the explanation.

4. Compute the explanatory values of the selected explanations again according to the results of Step 3. Then sort the explanations accordingly.
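The following is a minimal sketch of Equations 3.6 and 3.7 under the stated independence assumption. The hypothesis names and the assertion probabilities are purely illustrative and not taken from the thesis.

```python
def hypothesis_prior(assertion_probs):
    """Eq. 3.7: prior of a hypothesis as the product of its (assumed independent)
    assertion probabilities."""
    p = 1.0
    for q in assertion_probs:
        p *= q
    return p

def explanatory_values(priors):
    """Eq. 3.6: explanatory value of each of the mutually exclusive hypotheses."""
    total = sum(priors.values())
    return {name: p / total for name, p in priors.items()}

# Purely illustrative assertion probabilities for three competing hypotheses.
priors = {
    "Delta_1": hypothesis_prior([0.7, 0.6]),   # 0.42
    "Delta_2": hypothesis_prior([0.3, 0.5]),   # 0.15
    "Delta_3": hypothesis_prior([0.2, 0.2]),   # 0.04
}
print(explanatory_values(priors))
# Delta_1 receives the highest explanatory value (~0.69) and would be ranked first.
```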
In the above procedure, the explanatory values change in Step 3 because the image is reanalyzed and the analysis results are determined again. Thus, the existence of some assertions is confirmed and their certainty values are increased; similarly, the existence of other assertions is rejected. Consequently, the new explanation contains assertions with higher certainty values than the previous explanation and therefore has a higher explanatory value. Note that this approach can only be applied if the independence assumption holds. Thus, the above procedure cannot be applied in every context, but it paves the general way for interpreting observations using a probabilistic ranking scheme for alternatives computed by abduction.

3.2 Probabilistic Horn Abduction and Independent Choice Logic

One of the main works in the context of logic-based abduction is the independent choice logic (ICL) [Poo97]. The independent choice logic introduces a choice space C and a set of rules R. The choice space C is a set of random variables:

C = {X_1, ..., X_n}    (3.8)

A set of events e⃗(X) is assigned to each random variable X such that:

∀X_1, X_2 ∈ C, X_1 ≠ X_2 : e⃗(X_1) ∩ e⃗(X_2) = ∅    (3.9)

The elements of an event set e⃗(X) are the possible values of the random variable X. Let us assume a Boolean random variable X. The possible values of X are true and false, indicated by X = true and X = false, respectively. Thus, the set of events of X is:

e⃗(X) = {X = true, X = false}    (3.10)

Another notation is:

e⃗(X) = {χ, ¬χ}    (3.11)

For better understanding, we give an example of a choice space C. Let us consider the Boolean random variables Car and DoorSlam. Thus, the choice space C contains two random variables:

C = {Car, DoorSlam}    (3.12)

The sets of events of the above random variables are:

e⃗(Car) = {car, ¬car},  e⃗(DoorSlam) = {doorSlam, ¬doorSlam}

where e⃗(Car) and e⃗(DoorSlam) are disjoint. The rules R are determined according to the Bayesian network. An event α ∈ e⃗(X) does not appear in the head of a rule. For every event α ∈ e⃗(X), a probability function P is defined such that:

\sum_{\alpha \in \vec{e}(X)} P(\alpha) = 1    (3.13)

Independent choice logic can also determine explanations for a set of observations. Let us consider the following k abduction rules used for explaining the predicate Q:

Q ← P_1
...
Q ← P_k

where P_1, ..., P_k indicate predicates and each pair P_i, P_j cannot be true simultaneously:

∀i, j, i ≠ j : P_i ∧ P_j = false    (3.14)

As mentioned before, the rules are determined according to the Bayesian network. In order to explain Q, we consider only the rules with Q in the head. The probability of an explanation ∆ is defined as follows:

P(\Delta) = \prod_{\alpha \in \Delta} P(\alpha)    (3.15)

The above equation is defined under the assumption that the atoms α ∈ ∆ are independent of each other. The next equation defines the probability of an observation set Γ:

P(\Gamma) = \sum_{i=1}^{|\vec{\Delta}|} P(\Delta_i)    (3.16)

where the ∆_i range over the minimal explanations of Γ. The disadvantage of the independent choice logic is that the computation of an explanation value P(∆) is based on the assumption that the atoms α of an explanation ∆ are independent of each other (see Equation 3.15). Generally, this assumption does not hold for real applications.

The semantics of the independent choice logic is defined by considering the possible worlds. [Poo08b] introduces the term total choice to describe the semantics.
A total choice for a choice space C is defined by selecting exactly one event of each random variable. A possible world corresponds to a total choice. In addition to the events, there are also some other ground atoms in each world which are generated by applying the rules. Note that there is a single model for each possible world, since the logic program is acyclic and the events do not appear in the heads of the rules. The probability of a possible world is the product of the probabilities of the events which appear in the possible world. Furthermore, the probability of any ground atom is the sum of the probabilities of the possible worlds in which the ground atom is true.

AILog [Poo08a] is a simple representation and probabilistic reasoning system which is implemented based on the independent choice logic theory [Poo97]. AILog provides a simple tell-ask user interface, i.e., the rules are told to the system and queries are asked. Furthermore, AILog acts as a propositional as well as a first-order probabilistic reasoner.

Example 4 In this example, we discuss how a Bayesian network [Pea88] can be represented in independent choice logic [Poo97]. Let us consider the following Bayesian network, which consists of the three random variables Car, DoorSlam, and CarEntry, where conditional probability tables are assigned to the random variables. The size of a conditional probability table depends on the number of parents of the node. In the following Bayesian network, the conditional probability tables of Car and DoorSlam have only a single entry, since these two nodes have no parents. The conditional probability table of CarEntry has four entries, since this node has two parents:

Figure 3.1: An example of a Bayesian network.

The Boolean random variables Car and DoorSlam are indicated by X_1 and X_2, respectively. The sets of events of these random variables are defined as follows:

e⃗(X_1) = {car, ¬car}
e⃗(X_2) = {doorSlam, ¬doorSlam}

In the following, the probabilities of the true events (X = true) are given. These probabilities are defined according to the conditional probability tables:

P(car) = 0.7
P(doorSlam) = 0.4

The CarEntry node has two parents. Thus, we introduce four new random variables:

X_3 = CarEntryIfCarDoorSlam
X_4 = CarEntryIfCarNoDoorSlam
X_5 = CarEntryIfNoCarDoorSlam
X_6 = CarEntryIfNoCarNoDoorSlam

The new random variables show how CarEntry depends on Car and DoorSlam. The sets of events of the above random variables are:

e⃗(X_3) = {carEntryIfCarDoorSlam, ¬carEntryIfCarDoorSlam}
e⃗(X_4) = {carEntryIfCarNoDoorSlam, ¬carEntryIfCarNoDoorSlam}
e⃗(X_5) = {carEntryIfNoCarDoorSlam, ¬carEntryIfNoCarDoorSlam}
e⃗(X_6) = {carEntryIfNoCarNoDoorSlam, ¬carEntryIfNoCarNoDoorSlam}

According to Figure 3.1, the probabilities are:

P(carEntryIfCarDoorSlam) = 0.9
P(carEntryIfCarNoDoorSlam) = 0.6
P(carEntryIfNoCarDoorSlam) = 0.3
P(carEntryIfNoCarNoDoorSlam) = 0.1

In order to determine the probabilities P(X = false), we use Equation 3.13. According to Equation 3.13, the sum of the probabilities of the events of a random variable is 1.
Thus, for X_1 we have:

P(car) + P(¬car) = 1 ⇒ P(¬car) = 0.3

In the following, the rules R for CarEntry = true are defined:

R = {carEntry ← car ∧ doorSlam ∧ carEntryIfCarDoorSlam,
     carEntry ← car ∧ ¬doorSlam ∧ carEntryIfCarNoDoorSlam,
     carEntry ← ¬car ∧ doorSlam ∧ carEntryIfNoCarDoorSlam,
     carEntry ← ¬car ∧ ¬doorSlam ∧ carEntryIfNoCarNoDoorSlam}

The above rules are defined according to the Bayesian network in Figure 3.1. The choice space C containing the random variables is defined as follows:

C = {X_1, ..., X_6}    (3.17)

In the following, the independent choice logic representation corresponding to the Bayesian network in Figure 3.1 is given (in the notation of AILog [Poo08a]):

prob car : 0.7.
prob doorSlam : 0.4.
carEntry ← car ∧ doorSlam ∧ carEntryIfCarDoorSlam.
carEntry ← car ∧ ¬doorSlam ∧ carEntryIfCarNoDoorSlam.
carEntry ← ¬car ∧ doorSlam ∧ carEntryIfNoCarDoorSlam.
carEntry ← ¬car ∧ ¬doorSlam ∧ carEntryIfNoCarNoDoorSlam.
prob carEntryIfCarDoorSlam : 0.9.
prob carEntryIfCarNoDoorSlam : 0.6.
prob carEntryIfNoCarDoorSlam : 0.3.
prob carEntryIfNoCarNoDoorSlam : 0.1.

Table 3.1: An example of independent choice logic.

In the ICL specification of a Bayesian network, the probability values of the Bayesian network appear directly [Poo97]. Moreover, an ICL has the same number of probabilities as the Bayesian network. In this example, there are 2^6 worlds, since there are 6 random variables. Each world contains 7 ground atoms, where 6 ground atoms are selected from the 6 random variables, respectively. The seventh ground atom, carEntry or ¬carEntry, is generated by applying the rules. An example of a possible world is:

W = ⟨car, doorSlam, carEntryIfCarDoorSlam, ¬carEntryIfCarNoDoorSlam, ¬carEntryIfNoCarDoorSlam, ¬carEntryIfNoCarNoDoorSlam, carEntry⟩

In the above world, car, doorSlam, and carEntryIfCarDoorSlam are fulfilled. Thus, by applying the first rule of R, we have CarEntry = true, which is the last ground atom of W. In the independent choice logic representation of this example, four rules are defined which describe the four cases for CarEntry = true. Thus, there are four explanations for CarEntry = true, namely:

∆_1 = {car, doorSlam, carEntryIfCarDoorSlam}
∆_2 = {car, ¬doorSlam, carEntryIfCarNoDoorSlam}
∆_3 = {¬car, doorSlam, carEntryIfNoCarDoorSlam}
∆_4 = {¬car, ¬doorSlam, carEntryIfNoCarNoDoorSlam}

The probability of each explanation ∆_i is calculated based on Equation 3.15:

P(∆_1) = 0.7 × 0.4 × 0.9 = 0.252
P(∆_2) = 0.7 × 0.6 × 0.6 = 0.252
P(∆_3) = 0.3 × 0.4 × 0.3 = 0.036
P(∆_4) = 0.3 × 0.6 × 0.1 = 0.018

According to Equation 3.16, the probability P(CarEntry = true) is the sum of the above explanation probabilities:

P(CarEntry = true) = P(∆_1) + ... + P(∆_4) = 0.558

This example shows that the representation of a Bayesian network [Pea88] in independent choice logic [Poo97] becomes complicated if the Bayesian network has nodes with more than two parents, which considerably increases the number of rules.
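The numbers of Example 4 can be reproduced with the following minimal sketch of Equations 3.15 and 3.16. It is a brute-force illustration, not an AILog implementation; the tilde prefix "~" marking negated events is a notational assumption of this sketch.

```python
# Prior probabilities of the independent choices (cf. Table 3.1).
P = {
    "car": 0.7, "doorSlam": 0.4,
    "carEntryIfCarDoorSlam": 0.9, "carEntryIfCarNoDoorSlam": 0.6,
    "carEntryIfNoCarDoorSlam": 0.3, "carEntryIfNoCarNoDoorSlam": 0.1,
}

def prob(literal):
    """Probability of a possibly negated event; negations use Eq. 3.13."""
    return 1.0 - P[literal[1:]] if literal.startswith("~") else P[literal]

# The four explanations of carEntry, one per rule body.
explanations = [
    ["car", "doorSlam", "carEntryIfCarDoorSlam"],
    ["car", "~doorSlam", "carEntryIfCarNoDoorSlam"],
    ["~car", "doorSlam", "carEntryIfNoCarDoorSlam"],
    ["~car", "~doorSlam", "carEntryIfNoCarNoDoorSlam"],
]

def explanation_prob(delta):
    """Eq. 3.15: product of the independent event probabilities."""
    p = 1.0
    for lit in delta:
        p *= prob(lit)
    return p

probs = [explanation_prob(d) for d in explanations]
print(probs)        # approximately [0.252, 0.252, 0.036, 0.018]
print(sum(probs))   # approximately 0.558 = P(carEntry = true), Eq. 3.16
```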
Probabilistic Horn abduction for diagnostic frameworks is introduced in [Poo91]. Let us consider an explanation D for a set of observations obs. Similar to the independent choice logic, the knowledge base is defined according to a Bayesian network. Let us assume:

P(obs | D) = 1    (3.18)

The above assumption is reasonable if D is a correct explanation for obs. In the following, we give an example showing that the conditional probability P(obs | D) = 1 is reasonable. Let us consider the observation obs = {carEntry} and the explanation D = {car, doorSlam, carEntryIfCarDoorSlam}. Then, the probability P(obs | D) is:

P(carEntry | {car, doorSlam, carEntryIfCarDoorSlam}) = 1    (3.19)

The above probability is equal to 1 since there is the following rule for the explanation of carEntry:

carEntry ← car ∧ doorSlam ∧ carEntryIfCarDoorSlam    (3.20)

According to the above rule, D is a correct explanation for the observation carEntry. Thus, P(obs | D) = 1 is reasonable. The scoring value of an explanation D given the observations obs is determined as follows:

P(D \mid obs) = \frac{P(obs \mid D) \times P(D)}{P(obs)}    (3.21)
             = \frac{P(D)}{P(obs)}    (3.22)

In the following, we determine the prior probability of an explanation D = {h_1, ..., h_n}:

P(D) = P(h_1 ∧ ... ∧ h_{n−1} ∧ h_n) = P(h_n | h_1 ∧ ... ∧ h_{n−1}) × P(h_1 ∧ ... ∧ h_{n−1})

The term P(h_1 ∧ ... ∧ h_{n−1}) is determined recursively, where the final recursive call is P(true) = 1. In the following, we compute the term P(h_n | h_1 ∧ ... ∧ h_{n−1}). Since D is a minimal explanation and we also assume that the logically independent instances of D are probabilistically independent, it follows that

P(h_n | h_1 ∧ ... ∧ h_{n−1}) = P(h_n)    (3.23)

Thus, the prior probability of an explanation D is determined as follows:

P(D) = P(h_1 \wedge \ldots \wedge h_n) = \prod_{i=1}^{n} P(h_i)    (3.24)

Similar to the independent choice logic [Poo97], the probability of an explanation D is the product of the probabilities of the assertions which make up the explanation (Equation 3.15). Thus,

P(D \mid obs) = \frac{\prod_{i=1}^{n} P(h_i)}{P(obs)}    (3.25)

P(obs) is the prior probability of the observations, and its value is constant for all explanations. In the context of diagnostic frameworks, we make the assumption that the diagnoses are covering, i.e., all existing diagnoses are considered. Moreover, the diagnoses are disjoint (mutually exclusive). Let us assume that {e_1, ..., e_n} is the set of all explanations of obs. Thus, the prior probability of the observations is determined as follows:

P(obs) = P(e_1 ∨ e_2 ∨ ... ∨ e_n) = P(e_1) + P(e_2) + ... + P(e_n)

Similar to the independent choice logic, the prior probability of an observation set obs is the sum of the explanation probabilities (Equation 3.16). In the following, we determine the probability of explanation ∆_1 given the observation set obs = {carEntry}:

P(∆_1 | obs) = P(∆_1) / P(obs) = 0.252 / 0.558 = 0.45

3.3 Computing Explanations as Markov Logic Inference

Probabilistic abduction using Markov logic networks is discussed in [KM09]. This approach describes how Abox abduction is performed when the input is probabilistic. According to this approach, the abduction rules are considered as formulas for the Markov logic formalism [DR07]. Since the inference mechanism based on Markov logic is deductive, the notation of the abduction rules has to be changed appropriately. In the following, it is explained how the abduction rules have to be changed to first-order formulas in an MLN. The idea of [KM09] is known as Clark's completion [Lac98]. Let us assume the following n abduction rules in the background knowledge which explain the observation Q:

Q ← P_1
Q ← P_2
...
Q ← P_n

where Q and P_i for 1 ≤ i ≤ n indicate concept or role predicates. The above rules are transformed to the following first-order logic formula, which is called the reverse implication:

Q → P_1 ∨ P_2 ∨ ... ∨ P_n
The reverse implication formula means that at least one of the possible explanations should be true. Additionally, we have to consider the mutual exclusivity formulas for every pair of explanations as follows:

Q → ¬P_1 ∨ ¬P_2
...
Q → ¬P_1 ∨ ¬P_n
Q → ¬P_2 ∨ ¬P_3
...
Q → ¬P_2 ∨ ¬P_n
...
Q → ¬P_{n−1} ∨ ¬P_n

The mutual exclusivity formulas indicate that multiple explanations cannot be true simultaneously.

In Abox abduction, we apply abduction rules which contain existential operators. Thus, new individuals are generated. The problem of the approach of [KM09] is that new individuals cannot be generated and assigned to types during Abox abduction. In order to deal with the existentials, new individuals are assigned to the types in advance, so that they can be used later during abduction. This approach is not completely correct because the number of individuals required during abduction is not known. Since we do not know exactly how many individuals are required during Abox abduction, some individuals are randomly assigned to the types. Consequently, it is possible that the individuals assigned to the types are either not enough or too many. The number of individuals affects the inference process and the precision of the results. If the default individuals are not enough, the result of the Abox abduction is not precise. Similarly, many unrequired individuals slow down the inference process.

By changing the abduction rules according to [KM09] to the notation of MLN formulas, Abox abduction is still not defined as desired. The reason is that the explanations are not produced as sets. If there is a small set of abduction rules, the user has a clear overview of them. Thus, the user knows the possible deep-level assertions and could ask for their probabilities. But if there is a large set of abduction rules, it is no longer clear which assertion should be considered as deep-level. Since the user loses the overview, it is not clear which probability should be asked for. Another problem of this approach is that the knowledge base contains only the Horn rules. Thus, the space of hypotheses is not limited. According to [KM09], we consider non-weighted abduction rules for Abox abduction. But if there are weighted abduction rules and we want to convert them to the notation of an MLN, it is not clear how a weight should be converted to another weight assigned to the equivalent formula in the MLN.

A further problem concerns the total number of formulas in the MLN. Let us assume n abduction rules with the same predicate in the head. In the following, we determine the total number of equivalent formulas in the MLN:

Sum = 1 + (n−1) + (n−2) + \ldots + 1 = 1 + \frac{n(n−1)}{2}

In the above sum, there is only one reverse implication formula and there are n(n−1)/2 mutually exclusive formulas. If we compare the number of abduction rules n with Sum, we notice that:

n ≤ Sum    (3.26)

In the following, the Sum function is plotted over n:

Figure 3.2: The number of abduction rules with the same predicate in the head, n, and the total number of equivalent formulas in the MLN, Sum.

The figure shows that with increasing n, the difference between Sum and n becomes greater. This means that if we have many abduction rules with the same predicate in the head, we will have many more equivalent formulas in the MLN.
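The completion-style encoding described above can be sketched as follows. This is only an illustration of the formula counting, not an implementation of [KM09]; the textual formula representation is an assumption of this sketch.

```python
def completion_formulas(head, bodies):
    """Generate the MLN formulas corresponding to n abduction rules head <- body_i:
    one reverse implication plus pairwise mutual exclusivity formulas."""
    reverse = f"{head} -> " + " v ".join(bodies)
    exclusivity = [f"{head} -> ~{bodies[i]} v ~{bodies[j]}"
                   for i in range(len(bodies))
                   for j in range(i + 1, len(bodies))]
    return [reverse] + exclusivity

formulas = completion_formulas("Q", ["P1", "P2", "P3", "P4"])
for f in formulas:
    print(f)

n = 4
print(len(formulas), "==", 1 + n * (n - 1) // 2)   # Sum = 1 + n(n-1)/2, here 7
```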
3.4 Preference Models Using Bayesian Compositional Hierarchies

Probabilistic control for interpretation tasks by means of a Bayesian Compositional Hierarchy (BCH) [Neu08] is discussed in [BNHK11, BKN11] (see also earlier work presented in [NM08]). [BKN11] describes a generic framework for ontology-based scene interpretation. The developed framework is applied to real-time monitoring of aircraft service activities. Examples of aircraft service activities are arrival preparation, unloading, tanking, etc. The input to the interpretation system consists of the analysis results of video streams stemming from various cameras installed at the airport. The output of the interpretation system consists of high-level activity scene descriptions about events at the airport. The main purposes of real-time monitoring of aircraft service activities are:

• To notify about possible delays and to counteract as soon as possible.
• To provide predictions about the completion of an activity.
• To extend the monitoring of service activities in order to include unrelated object behaviour (e.g., vehicles which are not allowed to get close to the aircraft).

The described scene interpretation framework is domain-independent, which means it can be adapted and used for other applications. The input to the interpretation system arrives piecewise, since the input is given by the incremental analysis results of the video stream. Thus, early interpretations have to be computed with poor context. Since the scene interpretation is determined stepwise, interpretation alternatives need to be explored in parallel. To assign probabilistic ratings to the interpretation alternatives, a probabilistic formalism is used. The BCH formalism [Neu08] defines an aggregate model (compositional hierarchy). An aggregate model is defined as a joint probability distribution of the form P(A B^1 ... B^k C), where A, B^1 ... B^k, and C indicate the aggregate header, the parts B^i of A, and some conditions C on the parts, respectively. An aggregate header represents a part of an aggregate at the next-higher level. The work described in [BKN11] employs beam search for exploring the space of possible interpretations. Interpretations with low ratings are discarded and interpretations with high ratings are kept. For more details see [BKN11].

The approach described in [BKN11] is similar to the approach defined in this work. The difference lies in the applied probabilistic formalism and in the defined scoring functions. [BKN11] uses BCH [Neu08] as the probabilistic formalism, whereas we use the Markov logic formalism [DR07]. Using the probabilistic formalism of Markov logic, the construction space for interpretation is defined in terms of abduction in this thesis. Abduction is based on logic programming (LP [Llo87]) rules applied to Aboxes with description logics (DL) [BCM+03] as constraints. The abduction component (called DLLP) is implemented in the RacerPro reasoning system [HM01], and we build on top of it. Basic concepts behind DLLP abduction are described in the next subsection, as part of the abduction-based formalization of interpretation tasks.

3.5 DLLP-Abduction Based Interpretation

Abduction can be applied to concepts and Aboxes. An implementation of concept abduction [CDD+03] is discussed in [RSDD09]. Since interpretations should describe graph rather than tree structures, concept abduction was found not to be expressive enough. [KES11] introduces a formal computational framework for Abox abduction in the DL ALC.
The applied reasoning mechanism is based on regular connection tableaux [Häh01] and resolution with set-of-support [Lov78]. In the following, we define the term aggregate instance, which is used in this discussion. An aggregate instance [NM06] is a deep-level object which is built on surface-level objects. We consider the analysis results as surface-level objects. The deep-level objects are the results of the explanation procedure. Thus, the surface-level objects are parts of an aggregate instance. In [KES11], no logic programming rules are used for Abox abduction. We argue that by considering only Tbox axioms, as proposed in the elegant approach described in [KES11], we cannot generate appropriate aggregate instances involving graph structures as required for interpretation tasks.

3.5.1 DLLP Abduction

In this section, approaches for computing results for Abox abduction problems are discussed. The main approach is based on description logics [BCM+03] and logic programming [Llo87]; therefore we use the name DLLP abduction. In the scope of this work, DLLP abduction has been implemented with RacerPro [HM01] for interpreting multimedia documents. RacerPro is used as the DL reasoner in this work. DLLP abduction has been well tested in experimental studies. Section 3.5.1 is taken from [EKM09, GMN+09a, GMN+10a, GMN+10c].

In contrast to the well-known deduction process, where the conclusion goes from some causes to an effect, the abduction process goes from an effect to some causes [Pei78]. In general, abduction is formalized as

Σ ∪ ∆ |=_R Γ

where the background knowledge (Σ), rules (R), and observations (Γ) are given, and explanations (∆) are to be computed. In terms of DLs, ∆ and Γ are Aboxes and Σ is a pair of Tbox and Abox. Abox abduction is implemented as a non-standard retrieval inference service in DLs. In contrast to standard retrieval inference services, where answers are found by exploiting the ontology, Abox abduction has the task of acquiring what should be added to the knowledge base in order to answer a query. Therefore, the result of Abox abduction is a set of hypothesized Abox assertions. To achieve this, the space of abducibles has to be defined. We do this in terms of rules. We assume that a set of rules R as defined above (see Section 2.1.2) is specified, and define a non-deterministic function compute_explanation as follows [EKM09]:¹

• compute_explanation(Σ, R, A, P(z)) = transform(Φ, σ) if there is a rule r ∈ BR of the following form:

  P(X) ← Q_1(Y_1), ..., Q_n(Y_n)

  We apply r to an Abox A such that a minimal set of atoms Φ and an admissible variable substitution σ with σ(X) = z can be found such that the query

  Q := {() | {Q_1(σ′_{fresh_prefix}(Y_1)), ..., Q_n(σ′_{fresh_prefix}(Y_n))} \ Φ}

  is answered with true. Note that σ might also introduce mappings to individuals not mentioned in A (new individuals). The number of new individuals is bounded by the number of variables. The variable substitution σ′_prefix is an extension of σ such that σ′_prefix(x) = σ(x) if x ∈ as_set(X) and, otherwise, σ′_prefix(x) = concat(prefix, x), where concat is a function for concatenating prefix and x. The string fresh_prefix denotes a fresh name for each application of a rule r (variable renaming).

• If no such rule r exists in R, it holds that:

  compute_explanation(Σ, R, A, P(z)) = ∅    (3.27)

¹ The function transform is defined in Section 2.1.2.
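The following is a deliberately simplified sketch of a single backward-chaining abduction step in the spirit of compute_explanation; it is not the RacerPro implementation. It assumes atoms are represented as (predicate, arguments) tuples, approximates entailment by plain membership in the Abox, and introduces fresh individuals (such as ind42) for unbound rule variables.

```python
from itertools import count

_fresh = count(42)   # generator for fresh individual names such as ind42

def abduction_step(abox, rule, query_binding):
    """One simplified backward-chaining step: given a rule head(X) <- body and a
    binding for the head variables, hypothesize those body atoms that are not
    already present in the Abox (entailment is approximated by membership).
    Returns the set Phi of hypothesized atoms."""
    head_vars, body = rule
    binding = dict(zip(head_vars, query_binding))
    # Bind every remaining body variable to a fresh individual (variable renaming).
    for _, args in body:
        for a in args:
            if a not in binding:
                binding[a] = f"ind{next(_fresh)}"
    grounded_body = [(pred, tuple(binding[a] for a in args))
                     for pred, args in body]
    return [atom for atom in grounded_body if atom not in abox]

# causes(x, y) <- CarEntry(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y)
rule = (("x", "y"),
        [("CarEntry", ("z",)), ("hasObject", ("z", "x")),
         ("hasEffect", ("z", "y")), ("Car", ("x",)), ("DoorSlam", ("y",))])
abox = [("Car", ("c1",)), ("DoorSlam", ("ds1",)), ("causes", ("c1", "ds1"))]
print(abduction_step(abox, rule, ("c1", "ds1")))
# Hypothesizes CarEntry(ind42), hasObject(ind42, c1), hasEffect(ind42, ds1),
# i.e. the explanation Delta_1 of Example 5 below.
```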
The goal of the function compute_explanation is to determine what must be added (Φ) such that the entailment Σ ∪ A ∪ Φ |=_R P(z) holds. Hence, for compute_explanation, abductive reasoning [FK00, Pau93] is used. The set of assertions Φ represents what needs to be hypothesized in order to answer the query Q with true. The definition of compute_explanation is non-deterministic due to several possible choices for Φ and r. In order to locally rank different solutions, we associate a score n − 2|Φ| with every Φ. The number n denotes the number of query atoms (see the definition of Q above). The score is accessed by referring to a subscript score in the expression Φ_score (see the next section). This score, a value between 0 and 1, assigns higher values to explanations if more assertions are already entailed and need not be hypothesized (relative to the total number of assertions). We refer to [EKM09] for a detailed evaluation of this score. The score is a "local" score in the sense that only the assertions in the explanations are considered (w.r.t. the background knowledge), but not the whole set of observations is used to score an explanation alternative.

When the compute_explanation procedure is applied, we say that the set of rules is backward-chained, and since there might be multiple rules in R, backward chaining is non-deterministic. Thus, multiple explanations are potentially generated. The relation causes can be explained by the following backward chaining rules:

BR = {∀x, y causes(x, y) ← ∃z CarEntry(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y),
      ∀x, y causes(x, y) ← ∃z CarExit(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y)}

In the following, we devise an abstract computational engine for "explaining" Abox assertions in terms of a given set of rules. Explanation of Abox assertions w.r.t. a set of rules is meant in the sense that, using the rules, some deep-level explanations are constructed such that the Abox assertions are entailed. The explanation of an Abox is again an Abox. For instance, the output Abox represents the results of the content interpretation process. The presentation is slightly extended compared to the one in [CEF+08].

Let the agenda 𝒜 be a set of Aboxes, and let Γ be an Abox of observations whose assertions are to be explained. The goal of the explanation process is to use a set of rules R to derive "explanations" for elements in Γ. The explanation algorithm implemented in the DLLP Abduction Engine (DLLPA Engine) works on the set of Aboxes 𝒜. The complete explanation process is implemented by the DLLPA function [EKM09]:

Algorithm 1: The abduction algorithm
Function DLLPA(Ω, Ξ, Σ, FR, BR, WR, S, 𝒜):
  Input: a strategy function Ω, a termination function Ξ, a knowledge base Σ, a set of forward rules FR, a set of backward chaining rules BR, a set of weighted rules WR, a scoring function S, and an agenda 𝒜
  Output: a set of interpretation Aboxes 𝒜′
  repeat
    (A, α, r) := Ω(𝒜, S, WR, BR);
    𝒜′ := (𝒜 \ {A}) ∪ explanation_step(Σ, r, FR, A, α);
  until Ξ(𝒜′);
  return 𝒜′

In the above algorithm, the strategy function Ω determines the fiat assertion α, the Abox A, and the backward chaining rule r, where α ∈ A and α can be explained by applying r. The strategy function Ω is defined in Algorithm 3.
Additionally, DLLPA uses a termination function Ξ in order to check whether to terminate due to resource constraints. The termination condition Ξ(𝒜) is defined as follows:

¬∃ a new bunch of observations ∨ ¬∃A ∈ 𝒜 ∨ ¬∃α ∈ A where α is an unexplained fiat assertion

Moreover, DLLPA applies a scoring function S to rank explanations. The function explanation_step is defined as follows:

explanation_step(Σ, r, FR, A, α) := ⋃_{∆ ∈ compute_lo_explanations(Σ, r, FR, A, α)} consistent_completed_explanations(Σ, r, FR, A, ∆)

where compute_lo_explanations stands for "compute locally optimized explanations". We need one additional auxiliary function:

consistent_completed_explanations(Σ, r, FR, A, ∆) := {∆′ | ∆′ = ∆ ∪ A ∪ forward_chain(Σ, FR, ∆ ∪ A), consistent_Σ(∆′)}

The function consistent_(T,A)(A′) determines whether the Abox A ∪ A′ has a model which is also a model of the Tbox T. Note the call to the non-deterministic function compute_explanation. It may return different values, all of which are collected.

In the next chapter, we explain how probabilistic knowledge is used to (i) formalize the effect of the "explanation", and (ii) formalize the scoring function S used in the DLLPA algorithm explained above. In addition, it is shown how the termination condition (represented by the parameter Ξ in the above procedure) can be defined based on probabilistic conditions. The Abox abduction algorithm DLLP is implemented in the DL reasoner RacerPro [HM01].

3.5.2 Ranking Simple Explanations

A default ranking scheme for explanations (∆s) computed during the abduction process is provided by the implementation in RacerPro [HM01]. A scoring function S evaluates an explanation ∆ according to the two criteria proposed by Thagard for selecting explanations [Tha78], namely simplicity and consilience. According to Thagard, the fewer hypothesized assertions an explanation contains (simplicity) and the more ground assertions (observations) an explanation involves (consilience), the higher its preference score. The following function can be used to compute the preference score for a given explanation² [EKM09]:

S(Σ, R, A, ∆) := S_c(Σ, R, A, ∆) − S_h(Σ, R, A, ∆)    (3.28)

The function S_c represents the number of assertions in the analysis Abox A that follow from Σ ∪ ∆, and the function S_h indicates the number of assertions in the explanation that do not follow from Σ ∪ A. Thus, S_c and S_h can be defined as follows:

S_c(Σ, R, A, ∆) := ♯{α ∈ A | Σ ∪ ∆ |=_R α}    (3.29)
S_h(Σ, R, A, ∆) := ♯{α ∈ ∆ | Σ ∪ A ⊭_R α}    (3.30)

where A denotes the analysis Abox. In the context of multimedia interpretation, we prefer an explanation which is more consilient and simpler. Consequently, the explanation with the highest scoring value is preferred. In the following, an example is given which explains how the scoring values of explanations are determined.

Example 5 Let us assume the following Abox:

A = {Car(c1), DoorSlam(ds1), causes(c1, ds1)}

Moreover, let us assume the following set of backward chaining rules:

BR = {∀x, y causes(x, y) ← ∃z CarEntry(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y),
      ∀x, y causes(x, y) ← ∃z CarExit(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y)}

By applying the above backward chaining rules to A, the following explanations are generated:

∆_1 = {CarEntry(ind42), hasObject(ind42, c1), hasEffect(ind42, ds1)}
∆_2 = {CarExit(ind42), hasObject(ind42, c1), hasEffect(ind42, ds1)}

The scoring value of ∆_1 is determined as follows:

S_{c,1} = 0,  S_{h,1} = 3,  S_1 = S_{c,1} − S_{h,1} = −3

Similarly, the scoring value of ∆_2 is determined:

S_{c,2} = 0,  S_{h,2} = 3,  S_2 = S_{c,2} − S_{h,2} = −3

Since both explanations have the same scoring value, neither is preferred over the other. This example shows how explanations are ranked according to the approach defined in [EKM09].

² For the sake of brevity the parameters of S are not shown in the previous functions.
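The scoring function of Equations 3.28–3.30 can be sketched as follows. This is a minimal illustration in which entailment is approximated by a caller-supplied predicate; in the actual system the entailment checks would be answered by a DL reasoner such as RacerPro.

```python
def preference_score(abox, delta, entails):
    """Eqs. 3.28-3.30: consilience minus number of hypothesized assertions.
    entails(premises, assertion) stands in for the |=_R relation."""
    s_c = sum(1 for a in abox if entails(delta, a))        # consilience
    s_h = sum(1 for a in delta if not entails(abox, a))    # hypothesized assertions
    return s_c - s_h

abox = {"Car(c1)", "DoorSlam(ds1)", "causes(c1,ds1)"}
delta1 = {"CarEntry(ind42)", "hasObject(ind42,c1)", "hasEffect(ind42,ds1)"}

# Trivial stand-in for |=_R: an assertion is "entailed" only if it is literally
# contained in the premises.
entails = lambda premises, assertion: assertion in premises

print(preference_score(abox, delta1, entails))   # -3, as in Example 5
```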
3.6 The Need for Well-Founded Control for Sequential Abduction Steps

The experiments described in [Kay11] and [Esp11] indicate that the elementary score described above is required for solving even the simplest one-step abduction problems for interpretation tasks. Only ad hoc control regimes were established for sequential abduction results. In addition, other simplifications were made in [Kay11] and [Esp11]. The precondition of the Abox abduction is that the observations are strict. Additionally, the Abox abduction algorithm provides no depth and branch control for sequential abduction steps, as the scoring function introduced so far considers only a single abduction step.

As shown in [Kay11] and [Esp11], in principle, the interpretation of observations can be formalized using DLLP abduction. Although DLLP-abduction based interpretation was shown to be successful from an information retrieval point of view [Kay11] as well as from a content and knowledge management point of view [Esp11], the scores used so far are only a first step; too many "equal" interpretations can still be generated in some contexts. Therefore, based on the basic score introduced above, in this work we introduce a new application of DLLP-abduction based interpretation with a second ranking level, for which we define a probabilistic scoring function used for interpretation purposes. Although, in principle, the BCH approach [BKN11] described in Section 3.4 could be used as a basis for probabilistic ranking, due to the close connection to first-order logic we investigated Markov logic [DR07] for this purpose. Since only fixed names can be handled by Markov logic abduction as proposed by Kate and Mooney [KM09], we do not follow this approach to generate interpretations but use Markov logic [DR07] only for ranking. Unlike abduction as implemented by [Kay11], the probabilistic ranking approach should handle "weighted" observations as well. Using the probabilistic scoring function, approaches for depth and branch control can be implemented as an extension to [Kay11, Esp11]. In addition, a restriction on logic programming rules as used in [Kay11], namely non-recursivity, should be relaxed.

Chapter 4
Probabilistic Control for DLLP-Abduction Based Interpretation

In this chapter, the control of multiple applications of the DLLP abduction procedure sketched in the previous chapter is investigated in detail. An interpretation for a set of observations is represented by an Abox, which contains so-called deep-level concept and role assertions. The solution to control interpretation generation is based on a probabilistic ranking principle. The interpretation algorithm includes a scoring function and a termination function. The termination function determines whether the interpretation process can be stopped at some point during the interpretation process.
The idea for stopping the interpretation process is that if, according to the scoring function, no significant improvements of the interpretation results are achieved, there is no benefit in further interpretation steps and, consequently, the process is terminated [GMN+10a, GMN+10c] until new input observations arrive. The scoring function defined in this chapter assigns probabilistic scores to interpretations generated by DLLP abduction.

4.1 Abduction-based Interpretation Inside an Agent

Since we adopt the view that intelligent agents are used for solving problems while acquiring information, in the following the interpretation problem is analyzed from the perspective of an agent using an operational semantics. Consider an agent given some observation descriptions in a context where the descriptions are the results of shallow, or surface-level, analysis of multimedia documents.¹ The objective of this agent is to find explanations for input percepts (aka observations). The question is how the interpretation Aboxes for the observations are determined and how long the interpretation procedure should be performed by the agent. The functionality of this Media Interpretation Agent is presented in the MI_Agent algorithm in Section 4.1. Section 4.1 is mostly taken from [GMN+10a, GMN+10c].

¹ The analysis might also be carried out by the agent.

The main idea is summarized as follows. The agent selects observations one after the other for explanation and, basically in a greedy approach, only the best interpretation is used to explain further observations. It might turn out, however, that interpretation alternatives currently explored receive lower scores than one of the previous interpretations (which did not explain some observation but maybe others very well). Due to space constraints, not all interpretations can be kept, however. Thus an ordered agenda of possible interpretations to work on is maintained.

Let us assume that the analysis Abox A contains n assertions, where k assertions are weighted and the remaining assertions are strict:

A = {w_1 α_1, ..., w_k α_k, α_{k+1}, ..., α_n}

In the above Abox, α_1, ..., α_n denote the ground atoms of the MLN and w_1, ..., w_k indicate weights. In the first-order knowledge base, we have weighted rules (or formulas) WR enforcing that models form a compositional hierarchy with interconnections among the parts [BKN11]:

w C_3(z) ∧ r_2(z, x) ∧ r_3(z, y) ⇒ r_1(x, y) ∧ C_1(x) ∧ C_2(y)    (4.1)

where w indicates a real-number weight from the interval (−∞, +∞). In order to explain the terms on the right-hand side of the above weighted rule in the spirit of DLLP abduction, i.e., for technical reasons, we transform the above rule into the following backward chaining rules:

r_1(x, y) ← C_1(x), C_2(y), C_3(z), r_2(z, x), r_3(z, y)    (4.2)
C_1(x) ← r_1(x, y), C_2(y), C_3(z), r_2(z, x), r_3(z, y)    (4.3)
C_2(y) ← C_1(x), r_1(x, y), C_3(z), r_2(z, x), r_3(z, y)    (4.4)

Similarly, we consider the next weighted rule:

w C_2(z) ∧ r(z, x) ⇒ C_1(x)    (4.5)

The following backward chaining rule is required to explain the term on the right-hand side of the above weighted rule:

C_1(x) ← C_2(z), r(z, x)    (4.6)

The derivation of backward chaining rules for DLLP abduction is straightforward and can be automatized. Thus, the interpretation knowledge of the agent can be specified as a set of weighted rules of the form explained above.
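The straightforward derivation mentioned above can be sketched as follows. This is only an illustration under the assumption that atoms are represented as plain strings; it produces the rules (4.2)–(4.4) for the weighted rule (4.1), up to the order of the body atoms.

```python
def backward_chaining_rules(body_atoms, head_atoms):
    """Derive backward chaining rules from a weighted rule of the form
       w  body_1 ∧ ... ∧ body_m  =>  head_1 ∧ ... ∧ head_k   (cf. Eq. 4.1):
    every head atom becomes the head of one rule whose body consists of the
    remaining head atoms plus all body atoms of the weighted rule."""
    rules = []
    for i, head in enumerate(head_atoms):
        rest = [a for j, a in enumerate(head_atoms) if j != i]
        rules.append((head, rest + body_atoms))
    return rules

# Weighted rule (4.1):  w  C3(z) ∧ r2(z,x) ∧ r3(z,y)  =>  r1(x,y) ∧ C1(x) ∧ C2(y)
body = ["C3(z)", "r2(z,x)", "r3(z,y)"]
head = ["r1(x,y)", "C1(x)", "C2(y)"]
for h, b in backward_chaining_rules(body, head):
    print(f"{h} <- {', '.join(b)}")
# Prints the three backward chaining rules (4.2)-(4.4).
```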
Fiat assertions are the assertions which require explanations. The predicate of a fiat assertion is matched 52 Probabilistic Control for DLLP-Abduction Based Interpretation with the head of a backward chaining rule. Note that each fiat assertion represents an observation. In the following, we define Fiats(A) which returns a set containing fiat assertions of A w.r.t. a set of backward chaining rules BR: Fiats(A) = {α ∈ A | ∃r ∈ BR : predicateSym(head(r)) = predicateSym(α)} where predicateSym(α) returns the predicate symbol (role or concept name) mentioned in the assertion α. Moreover, head(r) returns the head of a backward chaining rule r. The interpretation procedure interpret is defined in Algorithm 2 shown below. Before discussing the algorithm in detail, the auxiliary operation P (A, WR) in the interpret algorithm is explained. The P (A, WR) function determines the scoring value of the interpretation Abox A with respect to a set of weighted rules WR. In an interpretation Abox, the fiat assertions are divided into two disjoint sets, namely explained and unexplained fiat assertions. The defined scoring value in this work is the average of the fiat assertion probabilities, including explained and unexplained ones. We assume that the probability of an unexplained fiat assertion to be true is 0.5 (the value could be supplied as a parameter or could be made context-specific). Let us assume that the interpretation Abox A has n explained and m unexplained fiat assertions. The scoring value of A is determined based on the Markov logic formalism [DR07] as follows: ! n X 1 m × 0.5 + PM LN (A,WR) (Qi (A) | ~e(A)) (4.7) P (A, WR) = n+m i=1 where Qi (A) and ~e(A) denote the query and the evidence vector, respectively. The query Qi (A) is defined as follows: Qi (A) = hαi = truei where αi is an explained fiat assertion of A. The query Q(A) and evidence vector ~e(A) are functions of the interpretation Abox A since each A contains different fiat assertions. Note that during the interpretation procedure the query might change. This is due to increasing fiat assertions by applying forward and backward chaining rules. Additionally, by explaining fiat assertions, the number of explained fiat assertions in the interpretation Abox increases and the number of unexplained fiat assertions decreases. The evidence vector ~e(A) contains in addition to the observations, assertions generated by forward, and backward chaining rules except the fiat assertions: ~e(A) = {α = true | α ∈ A \ Fiats(A)} The observations and the hypothesized assertions (except the fiat assertions) are considered as evidences for the determination of the probabilistic scoring values. In order to answer the query PM LN (A,WR) (Q(A) | ~e(A)) the function M LN (A, WR) is called. 53 4.1 Abduction-based Interpretation Inside an Agent This function returns a Markov logic network (FM LN , WM LN ) where r ∈ F and w ∈ W iff w r ∈ WR. Note that for constructing Markov logic network [DR07] only the weighted rules WR are considered as formulas. Furthermore, the constants of the Markov logic network are the individuals which exist in A. There are also some predicates in the weighted rules with no constants. Thus, constants are defined and assigned to the types of those predicates. As a result, the constant set contains the individuals which exist in A and the defined constants. Note that the strict assertions of A are not considered as formulas for constructing the Markov logic network. 
In the following, the interpretation algorithm interpret is presented:

Algorithm 2: The interpretation algorithm
Function interpret(𝒜, Γ, (T, A_0), FR, BR, WR)
  Input: an agenda 𝒜, an Abox of observations Γ, a Tbox T, an Abox A_0, a set of forward chaining rules FR, a set of backward chaining rules BR, and a set of weighted rules WR
  Output: an agenda 𝒜′ and a new interpretation Abox NewI
  Σ := (T, A_0);
  S := λ(A). P(A ∪ A_0, WR);
  𝒜′ := DLLPA(Ω, Ξ, Σ, FR, BR, WR, S, 𝒜);
  NewI := argmax_{A ∈ 𝒜′}(P(A, WR));
  return (𝒜′, NewI);

In the above algorithm, the DLLPA function is called, which returns an agenda 𝒜′. The function argmax_{A ∈ 𝒜} S(A) selects those elements A ∈ 𝒜 for which the scoring function S applied to A returns a maximal value:

max_score(𝒜) = argmax_{A ∈ 𝒜} S(A)    (4.8)

Thus, the interpretation Abox NewI with the maximum score among the Aboxes A of 𝒜′ is selected. The reason for considering all interpretation Aboxes on the agenda, and not only the one with the highest scoring value, is that the fiat assertions in the interpretation Aboxes A are not the same. Since the query is a function of the interpretation Abox, Q(A), it might be the case that after the explanation procedure we find an interpretation Abox with a higher scoring value than the Abox which currently has the maximum score. In the following, we define the strategy function Ω, which is one of the parameters of DLLPA:

Algorithm 3: The strategy algorithm
Function Ω(𝒜, S, WR, BR)
  Input: a set of interpretation Aboxes 𝒜, a scoring function S, a set of weighted rules WR, and a set of backward chaining rules BR
  Output: an Abox A, a fiat assertion α, and a backward chaining rule r
  𝒜 := sort(𝒜, S);
  A := max_score(𝒜);
  B := ∅;
  foreach α ∈ UnexplainedFiats(A) do
    foreach wr ∈ WR do
      // Determine the tuples containing a fiat assertion and the supporting weighted rule
      B := B ∪ {(α_i, w_i r_i)};
  max_ws := {(α_n, w_n r_n) ∈ B | ¬∃(α_l, w_l r_l) ∈ B : α_n ≠ α_l, r_n ≠ r_l, w_l > w_n};
  (α_m, w_m r_m) := random_select(max_ws);
  Find r ∈ BR so that r is the backward chaining rule corresponding to w_m r_m;
  return (A, α_m, r);

In the following, we define the set of supporting weighted rules SWR for a fiat assertion α:

SWR(α) = {wr ∈ WR | predicateSym(α) = predicateSym(head(wr))}

Let us assume an Abox containing multiple fiat assertions. In this algorithm, we discuss which fiat assertion should be explained first, so that the scoring values assigned to the interpretation Aboxes are plausible. We consider the top k interpretation Aboxes according to the scoring values. The scoring value assigned to an interpretation Abox is the average of the fiat assertion probabilities given the evidence. Furthermore, the knowledge base contains weighted rules which are used for computing the scoring values of the interpretation Aboxes. The weights of the weighted rules are not necessarily equal. This means the fiat assertions are supported by weighted rules with high and/or small weights. Thus, the order in which the next fiat assertion is selected affects the scoring value assigned to the interpretation Abox. In case a fiat assertion supported by a weighted rule with a small weight is selected, a small scoring value is assigned to an interpretation Abox which could otherwise have a high scoring value. Thus, the generated interpretation Abox is most probably not among the top k interpretation Aboxes and is discarded.
In order to avoid such cases, at each step we select the fiat assertion which has a supporting weighted rule with the highest weight. The above strategy function Ω takes as arguments the agenda A, which is a set of Aboxes A, a scoring function S, a set of weighted rules WR, and a set of backward chaining rules BR. We determine the set B containing the tuples consisting of a fiat assertion α ∈ A and a supporting weighted rule for α. Then, we find the tuples in B containing the maximum weight. If there is more than one tuple with the maximum weight, one of the tuples is selected randomly. The next fiat assertion to be explained is the first element α_m of the tuple with the maximum weight. According to the weighted rule in the selected tuple, the corresponding backward chaining rule from BR is found, which is used for explaining the fiat assertion α_m. The above algorithm returns the Abox A with the maximum score, the next fiat assertion α_m ∈ A to be explained, and a backward chaining rule r.

As introduced at the beginning of this chapter, the media interpretation agent extracts observations from the analysis queue. The analysis results do not arrive at equal time distances. Thus, we check in a loop whether the analysis queue is non-empty. In the following, we present the media interpretation agent algorithm MI_Agent. This algorithm calls the interpret function.

Algorithm 4: The multimedia interpretation agent algorithm
Function MI_Agent(QΓ, die(), (T, A0), A, FR, BR, WR)
Input: a queue of observations QΓ, a termination function die(), a background knowledge base (T, A0), an agenda A, a set of forward chaining rules FR, a set of backward chaining rules BR, and a set of weighted rules WR
Output: –
  currentI := ∅;
  A′ := {∅};
  Σ := (T, A0);
  stopProcessing := false;
  repeat
    Γ := extractObservations(QΓ);
    W := MAP(Γ, T);
    Γ′ := select(W, Γ);
    label:
    A := argmax_{A ∈ A}(P(A, WR));    // the highest-scoring Abox A on the agenda A
    A := A \ {A};                     // remove A from the agenda
    consider the remaining unexplained fiats of A as explained;
    A := A ∪ A0 ∪ Γ′;
    if A is consistent then
      A := A ∪ forward_chain(Σ, FR, A);
      if A is consistent then
        A′ := A′ ∪ {A};
        (A, newI) := interpret(A′, Γ′, Σ, FR, BR, WR);
        currentI := newI;
      else goto label;
    else goto label;
  until stopProcessing = true or die();

The MI_Agent uses standard functional programming patterns such as filter and zip. Furthermore, a function select is defined which uses these two functions. The function filter is defined as follows:

filter(f, X) := ⋃_{x ∈ X} ( {f(x)} if f(x) ≠ false, ∅ otherwise )

The function filter takes as parameters a function f and a set X and returns a set consisting of the values of f applied to every element x ∈ X for which f(x) ≠ false. The function zip is defined as follows:

zip(X, Y) := ⋃_{x ∈ X, y ∈ Y} {(x, y)}

The function zip produces as output a set of tuples by taking as parameters two sets X and Y and pairing their successive elements. In order to select elements y from an ordered set Y using a bit vector which is also represented by an ordered set X, the function select is defined as follows:

select := λ(X, Y).filter(λ(x, y). if x then y else false, zip(X, Y))

The select function determines those positive ground atoms of the most probable world which also occur in the Abox. Since A might contain probabilistic assertions, the most probable world has to be determined by the MAP operation [DLK+08].
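The definitions of filter, zip, and select correspond closely to standard functional programming idioms. The following Python sketch gives one possible reading of them; assertions are represented as plain strings, which is an assumption made purely for illustration.

# One possible reading of the filter, zip, and select definitions above.
# Assertions are represented as plain strings for illustration only.

def filter_(f, xs):
    """The thesis' filter: collect f(x) for all x with f(x) != false."""
    return {f(x) for x in xs if f(x) is not False}

def zip_(xs, ys):
    """The thesis' zip: pair the successive elements of two ordered collections."""
    return list(zip(xs, ys))

def select(bits, ys):
    """Keep those y whose corresponding bit is 1 (select = filter over zip)."""
    return filter_(lambda pair: pair[1] if pair[0] else False, zip_(bits, ys))

# Data of Example 6 below: MAP bit vector X and assertion vector Y.
X = [1, 1, 1, 0]
Y = ["Car(c1)", "DoorSlam(ds1)", "causes(c1,ds1)", "EngineSound(ds1)"]
print(select(X, Y))  # {'Car(c1)', 'DoorSlam(ds1)', 'causes(c1,ds1)'}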
By applying the MAP operation, we move from the probabilistic representation to a classical one in which the assertions are strict. The positive ground atoms which also occur in A are determined by the MAP operation. These positive ground atoms serve as evidence in the database file. Each assertion of the database file indicates the truth value of the corresponding ground atom in the Markov logic network.

In the MI_Agent function, the current interpretation currentI is initialized to the empty set and the agenda A′ to a set containing the empty set. Since the agent performs an incremental process, it is defined by a repeat-loop. In case the agent receives a percept result Γ, it is sent to the queue QΓ. In order to take the observations Γ from the queue QΓ, the MI_Agent calls the extractObservations function. The function select(W, Γ) then selects the positive assertions from the input Abox Γ using the bit vector W. The selected positive assertions are the assertions which require explanations. The select operation returns as output an Abox Γ′. The determination of the most probable world by the MAP function and the selection of the positive assertions are carried out on Γ′ ∪ A ∪ A0. In the next step, a set of forward chaining rules FR is applied to Γ′ ∪ A ∪ A0. The assertions generated in this process are added to Γ′ ∪ A ∪ A0. In the next step, only the consistent Aboxes are selected and the inconsistent Aboxes are removed. Then the interpret function is called, which determines the interpretation Aboxes. Afterwards, newI is assigned to the Abox currentI. The termination condition of the MI_Agent function is that stopProcessing is activated externally by the user or the die() function returns true. In case of QΓ = ∅, the MI_Agent waits at the function call extractObservations.

Example 6 Let us consider the Tbox axiom:

DoorSlam ⊑ ¬EngineSound

and the following Abox:

A = {1.3 Car(c1), 1.2 DoorSlam(ds1), causes(c1, ds1), 0.6 EngineSound(ds1)}

In order to determine the most probable world, the MAP operation [DLK+08] is applied to the above Abox. The result is the following vector:

X = ⟨1, 1, 1, 0⟩

where each element of X corresponds to an element of the following vector:

Y = ⟨Car(c1), DoorSlam(ds1), causes(c1, ds1), EngineSound(ds1)⟩

The zip(X, Y) function generates the following set of pairs:

zip(X, Y) = {(1, Car(c1)), (1, DoorSlam(ds1)), (1, causes(c1, ds1)), (0, EngineSound(ds1))}

Applying the function f(x, y) = if x then y else false to the above set of pairs returns the following components, respectively:

Car(c1), DoorSlam(ds1), causes(c1, ds1), false

The result of the select(X, Y) function is:

select(X, Y) = {Car(c1)} ∪ {DoorSlam(ds1)} ∪ {causes(c1, ds1)} ∪ ∅ = {Car(c1), DoorSlam(ds1), causes(c1, ds1)}

The above set contains the positive ground atoms of A existing in the most probable world. In this work, we are only interested in the positive ground atoms of the most probable world. The reason is that we adopt the open-world assumption [BCM+03]. According to the open-world assumption, if the knowledge base Σ does not entail the assertion C(i), it does not follow that ¬C(i) is entailed:

Σ ⊭_OWA ¬C(i) if Σ ⊭_OWA C(i)    (4.9)

This example shows how the positive ground atoms of an Abox existing in the most probable world are determined. After having presented the above algorithms, the unanswered questions mentioned above can be discussed.
A reason for performing the interpretation process and explaining the fiat assertions is that the score P(A, WR) increases through the interpretation process [GMN+10a, GMN+10c]. In other words, by explaining the observations, the agent's belief in the observations being true increases.

4.2 Controlling the Interpretation Process

In the previous section, we described the interpretation process. In this section, we describe how the interpretation process is controlled, which is the main objective of this work. We control the interpretation process by applying the Markov logic formalism [DR07]. We apply the Markov logic formalism for ranking interpretation alternatives, but not for generating interpretations. We cannot apply the Markov logic abduction approach described in [KM09] for generating interpretations since this approach requires fixed names for individuals. The reason for controlling the interpretation process is that the agent's resources for computing explanations are limited. In the following, we define three different approaches for controlling the interpretation process, namely controlling branching, controlling abduction depth, and controlling reactivity.

4.2.1 Controlling Branching

Controlling branching is one of the procedures for controlling the interpretation process. In the following, we describe the branching problem. Branches are generated if there is more than one explanation ∆ for a fiat assertion α. Multiple explanations lead to multiple interpretation alternatives. Thus, the computation of the interpretation procedure is a resource-consuming task. In order to reduce the required resources, we apply the branching control procedure.

For controlling branching, we use an agenda to maintain possible interpretation alternatives, since maintaining all interpretation alternatives is a resource-consuming task. On the agenda, the interpretation alternatives are sorted by scoring values. An agenda of size k maintains the top k interpretation alternatives. Thus, we keep interpretation alternatives with high ratings and discard interpretations with low ratings. Consequently, we enormously reduce the memory space required for the interpretation procedure.

In the following, we describe the branching control procedure applied in this work. According to this approach, we apply beam search to explore the interpretation space. The reason for applying beam search is that beam search is non-exhaustive. Thus, we reduce the size of the interpretation space. Additionally, we apply Formula 4.7 as the probabilistic scoring function. Thus, we rate interpretation alternatives probabilistically. The probabilistic scoring function 4.7 is defined according to the Markov logic formalism [DR07]. We consider the set of weighted rules WR for ranking interpretation alternatives. According to the scoring function 4.7, the rank of an interpretation alternative is the average of the fiat assertion probabilities given the evidence. For the interpretation process, the agent selects the fiat assertions of an interpretation alternative one after the other for explanation. In this work, the weights of the weighted rules are not necessarily equal. Thus, the selection order of the fiat assertions affects the ranks of the interpretation alternatives. For this reason, we defined strategy Algorithm 3, which determines the selection order of fiat assertions.
Strategy Algorithm 3 selects a fiat assertion which has a supporting weighted rule with the highest weight. Consequently, the ranks of the interpretation alternatives are plausible. According to beam search, at each step only the interpretation alternative with the highest score is chosen for explaining further fiat assertions. The fiat assertions of the other interpretations with smaller scores are not explained any further. Consequently, only the interpretation with the highest score is replaced by new interpretations, i.e., branching only occurs for this interpretation alternative. It might happen that the generated interpretations receive smaller scores than previous alternatives which were not selected for further explanation of fiat assertions. Therefore, the interpretation alternatives on the agenda are sorted by scoring values. Let us assume an agenda of size k. In the following, we summarize the steps just mentioned:

1. Select the interpretation alternative with the highest score ImaxScore from the agenda.
2. Select a fiat assertion α of ImaxScore according to strategy Algorithm 3.
3. Explain the fiat assertion α selected in Step 2.
4. Rank the interpretation alternatives according to Formula 4.7.
5. Sort the interpretation alternatives by their ranks.
6. Maintain the top k interpretation alternatives on the agenda.

The advantage of controlling branching is that we reduce the time and memory space required for the computation of the interpretation procedure. Thus, we enormously reduce the complexity of the branching problem. Furthermore, controlling branching reduces the size of the interpretation space.

4.2.2 Controlling Abduction Depth

In addition to controlling branching as described above, we define controlling abduction depth, which is another procedure for controlling the interpretation process. In Section 4.2.1, we defined six steps which are applied for controlling branching. Controlling abduction depth determines how many times these steps are performed. Thus, the steps of Section 4.2.1 are placed inside a loop which iterates at most m times, where m indicates the maximum abduction depth. Consequently, controlling abduction depth determines how many fiat assertions are explained at most. The explained fiat assertions are not necessarily from the same Abox. The reason is that in Step 1, we explain a fiat assertion of the interpretation alternative with the highest rank ImaxScore. Note that ImaxScore changes over time and does not always denote the same Abox. Thus, an explained fiat assertion is taken from the Abox which has the highest rank at a given point in time.

Let us assume k fiat assertions, and assume that the steps of Section 4.2.1 are inside a loop which iterates at most m times. For k < m, the loop iterates k times and k fiat assertions are explained. For k ≥ m, the loop iterates m times and m fiat assertions are explained. The question is how to select the variable m appropriately. By selecting a small number for m, we might lose some interpretation results, since not all fiat assertions might be explained completely. By selecting a large value for m, the interpretation space becomes very large, and we can no longer control the abduction depth. Selecting the variable m automatically is a topic for future work.
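The six steps of Section 4.2.1 and the abduction-depth bound m of this section can be combined into a single control loop. The following Python sketch illustrates this combination; explain (returning one successor Abox per applicable backward chaining rule), select_fiat (the max-weight selection of Algorithm 3), and score (Formula 4.7) are hypothetical helpers standing in for the machinery described above, k is the agenda size, and m is the maximum abduction depth.

# Sketch of the beam-search control loop with branching and depth control.
# explain, select_fiat, and score are hypothetical helpers standing in for
# backward chaining, the strategy of Algorithm 3, and Formula 4.7.

def interpret_with_control(initial_abox, k, m, score, select_fiat, explain):
    agenda = [initial_abox]                   # the top-k interpretation alternatives
    for _ in range(m):                        # at most m fiat assertions are explained
        agenda.sort(key=score, reverse=True)
        best = agenda[0]                      # Step 1: highest-scoring alternative
        fiat = select_fiat(best)              # Step 2: fiat with max-weight support
        if fiat is None:
            break                             # nothing left to explain
        successors = explain(best, fiat)      # Step 3: one Abox per applicable rule
        agenda = agenda[1:] + successors      # only the best alternative branches
        agenda.sort(key=score, reverse=True)  # Steps 4 and 5: rank and sort
        agenda = agenda[:k]                   # Step 6: keep the top k alternatives
    return max(agenda, key=score)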
Similar to controlling branching, the advantage of controlling abduction depth is that we reduce the time and memory space required for the computation of the interpretation procedure. Furthermore, we reduce the size of the interpretation space.

In the following, we give two examples. Example 7 shows how the scoring value of an interpretation alternative increases monotonically by explaining fiat assertions. Example 8 shows how the branching and abduction-depth control defined in this work are applied.

Example 7 Let us consider the following Abox A:

A = {Car(c1), DoorSlam(ds1), causes(c1, ds1), EnvConference(ec1), Env(e1), hasTopic(ec1, e1)}

Furthermore, let us assume that the following weighted rules are given. We consider them for the determination of the scoring values of the interpretation Aboxes:

5  ∀x, y, z  CarEntry(z) ∧ hasObject(z, x) ∧ hasEffect(z, y) → Car(x) ∧ DoorSlam(y) ∧ causes(x, y)
5  ∀x, y, z  EnvProt(z) ∧ hasEvent(z, x) ∧ hasTheme(z, y) → EnvConference(x) ∧ Env(y) ∧ hasTopic(x, y)

The following backward chaining rules are constructed according to the above weighted rules:

∀x, y  causes(x, y) ← ∃z CarEntry(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y)    (4.10)
∀x, y  hasTopic(x, y) ← ∃z EnvProt(z), hasEvent(z, x), hasTheme(z, y), EnvConference(x), Env(y)    (4.11)

For simplicity, we assume that there are no forward chaining rules to be applied. In A, the fiat assertions are:

Fiats(A) = {causes(c1, ds1), hasTopic(ec1, e1)}

The evidence vector considered for the inference contains the observations (except the fiat assertions):

~e(A) = ⟨Car(c1) = true, DoorSlam(ds1) = true, EnvConference(ec1) = true, Env(e1) = true⟩

The probability of an unexplained fiat assertion is assumed to be 0.5. Since A has two unexplained fiat assertions, its scoring value is:

P(A, WR) = 1/2 (0.5 × 2) = 0.5

We begin by explaining the fiat assertion causes(c1, ds1). This fiat assertion is explained by the backward chaining rule 4.10. Thus, the interpretation Abox A′ is:

A′ = A ∪ {CarEntry(ind42), hasObject(ind42, c1), hasEffect(ind42, ds1)}

The evidence vector ~e(A′) contains the observations and the hypothesized assertions except the fiat assertions:

~e(A′) = ~e(A) ∪ ⟨CarEntry(ind42) = true, hasObject(ind42, c1) = true, hasEffect(ind42, ds1) = true⟩

In the following, we determine the probability of the first fiat assertion given the evidence:

P(causes(c1, ds1) | ~e(A′)) = 0.824

Thus, the scoring value of the interpretation Abox A′ is:

P(A′, WR) = 1/2 (0.824 + 0.5) = 0.662

By applying 4.11, we explain the second fiat assertion.
The new interpretation Abox is:

A″ = A′ ∪ {EnvProt(ind44), hasEvent(ind44, ec1), hasTheme(ind44, e1)}

The evidence vector ~e(A″) is extended by the hypothesized assertions:

~e(A″) = ~e(A′) ∪ ⟨EnvProt(ind44) = true, hasEvent(ind44, ec1) = true, hasTheme(ind44, e1) = true⟩

In order to determine the scoring value, we determine the probabilities of the fiat assertions:

P(causes(c1, ds1) | ~e(A″)) = 0.824
P(hasTopic(ec1, e1) | ~e(A″)) = 0.824

The new scoring value is:

P(A″, WR) = 1/2 (0.824 + 0.824) = 0.824

The following table shows the summary of the results:

P(A, WR)  = 0.5
P(A′, WR) = 0.662
P(A″, WR) = 0.824

Table 4.1: The summary of the results

This example shows that by successively explaining fiat assertions of an Abox, the scores of the generated interpretation Aboxes increase monotonically (the weights used in the weighted rules are assumed to be positive).

Example 8 In this example, we discuss how the branching and depth control described in this work are applied. Let us assume that an Abox A with four unexplained fiat assertions is given. For simplicity, the maximum number of fiat assertions to be explained is m = 2. In this example, we discuss the first three steps S1, S2, S3 according to the algorithm. We assume that at step S1, two backward chaining rules are applicable to explain one of the fiat assertions. Thus, A is replaced by A1 and A2 with scores 0.8 and 0.7, respectively. A1 and A2 have one explained and three unexplained fiat assertions. Since we apply beam search for finding the final interpretation Abox, we continue the next step with the Abox with the highest score, namely A1. At S2, we explain one of the unexplained fiat assertions of A1. Let us assume that again two backward chaining rules are applicable to explain one of the unexplained fiat assertions. Thus, A1 is replaced by A3 and A4 with scores 0.9 and 0.6. A3 and A4 have two explained and two unexplained fiat assertions. Let us assume the agenda size is three. Since the interpretation alternatives on the agenda have to be sorted by scoring values, at S3 we sort the interpretations on the agenda. This example is depicted in Figure 4.1.

Figure 4.1: Example for branching and depth controlling

The above example illustrates branching and depth control in this work. At each step, only a fiat assertion of the Abox with the maximum score is explained. Thus, we avoid the generation of many interpretation Aboxes with possibly lower scores. The procedure realizes a kind of beam search and is analogous to the procedure implemented in [BKN11] as explained in Section 3.4. What is different is the way we compute the scoring values. This example is similar to the example given in [Has11].

4.2.3 Controlling Reactivity

Controlling reactivity concerns the procedure for processing bunches of input data. Since the observations arrive incrementally [HBB+13, GMN+10c], we have to apply approaches for processing the data incrementally. In this section, we discuss the reactivity control procedure applied in this work. We apply two different approaches when a new bunch of assertions arrives:

• stop-processing
• non-stop-processing

According to the stop-processing approach, we stop processing the current bunch of assertions when the following bunch arrives.
Thus, the remaining unexplained fiat assertions of the current bunch are not explained any more. According to the non-stop-processing approach, we continue processing the unexplained fiat assertions of the current bunch although the following bunch has already arrived and is waiting for processing in the analysis queue.

Each approach has advantages and disadvantages. The disadvantage of the stop-processing approach is that we might lose some data since we do not explain the remaining unexplained fiat assertions. The advantage of this approach is that we switch quickly to the new input data and might gain more important data. The disadvantage of the non-stop-processing approach is that the processing of some bunches is delayed since the previous bunches might have many assertions to be explained. Thus, new information which might be very important becomes available only with a delay. The advantage of this approach is that all observations are explained completely and we do not lose any data.

4.3 Comparison with Independent Choice Logic

In this section, we compare the preference score used in this work with the one used in Independent Choice Logic [Poo91]. The two preference scores serve different purposes: Poole's preference score is defined for diagnostic frameworks, whereas in this work the preference score is defined for controlling Abox abduction in the context of multimedia interpretation. According to Poole's approach, the probability of the observations obs given a hypothesis D is 1:

P(obs | D) = 1    (4.12)

where each hypothesis D = {h1, ..., hn} is determined according to a rule in ICL. In order to determine a hypothesis D, we consider only the rules where the predicate of the observation matches the head of the rule. The following example shows how an explanation for a set of observations is determined. Let us consider the observation obs = {carEntry} and the following rules:

carEntry ← car ∧ doorSlam ∧ carEntryIfCarDoorSlam    (4.13)
carEntry ← car ∧ ¬doorSlam ∧ carEntryIfCarNoDoorSlam    (4.14)

Two explanations for obs = {carEntry} are:

D1 = {car, doorSlam, carEntryIfCarDoorSlam}    (4.15)
D2 = {car, ¬doorSlam, carEntryIfCarNoDoorSlam}    (4.16)

In this work, the observations are uncertain and we support the observations by the interpretation procedure. Thus, it holds that

P(obs | D) ≠ 1    (4.17)

Additionally, the instances of hypotheses in Poole's approach are logically independent, which does not hold in our work. The preference score in [Poo91] is defined according to the Maximum Likelihood approach [RN03] as follows:

argmax_D P(D | obs)    (4.18)

whereas the preference score of this work is determined according to the Maximum A Posteriori approach [DLK+08]:

argmax_D P(obs | D)    (4.19)

Note that the determination of the scoring value depends on the context it is used in. We cannot apply Poole's approach in this work since logical independence does not hold among the assertions of an explanation in our context. The other important difference is that Poole determines the probability of an explanation, whereas our approach is more general and determines the probability of an interpretation, which might still contain unexplained assertions. Additionally, unlike Poole, we deal with uncertain data, i.e., P(obs | D) ≠ 1.
4.4 Conversion of the Knowledge Base into ML Notation

The knowledge base Σ = (T, A) applied in this work is a tuple composed of a Tbox T consisting of Description Logic axioms and an Abox A consisting of weighted/strict assertions. In this section, we argue that the Tbox axioms have no influence on the determination of the scoring values of the interpretation Aboxes. Thus, the Tbox axioms are not converted into the Markov logic notation. Another reason why we do not consider Tbox axioms during the determination of the scoring values is that they weaken the influence of the weighted rules and do not allow the weighted rules to play any role. Considering the Tbox axioms during the determination of the scoring values sets the scoring value of each interpretation Abox to 1, and it is then no longer possible to see how the scoring values increase step by step by explaining the fiat assertions.

Example 9 Let us assume the following Abox A:

A = {Car(c1), DoorSlam(ds1), causes(c1, ds1)}

Furthermore, we consider the following backward chaining rule:

∀x, y  causes(x, y) ← ∃z CarEntry(z), hasObject(z, x), hasEffect(z, y), Car(x), DoorSlam(y)

By applying the above backward chaining rule, the following interpretation Abox A′ is generated:

A′ = A ∪ {CarEntry(ind42), hasObject(ind42, c1), hasEffect(ind42, ds1)}

Moreover, let us assume the Tbox axiom:

∀z  CarEntry(z) ⇒ Movement(z) ∧ ∃x [hasObject(z, x) ⇒ Car(x)] ∧ ∃y [hasEffect(z, y) ⇒ DoorSlam(y)]

and the weighted rule:

5  ∀z, x, y  CarEntry(z) ∧ hasObject(z, x) ∧ hasEffect(z, y) ⇒ Car(x) ∧ DoorSlam(y) ∧ causes(x, y)

Thus, the scoring value of A′ is:

P(A′, WR) = 1

This example shows that by considering the Tbox axioms in the knowledge base, the scoring values of the interpretation Aboxes are 1. Thus, with strict Tbox axioms considered, it is no longer possible to sensibly compare the scoring values of the generated interpretation Aboxes. We emphasize that the Tbox is used during the abduction process as a filter to remove inconsistent interpretation Aboxes. Thus, for the determination of the scoring values, a Tbox is not required.

According to the above discussion, we consider only the weighted rules WR for the determination of the scoring values of an Abox A. Thus, we use the notation P(A, WR) for the determination of scores. In the following, the function AlchemyMLN(A, WR) is given, which describes how the weighted rules WR and the Abox A are converted into Alchemy's Markov logic notation [KSR+10]. See Appendix A for details about the systems. The function AlchemyMLN takes as input two parameters, namely an Abox A and a set of weighted rules WR. Based on Alchemy's processing techniques, the type of each concept must be specified, where a type is a non-empty set containing the constants of that type. Note that constants in the context of the Markov logic formalism [DR07] correspond to individuals in Description Logics [BCM+03]. If the resulting set for a certain type is empty, individuals are randomly generated and assigned to the extension of the type. Similarly, the domain and range types of each role are specified by non-empty types. With empty types, the MLN cannot be constructed by Alchemy [KSR+10]. The output of the AlchemyMLN function is a knowledge base Σ = (T, P, F) containing three sets T, P and F, which denote the type set, the predicate set, and the formula set, respectively.
At the beginning of the AlchemyMLN function these sets are initialized to empty sets. Since role names in the Markov logic notation begin with capital letters [KSR+10], the function CapitalLetter is applied to the role names of the DL knowledge base. The set of formulas F contains the weighted rules. In the following algorithm, we define types for all concepts and roles mentioned in the weighted rules.

Algorithm 5: The knowledge base conversion algorithm
Function AlchemyMLN(A, WR):
Input: an Abox A, and the set of weighted rules WR
Output: a knowledge base Σ = (T, P, F) with the type set T, predicate set P, and the formula set F
  Σ := (∅, ∅, ∅);
  foreach C ∈ Concepts(WR) do
    t := GenerateNewTypeName();
    q := {(x) | C(x)};
    result := evaluate(q);
    if result = ∅ then
      NewInd := GenerateNewIndividual();
      T := T ∪ {t + " = " + "{" + NewInd + "}"};
    else
      T := T ∪ {t + " = " + "{" + CapitalLetter(result) + "}"};
    P := P ∪ {C(t)};
  foreach r ∈ Roles(WR) do
    t1 := GenerateNewTypeName();
    t2 := GenerateNewTypeName();
    q := {(x, y) | r(x, y)};
    result := evaluate(q);
    if result = ∅ then
      NewInd1 := GenerateNewIndividual();
      T := T ∪ {t1 + " = " + "{" + NewInd1 + "}"};
      NewInd2 := GenerateNewIndividual();
      T := T ∪ {t2 + " = " + "{" + NewInd2 + "}"};
    else
      T := T ∪ {t1 + " = " + "{" + CapitalLetter(MapFirst(result)) + "}"};
      T := T ∪ {t2 + " = " + "{" + CapitalLetter(MapSecond(result)) + "}"};
    R := CapitalLetter(r);
    P := P ∪ {R(t1, t2)};
  foreach wr ∈ WR do
    F := F ∪ {wr};
  return Σ;

The generated knowledge base Σ is saved in a file with the extension .mln [KSR+10]. In the above algorithm, Concepts(WR) and Roles(WR) return a set containing the concepts and roles mentioned in the weighted rules WR, respectively. The function GenerateNewTypeName generates type names beginning with non-capital letters, and the function GenerateNewIndividual generates individual names beginning with capital letters. The functions MapFirst and MapSecond are defined as follows:

MapFirst(R) := ⋃_{(x,y) ∈ R} {x}    (4.20)
MapSecond(R) := ⋃_{(x,y) ∈ R} {y}    (4.21)

The functions MapFirst and MapSecond take as parameter a set R of pairs and return the set consisting of the first and the second elements of the pairs, respectively. The following query, which is applied in the previous algorithm, determines the instances which have the same type:

q = {(X) | C(X)}

Let us consider the following query, which retrieves the instances of the concept Car:

q = {(X) | Car(X)}

Furthermore, let us assume that the result of the above query is:

r = {c1, c2}

Let us consider the following query:

q = {(X, Y) | causes(X, Y)}

and let us assume that the result of the query q is as follows:

r = {(c1, ds1), (c2, ds2)}

Applying MapFirst and MapSecond returns the domain and range instances of causes, respectively:

MapFirst(r) = {c1, c2}
MapSecond(r) = {ds1, ds2}

In the following, we discuss how the Abox assertions are converted into a proper form for inference according to the Markov logic formalism [DR07]. For Markov logic inference, there is a separate file, called the evidence file or database file, with extension .db [KSR+10]. The database file contains the evidence, i.e., the strict assertions of A. The Abox which should be converted into an evidence file is consistent and contains weighted/strict assertions, where some assertions are fiat assertions.
Since the evidence file contains only strict assertions, the most probable world given the Abox is determined. Then, the positive assertions of the most probable world occurring in A are selected. Among the positive assertions, there are some fiat assertions. The fiat assertions are used for the generation of the query at each step. Apart from the fiat assertions, the remaining positive assertions are written as strict assertions into the evidence file. The following table shows the correspondence between the strict Abox assertions in Description Logics [BCM+03] and in Markov logic notation [KSR+10]. Note that according to the Markov logic notation, role names and individual names begin with capital letters.

Abox assertion        Markov logic evidence
A(ind)                A(Ind)
r(ind1, ind2)         R(Ind1, Ind2)

Table 4.2: The correspondence between the Abox assertions in DL and in MLN

In the above table, A and r denote an atomic concept name and an atomic role name, respectively. The algorithm AlchemyDB describes how the evidence file is generated. This algorithm takes as parameters the set of concept names C, the set of role names R, a Tbox T, and an Abox A. The Abox A contains weighted/strict assertions, where some assertions are fiat assertions. The output of this algorithm is an evidence vector ~e which contains the strict non-fiat assertions. The evidence vector is the content of an evidence file with extension .db. At the beginning of the algorithm, the evidence vector is initialized to the empty set. In order to consider only the positive assertions of the most probable world given the Abox A, the Select and MAP functions are applied. Since, according to the Markov logic formalism, individual names and role names begin with capital letters [KSR+10], we apply the function CapitalLetter to these names. Each role assertion is examined as to whether it is a non-fiat assertion; in this case, it is added to the evidence vector ~e. The fiat assertions are considered for the query generation.

Algorithm 6: The database generation algorithm
Function AlchemyDB(C, R, T, A):
Input: a set of concept names C, a set of role names R, a Tbox T, and an Abox A
Output: an evidence vector ~e
  ~e := ∅;
  A := Select(MAP(A, T), A);
  foreach C(ind) ∈ A where C ∈ C do
    ~e := ~e ∪ {C(Ind)};
  foreach r(ind1, ind2) ∈ A where r ∈ R and fiat(r(ind1, ind2)) = false do
    R := CapitalLetter(r);
    ~e := ~e ∪ {R(Ind1, Ind2)};
  return ~e;

In Algorithm 6, we did not consider the fiat assertions of the Abox A for the generation of the evidence file. Instead, as mentioned above, the fiat assertions are considered for the generation of the queries.
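A direct transcription of Algorithm 6 into Python might look as follows. The MAP/Select step is abstracted into the parameter assertions (assumed to contain only the positive non-fiat assertions of the most probable world), assertions are represented as (predicate, arguments) tuples, and capitalize mimics the CapitalLetter function; these representation choices are assumptions of the sketch, not part of the implemented system.

# Sketch of the evidence (.db) generation of Algorithm 6. Assertions are
# assumed to be (predicate, args) tuples, e.g. ("hasObject", ("ind42", "c1")),
# and to contain only positive non-fiat assertions of the most probable world.

def capitalize(name):
    """Mimic CapitalLetter: Alchemy predicates and constants start upper-case."""
    return name[0].upper() + name[1:]

def alchemy_db(assertions):
    lines = []
    for predicate, args in assertions:
        lines.append("%s(%s)" % (capitalize(predicate),
                                 ",".join(capitalize(a) for a in args)))
    return "\n".join(lines)

# The non-fiat assertions of Example 10 below yield the content of evidence.db:
abox = [("Car", ("c1",)), ("DoorSlam", ("ds1",)), ("CarEntry", ("ind42",)),
        ("hasObject", ("ind42", "c1")), ("hasEffect", ("ind42", "ds1"))]
print(alchemy_db(abox))  # Car(C1) ... HasObject(Ind42,C1) HasEffect(Ind42,Ds1)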
The following algorithm defines how the query is generated. A query is generated based on the fiat assertions of the Abox A. We assume that some role assertions are fiat assertions. The input to the algorithm AlchemyQuery is the Abox A and the output is the query Q. Since role names and individual names in MLN notation begin with capital letters [KSR+10], the CapitalLetter function is applied to these names. Again we have to determine the most probable world and select its positive assertions. At each step, the fiat assertions are divided into explained and unexplained fiat assertions. Q is an explained fiat assertion of the Abox A. In the following algorithm, we select an explained fiat assertion of A randomly. We assume that the probability of an unexplained fiat assertion being true is 0.5. Thus, we do not generate queries for unexplained fiat assertions. Consequently, we reduce the time required for query generation and query answering for unexplained fiat assertions. In the following algorithm, the function Explained_Fiats(A) returns the set of explained fiat assertions of A.

Algorithm 7: The query generation algorithm
Function AlchemyQuery(A, T):
Input: an Abox A, and a Tbox T
Output: a query Q
  A := Select(MAP(A, T), A);
  r(ind1, ind2) := random_select(Explained_Fiats(A));
  Explained_Fiats(A) := Explained_Fiats(A) \ {r(ind1, ind2)};
  R := CapitalLetter(r);
  I1 := CapitalLetter(ind1);
  I2 := CapitalLetter(ind2);
  Q := R(I1, I2);
  return Q;

Example 10 The following example shows how the equivalent of an Abox A in Markov logic notation is determined. Let us assume the following sets of concept and role names, which are determined according to the Tbox T:

C = {Car, DoorSlam, CarEntry, CarExit}
R = {causes, hasObject, hasEffect}

Consider the following Abox:

1.3 Car(c1)
1.2 DoorSlam(ds1)
causes(c1, ds1)
CarEntry(ind42)
hasObject(ind42, c1)
hasEffect(ind42, ds1)

Table 4.3: Example for an Abox

Unlike in Description Logics [BCM+03], the types must be specified according to the Markov logic formalism [DLK+08]. The types are non-empty sets which contain individuals. In order to determine the constants of a concept type, the following queries are generated:

q1 = {(X) | Car(X)}
q2 = {(X) | DoorSlam(X)}
q3 = {(X) | CarEntry(X)}
q4 = {(X) | CarExit(X)}

The results of the above queries are as follows:

result1 = {c1}
result2 = {ds1}
result3 = {ind42}
result4 = ∅

Similarly, in order to determine the constants of the domain and range type of a role, the following queries are generated:

q5 = {(X, Y) | causes(X, Y)}
q6 = {(X, Y) | hasObject(X, Y)}
q7 = {(X, Y) | hasEffect(X, Y)}

The results of the above queries are as follows:

result5 = {(c1, ds1)}
result6 = {(ind42, c1)}
result7 = {(ind42, ds1)}

In order to determine the individuals which belong to the domain type of each relation, the MapFirst function is applied:

MapFirst(result5) = {c1}
MapFirst(result6) = {ind42}
MapFirst(result7) = {ind42}

Similarly, in order to determine the individuals which belong to the range type of each relation, the MapSecond function is applied:

MapSecond(result5) = {ds1}
MapSecond(result6) = {c1}
MapSecond(result7) = {ds1}

Thus, in total there are 10 types t1, ..., t10, where t1, ..., t4 denote the types of the concepts Car, DoorSlam, CarEntry, and CarExit, t5, t6, t7 denote the domain types of the relations causes, hasObject, and hasEffect, and t8, t9, t10 denote the range types of these relations, respectively. Since result4 is an empty set, a new individual newInd1 is generated and assigned to t4. Since individual names begin with capital letters [KSR+10], the function CapitalLetter is applied to the individual names. The type set T contains the generated types and the corresponding assigned individuals. Unlike the type names, which do not begin with capital letters, the individual names begin with capital letters. In the following, the knowledge base Σ = (T, P, F) generated by the AlchemyMLN function is given, where T, P, F denote the type set, the predicate set, and the formula set, respectively.
The knowledge base Σ is stored in a file with extension .mln, called example.mln in this example:

t1 = {C1}
t2 = {Ds1}
t3 = {Ind42}
t4 = {NewInd1}
t5 = {C1}
t6 = {Ind42}
t7 = {Ind42}
t8 = {Ds1}
t9 = {C1}
t10 = {Ds1}

Car(t1)
DoorSlam(t2)
CarEntry(t3)
CarExit(t4)
Causes(t5, t8)
HasObject(t6, t9)
HasEffect(t7, t10)

5  CarEntry(z) ∧ HasObject(z, x) ∧ HasEffect(z, y) ⇒ Car(x) ∧ DoorSlam(y) ∧ Causes(x, y)
5  CarExit(z) ∧ HasObject(z, x) ∧ HasEffect(z, y) ⇒ Car(x) ∧ DoorSlam(y) ∧ Causes(x, y)

Table 4.4: The example.mln file

Note that the .mln file contains only the weighted rules; the forward and backward chaining rules are not in this file. The AlchemyDB function generates the evidence vector ~e, which builds the database file with the extension .db. This file contains only the strict non-fiat assertions and is called evidence.db in this example:

Car(C1)
DoorSlam(Ds1)
CarEntry(Ind42)
HasObject(Ind42, C1)
HasEffect(Ind42, Ds1)

Table 4.5: The database file evidence.db

The AlchemyQuery function generates the query Q, which is an explained fiat assertion of the Abox. Since there is only one explained fiat assertion in the Abox, the query is Q = Causes(C1, Ds1) and the conditional probability we are interested in is:

P(Causes(C1, Ds1) | ~e)    (4.22)

The command which determines the above conditional probability in the Alchemy system [KSR+10] is:

infer -i example.mln -r example.result -e evidence.db -q Causes(C1,Ds1)

The value of the conditional probability expression given in 4.22 is the scoring value of the Abox A and is written to an output file. In the following, the output file for query 4.22 is given, which is called example.result in this example. Note that the output file has the extension .result.

Causes(C1,Ds1) 0.91

Table 4.6: The output file example.result

The output file example.result indicates that:

P(Causes(C1, Ds1) | ~e) = 0.91

The output file example.result for a query Q has the following form:

Q p

Table 4.7: The general form of the output file

The output file has one row and two columns. The first column indicates the query Q and the second column denotes the corresponding probability p, where p = P(Q | ~e) and ~e denotes the evidence vector. The probability value we are interested in is p.
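The infer call shown above can also be issued from within the interpretation system. The following Python sketch wraps the call with the subprocess module and reads the scoring value back from the .result file; the file names and the query are those of the example, the infer binary is assumed to be on the search path, and the -maxSteps flag (setting the sampling size discussed in Chapter 5) is an assumption of this sketch.

# Sketch of calling Alchemy's infer binary and reading the probability back.
# The -maxSteps flag name and the presence of infer on the PATH are assumptions.
import subprocess

def alchemy_infer(mln_file, db_file, query, result_file, max_steps=1000):
    cmd = ["infer", "-i", mln_file, "-r", result_file,
           "-e", db_file, "-q", query, "-maxSteps", str(max_steps)]
    subprocess.run(cmd, check=True)
    # The .result file has one row per query: "<ground atom> <probability>".
    with open(result_file) as f:
        _atom, prob = f.readline().split()
    return float(prob)

p = alchemy_infer("example.mln", "evidence.db", "Causes(C1,Ds1)", "example.result")
# p corresponds to P(Causes(C1,Ds1) | e), i.e. 0.91 in the example above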
Chapter 5

Evaluation

In the previous chapter, we presented an approach for Abox abduction using probabilistic branch and depth control. In this chapter, we evaluate the results of our approach. Both the performance and the quality of the results are evaluated. The experiments were run on a Linux CALLEO 552 server (Ubuntu 10.04.3 LTS) with an AMD Eight-Core Opteron 6136 (2.4 GHz) processor and 128 GB of main memory. In these experiments, a set of videos from the environmental domain has been used. The videos were analyzed by state-of-the-art analysis tools [HBB+13]. The analysis results, which are sent incrementally, build an Abox. The analysis results are sent to the interpretation system, which employs the probabilistic interpretation algorithm presented in the previous chapter.

5.1 Optimization Techniques

A Markov logic network [DR07] consists of a set of weighted/strict first-order formulas and a set of constants. In this work, the weighted rules are considered as formulas, and the individuals mentioned in the observations or in the hypothesized assertions are the constants of the network. These constants are associated with the corresponding predicate types. A predicate type is a non-empty set. Thus, for the remaining predicates of the weighted rules with no constants, new constants are generated and assigned. The newly introduced constants represent objects of a certain predicate which might be used later during the inference.

In our experiments, the constructed Markov logic networks were quite large, even though not too many weighted rules were specified. Since the entire network is not required for answering a particular query, we construct only a relevant subnetwork. Thus, we vastly reduce the time and memory space required for inference. The constants required for constructing the subnetwork are those mentioned in the observations and in the hypothesized assertions. According to the predicates which appear in the observations and the hypothesized assertions, the relevant subset of the weighted rules for answering a particular query is determined. The selected weighted rules are the relevant formulas for constructing the subnetwork. The remaining weighted rules are irrelevant and are consequently left out.

The approach explained above is inspired by knowledge-based model construction (KBMC) approaches (e.g., [NH97]), where only a fraction of the entire knowledge base is selected and instantiated. The selected fraction is relevant for answering a particular query. Thus, based on the interpretation Abox A, we determine the knowledge base Σ required for the inference procedure. Consequently, the considered knowledge base is a function of the interpretation Abox, Σ(A). The KBMC approach is also applied for inference in the context of Markov logic networks in [DLK+08]. According to [DLK+08], the results are the same whether we consider Σ or Σ(A) as the knowledge base.

For optimization purposes, Alchemy [KSR+10] uses sampling algorithms for inference, e.g., MC-SAT [PD06] and Gibbs sampling [GG84]. Unfortunately, the sampling algorithms implemented in Alchemy do not provide the required optimizations if we consider a knowledge base with many axioms and very many assertions. Thus, in this work we had to provide the Alchemy engine with a KBMC-reduced set of formulas for optimization purposes.

Example 11 In this example, we show how the KBMC approach is applied in the context of this work. During the experiments, the query and the database file are constant, but the number of weighted rules to be considered changes. Let us assume the following weighted rules:

5    ∀z, x, y  CarEntry(z) ∧ HasObject(z, x) ∧ HasEffect(z, y) ⇒ Car(x) ∧ DoorSlam(y) ∧ Causes(x, y)
0.5  ∀z, x, y  CarExit(z) ∧ HasObject(z, x) ∧ HasEffect(z, y) ⇒ Car(x) ∧ DoorSlam(y) ∧ Causes(x, y)
3    ∀z, x, y  EnvWorkshop(z) ∧ HasSubEvent(z, x) ∧ HasLocation(z, y) ⇒ CarEntry(x) ∧ Building(y) ∧ OccursAt(x, y)
0.5  ∀z, x, y  EnvProt(z) ∧ HasEvent(z, x) ∧ HasTheme(z, y) ⇒ EnvConference(x) ∧ Env(y) ∧ HasTopic(x, y)
5    ∀z, x, y  RenewableEnergy(z) ∧ HasPart(z, x) ∧ HasPart(z, y) ⇒ Energy(x) ∧ Winds(y) ∧ EnergyToWinds(x, y)
...
Furthermore, let us assume that the database file of the Abox A contains the following assertions:

~e(A) = {Car(C1), DoorSlam(Ds1), CarEntry(Ind42), HasObject(Ind42, C1), HasEffect(Ind42, Ds1), Building(Ind43), EnvWorkshop(Ind45), HasSubEvent(Ind45, Ind42), HasLocation(Ind45, Ind43)}

There are two explained fiat assertions in A, namely:

Causes(C1, Ds1), OccursAt(Ind42, Ind43)

Then, the queries are:

Q1(A) = ⟨Causes(C1, Ds1) = true⟩
Q2(A) = ⟨OccursAt(Ind42, Ind43) = true⟩

In the following, we determine the values of the above queries:

P_{MLN(A,WR)}(Q1(A) | ~e(A)) = 0.83
P_{MLN(A,WR)}(Q2(A) | ~e(A)) = 0.76

The scoring value of A is determined according to the following formula:

P(A, WR) = 1/2 ( P_{MLN(A,WR)}(Q1(A) | ~e(A)) + P_{MLN(A,WR)}(Q2(A) | ~e(A)) ) = 0.80

For answering the query, only the first three weighted rules are relevant since the predicates in the database and in the query are mentioned only in the first three weighted rules. Thus, the remaining rules are irrelevant and can be left out. In this way, we reduce the knowledge base and the problem complexity. In the following table, the results of removing more and more irrelevant rules are given:

Number of irrelevant weighted rules    Inference value
0                                      0.80
1                                      0.80
2                                      0.80
3                                      0.80
...                                    ...

Table 5.1: Results of the KBMC approach in this example

The above table shows that the inference value does not change when the number of irrelevant weighted rules increases. Thus, we obtain the same results with Σ(A) and Σ. According to Algorithm 5, the performance of the Alchemy engine [KSR+10] decreases with a large set of weighted rules. In the following, we therefore reduce the size of the knowledge base (the weighted rules). The required subset of the weighted rules is determined according to the interpretation Abox A. A weighted rule W is considered as relevant if a predicate mentioned in A is a conjunct of W. According to Algorithm 5, the type of each predicate mentioned in the weighted rules is defined. Note that the number of relevant weighted rules increases during the inference procedure since newly generated hypothesized assertions are added to the interpretation Abox. Thus, the new predicates mentioned in the Abox lead to considering more relevant weighted rules. Consequently, the considered subnetwork grows over time. Accordingly, we define types for the concepts and relations mentioned in the newly considered weighted rules.
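The relevance criterion just described, namely that a weighted rule is kept only if one of its predicates occurs in the interpretation Abox, can be stated very compactly. The following Python sketch illustrates it; predicate_of and predicates_of are hypothetical accessors corresponding to predicateSym and predicateSymbols introduced below.

# Sketch of the KBMC-style rule selection: keep only those weighted rules that
# mention a predicate occurring in the interpretation Abox. predicate_of and
# predicates_of are hypothetical accessors (cf. predicateSym, predicateSymbols).

def relevant_rules(abox, weighted_rules, predicate_of, predicates_of):
    abox_predicates = {predicate_of(assertion) for assertion in abox}
    return [rule for rule in weighted_rules
            if abox_predicates & predicates_of(rule)]

Because newly hypothesized assertions enter the Abox during interpretation, this selection has to be recomputed whenever the Abox grows.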
The next algorithm is an improvement of Algorithm 5, in which we determine the relevant weighted rules and, accordingly, the required predicates. Thus, the knowledge base and consequently the runtime are reduced.

Algorithm 8: The knowledge base conversion algorithm
Function AlchemyMLN(A, WR):
Input: an Abox A, and the set of weighted rules WR
Output: a knowledge base Σ = (T, P, F) with the type set T, predicate set P, and the formula set F
  Σ := (∅, ∅, ∅);
  foreach α ∈ A do
    foreach wr ∈ WR do
      if predicateSym(α) ∈ predicateSymbols(wr) then
        F := F ∪ {wr};
  foreach C ∈ Concepts(F) do
    t := GenerateNewTypeName();
    q := {(x) | C(x)};
    result := evaluate(q);
    if result = ∅ then
      NewInd := GenerateNewIndividual();
      T := T ∪ {t + " = " + "{" + NewInd + "}"};
    else
      T := T ∪ {t + " = " + "{" + CapitalLetter(result) + "}"};
    P := P ∪ {C(t)};
  foreach r ∈ Roles(F) do
    t1 := GenerateNewTypeName();
    t2 := GenerateNewTypeName();
    q := {(x, y) | r(x, y)};
    result := evaluate(q);
    if result = ∅ then
      NewInd1 := GenerateNewIndividual();
      T := T ∪ {t1 + " = " + "{" + NewInd1 + "}"};
      NewInd2 := GenerateNewIndividual();
      T := T ∪ {t2 + " = " + "{" + NewInd2 + "}"};
    else
      T := T ∪ {t1 + " = " + "{" + CapitalLetter(MapFirst(result)) + "}"};
      T := T ∪ {t2 + " = " + "{" + CapitalLetter(MapSecond(result)) + "}"};
    R := CapitalLetter(r);
    P := P ∪ {R(t1, t2)};
  return Σ;

Here, predicateSym(α) returns the predicate p mentioned in the assertion α:

predicateSym(α) = p    (5.1)

Furthermore, predicateSymbols(wr) returns the set of predicates mentioned in the weighted rule wr:

predicateSymbols(wr) = {p | p is a predicate mentioned in wr}    (5.2)

Moreover, Concepts(F) and Roles(F) return a set containing the concepts and roles mentioned in the formula set, respectively. The above algorithm reduces the runtime while preserving correct probability values.

5.2 Case Study: CASAM

The goal of the CASAM project [HBB+13] is the annotation of multimedia documents through the collaboration of human annotators and machine intelligence. Thus, it speeds up the task of manually annotating multimedia documents. Additionally, it reduces the effort and increases the accuracy of attaching annotations to multimedia documents. The CASAM project is a European research project under the Seventh Framework Programme (FP7) for research and technological development. CASAM stands for Computer Aided Semantic Annotation of Multimedia (FP7-217061). The main components of the CASAM project are KDMA, RMI and HCI. In the following, these components and their functionalities are described:

• The analysis component of CASAM, called "Knowledge-Driven Multimedia Analysis" (KDMA), processes the multimedia documents by exploiting media analysis tools and detects objects within the multimedia document. The output of this component is an Abox which contains surface-level information. The generated Aboxes are sent to the RMI component described below.

• The reasoning component of CASAM, called "Reasoning for Multimedia Interpretation" (RMI), is the core of the CASAM project. This component receives assertions produced by KDMA and infers higher-level interpretations of the multimedia content. The generated interpretations are sent to the next component, called HCI. Additionally, in order to disambiguate the interpretations, queries are generated [GMN+10b, HBB+13] and sent to the KDMA and HCI components in order to indicate what would be relevant information to discriminate between similarly ranked interpretation possibilities. In this work, we do not discuss query generation and feedback issues.

• The end-user component of CASAM, called "Human Computer Interaction" (HCI), displays the multimedia document to the user.
Furthermore, the surface-level assertions generated by KDMA and the interpretations generated by RMI are presented to the user through a graphical user interface which was developed during the CASAM project [HBB+13]. The user decides whether the annotation process can be terminated. This happens when the user recognizes that the multimedia document is sufficiently annotated.

The objective of the CASAM project is to develop a system for the semantic annotation of multimedia documents. For the CASAM project, the environmental domain has been chosen as the application scenario [GMN+09b]. It is a large domain covering many aspects such as environmental pollution, conferences, catastrophes, hazards and conservation attempts. In order to represent general knowledge of this domain, a domain ontology has been developed. The set of concept and role names is called the signature of the ontology. Two different ontologies have been defined during the CASAM project [HBB+13], namely the Environmental Domain Ontology (EDO) [GMN+09b] and the Multimedia Content Ontology (MCO) [GMN+11]. In EDO, we define the concepts and relations in terms of the environmental domain. The MCO defines the concepts and relations required for describing the structure of a multimedia document. The concepts and relations defined in the MCO are modality specific since each multimedia document might contain multiple modalities, including audio, video, and text. According to MCO, it is possible to indicate from which modality the assertions have been generated.

5.3 Hypotheses and Results

Before defining the hypotheses, we briefly explain the video analysis approach. A video document is divided into multiple video segments [HBB+13, GMN+11] which do not necessarily have the same length. Video segments (shots) are determined automatically according to certain patterns of scene changes in the video. Each video segment is analyzed by a set of video analysis tools. The analysis component delivers descriptions in a piecewise manner [HBB+13]. Each "piece" is a set of Abox assertions, which we call a "bunch" here. A bunch of assertions contains the analysis results of one video segment. Since the video segments do not necessarily have the same length, the bunches of assertions do not arrive at equal time distances or at well-defined time points. Another consequence is that the number of fiat assertions in the bunches is not necessarily the same. Let us consider two bunches of assertions where a bunch with many fiats is followed by a bunch with only a few fiats. Accordingly, there is a delay in handling the second bunch; processing it in time might be very beneficial, however. Thus, the processing strategy must be adaptive.

The CASAM dataset is just an example of many similar datasets with the characteristic that one bunch of assertions has not yet been processed completely while the next one already arrives. Thus, there is a tradeoff between spending computational power on completely interpreting a bunch and focusing on current observations while skipping possible interpretations for older input. Multiple strategies are investigated in experiments, and the results of the experiments are described in the following.

Let us assume that ti is the arrival time of bunch Bi. Similarly, the following bunch is Bi+1 with arrival time ti+1. We assume furthermore that m indicates the maximum number of fiat assertions to be explained in a bunch. Let us assume that bunch Bi has k fiat assertions.
We consider t as the time required for explaining the k fiats of Bi. In the following, we consider two different cases, assuming that the fiat assertions of Bi−1 have been explained in time and there is no time overrun:

• ti + t ≤ ti+1 and k ≤ m: All fiat assertions of Bi are explained before ti+1 and there is no time overrun.

• ti + t > ti+1 and k > m: Only m fiat assertions of Bi are explained before ti+1. The system behaviour at time point ti+1 is to stop explaining the remaining k − m unexplained fiat assertions of Bi and to start explaining the fiat assertions of Bi+1.

The selection of m affects the interpretation results. If m is too large, there might be a delay in explaining the fiat assertions of the following bunches. Thus, the interpretation results of the following bunches might be generated later than required. Since these interpretation results might be relevant, the system might react later than it should. In this case, we ignore some bunches. If m is a small number, we might not explain some fiat assertions of a bunch. Thus, we might lose the corresponding interpretation results. There is a tradeoff between the maximum number of fiat assertions to be explained in a bunch and the number of bunches to be processed. A future work topic is how the variable m should be selected automatically. Stop processing can occur at any point of time since it is activated externally by the user.

In the following, the hypotheses and the evaluation results are given, where the data set consists of the surface-level analysis results of a video with the topic Wind Energy Approval.

1. By increasing the sampling size, the processing time increases linearly.

The sampling size (maxSteps) is one of the Alchemy parameters used for the determination of the probability [KSR+10]. In order to obtain more accurate results, we increase the sampling size. In the next experiment, we determine the time required for the interpretation of a data set with 9 bunches of assertions. Figure 5.1 shows the required processing time as a function of the sampling size according to the MC-SAT algorithm [PD06]. The time is the accumulated time in seconds to process all bunches of assertions. Figure 5.1 shows that by increasing the sampling size, the processing time increases linearly.

Figure 5.1: The number of samples (x) and the time (y) spent in seconds according to the MC-SAT algorithm for the interpretation of a data set with 9 bunches of assertions.

2. By increasing the sampling size, the final score converges to a limit.

In order to obtain more precise scores, we increase the sampling size (maxSteps). Figures 5.2 and 5.3 depict the scores of the final interpretation Abox as a function of the sampling size according to the MC-SAT [PD06] and Gibbs sampling [GG84] algorithms, respectively. Figures 5.2 and 5.3 show that by increasing the sampling size, the final score converges to a limit. Furthermore, the figures show that the MC-SAT algorithm reaches convergence faster than Gibbs sampling, i.e., Gibbs sampling requires more samples than the MC-SAT algorithm to reach convergence.

Figure 5.2: The number of samples (x) and the final score (y) determined according to the MC-SAT algorithm for the interpretation of a data set with 9 bunches of assertions.
Figure 5.3: The number of samples (x) and the final score (y) determined according to Gibbs sampling algorithm for the interpretation of a data set with 9 bunches of assertions. 3. By increasing the number of fiat assertions to be explained in a bunch, the number of ignored bunches increases. Additionally, the agenda size does not have any influence on the number of ignored bunches. In order to control abduction depth procedure, we defined in this work variable m which indicates the maximum number of fiat assertions to be explained in a bunch of assertions. In the next experiment, we determine whether m and the 88 Evaluation agenda size has influence on the number of ignored bunches. For this purpose, we iterate m and the agenda size in the next experiment. Figure 5.4 depicts the number of ignored bunches as a function of m and the agenda size. The results show that the agenda size does not have any influence on the number of ignored bunches. The reason is that there are not so many alternatives. Unlike the agenda size, m has an influence on the number of ignored bunches. According to the next figure, bunches are ignored when m is greater than 20. Thus, there is a tradeoff between the maximum number of fiat assertions to be explained in a bunch and the number of ignored bunches. By selecting a large value for m, we have many ignored bunches. Thus, we lose the corresponding interpretation results. By selecting a small value for m, all fiat assertions of a bunch might not be explained, completely. Thus, we might not have the complete interpretation results of each bunch. Consequently, selecting a small/large value for m affects the interpretation results. Note that the reason for ignoring bunches is that stop processing is activated, externally. In the next figure, the color represents the height of the surface. Figure 5.4: The number of fiat assertions to be explained in a bunch (x), the agenda size (y), and the number of ignored bunches (z). 4. By successively explaining fiat assertions of a bunch, the scores increase, monotonically. The main requirement of this work is that by explaining observations successively, the ranks of the interpretation alternatives increase, monotonically. In 89 5.3 Hypotheses and Results this case, the agent’s belief in its observations increases [GMN+ 10a, GMN+ 10c]. Additionally, we reduce the uncertainty of observations. In this work, we control abduction depth through the maximum number of fiat assertions m to be explained in a bunch of assertions. Note that each fiat assertion represents an observation. In the next experiment, we test whether variable m has an influence on the scoring value of an interpretation alternative. Note that the input data arrives incrementally to the interpretation engine [HBB+ 13, GMN+ 10c]. In the following, we iterate the number of fiat assertions m to be explained in a bunch. In Figure 5.5, the scoring value of an interpretation alternative is a function of m and the required time for processing fiat assertions. The x-axis, y-axis, and z-axis indicate the required time for explaining fiat assertions without any delays and any idle times, m, and the scoring values, respectively. Additionally, vertical lines indicate the beginning of a new bunch. Since x-axis does not contain any delays and any idle times, Figure 5.5 depicts the compact representation of results. Figure 5.5 shows that by successively explaining fiat assertions of a bunch, the preference scores increase, monotonically. 
Thus, the main requirement of this work has been met. 90 Evaluation Figure 5.5: The required time for explaining fiat assertions without any delays and any idle times (x), the number of fiat assertions to be explained in a bunch (y), and the scoring value (z). We consider two strategies when the current bunch is being processed and a new bunch arrives: • stop-processing • non-stop-processing The first strategy that we consider is to stop explaining fiat assertions of the current bunch when the following bunch arrives. In this case, we switch to the following bunch and stop processing the current bunch. Thus, the remaining unexplained fiat assertions of the current bunch are deleted. Figure 5.6 depicts the results when the maximum number of fiat assertions to be explained in a bunch is m = 28. This means if a bunch contains less or equal 28 fiats, and the following bunch has not still arrived, the fiats of the bunch are explained completely and there might be an idle time for the arrival of the following bunch. Figure 5.6 shows that by explaining fiat assertions of a bunch, the scores increase, monotonically. Furthermore, we can see that the bunches do not arrive at equal 91 5.3 Hypotheses and Results time distances. A longer bunch contains more fiat assertions than a short bunch. The red lines indicate the idle times. Figure 5.6: Strategy is stopProcessing=true, the time axis indicated with the arrival time of bunches (x), the scoring value of the interpretation Abox by explaining fiat assertions (y) As a new bunch of assertions arrives, the final scores either decrease or increase: • The final score decreases if there are more unexplained fiat assertions (with probability p = 0.5) in the following bunch than explained fiat assertions (with probability p > 0.5) in the previous bunch. Thus, the average is smaller than the final score in the previous bunch. • The final score increases if in the previous bunch the number of unexplained fiat assertions (with probability p = 0.5) is much more than the number of explained fiat assertions (with probability p > 0.5). In Figure 5.6, by the arrival of a new bunch, the scores decrease in most of the cases. Only by the arrival of B6 the final score increases. The advantage of stop-processing approach is that bunches of assertions are processed on time. Thus, there is no delay for processing the following bunches. Additionally, there are not any ignored bunches. The disadvantage of this approach is that all fiat assertions of a bunch might not be explained, completely. Thus, we might not have the complete interpretation results of each bunch. The next strategy is to continue processing the current bunch although the following bunch has arrived. Figure 5.7 shows the results. In this figure, before 92 Evaluation switching to the following bunch, all fiat assertions of a bunch are explained completely. The red lines indicate the idle times and the blue lines indicate the delays. There are three ignored bunches in the next figure. The advantage of non-stop-processing approach is that all fiat assertions of each bunch are explained, completely. Thus, we have the complete interpretation results of each bunch. The disadvantage of this approach is that there are delays for processing the following bunches. Additionally, there are ignored bunches. Consequently, we lose the corresponding interpretation results. Thus, there is a tradeoff between the maximum number of fiat assertions to be explained in a bunch of assertions and the number of ignored bunches. 
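The behaviour contrasted in Figures 5.6 and 5.7 can also be illustrated with a small simulation. The following sketch is not part of the CASAM implementation; it assumes a constant per-fiat explanation time, and the rule used here for ignoring bunches (a bunch is dropped if its successor has already arrived before the engine can start it) is an assumption made purely for illustration. All names (Bunch, simulate, fiat_time) are hypothetical.

from dataclasses import dataclass

@dataclass
class Bunch:
    arrival: float   # arrival time t_i of the bunch
    fiats: int       # number of fiat assertions k in the bunch

def simulate(bunches, m, fiat_time, stop_processing):
    """Simulate one run; returns (explained, ignored, idle, delay).

    m               -- maximum number of fiat assertions explained per bunch
    fiat_time       -- assumed constant time for explaining one fiat assertion
    stop_processing -- True: abandon the current bunch when the next one arrives
    """
    clock = 0.0
    explained = ignored = 0
    idle = delay = 0.0
    for i, bunch in enumerate(bunches):
        nxt = bunches[i + 1].arrival if i + 1 < len(bunches) else float("inf")
        if clock < bunch.arrival:          # engine is idle until the bunch arrives
            idle += bunch.arrival - clock
            clock = bunch.arrival
        if clock >= nxt:                   # assumed skipping rule: the successor has
            ignored += 1                   # already arrived, so this bunch is ignored
            continue
        delay += clock - bunch.arrival     # a late start counts as delay
        for _ in range(min(bunch.fiats, m)):
            if stop_processing and clock + fiat_time > nxt:
                break                      # deadline reached: switch to the next bunch
            clock += fiat_time
            explained += 1
    return explained, ignored, idle, delay

# Invented arrival times and bunch sizes, with m = 28 as in Figure 5.6.
bunches = [Bunch(0, 10), Bunch(5, 3), Bunch(9, 25), Bunch(20, 4), Bunch(21, 6), Bunch(23, 2)]
for strategy in (True, False):
    print("stopProcessing =", strategy,
          simulate(bunches, m=28, fiat_time=0.7, stop_processing=strategy))

Under the stop-processing strategy the run accumulates idle time but no ignored bunches, whereas under the non-stop-processing strategy it accumulates delays and may skip bunches, mirroring the tradeoff discussed above.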
Figure 5.7: Strategy is stopProcessing=false; the time axis indicated with the arrival times of bunches (x), the scoring value of the interpretation Abox obtained by explaining fiat assertions (y)

5.4 Quality of the Interpretation Results

In the previous chapter, we presented an algorithm for the interpretation of multimedia documents. In this section, we evaluate the quality of the interpretation results generated by the interpretation engine. Important measures for the evaluation of the results are recall and precision. In accordance with Khoshafian and Baker's notation [KB96, page 358], we define recall and precision in the context of multimedia interpretation:

Recall = (Number of Relevant Objects Returned) / (Total Number of Relevant Objects in the Collection)    (5.3)

Precision = (Number of Relevant Objects Returned) / (Total Number of Objects Returned)    (5.4)

For the determination of the recall values, the total number of relevant objects in the video documents would have to be determined. For this purpose, the video documents would have to be annotated completely. Manual video annotation is an extremely time-consuming and cost-intensive task. Thus, in the following we determine only the precision values. Note that manual effort is also required for precision determination, but it is less than the effort required for recall determination. The reason is that for the determination of recall, the video document has to be annotated completely, whereas for precision determination we only check whether the generated high-level annotations occur in the video document and count them.

We consider 6 videos for which we have the surface-level analysis results. The videos have different topics. The surface-level analysis results of the videos are sent to the interpretation engine, so for each video an interpretation Abox is generated. We determine the total number of deep-level concept assertions generated by the interpretation engine. In order to determine the precision value for each video, a human expert manually checks whether each deep-level concept assertion of the interpretation Abox actually occurs in the video and counts the deep-level concept assertions that do. Table 5.2 shows the precision values for the 6 videos. The columns give the video topic, the total number of objects returned for each video, the number of relevant objects returned, and the precision value, respectively. The total number of objects returned counts only the deep-level objects generated during Abox interpretation; the surface-level analysis results are not considered. A human expert checks whether the deep-level objects occur in the video document; their number is given in the third column. The fourth column gives the precision values, determined by the formula |Relevant Objects| / |Total Objects|.

Video topic              |Total objects|   |Relevant objects|   Precision
Wind energy approval           67                 64              0.96
Wind power station             45                 43              0.96
Electricity generation         64                 59              0.92
Economic espionage             56                 47              0.83
Wind power park                62                 54              0.87
Wood gas                       63                 60              0.95

Table 5.2: Precision values for videos with different topics

The reason for the precision determination is to check whether the controlling Abox abduction approaches applied in this work generate correct results.
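The precision computation behind Table 5.2 is elementary. As a minimal sketch (function names are hypothetical), the following reproduces the precision column from the counts in the second and third columns, up to rounding, and indicates where recall would additionally require the total number of relevant objects of a completely annotated document.

def precision(relevant_returned, total_returned):
    """Precision as defined in Equation (5.4)."""
    return relevant_returned / total_returned

def recall(relevant_returned, relevant_in_collection):
    """Recall as defined in Equation (5.3); needs a completely annotated document."""
    return relevant_returned / relevant_in_collection

# Counts taken from Table 5.2: (topic, total deep-level objects, relevant objects).
videos = [
    ("Wind energy approval", 67, 64),
    ("Wind power station", 45, 43),
    ("Electricity generation", 64, 59),
    ("Economic espionage", 56, 47),
    ("Wind power park", 62, 54),
    ("Wood gas", 63, 60),
]
for topic, total, relevant in videos:
    print(f"{topic:24s} precision = {precision(relevant, total):.2f}")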
It might be the case that the generated deep-level objects do not occur in the video documents at all. But the precision values given in Table 5.2 show that we produced acceptable results. Thus, the approaches applied in this work, namely controlling branching, controlling abduction depth, and controlling reactivity for the interpretation process, produce high-quality interpretation results.

Chapter 6  Conclusions

Fully manual annotation of multimedia documents is a resource-consuming task. An objective of this work is to generate annotations automatically. Thus, we speed up the annotation task and decrease the required resources. The generated annotations are attached to the multimedia documents and are used for the semantics-based retrieval process. The objectives of this work are the following:

1. To define an approach to deal with uncertain observations, which are the input data to the interpretation engine.
2. To define a media interpretation agent that generates high-level explanations for the low-level observations.
3. To increase the expressivity of the knowledge representation language described in [Kay11] by considering recursive Horn rules.
4. To define a probabilistic scoring function according to the Markov logic formalism [DR07] for ranking interpretation alternatives.
5. To control the abduction procedure in terms of branching and depth by applying the Markov logic formalism.
6. To define an approach to incrementally process the input data to the probabilistic interpretation engine.
7. To increase the ranks of interpretation alternatives monotonically by successively explaining observations. This leads to reducing the uncertainty of observations and increasing the agent's belief in its observations. This point is the main requirement of this work.

In this work, we achieved the above-mentioned objectives. In the following, we summarize this work, conclude the results, and present some topics for future work.

In this work, the surface-level analysis results generated by state-of-the-art analysis tools are uncertain. Thus, the interpretation engine deals with uncertain input data. To handle the uncertainty in the input data, we applied Maximum A Posteriori inference [DLK+08]. This approach determines the most probable world given the evidence. Additionally, we defined a media interpretation agent to generate high-level explanations for the surface-level analysis results, which are the observations of the agent. Moreover, we selected Markov logic [DR07] as the probabilistic formalism.

An objective of this work is to increase the expressivity of the knowledge representation language described in [Kay11] by supporting recursive Horn rules. Furthermore, we applied the Markov logic formalism to define an additional knowledge base which is composed of a set of weighted rules. Thus, we increased the expressivity of the knowledge representation language. Additionally, we defined a probabilistic scoring function according to the Markov logic formalism. According to this knowledge base, we ranked interpretation alternatives probabilistically. Note that we applied the Markov logic formalism only for ranking interpretation alternatives but not for generating interpretations. The reason is that the Markov logic abduction approach defined in [KM09] requires fixed names for individuals. Since new individuals are generated during Abox abduction, we cannot apply the approach defined in [KM09].
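For reference, the probabilistic ranking rests on the probability that a Markov logic network assigns to a possible world. In the standard Markov logic notation (the textbook formulation, recalled here for convenience rather than as the exact scoring definition of this work), the probability of a world x is

P(X = x) = \frac{1}{Z}\,\exp\Big(\sum_i w_i\, n_i(x)\Big),
\qquad
Z = \sum_{x'} \exp\Big(\sum_i w_i\, n_i(x')\Big)

where w_i is the weight of formula F_i and n_i(x) is the number of true groundings of F_i in world x; the scores reported in the experiments are derived from probabilities of this form.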
The scoring function defined in this work is more general than the one defined in [EKM09]. The reason is that our scoring function deals with uncertain observations and determines the rank of an interpretation alternative whereas the one defined in [EKM09] deals with strict observations and determines the score of an explanation. Furthermore, our scoring function considers the complete set of assertions (including observations) to determine the rank of an interpretation alternative whereas the one defined in [EKM09] considers for ranking process only the assertions which exist in the explanation. Moreover, our scoring function is probabilistic whereas the one in [EKM09] is non-probabilistic. The other important point is that the scoring function defined in [EKM09] considers a single abduction step. Thus, it is possible that the interpretation engine generates too many interpretation alternatives which have equal ranks. The main goal of this work is to control Abox abduction procedure. The reason for controlling Abox abduction is that the agent’s resources for computing explanations are limited. In this work, we showed that by applying Markov logic formalism [DR07], we can control Abox abduction procedure. For the abduction procedure, the agent selects observations one after the other for explanation. Generally, there are two different runtime problems in the Abox abduction, namely branching problem and depth problem. To solve branching problem during Abox abduction, we defined controlling branching procedure. For this purpose, we applied Beam search to explore the interpretation space. At each step, only the Abox with the highest score is chosen to explain further observations. Consequently, we have branchings only for this Abox. According to this approach, we avoid branchings for Aboxes with lower scores. 98 Conclusions Moreover, we introduced a procedure for controlling abduction depth. According to this procedure, we defined the maximum number of fiat assertions m to be explained in a bunch of assertions. By selecting a small value for m, we might lose some interpretation results. The reason is that all fiat assertions of a bunch are not explained, completely. By selecting a large value for m, there might be a delay to explain fiat assertions of the following bunches. Thus, the interpretation results of the following bunches might be generated later than required. Furthermore, by selecting a large value for m, we cannot control abduction depth anymore. Thus, the selection of variable m affects the interpretation results. Controlling branching and controlling abduction depth reduce the space of possible interpretations of media content. Thus, we reduce time and memory space required for computation of the interpretation procedure. In this work, each multimedia document is divided into several segments and each segment is analyzed, separately. Thus, the surface-level analysis results arrive incrementally to the interpretation engine [HBB+ 13, GMN+ 10c]. In other words, the interpretation engine deals with bunches of assertions where the processing of a bunch of assertion has not yet been finished completely while the following bunch arrives. In order to process the data incrementally, we defined controlling reactivity procedure. For this purpose, we introduced two different approaches, namely stop-processing and non-stopprocessing. 
According to stop-processing, the interpretation engine stops processing the current bunch of assertions, and starts processing the newly arrived observations of the following bunch. According to non-stop-processing approach, the interpretation engine continues processing the current observations although a new bunch of assertions has already arrived. Each approach has advantages and disadvantages. The disadvantage of stop-processing approach is that we might lose some interpretation results. The reason is that we might not explain observations of a bunch, completely. The advantage of this approach is that by quickly switching to the new observations of following bunches, bunches of assertions are processed on time. Thus, we might gain more important data. Moreover, there are no ignored bunches. The disadvantage of non-stop-processing approach is that the processing of some bunches might be delayed. This happens when a bunch with many fiat assertions is followed by a bunch with few fiat assertions. Thus, with delay we gain the interpretation results of the second bunch. Furthermore, there might be some ignored bunches. The advantage of non-stop-processing approach is that all observations of a bunch are explained, completely. Thus, we do not lose any interpretation results. The defined approaches for controlling Abox abduction namely, controlling branching, controlling abduction depth, and controlling reactivity can be applied for every other domain. Thus, these approaches are not domain specific. Each interpretation alternative is composed of large sets of assertions. Thus, maintaining all interpretation alternatives is a resource-consuming task. In order to reduce the memory space required for the interpretation process, we did not keep all inter99 pretation alternatives. In this thesis, we used an agenda which maintained possible interpretation alternatives. On the agenda, the interpretation alternatives are sorted by scoring values. Each agenda has a certain size. An agenda with agenda size k maintains the top k interpretation alternatives. Thus, we reduce the memory space required for the interpretation process, enormously. We compared our preference score with the one defined in [Poo91]. The preference scores are defined according to two different approaches. The preference score in [Poo91] is defined according to the Maximum Likelihood approach [RN03] whereas our preference score is determined according to the Maximum A Posteriori approach [DLK+ 08]. Furthermore, the instances of hypotheses in [Poo91] are logically independent, which does not hold in our work. Our preference score is more general than the one defined in [Poo91]. The reason is that our preference score determines the rank of an interpretation alternative whereas the one defined in [Poo91] determines the rank of an explanation. Furthermore, our approach deals with uncertain observations whereas the approach defined in [Poo91] deals with strict observations. In this thesis, we determined the probabilities according to the Markov logic formalism [DR07]. In order to increase the accuracy of a probability value, we increased the sampling size. The results showed that by increasing the sampling size, the processing time increases, linearly. We considered for this measurement the accumulated time to process all bunches of assertions according to MC-SAT algorithm [PD06]. Furthermore, by increasing the sampling size, the final score of an interpretation alternative converges to a limit. 
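Such a convergence study can be scripted against the Alchemy command line described in Appendix A. The sketch below is hypothetical glue code, not part of the interpretation engine: it only uses options documented there (-i, -e, -r, -q, -maxSteps and the algorithm flags such as -ms for MC-SAT or -p for Gibbs sampling), it assumes that the .result file lists one ground atom and its probability per line, and the score computed from the probabilities is merely a placeholder for the scoring function of this work.

import subprocess

def run_inference(max_steps, algorithm_flag="-ms"):
    """Run Alchemy's infer once and return {ground_atom: probability}."""
    subprocess.run(
        ["infer", algorithm_flag,
         "-i", "uniform.mln", "-e", "evidence.db", "-r", "uniform.result",
         "-q", "Car,DoorSlam",
         "-maxSteps", str(max_steps)],
        check=True,
    )
    probabilities = {}
    with open("uniform.result") as result_file:
        for line in result_file:
            if not line.strip():
                continue
            atom, value = line.split()       # assumed "atom probability" line format
            probabilities[atom] = float(value)
    return probabilities

# Increase the sampling size until the score change falls below a threshold.
previous_score = None
for max_steps in (1000, 2000, 5000, 10000, 20000):
    probs = run_inference(max_steps)
    score = sum(probs.values())              # placeholder for the thesis's scoring function
    print(max_steps, score)
    if previous_score is not None and abs(score - previous_score) < 1e-3:
        print("converged at", max_steps, "samples")
        break
    previous_score = score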
We applied two different algorithms, namely MC-SAT [PD06] and Gibbs sampling [GG84]. The results showed that the MC-SAT algorithm converges to this limit faster than Gibbs sampling.

In order to control the abduction depth, we defined the maximum number of fiat assertions m to be explained in a bunch. The results showed that by increasing the number of fiat assertions to be explained in a bunch, the number of ignored bunches increases. Thus, there is a tradeoff between the maximum number of fiat assertions to be explained in a bunch and the number of bunches to be processed. Consequently, the value of m affects the interpretation results. Furthermore, the results showed that the agenda size does not have any influence on the number of ignored bunches.

During the interpretation procedure, the observations of an Abox are successively explained. The results showed that by successively explaining observations, the score of the interpretation Abox increases monotonically [GMN+10a, GMN+10c]. Thus, the agent's belief in its observations increases. Consequently, the interpretation procedure reduces the uncertainty of observations. Thus, the main requirement of this work has been met.

We developed the semantic interpretation engine, a software system to automatically generate annotations for multimedia documents. Furthermore, we evaluated the probabilistic interpretation engine in a practical scenario. Finally, the quality of the interpretation results generated by the interpretation engine was evaluated in terms of precision [KB96]. The evaluation results showed that we produced high-quality interpretation results although we applied approaches for controlling Abox abduction. Thus, the applied approaches for controlling Abox abduction did not lead to low-quality interpretation results. Consequently, the quality of the interpretation results is not affected by controlling Abox abduction.

In the following, we present some topics for future work:

• Before analyzing a video document, the video document is divided into several segments. Then, the surface-level analysis results are associated with the segments. In this work, we did not consider for the interpretation procedure from which video segment the assertions were generated. As future work, we could generate interpretations for each segment separately and then consider the interpretations of all segments in a fusion procedure. For this purpose, we could apply a set of forward chaining rules which take the chronological appearance of the video segments into account. Accordingly, we could generate deep-level assertions.

• In the strategy of Algorithm 3, we specified in which order the fiat assertions of an Abox should be explained so that the scoring values assigned to the interpretation Aboxes are plausible. According to Algorithm 3, we determined the supporting weighted rules of each fiat assertion and then the weighted rule with the maximum weight. Thus, the next fiat assertion to be explained is the fiat assertion supported by the weighted rule with the highest weight. If we do not consider the weights of the supporting weighted rules, a small scoring value might be assigned to an interpretation Abox which could otherwise have a high scoring value. A topic for future work could be to study other selection orders of the fiat assertions for the explanation procedure and to examine the results.

• In this work, we used an additional knowledge base which contained weighted rules. We assigned unequal positive weights to the weighted rules.
However, the weights of the weighted rules can be learned [LD07, DLK+ 08]. Learning the weights of the weighted rules could be another topic for future work. • For controlling abduction depth, we defined maximum number of fiat assertions m to be explained in a bunch of assertions. In this work, we discussed the advantages and disadvantages of selecting a small/large value for m. We also discussed that the selection of variable m affects the interpretation results. A future work topic could be to select variable m, automatically. 101 102 Appendix A Alchemy Knowledge Representation System and Language Alchemy [KSR+ 10] is an open-source software package developed at the University of Washington. Alchemy supports Markov logics [DR07] as the underlying formalism. The system provides algorithms for probabilistic logic inference and statistical relational learning. Appendix A is partly taken from [GMN+ 09a]. A.1 Alchemy Knowledge Representation Language The knowledge base in Alchemy is based on the following assumptions [DLK+ 08]: • The unique name assumption: which means that two different names refer to two different objects in the universe. • The domain closure assumption: which means that there are no other objects in the universe than those specified by the constants of the knowledge base. Based on the domain closure assumption, existentially quantified formulas are replaced by disjunctions of groundings of the quantified formula with constants used in the knowledge base. Universal quantifiers are replaced by the conjunction of the corresponding groundings as well. Example 12 An example for the domain closure assumption is helpful to clarify the consequences. Let us assume the following formula and two constants M ary and Sara: F = ∃x, y MotherOf (x, y) (A.1) Applying the domain closure assumption means that the formula F can be replaced by: MotherOf (M ary, M ary) ∨ MotherOf (M ary, Sara) ∨ MotherOf (Sara, M ary) ∨ M otherOf (Sara, Sara) 103 A.1 Alchemy Knowledge Representation Language For answering probability queries, the number of true groundings needs to be determined (see Section 2.2.3). Thus, answering probability queries corresponds to modelchecking first-order formulas. The problem of answering probability queries is PSPACEcomplete [Var82]. The same holds for the entailment problem for a probability assertion. The next figure depicts the main units in the Alchemy module: Figure A.1: Alchemy module The Alchemy module is composed of the following units: • World Generator: Based on the Tbox and the input Abox this component produces all Markov logic worlds, which are indicated in Figure A.1 by w1 , w2 , . . . , wn . Note that a world is a vector of all ground atoms of the domain. Let us assume there are m ground atoms. Consequently, the number of worlds is 2m . The generated worlds are the input to the next component called world filter. • World Filter: This component removes the impossible worlds. In this step, the subsumption axioms, domain and range restrictions, and the evidences in the evidence file .db are considered for the world elimination process. The remaining worlds indicated in the above figure by wi , . . . , wj are known as possible worlds. A knowledge base consists of three parts namely types, predicates and formulas [KSR+ 10, DLK+ 08]. The first two parts are required whereas the last part is optional. 1. 
Types: In the first part, types are defined where a set of constants is assigned to each type, e.g., city = {Hamburg, Berlin} indicates a type city with two constants Hamburg and Berlin. Each defined type must have at least one constant. In Alchemy, a constant can have different types. 2. Predicates: In the second part, the applied predicates and their corresponding types are introduced, e.g., AirPollution(city) defines a predicate AirPollution with type city. AirPollution(city) means that the type of the individuals for the predicate AirP ollution is city. By considering city = {Hamburg, Berlin}, the only options are: AirP ollution(Hamburg) AirP ollution(Berlin) The advantage of using types is to speed up the inference process since the world generator of Alchemy produces only the worlds which correspond to the correct typings. 104 Alchemy Knowledge Representation System and Language 3. Formulas: In the last part of the knowledge base, the so-called hard- and soft formulas (in first-order logic) are defined. Hard formulas, also known as strict formulas, are the formulas which cannot be violated and are valid for every object of the domain. In order to distinguish hard and soft formulas of the knowledge base, hard formulas are terminated by a period, and soft formulas are preceded by a weight. Based on the Markov logic theory [DLK+ 08], the weight of a hard formula is infinity, but the Alchemy engine [KSR+ 10] assigns a “high” weight (which is unequal to infinity) to the hard formulas internally. Later, we will discuss how Alchemy determines the weight of a hard formula. Let us assume two predicates CityWithIndustry(city) and CityWithAirPollution(city) (see item 2. above). In the following, an example for a hard formula is given: CityWithIndustry(x ) ⇒ CityWithAirPollution(x ). Due to the typing constraints, the variable x can only be substituted by individuals of type city. Let us consider the predicates CityWithRain(city) and CityWithFlood(city) (see item 2. above). The next example shows a soft formula: 0 .1 CityWithRain(x ) ⇒ CityWithFlood (x ) A weight is assigned to the above formula and consequently, makes it a soft formula. Weighted ground atoms, e.g., 0.6 CityW ithRain(Berlin), are considered as formulas and stand in the formula part of .mln file. Unlike weighted ground atoms, the strict ground atoms, e.g. CityW ithRain(Berlin), stand as evidences in the evidence file .db. Note that the evidence file contains only strict ground atoms. The weights of formulas can be learned [LD07, DLK+ 08] using the weight learning tool of Alchemy [KSR+ 10]. A.2 Interfaces to Alchemy The command for solving inference problems in Alchemy, e.g., for determining the probabilities based on the given evidences is infer, and it has the following form [KSR+ 10]: infer -i uniform.mln -r uniform.result -e evidence.db -q QueryFormula The options indicate: • -i: input file “uniform.mln” • -r: output file “uniform.result” • -e: evidence file “evidence.db” 105 A.3 Inference Services • -q: query formula which can be a concept name, a role name, a ground atom or a conjunction of ground atoms. We can specify more than one query formula separated by comma. In the following, the above mentioned file types for performing inference are introduced: • Markov logic network file (with extension .mln): The MLN file contains types, predicates, and formulas as introduced above. • An optional evidence file (with extension .db): The .db file contains evidence ground atoms which can be either true or false. 
By default, evidence predicates are treated using the closed-world assumption, meaning that if they are not present in the .db file, they are assumed to be false. Non-evidence predicates are treated using the open-world assumption by default (a predicate is called an evidence predicate if at least one grounding of a predicate exists in the evidence file.). The evidence file can be empty, since it is optional. It is also possible to use multiple evidence files. • Output file (with extension .result): In case of performing probabilistic inference, this file contains the probabilities of query atoms given the evidence file. In case of MAP inference, this file shows the most likely state of query atoms. A.3 Inference Services Alchemy [KSR+ 10] can solve the problem of answering probability queries as well as the entailment problem for probability assertions using exact inference as described in 2.2.3. For exact inference, runtimes usually turn out to be too long in practice. Therefore, Alchemy can be instructed to perform approximate inference based on sampling techniques. The exactness of approximate inference can be controlled. In order to produce more accurate results, the maximum number of steps to run sampling algorithms can be increased (option -maxSteps). By increasing the number of samples, the results of approximate inference converge to the results of exact inference. Some effects of the approximation techniques used by Alchemy have to be understood, however. We have seen that strict formulas can reduce the number of worlds that have non-zero probability. Based on the theory of Markov logics [DLK+ 08], the weight of a strict formula is positive infinity. According to the manual [KSR+ 10], Alchemy assigns (positive) finite weights to strict formulas, however. To determine these weights for strict formulas, Alchemy converts input formulas into conjunctive normal form (CNF). Afterwards, the weight of a formula is divided equally among its CNF clauses. The weight assigned to a strict formula depends on the inference type. Alchemy performs two types of inference, namely [KSR+ 10]: 106 Alchemy Knowledge Representation System and Language • Probability queries: Probabilistic inference methods currently implemented in Alchemy are based on two general algorithms: Markov Chain Monte Carlo (MCMC) [GRS96] and (lifted) belief propagation (option -bp) [YFW01, SD08]. Based on MCMC different inference algorithms have been implemented in Alchemy: Gibbs sampling (option -p) [GG84], simulated tempering (option -simtp) [MP92], and MC-SAT (option -ms) [PD06]. The default algorithm is lifted belief propagation [SD08]. The advantage of lifted inference in comparison to the fully grounded network is the runtime and memory usage [KSR+ 10]. For the above-mentioned inference methods, the number of iterations can be specified. In the following, the number of iterations is set to 1000: infer -i uniform.mln -r uniform.result -e evidence.db -q Car,DoorSlam -maxSteps 1000 The query formula contains two predicates Car and DoorSlam. Let us assume Car(t1 ) and DoorSlam(t2 ) where: t1 = {C1 , C2 } t2 = {DS1 , DS2 } As a result, by the above command the next four conditional probabilities are determined: P (Car(C1 ) = true | ~e) P (Car(C2 ) = true | ~e) P (DoorSlam(DS1 ) = true | ~e) P (DoorSlam(DS2 ) = true | ~e) where ~e indicates the evidence vector. 
The default weight assigned to the clauses of a strict formula based on MCMC inference [GRS96] is twice the maximum weight mentioned in the complete set of weighted input formulas [KSR+10].

• MAP inference: This type of inference is called Maximum A Posteriori (MAP) inference (option -a) [DLK+08] and computes the most likely state of the query atoms given the evidence [SD05]. In other words, the output consists of ground atoms associated with zeros and ones (denoting a world). Note that the output file contains all ground atoms of the predicates mentioned in the QueryFormula of the infer command. The default weight assigned to the clauses of a strict formula based on MAP inference is the sum of the weights appearing in the set of weighted input formulas plus 10 [KSR+10]. The command which solves the MAP problem in Alchemy has the following form [KSR+10]:

infer -a -i uniform.mln -r uniform.result -e evidence.db -q QueryFormula

The query formula contains predicate names separated by commas. The following command determines the most probable world considering the two predicate names Car and DoorSlam:

infer -a -i uniform.mln -r uniform.result -e evidence.db -q Car,DoorSlam

Example 13 In this section, an example for an MLN file is given.

city = {Hamburg, Berlin}

CityWithIndustry(city)
CityWithAirPollution(city)
CityWithRain(city)
CityWithFlood(city)

CityWithIndustry(x) ⇒ CityWithAirPollution(x).
0.1 CityWithRain(x) ⇒ CityWithFlood(x)
0.3 CityWithIndustry(Hamburg)

Table A.1: Example for an MLN file

Additionally, an example for a DB file based on the above MLN file is given:

CityWithRain(Hamburg)
CityWithAirPollution(Berlin)
CityWithFlood(Berlin)

Table A.2: Example for a DB file

A query formula for this example could be:

CityWithAirPollution(Hamburg) ∧ CityWithFlood(Hamburg)    (A.2)

A.4 Peculiarities of the Inference Engine Alchemy

In this section, we discuss the problems we faced when applying the inference engine Alchemy [KSR+10]. In the Markov logic formalism [DLK+08], it is essential to define the type of each predicate. Note that a type is a non-empty set containing constants. If a type does not have any constants, the Alchemy engine generates an error. To avoid this type of error, constants are assigned to each predicate type. Note that a constant can have several types. In the following examples, we discuss the important issue of typing.

Example 14 In this example, we define a subsumption Tbox axiom containing two predicates. Additionally, we define the type of each predicate, where the types are not the same. Moreover, the type of the super concept is an empty set:

politician = {P1}

Politician(politician)
Person(person)

Politician(x) ⇒ Person(x).
Politician(P1).

Table A.3: An example for an .mln file

For the above example, Alchemy generates the following error: "Type person has no constant. A type must have at least one constant." As can be seen, no relation is defined among the types in the Alchemy engine. At least a relation could have been defined among the predicate types of strict subsumption axioms. In this example, the following relation must hold:

politician ⊆ person    (A.3)

The person type must contain at least the constant P1. A solution to this problem is to define a non-empty type for person.
In the following, we define three suggestions:

person = {P1}
person = {Ind42}
person = {Ind42, P1}

The next query can be asked even though P1 ∉ person:

P(Person(P1) = true) = 1    (A.4)

Example 15 In this example, we have the same subsumption Tbox axiom as in the previous example and the same predicate types. Unlike the previous example, the type of the subconcept is an empty set.

person = {P1}

Politician(politician)
Person(person)

Politician(x) ⇒ Person(x).

Table A.4: An example for an .mln file

The Alchemy engine generates the next error: "Type politician has no constant. A type must have at least one constant." Since P1 could be a Politician, the constant P1 could have been added automatically to the type politician. In the following, we define some politician types to avoid the error:

politician = {P1}
politician = {Ind42}
politician = {P1, Ind42}

We can ask the next query even though P1 ∉ politician:

P(Politician(P1) = true) = 0.5

This example shows that the relation among predicate types is not defined for strict subsumption Tbox axioms.

Example 16 This example shows that the Alchemy engine [KSR+10] does not produce any errors although the input is inconsistent. This means that during the syntax check, no consistency checking according to the given axioms is performed. In this example, a disjointness axiom is given. Moreover, types are defined for the predicates where the types contain the same constant. According to Alchemy, a constant can have several types. Thus, this does not lead to any error.

interview = {P}
conference = {P}

Interview(interview)
Conference(conference)

Interview(x) ⇒ !Conference(x).
Interview(P).
Conference(P).

Table A.5: An example for an .mln file

The above input is inconsistent since Interview(P) and Conference(P) are both asserted as true even though the predicates Interview and Conference are disjoint. The Alchemy engine does not produce any error although the above input is inconsistent. But the answers to the following queries show that Alchemy performs conflict resolution before answering the queries:

P(Interview(P) = true) = 1
P(Conference(P) = true) = 0

The above answers show that Alchemy randomly selects one of the assertions as true and the other one as false. Similarly, the most probable world for a conflicting input is correctly determined.

Example 17 In this example, we discuss the difference between Description Logic [BCM+03] and the weighted logic according to the Markov logic formalism [DR07]. In the following, we define a subsumption Tbox axiom with two predicates. Moreover, the corresponding non-empty types are defined.

person = {P1, P2}
politician = {P3}

Person(person)
Politician(politician)

Politician(x) ⇒ Person(x).
Politician(P3).
Person(P1).

Table A.6: An example for an .mln file

The unique name assumption holds in Description Logic [BCM+03] and in the Markov logic formalism [DLK+08]. Thus, the constants P1, P2 and P3 refer to different objects:

P1 ≠ P2
P1 ≠ P3
P2 ≠ P3

According to Description Logic, there are two Person instances and only one Politician instance. Since Politician ⊑ Person, it would follow that P1 = P3 or P2 = P3. But according to the unique name assumption, neither holds. Thus, the subsumption formula Politician ⊑ Person should be wrong.
But based on the Markov logic formalism, P1 and P2 might be Politician instances. Similarly, P3 might be a Person instance. Thus, we can calculate the next probabilities:

P(Politician(P1) = true) = 0.51
P(Person(P3) = true) = 1
P(Politician(P2) = true) = 0.32
P(Person(P2) = true) = 0.64

A solution to this problem is to propagate the constants upwards and downwards according to the subsumption axioms:

person = {P1, P2, P3}
politician = {P1, P2, P3}

Example 18 According to Description Logics [BCM+03], extending a knowledge base with implicit knowledge does not change the knowledge base at all. In this example, we show that, unlike in Description Logics, adding an implicit formula to the knowledge base changes the knowledge base and consequently the probabilities. The only formula which can be added to the knowledge base without changing it is a formula with weight zero. Consider the next .mln file:

person = {P1}
politician = {P1}

Person(person)
Politician(politician)

1 Politician(x)
1 Politician(x) ⇒ Person(x)

Table A.7: An example for an .mln file

In the above knowledge base there is an implicit formula, namely:

w Person(x)    (A.5)

In order to determine the weight w, we ask for the next probability:

P(Person(P1) = true) = 0.65

Now we can determine w:

w = ln(p / (1 − p)) = 0.62

By extending the knowledge base with A.5 and asking for the previous probability, we notice that the probability value is not the same as before:

P(Person(P1) = true) = 0.79

This problem arises since the assertion and the axiom in the above knowledge base are not strict. Considering the above knowledge base with a strict axiom and a strict assertion, it holds:

P(Person(P1) = true) = 1

By adding Person(P1) to the knowledge base, we have the same situation as in Description Logics, where the integration of implicit knowledge does not change the knowledge base.

Appendix B  RacerPro and its Module for DLLP Abduction

The interface for DLLP abduction is part of the RacerPro description logic inference system. In order to explain a single concept or role assertion, we use in RacerPro [HM01] retrieve-with-explanation(), which is called as follows:

retrieve-with-explanation()(i j roleName)(:Ω)

In the above notation, i and j indicate individual names. Furthermore, the strategy parameter Ω denotes strategies for variable instantiation. Two strategies are defined, namely 'reuse existing individuals' and 'use new individuals'. We apply the strategy 'reuse existing individuals' with Ω = reuse-old if we want to use the individuals mentioned in the Abox. If no strategy parameter is mentioned, the applied strategy is 'use new individuals', where new individuals are hypothesized. RacerPro can be obtained from the website: www.racer-systems.com.

Appendix C  The DLLP-Abduction Based Interpretation System

In order to demonstrate that the results obtained in this thesis are reproducible, the DLLP-abduction based interpretation module is available as open-source software. The data generator and the log file producer have been implemented by Maurice Rosenfeld in Java. The data flow in the CASAM system [HBB+13] is composed of the AssertionSender and the InterpretationReceiver, which run on an RMI server with the IP address and port number 134.28.70.136:9092.
The following figure depicts the data flow [Has11]: Figure C.1: Data flow in the CASAM system The RMI test framework including the log file interpreter have been implemented by Björn Hass as part of his Master thesis [Has11] in Python language. The AssertionSender is a real time data source [HBB+ 13]. Thus, the input data is sent in real time and incrementally to the RMI server [HBB+ 13,GMN+ 10c]. Thus, the RMI server incrementally processes the data. The log files are parsed for the relevant data by applying regular expressions. In order to test the framework, we use command line options. The following command line parses the log file log7.txt and sends the result to the output file 7.rmi [Has11]: Python testscript.py --scan log7.txt 7.rmi In order to plot the output file 7.rmi, we use the following command line [Has11]: Python testscript.py --plot 6 7.rmi --show --what "Intp; r; igBunch" 117 where we use plot number 6. The parameters ”Intp”, ”r”, and ”igBunch” indicate the number of interpretations, the maximum number of fiat assertions to be explained in a bunch of assertions and the number of ignored bunches, respectively. 118 Bibliography [BCM+ 03] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, Cambridge, NY, USA, January 2003. [BHS05] F. Baader, I. Horrocks, and U. Sattler. Description Logics as Ontology Languages for the Semantic Web. In Dieter Hutter and Werner Stephan, editors, Mechanizing Mathematical Reasoning: Essays in Honor of Jörg H. Siekmann on the Occasion of His 60th Birthday, volume 2605 of Lecture Notes in Artificial Intelligence, pages 228–248. Springer, 2005. [BKN11] Wilfried Bohlken, Patrick Koopmann, and Bernd Neumann. SCENIOR: Ontology-Based Interpretation of Aircraft Service Activities. Technical report, University of Hamburg, Department of Informatics, Cognitive Systems Laboratory, February 2011. [BNHK11] Wilfried Bohlken, Bernd Neumann, Lothar Hotz, and Patrick Koopmann. Ontology-Based Realtime Activity Monitoring Using Beam Search. In Proceedings of ICVS 2011, volume 6962 of Lecture Notes in Computer Science, pages 112–121. Springer, 2011. [CDD+ 03] Simona Colucci, Tommaso Di Noia, Eugenio Di Sciascio, Francesco M. Donini, and Marina Mongiello. Concept Abduction and Contradiction in Description Logics. In Diego Calvanese, Giuseppe De Giacomo, and Enrico Franconi, editors, Proceedings of the 16th International Workshop on Description Logics (DL2003), volume 81 of CEUR Workshop Proceedings, Rome, Italy, September 2003. CEUR-WS.org. [CEF+ 08] Silvana Castano, Sofia Espinosa, Alfio Ferrara, Vangelis Karkaletsis, Atila Kaya, Ralf Möller, Stefano Montanelli, Georgios Petasis, and Michael Wessel. Multimedia Interpretation for Dynamic Ontology Evolution. In Journal of Logic and Computation, volume 19 of 5, pages 859–897. Oxford University Press, 2008. 119 BIBLIOGRAPHY [Coo90] G. F. Cooper. The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks. Artificial Intelligence, 42:393–405, 1990. [CY90] James Joseph Clark and Alan L. Yuille. Data Fusion for Sensory Information Processing Systems, volume 105 of The Springer International Series in Engineering and Computer Science. Kluwer Academic Publishers, Norwell, MA, USA, 1990. [DK02] Marc Denecker and Antonis C. Kakas. Abduction in Logic Programming. In Antonis C. 
Kakas and Fariba Sadri, editors, Computational Logic: Logic Programming and Beyond, volume 2407 of Lecture Notes in Computer Science, chapter 16, pages 402–436. Springer, 2002. [DLK+ 08] Pedro Domingos, Daniel Lowd, Stanley Kok, Hoifung Poon, Matthew Richardson, and Parag Singla. Just Add Weights: Markov Logic for the Semantic Web. In Paulo Cesar G. da Costa, Claudia d’Amato, Nicola Fanizzi, Kathryn B. Laskey, Kenneth J. Laskey, Thomas Lukasiewicz, Matthias Nickles, and Michael Pool, editors, Uncertainty Reasoning for the Semantic Web I, volume 5327 of Lecture Notes in Computer Science, pages 1–25. Springer, 2008. [DR07] Pedro Domingos and Matthew Richardson. Markov Logic: A Unifying Framework for Statistical Relational Learning. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning, pages 339–371. Cambridge, MA: MIT Press, 2007. [EKM09] Sofia Espinosa, Atila Kaya, and Ralf Möller. Formalizing Multimedia Interpretation based on Abduction over Description Logic Aboxes. In Bernardo Cuena-Grau, Ian Horrocks, and Boris Motik, editors, Proceedings of the 22nd International Workshop on Description Logics (DL2009), Oxford, UK, July 2009. [EKM11] Sofia Espinosa, Atila Kaya, and Ralf Möller. Logical Formalization of Multimedia Interpretation. In G. Paliouras, C. D. Spyropoulos, and G. Tsatsaronis, editors, Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, volume 6050 of Lecture Notes in Computer Science, pages 110–133. Springer, 2011. [Esp11] Sofia Espinosa. Content Management and Knowledge Management: Two Faces of Ontology-Based Deep-Level Interpretation of Text. PhD thesis, Hamburg University of Technology (TUHH), Hamburg, Germany, 2011. [FK00] Peter A. Flach and Antonis C. Kakas. Abductive and Inductive Reasoning: Background and Issues. In Peter A. Flach and Antonis C. Kakas, editors, 120 BIBLIOGRAPHY Abduction and Induction: Essays on Their Relation and Integration , pages 1–27. Kluwer Academic Publishers, 2000. [GG84] S. Geman and D. Geman. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984. [GM10] Oliver Gries and Ralf Möller. Gibbs Sampling in Probabilistic Description Logics with Deterministic Dependencies. In Thomas Lukasiewicz, Rafael Penaloza, and Anni-Yasmin Turhan, editors, Proceedings of the First International Workshop on Uncertainty in Description Logics (UnIDL-2010), Edinburgh, UK, 2010. [GMN+ 08] Oliver Gries, Ralf Möller, Anahita Nafissi, Kamil Sokolski, and Maurice Rosenfeld. Formalisms Supporting First-order Probabilistic Structures. CASAM Project Deliverable D3.1, October 2008. [GMN+ 09a] Oliver Gries, Ralf Möller, Anahita Nafissi, Maurice Rosenfeld, Kamil Sokolski, and Michael Wessel. Basic Reasoning Engine: Report on Optimization Techniques for First-Order Probabilistic Reasoning. CASAM Project Deliverable D3.2, September 2009. [GMN+ 09b] Oliver Gries, Ralf Möller, Anahita Nafissi, Kamil Sokolski, and Maurice Rosenfeld. CASAM Domain Ontology. CASAM Project Deliverable D6.2, April 2009. [GMN+ 10a] Oliver Gries, Ralf Möller, Anahita Nafissi, Maurice Rosenfeld, Kamil Sokolski, and Michael Wessel. A Probabilistic Abduction Engine for Media Interpretation based on Ontologies. 
In Pascal Hitzler and Thomas Lukasiewicz, editors, Proceedings of 4th International Conference on Web Reasoning and Rule Systems (RR-2010), volume 6333 of Lecture Notes in Computer Science, pages 182–194, Bressanone/Brixen, Italy, September 2010. Springer. [GMN+ 10b] Oliver Gries, Ralf Möller, Anahita Nafissi, Maurice Rosenfeld, Kamil Sokolski, and Michael Wessel. Meta-Level Reasoning Engine, Report on Meta-Level Reasoning for Disambiguation and Preference Elicitation. CASAM Project Deliverable D3.4, October 2010. [GMN+ 10c] Oliver Gries, Ralf Möller, Anahita Nafissi, Maurice Rosenfeld, Kamil Sokolski, and Michael Wessel. Probabilistic Abduction Engine: Report on Algorithms and the Optimization Techniques used in the Implementation. CASAM Project Deliverable D3.3, May 2010. 121 BIBLIOGRAPHY [GMN+ 11] Oliver Gries, Ralf Möller, Anahita Nafissi, Maurice Rosenfeld, Kamil Sokolski, and Sebastian Wandelt. Dealing Efficiently with OntologyEnhanced Linked Data for Multimedia. Technical report, Institute for Software Systems (STS), Hamburg University of Technology, Germany, 2011. [GRS96] Walter R. Gilks, Sylvia Richardson, and D. J. Spiegelhalter, editors. Markov Chain Monte Carlo in Practice. Chapman and Hall, 1996. [Häh01] Reiner Hähnle. Tableaux and Related Methods. In John Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, chapter 3, pages 101–178. Elsevier Science Publishers B.V., 2001. [Has11] Björn Hass. Evaluation of Media Interpretation Algorithms. Master’s thesis, Hamburg University of Technology (TUHH), Hamburg, Germany, October 2011. [HBB+ 13] Robert J. Hendley, Russell Beale, Chris P. Bowers, Christos Georgousopoulos, Charalampos Vassiliou, Petridis Sergios, Ralf Möller, Eric Karstens, and Dimitris Spiliotopoulos. CASAM: Collaborative HumanMachine Annotation of Multimedia. In Multimedia Tools and Applications Journal (MTAP), 2013. [HM01] Volker Haarslev and Ralf Möller. RACER System Description. In Proceedings of the International Joint Conference on Automated Reasoning (IJCAR 2001), volume 2083 of Lecture Notes in Computer Science, pages 701–705. Springer, 2001. [HMS04] Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Reducing SHIQDescription Logic to Disjunctive Datalog Programs. In Proceedings of the 9th International Conference on the Principles of Knowledge Representation and Reasoning (KR 2004), pages 152–162, 2004. [Kay11] Atila Kaya. A Logic-Based Approach to Multimedia Interpretation. PhD thesis, Hamburg University of Technology (TUHH), Hamburg, Germany, 2011. [KB96] Setrag Khoshafian and A. Brad Baker. Multimedia and Imaging Databases. Morgan Kaufmann Publisher, 1996. [KES11] Szymon Klarman, Ulle Endriss, and Stefan Schlobach. ABox Abduction in the Description Logic ALC. Journal of Automated Reasoning, 46(1):43–80, 2011. 122 BIBLIOGRAPHY [KM09] Rohit J. Kate and Raymond J. Mooney. Probabilistic Abduction using Markov Logic Networks. In Proceedings of the IJCAI-09 Workshop on Plan, Activity, and Intent Recognition (PAIR-09), Pasadena, CA, July 2009. [Kna11] Ulrich Knauer. Algebraic Graph Theory: Morphisms, Monoids, and Matrices. De Gruyter Studies in Mathematics. Walter de Gruyter, 2011. [Kol50] Andrey N. Kolmogorov. Foundations of the Theory of Probability. Chelsea Publishing Company, 1950. [Kow11] Robert Kowalski. Computational Logic and Human Thinking: How to be Artificially Intelligent. Cambridge University Press, 2011. [KP97] Daphne Koller and Avi Pfeffer. Object-Oriented Bayesian Networks. In Dan Geiger and Prakash P. 
Shenoy, editors, Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI97), pages 302–313, Brown University, Providence, Rhode Island, USA, August 1997. Morgan Kaufmann. [KSJ97] Henry Kautz, Bart Selman, and Yueyen Jiang. A General Stochastic Approach to Solving Problems with Hard and Soft Constraints. In D. Gu, J. Du, and P. Pardalos, editors, The Satisfiability Problem: Theory and Applications, pages 573–586. American Mathematical Society, New York, NY, 1997. [KSR+ 10] Stanley Kok, Parag Singla, Matthew Richardson, Pedro Domingos, Marc Sumner, Hoifung Poon, Daniel Lowd, Jue Wang, and Aniruddh Nath. The Alchemy System for Statistical Relational AI: User Manual. Department of Computer Science and Engineering, University of Washington, Seattle, WA, January 2010. http://alchemy.cs.washington.edu/user-manual/. [Lac98] Nicolas Lachiche. Abduction and Induction from a Non-Monotonic Reasoning Perspective. In In Abduction and Induction: Essays on their Relation and Integration, pages 107–116. Kluwer Academic Publishers, 1998. [Las08] Kathryn Blackmond Laskey. MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial Intelligence, 172(2–3):140–178, February 2008. [LD07] Daniel Lowd and Pedro Domingos. Efficient Weight Learning for Markov Logic Networks. In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenic, and Andrzej Skowron, editors, Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases, volume 4702 of Lecture Notes in Computer Science, pages 200–211. Springer, 2007. 123 BIBLIOGRAPHY [Llo87] J. W. Lloyd. Foundations of Logic Programming. Symbolic Computation Series. Springer, Berlin, Germany, Second edition, 1987. [Lov78] Donald W. Loveland. Automated Theorem Proving: A Logical Basis, volume 6 of Fundamental Studies in Computer Science. North-Holland Publishing Company, 1978. [LY02] Fangzhen Lin and Jia-Huai You. Abduction in Logic Programming: A New Definition and an Abductive Procedure Based on Rewriting. Artificial Intelligence, 140:2002, 2002. [MN08] Ralf Möller and Bernd Neumann. Ontology-Based Reasoning Techniques for Multimedia Interpretation and Retrieval. In Yiannis Kompatsiaris and Paola Hobson, editors, Semantic Multimedia and Ontologies: Theory and Applications, pages 55–98. Springer, Heidelberg, 2008. [MP92] Enzo Marinari and Giorgio Parisi. Simulated Tempering: A New Monte Carlo Scheme. Europhysics Letters, 19:451–458, 1992. [Nea93] Radford M. Neal. Probabilistic Inference using Markov Chain Monte Carlo Methods. Technical report, University of Toronto, Department of Computer Science, September 1993. [Neu08] Bernd Neumann. Bayesian Compositional Hierarchies - A Probabilistic Structure for Scene Interpretation. In Anthony G. Cohn, David C. Hogg, Ralf Möller, and Bernd Neumann, editors, Logic and Probability for Scene Interpretation, number 08091 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2008. Schloss Dagstuhl - Leibniz - Zentrum fuer Informatik. [NH97] Liem Ngo and Peter Haddawy. Answering Queries from Context-Sensitive Probabilistic Knowledge Bases. Theoretical Computer Science, 171:147– 177, 1997. [NM06] Bernd Neumann and Ralf Möller. On Scene Interpretation with Description Logics. In H.I. Christensen and H.-H. Nagel, editors, Cognitive Vision Systems: Sampling the Spectrum of Approaches, volume 3948 of Lecture Notes in Computer Science, pages 247–278. Springer, Heidelberg, 2006. [NM08] Tobias Henrik Näth and Ralf Möller. 
[Pau93] Gabriele Paul. Approaches to Abductive Reasoning: An Overview. Artificial Intelligence Review, 7(2):109–152, 1993.
[PD06] Hoifung Poon and Pedro Domingos. Sound and Efficient Inference with Probabilistic and Deterministic Dependencies. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 458–463, Boston, Massachusetts, July 2006. AAAI Press.
[Pea88] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Series in Representation and Reasoning. Morgan Kaufmann Publishers, San Francisco, CA, USA, 1988.
[Pei78] Charles Sanders Peirce. Deduction, Induction and Hypothesis. Popular Science Monthly, 13:470–482, 1878.
[Poo91] David Poole. Representing Bayesian Networks within Probabilistic Horn Abduction. In Bruce D'Ambrosio and Philippe Smets, editors, Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence (UAI 1991), pages 271–278, University of California at Los Angeles, Los Angeles, CA, USA, July 1991. Morgan Kaufmann Publishers.
[Poo97] David Poole. The Independent Choice Logic for Modelling Multiple Agents under Uncertainty. Artificial Intelligence, 94(1–2):7–56, 1997.
[Poo08a] David Poole. AILog User Manual. Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada, December 2008.
[Poo08b] David Poole. The Independent Choice Logic and Beyond. In Luc De Raedt, Paolo Frasconi, Kristian Kersting, and Stephen Muggleton, editors, Probabilistic Inductive Logic Programming: Theory and Application, volume 4911 of Lecture Notes in Computer Science, pages 222–243, Department of Computer Science, University of British Columbia, Vancouver, B.C., Canada, 2008. Springer.
[Rei87] Raymond Reiter. On Closed World Data Bases. In Matthew L. Ginsberg, editor, Readings in Nonmonotonic Reasoning, pages 300–310. Morgan Kaufmann Publishers Inc., Los Altos, CA, USA, 1987.
[RN03] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition, 2003.
[Rot96] Dan Roth. On the Hardness of Approximate Reasoning. Artificial Intelligence, 82:273–302, 1996.
[RSDD09] Michele Ruta, Floriano Scioscia, Tommaso Di Noia, and Eugenio Di Sciascio. Reasoning in Pervasive Environments: An Implementation of Concept Abduction with Mobile OODBMS. In Proceedings of the IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2009), volume 01, pages 145–148, Milan, Italy, September 2009. IEEE Computer Society.
[SD05] Parag Singla and Pedro Domingos. Discriminative Training of Markov Logic Networks. In Manuela M. Veloso and Subbarao Kambhampati, editors, Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, volume 2, pages 868–873, Pittsburgh, Pennsylvania, USA, July 2005. AAAI Press / The MIT Press.
[SD08] Parag Singla and Pedro Domingos. Lifted First-Order Belief Propagation. In Dieter Fox and Carla P. Gomes, editors, Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI 2008), pages 1094–1099, Chicago, Illinois, USA, July 2008. AAAI Press.
[Sha05] Murray Shanahan. Perception as Abduction: Turning Sensor Data into Meaningful Representation. Cognitive Science, 29(1):103–134, 2005.
[SKC96] Bart Selman, Henry Kautz, and Bram Cohen. Local Search Strategies for Satisfiability Testing. In David S. Johnson and Michael A. Trick, editors, Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, volume 26 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521–531, Washington, DC, 1996. American Mathematical Society.
[SP06] Evren Sirin and Bijan Parsia. Pellet System Description. In Proceedings of the 2006 Description Logic Workshop (DL-2006). CEUR Electronic Workshop Proceedings, 2006.
[Tha78] Paul R. Thagard. The Best Explanation: Criteria for Theory Choice. The Journal of Philosophy, 75(2):76–92, February 1978.
[Var82] Moshe Y. Vardi. The Complexity of Relational Query Languages. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing (STOC '82), pages 137–146, San Francisco, California, USA, May 1982. ACM.
[YFW01] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Generalized Belief Propagation. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pages 689–695. MIT Press, Cambridge, MA, 2001.