v0.5 - Eclipse Wiki
[Diagram: target architecture. Components: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, iFilter, Annotations, Duplicate Finder, Image Recognition, Specialized Algorithm(s), Index Update, Storage Update, Index Deployment, Onto Store, Index. Callouts: Typical Server, Typical Process.]

Key Ideas
• Crawlers/agents push data into Connectivity (the entry point).
• The Connectivity module filters, converts versions, extracts binaries etc. and pushes the result into the queue.
• A message-driven queue stores the data and guarantees delivery.
• 1…n servers respond to messages, process the data and write it back to the queue.
• Potentially multiple instances of a server run for load balancing and increased throughput.
• Open issue: synchronization of persistence.
• 1…n processes inside a server are arranged via BPEL (~pipelines, ~strategies).
• Search is yet to be defined separately, but the objective is to separate the processes of (a) filling the index and (b) using the index for search (unlike in e:IAS).

[Diagram: layers and modules. Labels: Persistence, Monitoring, Object Communication, XML Communication, Rules, TextMining, Retrieval, Completion, Search, Index, Indexing.]

[Diagram: zones. Zone 1 and Zone 2 each contain IncUpdate, Crawler Agent, Stellent, Connectivity and PreProcess. Main Sys contains the target architecture: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, iFilter, Annotations, Duplicate Finder, Image Recognition, Specialized Algorithm(s), Index Update, Storage Update, Index Deployment, Onto Store, Index.]

v0.5: Essentially e:IAS Connect as a BPEL process.
[Diagram: v0.5. Components: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Stellent, Lucene, HTML2TXT, …, Index.]
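The flow described under Key Ideas (crawlers push into Connectivity, Connectivity pushes into a queue, servers take a message, process it and write it back) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the names `Message`, `connectivity`, `serve`, `run_pipeline` and the two toy steps are hypothetical and do not come from the slides, the in-memory `queue.Queue` does not guarantee delivery the way a real message-driven queue would, and a plain ordered step list stands in for the BPEL orchestration.

```python
import queue
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    doc_id: str
    text: str
    pending: list = field(default_factory=list)  # step names still to run


def html2txt(text: str) -> str:
    """Crude tag stripper; a stand-in for a real HTML2TXT converter."""
    return re.sub(r"<[^>]+>", "", text)


def normalize(text: str) -> str:
    """Collapse whitespace; a stand-in for a text-processing step."""
    return " ".join(text.split())


# Hypothetical step registry; insertion order defines the pipeline order.
STEPS = {"html2txt": html2txt, "normalize": normalize}


def connectivity(raw_docs, q: queue.Queue) -> None:
    """Entry point: accept crawler output and enqueue one message per document."""
    for doc_id, raw in raw_docs:
        q.put(Message(doc_id, raw, pending=list(STEPS)))


def serve(q: queue.Queue, index: dict) -> None:
    """One server: take a message, run its next step, write it back to the queue.

    A message with no pending steps is written to the index instead.
    """
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return  # nothing left to process
        if msg.pending:
            step = msg.pending.pop(0)
            msg.text = STEPS[step](msg.text)
            q.put(msg)
        else:
            index[msg.doc_id] = msg.text


def run_pipeline(raw_docs) -> dict:
    """Fill an index from raw documents; searching the index is kept separate."""
    q = queue.Queue()
    index = {}
    connectivity(raw_docs, q)
    serve(q, index)
    return index
```

Calling `run_pipeline([("a", "<p>Hello   <b>world</b></p>")])` fills the index with one cleaned-up entry for document `a`. Because each message carries its own list of pending steps, several `serve` loops could in principle share one queue, which is where the open issue of synchronizing persistence would surface.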
[Planning chart: Prototype / Implementation. Entries in slide order: Concept 1+1, Overhead, Queue 1, Storage 1, 7 MM, XML Storage 1, BPEL+XML 1, Componentization 1, M1, Initial implementation 2+2, Specifications 2, Componentization 2, 30 MM, Docs+Wiki 1, Dev organization 1. Milestones: M0, M1, M1, M2, M1, M4.]

v0.6: Essentially e:IAS attached as a black box, Connect moved out, still without BPEL.
[Diagram: v0.6. Components: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, HTML2TXT, Annotations, …, Index Update, Index.]

v0.7: e:IAS as a BPEL server, but still with a proprietary ontology.
[Diagram: v0.7. Components: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, HTML2TXT, Annotations, …, Index Update, Index.]

v0.9: Ontology based on RDF/OWL, e:IAS services decoupled from OOML.
[Diagram: v0.9. Components: Crawler Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, HTML2TXT, Annotations, …, Index Update, Storage Update, Index Deployment, Onto Store, Index.]