v0.5 - Eclipse Wiki

[Figure: architecture overview. Crawler/Agent (x3), Connectivity, (Index) Processing Queue. Components attached to the queue: UNZIP, Rules, Index Update, Stellent, TextMining, Storage Update, Specialized Algorithm(s), iFilter, Annotations, Duplicate Finder, Image Recognition, Index Deployment. Stores: Onto Store, Index.]
Key Ideas
• Crawlers/Agents push data into Connectivity (the entry point).
• The Connectivity module filters, converts versions, extracts binaries, etc., and pushes the results into the queue.
• A message-driven queue stores the data and guarantees delivery.
• 1…n servers respond to messages, process the data and write the results back to the queue.
• Potentially multiple instances of each server for load balancing and increased throughput.
• Open issue: synchronization of persistence.
• 1…n processes inside each server are arranged via BPEL (~pipelines, ~strategies).
• Search is yet to be defined separately; the objective is to separate the processes of (a) filling the index and (b) using the index for search (unlike in e:IAS).
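The flow described in the key ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Eclipse design itself: the `Message` class, the two-step `PIPELINE`, and the helper names are hypothetical, an in-memory `queue.Queue` stands in for the message-driven queue (it does not guarantee delivery the way a real message broker would), and a plain list of steps stands in for a BPEL-arranged process.

```python
import queue
import re
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical unit of work travelling through the processing queue."""
    doc_id: str
    text: str
    done: list = field(default_factory=list)  # names of steps already applied


def html2txt(text):
    """Placeholder conversion step: strip anything that looks like a tag."""
    return re.sub(r"<[^>]+>", " ", text)


def normalize(text):
    """Placeholder processing step: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


# Stand-in for a BPEL-arranged pipeline: an ordered list of (name, function).
PIPELINE = [("HTML2TXT", html2txt), ("Normalize", normalize)]


def connectivity(raw_docs, q):
    """Entry point: filter out empty documents, push the rest into the queue."""
    for doc_id, text in raw_docs:
        if text.strip():
            q.put(Message(doc_id, text))


def run_server(q, index):
    """Respond to messages: apply the next step and write back to the queue.

    A message that has passed every step is written to the index instead.
    """
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        step = len(msg.done)
        if step < len(PIPELINE):
            name, fn = PIPELINE[step]
            msg.text = fn(msg.text)
            msg.done.append(name)
            q.put(msg)  # write the intermediate result back to the queue
        else:
            index[msg.doc_id] = msg.text  # filling the index
        q.task_done()


def search(index, term):
    """Using the index: a read-only lookup, separate from filling it."""
    return sorted(d for d, text in index.items() if term in text.split())


if __name__ == "__main__":
    q, index = queue.Queue(), {}
    connectivity([("a", "<p>Hello  World</p>"), ("b", "   ")], q)
    run_server(q, index)
    print(index, search(index, "hello"))
```

Because every intermediate result goes back onto the queue, several `run_server` instances could in principle share the same queue, which is the load-balancing idea from the list above. The open issue of synchronizing persistence is not addressed by this sketch.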
[Figure: typical server and typical process. Server building blocks: Persistence, Monitoring, Object Communication, XML Communication. Processes (each shown twice): Rules, TextMining, Retrieval, Completion.]
[Figure: Search, Index, Indexing. Crawler/Agent (x3), Connectivity, (Index) Processing Queue. Components attached to the queue: UNZIP, Rules, Stellent, TextMining, iFilter, Annotations, Duplicate Finder, Image Recognition, Index Update, Specialized Algorithm(s), Storage Update, Index Deployment. Stores: Onto Store, Index.]
[Figure: deployment across zones. Zone 1 and Zone 2 (labels, each appearing twice: IncUpdate, Crawler/Agent, Stellent, Connectivity, PreProcess). Main Sys: Crawler/Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Index Update, Stellent, TextMining, Storage Update, Specialized Algorithm(s), iFilter, Annotations, Duplicate Finder, Image Recognition, Index Deployment, Onto Store, Index.]
Essentially e:IAS Connect as a BPEL process
[Figure: v0.5. Crawler/Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Stellent, Lucene, HTML2TXT, …, Index.]
Prototype
Implementation / Concept: 1+1
Overhead
Queue: 1
Storage: 1
7 MM
XML Storage: 1
BPEL+XML: 1
Componentization: 1
M1
Initial implementation: 2+2
Specifications: 2
Componentization: 2
30 MM
Docs + Wiki: 1
Dev organization: 1
Milestones: M0, M1, M1, M2, M1, M4
Essentially e:IAS attached as a black box, Connect moved out, still without BPEL
[Figure: v0.6. Crawler/Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, HTML2TXT, Annotations, …, Index Update, Index.]
e:IAS as a BPEL server, but still with a proprietary ontology
[Figure: v0.7. Crawler/Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Stellent, TextMining, HTML2TXT, Annotations, …, Index Update, Index.]
Ontology based on RDF/OWL, e:IAS services decoupled from OOML
[Figure: v0.9. Crawler/Agent (x3), Connectivity, (Index) Processing Queue, UNZIP, Rules, Index Update, Stellent, TextMining, Storage Update, HTML2TXT, Annotations, Index Deployment, …, Onto Store, Index.]