TeraText® Solutions
Solutions built around the TeraText product suite are focused on taking archives and collections of text-rich data and documents and providing the customer with high-value information.
Overview
In intelligence gathering, the challenge is akin to finding a needle in a haystack: the analyst must quickly examine vast amounts of information from many sources and then assemble the precise pieces that will lead to life-saving decisions. TeraText Database System (DBS) is highly scalable and updates its indices on the fly, making new information instantly available to the analyst.
Large-Scale, Dynamic Applications
In these applications, the information repository can range from a few gigabytes to many terabytes. The repository may be static but, more typically, is continually growing, usually at a rate of several hundred to a thousand documents per second. For example, in the case of a newsfeed, more than one gigabyte of new data can arrive each day from a variety of countries. Other application areas may need to handle even greater data flow. Some applications also need to migrate legacy data for archiving. All large-scale, high-load intelligence applications require high-performance hardware and software architectures, such as multiprocessor Unix® workstations.
Building Information Repositories
The most important task when building an intelligence application is building and maintaining the information repository. When a new document is inserted into the repository, every word in the document must be extracted and indexed. This is a very expensive operation, as a document may contain several thousand words. In addition, the amount of information to process can be very large. TeraText DBS has been optimized for just such high-volume environments, handling the update process efficiently.
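The indexing step described above can be pictured with a toy inverted index. This is only an illustrative sketch, not TeraText's implementation: the `InvertedIndex` class, its method names, and the sample documents are all hypothetical, and a production system would use on-disk, compressed structures rather than Python dictionaries.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index: term -> {doc_id: [word positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_document(self, doc_id, text):
        # Extract every word and record where it occurs in the document.
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            self.postings[word].setdefault(doc_id, []).append(position)

    def search(self, term):
        # Return the ids of documents containing the term, in sorted order.
        return sorted(self.postings.get(term.lower(), {}))


# Hypothetical sample documents.
index = InvertedIndex()
index.add_document(1, "Convoy sighted near the northern border")
index.add_document(2, "Border crossing closed; convoy delayed")
print(index.search("convoy"))
```

Because each word is posted to the index as soon as `add_document` runs, a document becomes searchable immediately after insertion. The stored word positions are what a proximity query would consult.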
Another problem is that new documents may arrive while the database is in use for searching. Many existing text database systems support fast batch loading of data as an overnight operation when the database is off-line, but they do not allow updates to the repository during the day, when the database is in use. For any organization that requires up-to-date access to the most recent data, or 24/7 access to its intelligence, this is not acceptable. TeraText DBS has been specifically designed to support concurrent updates and queries, providing 24-hour access to up-to-date information.
Searching Information Repositories
The reason for building an information repository is to provide access to the data it contains. Because the document collection can be very large, advanced search techniques are needed to locate desired information. TeraText DBS has been developed to support just such sophisticated searching. Queries can use Boolean logic, word position information (such as "same sentence," "same paragraph," "within words"), document structure, and ranked relevance queries (where the documents are returned in order of relevance to the query) to locate target data. Each query type can be combined as required. For example, to achieve high accuracy when querying a collection, a searcher could combine a Boolean query with a ranked query to identify a subset of the collection that can then be ranked against a set of ranking terms. Fuzzy matching is also important. For example, it can be common to have several alternative spellings (or misspellings) of a word. TeraText DBS provides support for fuzzy matching by computing a distance measure between two terms so that the presence of alternate spellings need not frustrate the user's task.
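As a rough illustration of two ideas from this section, the sketch below computes an edit distance between terms (one common choice of distance measure; the measure TeraText DBS actually uses is not stated here) and combines a Boolean filter with a simple ranking pass. The function names, the term-count scoring, and the sample documents are assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_variants(term, vocabulary, max_distance=1):
    """Vocabulary terms within max_distance edits of the query term."""
    return {word for word in vocabulary if edit_distance(term, word) <= max_distance}


def boolean_then_ranked(docs, required, ranking_terms):
    """Keep documents containing every required term, then order them by
    how often the ranking terms occur (a crude relevance score)."""
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        if all(term in words for term in required):
            score = sum(words.count(term) for term in ranking_terms)
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id.
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in ordered]


# Hypothetical collection with two spellings of the same name.
docs = {
    "a": "qaddafi met envoys in tripoli",
    "b": "gaddafi speech in tripoli tripoli radio reports",
    "c": "weather report for sydney",
}
vocabulary = {word for text in docs.values() for word in text.split()}

print(fuzzy_variants("gaddafi", vocabulary))
print(boolean_then_ranked(docs, required=["tripoli"], ranking_terms=["tripoli", "radio"]))
```

The Boolean step narrows the collection cheaply, so the more expensive ranking only runs over the surviving subset.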
Repository Management
To maintain large, high-performance information repositories, the quality of a system's database administration capabilities is of the utmost importance. For very large repositories, it can be desirable to split the data collection over multiple databases. TeraText DBS can do just that, while retaining the ability to search each database in parallel. With critical information collections, it is necessary to be able to back up repositories efficiently and robustly, and to monitor and refine database performance. TeraText DBS provides administration utilities that are of the highest quality and reliability, and that deliver the finest level of control.
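The idea of splitting a collection over several databases and searching them in parallel can be sketched as follows. This is a hypothetical illustration in which each "database" is just a Python dictionary; `search_partition` and `federated_search` are invented names, not TeraText APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def search_partition(partition, term):
    """Search one partition (here simply a dict of doc_id -> text)."""
    return [doc_id for doc_id, text in partition.items()
            if term in text.lower().split()]


def federated_search(partitions, term):
    """Query every partition concurrently and merge the hits."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collect results inside the block so every search has finished.
        results = list(pool.map(lambda p: search_partition(p, term), partitions))
    return sorted(hit for hits in results for hit in hits)


# Hypothetical collection split by year across two partitions.
partitions = [
    {"2023-001": "border convoy"},
    {"2024-001": "convoy delayed", "2024-002": "harbour closed"},
]
print(federated_search(partitions, "convoy"))
```

The caller sees a single merged result list, so the split across databases stays an administrative detail rather than something the searcher has to manage.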
A Proven Track Record
TeraText DBS provides an advanced, extremely rich and reliable set of capabilities that support high-performance, secure intelligence applications. TeraText DBS has been successfully adopted by the Departments of Defense of both Australia and the United States for managing and searching large repositories of information.
UNIX is a registered trademark of X/Open Company Ltd in the United States and/or other countries.
Download the XML Intelligence White Paper
Discover how XML enables a high-volume, near-real-time Information Analyst Support System (IASS). Download Now »
Questions? Contact Us
For additional information about the TeraText suite of products, please contact us today.
Overview
In addition to TeraText for Legislation, which provides a comprehensive set of tools for drafting, managing and publishing legal and regulatory materials, the TeraText Database System is frequently deployed as a tool to assist in the electronic publication of legal materials.
Example Site: Australian Tax Office (ATO)
The Legal Database is a collection of legal and policy information. Here you have access to much of the material the ATO uses when making decisions, including legislation and supporting material; public rulings, determinations and bulletins; ATO Interpretative Decisions; Case Decision Summaries (details of important decisions made by the ATO); ATO Practice Statements (directions to ATO staff on how to apply the laws administered by the commissioner); Taxpayer Alerts; tax-related case law; ATO Policy Papers; Annual Regulatory Plans; and Freedom of Information request results.
Example Site:
Australian Tax Office
The Australian Tax Office makes available an archive of legal and policy information. View the site »
Years of Email. Seconds to Find.
Leidos's TeraText Searchable Archive for Files and Email (SAFE) is an enterprise-class search platform that enables government agencies and corporations to archive, store and search emails, files, and attachments in real time. Search results appear within seconds from a single application.
In addition to TeraText SAFE, the TeraText DBS is also used to support archiving requirements, managing collections of archival metadata, federating across multiple collections, and supporting the retrieval of electronic and physical archived objects.
Example Site: British Columbia Archives
An example of advanced search capabilities can be found at the British Columbia archives (Canada). Several search options, including proximity and fuzzy word, are supported.
Example Site:
British Columbia Archives
Experience advanced capabilities such as proximity and fuzzy word search options. Try the "Free Form Query." View the site »
Overview
You may wish to add annotations to your technical documentation to reflect field knowledge obtained when using the manuals to repair various problems. In these cases, the annotations may represent valuable intellectual property. Each client and customer may require that access to those documents be restricted to their own personnel. Thus, the document repository to be delivered to the clients will generally consist of a core of common content, with additional content that is proprietary to specific clients.
The TeraText DBS and Document Management System (DMS) product set is able to maintain and control both the authoring and delivery environments and, if necessary, use XML document transformations to map the authoring document structure to the delivery structure.
Documentation Management Model
One of the keys to successful electronic delivery of technical documentation is the ability to reuse content (i.e., deliver content in a number of different ways from a single source). Reusing content allows the same document and document components to be used over and over again while minimizing both storage and document maintenance. Reuse guarantees consistency: Every user sees the same, correct version of a document. Reuse means efficiency: A document is written once only. Reuse allows for refinement: A document can be developed over time. It also allows, for example, different customized views of the same source documentation to be delivered to different classes of users. Similarly, it allows the same source documentation to be delivered in multiple formats.
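Single-source reuse can be illustrated with a small sketch that renders one XML source in two delivery formats. The element names (`procedure`, `title`, `step`) and the two renderer functions are hypothetical choices for this example, not a TeraText document structure.

```python
import xml.etree.ElementTree as ET
from html import escape

# One hypothetical source document, written once.
SOURCE = (
    "<procedure><title>Replace filter</title>"
    "<step>Shut off power</step><step>Remove cover</step></procedure>"
)


def to_text(proc):
    """Plain-text rendition: upper-case title and numbered steps."""
    lines = [proc.findtext("title").upper()]
    lines += [f"{n}. {step.text}"
              for n, step in enumerate(proc.findall("step"), start=1)]
    return "\n".join(lines)


def to_html(proc):
    """HTML rendition of the same source."""
    items = "".join(f"<li>{escape(step.text)}</li>" for step in proc.findall("step"))
    return f"<h1>{escape(proc.findtext('title'))}</h1><ol>{items}</ol>"


procedure = ET.fromstring(SOURCE)
print(to_text(procedure))
print(to_html(procedure))
```

A correction made to the source appears in both renditions the next time they are generated, which is the consistency benefit described above.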
Documentation Components
Managing database content is more than just storing the raw text of documents and their accompanying figures. Documents can have internal structure, and there can be an external structure relating separate documents. Documents are often interlinked in a number of ways, and these links are essential parts of the document content. When searching for documents, users often scan indexes to browse the terms contained in the document repository. These terms constitute the vocabulary of the document collection. Sophisticated users may also require the frequency of each of these terms in the document collection when conducting searches to produce more effective queries. Documents can also have associated metadata that provide information about the document, such as author, status, or security level. Metadata can also be used to drive more productive searches.
Customized Delivery and Effectivity
It is essential that an electronic publishing system delivers the correct document content, links, and vocabulary to each class of users accessing the system. The need to provide an accurate snapshot of the database contents (i.e., text, figures, links and vocabulary and term frequencies) for each particular class of users is referred to as "effectivity." Efficient provision of effectivity requires very sophisticated text database support.
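One way to picture effectivity is as a per-class view in which the visible documents and the vocabulary statistics are both computed from the same filtered subset. The sketch below assumes each document carries a hypothetical `audience` set; the field names, client names, and sample content are invented for illustration.

```python
from collections import Counter


def effective_view(documents, user_class):
    """Return the documents visible to user_class, plus term frequencies
    computed over that visible subset only."""
    visible = {doc_id: doc for doc_id, doc in documents.items()
               if user_class in doc["audience"]}
    frequencies = Counter(word
                          for doc in visible.values()
                          for word in doc["text"].lower().split())
    return visible, frequencies


# Hypothetical repository: common core content plus a client-specific note.
documents = {
    "core-1": {"audience": {"airline_a", "airline_b"},
               "text": "replace hydraulic pump seal"},
    "note-7": {"audience": {"airline_a"},
               "text": "pump seal fails early in cold climates"},
}

visible_b, freq_b = effective_view(documents, "airline_b")
print(sorted(visible_b), freq_b["pump"])
```

Computing the vocabulary from the filtered subset matters: if term frequencies were taken from the whole repository, a client could infer the existence of proprietary content it is not allowed to see.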
Automatic Tables of Contents
Another requirement for technical documentation is the ability to dynamically produce a table of contents (TOC) for each document from the XML document structure and content. Technical documents are often long, so that when viewing a fragment of a document, it is important to understand the location of that fragment in the context of the whole document. This can be achieved by displaying the TOC along with a document fragment when the fragment is displayed. Because documents change over time, it is necessary to generate the TOC dynamically as the user views the document.
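Generating a TOC from document structure can be sketched with Python's standard XML library. The `manual`, `section`, and `title` element names are assumptions for this example; a real documentation schema would differ.

```python
import xml.etree.ElementTree as ET


def build_toc(element, depth=0):
    """Walk nested <section> elements and collect (depth, title) pairs."""
    entries = []
    for section in element.findall("section"):
        title = section.findtext("title", default="(untitled)")
        entries.append((depth, title))
        entries.extend(build_toc(section, depth + 1))
    return entries


# Hypothetical manual with nested sections.
manual = ET.fromstring("""
<manual>
  <section><title>Engine</title>
    <section><title>Fuel Pump</title></section>
    <section><title>Ignition</title></section>
  </section>
  <section><title>Landing Gear</title></section>
</manual>
""")

toc = build_toc(manual)
for depth, title in toc:
    print("  " * depth + title)
```

Because the TOC is derived from the current document tree on each request, it stays correct when sections are added, removed, or retitled.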
Dynamic Update
Technical documentation can involve very large document collections, which must be updated dynamically. This means that the delivery systems must provide a scalable solution, one that is able to update and deliver content efficiently for fast-growing document collections.
Key Points
Some of the key requirements for a technical document delivery system include:
- Repurpose content (e.g., support multiple delivery formats from a single source)
- Manage all components of documentation, including content, images, internal structure, links, vocabulary and metadata
- Support effectivity: deliver a database snapshot appropriate to each class of users
- Provide dynamic tables of contents from the XML document structure and content
- Update and deliver documents quickly and efficiently
- Provide powerful navigation, searching, and viewing
- Provide scalable solutions
XML, MARC, RDF and Dublin Core
TeraText DBS has been successfully deployed to build and manage bibliographic and metadata repositories. It provides a comprehensive platform for managing bibliographic and other metadata collections, with built-in support for Z39.50, Bib-1, and MAchine-Readable Cataloging (MARC) bibliographic records; XML management suited to Resource Description Framework (RDF) encodings of Dublin Core metadata or XML renditions of bibliographic information; libraries that support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH); and federated querying capabilities.
GILS and Z39.50
Deployments of such solutions, whether they use the TeraText Metadata Publishing System (MPS) or the TeraText DBS directly, rely on the TeraText content server to store and search bibliographic or metadata entries. These entries can be accessed via Z39.50 using Dublin Core, extended Dublin Core (such as Government Information Locator Service (GILS) metadata fields), or a custom metadata schema. Because the core technology supports Z39.50 and Dublin Core metadata in an extensible fashion, any TeraText solution can be easily designed to support the GILS protocols and requirements. MARC records and Bib-1 can also be supported directly. TeraText MPS provides a number of additional tools for collecting, updating and federating such collections.
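For readers unfamiliar with Dublin Core, the sketch below builds a minimal XML record using the standard Dublin Core element namespace. The `record` wrapper element, the `dublin_core_record` helper, and the sample field values are made up for illustration and do not reflect how TeraText stores entries.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> whose children are dc:* elements (hypothetical wrapper)."""
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Invented sample metadata.
record = dublin_core_record({
    "title": "Harbour Bridge under construction",
    "creator": "Unknown photographer",
    "date": "1930",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Extended schemas such as GILS add further fields alongside these core elements, which is why an extensible element set is a useful starting point.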
Example Site:
Picture Australia®
Search for people, places and events in the collections of libraries, museums, galleries, archives, universities and other cultural agencies, in Australia and abroad — all at the same time. View the site »