Cas File
Content-addressable storage, also referred to as content-addressed storage and abbreviated CAS, is a way to store information so that it can be retrieved based on its content, not its location. It has been used for high-speed storage and retrieval of fixed content, such as documents stored for compliance with government regulations. Content-addressable storage is similar in concept to content-addressable memory.
Apereo CAS uses the SLF4J logging framework as a facade for the Log4j engine by default. The default Log4j configuration file is located in src/main/resources/log4j2.xml. By default, logging is set to INFO for all functionality related to org.apereo.cas code. For debugging and diagnostic purposes, you may want to set these levels to DEBUG.
Content-addressable storage (CAS) can be pictured as a coat check. You walk to the coat-check desk and hand over your coat. The attendant weighs your coat, checks the brand, the size, the number of buttons and zippers, types it all in, and the computer spits out a 'hash code' from 1 to 99999. An empty hanger is found, and the hash code is associated with the hanger number.
What is a CAS file? We know of 10 different uses of the CAS file type, one of them being the Atari cassette tape image. The CAS file extension indicates to your device which app can open the file. However, different apps may use the same file extension for different types of data.
CAS and FCS
Content Addressable Storage (CAS) and Fixed Content Storage (FCS) are two different acronyms for the same type of technology. Both are intended to store data that does not change over fixed periods of time. CAS typically identifies a document in the storage system by the digest that a cryptographic hash function generates from the document. If the hash function is weak, meaning that different inputs can produce the same digest, two different documents could end up identified by the same digest. This is a potential weakness of relying on hashes to differentiate data. It becomes a concern when working with very large data stores, or when data put into such a system could be crafted maliciously to exploit the weakness.
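To make the collision risk concrete, the following minimal Python sketch compares a deliberately weak toy checksum with SHA-256. The function names and sample documents are hypothetical and chosen only for illustration; no particular CAS product is known to use this checksum.

```python
import hashlib


def weak_digest(data: bytes) -> int:
    """Toy 16-bit checksum: sum of the bytes modulo 65536 (deliberately weak)."""
    return sum(data) % 65536


def strong_digest(data: bytes) -> str:
    """SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Two different documents made of the same bytes in a different order.
doc_a = b"pay 12 dollars"
doc_b = b"pay 21 dollars"

# The byte sum ignores ordering, so the weak digest cannot tell them apart.
weak_collision = weak_digest(doc_a) == weak_digest(doc_b)
strong_collision = strong_digest(doc_a) == strong_digest(doc_b)

print("weak digests collide:", weak_collision)
print("SHA-256 digests collide:", strong_collision)
```

A store keyed by the weak digest would treat both documents as the same item, which is exactly the situation described above.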
Content-addressed vs. location-addressed
When being contrasted with content-addressed storage, a typical local or networked storage device is referred to as location-addressed. In a location-addressed storage device, each element of data is stored onto the physical medium, and its location is recorded for later use. The storage device often keeps a list, or directory, of these locations. When a future request is made for a particular item, the request includes only the location (for example, path and file names) of the data. The storage device can then use this information to locate the data on the physical medium, and retrieve it. When new information is written into a location-addressed device, it is simply stored in some available free space, without regard to its content. The information at a given location can usually be altered or completely overwritten without any special action on the part of the storage device.
Within the scope of this discussion, a good way to think of the above is as container-addressed storage.
In contrast, when information is stored into a CAS system, the system will record a content address, which is an identifier uniquely and permanently linked to the information content itself. A request to retrieve information from a CAS system must provide the content identifier, from which the system can determine the physical location of the data and retrieve it. Because the identifiers are based on content, any change to a data element will necessarily change its content address. In nearly all cases, a CAS device will not permit editing information once it has been stored. Whether it can be deleted is often controlled by a policy.
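The contrast between the two addressing models can be sketched with two Python dictionaries. This is an illustrative toy, not a real storage device: the paths, the `put` helper, and the choice of SHA-256 are all assumptions.

```python
import hashlib

# Location-addressed: the caller picks the key (a path), and the content
# at that location can be silently overwritten.
by_location = {}
by_location["/docs/report.txt"] = b"draft"
by_location["/docs/report.txt"] = b"final"  # same location, new content

# Content-addressed: the key is derived from the bytes themselves,
# so different content can never share an address.
by_content = {}


def put(data: bytes) -> str:
    """Store data under its SHA-256 digest and return that content address."""
    address = hashlib.sha256(data).hexdigest()
    by_content[address] = data
    return address


draft_addr = put(b"draft")
final_addr = put(b"final")

print(len(by_location), "item(s) by location,", len(by_content), "item(s) by content")
```

The location-addressed store ends up holding only the latest version, while the content-addressed store keeps both versions under distinct addresses.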
CAS History
A hardware device called the Content Addressable File Store (CAFS) was developed by ICL in the UK in the 1960s,[1][3] and British Telecom was one of the first customers.[2] Available in the 1970s and 1980s, it provided location-addressed disk storage with built-in search capability. The search logic was incorporated into the disk controller, such that a query expressed in a high-level query language could be compiled into a search specification that was then sent to the disk controller for execution.
While the idea of content-addressed storage is not new, production-quality systems were not readily available until roughly 2003.[4] In mid-2004, the industry group SNIA began working with a number of CAS providers to create standard behavior and interoperability guidelines for CAS systems.[5]
CAS Efficiency
CAS storage works most efficiently on data that does not change often. It is of particular interest to large organizations that must comply with document-retention laws, such as Sarbanes-Oxley. In these corporations, a large volume of documents will be stored for as much as a decade, with no changes and infrequent access. CAS is designed to make the searching for a given document content very quick, and provides an assurance that the retrieved document is identical to the one originally stored. (If the documents were different, their content addresses would differ.) In addition, since data is stored into a CAS system by what it contains, there is never a situation where more than one copy of an identical document exists in storage. By definition, two identical documents have the same content address, and so point to the same storage location.
For data that changes frequently, CAS is not as efficient as location-based addressing. In these cases, the CAS device would need to continually recompute the address of data as it was changed. The client systems would be forced to continually update information regarding where a given document exists. For random access systems, a CAS would also need to handle the possibility of two initially identical documents diverging, requiring a copy of one document to be created on demand.
Typical implementation
Paul Carpentier and Jan van Riel coined the term CAS while working at a company called FilePool in the late 1990s. FilePool was acquired in 2001 and became the underpinnings of the first commercially available CAS system, which was introduced as EMC's Centera platform.[6] The Centera CAS system consists of a series of networked nodes (1-U servers running Linux), divided between storage nodes and access nodes. The access nodes maintain a synchronized directory of content addresses, and the corresponding storage node where each address can be found. When a new data element, or blob (Binary large object), is added, the device calculates a hash of the content and returns this hash as the blob's content address.[7] As mentioned above, the hash is searched to verify that identical content is not already present. If the content already exists, the device does not need to perform any additional steps; the content address already points to the proper content. Otherwise, the data is passed off to a storage node and written to the physical media.
When a content address is provided to the device, it first queries the directory for the physical location of the specified content address. The information is then retrieved from a storage node, and the actual hash of the data recomputed and verified. Once this is complete, the device can supply the requested data to the client. Within the Centera system, each content address actually represents a number of distinct data blobs, as well as optional metadata. Whenever a client adds an additional blob to an existing content block, the system recomputes the content address.
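The write and read paths described above can be sketched in a few lines of Python. This is a simplified in-memory model, not Centera's implementation: the `TinyCAS` class, the `IntegrityError` exception, the single dictionary standing in for the directory and storage nodes, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their content address."""


class TinyCAS:
    def __init__(self):
        self._blobs = {}  # content address -> bytes (directory and storage in one)
        self.writes = 0   # number of physical writes actually performed

    def put(self, data: bytes) -> str:
        """Hash the blob, write it only if it is new, and return its address."""
        address = hashlib.sha256(data).hexdigest()
        if address not in self._blobs:  # identical content is stored only once
            self._blobs[address] = data
            self.writes += 1
        return address

    def get(self, address: str) -> bytes:
        """Look up the blob, recompute its hash, and verify it before returning."""
        data = self._blobs[address]  # raises KeyError for an unknown address
        if hashlib.sha256(data).hexdigest() != address:
            raise IntegrityError(address)
        return data


store = TinyCAS()
first = store.put(b"invoice 2003-117")
second = store.put(b"invoice 2003-117")  # duplicate: no second write happens

print(first == second, store.writes)
```

Storing the same bytes twice yields the same address and a single write, and `get` refuses to return data whose recomputed hash no longer matches the address it was requested under.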
To provide additional data security, the Centera access nodes, when no read or write operation is in progress, constantly communicate with the storage nodes, checking the presence of at least two copies of each blob as well as their integrity. Additionally, they can be configured to exchange data with a different, e.g., off-site, Centera system, thereby strengthening the precautions against accidental data loss.
IBM has another flavor of CAS which can be software-based, Tivoli Storage Manager 5.3, or hardware-based, the IBM DR550. The architecture is different in that it is based on a hierarchical storage management (HSM) design, which provides some additional flexibility, such as supporting not only WORM disk but also WORM tape, and the migration of data from WORM disk to WORM tape and vice versa. This provides additional flexibility in disaster recovery situations as well as the ability to reduce storage costs by moving data off disk to tape.
Another typical implementation is iCAS from iTernity. The concept of iCAS is based on containers. Each container is addressed by its hash value. A container holds different numbers of fixed content documents. The container is not changeable, and the hash value is fixed after the write process.
Open-source implementations
One of the first content-addressed storage servers, Venti,[8] was originally developed for Plan 9 from Bell Labs and is now also available for Unix-like systems as part of Plan 9 from User Space.
The first step towards an open-source CAS+ implementation is Twisted Storage.[9]
Tahoe Least-Authority File Store is an open source implementation of CAS.
Git is a userspace CAS filesystem. Git is primarily used as a source code control system.
git-annex is a distributed file synchronization system that uses content-addressable storage for files it manages. It relies on Git and symbolic links to index their filesystem location.
Project Honeycomb is an open-source API for CAS systems.[10]
The XAM interface was developed under the auspices of the Storage Networking Industry Association. It provides a standard interface for archiving CAS (and CAS-like) products and projects.[11]
Perkeep is a recent project to bring the advantages of content-addressable storage 'to the masses'. It is intended to be used for a wide variety of use cases, including distributed backup, a snapshotted-by-default, version-controlled filesystem, and decentralized, permission-controlled filesharing.
Irmin is an OCaml 'library for persistent stores with built-in snapshot, branching and reverting mechanisms'; it follows the same design principles as Git.
Cassette is an open-source CAS implementation for C#/.NET.[12]
Arvados Keep is an open-source content-addressable distributed storage system.[13] It is designed for large-scale, computationally intensive data science work such as storing and processing genomic data.
Infinit is a content-addressable and decentralized (peer-to-peer) storage platform that was acquired by Docker Inc.
InterPlanetary File System (IPFS) is a content-addressable, peer-to-peer hypermedia distribution protocol.
casync is a Linux software utility by Lennart Poettering to distribute frequently-updated file system images over the Internet.[14]
See also
- Content-centric networking / Named data networking
References
- Wikipedia, 'Content Addressable File Store'.
- Wikipedia, 'Content Addressable File Store'.
- Wikipedia, 'Content Addressable File Store'.
- USENIX Annual Technical Conference 2003, General Track, abstract.
- CAS industry standardization activities, XAM: http://www.snia.org/forums/xam
- 'Content-addressable storage - Storage as I See it', by Mark Ferelli, October 2002, BNET.com.
- 'Making a hash of file content: Content-addressable storage uses hash algorithms', by Chris Mellor, 9 December 2003, Techworld. Archived 28 September 2007 at the Wayback Machine. Article moved to https://www.techworld.com/data/making-a-hash-of-file-content-235/
- 'Venti: a new approach to archival storage'. doc.cat-v.org. Retrieved 30 June 2019.
- 'Twisted Storage'. twistedstorage.sourceforge.net. Retrieved 30 June 2019.
- 'Archived copy'. Archived from the original on 12 October 2007. Retrieved 1 October 2007.
- 'The XAM (eXtensible Access Method) Interface specification'.
- 'A simple content-addressable storage system for .NET 4.5 and .NET Core: point-platform/cassette'. Point Platform, 6 May 2019. Retrieved 30 June 2019.
- 'Keep - Arvados'. dev.arvados.org. Retrieved 30 June 2019.
- 'Lennart Poettering Announces New Project: casync - Phoronix'. Phoronix.
JSON Service Registry
This registry reads service definitions from JSON configuration files at application context initialization time. JSON files are expected to be found inside a configured directory location, and this registry will recursively look through the directory structure to find relevant JSON files.
Support is enabled by adding the following module into the overlay:
A sample JSON file follows:
To see the relevant list of CAS properties, please review this guide.
Clustering Services: You MUST consider that if your CAS server deployment is clustered, each CAS node in the cluster must have access to the same set of JSON configuration files as the others, or you may have to devise a strategy to keep changes synchronized from one node to the next.
The JSON service registry is also able to auto-detect changes to the specified directory. It will monitor the directory to recognize file additions, removals and updates, and will auto-refresh CAS so that changes take effect immediately.
Escaping Characters: Please make sure all field values in the JSON blob are correctly escaped, especially the service id. If the service is defined as a regular expression, certain regex constructs such as '\.' and '\d' need to be doubly escaped, that is, written as '\\.' and '\\d'.
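The double escaping can be checked with a short Python sketch that serializes a regular expression into JSON and reads it back. The URL pattern is a made-up example, and only the serviceId field name is taken from CAS service definitions; everything else is an assumption for illustration.

```python
import json
import re

# A hypothetical service id pattern, written as a plain regular expression.
pattern = r"^https://www\.example\.org/app\d+$"

# JSON encoding doubles every backslash, which is the form the file must contain.
encoded = json.dumps({"serviceId": pattern})
print(encoded)

# Decoding restores the original single-backslash regular expression.
decoded = json.loads(encoded)["serviceId"]
matches = re.match(decoded, "https://www.example.org/app42") is not None
print(decoded, matches)
```

Writing the pattern into the file with single backslashes instead would give the invalid JSON escape sequences '\.' and '\d', which a strict JSON parser rejects.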
The naming convention for new JSON files is recommended to be the following:
Based on the above formula, for example, the above JSON snippet shall be named testJsonFile-103935657744185.json. Remember that because files are created based on the serviceName, you will need to make sure characters considered invalid for file names are not used as part of the name. Furthermore, note that CAS MUST be given full read/write permissions on the directory that contains service definition files.
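As a rough illustration, the helper below builds a file name in the shape suggested by the example name above, that is, the service name, a hyphen, the numeric id, and a .json suffix. That shape is inferred from the example rather than quoted from a formula, and both the `service_file_name` function and the set of rejected characters are hypothetical choices, not CAS behavior.

```python
import re

# Characters commonly rejected in file names; this exact set is an assumption.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def service_file_name(service_name: str, service_id: int) -> str:
    """Return '<service_name>-<service_id>.json', rejecting unsafe names."""
    if INVALID_CHARS.search(service_name):
        raise ValueError(f"service name not usable as a file name: {service_name!r}")
    return f"{service_name}-{service_id}.json"


print(service_file_name("testJsonFile", 103935657744185))
```

Validating the name before writing the file avoids creating a definition that the operating system cannot store or that lands in an unintended directory.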
As you add more files to the directory, you need to be absolutely sure that no two service definitions will have the same id. If this happens, loading one definition will stop loading the other. While service ids can be chosen arbitrarily, make sure all service numeric identifiers are unique. CAS will also output warnings if duplicate data is found.
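A duplicate-id check can also be run outside CAS before deploying new files. The sketch below is a hypothetical helper, not part of CAS: it assumes each file is strict JSON with a top-level numeric id field, so files that use comments or other relaxed syntax would need a different parser.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_ids(directory):
    """Map each numeric id used by more than one JSON file to those file names."""
    seen = defaultdict(list)
    # rglob walks subdirectories too, mirroring the recursive lookup described earlier.
    for path in sorted(Path(directory).rglob("*.json")):
        definition = json.loads(path.read_text(encoding="utf-8"))
        seen[definition["id"]].append(path.name)
    return {sid: names for sid, names in seen.items() if len(names) > 1}


# Demonstration with throwaway files: two definitions share id 1.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "appA-1.json").write_text(json.dumps({"id": 1, "name": "appA"}))
    (root / "nested").mkdir()
    (root / "nested" / "appB-1.json").write_text(json.dumps({"id": 1, "name": "appB"}))
    (root / "appC-2.json").write_text(json.dumps({"id": 2, "name": "appC"}))
    duplicates = find_duplicate_ids(root)

print(duplicates)
```

Running such a check as part of a deployment step catches clashes before one definition silently prevents another from loading.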
JSON Syntax
CAS uses a version of the JSON syntax that provides a much more relaxed syntax with the ability to specify comments.
A given JSON file for instance could be formatted as such in CAS:
Note the trailing comma at the end. See the above link for more info on the alternative syntax.
Legacy Syntax
A number of legacy service definitions, supported by CAS automatically, are listed below.
CAS Add-ons
Originally developed as an extension for CAS 3.5.x, this add-on provided JSON syntax support in the form of a single file that contained all service definitions. An example legacy JSON file is listed below for reference:
CAS is able to transform this definition into one that is officially supported. The results of transformations are written into a temporary file where the user is warned about the presence of this legacy behavior and the location of the transformed files. Changes should be reviewed and ultimately put into use in the relevant directory location to be loaded by the registry.
To activate support for this legacy syntax, the services registry file needs to be renamed servicesRegistry.json and must be placed in the same directory as all other JSON service definition files.
A few things to note:
- The extraAttributes property is ignored and may not be transformed.
- Service identifier patterns in the legacy syntax may be specified as Ant patterns. These patterns are automatically massaged by CAS in small ways during transformations to ensure they are turned into a valid regular expression as much as possible. You should of course review the results and make any manual modifications necessary to make the pattern functional.
Jasig Namespace
CAS should automatically remain backward compatible with service definitions that were created by a CAS 4.2.x instance. Warnings should show up in the logs when such deprecated service definitions are found. Deployers are advised to review each definition and consult the docs to apply the new syntax.
An example legacy JSON file is listed below for reference:
Replication
If CAS is to be deployed in a cluster, the service definition files must be kept in sync across all CAS nodes. Please review this guide to learn more about available options.
Auto Initialization
Upon startup and configuration permitting, the registry is able to auto initialize itself from default JSON service definitions available to CAS. See this guide for more info.