Dynamic distributed data caching and the P2P relationship


Rsync depends on changes being localized within the file. Files with small changes spread widely across them, such as search engine indices, do not update well using rsync, suggesting that something more flexible is needed.

Since the WAR is already Java-based, perhaps specifying Java classes, or pointers to Java classes, in the WAR for performing incremental WAR updates would provide a powerful mechanism for tailoring the update mechanism to the type of files contained in the archive.
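As a sketch of what such a hook might look like, the Java interface below names the pieces an incremental updater would need. The interface name, method signatures, and the idea of naming the implementing class inside the WAR are all hypothetical, not part of any existing standard:

    // Hypothetical hook (not part of any standard): a WAR could name a
    // class implementing this interface to control how caches apply
    // incremental updates to their cached copy.
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Path;

    public interface WarUpdater {
        // Apply an incremental update stream to a previously cached copy
        // of the WAR, producing the new version in place.
        void applyUpdate(Path cachedWar, InputStream updateStream)
                throws IOException;

        // True if this updater handles the given archive entry, e.g. a
        // search-engine index file that rsync-style deltas handle poorly.
        boolean supports(String entryName);
    }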

Perhaps many of these functions, like deciding the validity of a WAR, should be specified via Java classes, for maximum flexibility. Security and authentication are major concerns, especially in a cached environment. Some protocols exist to provide authentication services, yet each has outstanding issues; some, such as DNS key services, are not widely deployed.


The most widely deployed solution - X.509 certificates - has problems of its own. Web security can't be just for those who can and will shell out hundreds of dollars for certificates that keep expiring. In a heavily cached environment, it's easier than ever to spoof somebody's URLs, and X.509 alone does not prevent that. For more rapid response time, the HTTP Range: header can be used to retrieve only part of a packaged WAR. Of course, in addition to such a "partial retrieval", a cache could do a "full retrieval", obtaining the entire packaged WAR and beginning to share it with other caches.

The decision of how to choose between partial and full retrieval is left "for further study"; in other words, the user has to make those decisions manually until we figure it out better.
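A partial retrieval is just an ordinary HTTP request carrying a Range: header. A minimal Java sketch, assuming an illustrative WAR URL and an arbitrary 64 KB prefix:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RangeFetch {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            // Partial retrieval: ask for only the first 64 KB of the WAR,
            // e.g. enough to read its table of contents. The URL is
            // illustrative.
            HttpRequest partial = HttpRequest.newBuilder()
                    .uri(URI.create("http://example.org/site.war"))
                    .header("Range", "bytes=0-65535")
                    .build();
            HttpResponse<byte[]> r = client.send(
                    partial, HttpResponse.BodyHandlers.ofByteArray());
            // A 206 status confirms the server honored the Range header;
            // a 200 means we received the full WAR anyway.
            System.out.println(r.statusCode() + ": "
                    + r.body().length + " bytes");
        }
    }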

Napster has demonstrated that letting users make caching decisions manually is workable, so long as the cache items are reasonably sized. A major choice remains: the search protocol used to find the cached WARs. Mainstream caching research tends to largely ignore the most successful example of a cached network service - Napster and its various spinoffs, most notably Gnutella - which go by the buzzword peer-to-peer file sharing, or P2P.

For example, RFC 3040, "Internet Web Replication and Caching Taxonomy", a January 2001 document discussing "protocols, both open and proprietary, employed in web replication and caching today," never mentions the word "Napster". Peer-to-peer seems to be the way to go. The legal problems of Napster and the technical community's highly critical reception of Gnutella argue against either of these protocols. At present, LDAP seems the best choice, due to its maturity as a protocol, the widespread availability of both client and server implementations, and its straightforward application to the problem at hand.

The only serious issue surrounding LDAP is the lack of a standardized means for server location in a P2P environment - the critical issue swirling around Gnutella. I suggest dealing with both the security issues and the P2P server location issue through a simple solution: DNS. This allows site administrators to use resource records to specify both a set of LDAP servers to search for WARs, and cryptographic keys to verify the contents of those WARs once they are retrieved.
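As a sketch of how straightforward the LDAP application is, the following Java (JNDI) fragment searches a directory for caches advertising a WAR. The server address, search base, and the warURL attribute are illustrative assumptions, since this document does not define a schema:

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.NamingEnumeration;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;
    import javax.naming.directory.SearchControls;
    import javax.naming.directory.SearchResult;

    public class WarLookup {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "com.sun.jndi.ldap.LdapCtxFactory");
            // Server and search base are illustrative; in this proposal
            // they would come from the site's DNS records.
            env.put(Context.PROVIDER_URL, "ldap://ldap.example.org:389");
            DirContext ctx = new InitialDirContext(env);
            SearchControls sc = new SearchControls();
            sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
            // "warURL" is a hypothetical attribute naming caches that
            // hold a copy of the site's WAR.
            NamingEnumeration<SearchResult> hits =
                    ctx.search("dc=example,dc=org", "(warURL=*)", sc);
            while (hits.hasMore()) {
                System.out.println(hits.next().getAttributes());
            }
            ctx.close();
        }
    }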

Although this makes proper operation of a cached web site dependent on proper DNS operation, this should be a minor tradeoff, since proper web site operation already depends on DNS, and DNS has proven to be one of the most reliable of the Internet technologies. Thus, to enable dynamic web caching as outlined in this document, a web server administrator should add two kinds of additional resource records to the web server's DNS records.
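Both kinds of records can be fetched with an ordinary resolver. A minimal Java sketch using the JDK's JNDI DNS provider, with illustrative record names; the KEY lookup would need a fuller DNS library, since the JDK provider only handles common record types:

    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.directory.Attributes;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;

    public class DnsRecords {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<>();
            env.put(Context.INITIAL_CONTEXT_FACTORY,
                    "com.sun.jndi.dns.DnsContextFactory");
            DirContext dns = new InitialDirContext(env);
            // An SRV record locating the site's LDAP servers; the name is
            // illustrative. The KEY record carrying the WAR-signing public
            // key would be retrieved the same way, via a DNS library that
            // supports the KEY record type.
            Attributes srv = dns.getAttributes(
                    "_ldap._tcp.example.org", new String[] {"SRV"});
            System.out.println("LDAP servers: " + srv);
        }
    }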

These LDAP servers should form a replicated set, so that a response from any one of them should be considered a complete answer by a client. These servers may also allow arbitrary, unauthenticated web caches to add entries to the LDAP directory when they elect to cache one or more of a site's WARs. Since clients are expected to cryptographically verify a WAR upon retrieving it, allowing unauthenticated additions to an LDAP directory should not allow site spoofing, but a large number of bogus WAR entries could form the basis for a denial of service attack.

A benefit of this proposal is that site administrators can select sets of LDAP servers based on their own policies. At least one set of publicly updatable, replicated, highly available LDAP servers should exist for general use. Also, the web administrator will need to add at least one KEY record specifying a public key that clients must use to validate the integrity of a retrieved WAR.

So, for example, consider a site's "www" DNS entry. An alternative to retaining the original server's own address records is to list only the addresses of the web caches; any legacy client trying to retrieve a page from this web site would then be automatically directed to a web cache. It would be convenient to specify a CNAME for "www" pointing to a set of A records for the web caches, but of course this would preclude a unique KEY record for the server. A typical client request would follow these steps:

1. Client is configured to use a local web cache, or attempts a standard retrieval and gets A records for web caches.
2. Client sends request to a web cache.
3. Web cache retrieves the site's DNS records, including the LDAP server records and the KEY record.
4. Web cache queries one of the site's LDAP servers for caches holding the site's WAR.
5. Web cache retrieves the WAR, in whole or in part, from one of those caches or from the origin server.
6. Web cache validates that the WAR was signed using the private key corresponding to the public key retrieved in the DNS KEY record, and recurses to step 5 using a different remote cache if not.
7. If the cache elected to retrieve the entire WAR, it (subject to considerations like being behind a firewall) registers itself with one of the site's LDAP servers as possessing the WAR and being willing to serve it to other sites.
7a. The LDAP servers replicate this information among themselves.
8. Web cache runs the Java in the WAR to generate the dynamic web page and returns the result to the client.
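Step 6 is ordinary public-key signature verification. A minimal Java sketch, assuming a detached RSA/SHA-256 signature over the packaged WAR; this document does not fix the algorithms or encodings:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.KeyFactory;
    import java.security.PublicKey;
    import java.security.Signature;
    import java.security.spec.X509EncodedKeySpec;

    public class WarVerifier {
        // Check a detached signature over the packaged WAR against the
        // public key published in the site's DNS KEY record (step 6).
        // RSA/SHA-256 and DER key encoding are assumptions for the sketch.
        static boolean verify(Path war, byte[] derPublicKey, byte[] sig)
                throws Exception {
            PublicKey key = KeyFactory.getInstance("RSA")
                    .generatePublic(new X509EncodedKeySpec(derPublicKey));
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(key);
            verifier.update(Files.readAllBytes(war));
            return verifier.verify(sig);
        }
    }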

Several other options present themselves. Perhaps the LDAP directory should include entries for web caches willing to run the Java and serve the dynamic pages themselves, though this would present a security risk, since those caches might be untrusted by either client or server. Perhaps provision could be made for the server to issue X.509 certificates to caches it trusts. Perhaps the user should be prompted before embarking on a potentially time-consuming full retrieval. Finally, the functionality of a "locally trusted cache" should ultimately be rolled into the client itself, which should retrieve and verify the integrity of the WAR before running the Java itself.

In summary, I recommend the following steps:

1. Recognize the importance of data-oriented design, as opposed to connection-oriented design.
2. Break the dependence on special server configurations, and accept that the client has to do almost all the work in a scalable, cached, redundant web architecture.
3. Select standards for the network delivery of executable web content, and for packaging the contents of a web server into a single compressed archive.
4. Specify the security environment in which these "foreign" WARs will operate.
5. Extend Squid to support the algorithm outlined above; alternately, extend Apache Tomcat to function as a web cache, with similar features.

The caching scheme outlined above is far from perfect.

Static information comprises content that, when created, is expected to remain the same for an indeterminate amount of time.

For example, a restaurant menu is static: it is the same regardless of which user accesses it.


However, as chefs at the restaurant change over time, the menu may also change. In contrast, dynamic information comprises content that is expected and designed to change. The dynamic content may change based on the data and criteria used for generating the dynamic content, such as a search result page.

Often, the dynamic information will be expressed as a single HTML web page, but the information within the HTML web page has been generated dynamically based on some suitable criteria. For example, the result of a search using a search engine on the Internet returns different information based on the search terms provided by the user.


The search results may also depend on one or more attributes associated with the search request, such as geographic location or the current date. For another example, a user searching for information about current events will want results that are tailored to the user's search terms, the user's location, and the current date. For yet another example, an online book retailer may provide price information and consumer reviews of books available from the online book retailer.

The price information for a particular book may change unexpectedly in response to a sale or a sudden interest in that particular book. Also, the web page listing the book and the consumer reviews changes in response to a consumer entering a new review of the book.

Referring again to the figure: ISP 14 comprises a point of presence on network 16 for communicating data from clients 12 to remote locations. ISP 14 may also define the boundary of community 15. Community 15 comprises a plurality of clients 12 at whom content items retrieved by browsers 30 may be cached in cache portions 28. Community 15 represents a group of clients 12 which cooperate to form a distributed caching system using cache modules 26 and portions 28. Requests by browsers 30 within community 15 for content cached within community 15 need not be propagated over network 16, since the requested content is available within community 15. In the disclosed embodiment, network 16 comprises the Internet.

Community 18 represents an exemplary cache community based around a corporate intranet. The distributed caching capabilities of system 10 are not limited to home computers: the 10 megabit, 100 megabit, gigabit and faster LAN technologies used by corporations are well suited to the distributed cache of system 10. Other collections of computers may also form cache communities; communities 15 and 18 represent just two examples of possible cache communities.

Community 18 may comprise a corporate intranet having a communications interface 50, a LAN 52 and a plurality of intranet clients 54. Interface 50 comprises a communication interface between LAN 52 and network 16. For example, interface 50 may comprise a firewall, a router or another suitable communications interface. Interface 50 may also define the boundary of community 18. Intranet clients 54 are similar to ISP clients 12 except that clients 54 are members of an intranet.

Community 18 operates similarly to community 15, except as otherwise noted. Origin server 19 communicates data over network 16. Origin server 19 may comprise a single computer executing software or a plurality of computers each executing software.

In the disclosed embodiment, origin server 19 comprises an HTTP server, which may also be known as a web server. Origin server 19 may additionally support other protocols such as the file transfer protocol (FTP). Origin server 19 retrieves information from one or more data sources (not shown), such as a storage device coupled to server 19 or other origin servers, in response to requests 32. Origin server 19 is operable to retrieve static content, such as prewritten text files, images, and web pages, from the data source in response to requests 32. Origin server 19 is also operable to generate new, dynamic content, for example by dynamically creating web pages based on content stored in the data source in response to requests 32. For example, origin server 19 may generate a new web page using a common gateway interface (CGI) script, generate a new web page from the result of a structured query language (SQL) request, and perform other suitable content generation functions.
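As a minimal illustration of such dynamic generation, the following Java fragment builds a page per request from the query string, much as a CGI script or SQL-backed page would. The port and path are illustrative; this is a sketch, not part of the disclosed embodiment:

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;

    public class MiniOriginServer {
        public static void main(String[] args) throws Exception {
            HttpServer server =
                    HttpServer.create(new InetSocketAddress(8080), 0);
            // Dynamic content in miniature: the page is generated per
            // request from the query string rather than read from disk.
            server.createContext("/search", exchange -> {
                String query = exchange.getRequestURI().getQuery();
                byte[] page = ("<html><body>Results for: " + query
                        + "</body></html>").getBytes();
                exchange.getResponseHeaders().set("Content-Type", "text/html");
                exchange.sendResponseHeaders(200, page.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(page);
                }
            });
            server.start();
        }
    }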

Origin server 19 may also be operable to generate executable software, such as applications and applets, in response to requests for data. For example, origin server 19 may generate a Java applet in response to an appropriate request 32. In operation, browser 30 generates request 32 for content.

Operation of system 10 will be described with respect to cache community 15; however, it should be noted that cache community 18 operates similarly using clients 54. Cache module 26 intercepts request 32 before request 32 is communicated to network 16. Cache module 26 examines request 32 to determine whether the requested content is available in community 15. If the requested content is available in community 15, cache module 26 retrieves the requested content from the appropriate storage portion 28 within community 15 and returns the requested information to the browser 30 which requested it.

If the requested content is not available within community 15, then cache module 26 forwards request 32 over link 13 to ISP 14 for normal handling.

Similarly, a request 32 generated by a browser on a client 54 is intercepted by cache module 26 to determine whether the requested content is available within community 18. Cache module 26 may be configured to control the amount of processor power, storage space and bandwidth of a particular client 12 used by community 15. This client-by-client control of usage allows for individual tailoring of community 15 to particular clients 12. It also allows for different incentive plans for subscribers of ISP 14. For example, a subscriber to ISP 14 may have a second computer separate from the computer normally used by the subscriber.

The subscriber with two computers could dedicate a large percentage of processor 20 and storage 24 to community 15 in exchange for ISP 14 providing, at no charge, a second IP address for the second computer over a DSL-type link 13. Community 18, representing a corporate intranet, may instead allow for centralized control of the percentage of the processing power, storage and bandwidth used by community 18, such as by a corporate information technology (IT) department.

In one embodiment, cache module 26 may cache content using a conservative mode or an aggressive mode. When in the conservative mode, cache module 26 caches content received by browser 30 which is marked as cacheable.

When in the aggressive mode, cache module 26 caches all content unless the content has been explicitly marked as non-cacheable. In general, by caching all content, unless the content is listed as non-cacheable, more content may be cached in comparison to conservative mode caching.
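The two modes may be summarized as a predicate over response headers. A simplified Java sketch, using common HTTP header conventions as an assumption, since the embodiment does not specify how content is marked cacheable:

    import java.util.Map;

    public class CachePolicy {
        enum Mode { CONSERVATIVE, AGGRESSIVE }

        // Decide cacheability from response headers. This is a simplified
        // sketch; a real cache module would parse Cache-Control fully.
        static boolean isCacheable(Mode mode, Map<String, String> headers) {
            String cc = headers.getOrDefault("Cache-Control", "");
            if (mode == Mode.CONSERVATIVE) {
                // Cache only content explicitly marked as cacheable.
                return cc.contains("public")
                        || headers.containsKey("Expires");
            }
            // Aggressive: cache everything unless explicitly marked
            // non-cacheable.
            return !(cc.contains("no-store") || cc.contains("no-cache")
                    || cc.contains("private"));
        }
    }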

Cache modules 26 using aggressive mode caching may further communicate with a data center. More specifically, cache module 26 may communicate with the data center to inform the data center of data cached by cache module 26. Aggressive mode caching may use a content expiration protocol to avoid serving expired, but cached, content. The content expiration protocol may use data expiration commands to inform cache modules 26 that data at an origin server 19 has changed. Alternatively, a single cache module within a community 15, such as the master node discussed below, may communicate with the data center.

By informing the data center of data cached within community 15, the data center can send data expiration commands to community 15 so that cache modules 26 can mark cached content as expired. The data expiration command comprises any suitable message for expiring data stored by cache module 26, such as an ICSP message. The ICSP message may expire any of: a single web page, a plurality of web pages at a single web site, a plurality of web pages at a plurality of web sites, a plurality of sites within a single domain, and one or more specific objects on a web page, such as an image.

For example, the ICSP message may expire a single web page, a plurality of web pages at a single web site, a plurality of pages at a plurality of web sites, or a plurality of web sites within a domain. For another example, a single active server page (ASP) may result in many individual cached pages, because a single ASP page can dynamically create multiple different specific pages. In general, the data center may generate the data expiration command in response to a change in the content at origin server 19. The data center may also generate the data expiration command in response to the elapsing of a predetermined time period.
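A data expiration command can therefore be modeled as a pattern matched against cached URLs. A minimal Java sketch, in which the trailing-wildcard grammar is an assumption, since the ICSP syntax is not given here:

    public class ExpirationMatcher {
        // Match a cached URL against a data expiration command. The
        // pattern grammar (trailing "*" as a prefix wildcard) is an
        // assumption; the text does not specify ICSP's actual syntax.
        static boolean expires(String pattern, String cachedUrl) {
            if (pattern.endsWith("*")) {
                // e.g. a pattern covering a whole site or subtree expires
                // the many pages a single ASP page may have generated.
                return cachedUrl.startsWith(
                        pattern.substring(0, pattern.length() - 1));
            }
            // Otherwise expire a single web page or object.
            return cachedUrl.equals(pattern);
        }
    }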

ICSP supports the synchronization of cached content in community 15 with updated content available at origin server 19. In addition, cache module 26 may provide a guaranteed click delivery capability.

The guaranteed click delivery capability comprises the capability to regularly check whether a particular web page is available and to retrieve the web page when the web page becomes available. For example, a user of client 12 may attempt to retrieve a particular web page. The server providing that web page may be currently overloaded and unable to provide the requested web page.

For example, a busy server may comprise a server which is currently processing substantially all the requests 32 which the server is capable of handling. For another example, a busy server may comprise a server which is providing content and using substantially all of the bandwidth available to the server.

In general, a busy server may comprise a server which is incapable of processing more requests 32 at a given time for one or more reasons. Cache module 26 may then display the retrieved web page in browser 30 or may abandon the attempt to retrieve the web page after a predetermined period of time has elapsed without successfully retrieving the requested web page.

Cache module 26 may also ask the user whether the user wants cache module 26 to attempt to retrieve the requested web page from the busy server. Typically, cache module 26 would attempt to retrieve the requested web page from the busy server while the user retrieves and views other web pages from other origin servers 19. Stated another way, cache module 26 would attempt to retrieve the requested web page in the background while the user may also be performing other tasks.
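A sketch of such background retrieval, using a scheduled retry loop; the retry interval, deadline handling, and helper methods are illustrative stand-ins:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class GuaranteedClick {
        // Poll a busy server in the background until the page is
        // retrieved or a deadline passes. Interval and deadline are
        // illustrative.
        static void retrieveLater(String url, long deadlineMillis) {
            ScheduledExecutorService timer =
                    Executors.newSingleThreadScheduledExecutor();
            long start = System.currentTimeMillis();
            timer.scheduleAtFixedRate(() -> {
                if (System.currentTimeMillis() - start > deadlineMillis) {
                    timer.shutdown();        // abandon the attempt
                    return;
                }
                byte[] page = tryFetch(url); // null while server is busy
                if (page != null) {
                    display(page);           // hand the page to the browser
                    timer.shutdown();
                }
            }, 0, 30, TimeUnit.SECONDS);
        }

        static byte[] tryFetch(String url) { return null; } // stub
        static void display(byte[] page) {}                 // stub
    }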

Yet another capability of cache module 26 is the ability to provide a screen saver to a user associated with client 12. The screen saver displays a graphical representation of the user's response time to one or more origin servers 19. For example, the response time between client 12 and a particular web site may be displayed in a graphical manner. More specifically, the screen saver displays a solar-system-like graph with client 12 generally in the center and the distance between client 12 and other web sites displayed based on the round-trip ping time between client 12 and the other web sites.

In the exemplary embodiment, the community comprises a first client, a second client, a third client and an ISP. The clients communicate with the ISP over respective communication links. Each client comprises a browser, storage and a cache module. These browsers represent distinct examples of browsers 30 described above. Each client's storage supports a cache portion, and the storage elements represent distinct examples of storage 24 described above.

The cache portions represent distinct examples of cache portions 28 described above. The cache modules support respective location tables, and each cache module is operable to generate a respective cache status message. The cache modules represent distinct examples of cache modules 26 described above.

The location tables represent distinct examples of location table 29 described above, and the cache status messages represent distinct examples of cache status message 27 described above. Each location table comprises one or more indications of which client is to cache content in response to requests 32 from the browsers. For example, a location table may indicate that content identified by URLs having a domain name beginning with A-D is cached at the first client, while domain names E-H are cached at the second client and domain names I-Z are cached at the third client. For another example, the location tables may indicate particular ranges of IP addresses to be cached at particular clients. In general, the tables may use any suitable indication for determining at which client content is cached, such as IP addresses, domain names, portions of URLs or a hash value based on request 32. Each cache status message comprises a message generated by a cache module to indicate to the other modules that the generating module is activating or deactivating its caching functionality.
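A location table may thus be reduced to a function from a requested URL to the community member that caches it. A minimal Java sketch using a hash over the host name, one of the suitable indications mentioned above; the client names are illustrative:

    public class LocationTable {
        // Community members that share the cache; names are illustrative.
        private final String[] clients = {"client-a", "client-b", "client-c"};

        // Map a requested URL to the member responsible for caching it.
        // A simple hash over the host name stands in for the domain-range
        // or IP-range schemes described above.
        String cacheOwner(java.net.URI uri) {
            String host = uri.getHost();
            int slot = Math.floorMod(host.hashCode(), clients.length);
            return clients[slot];
        }
    }

For example, cacheOwner(URI.create("http://www.example.org/")) always names the same member, so every client in the community looks for that site's content in the same place.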

For example, when the cache module at a client is activated, it may generate a cache status message indicating caching is active at that client. Each communications link comprises any suitable data communications system. In operation, in one embodiment, the community may be formed by dynamically seeking out other active instances of cache module 26. Then, based on a set of performance heuristics, clients 12 are bonded together under favorable conditions. Cache module 26 may use dynamic affiliation algorithms to build and manage communities. More specifically, on startup, cache module 26 may communicate with a remote directory provider for assistance in finding other cache modules 26 with which to form a community. Using the assistance from the remote directory provider, the client may attempt to contact and possibly join a currently existing community. If no communities are found, or found communities do not admit cache module 26, then cache module 26 may attempt to start its own cache community.

Alternatively, if no remote directory is available, cache module 26 searches for communities itself. Each community includes a master node and, optionally, one or more normal nodes. A master node comprises a cache module 26 on a particular client 12 which is responsible for supervising the addition and departure of clients from the community. The master node receives data associated with the addition of a client 12 to the community and the departure of a client 12 from the community, and communicates the data to the other members of the community. Any cache module 26 may function as the master node.

Any suitable method for electing the master node may be used by cache modules 26. For example, the cache module 26 which has been active the longest may be selected as the master, with ties resolved randomly.
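A sketch of such an election in Java, assuming each node advertises its activation time and a random tie-breaker chosen at startup; how ties are "resolved randomly" is not specified in the text:

    import java.util.Comparator;
    import java.util.List;

    public class MasterElection {
        // A community member as seen by the election: when it activated,
        // plus a random tie-breaker it chose at startup (an assumption).
        record Node(String id, long activeSinceMillis, double tieBreaker) {}

        // Elect the node activated the longest ago (smallest activation
        // time) as master, resolving ties with the random tie-breaker.
        static Node elect(List<Node> members) {
            return members.stream()
                    .min(Comparator
                            .comparingLong(Node::activeSinceMillis)
                            .thenComparingDouble(Node::tieBreaker))
                    .orElseThrow();
        }
    }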

The departure of the master node causes the election of a new master node by the remaining members of the community. The community handles both graceful and non-graceful departures of clients 12. A graceful departure comprises an intentional departure of a client 12 from the community; for example, a graceful departure may occur when a user deactivates cache module 26. A non-graceful departure comprises an unexpected departure of a client 12 from the community; for example, a non-graceful departure may occur when a client 12 suddenly crashes and ceases operation.

When an active cache module 26 shuts down, for example, the cache module 26 requests to leave the community, and the request circulates through the remaining community members. The remaining community members then discontinue forwarding requests to that client. In a non-graceful scenario, a managing peer, known as a master, watches for dead peers and notifies the rest of the community if this condition is detected.
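Dead-peer detection might be built on heartbeats. A minimal Java sketch, in which the heartbeat mechanism and timeout are assumptions, since the embodiment does not specify how the master watches for dead peers:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class PeerWatchdog {
        // Last heartbeat time per peer; peers are assumed to report in
        // periodically (an assumption, not specified in the text).
        private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
        private static final long TIMEOUT_MILLIS = 30_000; // illustrative

        void heartbeat(String peerId) {
            lastSeen.put(peerId, System.currentTimeMillis());
        }

        // Called by the master: peers presumed dead, so the rest of the
        // community can be notified and requests rerouted.
        List<String> deadPeers() {
            long now = System.currentTimeMillis();
            return lastSeen.entrySet().stream()
                    .filter(e -> now - e.getValue() > TIMEOUT_MILLIS)
                    .map(Map.Entry::getKey)
                    .toList();
        }
    }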

Similarly, the managing peer may depart gracefully or non-gracefully.


A graceful departure of the managing peer comprises the managing peer informing the community that it is leaving. An election is then held by the remaining members of the community to select the new managing peer. When a non-graceful departure occurs, such as when the managing peer crashes, a cache module 26 may detect that the managing peer is not responding and call an election to create a new managing peer.

In general, any suitable techniques may be used to handle the arrival and departure of cache modules 26 from the community, and to create and maintain the managing peer. For increased clarity, the operation of the exemplary cache community is now described step by step. The method begins when a browser generates a request 32 for content. Next, the local cache module intercepts request 32 generated by the browser and determines the URL associated with request 32. The cache module then determines the location where the content associated with that URL would be cached.

More specifically, the cache module determines which of the storage portions would store the requested content, based on information in its location table.

Next, the cache module checks that storage portion for the requested content and determines whether the requested content has been cached. More specifically, the cache module queries the cache module at the caching client to determine whether the content associated with the URL in request 32 has been cached in its portion. If that cache module replies that the requested content is cached, the requested content is retrieved from the storage portion and displayed at the browser. If the requested content is not cached in the portion, indicating that the requested content is not available within the community, the requested content is retrieved from origin server 19. The requested content is then displayed on the browser, a copy of the requested content is communicated to the cache module responsible for that URL, and the retrieved content is stored in its storage portion.
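Condensing the method above into code, a Java sketch of the lookup flow; all helper methods are stand-ins for the components described in the text:

    public class CommunityCache {
        // The lookup flow, condensed: intercept the request, find the
        // responsible member, serve from the community if possible,
        // otherwise fetch from the origin and populate the cache.
        byte[] handle(String url) {
            String owner = locationTable(url);      // locate cache portion
            byte[] content = queryPeer(owner, url); // cached in community?
            if (content == null) {                  // not cached anywhere
                content = fetchFromOrigin(url);     // go to origin server 19
                storeAtPeer(owner, url, content);   // populate cache portion
            }
            return content;                         // displayed at browser
        }

        // Stubs standing in for the location table, peer cache modules,
        // and origin retrieval described above.
        String locationTable(String url) { return "client-a"; }
        byte[] queryPeer(String owner, String url) { return null; }
        byte[] fetchFromOrigin(String url) { return new byte[0]; }
        void storeAtPeer(String owner, String url, byte[] c) {}
    }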