Author Topic: TOSEC on the Internet Archive: Open Letter (Read 23461 times)

Jason Scott · « **on:** March 10, 2013, 09:27:36 AM »

Hello, everyone. My name is Jason Scott, and my position is "free-range Archivist" at the Internet Archive (archive.org). I'm here as a private individual, and not as a spokesman for the Archive in any way.

I've been working there for about two years, and I used the Archive and interacted with its staff for many years before that. I'm now a full-time employee and have been busy bringing several hundred terabytes of data into their stacks. Along the way I've been adding all sorts of material, from videos, magazines, and books all the way through to software, website snapshots, and scientific papers.

However, my main interests in life seem to center around computer history, especially home computer history of the 1970s and 1980s. To that end, I've made a documentary about computer bulletin board systems, as well as a documentary on text adventures. I've uploaded most of the raw interview footage of these films to archive.org as well.

As a few people noted here (and elsewhere), I've begun uploading collections of program images named in the TOSEC format to archive.org. These are being added as large ZIP files, which work better within the archive.org item framework. A .ZIP browser built into the system allows per-image references. A good example of this system in place is here: http://archive.org/details/Camputers_Lynx_TOSEC_2012_04_23

In the case of these items, I'm standardizing on the date of the set for that platform, with this first set being moved right up to the date of the collection I acquired (2012-04-23). As updates are done, I'll make new items. (I realize this means lots of redundancy in the image collections, but space is not a problem at the Archive, and it's easier to just have multiple items and move people forward over time). It's all still in rough shape and will be refined in the future.
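As a rough illustration of the naming scheme described above, the following Python sketch builds an item identifier from a platform name and a set date, and picks the newest item for each platform. The identifier pattern is inferred only from the single example `Camputers_Lynx_TOSEC_2012_04_23`; the function names are hypothetical, and this is not the script actually used at the Archive.

```python
import re
from datetime import date
from typing import Dict, Iterable, Tuple

# Pattern inferred from the one example identifier in this post (an assumption).
_ITEM_RE = re.compile(r"^(?P<platform>.+)_TOSEC_(?P<y>\d{4})_(?P<m>\d{2})_(?P<d>\d{2})$")


def make_identifier(platform: str, set_date: date) -> str:
    """Build an identifier such as Camputers_Lynx_TOSEC_2012_04_23."""
    slug = re.sub(r"\s+", "_", platform.strip())
    return f"{slug}_TOSEC_{set_date:%Y_%m_%d}"


def parse_identifier(identifier: str) -> Tuple[str, date]:
    """Split an identifier back into (platform slug, set date)."""
    match = _ITEM_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a dated TOSEC item identifier: {identifier!r}")
    return match["platform"], date(int(match["y"]), int(match["m"]), int(match["d"]))


def latest_per_platform(identifiers: Iterable[str]) -> Dict[str, str]:
    """Keep every item, but report which one is newest for each platform."""
    latest: Dict[str, str] = {}
    for identifier in identifiers:
        platform, set_date = parse_identifier(identifier)
        current = latest.get(platform)
        if current is None or set_date > parse_identifier(current)[1]:
            latest[platform] = identifier
    return latest
```

Older items are never dropped from the input here; the mapping only says which one a reader should be pointed at first.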

I don't wish to pull anyone's energy or time away from the TOSEC work being done - I just know this project was going to gain attention over here, and I wanted it known I was the person doing this. Now for the why.

Huge organizations, museums and archives and libraries alike, have begun taking an interest in preserving software or aspects of software. In some cases they wish to preserve the items (say, a boxed commercial program) while in others, they find themselves desperately in need of older software (say, a copy of a word processing program or spreadsheet) to allow them to look at old files that have been donated to them. They are often slow, are constantly hindered in their actions by management or administrative concerns or standards, and are often forced to make lesser-of-two-evils decisions when it comes to the software being preserved.

TOSEC, meanwhile, has run a decade-plus massive worldwide effort to agnostically save as much of this software as possible. TOSEC has, without question, blown past any professional effort in terms of the size and breadth of the software they've quantified and described. It is a stunning achievement. I have brought professional archivists near to tears showing them the work TOSEC has done.

So I've put it on the Archive. I realize there are concerns and debates about this effort, and I understand them. The Internet Archive is a non-profit library with worldwide servers dedicated to bringing humanity's knowledge to as much of the world as possible. We are best known for the Wayback Machine, but we have also scanned over 2 million books and put most of them online, as well as thousands of movies, hundreds of thousands of music tracks, and an extensive amount of television news programs from around the world. Every 90 seconds, the Archive adds a new book: http://statusboard.archive.org and many, many new files are uploaded every day, of all types.

I respect the TOSEC effort, and hope to mirror as much of it as will shake out over the next couple months and years at the Archive. It's a bold experiment, to be sure, but I believe very strongly that computer history needs to move forward and software must be treated like the culturally relevant artifact it is.

I'm reachable at jscott@archive.org for comments and questions.

Thanks.

PandMonium · « **Reply #1 on:** March 10, 2013, 05:37:28 PM »

Hi Jason, welcome aboard

I'll answer here, not sure if you prefer it by mail.
We don't see any problem with the interest of archive.org and others in our project; we have been working to become more open in the last few years. Just as you said, I knew archive.org mostly through the Wayback Machine, which I used to check older websites (even TOSEC ones like tosec.org and tosec.info). I've known for a while that movies and other files were also being saved there, but I was surprised to see software.

I personally don't think it's bad, but actually it might be good, by bringing more attention and hopefully help / interest in our work (cataloging software history). We aim to catalog any software, even non original stuff of very weird and less known companies and make it (our dats) freely available in a non closed/proprietary format. Something we want (and will try) to improve to a next level during this year. As such, people are welcome to use our work in a fair way (and help us out) as long as it does not harm the project.

That said, I think the main reason Crashdisk posted about your actions was copyright related. Like many other projects, we only catalog and distribute our datfiles / knowledge; distributing the files might raise some legal issues, in this case for archive.org. Many of the images (most?) are probably safe since they represent really old software that the publishers no longer care about. Nonetheless, some companies are known for their less permissive rules in this respect (for instance Nintendo). That's why some sites dedicated to distributing files often don't allow anything from these names. We opt to keep away from these actions, don't allow links, and state clearly that our intention is to catalog, not distribute, in part to avoid any trouble for the project.

That's all, feel free to post around if you need something or contact me by mail if you prefer and thanks again for your interest

Vaxalon · « **Reply #2 on:** March 10, 2013, 11:46:11 PM »

"reduced professional archivists to tears"....

Somehow that makes all that work on the ZX Spectrum seem worthwhile.

Jason Scott · « **Reply #3 on:** March 11, 2013, 02:47:39 AM »

While I have the attention of the group, may I turn your attention to this site, if attention has not already been given to it:

http://bitsavers.trailing-edge.com/bits/

A large group of worldwide archivers have been rescuing software off paper tape, punch cards, 8" disks, and floppies for the greater good of computer history. They've been at it for 15 years or so. Many of them are pulling one of a kind items out from user groups and other sources. I do not know if this source is being mined, but they are aggressively pulling this material in by the day.

They also do documentation and I've written scripts to mirror the material into the archive.org collection here:

http://archive.org/details/bitsavers

If this is not being handled, I can assure you that it's some wonderful material indeed.

Additionally, many others and I are uploading thousands of CD-ROM images into this collection:

http://archive.org/details/cdbbsarchive

These are shareware, cover cds, and in some cases vendor-supplied discs. (They'll be sorted more distinctly in the future.) I suspect some are already in TOSEC-ISO but many are probably not.

Zandro · « **Reply #4 on:** March 11, 2013, 03:33:54 AM »

I'll confirm that at least one contributor is already doing initial inspection and retrieval of content from bitsavers, some of which could be processed further some day, if any volunteers are willing to look through it. Actually, I already have 14GB of it in personal stash, and from glancing I agree that it is indeed promising territory for TOSEC. Some of it stands well on its own as an information resource, which can make analysis easier.

As for your consistently jaw-dropping collection of shareware CDs, I'll be satisfied with its existence only when I can find a copy of my early grandma gift 100 SMASH Win95 Games in it. ...I am finding it on eBay, will be in touch.

Aral · « **Reply #5 on:** March 11, 2013, 10:49:19 AM »

Jason it is fantastic that you have found the TOSEC project and even better that you are mirroring it on archive.org. It makes the years of hard work dumping, collecting and cataloging of software worth the effort. One of the reasons i started the TOSEC-PIX project was reading your story on http://digitize.textfiles.com/story/ many many years ago and thinking to myself, FUCK!! i have loads of this shit lying around that i should preserve before i throw it all out. So nearly every day for the last 8 years i have scanned, leeched, borrowed and cataloged manuals, magazines and books for nearly every system around. The best part is that i have another 15 years of stuff here to scan before i'm on top of it so there will always be stuff to add.

Welcome aboard Jason and if you ever want anything from the PIX project just ask

Jason Scott · « **Reply #6 on:** March 18, 2013, 06:59:31 AM »

An update.

The main page for the project is starting to really shape up. It's here: https://archive.org/details/tosec

As you can see, there are now direct platform links (so as I add newer datasets, they can replace the links on the front page), as well as links to the DAT files, and even the current version of the naming convention.

As I'm working to put these together, I'm including machine descriptions and photos, which is why they're not all up on that main list yet - I want to ensure the pages are in good shape before linking them. But I hope to get them all up over the next week or two.

Then I'll rope back to finding stuff not in this list. And then I'm going to start aiming people at you who have collections of materials worth going under this classification system.

PandMonium · « **Reply #7 on:** March 18, 2013, 11:23:15 AM »

Great work, sent you a reply by mail before seeing your post here.

Maddog · « **Reply #8 on:** March 18, 2013, 10:04:10 PM »

Looks great, thanks for all your hard work!

Cassiel · « **Reply #9 on:** March 19, 2013, 05:04:47 PM »

Apologies for being a little late to the party, have been very busy the last few weeks!

This looks excellent, very impressive work. Though I have to say I do agree this seems a rather bold move from a copyright perspective, it does seem you've thought it all through.

I look forward to having a poke around the TOSEC sections of archive.org further. As has already been said by others, anything that raises the profile of the project in a positive way can only be lauded.

I'll get a news post put up on the front page in a day or two as well. If you need anything or have any questions then please feel free to give me a shout as well (or email me).

Well done.

Jason Scott · « **Reply #10 on:** March 22, 2013, 05:59:40 PM »

When you folks have time.....

I'm continuing to refine the front page of the TOSEC collection on archive.org to be a good link into the most up-to-date collections. That'll allow for easy upgrading over time as the more recent versions become available.

In the meantime, I have been reaching out to various vintage computing groups to have them see this collection (it's the first time for many of them) and the reactions (to me, anyway) have been positive. They want to help go through collections to provide to your namers - what is the best way to match vintage folks and TOSEC folks?

PandMonium · « **Reply #11 on:** March 25, 2013, 01:05:46 PM »

Hey Jason,

Any help is welcome, from providing information and important images that we may be missing in our current dats, to renaming and even being responsible for some datfiles. Obviously the last part is the most complex and time consuming, so it really depends on their availability and interest (it must be fun or people will quit fast).

I may have misunderstood your post but, trying to simplify things:
1) If they have collections of rare files (images), they can pick our dats for those systems and use the rom manager tools to check if we are missing some. If so, PM some of us to exchange the missing files. (bonus for completely missing systems in our collections). NOTE: Use backups!

2) If they have the real media, they can check whether we are missing it (or have only bad versions of it) and try to dump it. This part is more complex and system dependent. Maybe others can help more than me.

3) Another option is to check our dats / collections, examine each entry (in emulators, other tools, online) and point out errors to us (for instance, wrong titles or dates) or missing information (such as a setfile that is missing a flag saying it is a cracked version). In other words, improve our datfiles; for many systems there are tons of errors. Feedback can be given around here by creating a specific topic for it. It's open to anyone: if you find a bug, just report the dat, the setfile (and hash), plus the error.

4) As you may have noticed, we are a small team. So all the above options may take some time to be applied, depending on the renamers' available time and the affected system (an active system will be faster than one where nobody is working). If there are people interested, anyone can become a renamer and take care of one of the dats, improving it and adding new stuff. Obviously this takes a bit more time and knowledge of our naming convention, so people should try the other options first.
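
The check described in option 1 can be sketched roughly as follows. This only illustrates what a ROM manager does and is not a replacement for one. It assumes the dat is a Logiqx-style XML file whose `<rom>` elements carry `name` and `md5` attributes (dats also exist in other layouts, which this does not handle), and the function names are hypothetical. The code only reads files, but the note about backups still applies.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def load_dat_md5s(dat_path):
    """Return {md5: rom name} for every <rom> entry in the dat (assumed XML layout)."""
    known = {}
    for rom in ET.parse(str(dat_path)).getroot().iter("rom"):
        md5 = rom.get("md5")
        if md5:
            known[md5.lower()] = rom.get("name", "")
    return known


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_collection(dat_path, collection_dir):
    """Split local files into those the dat already lists and those it does not."""
    known = load_dat_md5s(dat_path)
    matched, unknown = {}, []
    for path in sorted(Path(collection_dir).rglob("*")):
        if not path.is_file():
            continue
        md5 = md5_of_file(path)
        if md5 in known:
            matched[path.name] = known[md5]  # local file name -> name in the dat
        else:
            unknown.append((path.name, md5))  # candidate missing from the dat
    return matched, unknown
```

The `unknown` list pairs each unmatched file with its hash, which is the kind of detail (file plus hash) that option 3 asks for when reporting something.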

Hope this helps. You may get better advice from others here who have real experience in renaming the collections or examining the images deeply for missing info (e.g. the amiga/c64 folks).

Cassiel · « **Reply #12 on:** April 14, 2013, 06:25:10 PM »

@Jason

You might be interested in this: http://www.tosecdev.org/index.php/news/releases/61-tosec-release-2014-04-13

Cassiel · « **Reply #13 on:** April 15, 2013, 12:28:51 PM »

Guys, check this out:
http://www.tosecdev.org/index.php/forum/index.php?topic=500.0

VERY VERY cool.....

Jason Scott · « **Reply #14 on:** April 15, 2013, 01:25:30 PM »

Work has continued on the TOSEC collection on the archive. As you might infer, I have my hands on a lot of simultaneous projects right now, but this one is rather personally important to me and work will continue on it.

Most notably, the front page on the site (http://archive.org/details/tosec) now has links to all the collections that are on the site. There are two major items in the to-do list to handle.

As you will see, it's split into two sections. The top section is static, showing how each entry is meant to look. The second section holds items that are missing system photos, descriptions, or deep links into the file archives. These are being done by hand and will take a little while. But they're being done - it's just that there are so many of them!

As I get them into shape, they'll go up into the "main" section. I can do a few every day, and hopefully soon it'll be done.

Then as I acquire new 2013-04-13 sets, I will replace the entry on the main page with links to the new copies of the collections. Note we do NOT delete the old sets - the older sets will be available on the archive permanently. If it sounds a little crazy that we'll have multiple versions of the same 30GB or 10GB collection, well, welcome to the Internet Archive, we have 10 petabytes of information.

So that's what's coming up.

In the future, when you're ready for it, I really would like to coordinate matching up these collections of items from the Apple II world that I and others are generating and bring them to your people for classification.
