[Csnd] Mailing list archive amalgamation attempt

The mailing list archives and the demise of Nabble have been mentioned quite a bit. While there was a plan to incorporate all messages into forum.csound.com, I am not sure how that is going, so I have attempted to put together a site that incorporates everything I could find.

It is at http://ml.csound.1bpm.net/ and is in a basic/testing stage at the moment, so any suggestions, ideas, bug reports etc. are welcome.

Messages are presented in threads: the originating message is shown in the overview, and the replies to it are shown in the thread accordingly (or should be, provided the email's Message-ID and In-Reply-To headers are right). Attachments and multipart messages (e.g. HTML) should be preserved OK, for example the attachment in http://ml.csound.1bpm.net/thread/4975.
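Threading of this kind can be sketched with Python's standard `email` package. This is a minimal illustration, not the site's actual code: it assumes each raw message is an RFC 5322 string, attaches a reply to its parent via In-Reply-To (falling back to the last References entry), and treats any message whose parent is not in the archive as the start of a new thread.

```python
from email import message_from_string
from typing import Dict, List, Optional


def build_threads(raw_messages: List[str]) -> Dict[str, List[str]]:
    """Group messages into threads: root Message-ID -> reply Message-IDs, in input order."""
    parent_of: Dict[str, Optional[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        # Prefer In-Reply-To; fall back to the last entry of References
        parent = (msg.get("In-Reply-To") or "").strip()
        if not parent:
            refs = (msg.get("References") or "").split()
            parent = refs[-1] if refs else ""
        parent_of[msg_id] = parent or None

    def root(msg_id: str) -> str:
        seen = set()
        # Climb while the parent is a message we actually have (the set guards against cycles)
        while parent_of.get(msg_id) in parent_of and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads: Dict[str, List[str]] = {}
    for msg_id in parent_of:
        top = root(msg_id)
        replies = threads.setdefault(top, [])
        if top != msg_id:
            replies.append(msg_id)
    return threads
```

Nested replies are flattened under their root here; a real thread view would probably keep the tree shape for indentation.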

The search functionality may be a bit patchy at the moment, especially the full-text search, but I will try to optimise it or revisit it at some point. The messages themselves are actually stored on an NNTP server, so you can also connect directly to 1bpm.net with a newsreader and view them that way. I have tried to redact full email addresses where possible in the web frontend and just show names.
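Redaction of this sort can be done with the standard library alone. A minimal sketch; the rules (keep the display name, otherwise show only the local part, and replace anything address-like in body text with a placeholder) are assumptions and not necessarily what the site does.

```python
import re
from email.utils import getaddresses

# Rough pattern for addresses appearing in message bodies (quoted headers, signatures)
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_sender(header_value):
    """Show display names only; fall back to the local part when there is no name."""
    shown = []
    for name, addr in getaddresses([header_value]):
        if name:
            shown.append(name)
        elif addr:
            shown.append(addr.split("@", 1)[0])
    return ", ".join(shown)


def redact_body(text):
    """Replace anything that looks like an email address with a placeholder."""
    return ADDRESS.sub("[address removed]", text)
```

The regex is deliberately loose; it will miss obfuscated addresses such as "name at domain dot com".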

The sources I used are as follows:

2007-10 to 2014-09
http://gaule.cs.bath.ac.uk/Csound-archive/

2017-10 to current
Personal copies

2005 to current, but patchy
Gmane

There may be some messages missing, so if anyone has any, I can try to import them. I have been trying to get hold of the raw messages from HEANET, which would cover 2015-10 to the present, but have not managed to yet.

John mentioned messages on codemist.co.uk from Feb 1997 to Nov 1999, but I could not find them when I had a look around.

The counts of messages per year on the site are:

       2005 | 1563
       2006 | 2818
       2007 | 1995
       2008 | 5194
       2009 | 5645
       2010 | 6100
       2011 | 7073
       2012 | 7289
       2013 | 7285
       2014 | 3870
       2015 | 3245
       2016 | 5763
       2017 | 4062
       2018 | 3220
       2019 | 3010
       2020 | 2840
       2021 | 1467
       2022 | 142
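A per-year tally like the one above can be derived from each message's Date header. A minimal sketch using only the standard library; it is not necessarily how these figures were produced (they presumably come from the site's database), and skipping messages with missing or unparseable dates is an assumption about how to treat them.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def counts_per_year(date_headers):
    """Count messages per calendar year from raw Date header values."""
    counts = Counter()
    for value in date_headers:
        try:
            counts[parsedate_to_datetime(value).year] += 1
        except (TypeError, ValueError):
            # Missing or malformed Date header: leave the message out of the tally
            continue
    return dict(sorted(counts.items()))
```

The year is taken in each message's own time zone, so a message sent near midnight on 31 December may land in a different year than it would in UTC.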

Csound mailing list
Csound@listserv.heanet.ie
https://listserv.heanet.ie/cgi-bin/wa?A0=CSOUND
Send bugs reports to
        Issues · csound/csound · GitHub
Discussions of bugs and features can be posted here

Great work. Would it make sense to host this in our csound.github.io site? I could give you access or you could do a PR if you prefer.

Prof. Victor Lazzarini
Maynooth University
Ireland

Can you read foremost.co.uk/cs_archive/?

Those are the 1997 emails.


Predictive text!!

codemist.co.uk/cs_archive


Thanks for this, Richard. It’s great to have all of these in one place again. I tried the search feature, and I got no hits on terms like oscil, metro, etc. Is that expected?

Yes, I think hosting it on the csound.github.io site is a good idea. However, at the moment the pages on my site are generated on the fly by parsing the raw messages, and the overviews (sender, subject etc.) are stored in a PostgreSQL database along with an attempt at full-text search indexing.

As far as I understand GitHub Pages, the pages would have to be static (is that right?). This could be dealt with by generating them all as static HTML pages; at a rough estimate that would come to around 500 MB+ of HTML. The raw messages take up 900 MB in total, and the GitHub Pages limit appears to be 1 GB per site, so that may be pushing it in future.
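A static export would render each thread once to an HTML file. A minimal sketch, assuming a thread is already available as a subject plus a list of (sender name, plain-text body) pairs, and assuming a `thread/<id>.html` layout that mirrors the current /thread/4975 URLs; both names are hypothetical. Each write returns its byte count so the export's total size can be tallied against the GitHub Pages limit.

```python
import html
from pathlib import Path


def render_thread_html(subject, messages):
    """Render one thread as a standalone HTML page; messages are (sender, body) pairs."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(subject)}</title></head><body>",
        f"<h1>{html.escape(subject)}</h1>",
    ]
    for sender, body in messages:
        # Escape everything from the email so markup in messages cannot break the page
        parts.append(f"<article><h2>{html.escape(sender)}</h2>")
        parts.append(f"<pre>{html.escape(body)}</pre></article>")
    parts.append("</body></html>")
    return "\n".join(parts)


def write_thread_page(out_dir, thread_id, page):
    """Write the page to out_dir/thread/<id>.html and return its size in bytes."""
    path = Path(out_dir) / "thread" / f"{thread_id}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    data = page.encode("utf-8")
    path.write_bytes(data)
    return len(data)
```

HTML parts of multipart messages and attachments would need separate handling (sanitising the HTML, copying attachments alongside), which this sketch leaves out.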
I am not sure how search functionality could be incorporated if the pages are all static, but I suppose eventually they would be indexed by search engines.

Thanks. Unfortunately, although I can see the directory listing, each of the files gives a 403 Forbidden error. Maybe the file permissions need to be different, or something needs tweaking in the .htaccess or Apache config.
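A 403 on the files while the directory listing works is consistent with the files not being readable by the user Apache runs as. A quick check, assuming a POSIX system and that the server runs as a different user from the file owner (so the world permission bits are what matter): list files without o+r and directories without o+x.

```python
import os
import stat


def find_unreadable(root):
    """Paths under root a web server running as another user probably cannot serve."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Directories need world-execute so the server can traverse into them
            if not os.stat(path).st_mode & stat.S_IXOTH:
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files need world-read so the server can open them
            if not os.stat(path).st_mode & stat.S_IROTH:
                problems.append(path)
    return sorted(problems)
```

If that turns up anything, `chmod -R o+rX` on the archive directory is one common fix (capital X adds execute only to directories and to files that are already executable). Rules in .htaccess or the Apache config can also produce a 403 and would need checking separately.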

Ah, yes: the body/subject/sender conditions were being combined with AND instead of OR. It should work a bit better now.
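The fix amounts to joining the per-field conditions with OR. A minimal sketch, using an in-memory SQLite database as a stand-in for the site's PostgreSQL one; the `messages` table and its `sender`, `subject` and `body` columns are hypothetical. Column names come from a fixed allow-list, and the search term is always passed as a bound parameter rather than pasted into the SQL.

```python
import sqlite3

SEARCHABLE = ("sender", "subject", "body")  # hypothetical column names


def search(conn, term, fields=SEARCHABLE):
    """Return ids of messages where ANY selected field contains term."""
    cols = [f for f in fields if f in SEARCHABLE]  # never interpolate untrusted names
    if not cols:
        return []
    where = " OR ".join(f"{c} LIKE ?" for c in cols)  # OR, not AND
    pattern = f"%{term}%"  # note: % and _ inside term are not escaped in this sketch
    sql = f"SELECT id FROM messages WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, [pattern] * len(cols))]
```

SQLite's LIKE is case-insensitive for ASCII text; in PostgreSQL the equivalent would be ILIKE.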

However, the full-text search (body) is really slow, probably too slow to be usable. I will have a look and see if it can be optimised, but a better option may be to have the pages as static HTML hosted on csound.github.io, as Victor suggested, and let search engines take care of the indexing.
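Slowness like this is what one would expect if each query scans every message body. A full-text index avoids that by mapping each word to the messages containing it ahead of time; in PostgreSQL the usual tool is a GIN index over a `tsvector`, though whether that is what is missing on the site is an assumption. The toy inverted index below (crude lowercase alphanumeric tokenising, no stemming) only illustrates the idea, not PostgreSQL's implementation.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_index(docs):
    """docs maps message id -> body text; returns word -> set of message ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Ids of messages containing every word in the query, without rescanning bodies."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Building the index costs one pass over the archive; after that each query touches only the sets for its words.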

Actually, I'm not sure they need to be static. Certainly the site is dynamic and pulls data from various sources in the repo; I'm not sure whether that can be leveraged for your archive.

Maybe others will know more. Possibly the best thing is to give you access and let you poke around to see what may be achievable.


I changed permissions etc., so could you try again? I am not up to speed on web access stuff, so it may still be wrong.

Unfortunately it still doesn't work. There are probably a few things to check in the Apache config, and I can try to give some suggestions. Alternatively, I could give you a login on one of my servers so you could scp the files over to me, if that sounds easier?

Yes, I think that would be useful. I've since had a look around the repo on GitHub (csound/csound.github.io: Csound Project Homepage), which looks to be the right one.

As far as I understand, the site is served as static HTML but is built from Markdown files using Jekyll, and GitHub Pages doesn't support server-side scripting (i.e. Python, databases, etc.).

Hence I think there are two fundamental options:
- Generate the mailing list pages as static HTML (or Markdown) and include them in the repo/site.
- Have something on the github.io site that uses JavaScript to interact with an API served by my server.
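For the second option, the server side could be as small as a JSON endpoint that JavaScript on the github.io site fetches. A minimal WSGI sketch using only the standard library; the `/thread/<id>` route, the in-memory `THREADS` stand-in and the allowed origin are all assumptions. Because the page and the API would live on different origins, the response needs an `Access-Control-Allow-Origin` header for the browser to let the script read it.

```python
import json

# Hypothetical in-memory stand-in for the archive store
THREADS = {
    "4975": {"subject": "Example thread", "messages": [{"from": "Jane", "body": "Hello"}]},
}

# Assumption: the origin the static site is served from
ALLOWED_ORIGIN = "https://csound.github.io"


def app(environ, start_response):
    """WSGI app answering GET /thread/<id> with the thread as JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if environ.get("REQUEST_METHOD", "GET") != "GET":
        status, payload = "405 Method Not Allowed", {"error": "method not allowed"}
    elif len(parts) == 2 and parts[0] == "thread" and parts[1] in THREADS:
        status, payload = "200 OK", THREADS[parts[1]]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # CORS: lets JavaScript on the static site read this cross-origin response
        ("Access-Control-Allow-Origin", ALLOWED_ORIGIN),
    ])
    return [body]
```

For a quick local test, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000, and a page would then call `fetch()` on the thread URL.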

I'd initially be inclined towards the first, as it keeps everything in one central place and would perform better. However, concerns include space usage in the repo and the need for some update schedule to keep the repo in sync with the mailing list. The drawbacks of the second are that I would still have ownership of the archive, and that loading messages dynamically with JavaScript would likely hurt search engine indexing.
I'll have a look at generating my site as static html and see how much disk space that uses.
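Measuring the export can be as simple as summing file sizes over the output tree. A minimal sketch; it reports apparent file sizes (not allocated disk blocks), skips symlinks, and says nothing about how large the content would be once stored in a git repository, where objects are compressed.

```python
import os


def tree_size(root):
    """Total apparent size in bytes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human(n):
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024 or unit == "GiB":
            return f"{value:.1f} {unit}"
        value /= 1024
```

Running `human(tree_size(...))` on the export directory gives a figure to compare with the 900 MB of raw messages and the 1 GB GitHub Pages limit.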