Let's say you want to distribute some big files to the whole world.
You can, of course, just drop them onto a website. But perhaps you'd like to
use git-annex to manage those files. And as an added bonus, why not let
anyone in the world clone your site and use `git annex get`!
My site like this is downloads.kitenet.net. Here's how I set it up. --Joey
- Set up a web site. I used Apache, and configured it to follow symlinks.
Options FollowSymLinks
- Put some files on the website. Make sure it works.
- git init; git annex init
- git config core.sharedrepository world
(Makes sure files are always added with permissions that allow everyone to read them.)
- git config receive.denyCurrentBranch updateInstead
(Makes the working tree update when changes are pushed to it.)
- We want users to be able to clone the git repository over http, because
git-annex can download files from it over http as well. For this to
work, `git update-server-info` needs to get run on the server after commits
or pushes to it. The git post-update hook will take care of this; you just
need to enable the hook on the server.
mv .git/hooks/post-update.sample .git/hooks/post-update
- git annex add; git commit -m added
- Make sure users can still download files from the site directly.
- Instruct advanced users to clone an http url that ends with the "/.git/"
directory. For example, for downloads.kitenet.net, the clone url is
https://downloads.kitenet.net/.git/
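The server-side steps above can be sketched as one script. This is only a sketch under stated assumptions: a throwaway directory from `mktemp` stands in for the real document root, the file name, repository description and commit identity are made up, and the git-annex steps are skipped when git-annex is not installed. The fallback that writes a hook by hand is also an addition, for git installs that ship no sample hooks.

```shell
#!/bin/sh
set -eu

# Assumption: a temporary directory stands in for the web server's document root.
site=$(mktemp -d)
cd "$site"
echo "example payload" > bigfile.bin   # hypothetical file to publish

git init -q .
have_annex=no
if command -v git-annex >/dev/null 2>&1; then
    have_annex=yes
    git annex init "public downloads"   # description is made up
fi

# Keep added files world-readable, and let pushes update the working tree.
git config core.sharedrepository world
git config receive.denyCurrentBranch updateInstead

# Enable the post-update hook, which runs git update-server-info after a push.
hook=.git/hooks/post-update
if [ -f "$hook.sample" ]; then
    mv "$hook.sample" "$hook"
else
    # Fallback (assumption): write an equivalent hook if no sample exists.
    mkdir -p .git/hooks
    printf '#!/bin/sh\nexec git update-server-info\n' > "$hook"
fi
chmod +x "$hook"

if [ "$have_annex" = yes ]; then
    git annex add .
    git -c user.name="Example" -c user.email="example@example.com" \
        commit -q -m added
fi

# post-update only fires when the repository receives a push, so after
# committing directly on the server, refresh the dumb-http metadata by hand.
git update-server-info
```

Running it leaves a repository in `$site` that a web server could serve as-is, provided it follows symlinks as described in the first step.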
When users clone over http, and run git-annex, it will automatically learn all about your repository and be able to download files right out of it, also using http.
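From the user's side, that workflow might look like the following sketch. The function name `fetch_from_public_annex` and its three arguments are hypothetical, chosen only for illustration, and the file path in the commented usage line is a placeholder rather than a real file on the site.

```shell
#!/bin/sh
# fetch_from_public_annex URL DIR FILE
# Clone a published repository over http, then download one file's content.
# Runs in a subshell so the caller's working directory is left alone.
fetch_from_public_annex() (
    url=$1; dir=$2; file=$3
    git clone "$url" "$dir" || exit 1
    cd "$dir" || exit 1
    # git-annex learns about the origin repository by itself on first use.
    git annex get "$file"
)

# Example call (needs network access and git-annex; path is a placeholder):
# fetch_from_public_annex https://downloads.kitenet.net/.git/ downloads path/to/file
```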
The above is a simple way to set that up, but it's not necessarily the best way. Both git and git-annex will be accessing the repository using dumb http, which can be inefficient. And it doesn't allow write access.
For something smarter, you may want to also set up git smart http, and the git-annex equivalent, a smart http server.
Hi,
would it be possible to do this with the contents of a public repository-group (a non-bare public repository)?
You can choose which files get stored in the public repository, and are thus accessible to the public. However, note that since the git repository is published, anyone could clone it and see all the names and hashes of your files, even if you've not pushed the file contents to the public repository.
Currently, the way the "public" repository group works makes it usable only with special remotes. This is because it uses a `preferreddir` setting in the special remote configuration.
Is there a low cost web hosting solution that would support a public git-annex repo relatively simply, with simple access to download the public files?
I figure I could set up an Amazon EC2 micro instance and mount an s3 share, hosting the git-annex remote, but this is a lot of overhead for something that dropbox does with 1 click "share dropbox link"?
Any suggestions would be great!
For those not wanting to run their own web server, using Amazon S3 with git-annex can work well; it can be configured to let the public download files over http. See public Amazon S3 remote.
I'm trying to set up a read-only HTTP mirror of my repository, such that I can clone and get files but not upload anything. I've set up an HTTP file server pointing to the repository, and I can clone it over git (dumb protocol):
    $ git clone http://server.name/annex/.git clone_test
    Cloning into 'clone_test'...
    Fetching objects: 1987364, done.
    Checking connectivity: 1987364, done.
    Updating files: 100% (551410/551410), done.
However, git-annex doesn't seem to like it:
```
$ cd clone_test/
$ git annex init
init  (merging origin/git-annex into git-annex...)
(recording state in git...)
(scanning for unlocked files...)

  Remote origin not usable by git-annex; setting annex-ignore
ok
(recording state in git...)
```
And thereafter I can't get any of the files using `git annex get`:

    $ git annex get file.name
    get file.name (not available)
      Maybe add some of these git remotes (git remote add ...):
      (...)
      (Note that these git remotes have annex-ignore set: origin)
    failed
    git-annex: get: 1 failed
    $ git annex get file.name --from=origin
    git-annex: cannot determine uuid for origin (perhaps you need to run "git annex sync"?) (remote.origin.annex-ignore is set)

Running `git annex sync` fails because it cannot push to origin, which is expected since it's meant to be read-only.
How can I get git-annex to fetch data over this http remote?
Did you run `git annex init` on the remote which is exposed as http? And http://server.name/annex/.git/config needs to be accessible too.
@tomdhunt a few problems could result in this error message.
- The .git/config file might not be readable by the web server, or some similar problem might cause the web server to fail to serve it. Just for example, a web server might be configured to not serve .git directories, since exposing them can sometimes be a mistake. The web server would have to fail with some problem other than a 404 not found, though; in that case there would be no error message.
- git might somehow be failing to parse the .git/config file once it is downloaded. You would have to be running an older version of git-annex to get this error message in that situation though; recent versions display a different message if the config parse fails.
I think that's all the possibilities. To determine which it is, I suggest you download the .git/config file from the webserver yourself (that would be http://server.name/annex/.git/config in your example), and pass the downloaded file to `git config --list --file` to get git to parse it, making sure git doesn't exit with an error.
```
[tom@trafalgar clone_test]$ wget http://breitenfeld.lan/annex/.git/config
--2021-03-22 10:23:32--  http://breitenfeld.lan/annex/.git/config
Resolving breitenfeld.lan (breitenfeld.lan)... 192.168.1.241
Connecting to breitenfeld.lan (breitenfeld.lan)|192.168.1.241|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1064 (1.0K)
Saving to: ‘config’

config 100%[========================================================================================================================================>] 1.04K --.-KB/s in 0s

2021-03-22 10:23:32 (153 MB/s) - ‘config’ saved [1064/1064]

[tom@trafalgar clone_test]$ file config
config: ASCII text
[tom@trafalgar clone_test]$ git config --list --file config
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
annex.uuid=4ad5f0f5-9b92-475b-9abf-c1845a42c758
annex.version=8
annex.dotfiles=true
annex.thin=true
annex.addunlocked=include=torrent-downloads/*
annex.sshcaching=true
filter.annex.smudge=git-annex smudge -- %f
filter.annex.clean=git-annex smudge --clean -- %f
remote.surface.url=ssh://tom@DESKTOP-K1RQ50D/home/tom/bigdata
remote.surface.fetch=+refs/heads/*:refs/remotes/surface/*
remote.surface.annex-uuid=b4b17fe9-4c46-44a9-b6cb-0c2c17904830
remote.zfsrent2.url=gcrypt::rsync://user@redacted.server.name/rent_data/data/git/bigdata
remote.zfsrent2.fetch=+refs/heads/*:refs/remotes/zfsrent2/*
remote.zfsrent2.gcrypt-participants=BDF1CB2C01162329
remote.zfsrent2.gcrypt-signingkey=BDF1CB2C01162329
remote.zfsrent2.gcrypt-publish-participants=true
remote.zfsrent2.gcrypt-id=:id:iSB2By8j7YrAY1Zycx+Q
remote.gnubee.url=gcrypt::rsync://archive@gnubee/data2/bigdata
remote.gnubee.fetch=+refs/heads/*:refs/remotes/gnubee/*
remote.gnubee.gcrypt-participants=BDF1CB2C01162329
remote.gnubee.gcrypt-signingkey=BDF1CB2C01162329
remote.gnubee.gcrypt-publish-participants=true
remote.gnubee.gcrypt-id=:id:wAwSE2kqUs9HLuq1usaP
[tom@trafalgar clone_test]$ git remote -v
origin http://breitenfeld.lan/annex/.git/ (fetch)
origin DISABLE (push)
```
FWIW on the server side the only requests I'm seeing to the .git/config URL are my wget tests, so it seems like git-annex never even gets as far as trying to fetch it.
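The check suggested earlier in this thread (download .git/config, then have git parse it) can be wrapped in a small helper. This is a sketch: the name `check_annex_config` is hypothetical, the two sample files are made up for the demonstration, and the extra test for `annex.uuid` assumes that git-annex needs that key from the remote's config to identify the repository.

```shell
#!/bin/sh
# check_annex_config FILE
# Succeeds when git can parse FILE and the file records an annex.uuid.
check_annex_config() {
    git config --list --file "$1" >/dev/null 2>&1 || return 1
    git config --file "$1" --get annex.uuid >/dev/null 2>&1
}

# Self-contained demonstration with made-up files.
tmp=$(mktemp -d)
printf '[annex]\n\tuuid = 00000000-0000-0000-0000-000000000000\n' > "$tmp/good"
printf '[annex\n' > "$tmp/bad"   # unterminated section header

check_annex_config "$tmp/good" && echo "good: parses and has annex.uuid"
check_annex_config "$tmp/bad"  || echo "bad: git could not use this file"
```

In practice the argument would be the file saved by wget, for example `check_annex_config config`.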
I got it. If the server is on a local network, you have to set `annex.security.allowed-ip-addresses` to allow access.
@joey Is this intentional for this remote type? If it is, a better error message would be helpful.
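As a rough sketch of that fix: the commands below set the option in a scratch repository that stands in for the clone, using the server address from the wget output earlier in the thread. Clearing `remote.origin.annex-ignore` afterwards is an assumption about what is needed; git-annex recorded that setting after the failed probe, so it probably has to be removed before origin is tried again.

```shell
#!/bin/sh
set -eu

# Assumption: a scratch repository stands in for the real clone (clone_test).
clone=$(mktemp -d)
git init -q "$clone"
cd "$clone"

# Allow git-annex to contact the server's private address.
git config annex.security.allowed-ip-addresses "192.168.1.241"

# Simulate the state left behind by the failed probe, then clear it so
# git-annex will consider origin again.
git config remote.origin.annex-ignore true
git config --unset remote.origin.annex-ignore
```

After this, `git annex get file.name --from=origin` would be the command to retry in the real clone.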
`chmod +x .git/hooks/post-update` is out of date; on modern git, the right command is
mv .git/hooks/post-update.sample .git/hooks/post-update