Project:Gentoostats
Gentoostats | |
---|---|
Description | Gentoostats project maintains and develops the "gentoostats" statistics collection software for Gentoo machines |
Project email | gentoostats@gentoo.org |
Lead(s) | Last elected: 2017-01-02 |
Member(s) | |
Subproject(s) (and inherited member(s)) | (none) |
Parent Project | Gentoo |
Project listing |
The Gentoostats project is responsible for the deployment, maintenance, and continued development of gentoostats, software that collects various statistics from Gentoo machines.
About Gentoostats
Gentoostats was written by Vikraman Choudhury as a Google Summer of Code 2011 project. It is written in Python and implements a client-server model. The server component is a WSGI web application built with the web.py framework. The client component uses the Portage API to collect various statistics from a Gentoo machine, encodes them as JSON, and submits them to the server. Users can configure which information is transmitted, according to their privacy needs.
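The client flow described above can be sketched roughly as follows. This is not the actual gentoostats client: the field names, the `PRIVACY` settings, and `build_payload` are hypothetical, and the real client gathers its data through the Portage API rather than from a hard-coded dictionary.

```python
import json
import uuid

# Hypothetical privacy settings: which groups of data the user agrees to send.
PRIVACY = {"profile": True, "packages": True, "cflags": False}


def build_payload(collected, privacy, host_id):
    """Keep only the fields the user allows and encode them as JSON."""
    allowed = {key: value for key, value in collected.items() if privacy.get(key, False)}
    allowed["host_id"] = host_id
    return json.dumps(allowed, sort_keys=True)


# Stand-in for data that the real client would read via the Portage API.
collected = {
    "profile": "default/linux/amd64/13.0",
    "packages": ["sys-apps/portage", "dev-lang/python"],
    "cflags": "-O2 -pipe",
}

payload = build_payload(collected, PRIVACY, host_id=str(uuid.uuid4()))
print(payload)
```

Fields that are missing from the privacy settings are dropped by default, so a newly added statistic would not be sent until the user opts in.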
Suggested features
Upload package build time statistics (veremit)
Instead of reporting absolute build times, look into using a relative measure such as the SBU (Standard Build Unit; see: SBU).
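As a rough illustration of the idea, a build time can be divided by the time the same machine needs for a fixed reference build. The function name and the timings below are hypothetical; which package serves as the reference build is left open here.

```python
def to_sbu(build_seconds, reference_seconds):
    """Express a build time as a multiple of the reference build time."""
    if reference_seconds <= 0:
        raise ValueError("reference build time must be positive")
    return build_seconds / float(reference_seconds)


# Hypothetical timings, in seconds, measured on the same machine.
reference = 120.0
print(to_sbu(300.0, reference))
```

Because both timings come from the same machine, the ratio is less sensitive to hardware differences than the raw number of seconds.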
Distributed collection servers
Make multiple servers exchange and synchronize statistics with each other, similar to the PGP keyserver network. The goal is to distribute the load and mitigate DDoS attacks. The main problem is the collision of host UUIDs across multiple independent servers. A second problem is trust between servers, for which solutions exist. Since the database will grow large, some form of delta-sync will be necessary.
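One possible shape for the delta-sync part is sketched below, assuming each record carries a last-updated timestamp and each server remembers when it last synchronized with a peer. The record layout and both function names are hypothetical, and the sketch does not address UUID collisions or trust between servers.

```python
def delta_since(records, last_sync):
    """Return the records that changed after the peer's last sync time."""
    return {host: rec for host, rec in records.items() if rec["updated"] > last_sync}


def merge(local, incoming):
    """Merge a delta into the local store, keeping the newest record per host."""
    merged = dict(local)
    for host, rec in incoming.items():
        if host not in merged or rec["updated"] > merged[host]["updated"]:
            merged[host] = rec
    return merged


# Hypothetical stores on two servers; 'updated' is a Unix timestamp.
server_a = {"uuid-1": {"updated": 100, "packages": 500}}
server_b = {
    "uuid-1": {"updated": 250, "packages": 510},
    "uuid-2": {"updated": 90, "packages": 320},
}

delta = delta_since(server_b, last_sync=95)
server_a = merge(server_a, delta)
print(sorted(server_a.items()))
```

Only the record changed after the last sync is transferred, so the amount of data exchanged depends on recent activity rather than on the total database size.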
Open problems
Validity of the submitted samples
There is no mechanism to stop a malicious user from flooding the server with a large set of synthetic statistics reports, and no way to prove that submitted statistics come from an actual installation. This can be exploited to mount a denial-of-service attack or to skew the statistics. Some form of rate limiting and snapshotting may be useful.
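A minimal sketch of the rate-limiting idea, assuming submissions are keyed by client IP address and counted in a sliding time window. The class and its parameters are hypothetical and are not part of gentoostats. It only slows down flooding from a single address and does nothing to prove that a report comes from a real installation.

```python
import time
from collections import defaultdict, deque


class RateLimiter(object):
    """Allow at most `limit` submissions per `window` seconds for each client key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.seen = defaultdict(deque)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        stamps = self.seen[key]
        # Drop timestamps that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


limiter = RateLimiter(limit=2, window=3600)
print([limiter.allow("198.51.100.7", now=t) for t in (0, 10, 20, 3700)])
```

The third submission is rejected because two were already accepted within the hour; the fourth is accepted because the earlier ones have expired by then.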
TODO
- Make an initial release of gentoostats server and add it to the tree
- Work with infra about gentoostats deployment
- Update Gentoostats, add a section for deploying private instances, improve the usage text
Discussion regarding which version to deploy
 | Gentoostats 2011 | Gentoostats 2012 |
---|---|---|
Pros | | |
Cons | | |
Gentoostats 2011
Progress reports:
- Progress Report #1: https://archives.gentoo.org/gentoo-soc/message/2f9044ad5390b53a338fc9bca4bebda5
- Progress Report #2: https://archives.gentoo.org/gentoo-soc/message/b345988ca5df929abb4f0f5b9aceb00c
- Progress Report #3: https://archives.gentoo.org/gentoo-soc/message/845d373851b7b06f4ab7ce3662b15b4b
- Progress Report #4: https://archives.gentoo.org/gentoo-soc/message/76a0eb1b38e9101ca44d5da7723dcf60
- Progress Report #5: https://archives.gentoo.org/gentoo-soc/message/0179caaa96f8df9f4619a38d630c8cdb
- Midterm Report: https://archives.gentoo.org/gentoo-soc/message/a982111423d18fb7a714526bf9052708
- Progress Report #6: https://archives.gentoo.org/gentoo-soc/message/635ee0e2c9e3d599be5e9c05cd905f9c
- Progress Report #7: https://archives.gentoo.org/gentoo-soc/message/606094a198354a2938b8b8a10f7b0cb5
- Final Report: https://archives.gentoo.org/gentoo-soc/message/c90536fdd571898e6a15c6c7d9fa0c75
Gentoostats 2012
A second gentoostats implementation, based on Django, was written as part of GSoC 2012:
- Server: https://github.com/gg7/gentoostats_server
- Client: https://github.com/gg7/gentoostats
- Playground (??): https://github.com/vikraman/gentoostats-playground
- Deployment bug: https://bugs.gentoo.org/show_bug.cgi?id=425056
Progress reports:
- Progress Report #1: https://archives.gentoo.org/gentoo-soc/message/a85db0776186d6e4fa032377af2c8634
- Progress Report #2: https://archives.gentoo.org/gentoo-soc/message/b0be0d2f6a5c43457ef6cebd3f8e9b7b
- Progress Report #3: https://archives.gentoo.org/gentoo-soc/message/1b45015692cecc31211f93de4bb701d0
- Progress Report #4: https://archives.gentoo.org/gentoo-soc/message/1e1a675494bca49352097a0b25dd58f9
- Progress Report #5: https://archives.gentoo.org/gentoo-soc/message/a8a0f843bd2b755f834b3f9eacdbf97b
- Progress Report #6: https://archives.gentoo.org/gentoo-soc/message/e8a9ef1386d0bf29d922a86ee5332ea8
- Progress Report #7: https://archives.gentoo.org/gentoo-soc/message/8e9fcbd3ab67cdc7c66c9aab87eea62f
- Final Report: https://archives.gentoo.org/gentoo-soc/message/760cbd58a309b56f31d3697d90f44601
Find out why the code isn't being hosted on infra. Evaluate the functionality. Determine which version is to be deployed and maintained.
Attempting to deploy Gentoostats 2012
This is an ongoing effort to deploy this version of gentoostats on my local machine:
Package list:
- dev-python/django-1.8.9
- dev-python/django-extensions-1.6.1
- dev-python/django-debug-toolbar-1.3.2
- dev-python/django-tastypie-0.9.15
Steps:
- Clone the repo, copy gentoostats/settings.py.example to gentoostats/settings.py and edit accordingly
- "manage.py check" dies with "ImportError: No module named south"
- south is hard masked
- comment out south from INSTALLED_APPS in settings.py
- "manage.py check" dies with the following:
    File "/usr/lib64/python2.7/site-packages/tastypie/resources.py", line 2256, in ModelResource
      @transaction.commit_on_success()
    AttributeError: 'module' object has no attribute 'commit_on_success'
As a quick hack, edit tastypie and replace "@transaction.commit_on_success" with "@transaction.atomic" (see: https://github.com/macropin/django-registration/issues/51#issuecomment-100579391). Do the same in gentoostats/receiver/views.py.
- Initialize the database with "manage.py syncdb"
/usr/lib64/python2.7/site-packages/django/core/management/commands/syncdb.py:24: RemovedInDjango19Warning: The syncdb command will be removed in Django 1.9
- Run the server with "manage.py runserver"
    January 03, 2017 - 01:01:37
    Django version 1.8.9, using settings 'gentoostats.settings'
    Starting development server at http://127.0.0.1:8000/
Dies with "ImportError: No module named transaction"
- Comment out 'django.middleware.transaction.TransactionMiddleware' from MIDDLEWARE_CLASSES in settings.py, see: http://stackoverflow.com/a/33102743
- Try to upload stats, dies with:
    INFO 2017-01-03 01:50:31,928 views 27822 140719396648704 process_submission(): Error: Invalid date in LASTSYNC.
    Traceback (most recent call last):
      File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 88, in process_submission
        time.strptime(lastsync, "%a, %d %b %Y %H:%M:%S +0000")
      File "/usr/lib64/python2.7/_strptime.py", line 478, in _strptime_time
        return _strptime(data_string, format)[0]
      File "/usr/lib64/python2.7/_strptime.py", line 332, in _strptime
        (data_string, format))
    ValueError: time data u'Unknown' does not match format '%a, %d %b %Y %H:%M:%S +0000'
This happens because /usr/portage is a git repository, which does not contain the timestamp file, so the client sends the string "Unknown" (after being patched similarly to https://gitweb.gentoo.org/proj/gentoostats.git/commit/?id=963afe1163125b8cbed08c0e8edea9a05a37510e). Patch the server with
    - if lastsync:
    + if lastsync and lastsync != "Unknown":
and add the following else statement to it:
    else:
        lastsync = None
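Taken together, the patched check behaves roughly like the standalone function below. The function name and the constant are hypothetical; the format string is the one shown in the traceback. A malformed date other than "Unknown" still raises ValueError, as in the original code.

```python
import time

# Format string taken from the traceback above.
LASTSYNC_FORMAT = "%a, %d %b %Y %H:%M:%S +0000"


def parse_lastsync(lastsync):
    """Return a struct_time, or None when no usable timestamp was sent."""
    if lastsync and lastsync != "Unknown":
        return time.strptime(lastsync, LASTSYNC_FORMAT)
    return None


print(parse_lastsync("Unknown"))
print(parse_lastsync("Tue, 03 Jan 2017 00:45:01 +0000").tm_year)
```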
- Try to upload stats again, dies with:
    ERROR 2017-01-03 02:02:24,104 views 810 140115441485568 process_submission(): 'NoneType' object has no attribute '__getitem__'
    Traceback (most recent call last):
      File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
        return process_submission(request)
      File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
        return view_func(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
        return func(*args, **kwargs)
      File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 159, in process_submission
        country = GeoIP().country_name(ip_addr),
      File "/usr/lib64/python2.7/site-packages/django/contrib/gis/geoip/base.py", line 190, in country_name
        return self.city(query)['country_name']
    TypeError: 'NoneType' object has no attribute '__getitem__'
Comment out the call to "GeoIP().country_name(ip_addr)" in gentoostats/receiver/views.py for now.
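A less invasive alternative might be to wrap the lookup so that an address without a GeoIP entry yields no country instead of an exception. The sketch below is an assumption, not the project's code: `safe_country_name` and `fake_lookup` are hypothetical, and in the server the `lookup` argument would be `GeoIP().country_name`.

```python
def safe_country_name(lookup, ip_addr):
    """Call a country lookup and return None instead of failing on unknown addresses."""
    try:
        return lookup(ip_addr)
    except TypeError:
        # The traceback shows this error when the city lookup returns None.
        return None


def fake_lookup(ip_addr):
    """Stand-in for GeoIP().country_name that fails the same way for unknown IPs."""
    city = {"192.0.2.1": {"country_name": "Exampleland"}}.get(ip_addr)
    return city["country_name"]  # TypeError when city is None


print(safe_country_name(fake_lookup, "127.0.0.1"))
print(safe_country_name(fake_lookup, "192.0.2.1"))
```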
- Try to upload stats again, dies with:
    ERROR 2017-01-03 02:26:40,646 views 13386 140066905986816 process_submission(): Cannot assign "u": "Submission.sync" must be a "SyncServer" instance.
    Traceback (most recent call last):
      File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 369, in accept_submission
        return process_submission(request)
      File "/usr/lib64/python2.7/site-packages/django/views/decorators/csrf.py", line 58, in wrapped_view
        return view_func(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/django/utils/decorators.py", line 145, in inner
        return func(*args, **kwargs)
      File "/tmp/gentoostats_server/gentoostats/receiver/views.py", line 186, in process_submission
        sync = sync,
      File "/usr/lib64/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
        return getattr(self.get_queryset(), name)(*args, **kwargs)
      File "/usr/lib64/python2.7/site-packages/django/db/models/query.py", line 346, in create
        obj = self.model(**kwargs)
      File "/usr/lib64/python2.7/site-packages/django/db/models/base.py", line 468, in __init__
        setattr(self, field.name, rel_obj)
      File "/usr/lib64/python2.7/site-packages/django/db/models/fields/related.py", line 642, in __set__
        self.field.rel.to._meta.object_name,
    ValueError: Cannot assign "u": "Submission.sync" must be a "SyncServer" instance.
This is due to gentoostats expecting the SYNC variable in make.conf. From gentoostats/stats/models.py:
    # make.conf example: SYNC="rsync://rsync.gentoo.org/gentoo-portage"
    sync = models.ForeignKey(SyncServer, blank=True, null=True, related_name='+')
Set sync to None for now:
    @@ -145,6 +147,7 @@ def process_submission(request):
             validate_item(lang)
             sync = data.get('SYNC')
    +        sync = None
             if sync:
                 sync, _ = SyncServer.objects.get_or_create(url=sync)
                 validate_item(sync)
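Forcing sync to None discards the value even for clients that do send one. A gentler variant, sketched here under the assumption that the failing client submitted an empty SYNC value, would treat an empty or missing value as None and keep real URLs. The helper name is hypothetical; in the server, a non-None result would still go through the SyncServer lookup shown in the diff.

```python
def normalize_sync(data):
    """Return the SYNC URL from a submission, or None if it is missing or empty."""
    sync = data.get("SYNC")
    return sync if sync else None


print(normalize_sync({"SYNC": ""}))
print(normalize_sync({"SYNC": "rsync://rsync.gentoo.org/gentoo-portage"}))
```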
- Try to upload stats again, voilà!