Google Summer of Code 2018 Accepted projects

From Free Software / Open Source Software

Adding the Greek language to the NLP library spaCy.io

Description

We live in the era of data. Every minute, 3.8 billion internet users produce content: more than 120 million emails, 500,000 Facebook comments and 3 million Google searches. Processing that amount of data efficiently requires natural language processing. Open source projects such as spaCy, TextBlob and NLTK contribute significantly in that direction and therefore deserve to be reinforced.

This project is about improving the quality of Natural Language Processing for the Greek language. The first step is to integrate Greek into spaCy. During that process, innovative approaches will be tried. It is vitally important for the writer and for the program mentors to identify which of these approaches are of practical use for spaCy, and to share the results in order to support any other interested open source enthusiast. If Greek is successfully integrated into spaCy, the Greek model will be trained and used to extract valuable information from Greek texts, through tasks such as emotion detection and entity extraction.

This project aims to achieve the following goals:

1. Integration of the Greek language into the spaCy.io platform

2. Natural Language Processing of Greek documents in order to extract valuable information such as named entities, sentiment analysis, tags, etc.
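Integrating a language into spaCy largely means supplying language data such as tokenizer exceptions, stop words and lexical attributes. As an illustration of the last kind, the following is a minimal standalone sketch of a Greek `like_num` check, in the spirit of the `like_num` functions that spaCy's existing language packages define in their `lex_attrs` modules. The number-word list is a short hypothetical sample, not spaCy's actual Greek data, and the function is not registered with spaCy here.

```python
import unicodedata

# Hypothetical, abbreviated list of Greek number words (not spaCy's data).
_NUM_WORDS = [
    "μηδέν", "ένα", "δύο", "τρία", "τέσσερα", "πέντε", "έξι",
    "επτά", "οκτώ", "εννέα", "δέκα", "εκατό", "χίλια",
]


def strip_accents(text: str) -> str:
    """Lower-case and drop combining marks (tonos, dialytika), so that
    'Δύο' and 'δυο' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Accent-free forms, so lookups tolerate missing or extra accents.
_NUM_WORDS_BARE = {strip_accents(word) for word in _NUM_WORDS}


def like_num(text: str) -> bool:
    """Return True if a token looks like a number: digits (with optional
    sign and separators), a simple fraction, or a listed Greek number word."""
    cleaned = text.lstrip("+-").replace(",", "").replace(".", "")
    if cleaned.isdigit():
        return True
    # Simple fractions such as "3/4".
    if text.count("/") == 1:
        numerator, denominator = text.split("/")
        if numerator.isdigit() and denominator.isdigit():
            return True
    return strip_accents(text) in _NUM_WORDS_BARE
```

For example, `like_num("Δύο")` and `like_num("δυο")` both return `True`, while an ordinary word such as `"σπίτι"` returns `False`. Accent-insensitive matching matters for Greek because informal text often omits the tonos.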

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-spacy

Student

Ioannis Daras

Mentors

Markos Gogoulos, Panos Louridas



Extraction of Responsibilities per unit in public sector organizations from the Government Gazette

Description

The objective of this project is to extend existing Government Gazette (GG) text mining code with Named Entity Recognition features that will allow the identification of Government Directorates and Divisions with the responsibilities assigned to them, the types of services they are required to provide according to their legal framework published in http://www.et.gr/ and the extraction of this information with related metadata (decision number, date of the GG issue).

The aim is to link the management units with assigned roles and services per unit (Directorates, Divisions & Sections) and codify this specific information, which is hidden in the GG issue raw text.
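The two extraction steps described above, pulling out issue metadata and grouping responsibilities under the unit they belong to, can be sketched with regular expressions. This is a minimal illustration under loud assumptions: the surface patterns (`Αριθμ.` before the decision number, a `dd.mm.yyyy` date, unit headings starting with `Διεύθυνση`, `Τμήμα` or `Γραφείο`, one responsibility per line) are hypothetical simplifications, not the project's actual code, and real GG issues are far less regular.

```python
import re
from datetime import date

# Hypothetical surface patterns; real GG issues need a much richer grammar.
DECISION_RE = re.compile(r"Αριθμ\.\s*(?P<number>\d+(?:/\d+)*)")
DATE_RE = re.compile(r"(?P<day>\d{1,2})[./](?P<month>\d{1,2})[./](?P<year>\d{4})")
# Unit headings: Directorate, Division/Department, Section/Office.
UNIT_RE = re.compile(r"^(?P<kind>Διεύθυνση|Τμήμα|Γραφείο)\s+(?P<name>.+?)\s*:?\s*$")


def extract_metadata(text: str) -> dict:
    """Return the first decision number and the first date found, if any."""
    meta = {}
    number = DECISION_RE.search(text)
    if number:
        meta["decision_number"] = number["number"]
    found = DATE_RE.search(text)
    if found:
        meta["date"] = date(int(found["year"]), int(found["month"]), int(found["day"]))
    return meta


def extract_units(text: str) -> list:
    """Attach every non-heading line to the most recent unit heading.
    Lines that appear before the first heading are ignored."""
    units = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = UNIT_RE.match(line)
        if heading:
            units.append({"kind": heading["kind"], "name": heading["name"],
                          "responsibilities": []})
        elif units:
            # Drop bullet characters before storing the responsibility.
            units[-1]["responsibilities"].append(line.lstrip("-•* ").strip())
    return units
```

Keeping metadata and unit extraction separate means each unit record can later be tagged with the decision number and date of the issue it came from.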

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-GG-extraction

Student

Chris Karageorg Kaneen

Mentors

Iraklis Varlamis, Sarantos Kapidakis, Dionysios Moschopoulos, Theodoros Karounos


Epoptes

Description

Epoptes (Επόπτης, a Greek word for overseer) is an open source computer lab management and monitoring tool. It allows for screen broadcasting and monitoring, remote command execution, message sending, imposing restrictions such as locking the clients' screens or muting their sound, and much more! It can be installed in Ubuntu, Debian and openSUSE based labs that may contain any combination of the following: LTSP servers, thin and fat clients, non-LTSP servers, standalone workstations, NX or XDMCP clients, etc.

Epoptes has been under-maintained for the last couple of years. It is currently powered by Python 2 and GTK 2, and unfortunately a number of bugs have crept in due to major updates in Linux distribution packages (systemd, ConsoleKit, VNC…).

This project aims at reviving Epoptes with Python 3 and GTK 3 support, while also addressing several outstanding issues.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-epoptes

Student

Alkis Georgopoulos

Mentors

Fotis Tsiamis, Avgoustos Tsinakos


Government Gazette text mining, cross linking, and codification

Description

In recent years, plenty of attention has been given to analyzing public sector texts with text mining methods, enabled by modern libraries, algorithms and practices and brought to the forefront by open source projects such as TextBlob, spaCy, SciPy, TensorFlow and NLTK. These collaborative efforts mark a shift towards more efficient understanding of natural language by machines, which can be applied to public documents to provide more robust organization and codification in the legal sector. This project aims to extend the existing Government Gazette (GG) text mining code with features that organize and cross-link GG texts with legal texts, and that detect the signatories via heuristic and machine learning methods. This will help eliminate bureaucratic processes and save a great deal of time for jurists who, for example, search for legal documents in the ISOKRATIS database of legal texts (an applicable case study).
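A heuristic first step towards cross-linking is to detect citations of other legal acts and build an index from each cited act to the documents that cite it. The sketch below assumes just two simplified citation forms, laws written as `ν. 4412/2016` and presidential decrees written as `π.δ. 80/2016`; the patterns, function names and document ids are hypothetical and are not taken from the project's code.

```python
import re
from collections import defaultdict

# Simplified, hypothetical patterns for two common citation forms:
# laws ("ν. 4412/2016") and presidential decrees ("π.δ. 80/2016").
# (?<!\w) stops "ν." from matching at the end of a longer word.
CITATION_RE = re.compile(
    r"(?<!\w)(?P<kind>ν\.|π\.δ\.)\s*(?P<number>\d+)/(?P<year>\d{4})",
    re.IGNORECASE,
)
KIND_NAMES = {"ν.": "law", "π.δ.": "decree"}


def find_citations(text: str) -> list:
    """Return (kind, number, year) tuples in order of appearance."""
    found = []
    for match in CITATION_RE.finditer(text):
        kind = KIND_NAMES[match["kind"].lower()]  # "Π.Δ." -> "π.δ." -> "decree"
        found.append((kind, int(match["number"]), int(match["year"])))
    return found


def build_citation_index(documents: dict) -> dict:
    """Map each cited act to the set of document ids that cite it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for citation in find_citations(text):
            index[citation].add(doc_id)
    return dict(index)
```

Given such an index, every document that cites, say, law 4412/2016 can be linked to that law's own text, and to the other documents that cite it.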

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-3gm

Student

Marios Papachristou

Mentors

Diomidis Spinellis, Alexios Zavras, Sarantos Kapidakis, Dionysios Moschopoulos


LibreOffice customization and creation of legal templates for LibreOffice

Description

A set of modules and templates for the LibreOffice suite that ease the transition from Microsoft Office, as well as ready-to-use templates that automate the creation of Greek legal documents. These templates aim to cut down time-consuming tasks by removing formatting and layout work from the employee workflow. Furthermore, an interface for accessing all these templates will be developed. All steps will be documented, both during the process and afterwards, for future reference and development.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-librecust

Student

Christos Arvanitis

Mentors

Kostas Papadimas, Theodoros Karounos, Diomidis Spinellis


Software components and IP management

Description

Clio is a web-based system for maintaining (meta-)information on software components.

Nowadays every piece of software includes and uses many other software components, each one coming with its own license.

The goal of this project is to build a simple web system for (manually) entering and maintaining this information!
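The kind of information such a system maintains can be illustrated with a small data model: each component records its own license and the components it includes, and a traversal collects every license that ends up in a product. This is a hypothetical sketch, not Clio's schema; the class and function names are invented for illustration, and licenses are assumed to be recorded as SPDX identifiers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One software component; `license` is assumed to be an SPDX identifier."""
    name: str
    version: str
    license: str
    includes: list[Component] = field(default_factory=list)


def licenses_in(root: Component) -> dict[str, set[str]]:
    """Map every license found in `root` and its transitive inclusions
    to the components ("name@version") that carry it."""
    result: dict[str, set[str]] = {}
    seen: set[tuple[str, str]] = set()
    stack = [root]
    while stack:
        comp = stack.pop()
        key = (comp.name, comp.version)
        if key in seen:  # shared or cyclic inclusions are visited only once
            continue
        seen.add(key)
        result.setdefault(comp.license, set()).add(f"{comp.name}@{comp.version}")
        stack.extend(comp.includes)
    return result
```

For an application that bundles `libA` (MIT) directly and also through `libB` (Apache-2.0), `licenses_in` reports each license once, together with the components responsible for it.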

This is a brand-new project; some analysis has been done but no code is available yet.

More details are available on the separate Clio page.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-clio

Student

Gopalakrishnan.V

Mentors

Alexios Zavras, Georgia Kapitsaki


WSO2 Identity Server Userstore using Web Services to get claims

Description

WSO2 Identity and Access Management Server is a popular open source identity and access management server used throughout the world. WSO2 Identity Server efficiently undertakes the complex task of identity management across enterprise applications, services and APIs.

This project is based on WSO2 Identity Server version 5.4. Currently, WSO2 Identity Server exposes its functionality as SOAP services; in the near future, more effective REST APIs supporting all functionality will be available. In the current environment, it supports different user stores, such as LDAP, JDBC and MySQL, as primary and secondary user stores.

WSO2 Identity Server allows multiple user stores to be configured for storing users and roles. There are two types of user store: a primary user store (mandatory) and secondary user stores (optional). In this version, all user information persists in a single user store. This implementation will separate it into a credential user store and an attribute user store. The attribute user store is used only to store claims, which can be accessed by providing the user's credentials and secret. With the ability to create a new user store, the data currently saved in the primary user store can be split across separate user stores: one for user details and another for user attribute (claims) details, which can be accessed by providing the user's credentials and secret.
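The proposed split can be sketched as two stores behind one facade: a credential store that holds only usernames and salted password hashes, and an attribute store that holds only claims, released after the credentials check passes. This is a language-neutral illustration in Python under stated assumptions; the class and method names are hypothetical and are not WSO2's actual (Java) user store API, and the claim URI merely follows the style of WSO2 claim URIs.

```python
from __future__ import annotations

import hashlib
import hmac
import os


class CredentialStore:
    """Holds only usernames and salted password hashes (hypothetical)."""

    _ITERATIONS = 100_000

    def __init__(self) -> None:
        self._users: dict[str, tuple[bytes, bytes]] = {}

    def add_user(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def authenticate(self, username: str, password: str) -> bool:
        record = self._users.get(username)
        if record is None:
            return False
        salt, digest = record
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self._hash(password, salt), digest)

    def _hash(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self._ITERATIONS)


class AttributeStore:
    """Holds claims (claim URI -> value) per user, but no credentials."""

    def __init__(self) -> None:
        self._claims: dict[str, dict[str, str]] = {}

    def set_claim(self, username: str, claim_uri: str, value: str) -> None:
        self._claims.setdefault(username, {})[claim_uri] = value

    def claims_for(self, username: str) -> dict[str, str]:
        return dict(self._claims.get(username, {}))


class SplitUserStore:
    """Facade: claims are released only after the credential store
    has verified the user's secret."""

    def __init__(self, credentials: CredentialStore, attributes: AttributeStore) -> None:
        self.credentials = credentials
        self.attributes = attributes

    def get_claims(self, username: str, secret: str) -> dict[str, str]:
        if not self.credentials.authenticate(username, secret):
            raise PermissionError(f"invalid credentials for {username!r}")
        return self.attributes.claims_for(username)
```

Because neither store holds the other's data, the attribute store could live in a different backend (for example a web service) without ever receiving password hashes.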

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-wso2

Student

Isuri Anuradha

Mentors

Panagiotis Kranidiotis, Stamelos Ioannis


Python PenTest Library (PyPen)

A collection of tools supporting penetration testers.

Description

Development of a Python library for penetration testers. The library will include a set of tools for performing the basic tasks of attacking a remote host. It will include reconnaissance tools, such as modules able to collect data about a specific target either from the web or from user input. Moreover, other tools will be developed to create custom dictionaries for username and password attacks. Other supported attack techniques will include denial-of-service (DoS), brute-force and inclusion attacks. The library will also include various statistical functions for extracting additional information from a captured host.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-pypen

Student

Konstantinos Liosis

Mentors

Antonios Andreatos, Panagiotis Karampelas, Christos Pavlatos


Addition of Greek glyphs to the Open Source Font Arima Madurai

Description

This project aims to extend the collection of fonts supporting Greek script in the Google Fonts catalog. Currently, only 19 serif fonts, 6 monospace fonts and 10 sans-serif fonts supporting Greek script are available, and only 2 of them are explicitly intended for display text.

Arima Madurai is a font created by Natanael Gama and Joana Correia of NDISCOVER, a Portuguese type foundry. It is a multi-script display font with 8 weights, from thin to black, and a strong calligraphic influence. It has a lot of personality, so it can be recognisable when used in headlines or brand names. I value the quality of the design, and thanks to its low contrast it offers good legibility and rendering on screen.

Given the history of Greek script, it is interesting and challenging to design a typeface with a calligraphic feel, both in terms of design and in terms of study. There are remarkable examples of Greek punchcutting by some of the most talented figures in typographic history. The challenge will be to respect that history while keeping a well-anchored contemporary form.

Arima Madurai already supports Tamil, Malayalam and Latin scripts, and I would like to add Greek script to the glyph set. The fact that the font already supports multiple scripts is a real benefit to the project: Arima Madurai already works in non-Latin typographic environments and therefore displays a large set of shapes that can be used to match the Greek glyphs with the other ones.

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-arimamadurai

Student

Rosalie Wagner

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous


Addition of Greek glyphs to the Open Source Font Cantarell

Description

Cantarell is a humanist sans-serif typeface optimized for on-screen reading. It was originally developed by Dave Crossland in the MA Typeface Design class of 2009 at the University of Reading, using free software. It was subsequently licensed under the SIL Open Font License and has been the standard UI typeface for the open source desktop environment GNOME since version 3.0 in 2010.

The fonts were redesigned for the release of GNOME 3.28 in March 2018: PostScript outline quality improved significantly, spacing was reworked and new weights were added.

The family is currently growing to support additional writing systems. After initially applying to extend another typeface, I was invited to change my project and add Monotonic and Polytonic Greek to the three Roman masters of Cantarell during GSoC 2018.
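When planning Monotonic and Polytonic Greek support, it helps to know which precomposed letters a font still lacks. The sketch below derives two target sets from Unicode itself: monotonic letters as the assigned letters in U+0386..U+03CE (base letters plus tonos and dialytika forms), and polytonic as those plus the letters of the Greek Extended block (U+1F00..U+1FFF). Treating these ranges as the glyph-set targets is a simplifying assumption, not the project's actual specification; real Greek support also involves combining marks, punctuation and OpenType features.

```python
from __future__ import annotations

import unicodedata


def _letters(first: int, last: int) -> list[str]:
    """Assigned letters (Unicode category L*) in an inclusive code point range."""
    return [chr(cp) for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)).startswith("L")]


# Monotonic Greek: base letters plus tonos/dialytika forms.
MONOTONIC = _letters(0x0386, 0x03CE)
# Polytonic Greek additionally needs the precomposed Greek Extended letters.
POLYTONIC = MONOTONIC + _letters(0x1F00, 0x1FFF)


def missing_glyphs(supported: set[int]) -> dict[str, list[str]]:
    """List the target letters whose code points the font does not map."""
    return {
        "monotonic": [ch for ch in MONOTONIC if ord(ch) not in supported],
        "polytonic": [ch for ch in POLYTONIC if ord(ch) not in supported],
    }
```

The set of supported code points could come, for example, from fontTools: `TTFont(path).getBestCmap()` returns a dictionary keyed by code point, so `missing_glyphs(set(cmap))` would report the gaps for one master.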

GSOC-2018 repositories

https://github.com/eellak/gsoc2018-cantarell

Student

Florian Fecher

Mentors

Alexios Zavras, Irene Vlachou, Emilios Theofanous