Embedded high-resolution stereo-vision of high frame-rate and low latency through FPGA-acceleration
Ahlberg, Carl
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. (Robotics) ORCID iD: 000-0003-4907-9816
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous agents rely on information from the surrounding environment to act upon. In the array of sensors available, the image sensor is perhaps the most versatile, allowing for detection of colour, size, shape, and depth. For the latter, in a dynamic environment, assuming no a priori knowledge, stereo vision is a commonly adopted technique. How to interpret images, and extract relevant information, is referred to as computer vision. Computer vision, and specifically stereo-vision algorithms, are complex and computationally expensive, already considering a single stereo pair, with results that are, in terms of accuracy, qualitatively difficult to compare. Adding to the challenge is a continuous stream of images, of a high frame rate, and the race of ever increasing image resolutions. In the context of autonomous agents, considerations regarding real-time requirements, embedded/resource limited processing platforms, power consumption, and physical size, further add up to an unarguably challenging problem.

This thesis aims to achieve embedded high-resolution stereo-vision of high frame-rate and low latency, by approaching the problem from two different angles, hardware and algorithmic development, in a symbiotic relationship. The first contributions of the thesis are the GIMME and GIMME2 embedded vision platforms, which offer hardware accelerated processing through FPGAs, specifically targeting stereo vision, contrary to available COTS systems at the time. The second contribution, toward stereo vision algorithms, is twofold. Firstly, the problem of scalability and the associated disparity range is addressed by proposing a segment-based stereo algorithm. In segment space, matching is independent of image scale, and similarly, disparity range is measured in terms of segments, indicating relatively few hypotheses to cover the entire range of the scene. Secondly, more in line with the conventional stereo correspondence for FPGAs, the Census Transform (CT) has been identified as a recurring cost metric. This thesis proposes an optimisation of the CT through a Genetic Algorithm (GA): the Genetic Algorithm Census Transform (GACT). The GACT shows promising results for benchmark datasets, compared to established CT methods, while being resource efficient.
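
The scalability problem named in the abstract can be illustrated with the standard relation for a rectified stereo pair, d = f * B / Z, where the focal length f is expressed in pixels and therefore grows with image resolution. The sketch below is only an illustration of that relation for pixel-based matching; the function names and the example values (baseline, nearest depth, focal lengths) are assumptions and are not taken from the thesis.

```python
import math


def disparity_px(focal_length_px: float, baseline_m: float, depth_m: float) -> float:
    """Disparity in pixels of a point at depth_m for a rectified stereo pair."""
    return focal_length_px * baseline_m / depth_m


def hypothesis_count(focal_length_px: float, baseline_m: float, nearest_depth_m: float) -> int:
    """Number of integer disparity hypotheses (0..d_max) needed to cover
    every depth from nearest_depth_m out to infinity."""
    d_max = disparity_px(focal_length_px, baseline_m, nearest_depth_m)
    return math.ceil(d_max) + 1


# Assumed example values: same optics and scene, sensor resolution doubled,
# so the focal length measured in pixels doubles as well.
BASELINE_M = 0.25
NEAREST_DEPTH_M = 2.0
for focal_px in (1000.0, 2000.0):
    print(focal_px, hypothesis_count(focal_px, BASELINE_M, NEAREST_DEPTH_M))
```

With these assumed values, doubling the resolution roughly doubles the number of pixel-level disparity hypotheses per pixel, which is the growth that a resolution-independent representation would avoid.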

Abstract [sv]

Autonomous agents depend on information from the surrounding environment in order to act. Among the many sensors available, the image sensor is probably the most versatile, since it makes it possible to distinguish colour, size, shape, and depth. For the latter, in a dynamic environment with no requirement for prior knowledge, stereo vision is a commonly applied technique. Interpreting image content and extracting relevant information is known as computer vision. Computer vision, and stereo algorithms in particular, are complex and computationally expensive even for a single image pair, and yield results that, in terms of accuracy, are qualitatively difficult to compare. The problem is compounded by a continuous stream of images with ever higher frame rates and resolutions. For autonomous agents, considerations regarding real-time requirements, embedded/resource-limited computing platforms, power consumption, and physical size are also needed, which adds up to an unambiguously challenging problem.

This thesis aims to achieve high-resolution stereo vision with a high update rate and low latency on embedded systems. By approaching the problem from two different angles, hardware and algorithms, a symbiotic relationship between them can be ensured. The first contribution of the thesis is the GIMME and GIMME2 embedded vision platforms, which offer FPGA-based hardware acceleration with a particular focus on stereo vision, in contrast to the commercially available systems of the time. The second contribution, concerning stereo algorithms, is twofold. First, the scalability problem, which is tied to the disparity range, is addressed by proposing a segment-based stereo algorithm. In segment space, matching is independent of image resolution, and the disparity range is defined in terms of segments, which suggests that relatively few hypotheses are needed to cover the entire scene. In the second algorithm-level contribution, more in line with conventional stereo algorithms for FPGAs, the Census Transform (CT) has been identified as a recurring cost metric for similarity. Here, an optimisation of the CT by applying a genetic algorithm (GA) is proposed: the Genetic Algorithm Census Transform (GACT). GACT shows promising results on benchmark datasets compared with established CT methods, while also being resource efficient.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2020.
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 304
Keywords [en]
Computer vision, stereo vision, FPGA, embedded systems
National Category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:mdh:diva-46240 ISBN: 978-91-7485-453-4 (print) OAI: oai:DiVA.org:mdh-46240 DiVA, id: diva2:1375116
Public defence
2020-01-28, Kappa, Mälardalen University, Västerås, 09:15 (English)
Opponent
Supervisor
Available from: 2019-12-04 Created: 2019-12-04 Last updated: 2020-01-10 Bibliographically approved
List of papers
1. GIMME - A General Image Multiview Manipulation Engine
2011 (English) In: Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig 2011), Los Alamitos, Calif: IEEE Computer Society, 2011. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents GIMME (General Image Multiview Manipulation Engine), a highly flexible reconfigurable stand-alone mobile two-camera vision platform with stereo-vision capability. GIMME relies on reconfigurable hardware (FPGA) to perform application-specific low to medium-level image-processing at video rate. The Qseven-extension enables additional processing power. Thanks to its compact design, low power consumption and standardized interfaces (power and communication), GIMME is an ideal vision platform for autonomous and mobile robot applications.

Place, publisher, year, edition, pages
Los Alamitos, Calif: IEEE Computer Society, 2011
Identifiers
urn:nbn:se:mdh:diva-13576 (URN) 10.1109/ReConFig.2011.44 (DOI) 2-s2.0-84856884110 (Scopus ID) 978-076954551-6 (ISBN)
Conference
2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011; Cancun, Quintana Roo; 30 November 2011 through 2 December 2011
Available from: 2011-12-15 Created: 2011-12-15 Last updated: 2019-12-04 Bibliographically approved
2. Towards an Embedded Real-Time High Resolution Vision System
2014 (English) In: ADVANCES IN VISUAL COMPUTING (ISVC 2014), PT II / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; McMahan, R; Jerald, J; Zhang, H; Drucker, SM; Kambhamettu, C; ElChoubassi, M; Deng, Z; Carlson, M, SPRINGER-VERLAG BERLIN, 2014, p. 541-550. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes an approach to image processing for high performance vision systems. Focus is on achieving a scalable method for real-time disparity estimation which can support high resolution images and large disparity ranges. The presented implementation is a non-local matching approach building on the innate qualities of the processing platform which, through utilization of a heterogeneous system, combines low-complexity approaches into performing a high-complexity task. The complementary platform composition allows for the FPGA to reduce the amount of data to the CPU while at the same time promoting the available informational content, thus both reducing the workload as well as raising the level of abstraction. Together with the low resource utilization, this allows for the approach to be designed to support advanced functionality in order to qualify as part of unified image processing in an embedded system.

Place, publisher, year, edition, pages
SPRINGER-VERLAG BERLIN, 2014
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8888
National Category
Identifiers
urn:nbn:se:mdh:diva-38383 (URN) 000354700300052 () 2-s2.0-84916625525 (Scopus ID) 978-3-319-14364-4 (ISBN)
Conference
10th International Symposium on Visual Computing (ISVC), DEC 08-10, 2014, Las Vegas, NV
Available from: 2018-02-12 Created: 2018-02-12 Last updated: 2019-12-04 Bibliographically approved
3. GIMME2 - An embedded system for stereo vision and processing of megapixel images with FPGA-acceleration
2015 (English) In: 2015 International Conference on ReConFigurable Computing and FPGAs, ReConFig 2015, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents GIMME2, an embedded stereovision system, designed to be compact, power efficient, cost effective, and high performing in the area of image processing. GIMME2 features two 10 megapixel image sensors and a Xilinx Zynq, which combines FPGA-fabric with a dual-core ARM CPU on a single chip. This enables GIMME2 to process video-rate megapixel image streams in real time, exploiting the benefits of heterogeneous processing.

Keywords
Cost effectiveness, Field programmable gate arrays (FPGA), Image processing, Pixels, Reconfigurable architectures, Reconfigurable hardware, Stereo vision, Video signal processing, Cost effective, FPGA fabric, Heterogeneous processing, Image streams, Power efficient, Process video, Single chips, Stereo-vision system, Stereo image processing
National Category
Identifiers
urn:nbn:se:mdh:diva-31587 (URN) 10.1109/ReConFig.2015.7393318 (DOI) 000380437700038 () 2-s2.0-84964335178 (Scopus ID) 9781467394062 (ISBN)
Conference
International Conference on ReConFigurable Computing and FPGAs, ReConFig 2015, 7 December 2015 through 9 December 2015
Available from: 2016-05-13 Created: 2016-05-13 Last updated: 2019-12-04 Bibliographically approved
4. Unbounded Sparse Census Transform using Genetic Algorithm
2019 (English) In: 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), IEEE, 2019, p. 1616-1625. Conference paper, Published paper (Refereed)
Abstract [en]

The Census Transform (CT) is a well proven method for stereo vision that provides robust matching, with respect to object boundaries, outliers and radiometric distortion, at a low computational cost. Recent CT methods propose patterns for pixel comparison and sparsity, to increase matching accuracy and reduce resource requirements. However, these methods are bounded with respect to symmetry and/or edge length. In this paper, a Genetic algorithm (GA) is applied to find a new and powerful CT method. The proposed method, Genetic Algorithm Census Transform (GACT), is compared with the established CT methods, showing better results for benchmarking datasets. Additional experiments have been performed to study the search space and the correlation between training and evaluation data.
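
For readers unfamiliar with the Census Transform, the following is a minimal pure-Python sketch of the general technique: each pixel is described by a bit string of neighbour-versus-centre comparisons, and two descriptors are compared by Hamming distance. The function names, the winner-takes-all search, and both comparison patterns are assumptions made for illustration; in particular, the sparse pattern shown is hypothetical and is not the pattern found by GACT.

```python
def census_transform(image, offsets):
    """Census descriptor (as an int) for every pixel of a grey-scale image.

    image is a list of rows; offsets is a list of (dy, dx) comparison points
    relative to the centre pixel. A bit is 1 when the neighbour is darker
    than the centre. Border pixels without a full neighbourhood are left at 0.
    """
    height, width = len(image), len(image[0])
    margin_y = max(abs(dy) for dy, _ in offsets)
    margin_x = max(abs(dx) for _, dx in offsets)
    out = [[0] * width for _ in range(height)]
    for y in range(margin_y, height - margin_y):
        for x in range(margin_x, width - margin_x):
            centre = image[y][x]
            bits = 0
            for dy, dx in offsets:
                bits = (bits << 1) | (1 if image[y + dy][x + dx] < centre else 0)
            out[y][x] = bits
    return out


def hamming(a, b):
    """Matching cost: number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


def best_disparity(left_ct, right_ct, y, x, max_disparity):
    """Winner-takes-all: the disparity d that minimises the Hamming cost
    between left pixel (y, x) and right pixel (y, x - d)."""
    costs = [hamming(left_ct[y][x], right_ct[y][x - d])
             for d in range(min(max_disparity, x) + 1)]
    return costs.index(min(costs))


# Dense 3x3 pattern: all eight neighbours of the centre pixel.
DENSE_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
# Hypothetical sparse pattern: four corner points of a 5x5 window.
SPARSE_5X5_CORNERS = [(-2, -2), (-2, 2), (2, -2), (2, 2)]

descriptors = census_transform([[1, 2, 3], [4, 5, 6], [7, 8, 9]], DENSE_3X3)
print(bin(descriptors[1][1]))
```

A sparse pattern shortens the descriptor, so fewer comparisons and fewer bits per Hamming distance are needed, which is the kind of resource saving the abstract refers to.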

Place, publisher, year, edition, pages
IEEE, 2019
Series
IEEE Winter Conference on Applications of Computer Vision, ISSN 2472-6737
National Category
Identifiers
urn:nbn:se:mdh:diva-44332 (URN) 10.1109/WACV.2019.00177 (DOI) 000469423400170 () 2-s2.0-85063571752 (Scopus ID) 978-1-7281-1975-5 (ISBN)
Conference
19th IEEE Winter Conference on Applications of Computer Vision (WACV), JAN 07-11, 2019, Waikoloa Village, HI
Available from: 2019-06-20 Created: 2019-06-20 Last updated: 2019-12-18 Bibliographically approved
5. The Genetic Algorithm Census Transform
(English) Manuscript (preprint) (Other academic)
National Category
Identifiers
urn:nbn:se:mdh:diva-46244 (URN)
Available from: 2019-12-04 Created: 2019-12-04 Last updated: 2019-12-04 Bibliographically approved

Open Access in DiVA

fulltext (17119 kB) 53 downloads
File information
File name FULLTEXT03.pdf File size 17119 kB Checksum SHA-512
56a150295546cf621c691e48d5898a365c39adae05a07f05cd6e971da1d2d98253b328f5531e99602792cb25b54a7c4daf69eca778172af81ec4c6c89abe2404
Type fulltext Mimetype application/pdf

Person records

Ahlberg, Carl

Total: 59 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.
