# This file is a demo of using the aiocsv and aiofiles libraries to speed up reading and parsing CSV files.
#
# Start reading this code from the entrypoint function main() below.
#
import asyncio

import aiofiles
from csv import QUOTE_NONNUMERIC
from typing import AsyncGenerator
from aiocsv import AsyncDictWriter, AsyncDictReader


async def read_lines(file: str) -> AsyncGenerator[dict, None]:
    """
    Read lines from CSV file.
    """
    async with aiofiles.open(file, "r") as afp:
        async for row in AsyncDictReader(afp, delimiter=","):
            yield row


async def parse_lines(generator: AsyncGenerator[dict, None]) -> AsyncGenerator[dict, None]:
    """
    Parse lines from generator.
    """
    async for line in generator:
        # do some parsing here, like that:
        line = line
        yield line


async def save_lines(file: str, generator: AsyncGenerator[dict, None]):
    """
    Save lines from generator to CSV file.
    """
    async with aiofiles.open(
        file,
        mode="w",
        encoding="utf-8",
        newline="",
    ) as afp:
        rows = []
        writer = None
        async for item in generator:
            if writer is None:
                # build the header row from the keys of the first item
                header = list(item.keys())
                writer = AsyncDictWriter(
                    afp,
                    header,
                    quoting=QUOTE_NONNUMERIC,
                )
                await writer.writeheader()
            # gather rows into a list
            # keep the list size reasonable according to your memory constraints
            rows.append(item)
            if len(rows) % 10000 == 0:
                await writer.writerows(rows)
                rows = []
                await afp.flush()
        # write the rest of the rows if any
        if len(rows) > 0:
            await writer.writerows(rows)


async def main(in_file, out_file):
    """
    Main function that reads lines from in_file, parses them and saves to out_file.
    """
    raw_line_generator = read_lines(in_file)
    parsed_line_generator = parse_lines(generator=raw_line_generator)
    await save_lines(file=out_file, generator=parsed_line_generator)


in_file = "some_input_file.csv"
out_file = "some_output_file.csv"

asyncio.run(main(in_file, out_file))
Foreword about addiction
I’m an addict and I can’t help myself. At least not in an easy way. I’m addicted to reading the idiotic, moronic, hateful, homophobic comments posted by postimees.ee readers on the Postimees website. FYI – Postimees is one of the oldest and biggest newspapers in Estonia.
That’s why I was looking for a way to remove the comments – or at least the deceptive links to comments next to every article – from the postimees.ee website.
The solution only works in Firefox, because that’s my main browser.
Let’s get our hands dirty
First, open about:config (type it into the address field). The Firefox config opens after a warning.
Find the key
toolkit.legacyUserProfileCustomizations.stylesheets
Make sure the value is “true” (double-click the value to toggle it). Close the config.
Now open the profiles config.
Type about:profiles into the address field.
Find your profile (not the development one); there should be a “Root Directory” row. At the end of that line is a button “Open in Finder” (or open I-dont-know-where in Windows). Whatever, just click it.
Your Firefox profile folder opens.
Inside that folder create a new folder named “chrome” (mind the lowercase name, case matters!).
Inside that “chrome” folder create an empty text file called “userContent.css”. Again – mind the naming.
Into that file add the following lines:
@-moz-document domain(postimees.ee) {
    span.list-article__comment {
        display: none;
    }
}
Save and close the file. Restart Firefox. Go to postimees.ee. Welcome to your new life!
Add the following aliases to your .bash_profile:
alias socks_on="ssh -D 8666 -C -N -f -M -S ~/.socks.socket $USER@<your_office_gateway>; networksetup -setsocksfirewallproxystate Wi-Fi on;"
alias socks_off="networksetup -setsocksfirewallproxystate Wi-Fi off; ssh -S ~/.socks.socket -O exit $USER@<your_office_gateway>;"
Later you can start your tunnel with the command
socks_on
and stop it with
socks_off
Have you ever tried to return custom HTTP headers from your SailsJS backend REST API to your frontend AngularJS application and wondered why they don’t show up in AngularJS?
I had a pretty standard case where I wanted to implement server-side pagination for the data sets returned by the API. For that you need to return the total number of records in order to implement pagination properly in the frontend. I decided to return the total number of records in a custom header called “X-TotalRecords”. It is returned together with the response, but it didn’t show up in the AngularJS response:
.....
.then(function (response) {
    $log.debug(response.headers()); // does not show my custom header
})
.....
After some googling around I found a solution. You need to create a custom SailsJS policy and send a special header, “Access-Control-Expose-Headers”, from there. Let’s call the policy sendCorsHeaders.
Create a file sendCorsHeaders.js in the api/policies/ folder:
module.exports = function (req, res, next) {
    res.header('Access-Control-Expose-Headers', sails.config.cors.headers);
    next();
};
As you can see, it re-uses the headers defined in your cors.js under the config/ folder.
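One wiring note: the policy only runs once it is mapped to your controllers. A minimal sketch of config/policies.js, assuming you want the header added to every action (narrow the mapping if you only need it on some controllers):

module.exports.policies = {
    // run sendCorsHeaders before every controller action
    '*': ['sendCorsHeaders']
};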
From now on you can retrieve your custom header in AngularJS $http service.
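For example, here is a minimal sketch of the client side. The /api/records URL and the pagination parameters are hypothetical – only the X-TotalRecords header comes from the case above – and $http and $log are assumed to be injected as usual:

$http.get('/api/records', { params: { page: 1, limit: 25 } })
    .then(function (response) {
        // response.headers() returns all headers; called with a name it
        // returns that single header (names are case-insensitive)
        var total = parseInt(response.headers('X-TotalRecords'), 10);
        $log.debug('total records for the paginator:', total);
    });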
I just struggled with a complex problem of uploading application/bdoc (digital signature container) files to a SailsJS app, and I want to share my story. I hope it will make life easier for those who are working with DigiDoc and Signwise.
We at Prototypely are creating a solution that heavily uses digital signatures. Signwise is the preferred partner for handling the containers and the signing process. In the Signwise process they create the container, and their system then makes an HTTP PUT request to the target system to put the newly created container back.
Standard file uploads are handled very nicely in SailsJS by the great Skipper library.
However, when it comes to uploading quite rare MIME types like application/bdoc or application/x-bdoc, some tweaking is needed.
Open config/http.js and add a custom body parser there, and you’ll be able to accept BDOC files:
bodyParser: function (options) {
    return function (req, res, next) {
        if (req.get('content-type') != 'application/bdoc') {
            return next();
        }
        var bodyParser = require('body-parser').raw({ type: 'application/bdoc' });
        return bodyParser(req, res, next);
    };
}
After that you’ll be able to save the file in your controller. Mind the req.body – it is the raw buffer that will be written to disk.
// fs is assumed to be required at the top of the controller: var fs = require('fs');
acceptBdocFile: function (req, res) {
    var fileId = req.param('fileId');
    var tmpFile = process.cwd() + '/.tmp/' + fileId;
    fs.writeFileSync(tmpFile, req.body);
    return res.status(201).json();
}
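To test the whole chain without Signwise, you can simulate their PUT with a few lines of plain Node. This is just a sketch under assumptions: the /file/acceptBdocFile path is hypothetical (bind your real route to the action above in config/routes.js), and Sails is listening on its default port 1337:

// put_bdoc_test.js – hypothetical smoke test for the controller action above
var fs = require('fs');
var http = require('http');

var body = fs.readFileSync('container.bdoc'); // any local .bdoc file

var req = http.request({
    method: 'PUT',
    host: 'localhost',
    port: 1337, // Sails' default port
    path: '/file/acceptBdocFile?fileId=test-1', // hypothetical route
    headers: {
        'Content-Type': 'application/bdoc', // must match the custom body parser
        'Content-Length': body.length
    }
}, function (res) {
    console.log('status:', res.statusCode); // expect 201 from the controller
});

req.write(body);
req.end();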
Sometimes Magento gets stuck in “maintenance mode”. It means that there is a maintenance.flag file in Magento’s root folder.
The standard maintenance mode of Magento is a bit “too universal” – it puts the Magento backend (admin) into maintenance mode as well. And once you’re in maintenance mode, it’s hard to get out of it if you don’t have access to the server’s shell.
Anyway – there is one option if you have not removed Magento Connect Manager (a.k.a. /downloader). That program is not blocked by the maintenance.flag file. Log in to Connect Manager at /downloader and check/uncheck the maintenance mode checkbox “” there.
That’s it.
Timezones are … difficult. I can say that based on my >20 years of programming experience. They pop up here and there and cause a good amount of headache. I won’t spend too much time on timezones here, but I’ll give a quick tip on how to make your SailsJS (or any NodeJS) app use the UTC (GMT) timezone by default.
Over the years I’ve learned that, as a rule of thumb, it’s best to keep everything in UTC in the business and DB layers (there are exceptions, of course).
It’s really simple to make your NodeJS app use UTC as the default timezone. Just export an environment variable before you run your app:
export TZ="UTC"
forever --watchDirectory ./ -l logs/log.log --watch app.js
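If exporting the variable isn’t an option (some hosting setups don’t let you touch the environment), the same idea can be attempted from inside the app. A minimal sketch – with the caveat that setting process.env.TZ at runtime is known to work on Linux and macOS but is unreliable on Windows, so the exported variable above remains the safer route:

// the very first lines of app.js, before anything touches dates
process.env.TZ = 'UTC';

console.log(new Date().toString()); // should now print a UTC/GMT-based time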
When you’re packaging Magento extensions in the Magento admin and want to ignore (or include) a file or directory, there’s a special syntax for it. Let’s say you want to exclude the folder “tests” from the tgz package. Number signs (#) are used as wildcard placeholders. Add the following line to the “Ignore” field:
#tests/#
Your tests folder will be excluded from the tgz package.
Foreword
We all know what garbage and littering are. We know it, we feel it, we see it, we – people with good Kinderstube – despise it and fight it. We clean it up.
I feel the same way about digital garbage. Do you – a fellow e-citizen with good manners in the digital environment – feel it, too?
I feel there is too much digital garbage. It’s everywhere; it suffocates me. I feel really bad when I see people creating more and more useless, excessive data every day.
Painful experience
I used to work at a big corporation for many years. As in every big company, this company hosts some morons, too, who make themselves useful by creating PowerPoint slideshows of tens and tens of megabytes and then spreading these files by e-mail to present often outdated and useless information to colleagues or customers. Nobody raises an eyebrow, because it’s normal there. I’m absolutely sure that this big corporation is not an exception. It’s the rule. It happens everywhere, in almost every company.
It does not happen in my company. Otherwise I’m pretty open-minded and tolerant, but I do not tolerate digital littering at MageFlow. It’s a clearly stated policy, and it’s repeated over and over again.
For me it comes down to three things: skills, ethics and energy.
Please continue reading if you care about a cleaner e-nvironment.
Skills
Most people don’t have the skills to behave correctly in the digital environment. They’re like young calves in the spring. Nobody has told them how to handle data properly, without creating another and another and another useless copy of it.
I’ve brought up this example before, but I guess it’s good enough to repeat here. Please do a little math for me now and tell me: how many copies – and how many megabytes – of one 1-megabyte file will there be if you send that file to 2 of your friends by e-mail? 1? 2? 3? 5?
The correct answer is at least 4, assuming your friends don’t save it to their hard disks, and not counting all the possible e-mail servers that may or may not keep an additional copy for whatever reason.
How’s that possible then? Here’s how:
1 – the original file on your computer’s hard disk
2 – a new copy attached to the e-mail in your sent mail folder
3 – a new copy in your friend #1’s inbox
4 – a new copy in your friend #2’s inbox
Is that enough copies for you? For me it’s 3 too many. The files are stored somewhere, and the files will stay stored somewhere. Forever – I tend to think nowadays. There’s also the question of file versions, integrity, consistency. I mean – can you tell me now which version of those 4 files is THE correct one, the master version? Can you? I can’t!
Ethics
Actually I think it’s unethical and unfair to litter other people’s digital space, the same way it’s unethical and even criminal to litter other people’s physical property. It’s not right to make other people buy more and more storage because you cannot send links instead of files, or cannot use streaming instead of downloading.
It’s like littering someone’s backyard. You don’t do that IRL. Why should you do it on the Internet? However, it’s not always that simple.
Sometimes I send images of my kid to my mom as attachments, because I know she would otherwise call me and ask whether she should open that (whateverish *box or *drive) link in that e-mail, or is it a virus, or … Moooommmm!!! Oehhh …
Eventually it’s about skills. It’s about education. It’s about experience.
We – the responsible and aware e-citizens – should teach the less knowledgeable. Be it our parents or our brothers and sisters, we need to teach them to behave in the modern digital environment – the e-nvironment. There are do’s and don’ts exactly like in the real world. It is our responsibility to spread the word and to act as role models. It takes a lot of patience, though.
Energy
There are also energy questions. Maybe those more at home in physics or information technology know the answers already, but I don’t. Feel free to comment if I’m wrong here.
Anyway – we spend energy, a lot of energy, on storing data, the bits and bytes, on different types of storage. What happens to that energy once we delete a file? Is it freed? Where does it go? How do we catch it, how do we reuse it? I mean – almost metaphysically – what happens to that information that was just there, and then it’s gone? Where did it go? What did it become?
¯\(º_o)/¯
Solution
The wrong way to handle data is to create ever more copies of it that are possibly false and outdated. The right way to handle data is to maintain an original and enable others to access it. Thanks to all gods – Odin and fellows included – there are tons of sharing solutions nowadays. It hasn’t always been that way. And it’s still not the case in corporate networks, because the big corporations are still shitting their pants when they hear words like “cloud”, “sharing”, “openness” and so on. They have their reasons, but it doesn’t change the fact.
The right way to act in the movement towards a better world with less digital garbage is to lead by example. Act as a role model. Refuse to send a file by e-mail if someone asks you to do so. Politely explain your reasons and offer an alternative – sharing. Secure sharing, if necessary.
Become an ambassador of a clean e-nvironment and establish a policy for handling data at your workplace. Start small but start smart. Spread the word and explain the reasons. Be patient.
Final word
A huge amount of data is downloaded from the Internet every day for entertainment or other reasons – be it movies as torrents (legal or illegal, it doesn’t change the fact or the amount of data), MP3s or e-books. Don’t be part of that madness! Avoid unnecessary copies. Use streaming and sharing instead.
Can you imagine everyone who consumes electricity from their wall outlets being forced to store that energy somewhere at home? I can! That is exactly what downloading reminds me of: lots and lots of energy downloaded and wasted, instead of just letting it flow through and catching your part of the flow.
Imagine a wind turbine working in – and because of – the flow of air, versus a very big bag that is held against the wind until it’s full and then carried indoors to where the turbine sits. There the bag is squeezed empty against the turbine to make it work, and again and again and again … Sounds stupid, right?
Here’s the situation.
MageFlow is developing a Magento extension called MageFlow Connector. It’s an integration extension that is used to connect a Magento instance to MageFlow. We use agile development processes and state-of-the-art tools by Atlassian, like JIRA, Confluence, Bitbucket, Bamboo and others. Bamboo is used as a build and deployment server that automates a lot of development operations, including building and packaging our software.
Packaging Magento extensions – standard method
Usually Magento extensions are packaged in the Magento backend. There’s a form that needs to be filled in with information about the specific extension: version number, release notes, files and folders that should be packaged, etc. After that you need to upload your extension to Magento Connect and publish it.
It’s a pretty good tool, but it requires manual work to create a package. In our case, however, we just despise any manual work :) Everything should be automated to the max. That’s also why we started creating MageFlow in the first place. Anyway …
Packaging Magento extensions with Bamboo CI
Packaging an extension with Bamboo requires some manual labor, too. That cannot be avoided, in my opinion. However …
Here’s my wild guess … With a really tight integration between JIRA and Bamboo and some tricks/scripts, full automation could be achieved all the way from clicking “Create release” in JIRA to a fully packaged extension as an artifact of a Bamboo build. But right now it’s a bit too much work for too little gain.
But – keeping my head cold and not aiming for 100% automation yet – I came up with the following way to package an extension with Bamboo CI.
Prerequisites
GIT. For extension development we also use modman – a very useful tool for every Magento developer, provided by Colin Mollenhour. It is also assumed that you have a development environment ready on your local machine, or elsewhere, where you can prepare the package metainfo.
You also need a (preferably vanilla) Magento (DevMagento) codebase in a GIT repository, and you need YourExtension in a GIT repository. Make sure your DevMagento works fine and you can log in to the backend (admin).
Let’s name these GIT repositories like this, for example:
DevMagento – git@bitbucket.com:myaccount/magento.git
MyExtension – git@bitbucket.com:myaccount/my_extension.git
Prepare Magento extension and modman
First you need to add MyExtension to DevMagento by using modman. In the terminal, do:
cd /srv/vhosts/magento
modman init
modman clone git@bitbucket.com:myaccount/my_extension.git
Second, you need to create the MyExtension XML metadata for the Magento Connect package. This is done with the Magento Connect tool in the Magento backend. Go to System->Magento Connect->Package Extensions. Fill in all the required fields there: version, release notes, files and folders, authors, dependencies, etc. Click the “Save” button.
Now back to the terminal:
cd /srv/vhosts/magento
ls public/var/connect
You’ll see 2 very important files there now:
MyExtension.xml and package.xml
A mind-note here: package.xml will be overwritten if you package another extension in the backend of the same Magento while on the same GIT branch.
Now create a var/connect folder in your modmanned extension folder and move the files there:
cd /srv/vhosts/magento/.modman/my_extension
mkdir -p var/connect
mv ../../public/var/connect/MyExtension.xml ../../public/var/connect/package.xml var/connect/
Add these files to GIT and to your modman file.
Run modman update to re-create these files as symlinks under Magento’s var/connect folder:
cd /srv/vhosts/magento
modman update my_extension
Run a packaging test from the command line to see if it works:
cd /srv/vhosts/magento/public
./mage package /srv/vhosts/magento/public/var/connect/package.xml
Done building package
It’s needless to say that you need to “git add”, “git commit” and “git push” your stuff, etc. Now we’re almost ready to make Bamboo CI create MyExtension-1.0.0.tgz. Also – once you finish the next paragraph about configuring Bamboo, see the paragraph below about a bug in Magento’s packaging scripts.
Configure Bamboo CI
We are using a standalone Bamboo CI on our own server. I’m not sure whether everything looks the same in the OnDemand solution …
Running unit tests on your Magento extension is not covered by this post. However I’m planning another post about this topic, too.
Install modman on your build server and add it as a Bamboo executable, if you haven’t done so yet.
In Bamboo, create a new build plan called My Extension with the plan key MYX. (Bamboo’s upper-case plan keys – and thus build paths with upper-case characters – are where the Magento bug will strike with its forced lower-case paths.)
Add 2 repositories to your Build plan:
- DevMagento repository
- MyExtension repository
Set your build plan to be triggered by periodically scanning the MyExtension repository for changes.
Configure build steps
Step 1: check out the source code from the DevMagento repository. You may want to check out a clean repo each time – it’s quite quick, and you don’t build the extension that often. The reason is that otherwise the next step will complain.
Step 2: run modman init, i.e. choose the modman executable and add “init” as the command argument.
Step 3: run modman clone. Choose the modman executable and add “clone git@bitbucket.org:myaccount/my_extension.git” as the command argument.
Step 4: add a script as a build step, with ${bamboo.build.working.directory} as an argument to that script. The script is provided in this gist:
This script does 2 things:
- it runs the mage script that creates the Magento package
- it moves the created tgz package from the modmanned extension’s var/connect folder to Magento’s var/connect folder. The point is that Magento’s Package.php, which creates the package, uses the PHP realpath() function. realpath() resolves the symlinks created by modman, and that’s why the .tgz package is created under .modman/my_extension/var/connect, not under public/var/connect. (A hypothetical sketch of such a script follows below.)
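The gist itself isn’t reproduced here, but based on the description above, a hypothetical equivalent could look like this small Node script (the my_extension name and the folder layout from the previous steps are assumptions):

// package_extension.js – hypothetical sketch of the build-step script
// usage: node package_extension.js <bamboo build working directory>
var path = require('path');
var execSync = require('child_process').execSync;

var workDir = process.argv[2]; // ${bamboo.build.working.directory}
var magentoRoot = path.join(workDir, 'public');

// 1) run the mage script that builds the package
execSync('./mage package ' + path.join(magentoRoot, 'var/connect/package.xml'),
    { cwd: magentoRoot, stdio: 'inherit' });

// 2) move the .tgz from the modmanned extension's var/connect
//    (where realpath() put it) to Magento's public/var/connect
execSync('mv ' + path.join(workDir, '.modman/my_extension/var/connect/*.tgz') + ' '
    + path.join(magentoRoot, 'var/connect/'),
    { stdio: 'inherit' });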
After configuring the build steps, configure your build artifacts. Specify public/var/connect/ as the artifact location and *.tgz as the copy pattern. You can make the artifacts shared so that a Bamboo deployment plan can pick them up and deploy them to – let’s say – your test server ;) But that’s a whole other story …
Thanks!
Bug in Magento packaging scripts
NB! There’s a bug/misdesign in the Magento packaging scripts. The solution is provided as a gist:
You may need to fix it in 2 places: under lib and under downloader. That’s why I recommended using a vanilla Magento repository – you need to fix core library files. Create a special Magento repository, fix the bug there, and use that Magento for packaging your extensions.
Alternatives
Alan Storm has created a good piece of software that enables creating a Magento extension package from the command line. However, I found it too late, and I’m not sure it’s a 100% match for my problem.
Summary
It’s a pretty long post that covers quite complex issues. Feel free to leave a comment here or write to me directly if you have any questions or thoughts about it.
Update: please see Part 2 of the same topic.