Discussion:
[fonc] x86_64...
Michael Haupt
2008-12-04 17:04:41 UTC
Permalink
Hi,

today I migrated to a new machine at work, a nice and fast dual-core
64 bit box. Building COLA works fine as long as only idc is concerned;
building jolt blows up (obviously):

-----
cp: cannot stat 'CodeGenerator-x86_64.st': No such file or directory
-----

I found an e-mail from Martin McClure saying that he was working on
code generation support for x86_64 (in August). Has there been some
progress?

Is there *any* way for me to make it work (32 bit chroots aside)?
Please don't say I have to implement the code generation myself, I
don't quite feel competent. ;-)

Best,

Michael
--
Dr.-Ing. Michael Haupt ***@hpi.uni-potsdam.de
Software Architecture Group Phone: ++49 (0) 331-5509-542
Hasso Plattner Institute for Fax: ++49 (0) 331-5509-229
Software Systems Engineering http://www.hpi.uni-potsdam.de/swa/
Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam, Germany

Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam
Amtsgericht Potsdam, HRB 12184
Geschäftsführung: Prof. Dr. Christoph Meinel
SainTiss
2008-12-04 17:32:26 UTC
Permalink
Hi Michael,

I know this is of no help, but on-topic at least:

I actually tried to run jolt in a 32-bit chroot once, and even that didn't
work. It compiled all right, but running it would crash immediately. I assume
that this is because of hard-coded assumptions on 32-bit addressing. (e.g.
there are often +4 occurrences in the jolt code, which would perhaps need to
become +8?)
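The word-size assumption Hans is guessing at can be illustrated outside of jolt (this is a generic sketch, not jolt code): any offset hard-coded as +4 that really means "the size of one pointer" silently breaks on a 64-bit build.

```python
import ctypes

# sizeof(void*) is 4 on a 32-bit (ILP32) build and 8 on x86_64 (LP64),
# so a literal "+4" that means "skip one pointer" corrupts addressing
# as soon as the code is rebuilt for 64 bits.
ptr_size = ctypes.sizeof(ctypes.c_void_p)
print(ptr_size)          # 4 on a 32-bit build, 8 on a 64-bit build
assert ptr_size in (4, 8)
```

Whether that is actually what crashed jolt in the chroot is a separate question (see Martin's reply about execute permissions).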

Cheers,

Hans
Post by Michael Haupt
Hi,
today I migrated to a new machine at work, a nice and fast dual-core
64 bit box. Building COLA works fine as long as only idc is concerned;
-----
cp: cannot stat 'CodeGenerator-x86_64.st': No such file or directory
-----
I found an e-mail from Martin McClure saying that he was working on
code generation support for x86_64 (in August). Has there been some
progress?
Is there *any* way for me to make it work (32 bit chroots aside)?
Please don't say I have to implement the code generation myself, I
don't quite feel competent. ;-)
Best,
Michael
--
If we cannot live so as to be happy, let us at least live so as to deserve it
-- Immanuel Hermann Fichte

A liberal is a person whose interests aren't at stake at the moment
-- Willis Player

Ark Linux - Linux for the Masses (http://arklinux.org)

Dr. Hans Schippers
Teaching Assistant
Formal Techniques in Software Engineering (FoTS)
University of Antwerp
Middelheimlaan 1
2020 Antwerpen - Belgium
Phone: +32 3 265 37 88
Fax: +32 3 265 37 77
Martin McClure
2008-12-04 18:20:40 UTC
Permalink
Hi,
today I migrated to a new machine at work, a nice and fast dual-core 64
bit box. Building COLA works fine as long as only idc is concerned;
-----
cp: cannot stat 'CodeGenerator-x86_64.st': No such file or directory
-----
I found an e-mail from Martin McClure saying that he was working on code
generation support for x86_64 (in August). Has there been some progress?
Is there *any* way for me to make it work (32 bit chroots aside)? Please
don't say I have to implement the code generation myself, I don't quite
feel competent. ;-)
Hi Michael,

So far, I've made very small progress on the x86-64 code generator
(limited mostly by lack of time to work on it). I'm still interested in
moving forward with this.

Since all of my machines are x86_64, when not working on the code
generator I've been working in 32-bit chroots, which seems to work well
in general. The exception is jolt. It ran into problems with execution
permission not being set on the memory into which it was placing the
generated code. I got jolt2 to work using the patch I posted in the "Fix
for CPUs with execution protection" thread on November 3. There are also
other suggested fixes in that thread.
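The failure Martin describes, generated code landing in memory without execute permission, can be reproduced and avoided at the mmap level. The sketch below only illustrates the general NX/W^X pattern; it is not the actual patch from that thread, and it assumes Linux on x86-64:

```python
import ctypes
import mmap

# A JIT must map its code buffer with PROT_EXEC; on NX-capable CPUs,
# jumping into a page that lacks it dies with SIGSEGV (the symptom
# described above).
buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(b"\xb8\x2a\x00\x00\x00\xc3")      # x86-64: mov eax, 42 ; ret
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
fn = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
print(fn())
```

On hardened kernels a writable-and-executable mapping may itself be refused; the usual refinement is to write the code first and then flip the page to read+execute with mprotect.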

Regards,

-Martin
Michael Haupt
2008-12-05 07:31:02 UTC
Permalink
Hi Martin,
Post by Martin McClure
Since all of my machines are x86_64, when not working on the code
generator I've been working in 32-bit chroots, which seems to work
well in general. The exception is jolt. It ran into problems with
execution permission not being set on the memory into which it was
placing the generated code. I got jolt2 to work using the patch I
posted in the "Fix for CPUs with execution protection" thread on
November 3. There are also other suggested fixes in that thread.
thanks, I will look at that thread. I've been browsing the fonc mails,
but did not come across this one, so extra thanks for the pointer. :-)

Best,

Michael
--
Dr.-Ing. Michael Haupt ***@hpi.uni-potsdam.de
Software Architecture Group Phone: ++49 (0) 331-5509-542
Hasso Plattner Institute for Fax: ++49 (0) 331-5509-229
Software Systems Engineering http://www.hpi.uni-potsdam.de/swa/
Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam, Germany

Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam
Amtsgericht Potsdam, HRB 12184
Geschäftsführung: Prof. Dr. Christoph Meinel
John Leuner
2008-12-04 18:44:00 UTC
Permalink
Post by Michael Haupt
Hi,
today I migrated to a new machine at work, a nice and fast dual-core
64 bit box. Building COLA works fine as long as only idc is concerned;
-----
cp: cannot stat 'CodeGenerator-x86_64.st': No such file or directory
-----
I found an e-mail from Martin McClure saying that he was working on
code generation support for x86_64 (in August). Has there been some
progress?
Is there *any* way for me to make it work (32 bit chroots aside)?
Please don't say I have to implement the code generation myself, I
don't quite feel competent. ;-)
I do all my development on a 64-bit machine, but I generate 32-bit ELF
object files and 32-bit executables. They run just fine and gdb handles
them fine too.

So if you compiled idc and jolt with flags to create 32-bit binaries it
might work ...
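One shortcut, assuming the Makefiles take their compiler flags from the conventional CFLAGS/LDFLAGS variables rather than hard-coding them (the COLA build files may differ), is to override the flags on the make command line instead of editing every Makefile. A tiny stand-in demonstration of the mechanism:

```shell
# A stand-in Makefile (not from COLA): variables assigned with '=' are
# overridden by command-line definitions, so one invocation can add
# -m32 to every compile and link.
printf 'CFLAGS = -O2\nall:\n\t@echo "cc $(CFLAGS) ..."\n' > /tmp/m32demo.mk

make -f /tmp/m32demo.mk                     # prints: cc -O2 ...
make -f /tmp/m32demo.mk CFLAGS="-O2 -m32"   # prints: cc -O2 -m32 ...
```

If the real Makefiles assign flags with ':=' or embed them directly in rules, the override won't take, and -m32 would need to be added by hand in each file (at link time as well as compile time).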

John
BGB
2008-12-04 20:26:29 UTC
Permalink
----- Original Message -----
From: "John Leuner" <***@subvert-the-dominant-paradigm.net>
To: "Fundamentals of New Computing" <***@vpri.org>
Sent: Friday, December 05, 2008 4:44 AM
Subject: Re: [fonc] x86_64...
Post by John Leuner
Post by Michael Haupt
Hi,
today I migrated to a new machine at work, a nice and fast dual-core
64 bit box. Building COLA works fine as long as only idc is concerned;
-----
cp: cannot stat 'CodeGenerator-x86_64.st': No such file or directory
-----
I found an e-mail from Martin McClure saying that he was working on
code generation support for x86_64 (in August). Has there been some
progress?
Is there *any* way for me to make it work (32 bit chroots aside)?
Please don't say I have to implement the code generation myself, I
don't quite feel competent. ;-)
I do all my development on a 64-bit machine, but I generate 32-bit ELF
object files and 32-bit executables. They run just fine and gdb handles
them fine too.
So if you compiled idc and jolt with flags to create 32-bit binaries it
might work ...
John
all of this prompted me to look at the code for this project...


it is interesting: there is Coke, a Scheme-like language which apparently
interfaces directly with C land. my guess is that the interface is largely
untyped?... (I infer this both from the way it is done and from the presence
of occasional explicit type annotations).

the apparent addition of some elements of Smalltalk-style syntax is also
interesting (granted, I personally have a far easier time understanding
Scheme than ST...).

I have not looked, but presumably people can compile ST to Coke?...


apparently both this project and my effort independently discovered the idea
of having 2 different this/self values (in my case, this was due to the issue
of mixing delegation with class/instance: I have delegation methods which
accept the self which received the original method call, while normal virtual
methods get the self from the object containing the method). beyond this,
however, I suspect the object systems differ notably (my system is based
mostly on a class/instance system and interfaces).
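The two-self idea can be sketched with a minimal Lieberman-style delegation kernel (purely hypothetical code, belonging to neither project): when a message is delegated up the parent chain, the method found in an ancestor still runs with the original receiver bound as self, which is what distinguishes delegation from plain forwarding.

```python
class Proto:
    """Minimal prototype object: slots hold methods; lookups that miss
    are delegated to the parent."""
    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def send(self, name, *args):
        holder = self
        while holder is not None:
            if name in holder.slots:
                # Invoke with the ORIGINAL receiver, not the holder:
                # this is the "second self" that delegation requires.
                return holder.slots[name](self, *args)
            holder = holder.parent
        raise AttributeError(name)

base = Proto(describe=lambda self: "x=%d" % self.send("x"))
child = Proto(parent=base, x=lambda self: 42)
print(child.send("describe"))   # "describe" is found in base, self is child
```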


some of this has implications for my effort:

hmm, I could forsake my recent idea of implementing a PostScript based
mini-language and implement a Scheme based one instead (potentially
borrowing a few ideas from Coke and similar, but adapting them more for use
in my case).

the projects themselves somewhat differ in that mine is written primarily in
C...


for my uses (and also to further my commitment to getting the JVM component
more written), I will probably target the thing to my modified/extended
JVM/JBC (likely, I will target the VM directly, rather than generating class
files and running them through the classloader, since this would save both
time and effort).

most of this VM is written, it just lacks some components (such as exception
handling, ...), and thus far I have made no attempt to make the classpath
work on it.


this will, however, potentially lead to a kind of hack I used before in my
assembler/linker process: namely, to export object files, it actually
translates outward from the internal structure representation.


this would also probably ease retargeting my existing (incomplete)
JavaScript frontend, since I internally use a Scheme-like representation for
the upper end of the compiler.

so, pros (in my case):
Scheme is a much more enjoyable language to use than PostScript (... ever
tried hand-coding all that much PS code?...);
I already have the needed parser, printer, and core types (part of my core
typesystem lib, where I often use S-Exps as an internal data representation
in my projects);
this would be similar to my existing JS frontend, thus reducing the work to
retarget;
could be easily adapted to most other dynamic languages.

cons:
the compilation process from S-Exps to bytecode is more involved than that
from PS to bytecode;
it complicates the path of eventually targeting C to this VM (requires either
changing the upper compiler to produce S-Exp output rather than RPN output,
or having it produce class files).

other implications:
since PS syntax would not be used for JBC, explicitly targeting JBC would
require the use of the JVM assembler I wrote (which uses a variation of
Jasmin-style syntax);
in time it may also make sense to add a JIT backend to this modified JVM;
it may make sense to try to find some good way to escape the JVM's stupid
little rule that all strings be an instance of 'java/lang/String';
..

note that my low-level assembler (x86 and x86-64) uses a variation of
NASM-style syntax (it mainly differs in that it lacks macros and similar,
but does support lumping multiple instructions onto the same line).

note that my primary target OS is Windows rather than Linux (hence, I don't
use dlsym and friends, but implement my own dynamic linking). however, on
Linux libdl is used when loading shared objects.

I don't remember for certain, but I don't think ELF support is complete in
my case (mostly I use COFF).
Post by John Leuner
_______________________________________________
fonc mailing list
http://vpri.org/mailman/listinfo/fonc
Aaron Gray
2008-12-05 16:47:50 UTC
Permalink
Post by BGB
apparently both this, and my effort, had independently discovered the idea
of having 2 different this/self values (in my case, this was due to the
issue of mixing delegation with class/instance, where I have delegation
methods which accept the self which received the original method call, and
normal virtual methods get the self from the object containing the
method). however beyond this I suspect the object systems differ notably
(my system is based mostly on the use of a class/instance system and
interfaces).
IDC has what is called "Lieberman prototypes" :-

http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html

Aaron
Aaron Gray
2008-12-05 16:53:43 UTC
Permalink
Post by BGB
apparently both this, and my effort, had independently discovered the idea
of having 2 different this/self values (in my case, this was due to the
issue of mixing delegation with class/instance, where I have delegation
methods which accept the self which received the original method call, and
normal virtual methods get the self from the object containing the
method). however beyond this I suspect the object systems differ notably
(my system is based mostly on the use of a class/instance system and
interfaces).
There is more about IDC's Lieberman prototype implementation here :-

http://piumarta.com/pepsi/prototypes.html

Aaron
BGB
2008-12-05 22:46:07 UTC
Permalink
----- Original Message -----
From: "Aaron Gray" <***@googlemail.com>
To: "Fundamentals of New Computing" <***@vpri.org>
Sent: Saturday, December 06, 2008 2:53 AM
Subject: Re: [fonc] x86_64...
Post by Aaron Gray
Post by BGB
apparently both this, and my effort, had independently discovered the
idea of having 2 different this/self values (in my case, this was due to
the issue of mixing delegation with class/instance, where I have
delegation methods which accept the self which received the original
method call, and normal virtual methods get the self from the object
containing the method). however beyond this I suspect the object systems
differ notably (my system is based mostly on the use of a class/instance
system and interfaces).
There is more about IDC's Lieberman prototype implementation here :-
http://piumarta.com/pepsi/prototypes.html
Aaron
yeah...

I have looked over the project a bit more.

it provides a few interesting ideas, and even points out a few things in the
Win32 API I was not aware of (namely, it is possible to introspect the app's
symbol table and similar without having to load and process the app's
binary, ...).


but, yes, it seems to be a very different sort of project.
most of it is written in Smalltalk, and it seems to be a mostly centralized
and integrated project; the implementation written in itself is apparently
meaningful, in that the C version of the implementation seems to be largely
the output of an ST->C compiler, which I suspect is also located in the
project.

more so, it has some things merged together which are in my projects several
different subsystems:
apparently it deals with machine code generation, low-level code generation
issues, ... all in a single place.

it is also not so clear what separates one thing from another (there does not
appear to be any clear separation between subsystems, ...).


now, in my case, I have different libraries:
BGBASM, which provides the assembler and dynamic linker (also disassembler,
code for managing symbol lookup, ...), accepts data in a textual form
similar to NASM;
VRM, which provides low-level codegen (register allocation, largely
abstracts over processor level register and type handling, ...);
RPNIL, accepts RPN based language for describing the code to be compiled
(granted, the existing RPNIL compiler also includes many things since moved
to VRM);
..


I have not run a line counter, but I suspect this project is an order of
magnitude or more smaller than my project (maybe a 10x or greater code-size
difference), or, at the least, somewhat smaller than 250 kloc...


I would suspect there are both advantages and disadvantages to having things
integrated or isolated.

integrated, pros:
there is much less code to work with, and so drastic changes can be done
more readily;
the same task can be accomplished with far less code, and via a potentially
faster process.

integrated, cons:
there is much less abstraction;
can become a horrible and unworkable mess in non-OO languages (such as C);
it is not really possible to reuse components in different contexts, since
the context is integrated with the component;
in this case, it is not possible to utilize alternate capabilities or
implications of a component, because as it so happens many of these
capabilities are not implemented (for example, writing the assembler does
not give one a disassembler almost for free, ...);
..

modular, pros:
components can be replaced with others to give new and different
functionality;
components can be used in a wider variety of contexts;
it may be possible to make use of far more elaborate and complex
transformation processes;
it provides an alternative to code duplication and modification;
it keeps one thing isolated from the inner workings of other things;
it is easy to provide a good number of possible "routes" and thus
drastically different behavior and results, as well as making it more
possible to accommodate alternate components with merged functionality;
..

modular, cons:
much more code may be needed;
often, lots of code is duplicated between modules;
general structures from one subsystem may be mirrored in another, even
though there is no direct interaction between these structures;
a task involving numerous modules and stages may run much slower than one
implemented as a single integrated component;
although the internals of accomplishing a task are kept flexible, and the
whole structure becomes flexible, the way in which the general process is
approached becomes fairly rigid;
the APIs become much like impassable walls;
..


as a result, with a modular system one has to be fairly careful about how
the APIs are designed and about what is exposed and where. this is because,
while things hidden well behind the wall can be changed as needed, any
functionality which touches or crosses the wall may become "set in stone"...

so, the design of specific APIs, subsystems, and rules of interaction,
becomes almost as much a central part of the project as the code itself.

one may end up largely writing their own code for the primary reason that
most existing code does not do the right things in the right way (many
pieces of code don't like seeing themselves as a tiny and rigidly defined
piece of machinery embedded inside a much larger system, or they might do
the right thing in the wrong way, or in some rare cases the wrong thing in
the right way).

..



psychology may relate to all this as well, since apparently I am an ESTJ
(yes... people here can recoil in horror...), and this may all relate to how
I approach coding...


BTW: I am considering the thoughts and implications of in my case doing
something similar to 'coke'...

the big hairy issue though is that I would be compiling it to the JVM, which
is slightly different from the normal way languages of this sort operate.


as well, considering ideas for representing a wider variety of data types in
S-Exps (at present, S-Exps force a fairly narrow type model). new syntax
will probably be related to serializing classes and instances, inline XML,
..

#X<foo bar="text">baz...<br/>again...</foo>


in the past, I had not done this, but more because I had usually been using
S-Exps as a convenient way of dumping internal data in a readable form (but
not so much for actual/useful data serialization).


as is, I lack any "capable" data serialization format.
S-Exps work, but only represent a narrow range of types (lists, arrays,
atomic values, symbols, keywords, ...).

XML can be used to implement a data serialization format, but is not in
itself such a format, and it is a pain to make it do so (and efficiently,
since I don't currently have support for SAX, and the use of DOM for data
serialization is slow and expensive).

I have not maintained any of my binary serialization formats (most are
narrow and only serialize the particular kind of data I am looking to
serialize at the time).

..


the best bet is still probably to try to hack a much larger syntactic type
model onto S-Exps (as noted above).

as is, it can serialize objects from my older prototype system, but I could
lower these objects from their privileged place in the syntax (especially
since for many uses my newer class/instance system is likely to absorb many
of the use cases of this older system; however, the systems are sufficiently
different that each likely has meaningful use cases).

in particular:
{x: 3 y: 4}
may be downgraded to:
#{x: 3 y: 4}


#C and #O may be added and intended for serialized classes and instances.

#C<classname>{ stuff I have yet to decide on... }
#O<classname>{ key: value}

#O"app/Foo"{x: 3 y: 4}

most likely, I could use special serialization/deserialization handling for
classes and instances, given classes and instances are statically typed in
my framework...


note: classes and instances may also meaningfully absorb many uses of ad-hoc
structural types (which posed a sufficient pain for data serialization and
deserialization that in the past I had generally not bothered with any kind
of generalized data serialization...).


may use the "traditional" syntax for inline references:
'#<num>=' and '#<num>#', even though this syntax is horrid IMO.

in either case, it is uncertain how to encode references efficiently without
a potentially serious cost to serialization performance (recursively checking
from each compound element whether it is used elsewhere could quickly become
an O(n^2) cost).

however, it could be possible by using a separate pass which would keep
track of every non-atom in an array, and transferring any duplicated element
to a separate array (making this essentially a linear complexity).

in this case, for the output pass, only the first occurrence is serialized
(checking whether it has occurred in the table), and all following
occurrences are emitted as references.
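That two-pass scheme can be sketched in a few lines (a hypothetical illustration, not BGB's code, using Python tuples to stand in for S-Exp lists and the '#<num>=' / '#<num>#' label syntax mentioned above):

```python
def serialize(root):
    """Pass 1 walks the tree once, recording which sub-lists occur more
    than once; pass 2 emits '#n=' at a shared node's first occurrence
    and '#n#' at every later one. Both passes are linear overall."""
    seen, shared = set(), {}

    def scan(x):
        if isinstance(x, tuple):            # tuples stand in for lists
            if id(x) in seen:               # second sighting => shared
                shared.setdefault(id(x), len(shared) + 1)
                return
            seen.add(id(x))
            for e in x:
                scan(e)

    scan(root)
    emitted = set()

    def emit(x):
        if isinstance(x, tuple):
            label = shared.get(id(x))
            if label and id(x) in emitted:
                return "#%d#" % label       # back-reference
            emitted.add(id(x))
            body = "(" + " ".join(emit(e) for e in x) + ")"
            return ("#%d=" % label) + body if label else body
        return str(x)

    return emit(root)

parent = ("x:", 3, "y:", 4)
a = ("z:", 5, parent)
b = ("w:", 7, parent)
print(serialize((a, b)))   # ((z: 5 #1=(x: 3 y: 4)) (w: 7 #1#))
```

Note this shares only by object identity and emits strictly linearly (no forward references), matching the constraints described above.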

I am likely to require a strictly linear encoding process (AKA: no forward
references), although forward references could be allowed if a 2-pass
parsing scheme were used (references are initially given "placeholders", and
the correct values are substituted in place via a secondary "un-flattening"
pass).

potentially, I could also split out merged forms and serialize them
beforehand (makes more sense for automatically serialized data, and may
improve readability in many cases, avoiding producing a potentially massive
partial nested graph of objects followed by lots of smaller fragmentary
references).

first form:
{z: 5 _parent: #1={x: 3 y: 4}}
{w: 7 _parent: #1#}

if forward refs were possible:
{z: 5 _parent: #1#}
{w: 7 _parent: #1={x: 3 y: 4}}


if splitting is done, and the table is serialized first:
#;#1={x: 3 y: 4}
{z: 5 _parent: #1#}
{w: 7 _parent: #1#}

this would work because my parser still parses expression comments, but then
discards the results.


for now, I will probably not face the issue of method and function
serialization (it would be both ugly and unproductive to dump out masses of
disassembled functions, and it is far less certain that any such dumps could
be re-assembled into a working form anyway...). so, any methods which have
been compiled to machine code would probably be serialized as symbolic
references (C-side function name).

or such...
Michael Haupt
2008-12-05 07:33:00 UTC
Permalink
Hi John,
Post by John Leuner
So if you compiled idc and jolt with flags to create 32-bit binaries it
might work ...
OK... that's another option. Is there a concise list of all the places
where to insert -m32? Presumably, it's in each Makefile, right?

Best,

Michael
--
Dr.-Ing. Michael Haupt ***@hpi.uni-potsdam.de
Software Architecture Group Phone: ++49 (0) 331-5509-542
Hasso Plattner Institute for Fax: ++49 (0) 331-5509-229
Software Systems Engineering http://www.hpi.uni-potsdam.de/swa/
Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam, Germany

Hasso-Plattner-Institut für Softwaresystemtechnik GmbH, Potsdam
Amtsgericht Potsdam, HRB 12184
Geschäftsführung: Prof. Dr. Christoph Meinel